Compare commits

..

848 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
6fdba8599d docs/VictoriaLogs/CHANGELOG.md: cut v0.7.0-victorialogs 2024-05-15 04:58:05 +02:00
Aliaksandr Valialkin
0aa19a2837 lib/logstorage: work-in-progress 2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin
b617dc9c0b lib/streamaggr: properly return output key from getOutputKey
The bug has been introduced in cc2647d212
2024-05-14 17:47:21 +02:00
Roman Khavronenko
b0c1f3d819 app/vmalert/rule: reduce number of allocations for getStaleSeries fn (#6269)
Allocations are reduced by re-using the byte buffer when converting
labels to string keys.
```
name               old allocs/op  new allocs/op  delta
GetStaleSeries-10       703 ± 0%       203 ± 0%   ~     (p=1.000 n=1+1)
```

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-14 14:43:39 +02:00
Nikolay
6a6e34ab8e app/vmauth: explicitly unregister metrics set for auth config (#6252)
it's needed to remove Summary metric type from the global state of
metrics package. metrics package tracks each bucket of summary and
periodically swaps old buckets with new.

Simple set unregister is not enough to release memory used by Set

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
2024-05-14 09:26:50 +02:00
Aliaksandr Valialkin
c90e6de13b docs/VictoriaLogs/CHANGELOG.md: cut v0.6.1-victorialogs 2024-05-14 03:06:53 +02:00
Aliaksandr Valialkin
da3af090c6 lib/logstorage: work-in-progress 2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin
cb35e62e04 lib/logstorage: work-in-progress
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258
2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin
cc2647d212 lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.

This improves performance of these functions in hot loops of VictoriaLogs a bit.
2024-05-14 01:23:54 +02:00
Aliaksandr Valialkin
707f3a69db lib/stringsutil: add LessNatural() function for natural sorting
Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc()
functions in MetricsQL - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6192
and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6256

Natural sorting will be also used by `| sort ...` pipe in VictoriaLogs -
see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe
2024-05-13 16:56:47 +02:00
Hui Wang
4c80b17027 storage: correctly apply -inmemoryDataFlushInterval when it's set t… (#6221)
…o minimum supported value 1s
pendingRowsFlushInterval was bumped to 2s in
73f0a805e2
2024-05-13 16:44:30 +02:00
Andrii Chubatiuk
ce25d68b45 lib/streamaggr: added rate_sum and rate_avg to benchmarks, lint fix (#6264)
fixed lint for rate outputs
2024-05-13 16:40:37 +02:00
Andrii Chubatiuk
9c3d44c8c9 lib/streamaggr: added rate and rate_avg output (#6243)
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:39:49 +02:00
hagen1778
17283fab6c lib/logstorage: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:35:11 +02:00
Andrii Chubatiuk
680b8c25c8 app/vmagent: removed deprecated -remoteWrite.multitenantURL flag support (#6253)
Removed deprecated `-remoteWrite.multitenantURL` flag to simplify global
stream aggregation

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:22:37 +02:00
Yury Molodov
37c22ee053 vmui/vmanomaly: add download config button (#6231)
This pull request adds a button to the vmanomaly ui that opens a modal
window for viewing and downloading the config file.

<img width="610" alt="button"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/0132b178-eb73-4272-8144-be7ed2a8dcaf">
<img height="300" alt="error"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/6d9f2627-77d7-4ce6-b73b-542ce1bbc999">
<img height="300" alt="modal"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/680bffdd-d6a3-445e-bd48-8f0feb30016e">
2024-05-13 12:25:31 +02:00
Yury Molodov
29bd120126 vmui/vmanomaly: fix default server url (#6178)
This PR for ui vmanomaly eliminates URL parameters to automatically use
the default server URL, simplifying URLs like:

From http://localhost:3000/#/?g0.expr=vm_blocks... to
http://localhost:3000
From http://localhost:3000/select/0/vmui/#/?g0.expr=vm_blocks... to
http://localhost:3000/select/0/vmui/ etc.
2024-05-13 12:24:50 +02:00
Aliaksandr Valialkin
de98688489 deployment: update VictoriaLogs Docker image from v0.5.2-victorialogs to v0.6.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.6.0-victorialogs
2024-05-12 23:22:50 +02:00
Aliaksandr Valialkin
bd75c0a898 deployment/docker/Makefile: group app-via-docker-* and package-via-docker-* rules with CGO_ENABLED=1 together for better maintainability 2024-05-12 23:09:55 +02:00
Aliaksandr Valialkin
cb19335a9f deployment/docker/Makefile: rename EXTRA_ENVS to EXTRA_DOCKER_ENVS
The purpose of EXTRA_DOCKER_ENVS name is more clear than EXTRA_ENVS.

While at it, make the following small fixes:

- Pass GOARM=5 to Docker builder when building Docker packages for GOARCH=arm in the same way
  it is passed to the builder when building production binaries for GOARCH=arm.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4965

- Set GCO_ENABLED=1 for package-via-docker-amd64, which has been accidentally removed in 07496d7d92

- Consistently use 'CGO_ENABLED=... GOARCH=...' order of env vars at package-via-docker-*,
  because this order is used in app-via-docker-*

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6158
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6172

This is a follow-up for 07496d7d92 and 7958f38864
2024-05-12 22:41:38 +02:00
Aliaksandr Valialkin
8466ab109c docs/VictoriaLogs/LogsQL.md: cross-reference uniq pipe with uniq_values stats function 2024-05-12 16:45:24 +02:00
Aliaksandr Valialkin
89c4dc1d8d docs/VictoriaLogs/CHANGELOG.md: cut v0.6.0-victorialogs 2024-05-12 16:38:01 +02:00
Aliaksandr Valialkin
9dbd0f9085 lib/logstorage: initial implementation of pipes in LogsQL
See https://docs.victoriametrics.com/victorialogs/logsql/#pipes
2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin
e66465cb03 lib/encoding: optimizing UnmarshalVarUint64 and UnmarshalVarInt64 a bit 2024-05-12 16:32:11 +02:00
Aliaksandr Valialkin
51de9f30fc vendor: run make vendor-update 2024-05-12 16:17:38 +02:00
Aliaksandr Valialkin
75398cd0f8 go.mod: update the required Go version from 1.21 to 1.22
This is a follow-up for 95222b2079
2024-05-12 16:07:11 +02:00
Aliaksandr Valialkin
bac28f2e4d docs/vmauth.md: small fixes after proofreading 2024-05-12 12:35:23 +02:00
Aliaksandr Valialkin
590160ddbb lib/slicesutil: add helper functions for setting slice length and extending its capacity
The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.
2024-05-12 11:32:17 +02:00
Aliaksandr Valialkin
f20d452196 lib/storage: remove outdated misleading comments 2024-05-12 10:24:04 +02:00
Aliaksandr Valialkin
92de6ea340 app/vmselect: use strings.EqualFold instead of strings.ToLower where appropriate
Strings.EqualFold doesn't allocate memory contrary to strings.ToLower if the input string contains uppercase chars
2024-05-12 10:20:41 +02:00
Aliaksandr Valialkin
95608885ea app/vmselect/promql: properly estimate the needed amounts of memory for executing aggregate function over rollup function in incremental mode
Incremental aggregation processes only GOMAXPROCS time series at a time, so its' memory usage doesn't depend
on the number of input time series.

The issue has been introduced in 5138eaeea0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3203
2024-05-12 10:14:11 +02:00
Aliaksandr Valialkin
de7fc743ca README.md: mention that -tlsCertFile and -tlsKeyFile options aren't needed when automatic issuing of tls certificates is enabled with -tlsAutoCertHosts flag 2024-05-12 09:48:40 +02:00
Aliaksandr Valialkin
f4051dd1e0 docs/Single-server-VictoriaMetrics.md and docs/README.md: sync with README.md with make docs-sync after the commit c6c5a5a186 2024-05-12 09:40:27 +02:00
Roman Khavronenko
87fd400dfc Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248)
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
2024-05-10 12:09:21 +02:00
Github Actions
c87ce86d96 Automatic update operator docs from VictoriaMetrics/operator@0829591 (#6250) 2024-05-10 10:57:10 +08:00
qiangxuhui
80f3644ee3 Add build support for loong64 (#6222)
### Describe Your Changes

Added makefile rule for `GOARCH=loong64` to support building all
VictoriaMetrics components on the `loongarch64` platform.


### Checklist

The following checks are **mandatory**:
 
* [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: qiangxuhui <qiangxuhui@loongson.cn>
2024-05-09 14:22:03 +02:00
Github Actions
a8d0c1a62d Automatic update operator docs from VictoriaMetrics/operator@4a51b37 (#6245) 2024-05-09 11:40:53 +08:00
hagen1778
56531abd56 app/vmselect/vmui: add missing static files
These files weren't added to the git after `make vmui-build vmui-update` command
in commit 7fd9325e62 (diff-50d9a4b91bdad190f2db92553736267103ab4225dfb6642b675fb4b8196e6560)

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6224

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-08 14:22:34 +02:00
Roman Khavronenko
8a03e987cb lib/streamaggr: set correct suffix <output>_prometheus (#6228)
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-08 13:11:30 +02:00
Andrii Chubatiuk
a9283e06a3 streamaggr: made labels compressor shared (#6173)
Though labels compressor is quite resource intensive, each aggregator
and deduplicator instance has it's own compressor. Made it shared across
all aggregators to consume less resources while using multiple
aggregators.

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2024-05-08 13:10:53 +02:00
Zhu Jiekun
02851d7800 chore: [deployment] upgrade from go 1.22.2 to 1.22.3 to include security fixes (#6238)
### Describe Your Changes

upgrade from go 1.22.2 to 1.22.3 to include security fixes. Also see:
- https://go.dev/doc/devel/release
-
https://github.com/golang/go/issues?q=milestone%3AGo1.22.3+label%3ACherryPickApproved

### Checklist

The following checks are **mandatory**:

- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Jiekun <jiekun.dev@gmail.com>
2024-05-08 10:02:22 +02:00
Zhu Jiekun
17e3d019d2 feature: [vmagent] Add service discovery support for Vultr (#6068)
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041

#### Added
- Added service discovery support for Vultr.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
- Vultr API:
https://www.vultr.com/api/#tag/instances/operation/list-instances
    - Vultr client SDK: https://github.com/vultr/govultr
- Prometheus SD:
https://github.com/prometheus/prometheus/tree/main/discovery/vultr

---
### Checklist

The following checks are mandatory:

- [X] I have read the [Contributing
Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md)
- [x] All commits are signed and include `Signed-off-by` line. Use `git
commit -s` to include `Signed-off-by` your commits. See this
[doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about
how to sign your commits.
- [x] Tests are passing locally. Use `make test` to run all tests
locally.
- [x] Linting is passing locally. Use `make check-all` to run all
linters locally.

Further checks are optional for External Contributions:

- [X] Include a link to the GitHub issue in the commit message, if issue
exists.
- [x] Mention the change in the
[Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
Explain what has changed and why. If there is a related issue or
documentation change - link them as well.

  Tips for writing a good changelog message::

* Write a human-readable changelog message that describes the problem
and solution.
* Include a link to the issue or pull request in your changelog message.
* Use specific language identifying the fix, such as an error message,
metric name, or flag name.
* Provide a link to the relevant documentation for any new features you
add or modify.

- [ ] After your pull request is merged, please add a message to the
issue with instructions for how to test the fix or try the feature you
added. Here is an
[example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726)
- [x] Do not close the original issue before the change is released.
Please note, in some cases Github can automatically close the issue once
PR is merged. Re-open the issue in such case.
- [x] If the change somehow affects public interfaces (a new flag was
added or updated, or some behavior has changed) - add the corresponding
change to documentation.

Signed-off-by: Jiekun <jiekun.dev@gmail.com>
2024-05-08 10:01:48 +02:00
Oleg
c6c5a5a186 Statsd protocol compatibility (#5053)
In this PR I added compatibility with [statsd
protocol](https://github.com/b/statsd_spec) with tags to be able to send
metrics directly from statsd clients to vmagent or directly to VM.
For example its compatible with
[statsd-instrument](https://github.com/Shopify/statsd-instrument) and
[dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems

Related issues: #5052, #206, #4600
2024-05-07 21:46:08 +02:00
Github Actions
55c7dafb35 Automatic update operator docs from VictoriaMetrics/operator@2789953 (#6237) 2024-05-07 21:34:57 +02:00
Alexander Marshalov
3d4988ecf6 fix typo in scrape config examples (#6234) 2024-05-07 16:52:44 +02:00
Github Actions
134dcaef33 Automatic update operator docs from VictoriaMetrics/operator@6271553 (#6233) 2024-05-07 16:51:39 +02:00
Ted Possible
5a3abfa041 Exemplar support (#5982)
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.

---------

Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
2024-05-07 12:09:44 +02:00
hagen1778
2561a132ee docs: mention influxListenAddr in URLs format doc
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-06 15:29:24 +02:00
Andrii Chubatiuk
879771808b app/vmagent/remotewrite: do not cleanup timeseries which are used in multiple remote write contexts (#6206)
When at least one remote write has deduplication configured it cleans up
timeseries while they can be in use by another remote write without
deduplication

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-06 12:09:51 +02:00
hagen1778
c0050beadc docs: mention actual version in update nodes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-30 18:38:04 +02:00
Yury Molodov
046a4a5ecf vmui: fix issue preventing first query trace expansion (#6197)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6186
2024-04-30 13:32:29 +02:00
Hui Wang
e3c226cf92 docs: update vmalert and vmagent docs (#6207)
* restore and actualize doc section explaining duplicated labels error
* rm misleading comment about post-aggregation in stream aggregation
2024-04-30 10:27:06 +02:00
Corporte Gadfly
8bca4d2de4 deployment: minor grammatical fixes in alert descriptions (#6199) 2024-04-30 10:24:31 +02:00
Roman Khavronenko
e2590b339d app/vmauth: add test for LeastLoaded balance policy (#6144)
Check if least-loaded works correctly.
related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6136

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-30 10:22:17 +02:00
Zakhar Bessarab
329c3cbdf0 lib/mergeset: improve test coverage (#6118)
Add test to cover the code path with overflowing shards buffers and
triggering merge to partition.

This test covers the code path which leaded to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-30 10:21:37 +02:00
Github Actions
e908effd22 Automatic update operator docs from VictoriaMetrics/operator@8f025b3 (#6200)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-04-30 16:01:09 +08:00
hagen1778
d386a68b59 dashboards: add new panel Concurrent selects to vmstorage row
The panel will show how many ongoing select queries are processed by vmstorage
and should help to identify resource bottlenecks. See panel description for more details.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 13:55:19 +02:00
hagen1778
9e18724036 deployment: update per-tenant-statistic dashboard to be compatible with Grafana 10
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:38:56 +02:00
hagen1778
5917ac003e deployment: update backupmanager dashboard to be compatible with Grafana 10
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:35:25 +02:00
hagen1778
f0c4d372bd deployment: update operator dashboard to be compatible with Grafana 10
- Use TimeSeries panel instead of deprecated Graph
- Update panel styles
- Fix version panel

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:32:19 +02:00
hagen1778
9256df17fa deployment: bump Grafana version to 10.4.2
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:10:24 +02:00
hagen1778
8606b48ce5 dashboards: add Network Usage panel to Resource Usage row
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4478
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 11:54:17 +02:00
Dima Lazerka
564463259a deployment/dashboards: properly show version for non-stable docker images (#6150)
re: .*-(?:tags|heads)-(.*)-(?:0|dirty)-.*

cases:
victoria-metrics-20240419-160209-heads-enterprise-single-node-0-g08f933ab0c
enterprise-single-node

victoria-metrics-20240201-133950-tags-v1.97.1-enterprise-0-g760a8733b
v1.97.1-enterprise

victoria-metrics-20240419-160209-heads-rotation-part-2-0-ge2367b6d1-dirty-848b54cd
rotation-part-2-0-ge2367b6d1

victoria-metrics-20240419-160209-heads-lts-1.93-enterprise-search-contention-0-g30ef4aad21-amd64
lts-1.93-enterprise-search-contention

victoria-metrics-20240425-150852-tags-v1.101.0-enterprise-0-g718138c64
v1.101.0-enterprise

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Dzmitry Lazerka <dlazerka@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 11:11:28 +02:00
Zakhar Bessarab
6b493582da dashboards/victoria-metrics-single: allow selecting multiple instance values (#5870)
Allowing to select multiple instance IPs makes it much easier to view
metrics for longer periods of time in dynamic environments such as
Kubernetes. In k8s update will also cause IP to change making it harder
to use dashboard to check the status.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5869

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 10:34:36 +02:00
Roman Khavronenko
5e8c087d42 deployment: bump to 1.101.0 (#6194)
Step 17:
> Bump VictoriaMetrics version at deployment/docker/docker-compose.yml
and at deployment/docker/docker-compose-cluster.yml.

from https://docs.victoriametrics.com/release-guide/

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-26 13:18:51 +02:00
Github Actions
3a669b4b87 Automatic update operator docs from VictoriaMetrics/operator@bbf847c (#6189) 2024-04-26 10:14:40 +02:00
hagen1778
5334f0c2ce docs/CHANGELOG.md: cut v1.101.0
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-25 16:25:09 +02:00
hagen1778
7fd9325e62 app/vmselect: run make vmui-update
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-25 15:51:03 +02:00
hagen1778
a62551f773 docs: add change for c0e4ccb7b5
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-25 15:37:37 +02:00
Hui Wang
dd0d2c77c8 app/vmselect: implement cmd-line flags -search.disableImplicitConversions and -search.logImplicitConversions (#6180)
address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4338
support disable or log [implicit
conversions](https://docs.victoriametrics.com/metricsql/#implicit-query-conversions)
for subquery with cmd-line flags `-search.disableImplicitConversion` and
`-search.logImplicitConversion`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-25 12:54:42 +02:00
Yury Molodov
57b7d16259 vmui: improve error message for server response issues (#6177)
Updates error messages for better clarity and guidance on server
response issues.
2024-04-25 12:52:13 +02:00
Yury Molodov
6193fa3dcf vmui: trigger auto-suggestion at any cursor position (#6155)
- Implemented auto-suggestion triggers for mid-string cursor positions
in vmui.
- Improved the suggestion list positioning to appear directly beneath
the active text editing area.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5864
2024-04-25 12:48:49 +02:00
hagen1778
6aaf1768f4 Revert "docs: removed code-style highlighting for commanad-line flags of VM components (#6147)"
This reverts commit 9bedbcfa2f.
2024-04-25 11:23:02 +02:00
Andrii Chubatiuk
07496d7d92 deployment: update makefile package-* targets (#6172)
Updated package targets in a same manner, how it's done for publish ones
in
7958f38864
2024-04-24 18:29:32 +02:00
hagen1778
2d9bbe1934 docs/changelog: mention downsampling fixes for ENT version of VM
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-24 17:03:21 +02:00
hagen1778
679844feaf Revert "app/vmbackup: introduce new flag type URL (#6152)"
This reverts commit 029060af60.
2024-04-24 13:47:57 +02:00
Roman Khavronenko
029060af60 app/vmbackup: introduce new flag type URL (#6152)
The new flag type is supposed to be used for specifying URL values which
could contain sensitive information such as auth tokens in GET params or
HTTP basic authentication.

The URL flag also allows loading its value from files if `file://`
prefix is specified. As example, the new flag type was used in
app/vmbackup as it requires specifying `authKey` param for making the
snapshot.

See related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973

Thanks to @wasim-nihal for initial implementation
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-24 10:57:54 +02:00
Github Actions
a14acd4fcc Automatic update operator docs from VictoriaMetrics/operator@6e1a876 (#6179) 2024-04-24 00:41:23 -07:00
Artem Navoiev
592a9fe9f2 docs: vmalert remove new lines in configuration section
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-04-23 22:30:31 +02:00
Artem Navoiev
aeba3f39db docs: vmalert remove new lines in configuration section
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-04-23 21:47:17 +02:00
hagen1778
035de57e5e dashboards: show max number of active merges instead of cumulative
The cumulative number of active merges could be red herring
as it its value depends on the number of vmstorages.
For example, vmstorage could be added or removed and this will affect
the panel.
Or, each vmstorage could start a merging process (i.e. for downsampling)
and visiually it could look like a massive change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-23 16:41:48 +02:00
hagen1778
22cfab1ea2 docs/changelog: mention icnreased CPU usage in 1.100.1 release for ENT distributions
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-23 16:34:35 +02:00
hagen1778
4251292708 app/vmagent: mention corner case with dangling queues and identical URLs
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140

We don't cover this corner case as it has low chance for reproduction.
Precisely, the requirements are following:
1. vmagent need to be configured with multiple identical `remoteWrite.url` flags;
2. At least one of the persistent queues need to be non-empty, which already
signalizes about issues with setup;
3. vmagent need to be restarted with removing of one of `remoteWrite.url` flags.

We do not document this case in vmagent.md as it seems to be a rare corner case
and its explanation will require too much of explanation and confuse users.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-23 14:49:45 +02:00
Github Actions
301bd387d4 Automatic update operator docs from VictoriaMetrics/operator@e343a8e (#6168) 2024-04-23 12:12:35 +02:00
Andrii Chubatiuk
7958f38864 deployment/docker: allow cross-platform building on arm64 platform (#6158)
Added x86_64 libraries to allow building cross-platform images on arm64
2024-04-23 12:12:10 +02:00
Denys Holius
bde4693a90 deployment/docker/victorialogs/fluentbit-docker/docker-compose.yml: update fluentbit version from v2.1.4 to v3.0.2 (#6120)
see also https://fluentbit.io/announcements/v3.0.0/
2024-04-23 01:07:45 -07:00
Roman Khavronenko
5f487c7090 app/vmalert: fix links with anchors in vmalert's UI (#6146)
Starting from v1.99.0 vmalert could ignore anchors pointing to specific
rule groups if `search` param was present in URL.
This change makes anchors compatible with `search` param in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-22 15:02:10 +02:00
Github Actions
6dbb4c1671 Automatic update operator docs from VictoriaMetrics/operator@7e2ba6f (#6165) 2024-04-22 14:55:41 +02:00
Denys Holius
9bedbcfa2f docs: removed code-style highlighting for commanad-line flags of VM components (#6147)
Using `sh` or `console` formatting doesn't do word-breaking on render. This makes flags description
harder to read, as users need to scroll the web page horizontally.
Removing the formatting renders the description with normal word-breaking.
2024-04-22 14:53:26 +02:00
hagen1778
bae3874e6a app/streamaggr: follow-up after c0e4ccb7b5
* rm vmagent mentions from vminsert flags
* improve documentation wording, add links to related sections
* mention `ignore_first_intervals` in the stream aggr options
* update flags description
* add basic test for config parsing validation

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-22 14:22:59 +02:00
Andrii Chubatiuk
c0e4ccb7b5 lib/streamaggr: add option to ignore first N aggregation intervals (#6137)
Stream aggregation may yield inaccurate results if it processes incomplete data. 
This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent.
 If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations. 
To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues
will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations
will be correct.
2024-04-22 13:52:04 +02:00
Aliaksandr Valialkin
81b2fb88cc docs/MetricsQL.md: add links to raw samples chapter at https://docs.victoriametrics.com/keyconcepts/ 2024-04-21 00:21:51 +02:00
Aliaksandr Valialkin
3f8f7f2c41 docs/CONTRIBUTING.md: small re-wording 2024-04-21 00:07:44 +02:00
Aliaksandr Valialkin
0e75581dd9 docs: remove duplicate information regarding contribution guidelines to VictoriaMetrics
Substitute duplicate information with the link to https://docs.victoriametrics.com/contributing/ across all the docs.
2024-04-21 00:00:50 +02:00
Aliaksandr Valialkin
bbdd7743a8 docs/CONTRIBUTING.md: more formatting fixes and clarifications 2024-04-20 23:48:56 +02:00
Aliaksandr Valialkin
c52f73b3e1 .github: move pull_request_template.md from .github/PULL_REQUEST_TEMPLATE/ to .github/
I couldn't figure out how to show the provided pull request template to users
according to https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository ,
I let's try 'global pull request for the whole repository' approach.
2024-04-20 23:42:57 +02:00
Aliaksandr Valialkin
d18968f8cb docs/CONTRIBUTING.md: small corrections after a402847eb6 2024-04-20 23:21:54 +02:00
Aliaksandr Valialkin
a402847eb6 Move CONTRIBUTING.md to docs/
It is better from visibility PoV if the CONTRIBUTING.md file is visible at https://docs.victoriametrics.com/contributing/ .
This page can be indexed by search engines and searched by our users later.
It is also easier to provide a link to this page now comparing to the old link, which is much harder to remember:
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md

While at it, syncrhonize docs/CONTRIBUTING.md with .github/PULL_REQUEST_TEMPLATE/pull_request_template.md ,
so they do not contain duplicate information, which can be outdated over time. Now all the relevant information
is located at docs/CONTRIBUTING.md, while .github/PULL_REQUEST_TEMPLATE/pull_request_template.md refers this doc.

This is a follow-up for c006db1798

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6040
2024-04-20 23:11:22 +02:00
Aliaksandr Valialkin
4770294732 lib/protoparser: substitute hybrid channel-based pools with plain sync.Pool
Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage.
So it is better to use plain sync.Pool from readability and maintainability PoV.

This is a follow-up for 8942f290eb
2024-04-20 21:59:51 +02:00
Aliaksandr Valialkin
8942f290eb app/vminsert: replace hybrid sync.Pool+channel-based pool scheme for poolCtx with plain sync.Pool
This simplifies the code, while doesn't increase memory usage under low and high data ingestion rate.

This is a follow-up for 1decbcf6eb
2024-04-20 21:44:53 +02:00
Aliaksandr Valialkin
1decbcf6eb app/vminsert/influx: replace hybrid channel-based pool+sync.Pool with plain sync.Pool for pushCtx
The memory usage for plain sync.Pool doesn't increase comparing to the memory usage for the hybrid scheme,
so it is better to use plain sync.Pool in order to simplify the code and make it more readable and maintainable.

This is a follow-up for c22da2f917
2024-04-20 21:41:06 +02:00
Aliaksandr Valialkin
c22da2f917 app/vmagent/influx: replace hybrid channel-based pool + sync.Pool with plain sync.Pool for pushCtx
Data ingestion benchmark doesn't show memory usage difference between two approaches,
so let's use simpler approach in order to improve code readability and maintainability.

This is a follow-up for 77c597738c
2024-04-20 21:38:11 +02:00
Aliaksandr Valialkin
77c597738c app/vmagent/common: use plain sync.Pool instead of a mix of sync.Pool with channel-based pool for PushCtx
This scheme was used for reducing memory usage when vmagent runs on a machine with big number of CPU cores
and the ingestion rate isn't too big. The scheme with channel-based pool could reduce memory usage,
since it minimizes the number of PushCtx structs in the pool in this case.

Performance tests didn't reveal significant difference in memory usage under both low and high ingestion rate
between plain sync.Pool and the current hybrid scheme, so replace the scheme with plain sync.Pool in order
to simplify the code.
2024-04-20 21:27:05 +02:00
Aliaksandr Valialkin
7531e9084a all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices
This makes the code a bit clear.
2024-04-20 21:00:03 +02:00
Aliaksandr Valialkin
3e728c41f6 app/vminsert/common: remove obsolete optimization for reducing memory usage for InsertCtx pool
This optimization is no longer needed according to benchmarks with ingestion rate.

This simplifies the code a bit.
2024-04-20 20:51:53 +02:00
Bartosz Fenski
cceb4c62a3 typo (#6159)
s/infromation/information/
2024-04-20 05:12:15 -07:00
Github Actions
b11d4ccee3 Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@5b8a0ba (#6156) 2024-04-20 02:31:36 -07:00
Aliaksandr Valialkin
34e253f9d6 app/vmselect/promql: add support for matching against multiple numeric constants via q == (c1,...,cN) and q != (c1,...,cN) syntax 2024-04-19 17:56:29 +02:00
Aliaksandr Valialkin
249a467ea4 docs/CHANGELOG.md: document v1.97.4 LTS release
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.4
2024-04-19 13:10:21 +02:00
Aliaksandr Valialkin
2529f3e8eb docs/CHANGELOG.md: document v1.93.14 LTS release
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.14
2024-04-19 12:52:21 +02:00
Aliaksandr Valialkin
ec17d4390b docs/LTS-releases.md: change the latest v1.93.x LTS release from v1.93.13 to v1.93.14
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.14
2024-04-19 12:50:32 +02:00
Aliaksandr Valialkin
8f535f9e76 app/vmagent/remotewrite: add support for replication additionally to sharding when both -remoteWrite.shardByURL and -remoteWrite.shardByURLReplicas=RF command-line flags are set
This allows setting up data replication among failure domains if the replication factor is smaller than the number of failure domains.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6054

See https://docs.victoriametrics.com/vmagent/#sharding-among-remote-storages
2024-04-19 11:27:33 +02:00
Github Actions
54605f5dd1 Automatic update operator docs from VictoriaMetrics/operator@3d7bcba (#6149) 2024-04-19 09:17:37 +02:00
Hui Wang
a84491324d vmalert: avoid blocking APIs when alerting rule uses template functio… (#6129)
* vmalert: avoid blocking APIs when alerting rule uses template function `query`

* app/vmalert: small refactoring

* simplify labels and templates expanding
* simplify `newAlert` interface
* fix `TestGroupStart` which mistakenly skipped annotations
and response labels check

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* reduce alerts lock time when restore

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-19 09:16:26 +02:00
Roman Khavronenko
316b19a5d1 app/vmalert: make TestGroupStart more reliable (#6130)
There was a sleep statement in the test, waiting for Group
to perform a couple of evaluation. But looks like
it worked unreliable for some CI tests like the one below
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/8718213844/job/23915007958?pr=6115

This commit changes the sleep statement on a function that
waits for a specific number of evaluations. It should make this
test faster in general case, and more reliable for slow environemnts.
2024-04-19 09:06:40 +02:00
Aliaksandr Valialkin
565dcefc29 docs/Cluster-VictoriaMetrics.md: typo fix after dadeb6620a: folling -> following 2024-04-19 00:24:46 +02:00
Aliaksandr Valialkin
dadeb6620a app/vmselect/netstorage: add support for the ability to set cross-group replication factor at vmselect
The cross-group replication factor can be set via `-globalReplicationFactor` command-line flag at vmselect.
In this case vmselect continues returning full responses if up to globalReplicationFactor-1 groups are unavailable.

See https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect for details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6054
2024-04-19 00:18:15 +02:00
Aliaksandr Valialkin
6b1cc9b946 lib/storage: search for all the values for the given label before applying filters and limits
It is incorrect applying the limit on the number of values to search without applying filters,
since the returned subset of label values may miss the label values matching the given filters.

This is a follow-up for 66630c7960
2024-04-18 20:29:36 +02:00
Aliaksandr Valialkin
d3635aae7f app/{vlselect,vmselect}: run make vmui-update vmui-logs-update 2024-04-18 17:33:16 +02:00
Github Actions
453988e244 Automatic update operator docs from VictoriaMetrics/operator@4dfa9a6 (#6141)
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-18 14:49:17 +02:00
Denys Holius
89374935f2 Bump snap go builder version from v1.20 to v1.22 (#6142)
* snap/snapcraft.yaml: bump go builder version for snap from v1.20 to v1.22

* snap/snapcraft.yaml: drop arm64 architecture as not supported from snap multi build

see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5998
2024-04-18 14:48:32 +02:00
Github Actions
45f7e58d6c Automatic update operator docs from VictoriaMetrics/operator@781dbba (#6143) 2024-04-18 12:03:41 +02:00
Github Actions
c13729f07c Automatic update operator docs from VictoriaMetrics/operator@731de81 (#6139)
Co-authored-by: Alexander Marshalov <_@marshalov.org>
2024-04-18 10:25:29 +02:00
Github Actions
21b2b1bf19 Automatic update operator docs from VictoriaMetrics/operator@7f6c6d8 (#6138) 2024-04-18 10:22:02 +02:00
Aliaksandr Valialkin
844a1eb9a3 all: replace old https://docs.victoriametrics.com/PerTenantStatistic.html url with the new one - https://docs.victoriametrics.com/pertenantstatistic/ 2024-04-18 03:35:44 +02:00
Aliaksandr Valialkin
b2ce0657b2 all: replace old https://docs.victoriametrics.com/Quick-Start.html url with the new one - https://docs.victoriametrics.com/quick-start/ 2024-04-18 03:32:41 +02:00
Aliaksandr Valialkin
bfb69d346e all: replace old https://docs.victoriametrics.com/scrape_config_examples.html url with the new one - https://docs.victoriametrics.com/scrape_config_examples/ 2024-04-18 03:29:46 +02:00
Aliaksandr Valialkin
59d495d469 all: replace old https://docs.victoriametrics.com/Troubleshooting.html url with the new one - https://docs.victoriametrics.com/troubleshooting/ 2024-04-18 03:26:36 +02:00
Aliaksandr Valialkin
2e3580905f all: replace old https://docs.victoriametrics.com/relabeling.html url with the new one - https://docs.victoriametrics.com/relabeling/ 2024-04-18 03:22:22 +02:00
Aliaksandr Valialkin
e6027e043a all: replace old https://docs.victoriametrics.com/Articles.html url with the new one - https://docs.victoriametrics.com/articles/ 2024-04-18 03:19:31 +02:00
Aliaksandr Valialkin
a196370837 all: replace old https://docs.victoriametrics.com/CaseStudies.html url with the new one - https://docs.victoriametrics.com/casestudies/ 2024-04-18 03:16:05 +02:00
Aliaksandr Valialkin
e9642e99f2 all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/ 2024-04-18 03:11:03 +02:00
Aliaksandr Valialkin
8ee89c021f all: replace old https://docs.victoriametrics.com/url-examples.html url with the new one - https://docs.victoriametrics.com/url-examples/ 2024-04-18 03:01:31 +02:00
Aliaksandr Valialkin
f4b1cbfef0 all: replace old https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/cluster-victoriametrics/ 2024-04-18 02:54:20 +02:00
Aliaksandr Valialkin
4dfc61821a all: replace old https://docs.victoriametrics.com/FAQ.html url with the new one - https://docs.victoriametrics.com/faq/ 2024-04-18 02:45:50 +02:00
Aliaksandr Valialkin
d26d536c90 all: replace old https://docs.victoriametrics.com/CHANGELOG_YYYY.html url with the new one - https://docs.victoriametrics.com/changelog_YYYY/ 2024-04-18 02:42:09 +02:00
Aliaksandr Valialkin
2b19a3472c all: replace old https://docs.victoriametrics.com/CHANGELOG.html url with the new one - https://docs.victoriametrics.com/changelog/ 2024-04-18 02:38:30 +02:00
Aliaksandr Valialkin
e128ef7ace all: replace old https://docs.victoriametrics.com/keyConcepts.html url with the new one - https://docs.victoriametrics.com/keyconcepts/ 2024-04-18 02:33:29 +02:00
Aliaksandr Valialkin
828e78ceb4 all: replace old https://docs.victoriametrics.com/sd_configs.html url with the new one - https://docs.victoriametrics.com/sd_configs/ 2024-04-18 02:27:47 +02:00
Aliaksandr Valialkin
4d2b9fe6b2 all: replace old https://docs.victoriametrics.com/stream-aggregation.html url with the new one - https://docs.victoriametrics.com/stream-aggregation/ 2024-04-18 02:19:11 +02:00
Aliaksandr Valialkin
8eeb045d3f all: replace old https://docs.victoriametrics.com/MetricsQL.html url with the new one - https://docs.victoriametrics.com/metricsql/ 2024-04-18 02:14:53 +02:00
Aliaksandr Valialkin
6c14d08cb3 all: replace old https://docs.victoriametrics.com/vmgateway.html url with the new one - https://docs.victoriametrics.com/vmgateway/ 2024-04-18 02:08:18 +02:00
Aliaksandr Valialkin
a4b120f9be all: replace old https://docs.victoriametrics.com/vmbackupmanager.html url with the new one - https://docs.victoriametrics.com/vmbackupmanager/ 2024-04-18 02:03:58 +02:00
Aliaksandr Valialkin
f018cf6ca8 all: replace old https://docs.victoriametrics.com/vmrestore.html url with the new one - https://docs.victoriametrics.com/vmrestore/ 2024-04-18 02:00:27 +02:00
Aliaksandr Valialkin
6e6bae3e8d all: replace old https://docs.victoriametrics.com/vmbackup.html url with the new one - https://docs.victoriametrics.com/vmbackup/ 2024-04-18 01:57:04 +02:00
Aliaksandr Valialkin
19b6fd490c all: replace old https://docs.victoriametrics.com/vmctl.html url with the new one - https://docs.victoriametrics.com/vmctl/ 2024-04-18 01:53:36 +02:00
Aliaksandr Valialkin
1e24b334f1 all: replace old https://docs.victoriametrics.com/vmauth.html url with the new one - https://docs.victoriametrics.com/vmauth/ 2024-04-18 01:49:41 +02:00
Aliaksandr Valialkin
b4fac26360 all: replace old https://docs.victoriametrics.com/vmalert.html url with the new one - https://docs.victoriametrics.com/vmalert/ 2024-04-18 01:44:12 +02:00
Aliaksandr Valialkin
4927e64700 all: replace remaining https://docs.victoriametrics.com/vmagent.html urls with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:36:13 +02:00
Aliaksandr Valialkin
c81a633b02 all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:31:37 +02:00
Aliaksandr Valialkin
66630c7960 lib/storage: improve performance for /api/v1/label/labelName/values when match[] contains only a single filter on labelName
This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case:

  /api/v1/label/__name__/values?match[]={__name__=~"*.some_value.*"}

When the user types `some_value` in the query input field.
2024-04-18 01:15:20 +02:00
Aliaksandr Valialkin
50ac22df78 lib/httpserver: add support for automatic issuing of TLS certificates via Lets Encrypt service
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5949
2024-04-17 23:50:57 +02:00
Aliaksandr Valialkin
b421f1ab80 app/{vminsert,vmselect}: support for srv+addr scheme for specifying DNS SRV addresses at -storageNode flag
The new scheme is consistent with SRV urls introduced at b426d10847 and dc326f70b4

Deprecte the old scheme: `dns+srv:addr` by removing it from the docs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053
2024-04-17 23:15:52 +02:00
Aliaksandr Valialkin
184e4ef4d7 vendor: run make vendor-update 2024-04-17 22:54:56 +02:00
Aliaksandr Valialkin
bd454f5063 lib/netutil: move creation of GetCertificate callback into a separate function
This improves code readability a bit
2024-04-17 22:10:43 +02:00
Aliaksandr Valialkin
8412219781 docs/vmagent.md: typo fixes after dc326f70b4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053
2024-04-17 21:12:46 +02:00
Aliaksandr Valialkin
37b04bbb81 vendor: run make vendor-update 2024-04-17 20:59:48 +02:00
Aliaksandr Valialkin
dc326f70b4 app/vmagent: support for DNS SRV urls at -remoteWrite.url, scrape target urls and service discovery urls
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053
2024-04-17 20:54:39 +02:00
Aliaksandr Valialkin
b426d10847 app/vmauth: add support for configuring backends via DNS SRV urls 2024-04-17 20:46:22 +02:00
Aliaksandr Valialkin
07ed958b82 app/vmauth: add support for client TLS sertificates for backend requests over https
While at it, also add support for TLS ServerName for backend requests over https
2024-04-17 17:12:22 +02:00
Aliaksandr Valialkin
2d5e5badcf app/vmauth: use lib/promauth for creating backend roundtripper
This simplifies further maintenance and opens doors for additional config options
supported by lib/promauth. For example, an ability to specify client TLS certificates.
2024-04-17 16:46:29 +02:00
Aliaksandr Valialkin
21a9b1f920 docs/vmauth.md: add Authorization and Routing chapters 2024-04-17 16:17:14 +02:00
Aliaksandr Valialkin
db85744e04 app/vmauth: follow-up for b155b20de4
- Use exact matching by default for the query arg value provided via arg=value syntax at src_query_args.
  Regex matching can be enabled by using =~ instead of = . For example, arg=~regex.
  This ensures that the exact matching works as expected without the need to escape special regex chars.

- Add helper functions for creating QueryArg, Header and Regex structs in tests.
  This improves maintainability of the tests.

- Remove url.QueryUnescape() call on the url in TestCreateTargetURLSuccess(), since this is bogus approach.
  The url.QueryUnescape() must be applied to individual query args, and it mustn't be applied to the whole url,
  since in this case it may perform invalid unescaping in the context of the url, or make the resulting url invalid.

While at it, properly marshal all the fields inside UserInfo config to yaml in tests.
Previously Header and QueryArg structs were improperly marshaled because the custom MarshalYAML
is called only on pointers to Header and QueryArg structs. This improves test coverage.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6070
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6115
2024-04-17 14:27:52 +02:00
Aliaksandr Valialkin
161404813c docs/Cluster-VictoriaMetrics.md: clarify per-node workload increase when one of vmstorage node is unavailable in the cluster
The docs update is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6099#issuecomment-2060856417
2024-04-17 11:56:55 +02:00
Yury Molodov
ab5fd386bc vmui: preserve Select value if it matches list on blur (#6101) 2024-04-17 11:37:46 +02:00
Aliaksandr Valialkin
8acadb5080 docs/Cluster-VictoriaMetrics.md: remove incorrect information about the increase of workload on the remaining vmstorage nodes when one vmstorage node is unavailable
If one out of 5 vmstorage nodes is unavailable, then the remaining 4 vmstorage nodes will recieve 1/5=20% increase of workload, not 5%.
If one out of 10 vmstorage nodes is unavailable, then the remaining 9 vmstorage nodes will receive 1/10=10% increase of workload, not 1%.

This is a follow-up for 458338afa5

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6099
2024-04-17 11:25:11 +02:00
Roman Khavronenko
b155b20de4 app/vmauth: support regex matching in src_query_args (#6115)
Support regex matching when routing incoming requests based on HTTP query args
via `src_query_args` option at `url_map`.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6070

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-17 09:54:43 +02:00
Zakhar Bessarab
b4d8837917 app/vmauth: do not increment backend_errors when hitting concurrency limit (#6078)
* app/vmauth: do not increment backend_errors when hitting concurrency limit

Previously, both "vmauth_concurrent_requests_limit_reached_total" and "vmauth_user_request_backend_errors_total" were incremented.
This was based on the assumption that if concurrency limit is hit the backend must be failing to handle the request thus meaning an error.

This assumption does not work in case the endpoint can be overloaded by the misbehaving client sending too many requests within the timeframe.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-17 09:38:19 +02:00
Github Actions
1cbaec73ad Automatic update operator docs from VictoriaMetrics/operator@99fbc98 (#6122) 2024-04-17 09:34:47 +02:00
Aliaksandr Valialkin
fff31aa8b0 docs/CHANGELOG.md: move the description for the bugfix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6110 from v1.100.1 to the tip section
The bugfix isn't included in v1.100.1 release.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6110
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6111
This is a follow-up for 7308bad777
2024-04-16 19:59:13 +02:00
Aliaksandr Valialkin
e3a26c0db6 lib/promscrape/discovery/consul: typo fix in the comment: enteprise -> enterprise 2024-04-16 19:34:18 +02:00
Aliaksandr Valialkin
85d09e5a2d lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json
This should improve debuggability of unexpected deletion of directories inside partitions.

While at it, log the proper path to parts.json when the directory for big part is missing in the partition.
parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.
2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin
6bcc6c938b lib/storage: improve comments inside functions responsible for creating indexes for newly registered time series 2024-04-16 19:11:32 +02:00
Zakhar Bessarab
458338afa5 docs/cluster: update cluster resizing info (#6099)
* docs/cluster: update cluster resizing info

- add example of resources distribution
- add info about how to handle uneven disk usage after adding a new storage node

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/Cluster-VictoriaMetrics.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update docs/Cluster-VictoriaMetrics.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-04-16 15:44:05 +02:00
Github Actions
aaa18e565d Automatic update operator docs from VictoriaMetrics/operator@13f6dac (#6119) 2024-04-16 14:58:29 +02:00
hagen1778
4f55aa29db docs: mention HTTP sink configuration example for Vector
Follow-up 16eeb4e

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-16 14:07:16 +02:00
hagen1778
9064602d00 deployment/vector: add example for JSON stream config
Follow-up 16eeb4eb33

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-16 13:57:54 +02:00
Devin Buhl
16eeb4eb33 victorialogs: mention vector supports http/json stream (#6114)
https://github.com/vectordotdev/vector/issues/18883#issuecomment-1771424716
2024-04-16 13:52:03 +02:00
guangwu
9dd5db2b77 app/vmctl: properly close file descriptor in verify-block action (#6106) 2024-04-16 11:33:04 +02:00
Vadim Rutkovsky
66c5fc3243 dashboards: fix typo in VictoriaLogs panel (#6102)
Comprasion -> compression
2024-04-16 09:50:46 +02:00
yudrywet
43835704b7 chore: fix some typos in comments (#6103)
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-16 09:48:52 +02:00
Alexander Marshalov
7308bad777 vmalert: support any status code from the range 200-299 from alertmanager as successful (#6111)
* any status code from the range 200-299 from alertmanager to vmalert is not considered an error from now on (#6110)

* add changelog
2024-04-16 09:33:11 +02:00
Github Actions
7db8ba41e7 Automatic update operator docs from VictoriaMetrics/operator@cf48a99 (#6113) 2024-04-16 09:54:29 +08:00
Dmytro Kozlov
7b20de4674 docs: fix typo in the curl command (#6109) 2024-04-15 14:29:26 +02:00
Yury Molodov
f06f55edb6 vmui/vmanomaly: integrate vmanomaly query_server (#6017)
* vmui: fix parsing of fractional values

* vmui/vmanomaly: update display logic to align with vmanomaly /query_range API

* vmui/vmanomaly: rename flag anomalyView to isAnomalyView
2024-04-15 09:25:52 +02:00
Dima Lazerka
22497c2c98 VMUI: Update builder Node version (#5908)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Dzmitry Lazerka <dlazerka@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-15 09:07:30 +02:00
Github Actions
cba2f6dce1 Automatic update operator docs from VictoriaMetrics/operator@73a1996 (#6100) 2024-04-13 10:04:25 +08:00
hagen1778
e39a1a98f5 docs: add 1.100.1 release date
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-12 12:20:27 +02:00
Hui Wang
2123821e0f bump victoriametrics components to v1.100.1 (#6095)
* bump victoriametrics components to v1.100.1

* add one
2024-04-12 18:16:17 +08:00
Zakhar Bessarab
b8ba9ea769 docs/changelog: fix markdown formatting (#6094)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-12 17:59:21 +08:00
Zakhar Bessarab
8f457c550d docs/changelog: add update node to reset cache when upgrading to 1.100.1 (#6093)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-12 11:52:20 +02:00
hagen1778
267c28362b docs: mention that 1.100.1 isn't released yet in 1.99 section
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-11 21:46:28 +02:00
hagen1778
14f3f72829 docs: mention that 1.100.1 isn't released yet
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-11 16:34:37 +02:00
Aliaksandr Valialkin
9ee51e34cc deployment: upgrade VictoriaLogs from v0.5.1-victorialogs to v0.5.2-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.5.2-victorialogs
2024-04-11 09:53:24 +02:00
Aliaksandr Valialkin
7c0003d8a4 vendor: run make vendor-update 2024-04-11 09:46:22 +02:00
Aliaksandr Valialkin
e6b1ea6740 docs/VictoriaLogs/CHANGELOG.md: cut v0.5.2-victorialogs release 2024-04-11 09:42:06 +02:00
Aliaksandr Valialkin
db0c669cf4 docs/VictoriaLogs/CHANGELOG.md: document the bugfix at 2205de2391
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6042
2024-04-11 09:40:58 +02:00
Aliaksandr Valialkin
c9aca0c3b6 docs/CHANGELOG.md: cut v1.100.1 release 2024-04-11 09:35:08 +02:00
Aliaksandr Valialkin
8bcbdc106c docs/CHANGELOG.md: mention that the bug with incorrect registration of new entries in the IndexDB has been introduced in v1.99.0 2024-04-11 09:33:17 +02:00
Zakhar Bessarab
2205de2391 lib/mergeset: fix flushing incorrect set of inmemoryBlocks (#6089)
Follow-up for bace9a2501

Related:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6069
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-11 09:26:06 +02:00
Github Actions
a4945c0bf0 Automatic update operator docs from VictoriaMetrics/operator@17082f0 (#6090) 2024-04-11 09:15:04 +02:00
Zakhar Bessarab
b6f94cdda7 docs/changelog: add entry for bd398974334678a29972475b0566e0bd2434d197
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-11 09:11:49 +02:00
b02dda1440 docs: fix a typo in docs/operator/resources/vmalert.md (#6088) 2024-04-10 15:28:33 +02:00
Denys Holius
3a9b34a67c fixed EXPOSE port for VictoriaLogs' Dockerfiles (#6082) 2024-04-09 13:54:52 -07:00
wanshuangcheng
83216e956c chore: fix function names in comment (#6076)
Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>
2024-04-08 01:11:12 -07:00
Artem Navoiev
fc8d9dd317 github actions: sync doc action, do not build search index, just copy content
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-04-05 10:57:22 +02:00
Aliaksandr Valialkin
0dda3978a5 docs/CHANGELOG.md: remove description of the reverted change at c79bf3925c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985
2024-04-04 16:55:45 +03:00
Aliaksandr Valialkin
7bf090525d deployment/docker: update VictoriaLogs tag from v0.5.0-victorialogs to v0.5.1-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.5.1-victorialogs
2024-04-04 16:52:30 +03:00
Aliaksandr Valialkin
79bc2288c0 docs/VictoriaLogs/CHANGELOG.md: cut v0.5.1-victorialogs release 2024-04-04 16:41:33 +03:00
Aliaksandr Valialkin
b62496d997 docs/VictoriaLogs/CHANGELOG.md: document the fix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5920
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5927
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5920

This is a follow-up for 46fd0ed693
2024-04-04 16:38:38 +03:00
Aliaksandr Valialkin
8cd6f7ea3c deployment: update VictoriaMetrics docer image from v1.99.0 to v1.100.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.100.0
2024-04-04 15:29:00 +03:00
Aliaksandr Valialkin
00e5e00a5a docs: change old url from https://docs.victoriametrics.com/enterprise.html to new url https://docs.victoriametrics.com/enterprise/ 2024-04-04 15:21:59 +03:00
Aliaksandr Valialkin
69d4075945 docs/Single-server-VictoriaMetrics.md: remove misleading filters word from the text, which recommends evaluating downsampling in VictoriaMetrics enterprise 2024-04-04 14:58:28 +03:00
Aliaksandr Valialkin
c57f43d3f0 docs: clarify that downsampling drops all the samples except the last one on every downsampling interval 2024-04-04 14:48:28 +03:00
Aliaksandr Valialkin
39924c8079 docs/CHANGELOG.md: typo fix: remove duplicate samples word 2024-04-04 14:46:13 +03:00
Aliaksandr Valialkin
f924bf5432 docs/CHANGELOG.md: advise upgrading from v1.99.0 to v1.100.0 because of the issue at v1.99.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
2024-04-04 14:41:14 +03:00
Aliaksandr Valialkin
2d4ce05895 docs/CHANGELOG.md: move custom backup interval feature to tip, since it wasnt included in v1.100.0 release
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5966
This is a follow-up for baa869742208f311d84e800ea527a7f07eb1ca18
2024-04-04 12:33:12 +03:00
Zakhar Bessarab
28d4ad24a6 Vmbackupmanager: add support of custom backup interval (#742)
* app/vmbackupmanager: add support of defining custom backup interval

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: update vmbackupmanager docs

Update docs after https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-04 12:31:31 +03:00
Aliaksandr Valialkin
d0ab3b2b02 docs/CHANGELOG.md: add release date for v1.100.0 2024-04-04 12:28:24 +03:00
Aliaksandr Valialkin
e9da0b1714 docs/CHANGELOG.md: cut v1.100.0 2024-04-04 03:47:18 +03:00
Aliaksandr Valialkin
f8d10a7106 lib/streamaggr: update the minimum allowed timestamp for incoming samples before flushing the samples to the storage
This should prevent from dropping samples with old timestamps during long flushes.

This is a follow-up for 1cedaf61cb
2024-04-04 02:25:51 +03:00
Aliaksandr Valialkin
931dd3f320 app/{vmagent,vminsert}: accept Prometheus remote write protocol requests at /prometheus/api/v1/push additionally to /api/v1/push
This is a follow-up for 7ccdb57ea4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5990
2024-04-04 02:15:52 +03:00
Eugene Ma
7ccdb57ea4 add "/api/v1/push" to request handler (#5990)
Co-authored-by: Eugene Ma <eugene.ma@airbnb.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 02:10:24 +03:00
Aliaksandr Valialkin
619964c5fc app/{vmselect,vlselect}: run make vmui-update vmui-logs-update after the recent changes at app/vmui 2024-04-04 02:08:24 +03:00
nemobis
7baae7f42a Add note about final deduplication space needs (#5996)
Addresses #5975

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 02:04:12 +03:00
Zakhar Bessarab
c006db1798 [docs][github] Update contributing information (#6040)
* add pull request template

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* update text

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* update text

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* Update .github/PULL_REQUEST_TEMPLATE/pull_request_template.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update .github/PULL_REQUEST_TEMPLATE/pull_request_template.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* update messaging add example

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* docs/contributing: add info about mandatory requirements before sending a PR

Added the following info:
- commits signing / sign-off
- how to run tests / linting with make commands

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-04-04 02:01:44 +03:00
dependabot[bot]
81657729ce build(deps-dev): bump express in /app/vmui/packages/vmui (#6038)
Bumps [express](https://github.com/expressjs/express) from 4.18.2 to 4.19.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.18.2...4.19.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 01:57:03 +03:00
dependabot[bot]
6fd369afa1 build(deps-dev): bump webpack-dev-middleware in /app/vmui/packages/vmui (#6011)
Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.3 to 5.3.4.
- [Release notes](https://github.com/webpack/webpack-dev-middleware/releases)
- [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-middleware/compare/v5.3.3...v5.3.4)

---
updated-dependencies:
- dependency-name: webpack-dev-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 01:55:46 +03:00
dependabot[bot]
d99b64d31f build(deps-dev): bump follow-redirects in /app/vmui/packages/vmui (#5978)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.5 to 1.15.6.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.5...v1.15.6)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 01:54:37 +03:00
Yury Molodov
706bac30ae vmui: fix step update on input blur in Firefox/Safari (#6034)
* vmui: fix step value application on input blur

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 01:52:55 +03:00
Yury Molodov
26e981ced2 vmui: fix trigger auto-suggestion (#6033)
* vmui: fix ui freeze on query paste #5923

* vmui: fix auto-suggestion trigger issue after whitespace char #5866

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-04 01:48:37 +03:00
Aliaksandr Valialkin
5b1b9c2f7d app/vmui/Dockerfile-web: update alpine docker image from 3.19.0 to 3.19.1
This is a follow-up for fcc8b14f86
2024-04-04 01:42:28 +03:00
Aliaksandr Valialkin
d776c22592 deployment: update Go builder from 1.22.1 to 1.22.2
See https://github.com/golang/go/issues?q=milestone%3AGo1.22.2+label%3ACherryPickApproved
2024-04-04 01:40:28 +03:00
Aliaksandr Valialkin
b212c9d6f5 vendor: run make vendor-update 2024-04-04 01:34:44 +03:00
Aliaksandr Valialkin
967d5496cf app/vmagent: follow-up for b3b29ba6ac
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
  instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
  This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
  This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
  from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
2024-04-04 01:27:35 +03:00
Aliaksandr Valialkin
b958fb1e76 docs/CHANGELOG.md: add - in front of -logInvalidAuthTokens command-line flag in order to be consistent with command-line flag naming
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6029
2024-04-03 20:04:09 +03:00
Zakhar Bessarab
a8acf3767a vmgateway: add an ability to log invalid auth tokens (#743)
* app/vmgateway: add an ability to log invalid auth tokens

This is useful for debugging to make it easier for user to find issues in token contents.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6029
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add info about new vmgateway flag

- add changelog entry
- add info about logInvalidAuthTokens flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmgateway/filters/auth: improve reject reason visibility

Explicitly return a rejection reason for request when "logInvalidAuthTokens" is enabled.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-03 20:02:50 +03:00
Aliaksandr Valialkin
1de6cd4442 app/vmalert: document that -rule.stripFilePath command-line flag is available only in enterprise version of vmalert 2024-04-03 19:56:57 +03:00
Zakhar Bessarab
f80ac120f3 lib/promscrape/config: fix missing timeout for http client (#6063)
Follow-up for b3b29ba6

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-03 18:18:48 +02:00
Thomas
93c3be2530 chore(docs): fix vmalertmanager typo (#6056)
Fixes: #6055

Signed-off-by: Thomas Way <thomas@6f.io>
Co-authored-by: Alexander Marshalov <_@marshalov.org>
2024-04-03 11:02:30 +02:00
Github Actions
a51a2bc692 Automatic update operator docs from VictoriaMetrics/operator@92cdca3 (#6052) 2024-04-03 12:02:41 +04:00
Zakhar Bessarab
b3b29ba6ac lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk (#5725)
* lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk

Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change.

Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`.

This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: update NewRoundTripper API

Update API to allow user to update only parameters required for transport.

Add warning log when reloading Root CA failed.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: fix mutex acquire logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: replace RWMutex with regular mutex to simplify the code

- remove additional mutex used for getRootCABytes - require callee to use mutex
- replace RWMutex with regular mutex

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: refactor

- hold the mutex lock to avoid round tripper being re-created twice
- move recreation logic into separate func to simplify the code

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-03 10:01:43 +02:00
Aliaksandr Valialkin
6910e72c99 docs/CHANGELOG.md: typo fix: resonses -> responses 2024-04-03 03:20:18 +03:00
Aliaksandr Valialkin
fb42380ef3 lib/protoparser/opentelemetry: follow-up after 47892b4a4c
- Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming
- Clarify the description of the change at docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens
- Properly split metric names at '_' char in promerlabel.SplitMetricNameToTokens.
- Add tests for various edge cases for Prometheus metric names' normalization
  according to the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
2024-04-03 02:25:29 +03:00
Aliaksandr Valialkin
3de8656551 app/vmagent/remotewrite: follow-up for 166b97b8d0 and b6bd9a97a3
- Make the configuration more clear by accepting the list of ignored labels during sharding
  via a dedicated command-line flag - -remoteWrite.shardByURL.ignoreLabels.
  This prevents from overloading the meaning of -remoteWrite.shardByURL.labels command-line flag.

- Removed superfluous memory allocation per each processed sample if sharding by remote storage is enabled.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5938
2024-04-03 00:54:01 +03:00
Aliaksandr Valialkin
55bd43f28e docs: follow-up for ac9c2a796f
Remove description for -search.maxExportDuration and -search.maxStatusRequestDuration command-line flags
from the 'Resource usage limits' chapter, since these flags are rarely used for limiting resource usage
and they are already documented in the 'List of command-line flags' chapter.
2024-04-02 23:57:37 +03:00
Aliaksandr Valialkin
e4eccd7074 app/vmselect/graphite: follow-up for 23ab865035
- Fix docs for new functions at app/vmselect/graphite/functions.json
- Properly drain series lists on errors in aggregateSeriesListsGeneric() and aggregateSeriesList()
- Add links to docs for the added functions at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5809
2024-04-02 23:39:00 +03:00
Aliaksandr Valialkin
918cccaddf all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin
c3a72b6cdb lib/storage: consistently use stopCh instead of stop 2024-04-02 21:24:57 +03:00
Aliaksandr Valialkin
be36ceb1cf app/vmauth: add ability to authorize via any opaque HTTP request header value
This can be done via `auth_token` option at -auth.config - see https://docs.victoriametrics.com/vmauth/#auth-config
2024-04-02 21:16:11 +03:00
Aliaksandr Valialkin
21bfb66650 app/vmauth: add ability to read auth tokens from multiple http request headers
This is needed for VictoriaMetrics Cloud, where the same token could be passed either
via Authorization or via X-Amz-Firehose-Access-Key header - see 4487dac30b (r140500722)

This is a follow-up for 4487dac30b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6009
2024-04-02 19:29:00 +03:00
Artem Navoiev
9bd3cadce6 app/{vmagent/insert} fix typo in Firehose
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-04-02 17:41:21 +02:00
Aliaksandr Valialkin
4487dac30b app/vmauth: follow-up for bc90f4aae6
- Allow specifying only a single HTTP header for reading auth tokens via -httpAuthHeader command-line flag.
  This is better from security PoV, since this prevents from accidental reading of auth token from undesired
  HTTP header. By default the -httpAuthHeader equals to Authorization. When it is overridden, then
  auth token isn't read from Authorization header - it is read only from the specified header.

- Document the -httpAuthHeader command-line flag at https://docs.victoriametrics.com/vmauth/#reading-auth-tokens-from-other-http-headers

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6009
2024-04-02 18:35:21 +03:00
Aliaksandr Valialkin
904e95fc69 app/vmagent: simplify code after 509df44d03
- Simplify the code in order to improve its maintenance
- Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016
2024-04-02 17:58:13 +03:00
Artem Navoiev
76b1fc6ac1 add more delays to verify that this is not a reason for flaky tests
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-04-02 16:22:36 +02:00
Fred Navruzov
daa1326b98 docs/vmanomaly: typos fix (#6047) 2024-04-01 13:23:44 -07:00
Fred Navruzov
c300ce659f docs/vmanomaly: v1.12 updates & fixes (#6046)
* docs/vmanomaly: v1.12.0 & link updates

* add autotuned description to model section

* - update refs of vmanomaly on enterprise and vmalert pages
- add diagrams for model types
- update self-monitoring section

* - fix typos
- remove .index.html from links
2024-04-01 16:41:55 +03:00
Aliaksandr Valialkin
c79bf3925c Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987)"
This reverts commit cb23685681.

Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders
before their use. This may complicate detecting and fixing such bugs in the future.

There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 :
- To configure the OS to do not drop data from the system-wide temporary directory (aka /tmp).
- To run VictoriaMetrics with -cacheDataPath command-line flag, which points to the directory,
  which cannot be removed automatically by the OS.

The case when the user accidentally deletes the directory with some files created by VictoriaMetrics
shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically.
It is much better from operation and debuggability PoV is to crash with the clear `directory doesn't exist` error
in this case.
2024-03-30 07:29:24 +02:00
Aliaksandr Valialkin
49a6dca2d5 docs/CHANGELOG.md: mention that the bug with improper use of -search.maxExportDuration instead of -search.maxLabelsAPIDuration has been introduced in v1.99.0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5992
This is a follow-up for bc79f7196d
2024-03-30 07:05:09 +02:00
Aliaksandr Valialkin
8f59ca423b docs/VictoriaLogs/CHANGELOG.md: improve the description of the bugfix from 43b5d8bc7a, so it can be googled by users 2024-03-30 06:54:48 +02:00
Aliaksandr Valialkin
830b871baf app/vmagent: properly shutdown when -maxIngestionRate limit is reached
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().

While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.

This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
2024-03-30 06:43:48 +02:00
Aliaksandr Valialkin
f5848a5c8b docs/managed-victoriametrics/alerting-vmalert-managed-victoria-metrics.md: user proper image paths according to docs/assets/README.md 2024-03-30 05:09:41 +02:00
Aliaksandr Valialkin
f17248eb3f docs/managed-victoriametrics: use proper names for the linked images according to docs/assets/README.md
This is a follow-up for db3709c87d

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5989
2024-03-30 05:00:19 +02:00
Aliaksandr Valialkin
4cb70ee9a3 vendor: update github.com/VictoriaMetrics/metrics and github.com/VictoriaMetrics/metricsql to newer versions
This is needed for updating broken links to MetricsQL docs:

https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/MetricsQL -> https://docs.victoriametrics.com/metricsql/

This is a follow-up for 7e3511ffbd
2024-03-30 04:44:19 +02:00
Aliaksandr Valialkin
4d71a33cb5 docs/CHANGELOG.md: remove the update notes regarding converting custom HTTP header keys to canonical form
Custom HTTP headers are set via net/http.Header.Set or net/http.Header.Add functions.
These functions always convert header keys to canonical form. So the change at b577413d3b
isn't visible to users of VictoriaMetrics components.

There is no need in documenting this change at docs/CHANGELOG.md, since it doesn't give any useful information to users.

This is a follow-up for e6dd52b04c
2024-03-30 04:26:37 +02:00
Zakhar Bessarab
af3922b1df lib/storage: add ability to use downsampling for the given series filter (#733)
* lib/storage: add ability to use downsampling for the given series filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add information about downsampling filters

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix MetricsQL filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: treat missing downsampling filter as a bug

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/part_header: verify correctness of downsampling filters when opening partition

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: save only appliable rules in part metadata

Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: update log messages for final dedup

Properly specify a reason of re-running deduplication for partition.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules

Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Follow-up

- Apply the first matching downsampling period if multiple filters match the given time series.
  This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 04:12:23 +02:00
Aliaksandr Valialkin
131f357098 lib/storage/table.go: reduce the difference with enterprise branch 2024-03-30 03:22:51 +02:00
Aliaksandr Valialkin
4001ca36b8 lib/storage/partition.go: reduce code difference a bit with enterprise branch 2024-03-30 01:39:27 +02:00
Nikolay
a05303eaa0 lib/storage: adds metrics for downsampling (#382)
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled

These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 01:11:49 +02:00
hagen1778
e79b05b4ab docs: update vmalert troubleshooting docs
* rm recommendation to keep look-behind window empty, as it is not correct
* mention the change of default value for `-search.latencyOffset`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-29 18:00:37 +01:00
hagen1778
2e843a8ed9 docs: follow-up after 623d257faf
623d257faf
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-29 14:29:02 +01:00
Jiekun
623d257faf app/vmalert: respect batch size limit for remote write on shutdown (#6039)
During shutdown period of vmalert, remotewrite client retrieve all pending time series from buffer queue, compose them into 1 batch and execute remote write.

This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others). 

This changes ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write
destination.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025
2024-03-29 14:27:50 +01:00
hagen1778
b6bd9a97a3 app/vmagent: follow-up 166b97b8d0
* add tests for sharding function
* update flags description
* add changelog note

166b97b8d0
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-29 14:08:08 +01:00
Andrii Chubatiuk
47892b4a4c opentelemetry: added cmd flag to sanitize metric names (#6035) 2024-03-29 13:51:24 +01:00
Eugene Ma
166b97b8d0 vmagent: support sharding by excluded labels (#5938)
To horizontally scale streaming aggregation, you might want to deploy a separate hashing tier 
of vmagents that route to a separate aggregation tier. The hashing tier should shard by all labels 
except the instance-level labels, to ensure the input metrics are routed correctly to the aggregator 
instance responsible for those labels.
For this to achieve we introduce `remoteWrite.shardByURL.inverseLabels` flag to inverse logic of `remoteWrite.shardByURL.labels`

---------

Co-authored-by: Eugene Ma <eugene.ma@airbnb.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-29 13:26:02 +01:00
Dmytro Kozlov
ac9c2a796f docs: describe timeout query argument (#6020)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-03-28 16:17:33 +01:00
Hui Wang
47e7ad2e01 docs: fix golangci-lint check (#6036)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-03-28 08:58:27 +01:00
Hui Wang
d7224b2d1c vmalert: fix sending alert messages (#6028)
* vmalert: fix sending alert messages
1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered;
2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile.

* docs: update changelog notes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-03-28 08:55:10 +01:00
Aliaksandr Valialkin
77eca6bb37 docs/MetricsQL.md: typo fix: outlier_iqr_over_time(memory_usage_bytes[1h]) triggers when memory_usage_bytes goes outside the usual value range for the last hour, not the last 24 hours
This is a follow-up for ea81f6fc36
2024-03-27 20:59:43 +02:00
hagen1778
f937439657 docs: fix new line for update notes in CHANGELOG
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-27 16:24:16 +01:00
hagen1778
d72b565c03 docs: mention new guide How to use OpenTelemetry metrics with VictoriaMetrics in docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-27 16:23:28 +01:00
Nikolay
f8f4025dca docs/opentelemetry: adds opentemetry get started guide (#5861)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2024-03-27 16:04:43 +01:00
Aliaksandr Valialkin
4a359d5f67 lib/storage: follow-up for 76f00cea6b
Store the deadline when the metricID entries must be deleted from indexdb
if metricID->metricName entry isn't found after the deadline. This should
make the code more clear comparing the the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.

Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
2024-03-27 11:41:28 +02:00
Github Actions
20d0183195 Automatic update operator docs from VictoriaMetrics/operator@c336013 (#6023) 2024-03-26 17:55:02 +01:00
Github Actions
1263cab870 Automatic update operator docs from VictoriaMetrics/operator@ac29c88 (#6022) 2024-03-26 23:35:41 +08:00
hagen1778
5ffece51bf docs: follow-up after 23ab865035
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-26 15:17:16 +01:00
rbizos
23ab865035 adding AggregateSeriesLists graphite function (#5809)
* adding aggregate series list graphite function

adding also aliases for sum diff and multiply

* Adding tests for aggregateSeriesLists and aliases
2024-03-26 15:13:34 +01:00
Zakhar Bessarab
51f5ac1929 lib/storage/table: wait for merges to be completed when closing a table (#5965)
* lib/storage/table: properly wait for force merges to be completed during shutdown

Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-03-26 13:49:09 +01:00
Andrii Chubatiuk
bc90f4aae6 vmauth: support other auth header names besides Authorization (#6009) 2024-03-26 13:21:07 +01:00
Andrii Chubatiuk
509df44d03 app/{vmagent,vminsert}: fixed firehose response (#6016) 2024-03-26 13:20:41 +01:00
Roman Khavronenko
cb23685681 app/vmselect: make vmselect resilient to absence of cache folder (#5987)
vmselect uses a cache folder in file system for two purposes:
1. Storing rollup cache results on shutdown;
2. Storing temporary search results from vmstorage during query executions.

It could happen that cache folder is deleted accidentally by user, or by OS
during cleanup routines. This would cause vmselect to:
1. panic on /metrics call, because `MustGetFreeSpace` will fail;
2. return query error user, as it won't be able to store temporary search results.

The changes in this commit are the following:
1. Make `MustGetFreeSpace` to try re-creating the cache folder if it is missing;
2. Make vmselect to try re-creating the cache folder if it can't persist tmp search
results.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-03-26 12:59:50 +01:00
hagen1778
bc79f7196d docs: follow-up for 70eaa06f08
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-25 15:30:56 +01:00
kbweave
70eaa06f08 app/vmselect: use GetDeadlineForLabelsAPI for LabelAPI requests (#5992) 2024-03-25 15:07:34 +01:00
Artem Navoiev
b08cbd0400 add more redirects
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-25 10:09:15 +01:00
Artem Navoiev
b569fa0b2c fix typo in kyiv city name
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-23 21:56:31 +01:00
Nikolay
43b5d8bc7a app/vlselect: follow-up for 0514091948 (#6004)
removes println lines
2024-03-22 08:46:40 +01:00
Denys Holius
0c0ed61ce7 Makefile: bump version of golangci-lint to the latest v1.57.1 (#6001)
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-22 08:45:10 +01:00
Alexander Marshalov
02bccd1eb9 [vmagent] added ingestion rate limiting with new flag -maxIngestionRate (#5900)
* [vmagent] added ingestion rate limiting with new flag `-maxIngestionRate`. This flag can be used to limit the number of samples ingested by vmagent per second. If the limit is exceeded, the ingestion rate will be throttled.

* fix changelog

* fix review comment
2024-03-21 17:14:49 +01:00
Nikolay
db3709c87d docs/managed: adds alertmanager configuration examples (#5989)
* docs/managed: adds alertmanager configuration examples

* apply review suggestions
2024-03-21 13:33:49 +01:00
hagen1778
93a29fce4e docs: add missing API version to VMSingle example
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-19 18:55:00 +01:00
hagen1778
21d9393c9e docs: mention Query Analyzer in docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-19 13:31:14 +01:00
Yury Molodov
46fd0ed693 vmui: fix the _time filter insertion for all queries in VictoriaLogs UI #5920 (#5927)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5920
2024-03-18 14:10:24 +01:00
Artem Navoiev
b399852742 remove workflow that syncs docs with wiki
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-18 12:54:32 +01:00
Artem Navoiev
7e3511ffbd remove wiki link from snap.yaml
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-18 12:22:56 +01:00
Dmytro Kozlov
5f8b91186a app/vmctl: break explore phase in vm-native mode by time intervals
When `--vm-native-step-interval` is specified, explore phase will be executed
within specified intervals. Discovered metric names will be associated with
time intervals at which they were discovered. This suppose to reduce number
of requests vmctl makes per metric name since it will skip time intervals
when metric name didn't exist.

This should also reduce probability of exceeding complexity limits
for number of selected series in one request during explore phase.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5369
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-18 12:18:32 +01:00
hagen1778
e6dd52b04c lib/promauth: follow-up b577413d3b
Convert test result expectations to canonical form.
Starting from b577413d3b specified header keys are forced
into canonical form https://pkg.go.dev/net/http#CanonicalHeaderKey

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-18 11:12:45 +01:00
Aliaksandr Valialkin
4553521f9a lib/streamaggr: ignore out of order samples for last output
This is a follow-up for 6a465f6e29

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-18 01:03:36 +02:00
Aliaksandr Valialkin
f95e9f13ae vendor: run make vendor-update 2024-03-18 00:51:41 +02:00
Aliaksandr Valialkin
76f00cea6b lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search
The metricID->metricName entry can remain invisible for search for some time after registering new metricName.
This is expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName
entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948

See also 20812008a7
2024-03-18 00:34:32 +02:00
Aliaksandr Valialkin
729b263670 lib/httputils: rename CAFile -> caFile in order to be consistent with local var naming in Go
This is a follow-up for 83e55456e2
2024-03-17 23:19:52 +02:00
Aliaksandr Valialkin
f45f39d80e Revert "deployment/docs: use lower-case links to VictoriaLogs docs"
This reverts commit a0937b01c1.

Reason for revert: MixedCase links started working again.
See, for example, https://docs.victoriametrics.com/VictoriaLogs/querying/#vmui
2024-03-17 23:13:23 +02:00
Aliaksandr Valialkin
1cedaf61cb app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation
See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples
2024-03-17 23:03:47 +02:00
Aliaksandr Valialkin
6a465f6e29 lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs
Out of order samples may result in unexpected spikes for these outputs.
So it is better to ignore such samples.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-17 22:03:03 +02:00
Aliaksandr Valialkin
cbd80efcc1 lib/streamaggr: follow-up for 15e33d56f1
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation
  This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931

- Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md

- Verify de-duplication logic for samples with different timestamps

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939
2024-03-17 21:37:16 +02:00
Aliaksandr Valialkin
ab2b3f1785 docs/CHANGELOG.md: clarify that -datasource.lookback commnad-line flag is no-op in the upcoming release
Document the solution - to switch to eval_delay option at group config.

This is a follow-up for e80b44f19d
2024-03-17 21:04:56 +02:00
Aliaksandr Valialkin
fb502700f7 docs/CHANGELOG.md: document the bugfix from cb259116b4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5802
2024-03-17 20:41:24 +02:00
Aliaksandr Valialkin
b577413d3b lib/promauth: properly set Host header in requests to scrape targets.
The `Host` header must be set via net/http.Request.Host field, since net/http.Client
ignores this header if it is set via Request.Header.Set().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970
2024-03-17 20:22:54 +02:00
Artem Navoiev
5f9fb58dde dashboards: statistic per tenant dashboard use variable for datasource in pie charts
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-16 13:46:56 +01:00
hagen1778
a2ea8bc97b app/vmctl: fix arguments order in httputils.TLSConfig
follow-up after 9d5bf5ba5d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-14 11:45:39 +01:00
Khushi Jain
9d5bf5ba5d app/vmctl: fix the order of arguments in TLS config func (#5972) 2024-03-14 11:22:12 +01:00
Github Actions
f4d16919ee Automatic update operator docs from VictoriaMetrics/operator@ae6e1b6 (#5964) 2024-03-13 23:12:51 +01:00
Daria Karavaieva
75aa704ee6 docs/vmanomaly: fix 404 links (#5968) 2024-03-13 21:04:00 +01:00
hagen1778
2e91dd18c7 docs: mention missing vmalert change for memory usage reduction in 1.97.3
521f9ffb43
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-13 20:18:10 +01:00
hagen1778
d7d685f2af deployment/docs: mention other log shippers for VictoriaLogs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-13 11:31:56 +01:00
hagen1778
a0937b01c1 deployment/docs: use lower-case links to VictoriaLogs docs
Links with upper-case simply don't work for unknown reason.
Once the reason is fixed on docs side, this commit can be reverted.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-13 11:28:06 +01:00
hagen1778
b3d84489ec docs: follow-up 15e33d56f1
Update documentation according to changes in deduplication logic.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-12 22:57:23 +01:00
Andrii Chubatiuk
15e33d56f1 lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939)
Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication
This would require more memory for deduplication, since we need to track timestamp
for each record. However, deduplication should become more consistent.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-12 22:47:29 +01:00
Hui Wang
e80b44f19d vmalert: deprecate cmd-line flag -datasource.lookback (#5877)
* vmalert: deprecate cmd-line flag `-datasource.lookback`

* fix lint

* review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-03-12 16:16:50 +01:00
Github Actions
e8bb64bad5 Automatic update operator docs from VictoriaMetrics/operator@de88675 (#5958) 2024-03-12 15:57:10 +01:00
Zakhar Bessarab
45d8d41e1e docs: explicitly mention VMUI is available in cluster (#5955)
It is confusing for cluster users to find that VMUI is available at vmselect as it is only mentioned in the list of URLs. Explicit mention of vmselect URL in docs will make it easier to discover.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-03-12 15:56:45 +01:00
hagen1778
69dbfa7bc2 docs: mention bug investigation in 1.99
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-12 12:57:45 +01:00
Tien M. Nguyen
f5115c8f1b feat: include cluster info in alert CPUThrottlingHigh (#5956) 2024-03-12 14:51:32 +04:00
Aliaksandr Valialkin
df5b73ed0d docs: replace speed up with more clear accelerate wording 2024-03-12 02:54:46 +02:00
Aliaksandr Valialkin
d1d2771bee lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-12 02:43:16 +02:00
nemobis
1ed6df7901 docs: fix typo in stalenes (#5950)
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-11 19:50:31 +01:00
Aliaksandr Valialkin
d46d87a9e0 lib/storage: move the conversion of tag filters to composite tag filters into indexSearch.searchMetricIDsInternal
This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055

This is a follow-up for 2d31fd7855
2024-03-11 20:40:28 +02:00
Github Actions
869755b77d Automatic update operator docs from VictoriaMetrics/operator@9b1a6e6 (#5946) 2024-03-10 20:13:17 +01:00
Aliaksandr Valialkin
2d31fd7855 lib/storage: use composite indexes (metricName, label=value) when searching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb
This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-10 12:57:34 +02:00
Artem Navoiev
ef2b8d1f17 docs:vmbackup fix typo sped -> speed
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-09 20:36:03 -03:00
hagen1778
cb1e618a16 app/vmauth: properly initialize URLPrefix in tests
It is assumed that URLPrefix.busOriginal will be initialized
durin Unmarshal of the config. But in tests we set fields manually,
so this field never get initialized properly.

Fixes the error `panic: runtime error: integer divide by zero`
at `vmauth.getLeastLoadedBackendURL`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-08 21:10:11 +01:00
hagen1778
0b7ce70df4 app/vmctl: support TLS configuration for VictoriaMetrics destination
VictoriaMetrics destination is specified via `--vm-*` cmd-line flags
and is used in opentsdb, influx, prometheus, remote-read modes.

updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5426

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-08 20:47:36 +01:00
hagen1778
83a8c24281 app/vmctl: follow-up b9f7c3169a
* fix typos in flags description
* move the change to #tip section in changelog

b9f7c3169a
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-08 20:04:02 +01:00
Khushi Jain
b9f7c3169a app/vmctl : Provide TLS config options for vm native protocol (#5824)
Co-authored-by: Khushi Jain <khushi.jain@nokia.com>
2024-03-08 19:52:55 +01:00
Zakhar Bessarab
25eeb2b16c docs: fix typo in flags description (#5942)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-03-08 11:22:51 +01:00
Github Actions
90c7c67793 Automatic update operator docs from VictoriaMetrics/operator@9caa896 (#5941) 2024-03-08 10:52:10 +04:00
Aliaksandr Valialkin
98b31b7f7c docs/vmauth.md: update -help output after e08b91baafc95da090f75e9c29a27d8f62a2b76e 2024-03-07 01:37:39 +02:00
Aliaksandr Valialkin
0d8bec9c6c docs/CHANGELOG.md: typo fixes 2024-03-07 01:35:34 +02:00
Aliaksandr Valialkin
cb259116b4 lib/promauth: set the Host header to tlsServerName if itsn't empty
If tlsServerName isn't empty, then it is likely the https request is sent to IP instead of hostname.
In this case the request will fail, since Go automatically sets the Host header to the IP instead
of the desired hostname at tlsServerName. So set the Host header to tlsServerName if itsn't empty.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5802
2024-03-07 01:22:01 +02:00
Aliaksandr Valialkin
c0a93cf183 docs/vmauth.md: typo fixes after 7b2b980181 2024-03-07 01:08:33 +02:00
Aliaksandr Valialkin
7b2b980181 app/vmauth: allow discovering backend ips behind shared hostname and spreading load among the discovered ips
This is done with the `discover_backend_ips` option at `user` and `url_map` level.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5707
2024-03-07 01:02:16 +02:00
Aliaksandr Valialkin
76ef84fcae app/vmauth: add src_headers option at url_map, which allows routing incoming requests to different backends depending on request headers 2024-03-06 21:56:32 +02:00
Aliaksandr Valialkin
d8688c9e82 vendor: run make vendor-update 2024-03-06 21:24:42 +02:00
Aliaksandr Valialkin
8efe12d66e app/vmauth: simplify configuration for src_query_args
Use the shorter form:

src_query_args:
- arg1=value1
- arg2=value2

instead of

src_query_args:
- name: arg1
  value: value2
- name: arg2
  value: value2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5878
2024-03-06 21:19:45 +02:00
Aliaksandr Valialkin
97dd7e26ad deployment/docker: update Go builder from Go1.21.7 to Go1.22.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.22.1+label%3ACherryPickApproved
2024-03-06 21:04:11 +02:00
Aliaksandr Valialkin
b2efacb624 docs/vmauth.md: mention that request query args can used for routing decisions
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5878
2024-03-06 20:58:29 +02:00
Aliaksandr Valialkin
61d1af8050 app/vmauth: add ability to route requests based on HTTP query args via src_query_args option
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5878
2024-03-06 20:52:25 +02:00
Aliaksandr Valialkin
9c1331a38a app/vmauth: small code cleanup for working with auth tokens 2024-03-06 20:05:59 +02:00
Aliaksandr Valialkin
5582a24ecf lib/streamaggr: add tests for keep_metric_names and drop_input_labels options 2024-03-06 18:34:04 +02:00
Aliaksandr Valialkin
96f913c83e app/vmauth: use slices.Contains() instead of hasInt() 2024-03-06 17:35:55 +02:00
Yury Molodov
76a6f806ae vmui: configure npm cache path to resolve EACCES issues on macOS. (#5928) 2024-03-06 14:06:41 +02:00
Github Actions
5b42f15ccc Automatic update operator docs from VictoriaMetrics/operator@403a78a (#5932) 2024-03-06 12:55:35 +01:00
Aliaksandr Valialkin
b4b38f782c app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at 73f5fb0f0c 2024-03-06 13:43:08 +02:00
Aliaksandr Valialkin
b33b620af6 app/vmselect/prometheus: do not drop match[] filters if -search.ignoreExtraFiltersAtLabelsAPI flag is set
The `match[]` filter is mandatory at /api/v1/series, so it mustn't be dropped here.

There is no sense in dropping `match[]` filter together with `extra_label` and `extra_filters[]`
at /api/v1/labels and /api/v1/label/.../values if -search.ignoreExtraFiltersAtLabelsAPI commnad-line flag is set,
since:
- the `match[]` filter triggers slow path at these APIs;
- the `extra_label` and `extra_filters[]` filters narrow down the number of matched time series,
  so they improve performance comparing to the case when only `match[]` filter is left,
  while `extra_label` and `extra_filters[]` filters are dropped.

This is a follow-up for 0b7a23a91d
2024-03-06 13:31:51 +02:00
Daria Karavaieva
e036433b8b redirect alias (#5934) 2024-03-06 11:28:31 +01:00
Yury Molodov
3971ce0625 vmui: improve tracing styles (#5926)
Improved trace display for better visual separation of branches:
* Increased left padding for each element
* Added padding for the last element in the branch
2024-03-06 10:36:13 +01:00
hagen1778
73f5fb0f0c lib/writeconcurrencylimiter: mention dependency on CPU cores for -maxConcurrentInserts flag
The change also removes misleading `default` value from README for `maxConcurrentInserts`
cmd-line flag.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-05 18:55:38 +01:00
Github Actions
c2ff1cfd30 Automatic update operator docs from VictoriaMetrics/operator@f028fdf (#5929) 2024-03-05 17:18:23 +01:00
hagen1778
f781c42ea4 dashboards: add more context to cluster dashboard panels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-05 15:00:49 +01:00
hagen1778
1c6230c977 docs: clarify deduplication is needed in multi-retention setup
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-05 08:02:48 +01:00
Aliaksandr Valialkin
da611ad628 app/{vmagent,vminsert}: add -streamAggr.dropInputSamples command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation 2024-03-05 02:15:01 +02:00
Aliaksandr Valialkin
ed523b5bbc app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config
This allows performing online de-duplication of incoming samples
2024-03-05 00:45:30 +02:00
Aliaksandr Valialkin
22d63ac7cd lib/streamaggr: do not reset aggregation state after the aggregation took longer than the configured interval
It is better from user PoV preserving this state until the next flush
2024-03-04 20:03:06 +02:00
Aliaksandr Valialkin
32653db7d5 lib/streamaggr: add missing "s" suffix in the warning message when the de-duplication or aggregation couldnt be finished in a timely manner 2024-03-04 19:37:58 +02:00
Aliaksandr Valialkin
6319d029a8 lib/streamaggr: benchmark only flush routines in BenchmarkDedupAggrFlushSerial and BenchmarkAggregatorsFlushSerial 2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin
074abd5bee Revert "lib/streamaggr: do not flush dedup shards in parallel"
This reverts commit eb40395a1c.

Reason for revert: it has been appeared that the performance gain on multiple CPU cores
wasn't visible because the benchmark was generating incorrect pushSample.key.

See a207e0bf687d65f5198207477248d70c69284296
2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin
e70177c5fb lib/streamaggr: properly generate pushSample.key in benchmarks 2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin
b232968bb4 lib/streamaggr: reduce the number of pointers at "total" aggregation state
This should reduce load on GC when scanning heap objects.
2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin
d42667fc41 lib/streamaggr: use multiple job label values in BenchmarkAggregatorsPush instead of single value
This should make the benchmark closer to production cases
2024-03-04 19:12:26 +02:00
Aliaksandr Valialkin
f5bbffd45f lib/streamaggr: use multiple job labels in BenchmarkAggregatorsPush 2024-03-04 19:12:26 +02:00
Github Actions
a1e9af3abe Automatic update operator docs from VictoriaMetrics/operator@c8ff654 (#5918) 2024-03-04 17:09:14 +01:00
Aliaksandr Valialkin
eb40395a1c lib/streamaggr: do not flush dedup shards in parallel
This significantly increases CPU usage on systems with many CPU cores, while doesn't reduce flush latency too much
2024-03-04 17:00:20 +02:00
Aliaksandr Valialkin
946814afee lib/streamaggr: reduce memory allocations when registering new series in deduplication and aggregation structs 2024-03-04 17:00:19 +02:00
Aliaksandr Valialkin
925f60841f lib/streamaggr: make aggregate.runFlusher() more roubst and clear 2024-03-04 17:00:19 +02:00
Aliaksandr Valialkin
aa5e7e268c lib/streamaggr: properly drop samples on the first incomplete interval
Previously samples were dropped on the first incomplete interval and the next complete interval.
Also make sure that the de-duplication is performed just before flushing the aggregate state.
This should help the case then dedup_interval = interval.
2024-03-04 17:00:18 +02:00
hagen1778
0ab1069363 dashboards: update links in various panels
* use docs.victoriametrics.com instead of github docs
* add links to common terms used in VictoriaMetrics

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-04 15:43:31 +01:00
Aliaksandr Valialkin
86494518da lib/streamaggr: explicitly call resetSeries after flushSeries
This makes the code less fragile
2024-03-04 06:01:18 +02:00
Aliaksandr Valialkin
ac3cf3f357 lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval
For example, if `interval: 1m`, then data flush occurs at the end of every minute,
while `interval: 1h` leads to data flush at the end of every hour.

Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.
2024-03-04 05:42:58 +02:00
Aliaksandr Valialkin
2b8253185b docs/stream-aggregation.md: add troubleshooting section with solutions for common problems in streaming aggregation 2024-03-04 03:04:46 +02:00
Aliaksandr Valialkin
138a4d1c2b lib/streamaggr: ignore the first sample in new time series during staleness_interval seconds after the stream aggregation start for total and increase outputs 2024-03-04 01:49:26 +02:00
Aliaksandr Valialkin
0422ae01ba lib/streamaggr: flush dedup state and aggregation state in parallel on all the available CPU cores
This should reduce the time needed for aggregation state flush on systems with many CPU cores
2024-03-04 01:21:50 +02:00
Aliaksandr Valialkin
3c06b3af92 lib/streamaggr: add a benchmark for flushing dedup state 2024-03-04 01:16:30 +02:00
Aliaksandr Valialkin
9648c88b71 lib/streamaggr: add a benchmark for measuring the performance of aggregator.flush 2024-03-04 00:45:48 +02:00
Aliaksandr Valialkin
54a1c506e3 lib/streamaggr: add a benchmark for de-duplicating of 1M samples 2024-03-04 00:26:59 +02:00
Aliaksandr Valialkin
614d34e539 lib/prompbmarshal: use clear() instead of a loop for clearing tss inside ResetTimeSeries() 2024-03-03 23:40:34 +02:00
Aliaksandr Valialkin
4e65636b44 lib/promutils: optimize LabelsCompressor.Decompress by using a specialized labelsMap struct instead of sync.Map
The labelsMap struct employs the fact that label indexes are condensed around 0,
so it stores the referred labels in a slice instead of map and uses slice index as label key.
This allows increasing the LabelsCompressor.Decompress performance by up to 3x.
This also reduces the latency of data flush in stream aggregation.
2024-03-03 23:21:25 +02:00
Aliaksandr Valialkin
643c51795c docs/CHANGELOG.md: typo fix 2024-03-02 04:52:34 +02:00
Aliaksandr Valialkin
97e02f2633 docs/stream-aggregation.md: typo fixes 2024-03-02 04:35:25 +02:00
Aliaksandr Valialkin
89ace61436 docs/stream-aggregation.md: remove superflouous output_relabel_configs from the config example for histogram aggregation 2024-03-02 03:36:00 +02:00
Aliaksandr Valialkin
28a9e92b5e lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
  for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
  deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
  metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
  per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
  This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
  - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
  - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
  - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
  - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
  - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
  - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
  - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
    which took longer than the configured interval
  - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
    which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md

The memory usage reduction increases CPU usage during stream aggregation by up to 30%.

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 02:42:50 +02:00
Aliaksandr Valialkin
eb8e95516f lib/streamaggr: allow one second aggregation interval 2024-03-01 21:33:16 +02:00
Aliaksandr Valialkin
18db573b10 app/vminsert/common: push many time series at once into stream aggregation
This should reduce overhead on streamaggr.Aggregators.Push call
2024-03-01 21:33:16 +02:00
Aliaksandr Valialkin
cf2e80a869 lib/promrelabel: use clear() function inside CleanLabels() 2024-03-01 21:33:15 +02:00
hagen1778
69ab55b6f7 docs: mention docs link for VictoriaLogs docker env
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-01 07:18:20 +01:00
Aliaksandr Valialkin
5b33da5e19 docs: update -help output after recent changes in VictoriaMetrics components 2024-03-01 05:31:05 +02:00
Aliaksandr Valialkin
c1a5f75bd3 docs/CHANGELOG.md: typo fix 2024-03-01 04:41:10 +02:00
Aliaksandr Valialkin
44b721f201 docs: bump the latest VictoriaMetrics release from v1.98.0 to v1.99.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.99.0
2024-03-01 04:23:49 +02:00
Aliaksandr Valialkin
a6cba91fd6 docs/LTS-releases.md: update latest LTS release to v1.97.3 and v1.93.13
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.3
and https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.13
2024-03-01 04:20:43 +02:00
Aliaksandr Valialkin
a7aa119f35 deployment: update VictoriaLogs docker image from v0.4.2-victorialogs to v0.5.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.5.0-victorialogs
2024-03-01 04:18:40 +02:00
Aliaksandr Valialkin
50ead3d32f deployment: update VictoriaMetrics docker image from v1.98.0 to v1.99.0 2024-03-01 04:15:11 +02:00
Aliaksandr Valialkin
9cd4b0537a docs/CHANGELOG.md: cut v1.99.0 release 2024-03-01 03:33:30 +02:00
Aliaksandr Valialkin
5fe3f23aee docs/CHANGELOG.md: document v1.97.3 LTS release 2024-03-01 03:31:12 +02:00
Aliaksandr Valialkin
47d1ea1d3a docs: consistently use https://docs.victoriametrics.com/lts-releases/ link for LTS releases 2024-03-01 02:51:41 +02:00
Aliaksandr Valialkin
db2320ee83 docs/CHANGELOG.md: document v1.93.13 LTS release
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.13
2024-03-01 02:40:37 +02:00
Aliaksandr Valialkin
489eb88169 docs/VictoriaLogs/CHANGELOG.md: typo fix: bufix -> bugfix 2024-03-01 02:40:23 +02:00
Aliaksandr Valialkin
c8c2c5f8e5 lib/fs: fix GOOS=windows build after f8baf29b6e 2024-03-01 01:46:29 +02:00
Aliaksandr Valialkin
d62b14685a docs/VictoriaLogs/CHANGELOG.md: cut v0.5.0-victorialogs 2024-03-01 01:35:04 +02:00
Aliaksandr Valialkin
732e1427f9 app/vlselect/vmui: run make vmui-logs-update after c51031dd70
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5799
2024-03-01 01:33:11 +02:00
Yury Molodov
c51031dd70 vmui: add field for log entries limit (#5799)
* vmui: add field for log entries limit (#5674)

* vmui: refactor useFetchLogs

* vmui: fix log query encoding

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-01 01:30:32 +02:00
Aliaksandr Valialkin
a304372d88 vendor: run make vendor-update 2024-03-01 00:55:47 +02:00
Aliaksandr Valialkin
9ea69622a0 app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after e130f29659
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5862
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5152
2024-03-01 00:50:42 +02:00
Yury Molodov
e130f29659 vmui: add gap display option for charts #5152 (#5862) 2024-03-01 00:42:50 +02:00
Aliaksandr Valialkin
5aa3dfbd20 lib/protoparser/opentelemetry/firehose: verify that the full response is parsed properly in ProcessRequestBody
This is a follow-up for bf9cb84575
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5899
2024-03-01 00:39:10 +02:00
Andrii Chubatiuk
bf9cb84575 opentelemetry: fix firehose message parsing (#5899)
Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>
2024-03-01 00:23:54 +02:00
Aliaksandr Valialkin
fdf0cc9f25 docs: update docs after 0b7a23a91d 2024-03-01 00:17:42 +02:00
Aliaksandr Valialkin
0b7a23a91d app/vmselect/prometheus: ignore match[] additionally to extra_filters[] and extra_label if -search.ignoreExtraFiltersAtLabelsAPI command-line flag is set
The match[] at /api/v1/labels and /api/v1/label/.../values also may lead to slow requests and
high resource usage if it matches big number of time series. So it must be igrnored if -search.ignoreExtraFiltersAtLabelsAPI
command-line flag is set.

This is a follow-up for fab02faa3f
2024-02-29 23:40:00 +02:00
Aliaksandr Valialkin
cfe774ab50 app/{vmagent,vminsert}: follow-up for 434a5803e7
Document the /opentelemetry/v1/metrics endpoint instead of /opentelemetry/api/v1/push,
since the /v1/metrics suffix is hardcoded in OpenTelemetry protocol specification.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5871
2024-02-29 18:02:50 +02:00
Nikolay
434a5803e7 app/{vmagent,vminsert}: adds /v1/metrics suffix for opentelemetry route path (#5871)
* app/{vmagent,vminsert}: adds /v1/metrics suffix for opentelemetry route path
it must fix compatibility with opentemetry-collector [spec](https://opentelemetry.io/docs/specs/otlp/\#otlphttp-request)
this suffix is hard-coded and cannot be changed with collector configuration

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-29 17:58:49 +02:00
hagen1778
812d17f588 docs: re-fresh vmctl docs
* add recommendation about network bandiwdth between vmctl, source and destination
* use more relevant time series selector in examples - see https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5835

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-29 16:58:02 +01:00
Github Actions
ca8a4cfa0d Automatic update operator docs from VictoriaMetrics/operator@7e8f811 (#5902) 2024-02-29 17:53:01 +02:00
Aliaksandr Valialkin
e5c69262e2 app/vmselect/promql: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:50:07 +02:00
Aliaksandr Valialkin
146fccc22d app/vmselect/netstorage: usae unsafe.SliceData instead of deprecated reflect.SliceHeader 2024-02-29 17:36:28 +02:00
Aliaksandr Valialkin
6a8dc74ee7 lib/mergeset: use unsafe.Slice and unsafe.String instead of deprecated reflect.SliceHeader with unsafe conversion from slice header to string header 2024-02-29 17:29:33 +02:00
Aliaksandr Valialkin
38e0397ebd lib/bytesutil: use unsafe.String instead of unsafe conversion of slice header to string header 2024-02-29 17:27:51 +02:00
Aliaksandr Valialkin
e959f54351 lib/fs: properly handle the case when data=nil is passed to mUnmap 2024-02-29 17:26:07 +02:00
Aliaksandr Valialkin
c75bfd5b07 lib/storage: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:24:34 +02:00
Aliaksandr Valialkin
bb48d416fc lib/protoparser/csvimport: unse unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:19:57 +02:00
Aliaksandr Valialkin
f8baf29b6e lib/fs: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:18:33 +02:00
Aliaksandr Valialkin
7a04f99c72 lib/fastnum: use unsafe.Slice() instead of deprecated reflect.SliceHeader 2024-02-29 17:17:13 +02:00
Aliaksandr Valialkin
a3cf3d7de1 lib/bytesutil: make BenchmarkToUnsafeString and BenchmarkToUnsafeBytes more reliable
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5880
2024-02-29 17:11:03 +02:00
helen
8266b77d0e Optimize TouUnsafeBytes to make it leaner, more standards-compliant and (#5880)
slightly faster.
2024-02-29 17:10:10 +02:00
Alexander Marshalov
b9b4e859be fixed broken link in docs to google pub sub (#5901) 2024-02-29 15:14:17 +01:00
Hui Wang
dd7dd0b1db metricsql: fix label_join() when dst_label is equal to one of the `… (#5886)
* metricsql: fix label_join() when `dst_label` is equal to one of the `src_label`

* Update app/vmselect/promql/transform.go

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-29 16:00:22 +02:00
Aliaksandr Valialkin
d62055ad33 docs/VictoriaLogs/CHANGELOG.md: document the bugfix at a5795f533d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5897
2024-02-29 15:34:41 +02:00
XLONG96
a5795f533d lib/logstorage: avoid panic when parsing regex with stream filter (#5897) 2024-02-29 15:31:54 +02:00
Aliaksandr Valialkin
f2266f40b7 docs/CHANGELOG.md: clarify the description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5859
This is a follow-up for 3c74aa6b3d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5888
2024-02-29 15:28:33 +02:00
Aliaksandr Valialkin
e22836c636 app/{vmalert,vmctl}: consistently use http.NewRequestWithContext() instead of http.NewRequest() + req.WithContext() 2024-02-29 15:25:43 +02:00
Aliaksandr Valialkin
3c74aa6b3d app/vmbackupmanager: follow-up after f2885a5c41dfaeecfecba0aceb8ae65b15e15d1c
- Document the change at docs/CHANGELOG.md
- Give more clear name for maxHealthChecksRetries constant
- Improve logging when storage isn't reachable
- Use http.NewRequestWithContext() instead of setting up a request without context
  and then creating a new request with context vi req.WithContext() call.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5859
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/735
2024-02-29 15:17:50 +02:00
Hui Wang
b74231a642 caseStudies: add NetEaseCloudMusic post (#5891) 2024-02-29 14:57:36 +02:00
Artem Navoiev
3328b09ae8 docs: update statistic per tenant, add use cases section (#5892)
* docs: update statistic per tenant, add use cases section

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* prettify

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* Update docs/PerTenantStatistic.md

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-29 14:55:30 +02:00
Aliaksandr Valialkin
c6221c9046 docs/anomaly-detection/guides: bump alertmanager version from v0.25.0 to v0.27.0 after 3723c809a1 2024-02-29 14:51:28 +02:00
Aliaksandr Valialkin
04d13f6149 app/{vminsert,vmagent}: follow-up after 67a55b89a4
- Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md

- Simplify parsing Firehose data. There is no need in trying to optimize the parsing with fastjson
  and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering.
  It is better to write clear code for better maintanability in the future.

- Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose,
  since it is used only by opentelemetry parser.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893
2024-02-29 14:38:23 +02:00
Denys Holius
3723c809a1 docker-compose: bump alertmanager version from v0.25.0 to the latest v0.27.0 for VictroiaMetrics Single/Cluster/vmanomaly (#5890)
see https://github.com/prometheus/alertmanager/releases/tag/v0.27.0
2024-02-29 13:21:02 +01:00
Andrii Chubatiuk
67a55b89a4 {vmagent,vminsert}: added firehose http destination opentelemetry data ingestion support (#5893)
Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-29 14:03:24 +02:00
Github Actions
1217c1f2da Automatic update operator docs from VictoriaMetrics/operator@9f1c910 (#5895) 2024-02-29 13:56:13 +02:00
Aliaksandr Valialkin
62498a1e68 deployment/docker: downgrade Go builder from 1.22.0 to 1.21.7
Go1.22.0 contains the bug https://github.com/golang/go/issues/65705 ,
which prevents vmagent from normal operation.
2024-02-29 13:52:26 +02:00
Aliaksandr Valialkin
fab02faa3f app/vmselect/prometheus: add -search.ignoreExtraFiltersAtLabelsAPI command-line flag for ignoring extra_filters and extra_label args at /api/v1/labels, /api/v1/label/.../values and /api/v1/series 2024-02-29 12:59:11 +02:00
Aliaksandr Valialkin
6f203ebc9f lib/streamaggr: make the BenchmarkAggregatorsPushByJobAvg closer to production case with long list of labels per sample 2024-02-29 02:39:16 +02:00
Artem Navoiev
44e15c866a dashboards: update statistic per tenant dashbaord. Change to timeseries panel, add churn rate over 24h and query duration, add billing section
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-28 16:04:20 +01:00
Hui Wang
8c33ba537a chore: add actual request size in error message (#5889) 2024-02-28 22:33:08 +08:00
lzfhust
935a42d08c add xiaohongshu in Case studies (#5885) 2024-02-28 11:12:39 +02:00
Aliaksandr Valialkin
0c88ba326f docs/Release-Guide.md: https://github.com/VictoriaMetrics/victoriametrics-lts-rpm should contain the latest release
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5811
2024-02-27 13:11:46 +02:00
Roman Khavronenko
f0b4dd7426 deployment: create a separate env for VictoriaLogs (#5857)
* deployment: create a separate env for VictoriaLogs

The new environment consists of the following components:
* VictoriaLogs
* fluentbit for collecting logs and sending to VictoriaLogs
* VictoriaMetrics for scraping and storing metrics from fluentbit and VictoriaLogs
* Grafana with VictoriaLogs datasource for monitoring

-----------------

The motivation for creating a separate environment is to simplify existing environments
and make it easier to update or modify them in future.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-26 10:33:04 +01:00
Aliaksandr Valialkin
aae71832e5 vendor: retain v0.46.0 for github.com/prometheus/common , since v0.48.0 leads to build error
The error is:

vendor/github.com/prometheus/client_golang/prometheus/testutil/promlint/promlint.go:71:38: undefined: expfmt.FmtText
2024-02-25 02:20:24 +02:00
Aliaksandr Valialkin
35f592a02c app/vmselect/promql: properly handle args in count_values_over_time() function
Prevsiously they were swapped - the first arg should be the label name and the second arg should be label filters

This is a follow-up for e389b7b959e8144fdff5075bf7a5a39b2b0c6dd3

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
2024-02-25 01:48:18 +02:00
Aliaksandr Valialkin
4bdb3d9fd9 vendor: run make vendor-update 2024-02-24 03:22:39 +02:00
Aliaksandr Valialkin
6697da73e5 app: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 02:44:24 +02:00
Aliaksandr Valialkin
7e1dd8ab9d lib: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 02:07:53 +02:00
Aliaksandr Valialkin
d5ca67e667 lib/backup/actions: expose vm_backups_downloaded_bytes_total metric in order to be consistent with vm_backups_uploaded_bytes_total metric 2024-02-24 01:14:50 +02:00
Aliaksandr Valialkin
906a35bdbb lib/backup/actions: update vm_backups_uploaded_bytes_total metric along the file upload instead of after the file upload
This solves two issues:

1. The vm_backups_uploaded_bytes_total metric will grow more smoothly
2. This prevents from int overflow at metrics.Counter.Add() when uploading files bigger than 2GiB
2024-02-24 01:07:20 +02:00
Aliaksandr Valialkin
ece86cd314 lib/backup/actions: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 01:02:21 +02:00
Aliaksandr Valialkin
55f1f24e62 lib/storage: replace the remaining atomic.* functions with atomic.* types for the sake of consistency
See ea9e2b19a5
2024-02-24 00:53:30 +02:00
Aliaksandr Valialkin
b3d9d36fb3 lib/storage: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-24 00:15:26 +02:00
Aliaksandr Valialkin
4617dc8bbe lib/logstorage: consistently use atomic.* types instead of atomic.* functions on regular types
See ea9e2b19a5
2024-02-23 23:46:13 +02:00
Aliaksandr Valialkin
f81b480905 lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-23 23:29:35 +02:00
Aliaksandr Valialkin
275335c181 lib/logstorage: consistently use atomic.* type for refCount and mustDrop fields in datadb and storage structs in the same way as it is used in lib/storage
See ea9e2b19a5 and a204fd69f1
2024-02-23 23:04:42 +02:00
Aliaksandr Valialkin
5c89150fc9 lib/mergeset: consistently use atomic.* type for refCount and mustDrop fields in table struct in the same way as it is used in lib/storage
See ea9e2b19a5 and a204fd69f1
2024-02-23 22:59:23 +02:00
Aliaksandr Valialkin
a204fd69f1 lib/storage: consistently use atomic.* type for refCount and mustDrop fields in indexDB, table and partition structs
See ea9e2b19a5
2024-02-23 22:54:59 +02:00
Aliaksandr Valialkin
0f1ea36dc8 lib/storage: convert dedupsDuringMerge from uint64 to atomic.Uint64
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.

See ea9e2b19a5
2024-02-23 22:52:00 +02:00
Aliaksandr Valialkin
ea9e2b19a5 lib/{storage,mergeset}: properly fix 'unaligned 64-bit atomic operation' panic on 32-bit architectures
The issue has been introduced in bace9a2501
The improper fix was in the d4c0615dcd ,
since it fixed the issue just by an accident, because Go comiler aligned the rawRowsShards field
by 4-byte boundary inside partition struct.

The proper fix is to use atomic.Int64 field - this guarantees that the access to this field
won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
2024-02-23 22:27:06 +02:00
Aliaksandr Valialkin
cf94522389 lib/httpserver: return back the default value for -http.connTimeout to 2 minutes
It has been appeared that there are VictoriaMetrics users, who rely on the fact that
VictoriaMetrics components were closing incoming connections to -httpListenAddr every 2 minutes
by default. So let's return back this value by default in order to fix the breaking change
made at d8c1db7953 .

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1961891450 .
2024-02-23 22:03:37 +02:00
Aliaksandr Valialkin
ea858b6e72 docs/Troubleshooting.md: add Too much disk space used chapter` 2024-02-23 22:03:36 +02:00
hagen1778
6390b54c4d docs: mention missing change 521f9ffb430edf6aea7720a966b5b079cf15e34e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-23 19:55:06 +01:00
hagen1778
c8d1d2ab72 lib/storage: cleanup after d4c0615dcd
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-23 18:53:55 +01:00
Dmytro Kozlov
d4c0615dcd lib/storage: fix aligning (#5860) 2024-02-23 16:37:21 +01:00
Roman Khavronenko
840ab60111 deployment: add topology visualizations (#5858)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-23 12:20:32 +01:00
Aliaksandr Valialkin
340638d4b0 app/vmstorage: cleanup after 9bad52b687 2024-02-23 04:55:17 +02:00
Aliaksandr Valialkin
9bad52b687 app/vmstorage: deprecate -snapshotCreateTimeout command-line flag
Creating snapshot shouldn't time out under normal conditions.
The timeout was related to the bug, which has been fixed in 6460475e3b .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2024-02-23 04:49:23 +02:00
Aliaksandr Valialkin
f79944532b lib/storage: do not drop (date, metricID) entries for the date older than 2 days if samples are ingested at this date
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling).

The issue has been introduced in 431aa16c8d
2024-02-23 04:06:19 +02:00
Aliaksandr Valialkin
f46eaf92eb app/vmselect: add -search.maxLabelsAPIDuration and -search.maxLabelsAPISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values
This commit returns back limits for these endpoints, which have been removed at 5d66ee88bd ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-02-23 02:57:16 +02:00
Aliaksandr Valialkin
8995b04886 app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after recent changes to app/vmui 2024-02-23 01:40:48 +02:00
Yury Molodov
abf82c3657 vmui: add a time picker to the "Logs Explorer" page (#5808)
* vmui: add a time picker to the "Logs Explorer" page #5673

* Update app/vmui/packages/vmui/src/pages/ExploreLogs/hooks/useFetchLogs.ts

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-23 01:38:30 +02:00
Yury Molodov
9e44870d5c vmui: fix display Popper.tsx (#5842)
* vmui: fix display Popper.tsx

* vmui/docs: fix display Popper.tsx

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-23 01:32:48 +02:00
Aliaksandr Valialkin
348eec39ba docs/Single-server-VictoriaMetrics.md: sync with docs/README.md after 79b57f625c 2024-02-23 01:09:53 +02:00
Aliaksandr Valialkin
3997319f45 docs/CHANGELOG.md: document d68bb658ce
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5833
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5834
2024-02-23 01:03:48 +02:00
Anton L
d68bb658ce #5833 Fix Deadlock when using shardByURL of VMAgent (#5834) 2024-02-23 00:59:47 +02:00
Aliaksandr Valialkin
df7d3c55ed lib/promutils: hide the math.Round() logic inside ParseTimeMsec() function
This should prevent from bugs similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5801 in the future

This is a follow-up for ce3ec3ff2e
2024-02-23 00:55:32 +02:00
Aliaksandr Valialkin
5934002b57 lib/mergeset: run go fmt after bace9a2501 2024-02-23 00:53:28 +02:00
Aliaksandr Valialkin
bace9a2501 lib/{mergeset,storage}: convert bufferred items to searchable parts more optimally
Do not convert shard items to part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan smaller number of parts.
2024-02-23 00:16:34 +02:00
hagen1778
3bec0549dc docs: re-classify change of default http timeout to bugfix
The change introduced in d8c1db7953 (diff-2bfab3db5cc1baf4c6d3ff6b19901926e3bdf4411ec685dac973e5fcff1c723b)
was backported to v1.97.2. Therefore, it is a `bugfix` and should be explicitly
mentioned in the changelog of v1.97.2

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-22 21:57:39 +01:00
Nikolay
07855de142 app/vmselect: change export/csv timestamp format for rfc3339 to respect milliseconds (#5853)
* app/vmselect: adds milliseconds to the csv export response for rfc3339
* milliseconds is a standard prescion for VictoriaMetrics query request responses
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5837

* app/victoria-metrics: adds tests for csv export/import
follow-up after 3541a8d0cf96dd4f8563624c4aab6816615d0756


---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-02-22 20:31:22 +01:00
Aliaksandr Valialkin
e8b3045062 lib/storage: handle common case when the number of rows passed to flushRowsToInmemoryParts() doesnt exceed maxRawRowsPerShard 2024-02-22 20:44:11 +02:00
Aliaksandr Valialkin
73f0a805e2 lib/{storage,mergeset}: convert beffered items into searchable in-memory parts exactly once per the given flush interval
Previously the interval between item addition and its conversion to searchable in-memory part
could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp()
to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking
the time when buffered items must be converted to searchable in-memory parts, since time.Now()
calls aren't located in hot paths.

Increase the flush interval for converting buffered samples to searchable in-memory parts
from one second to two seconds. This should reduce the number of blocks, which are needed
to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage.

While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal
data ingestion pefromance according to load tests. This reduces memory usage and CPU usage on systems
with big amounts of RAM under high data ingestion rate.
2024-02-22 20:21:14 +02:00
Aliaksandr Valialkin
463bc27312 lib/storage: avoid superflouos copy of block header data 2024-02-22 20:21:14 +02:00
Fred Navruzov
34aa25d681 - v1.11 doc updates (#5852)
- fix dead links
2024-02-22 18:48:54 +01:00
Dan Dascalescu
79b57f625c docs: CSV RFC3339 format uses server timezone (#5839) 2024-02-22 18:13:41 +01:00
Aliaksandr Valialkin
8d9d7a8a12 app/vmstorage: expose vm_snapshots metric, which shows the current number of snapshots
While at it, refresh docs about snapshots - https://docs.victoriametrics.com/#how-to-work-with-snapshots
2024-02-22 18:32:57 +02:00
Aliaksandr Valialkin
cd34142a14 README.md: sync with docs/Single-server-VictoriaMetrics.md after 5b652bccad 2024-02-22 18:32:11 +02:00
Aliaksandr Valialkin
aec9cd4316 lib/storage: do not pool rawRowsBlock when flushing rawRows to in-memory blocks
The pooled rawRowsBlock objects occupies big amounts of memory between flushes,
and the flushes are relatively rare. So it is better to don't use the pool
and to allocate rawRow blocks on demand. This should reduce the average
memory usage between flushes.
2024-02-22 17:37:48 +02:00
Aliaksandr Valialkin
b7dfe9894c lib/storage: do not keep rawRows buffer across flush() calls
The buffer can be quite big under high ingestion rate (e.g. more than 100MB).
This leads to increased memory usage between buffer flushes.
So it is better to re-create the buffer on every flush in order to reduce memory usage
between buffer flushes.
2024-02-22 17:22:26 +02:00
Aliaksandr Valialkin
fa19daf3bd docs/MetricsQL.md: improve text formatting for better readability 2024-02-22 14:00:47 +02:00
Aliaksandr Valialkin
f7c3dee1c3 app/vmselect/promql: add count_values_over_time() MetricsQL function
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
2024-02-22 13:39:29 +02:00
Aliaksandr Valialkin
a6eacfdb11 app/vmselect/promql: move needSilenceIntervalForRollupFunc from eval.go to rollup.go
This should improve maintainability of the code related to rollup functions,
since it is located in rollup.go

While at it, properly return empty results from holt_winters(), rate_over_sum(),
sum2_over_time(), geomean_over_time() and distinct_over_time() when there are no real samples
on the selected lookbehind window. Previously the previous sample value was mistakenly
returned from these functions.
2024-02-22 13:39:28 +02:00
Alexander Marshalov
ce3ec3ff2e [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5814)
* [lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)

* fixed tests

* fixed test

* Revert "fixed test"

This reverts commit 8a29764806.

* Revert "fixed tests"

This reverts commit 9ce13d1042.

* Revert "[lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)"

This reverts commit a7a04bd4

* [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-02-22 10:20:54 +01:00
Artem Navoiev
7332431b90 docs: change header from h1 to h2 1.97.2. The markdown requires the proper structure in hierachy of title so h1 can not be a child of h1,h2... only be a separate item, in our structure title is the parent h1
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-21 16:15:41 +01:00
Github Actions
2c98e8712c Automatic update operator docs from VictoriaMetrics/operator@d88157b (#5845) 2024-02-21 16:13:34 +01:00
Github Actions
c35c1b007c Automatic update operator docs from VictoriaMetrics/operator@c393852 (#5844) 2024-02-21 13:28:05 +01:00
Github Actions
0af5f88744 Automatic update operator docs from VictoriaMetrics/operator@4791fd1 (#5843) 2024-02-21 12:52:32 +01:00
hagen1778
7ee131de8b deployment/docker: add comments to components in docker-compose manifests
This should help readers to understand interconnectivity between components.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 18:31:45 +01:00
Dan Dascalescu
17cf031fa1 app/vmselect: simplify wording for too many samples error (#5827) 2024-02-20 16:26:38 +01:00
Roman Khavronenko
bb1279bfc4 vmctl : Provide TLS config options for Open TSDB datasource #5797 (#5832)
Originally implemented here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5797

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: khushijain21 <khushij393@gmail.com>
2024-02-20 16:22:58 +01:00
Daria Karavaieva
4034d081f4 Vmanomaly Quickstart Fix absolute links (#5831)
* links fix

* typo fix
2024-02-20 16:19:18 +01:00
Daria Karavaieva
b2baf7d472 Vmanomaly QuickStart (#5800)
* first edit

* typo 1

* typo 2

* fixes 3

* fixes 4

* fixes 5

* fixes, cross links

* v1.10 config

* models why, self-monitoring fix

* config next steps

* fixes

* minor fix
2024-02-20 15:20:57 +01:00
hagen1778
dc25c30fdc docs: move recent changes to Tip
These changes were mistakenly put to existing release

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 14:58:15 +01:00
igorbernstein
cc5a274e4d deployment/docker: clean up loading of victoriametrics-datasource (#5793)
Currently the docker-compose examples for loading `victoriametrics-datasource` uses 2 environment variables:
-  `GF_ALLOW_LOADING_UNSIGNED_PLUGINS`
- `GF_DEFAULT_APP_MODE`

I believe both of the env vars are trying to achieve the same thing. `GF_DEFAULT_APP_MODE` disables code signing for all plugins and `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` intends to disable code signing for just `victoriametrics-datasource`.
Keeping the scope narrowed to just `victoriametrics-datasource` would be preferable in this case.

Unfortunately `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` is misspelled. According to [grafana docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables), the format is supposed to be `GF_<SectionName>_<KeyName>`. In other words the current env var is missing the section name.

This PR proposes to:
1. fix the typo
2. remove the global disablement of code signing

Alternatively, if you prefer to keep codesigning disabled globally, please remove `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` env var as it confuses things
2024-02-20 14:31:15 +01:00
hagen1778
e2dad3a2ac app/vmalert: consistently sort groups by name and filename on /groups page
This should prevent non-deterministic sorting for groups with identical names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:50:57 +01:00
hagen1778
11b03d9fc8 app/vmalert: follow-up after b60dcbe11f
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog

b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:07:05 +01:00
Victor Amorim dos Santos
b60dcbe11f vmalert: add filter by group or rule name to UI (#5791)
Co-authored-by: Yury Molodov <yurymolodov@gmail.com>
2024-02-20 12:31:41 +01:00
Yury Molodov
524c0a2e07 vmui: update package-lock.json (#5822)
This should address detected security vulnerabilities
2024-02-20 10:03:33 +01:00
Artem Navoiev
5b652bccad docs: mention slack inviter and slack channel (#5817)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-19 17:18:31 +01:00
Aliaksandr Valialkin
7c2c987ff9 docs/VictoriaLogs/CHANGELOG.md: document cafd6f08b3
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5400
2024-02-18 23:17:59 +02:00
Aliaksandr Valialkin
e525b98fbf docs/VictoriaLogs/CHANGELOG.md: document 333bda8702
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5447
2024-02-18 23:13:15 +02:00
Aliaksandr Valialkin
0514091948 app/vlselect: follow-up for 451d2abf50
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
  See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
2024-02-18 23:05:51 +02:00
Dmytro Kozlov
451d2abf50 Enable the limit query param for the /select/logsql/query (#5778)
* app/vlselect: add limit for logs query

* app/vlselect: CHANGELOG.md

* app/vlselect: stop search process if limit is reached, update logic, remove default limit

* app/vlselect: fix tests

* app/vlselect: fix filter tests

* app/vlselect: fix tests
2024-02-18 22:58:47 +02:00
Aliaksandr Valialkin
c42ddce159 lib/promscrape: add support for enable_compression option in the same way as Prometheus does
Updates https://github.com/prometheus/prometheus/pull/13166
Updates https://github.com/prometheus/prometheus/issues/12319

Do not document enable_compression option at docs/sd_configs.md, since vmagent already supports
more clear disable_compression option - see https://docs.victoriametrics.com/vmagent/#scrape_config-enhancements
2024-02-18 19:40:39 +02:00
Aliaksandr Valialkin
5a092e161c lib/promscrape/discovery/kuma: add support for client_id option
See https://github.com/prometheus/prometheus/pull/13278
2024-02-18 19:19:40 +02:00
Aliaksandr Valialkin
2e30842582 vendor: update github.com/VictoriaMetrics/metricsql from v0.72.1 to v0.73.0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5383
2024-02-18 18:41:41 +02:00
Aliaksandr Valialkin
149d83e596 docs/MetricsQL.md: properly document how MetricsQL selects the lookbehind window
- rate(m) isn't equivalent to rate(m[1i]) when step is smaller than the interval between samples.
- default_rollup(m) isn't equivalent to default_rollup(m[1i]) when step is smaller than the interval between samples.

These changes have been made in v1.85.3 as a part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483 .
See the corresponding commit - 9fa3f1dc57 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5816
2024-02-18 13:45:21 +02:00
Aliaksandr Valialkin
6a6ea89da5 vendor: update github.com/VictoriaMetrics/metrics from v1.31.0 to v1.32.0 2024-02-18 12:42:22 +02:00
Aliaksandr Valialkin
dfbb6e0826 docs/LTS-releases.md: cosmetic fixes 2024-02-17 18:09:36 +02:00
Aliaksandr Valialkin
d03719e72d docs/CHANGELOG.md: document f8207e33a2 2024-02-17 17:52:53 +02:00
Aliaksandr Valialkin
cee901cdf4 docs/CHANGELOG.md: add missing for in the description of the TLS configuration features for vmctl
This is a follow-up for 6a07cb1bdb and f973711e56
2024-02-17 17:44:22 +02:00
Aliaksandr Valialkin
c68bcddd13 docs/Single-server-VictoriaMetrics.md: enumerate all the VictoriaMetrics components 2024-02-17 17:33:56 +02:00
Aliaksandr Valialkin
3ed7d62627 docs/LTS-releases.md: add a dedicated page describing LTS lines of releases for VictoriaMetrics 2024-02-17 17:33:55 +02:00
Alexander Marshalov
f8207e33a2 lib/httputils: fixed error message for getting zero duration (#5795) (#5812) 2024-02-16 15:24:59 +01:00
hagen1778
f973711e56 app/vmctl: follow-up after 0c293a66ec
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:22:44 +01:00
Khushi Jain
0c293a66ec app/vmctl : support TLS config options for remote read mode (#5798) 2024-02-16 15:12:43 +01:00
hagen1778
6a07cb1bdb app/vmctl: follow-up after 7cd1b7d047
* cleanup code
* update docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:08:51 +01:00
Khushi Jain
7cd1b7d047 app/vmctl : support TLS config options for InfluxDB datasource (#5783)
* vmctl: TLS flags for influx DB

* added httputils function

* Add changelog and doc

---------

Co-authored-by: Khushi Jain <khushi.jain@nokia.com>
2024-02-16 14:59:18 +01:00
hagen1778
ecccd2a1cc dashboards: add legend details to network panels in cluster dash
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 10:20:38 +01:00
Fred Navruzov
172e196ac9 docs: vmanomaly - updates of v1.10.0 and model type section (#5813)
* - apply v1.10 changes
- chapter on model types (uni/multivariate and rolling)

* - update self-monitoring labels description
- fix typos

* fix duplicated text and link rendering
2024-02-16 11:02:41 +02:00
hagen1778
3170ad3f44 docs: update formatting for usage examples
- Use `sh` format for examples
- Reduce length of progress bars

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-15 14:55:42 +01:00
Aliaksandr Valialkin
6b9bedd0f9 app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition 2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin
eb1505ba14 lib/uint64set: go fmt after c0a9b87f46 2024-02-15 14:50:44 +02:00
Aliaksandr Valialkin
c0a9b87f46 lib/mergeset: optimize Set.AddMulti() a bit for len(items) < 10000
This should improve the search speed for time series matching the given label filters
2024-02-15 14:30:15 +02:00
Aliaksandr Valialkin
cdac04997a lib/uint64set: benchmark AddMulti on small number of items, since this case is the most frequent in lib/storage 2024-02-15 14:28:36 +02:00
Aliaksandr Valialkin
926854b0f3 docs/CHANGELOG.md: document v1.93.12 LTS release 2024-02-14 20:17:44 +02:00
Aliaksandr Valialkin
39cba7e4fa docs/CHANGELOG.md: document v1.97.2 LTS release 2024-02-14 18:50:50 +02:00
Aliaksandr Valialkin
08e6100050 all: update Docker image tag for VictoriaMetrics components from v1.97.1 to v1.98.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.98.0
2024-02-14 17:14:41 +02:00
Aliaksandr Valialkin
baaa88001e lib/promrelabel: store the original labels before returning them them to promutils.PutLabels()
This should reduce memory allocations.

This is a follow-up for b09bd6c42a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2024-02-14 16:09:10 +02:00
Aliaksandr Valialkin
e4ad41b5ff docs/CHANGELOG.md: cut v1.98.0 release 2024-02-14 16:00:50 +02:00
Aliaksandr Valialkin
9c9021fd8b vendor: run make vendor-update 2024-02-14 15:44:03 +02:00
Aliaksandr Valialkin
b09bd6c42a lib/promrelabel: factor out applyInternal code into ApplyDebug and Apply functions
This improves readability and maintanability

Also remove memory allocation from SortLabels()
2024-02-14 14:27:08 +02:00
Aliaksandr Valialkin
b564729d75 lib/promscrape: avoid copying labels when -promscrape.dropOriginalLabels command-line flag is set
This should save some CPU

This regression has been introduced in 487f6380d0
when working on https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2024-02-14 03:25:36 +02:00
Aliaksandr Valialkin
4954eee187 docs/CHANGELOG.md: various typo cleanups 2024-02-14 02:45:17 +02:00
Aliaksandr Valialkin
53643b620a app/vmselect/vmui: run make vmui-update after 1c9f13d6c7 2024-02-14 02:35:55 +02:00
Yury Molodov
1c9f13d6c7 vmui: improve the context for autocomplete #5736 #5737 #5739 (#5804)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-14 00:22:51 +00:00
Aliaksandr Valialkin
d6e22f2888 app/vmselect: add sum_eq_over_time, sum_gt_over_time and sum_le_over_time functions to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4641
2024-02-13 23:40:07 +02:00
Nikolay
88329d84ca app/vmauth: properly release memory during config reload (#5805)
* app/vmauth: properly release memory during config reload
previously metrics package hold a refrence for channels for users concurrent requests.
it case of churn at `name`  field of users configuration, new metric was created. But previous one wasn't deleted.
It prevented full parsed configuration from being garbace collected.

now all config related metrics are bound to corresponding metrics.Set and unregistered during config reload process.

It also must fix an issue with incorrect values for current concurrent user requests

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-13 18:49:17 +00:00
Aliaksandr Valialkin
3dddf700f1 docs: add link to https://docs.victoriametrics.com/scrape_config_examples/ to docs about configuring target scraping at vmagent and single-node VictoriaMetrics 2024-02-13 20:47:43 +02:00
Aliaksandr Valialkin
4281be4c19 docs/vmbackupmanager.md: mention -license command-line flag instead of deprecated -eula 2024-02-13 20:41:26 +02:00
Aliaksandr Valialkin
f120477231 docs/vmauth.md: add Config reload chapter, which explains how to reload -auth.config at vmauth
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194
2024-02-13 20:29:35 +02:00
Aliaksandr Valialkin
582681ce58 .github/workflows: update actions/cache from v3 to v4
See https://github.com/actions/cache?tab=readme-ov-file#v4
2024-02-13 19:36:21 +02:00
Aliaksandr Valialkin
dc7256b304 docs/enterprise.md: remove the mention of deprecated -eula command-line flag 2024-02-13 18:46:56 +02:00
Aliaksandr Valialkin
2f3091460f vendor: update github.com/VictoriaMetrics/metricsql from v0.70.1 to v0.71.0
This adds an ability to propagate label filters across label_set() and alias() functions.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827#issuecomment-1654095358
2024-02-13 06:36:59 +02:00
Aliaksandr Valialkin
e963d6c789 app/vmagent/remotewrite: add -remoteWrite.tlsHandshakeTimeout command-line flag for tuning tls handshake timeout to -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
2024-02-13 02:46:33 +02:00
Aliaksandr Valialkin
cafdc7e21a docs/vmauth.md: add missing dot 2024-02-13 01:09:04 +02:00
Aliaksandr Valialkin
062cbb1130 app/vmauth: add support for mTLS-based routing of incoming requests to different backends depending on the subject field in the TLS certificate provided by the user
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1547
2024-02-13 01:03:20 +02:00
Aliaksandr Valialkin
94252e1794 docs/keyConcepts.md: do not duplicate the list of supported data ingestion protocols - just refer to the original list at https://docs.victoriametrics.com/#how-to-import-time-series-data
This should simplify keeping docs in sync
2024-02-12 22:54:49 +02:00
Aliaksandr Valialkin
f0a62de3a9 Makefile: run go mod tidy with -compat=1.22 after 95222b2079 2024-02-12 22:36:23 +02:00
Aliaksandr Valialkin
c5f9d9f0d6 vendor: run make vendor-update 2024-02-12 22:31:30 +02:00
Aliaksandr Valialkin
b92c9a045d docs/vmbackup.md: remove the unneeded -storageDataPath command-line from the example for making server-side copy of the backup 2024-02-12 22:24:34 +02:00
Aliaksandr Valialkin
95222b2079 all: upgrade Go builder from Go1.21.7 to Go1.22.0
See https://go.dev/doc/go1.22
2024-02-12 21:59:51 +02:00
Aliaksandr Valialkin
a49a50701a lib/mergeset: do not panic on too long items passed to Table.AddItems()
Instead, log a sample of these long items once per 5 seconds into error log,
so users could notice and fix the issue with too long labels or too many labels.

Previously this panic could occur in production when ingesting samples with too long labels.
2024-02-12 19:32:18 +02:00
Aliaksandr Valialkin
5d69ba630e lib/mergeset: properly record the firstItem in metaindexRow at blockStreamWriter.WriteBlock
The 3c246cdf00 added an optimization where the previous metaindexRow
could be saved to disk when the current block header couldn't be added indexBlock because the resulting
indexBlock size became too big. This could result in an empty metaindexRow.firstItem for the next metaindexRow.
2024-02-12 18:06:50 +02:00
Aliaksandr Valialkin
2eb967231b lib/storage: do not append headerData to bsw.indexData if its size exceeds maxBlockSize
This is a follow-up optimization after 3c246cdf00
2024-02-12 18:04:37 +02:00
Artem Navoiev
7eb5c187b3 docs vmbackupmanager update flags
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-12 16:36:01 +01:00
Aliaksandr Valialkin
4a27bf41af docs/enterprise.md: remove duplicate Enterprise word in the same sentence 2024-02-12 11:00:35 +02:00
Aliaksandr Valialkin
4258f2e261 docs/Single-server-VictoriaMetrics.md: substitute duplicate cases studies list with the link to the original list 2024-02-12 10:59:31 +02:00
Aliaksandr Valialkin
e0890a84e0 docs: remove misleading either from the description of or label filters 2024-02-12 10:56:04 +02:00
Roman Khavronenko
8850c7431d app/vmalert: support filtering for /api/v1/rule like Prometheus does (#5787)
Follow-up after 62e5e2a4c8

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-09 14:35:31 +01:00
Github Actions
a379b2c016 Automatic update operator docs from VictoriaMetrics/operator@e261c37 (#5788) 2024-02-09 11:17:12 +01:00
Victor Amorim dos Santos
62e5e2a4c8 app/vmalert: support type param for filtering /api/v1/rules response by rule type (#5749)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-02-09 09:02:35 +01:00
Aliaksandr Valialkin
64723e591e docs: update docs after ae8a867924
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1470
2024-02-09 04:18:53 +02:00
Aliaksandr Valialkin
39e0007e14 lib/snapshot: move Time, Validate and NewName into lib/snapshot/snapshotutil package
This allows removing importing unneeded command-line flags into binaries, which import lib/storage,
which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions.

This is a follow-up for 83e55456e2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
2024-02-09 04:18:45 +02:00
Aliaksandr Valialkin
ae8a867924 all: add support for specifying multiple -httpListenAddr options 2024-02-09 03:15:04 +02:00
Aliaksandr Valialkin
b161e889b5 docs/CHANGELOG.md: typo fixes 2024-02-08 21:18:45 +02:00
Aliaksandr Valialkin
d8c1db7953 lib/httpserver: do not close client connections every 2 minutes by default
Closing client connections every 2 minutes doesn't help load balancing -
this just leads to "jumpy" connections between multiple backend servers,
e.g. the load isn't spread evenly among backend servers, and instead jumps
between the servers every 2 minutes.

It is still possible periodically closing client connections by specifying non-zero -http.connTimeout command-line flag.

This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037

This is a follow-up for d387da142e
2024-02-08 21:10:25 +02:00
Aliaksandr Valialkin
ee745ab900 docs/{vmagent,vmalert}: mention that /-/reload endpoint may be protected with -reloadAuthKey command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1172
2024-02-08 18:49:06 +02:00
Aliaksandr Valialkin
0511d5c3a0 docs/Cluster-VictoriaMetrics.md: document that /api/v1/query?series_lector[d] returns raw samples
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1148
2024-02-08 18:32:28 +02:00
Aliaksandr Valialkin
b00a9132bb docs: sync with enterprise branch 2024-02-08 17:25:19 +02:00
Aliaksandr Valialkin
da4b30e8e5 docs: update -help output after 83e55456e2 2024-02-08 17:11:45 +02:00
Artem Navoiev
c300a636d6 docs: change ndjson links to https://jsonlines.org/ as original one was hacked (#5782)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-08 16:06:31 +01:00
hagen1778
76a4351012 Merge remote-tracking branch 'origin/master' into master 2024-02-08 16:04:22 +01:00
hagen1778
2a89a9e67b docs: sync flags description for vmbackup and vmbackupmanager
Sync description after changes in 83e55456e2

Remove extra text in flags description for vmbackupmanager as it breaks layout
when rendered at docs.victoriametrics.com

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 16:04:13 +01:00
Aliaksandr Valialkin
b718e555a6 docs/CHANGELOG.md: add a link to docs describing -disableReroutingOnUnavailable command-line flag 2024-02-08 17:03:19 +02:00
hagen1778
e1926f286b docs: follow-up after 83e55456e2
83e55456e2
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 15:55:57 +01:00
Khushi Jain
83e55456e2 app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738) 2024-02-08 15:52:00 +01:00
Aliaksandr Valialkin
b135504e46 docs/vmagent.md: add debugging scrape targets chapter 2024-02-08 16:32:19 +02:00
hagen1778
8771e44d23 deployment/docker: update README with ToC
The change also moves commands for starting/stopping env to corresponding
sections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 15:11:49 +01:00
Aliaksandr Valialkin
a354924b0d app/victoria-metrics: properly send staleness markers on victoriametrics shutdown if -selfScrapeInterval > 0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/943
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2024-02-08 15:29:19 +02:00
Aliaksandr Valialkin
aea8feee1a docs: update -help output after 61d9df4c36
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:50:49 +02:00
Aliaksandr Valialkin
61d9df4c36 app/vmselect: add ability to reset rollup result cache on startup by passing -search.resetRollupResultCacheOnStartup command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:40:40 +02:00
Aliaksandr Valialkin
0f5176380b lib/mergeset: add a test for too long item passed to Table.AddItems() 2024-02-08 14:12:56 +02:00
Aliaksandr Valialkin
5817e4c0c1 lib/mergeset: typo fix: indexdb/indexBlock -> indexdb/indexBlocks 2024-02-08 13:53:18 +02:00
Aliaksandr Valialkin
3c246cdf00 lib/{storage,mergeset}: do not create index blocks with sizes exceeding 64Kb in common case
This should reduce memory fragmentation and memory usage for indexdb/indexBlocks and storage/indexBlocks caches
2024-02-08 13:52:24 +02:00
Aliaksandr Valialkin
93ada2eaaf lib/mergeset: verify that the index block for in-memory part doesnt exceed the 3*maxIndexBlockSize 2024-02-08 13:50:14 +02:00
Aliaksandr Valialkin
077f84964a lib/mergeset: do not store commonPrefix in blockHeader if the block contains only a single item
There is no sense in storing commonPrefix for blockHeader containing only a single item,
since this only increases blockHeader size without any benefits.
2024-02-08 13:47:24 +02:00
Aliaksandr Valialkin
f7b68b466c docs/CHANGELOG.md: clarify the bugfix description for e1bf8440eb 2024-02-08 13:01:19 +02:00
Aliaksandr Valialkin
e1bf8440eb lib/mergeset: prevent from possible too big indexBlockSize panic
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
2024-02-08 12:54:10 +02:00
hagen1778
3380043424 dashboards: follow-up 4369bc1df2
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 09:51:43 +01:00
Hui Wang
4369bc1df2 deployment/dashboards: fix Storage full ETA panels (#5747)
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than 
rate(vm_rows_added_to_storage_total) and it could last quite some time,
 which causes negative values of Storage full ETA and confuses users, see playground.

Instead of trying to get more accurate results during downsampling, I think it's ok to ignore 
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-08 09:43:39 +01:00
Aliaksandr Valialkin
0cf56c1ba5 app/vmselect/promql: properly handle precision errors in rollup functions
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.

The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
2024-02-08 02:30:57 +02:00
Aliaksandr Valialkin
5458a4bcfc Makefile: add VictoriaMetrics Windows build into crossbuild 2024-02-07 22:27:05 +02:00
Aliaksandr Valialkin
19c1066a25 docs/CHANGELOG.md: properly document the change at b74006e2ca
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774
2024-02-07 22:05:12 +02:00
Nihal
b74006e2ca [vmsingle/vminsert]: change http success response code to 200 for -/reload request handler (#5776)
* change vmsingle's response code to 200 for reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-02-07 20:00:04 +00:00
Github Actions
7f7c118d26 Automatic update operator docs from VictoriaMetrics/operator@38dfad5 (#5772) 2024-02-07 19:54:52 +00:00
Aliaksandr Valialkin
bf9ea249a3 lib/protoparser/datadogsketches: use math.RoundToEven() for calculating the rank
The original code uses this function - see 48d52eeea6/pkg/quantile/sparse.go (L138)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775
2024-02-07 21:43:26 +02:00
Aliaksandr Valialkin
093798375e lib/protoparser/datadogsketches: add more permalinks to the original source code
These permalinks should help verifying the correctness of the code

This is a follow-up after 07213f4e0c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775
2024-02-07 21:33:20 +02:00
Andrii Chubatiuk
07213f4e0c added ddsketch permalink (#5775)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
2024-02-07 19:01:09 +00:00
Aliaksandr Valialkin
e78e5ccfaa docs/CHANGELOG.md: support empty command-line flag values in short array notation
For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values:

- the first and the last one are set to the default value for `-fooDuration`
- the second one is set to 10s
2024-02-07 20:53:13 +02:00
Aliaksandr Valialkin
05f0b707d1 docs/relabeling.md: add examples on how to rename scraped metrics and how to add/change labels in scraped metrics 2024-02-07 20:28:55 +02:00
Aliaksandr Valialkin
b431ccea5b all: update Go builder from Go1.21.6 to Go1.21.7
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.7+label%3ACherryPickApproved
2024-02-07 04:00:37 +02:00
Aliaksandr Valialkin
d8f5be2290 docs/scrape_config_examples.md: add an example for scraping cadvisor in Kubernetes 2024-02-07 03:23:08 +02:00
Aliaksandr Valialkin
6a7c7ae391 app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after the recent changes to app/vmui
This is a follow-up for the following commits:

- dcbdbc760e
- a81ccbd749
- 65b8002aeb
2024-02-07 01:48:46 +02:00
Aliaksandr Valialkin
b8b0d22d2d docs/Cluster-VictoriaMetrics.md: document the /datadog/api/beta/sketches endpoint
This is a follow-up for a1d1ccd6f2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
2024-02-07 01:35:32 +02:00
Aliaksandr Valialkin
541b644d3d app/{vmagent,vminsert}: follow-up after a1d1ccd6f2
- Document the change at docs/CHANGELOG.md
- Copy changes from docs/Single-server-VictoriaMetrics.md to README.md
- Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy )
- Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests
- Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it
  and add permalinks to the original source code there.
- Various code cleanups

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091
2024-02-07 01:28:05 +02:00
Daria Karavaieva
4a31bd9661 Vmanomaly Guide - dashboard and query change (#5771)
* dashboard fix

* query fix

* changed screenshots

* minor fixes
2024-02-06 22:40:40 +01:00
hagen1778
eaa2125f2c docs/vmalert: mention step param when comapring rule results to raw queries
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 22:32:46 +01:00
Andrii Chubatiuk
a1d1ccd6f2 support datadog /api/beta/sketches API (#5584)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:58:11 +00:00
Yury Molodov
dcbdbc760e vmui: improve select component functionality (#5755)
* vmui: fix select closing on click outside (#5728)

* vmui: clear entered text in select after selecting a value (#5727)

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:50:04 +00:00
Yury Molodov
a81ccbd749 vmui: fix handling invalid timezone (#5758)
* vmui: fix handling invalid timezone (#5732)

* vmui: switch browser timezone flag to isValid

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:47:30 +00:00
Aliaksandr Valialkin
9e8e4cca6a lib/storage: move fixupTimestamps() call to Block.Init()
This is a follow-up for 0bf7921721
2024-02-06 22:43:48 +02:00
Zakhar Bessarab
0bf7921721 lib/storage/raw_row: properly initialize TS for tmp blocks (#5762)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-02-06 20:42:32 +00:00
Yury Molodov
65b8002aeb vmui: fix graph dragging (#5769)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:32:03 +00:00
Aliaksandr Valialkin
61524ad87b vendor: update github.com/VictoriaMetrics/metricsql from v0.70.0 to v0.70.1
This should help with the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5604
2024-02-06 21:52:35 +02:00
Aliaksandr Valialkin
e159cc30df lib/fs: lazily open the file at ReaderAt on the first access
This should significantly reduce the number of open ReaderAt files
on VictoriaMetrics and VictoriaLogs startup.

The open files can be tracked via vm_fs_readers metric
2024-02-06 20:42:57 +02:00
Aliaksandr Valialkin
28f9fe5f65 docs/scrape_config_examples.md: document that full urls can be used as in the targets list 2024-02-06 19:49:15 +02:00
Aliaksandr Valialkin
28b737dc57 docs/scrape_config_examples.md: add missing to specify phrase 2024-02-06 19:43:23 +02:00
Aliaksandr Valialkin
f4f1caea2a docs/CHANGELOG.md: add link to mTLS feature request 2024-02-06 19:32:40 +02:00
Aliaksandr Valialkin
7bc3af1224 lib/httpserver: add support for mTLS for requests to -httpListenAddr 2024-02-06 17:46:19 +02:00
Aliaksandr Valialkin
c1e50848c5 docs/CHANGELOG.md: move descriptions for recently added features from v1.97.1 to tip 2024-02-06 16:31:18 +02:00
Aliaksandr Valialkin
1bed3a1da7 docs/scrape_config_examples.md: add examples for typical scrape_config usage 2024-02-06 15:58:16 +02:00
Aliaksandr Valialkin
075d3fb4bf docs/Cluster-VictoriaMetrics.md: add a warning that -disableReroutingOnUnavailable command-line flag may result in long pause for data ingestion when some of vmstorage nodes are unavailable for long time 2024-02-06 15:58:15 +02:00
Aliaksandr Valialkin
6bdedc4443 docs/enterprise.md: prioritize enterprise technical support over other enterprise features 2024-02-06 15:58:15 +02:00
hagen1778
3cab5a16a9 docs/vmalert: update limit description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 10:49:56 +01:00
hagen1778
54ac16c883 docs/vmalert: mention limit option in group params
This param was supported for long time but was missing in the docs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 10:47:19 +01:00
Fred Navruzov
f0e51dd01d update changelog with 1.0.0 helm chart ref (#5763) 2024-02-05 20:43:40 +02:00
Aliaksandr Valialkin
9dad983bb8 docs/sd_configs.md: improve readability a bit 2024-02-05 15:32:47 +02:00
Github Actions
bbcea93ff8 Automatic update operator docs from VictoriaMetrics/operator@bfc521d (#5761)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-02-05 21:21:24 +08:00
Aliaksandr Valialkin
209c96fc42 docs/Cluster-VictoriaMetrics.md: document -disableReroutingOnUnavailable command-line flag
This is a follow-up for 88f0d1572e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5713
2024-02-05 15:18:01 +02:00
Aliaksandr Valialkin
ee6586f443 docs/Cluster-VictoriaMetrics.md: move the Improve re-routing performance during restart to more appropriate place
Previoulsy it was mistakenly inserted between `No downtime strategy` and 'Minimum downtime strategy' chapters.

This is a follow-up for 37997abd14
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5293
2024-02-05 14:43:11 +02:00
Aliaksandr Valialkin
de9a9546c2 lib/cgroup: remove SetGOGC() function
GOGC can be already set via environment variable. There is no need in adding
new approaches for setting the GOGC (such as command-line flag), since they complicate operations.
2024-02-05 12:11:08 +02:00
Aliaksandr Valialkin
fce313ae7f docs/Cluster-VictoriaMetrics.md: mention -metrics.exposeMetadata command-line flag in Monitoring section 2024-02-05 11:47:32 +02:00
Aliaksandr Valialkin
5f836c8729 docs/CHANGELOG.md: add a link to all VictoriaMetrics dashboards for Grafana 2024-02-05 11:42:05 +02:00
Aliaksandr Valialkin
64027074f9 docs/Troubleshooting.md: clarify the section about GOGC tuning
This is a follow-up for 487a94565b
2024-02-05 11:01:04 +02:00
Aliaksandr Valialkin
1684766152 docs/CHANGELOG.md: add missing links to the corresponding dashboards 2024-02-05 10:55:30 +02:00
Aliaksandr Valialkin
6136c19fbc docs: mention -metrics.exposeMetadata command-line flag in Monitoring sections
This is a follow-up for 326a77c697
2024-02-05 10:49:16 +02:00
Artem Navoiev
ba36472b6b docs: fix aliasis for url-example
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-04 14:31:26 +01:00
Fred Navruzov
3ac49e5c38 docs: vmanomaly - fix 1.9.2 references in pull cmd and docs (#5756)
* fix 1.9.2 references in pull cmd and docs
* better readability of upgrade note
2024-02-02 19:06:07 +02:00
hagen1778
487a94565b dashboards/all: add new panel CPU spent on GC
It should help identifying cases when too much CPU is spent on garbage collection,
 and advice users on how this can be addressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 16:21:21 +01:00
hagen1778
29a9b31584 dashboards: add Targets scraped/s
A new stat panel shows the number of targets scraped by the vmagent per-second.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:48:26 +01:00
hagen1778
db11b94e30 dashboards: update to grafana/grafana:10.3.1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:41:08 +01:00
hagen1778
7b3183bf71 deployment/docker: bump grafana version to grafana/grafana:10.3.1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:40:44 +01:00
Aliaksandr Valialkin
a40fcc8aa6 lib/prompbmarshal: code cleanup after 8aaa828ba3 2024-02-01 21:40:16 +02:00
Aliaksandr Valialkin
0e3c532bf7 app/vmselect/netstorage: prevent from disk write IO when closing temporary files
Remove temporary file before closing it in order to signal the OS that it shouldn't
store the file contents from page cache to disk when the file is closed.

Gracefully handle the case when the file cannot be removed before being closed -
in this case remove the file after closing it. This allows working on Windows.

Also remove superflouos opening of temporary file for reading - re-use already opened file handle for writing.

This is a follow-up for 9b1e002287
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2024-02-01 19:12:44 +02:00
Aliaksandr Valialkin
deed8ddfb8 docs/CHANGELOG.md: document v1.93.11 LTS release 2024-02-01 18:21:28 +02:00
Aliaksandr Valialkin
87bf1900e4 docs/CHANGELOG.md: document v1.87.14 LTS release 2024-02-01 17:08:56 +02:00
Aliaksandr Valialkin
507744ebb4 all: update VictoriaMetrics Docker image from v1.97.0 to v1.97.1
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.1
2024-02-01 16:15:01 +02:00
Aliaksandr Valialkin
5b065441c8 vendor: run make vendor-update 2024-02-01 15:24:46 +02:00
Aliaksandr Valialkin
31c53adbde docs: mark v1.97.x as long-term support release 2024-02-01 15:16:39 +02:00
Aliaksandr Valialkin
bdfa4aee0d docs/CHANGELOG.md: cut v1.97.1 2024-02-01 15:08:40 +02:00
Aliaksandr Valialkin
8f9eddb1e4 docs: sync -help output after recent changes 2024-02-01 15:06:25 +02:00
Aliaksandr Valialkin
88dc6cff70 app/vmselect: add missing whitespace into the description for -vmui.defaultTimezone command-line flag
This is a follow-up for eb6def0695
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5611
2024-02-01 14:49:51 +02:00
Aliaksandr Valialkin
0fc1f98d28 docs/Single-server-VictoriaMetrics.md: clarify Security chapter a bit 2024-02-01 14:44:27 +02:00
Aliaksandr Valialkin
ff6f7142ec docs/Single-server-VictoriaMetrics.md: run make docs-sync after 49d5e7fef5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5262
2024-02-01 14:41:53 +02:00
Dima Lazerka
49d5e7fef5 Improve docs on security http headers (#5262)
* Improve docs on security http headers

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-01 12:40:11 +00:00
noodles2hg
cafd6f08b3 lib/logstorage: proper exit during block search (#5400) 2024-02-01 12:11:05 +00:00
Jiajing LU
333bda8702 count inmemoryParts that have not been taken for merge (#5447) 2024-02-01 12:06:28 +00:00
dependabot[bot]
9d17fc7004 build(deps): bump codecov/codecov-action from 3 to 4 (#5745)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-01 13:57:58 +02:00
Aliaksandr Valialkin
8aaa828ba3 lib/prompbmarshal: return back custom protobuf marshaler for lib/prompbmarshal.WriteRequest
The easyproto-based marshaler is 2x slower than the previous custom marshaler,
so let's stick with it. This improves the performance for sending data to remote storage at vmagent
and reduces CPU usage to pre-v1.97.0 levels.
2024-02-01 06:33:06 +02:00
Aliaksandr Valialkin
55b5c13839 lib/encoding: follow-up for 49e3665d6d
Improve performance for typical cases of varint marshaling / unmarshaling further.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721
2024-02-01 05:37:40 +02:00
Fuchun Zhang
49e3665d6d make encoding.MarshalVarInt64s faster (#5721)
* make encoding.MarshalVarInt64s faster

* add fast path for MarshalVarInt64s

* make UnmarshalVarUint64s faster

* remove comment
2024-02-01 05:34:37 +02:00
Aliaksandr Valialkin
c91614b626 lib/encoding: added benchmarks for marshaling / unmarshaling of varints
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721
2024-02-01 05:11:01 +02:00
helen
c8a96ac241 clean unused code (#5735)
Signed-off-by: helen <haitao.zhang@daocloud.io>
2024-01-31 17:50:36 +00:00
Aliaksandr Valialkin
b7fd7ee0b6 lib/promauth: follow-up for fca3b14b7b
- Simplify the code for handling BasicAuthConfig at lib/promauth/config.go
- Move the description of the change into correct place at docs/CHANGELOG.md
- Put tests for username in front of tests for password at lib/promauth/config_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5720
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511
2024-01-31 19:45:16 +02:00
Nihal
fca3b14b7b Support for username_file in scrape config (basic_auth) similar to Prometheus for having config compatibility (#5720)
* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-01-31 17:41:16 +00:00
Aliaksandr Valialkin
db4623efc2 app/vmselect/netstorage: properly handle the case when an empty brsPool points to the end of brs.brs
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
2024-01-31 10:27:50 +02:00
hagen1778
02492bc1a4 dashboards/single: fix typo in query for version annotation
The typo falsely produced many version change events.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-31 09:13:46 +01:00
Aliaksandr Valialkin
ec0ca8e7eb app/vmselect/promql: really keep metric names when keep_metric_names modifier is applied to binary operator
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5556
2024-01-31 02:32:55 +02:00
Aliaksandr Valialkin
9922a486a6 docs/vmauth.md: typo fix after 68be182075 2024-01-31 00:13:50 +02:00
Aliaksandr Valialkin
9ce75ee11b all: update VictoriaMetrics Docker image from v1.96.0 to v1.97.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.0
2024-01-30 23:37:28 +02:00
Aliaksandr Valialkin
fcc8b14f86 deployment/docker: upgrade base Docker image from Alpine 3.19.0 to 3.19.1
See https://www.alpinelinux.org/posts/Alpine-3.19.1-released.html
2024-01-30 22:47:18 +02:00
Aliaksandr Valialkin
26488726a8 docs/CHANGELOG.md: cut v1.97.0 2024-01-30 22:45:04 +02:00
Dan Dascalescu
a090de492c docs: t...over_time functions return fractional seconds (#5715)
* docs: t...over_time functions return fractional seconds

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:18:53 +00:00
Roman Khavronenko
6939c53e48 app/vmselect: set proper timestamp for cached instant responses (#5723)
* app/vmselect: set proper timestamp for cached instant responses

The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.

The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:03:34 +00:00
Aliaksandr Valialkin
c12bdd6c28 app/vmselect/vmui: run make vmui-update after 81b5db04f6 2024-01-30 21:13:01 +02:00
Yury Molodov
81b5db04f6 vmui: add the ability to expand all tracing entries (#5677) (#5726) 2024-01-30 19:10:10 +00:00
Github Actions
300d701df0 Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@40e4e15 (#5729) 2024-01-30 19:07:31 +00:00
Aliaksandr Valialkin
f768d5d797 docs/CHANGELOG.md: document the enhancement, which reduces initial memory usage when vmagent scrapes targets with large responses
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567
2024-01-30 20:51:13 +02:00
Aliaksandr Valialkin
17f8ed8948 docs/CHANGELOG.md: refer to the related pull request for the bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945 2024-01-30 20:21:44 +02:00
Aliaksandr Valialkin
ea2752ce62 docs/CHANGELOG.md: document the bugfix addressed by the commit bc7cf4950b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
2024-01-30 20:16:22 +02:00
Aliaksandr Valialkin
32e60fe09d vendor: run make vendor-update 2024-01-30 18:47:01 +02:00
Aliaksandr Valialkin
adf585f7ed app/vmselect/vmui: run make vmui-update after 6e8995cfb92fb5a87fc6ad78609bf9ea5e0e712f 2024-01-30 18:45:57 +02:00
Aliaksandr Valialkin
bc7cf4950b lib/promscrape: use the standard net/http.Client instead of fasthttp.Client for scraping targets in non-streaming mode
While fasthttp.Client uses less CPU and RAM when scraping targets with small responses (up to 10K metrics),
it doesn't work well when scraping targets with big responses such as kube-state-metrics.
In this case it could use big amounts of additional memory comparing to net/http.Client,
since fasthttp.Client reads the full response in memory and then tries re-using the large buffer
for further scrapes.

Additionally, fasthttp.Client-based scraping had various issues with proxying, redirects
and scrape timeouts like the following ones:

- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5425
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2794
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017

This should help reducing memory usage for the case when target returns big response
and this response is scraped by fasthttp.Client at first before switching to stream parsing mode
for subsequent scrapes. Now the switch to stream parsing mode is performed on the first scrape
after reading the response body in memory and noticing that its size exceeds the value passed
to -promscrape.minResponseSizeForStreamParse command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567

Overrides https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4931
2024-01-30 18:39:10 +02:00
Artem Navoiev
a20c289228 docs: add alias for keyconcepts
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-30 17:05:58 +01:00
Aliaksandr Valialkin
c2373a8109 lib/promscrape: fix BenchmarkScrapeWorkScrapeInternal, which has been broken by the commit 65bc460323 2024-01-30 16:06:06 +02:00
Yury Molodov
7007c6a760 vmui: fix Enter key in query field (#5667) (#5717) 2024-01-30 14:36:19 +01:00
Aliaksandr Valialkin
583b6fe1e7 app/vmagent/remotewrite: limit the concurrency for marshaling time series before sending them to remote storage
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
2024-01-30 12:18:19 +02:00
Aliaksandr Valialkin
431aa16c8d lib/storage: keep (date, metricID) entries only for the last two dates
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.

This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
2024-01-29 18:43:59 +01:00
Aliaksandr Valialkin
e7844f2efd docs/keyConcepts.md: clarify the information about which data is returned by instant and range queries
Do not use `raw samples` term there, since it adds more confusion than clarity:
the `raw samples` refers to real samples stored in the database, while neither range nor instant queries
do not return raw samples - they both return *calculated* samples at *the given* timestamps.

This is a follow-up for b5978ed8f9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5710
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5708
2024-01-29 18:19:46 +01:00
Fred Navruzov
b2434ec340 - fix link/version of helm chart in update request (#5716) 2024-01-29 18:55:07 +02:00
Aliaksandr Valialkin
5d66ee88bd lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:45:12 +01:00
Artem Navoiev
b9b18b5fd8 docs: add backward compaitble redicrt for url examples page
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-29 16:01:32 +01:00
Fred Navruzov
b4aef0c141 - update versions to 1.9.2 (#5714)
- update guide asset urls to flat
2024-01-29 15:47:27 +02:00
hagen1778
b5978ed8f9 docs: specify results of Instant and Range queries
Mention explicitly what are value and timestamp field in returned
results from Instant and Range queries.

Updates
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5710
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5708

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 14:00:14 +01:00
Roman Khavronenko
24eb1ad0c8 vmalert: set ActiveAt to evaluation timestamp in newAlert fn (#5657)
The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps,
which resulted into inaccurate test states:
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 12:02:02 +01:00
hagen1778
98b805544e lib/streamaggr: fix incorrect err message for min interval value
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 09:53:05 +01:00
hagen1778
c23e8bee89 dashboards: specify where to see details about dropped labels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 07:37:51 +01:00
Fred Navruzov
9b555a0034 update guide and changelog to 1.9.1 (#5706) 2024-01-28 09:43:28 +02:00
hagen1778
6c6c2c185f docs: follow-up after 491287ed15
* port un-synced changed from docs/readme to readme
* consistently use `sh` instead of `console` highlight, as it looks like
a more appropriate syntax highlight
* consistently use `sh` instead of `bash`, as it is shorter
* consistently use `yaml` instead of `yml`

See syntax codes here https://gohugo.io/content-management/syntax-highlighting/

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-27 19:29:11 +01:00
hagen1778
c20d68e28d docs: follow-up after 491287ed15
491287ed15
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-27 19:11:38 +01:00
Artem Navoiev
491287ed15 docs: remove witdh from images, remove <p>, remove <div> (#5705)
* docs: remove witdh from images, remove <p>, remove <div>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* docs: remove <div> clarify language in code blocks

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-27 10:08:07 -08:00
Daria Karavaieva
4a9f8f4cb0 version 1.9.1 update, dashboard viz flag (#5704) 2024-01-27 14:16:02 +01:00
Aliaksandr Valialkin
0ed291102d lib/decimal: follow-up for e6bad5174f
- Add a benchmark for CalbirateAndScale.
- Reduce the decimal multipliers table size from 256Kb to 192bytes.
- Use more clear naming for variables.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5672
2024-01-27 00:08:57 +01:00
Fuchun Zhang
64780f4f02 Optimize the performance of data merge: decimal.CalibrateScale() (#5672)
* Optimize the performance of data merge: decimal.CalibrateScale() from 49633 ns/op to 9146 ns/op

* Optimize the performance of data merge: decimal.CalibrateScale()
2024-01-27 00:08:56 +01:00
Aliaksandr Valialkin
1a6c3370bf vendor: run make vendor-update 2024-01-26 22:56:37 +01:00
Aliaksandr Valialkin
b9dcaaa7f8 app/vmui: run make vmui-update after a7b11eff7c 2024-01-26 22:53:46 +01:00
Hui Wang
6ee1bfeb3c add inserting comma inside value instruction to flag description (#5666) 2024-01-26 22:46:49 +01:00
Roman Khavronenko
aaa526e8ff lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:23 +01:00
Roman Khavronenko
df59ac7f0e app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:42:21 +01:00
Yury Molodov
a7b11eff7c vmui: fix Enter key in query field (#5667) (#5681) 2024-01-26 22:38:32 +01:00
Aliaksandr Valialkin
3bce55be0c docs: update -help output after bb7a419cc3 2024-01-26 22:28:40 +01:00
Aliaksandr Valialkin
bb7a419cc3 lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Artem Navoiev
97937d58c4 docs: remove <p> for imanges (#5702)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 13:06:48 -08:00
Artem Navoiev
3e0a117ddf remove all <div> as far they obsolete and can break markdown (#5701)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 12:52:21 -08:00
Aliaksandr Valialkin
e84c877503 lib/mergeset: remove inmemoryBlock pooling, since it wasn't effecitve
This should reduce memory usage a bit when new time series are ingested at high rate (aka high churn rate)
2024-01-26 21:34:57 +01:00
Artem Navoiev
7de19c3748 docs: delete docs/provision_datasources.png as we support webp
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 21:25:39 +01:00
Github Actions
5a8daa725e Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@ef5cfe6 (#5700) 2024-01-26 12:22:17 -08:00
Aliaksandr Valialkin
2655c02d5e lib/logstorage: make sure that WaitGroup.Add isnt called after stopCh is closed and WaitGroup.Wait is called
This protects from rare panic, which may occur during graceful shutdown of VictoriaLogs
2024-01-26 21:17:02 +01:00
Aliaksandr Valialkin
8e03bc6b53 app/vmselect/promql: do not spend CPU time on verifying whether the rollup cache needs to be reset for the given metric rows when it has been already instructed to reset 2024-01-26 21:13:38 +01:00
Aliaksandr Valialkin
4ac7e3a355 docs/Makefile: mention that the Makefile rules must be run from VictoriaMetrics repository root 2024-01-26 21:01:40 +01:00
Aliaksandr Valialkin
32c064a401 app/vmauth: return 503 service unavailable status code when the backend returns response with unsupported status code, but the request cannot be re-tried.
While at it, properly close response body. This should prevent from possible http keep-alive connection leak to backends because of unclosed response bodies.

This is a follow-up for 3c0aa14b5b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5688
2024-01-26 20:43:11 +01:00
Artem Navoiev
60fc2da6c1 docs: fix key concepts image and links
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 20:30:29 +01:00
Artem Navoiev
25165656bb docs: change [image] to img as far we support it in release guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 19:01:20 +01:00
Artem Navoiev
41e99765cc docs: remoev vmanomaly as far we have dedicated section with alredy exists redirects
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 18:38:01 +01:00
Artem Navoiev
bc033a2b30 docs: vmanomaly fix images
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 17:59:37 +01:00
Daria Karavaieva
105cb44884 Vmanomaly Guide dashboard provisioning (#5679)
* dashboard provisioning

* delete dashboard filter, new query

* dashboard screens, guide fixes
2024-01-26 17:12:58 +01:00
Artem Navoiev
9ded04e643 docs: remove raw and endraw tags as they are not needed for the new v… (#5696)
* docs: remove raw and endraw tags as they are not needed for the new version of site

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* revert formating in vmaler

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 07:30:45 -08:00
Github Actions
fae801edd3 Automatic update operator docs from VictoriaMetrics/operator@0628def (#5694) 2024-01-26 10:21:52 +01:00
Github Actions
2582b1e15a Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@c644bec (#5691) 2024-01-26 11:44:02 +04:00
Roman Khavronenko
b11f4ef5ea app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:42:57 +01:00
Github Actions
a95246d885 Automatic update operator docs from VictoriaMetrics/operator@e75a096 (#5690) 2024-01-25 15:33:01 +01:00
hagen1778
d71218d6ce docs: simplify instructions for spinning up docker env
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:27:57 +01:00
Github Actions
e29fe0933b Automatic update operator docs from VictoriaMetrics/operator@f6b9c08 (#5676) 2024-01-25 15:10:23 +01:00
Alexander Marshalov
3c0aa14b5b vmauth: fix vmauth_user_request_backend_errors_total metric calc logic for use case when only one backend is available - if we get an error from the retry_status_codes list, but cannot execute retry, we increment vmauth_user_request_backend_errors_total as well (#5688) 2024-01-25 14:04:20 +01:00
hagen1778
56310ffb47 docs: fix the issue link
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 13:35:41 +01:00
Alexander Marshalov
495fa9800a Follow up after https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5685 (#5686) 2024-01-25 10:03:46 +01:00
Alexander Marshalov
d5682858c0 Make a makefile work on a MacOS and Linux (#5685) 2024-01-25 09:58:01 +01:00
Aliaksandr Valialkin
c3a585cfe5 lib/storage: rename *AssistedMerges to *AssistedMergesCount in order to make these field names less misleading
These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code
2024-01-25 10:19:32 +02:00
Alexander Marshalov
806c07ddd5 vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682) 2024-01-24 17:55:06 +01:00
Aliaksandr Valialkin
18df07e824 lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts
The maxFileParts usage has been accidentally removed in fa566c68a6

While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
2024-01-24 15:08:42 +02:00
Aliaksandr Valialkin
ac5b740750 lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct
This is a follow-up for ef12598ad4
2024-01-24 15:06:46 +02:00
Aliaksandr Valialkin
ef12598ad4 lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers
Already terminated pods and containers cannot be scraped and will never resurrect,
so there is zero sense in creating scrape targets for them.
2024-01-24 14:57:53 +02:00
Aliaksandr Valialkin
4d961c70f7 app/{vmselect,vmstorage}: return compression of the data passed from vmstorage to vmselect
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.

The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.

The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
2024-01-24 13:39:28 +02:00
Aliaksandr Valialkin
f888a019fe lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:27 +02:00
Aliaksandr Valialkin
fa566c68a6 lib/mergeset: really limit the number of in-memory parts to 15
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.

The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.

The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.

The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
2024-01-24 03:38:12 +02:00
Aliaksandr Valialkin
5543c04061 docs/Cluster-VictoriaMetrics.md: document that vmstorage doesnt compress data it sends to vmselect by default
This is a follow-up for cd4f641d32
2024-01-23 23:22:18 +02:00
Aliaksandr Valialkin
8fb8b71295 lib/encoding: remove uneeded re-slicing of byte slice before passing it to binary.BigEndian.Uint* 2024-01-23 22:50:29 +02:00
Aliaksandr Valialkin
1c58c00618 app/vmselect/netstorage: limit the initial size for brsPoolCap with 32Kb
This should reduce the number of expensive memory allocations with sizes bigger than 32Kb
2024-01-23 22:29:39 +02:00
Aliaksandr Valialkin
43ecd5d258 app/vmselect/netstorage: pre-allocate memory for metricNamesBuf
This should reduce the number of metricNamesBuf re-allocations in append()
2024-01-23 21:34:16 +02:00
Aliaksandr Valialkin
ae643ef1f1 lib/{storage,mergeset}: reduce the maxium compression level for the stored data
This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.
2024-01-23 17:46:50 +02:00
Github Actions
05c9a4d7ce Automatic update operator docs from VictoriaMetrics/operator@1470569 (#5668) 2024-01-23 16:22:16 +01:00
Aliaksandr Valialkin
6c214397ed lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
2024-01-23 16:09:55 +02:00
Aliaksandr Valialkin
4d78954158 lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting 2024-01-23 16:08:38 +02:00
Aliaksandr Valialkin
6d84b1beef lib/filestream: do not measure read / write duration from / to in-memory buffers
Measuring read / write duration from / to in-memory buffers has little sense,
since it will be always fast. It is better to measure read / write duration from / to
real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics.
This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls
per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers.

This is a follow-up for 2f63dec2e3
2024-01-23 14:52:22 +02:00
Aliaksandr Valialkin
41456d9569 app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery()
This avoids slow path in Go runtime for allocating objects bigger than 32Kb -
see 704401ffa0/src/runtime/malloc.go (L11)

This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 5dd37ad836 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 14:04:49 +02:00
Aliaksandr Valialkin
1f1768d7af app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size
See 704401ffa0/src/runtime/malloc.go (L11)

This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 508c608062

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 13:46:37 +02:00
Aliaksandr Valialkin
fac7c30f4e docs/vmagent.md: clarify how -promscrape.seriesLimitPerTarget command-line flag, series_limit config option and __series_limit__ label interact with each other
This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5663
See also 89e3c70ccd
2024-01-23 13:14:50 +02:00
Roman Khavronenko
89e3c70ccd lib/promscrape: respect 0 value for series_limit param (#5663)
* lib/promscrape: respect `0` value for `series_limit` param

Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.

This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 13:09:14 +02:00
Aliaksandr Valialkin
1c5163ae51 lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock()
Previously the checks were to strict by requiring to leave the same first and last items by prepareBlock()

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655
2024-01-23 12:58:32 +02:00
Fred Navruzov
2adb38a9c4 - fix 404 errors after page remaning (#5664)
- slight text fixes
2024-01-23 01:56:42 -08:00
Aliaksandr Valialkin
15a15e5b99 app/vmselect/vmui: run make vmui-update in order to sync recent changes in app/vmui 2024-01-23 04:31:44 +02:00
Aliaksandr Valialkin
114822d585 app/{vmstorage,vmselect}: disable vmstorage->vmselect RPC compression by default in order to improve query performance 2024-01-23 04:24:57 +02:00
Zakhar Bessarab
bf4742526d lib/storage: print tenant ID in log when discarding or truncating labels (#5658)
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:24:56 +02:00
Yury Molodov
38231d5994 vmui: query report (#5497)
* vmui: add query analyzer page

* vmui: fix tabs for query analyzer

* vmui: add help to export query

* vmui: add time params to query analyzer

* docs/vmui: add query analyzer

* vmui: fix validation JSON form

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:23:26 +02:00
Yury Molodov
eb6def0695 vmui: add flag for default timezone setting (#5611)
* vmui: add flag for default timezone setting #5375

* vmui: validate timezone before client return

* Update app/vmselect/vmui.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:11:19 +02:00
Yury Molodov
633e6b48ad vmui: fix cache autocomplete (#5591)
* vmui: fix the logic of closing the popper #5470

* vmui: fix the logic of caching autocomplete results #5472

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:06:14 +02:00
Aliaksandr Valialkin
980338861f lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block
Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651
2024-01-23 03:15:52 +02:00
Aliaksandr Valialkin
bc7d19c8ca app/vmselect/promql: remove superflouos memory allocations at aggrPrepareSeries()
While at it, also remove unneeded map lookup
2024-01-23 02:28:31 +02:00
Aliaksandr Valialkin
9240bc36a3 app/vmselect/promql/aggr_incremental.go: eliminate unnecessary memory allocation in incrementalAggrFuncContext.updateTimeseries 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
e0399ec29a app/vmselect/netstorage: remove tswPool, since it isnt efficient 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
72a838a2a1 app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series
This saves a few CPU cycles for common case
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
5dd37ad836 app/vmselect/netstorage: use []blockRef from blockRefPool in order to reduce memory allocations 2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
7345567c29 app/vmselect/netstorage: substitute pointer to blockRefs by brssPool index at the metricName->blockRefs map
This should reduce the pressure on Go GC, since it will see lower number of pointers.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
678234e9f0 app/vmselect/netstorage: reduce the number of allocations for blockRefs objects in ProcessSearchQuery()
This should reduce pressure on Go GC at vmselect

The change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Aliaksandr Valialkin
508c608062 app/vmselect/netstorage: reduce the number of memory allocations in ProcessSearchQuery() by storing all the metric names in a single byte slice
This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Daria Karavaieva
ffaf48b99e add 1.8.0 notes to changelog (#5616)
* add 1.8.0 notes to changelog

* added release date

* MAD internal link

* monitoring health deprecation
2024-01-22 23:51:12 +01:00
Jaskeerat Singh Randhawa
b606521745 custom-resources: fix link text for alertmanager (#5660) 2024-01-22 18:06:40 +01:00
Aliaksandr Valialkin
3449d563bd all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
9b4294e53e lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.

Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):

- The interval will be much smaller than 10 seconds under high churn rate.
  This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
  This should reduce the uneeded work on merging of mutable cache into immutable cache.
2024-01-22 18:40:32 +02:00
hagen1778
8b8d0e3677 deployment/docker: fix typo in commands example
Follow up after 38b2a5bc44

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:56:27 +01:00
hagen1778
b25ef138ce dashboards: reflect dashboard rename in copy script
This is a follow-up for ff33e60a3d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:51:24 +01:00
hagen1778
0e5e502b3c deployment/docker: follow-up 38b2a5bc44
* Simplify folder structure
* mention datasource in README

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:05:44 +01:00
Dmytro Kozlov
38b2a5bc44 deployment/docker: add grafana datasource to the docker-compose files (#5363)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3920
https://github.com/VictoriaMetrics/grafana-datasource/issues/113
2024-01-22 15:45:31 +01:00
hagen1778
1075fcfc8c app/vmctl/backoff: fix flaky test
The change removes artificial delay before returning error, which sometimes
caused less retry events than expected.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 12:21:14 +01:00
hagen1778
da556cc329 docs: fix Grafana link example for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 09:35:18 +01:00
dependabot[bot]
df197723ae build(deps): bump github/codeql-action from 2 to 3 (#5462)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2 to 3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-22 01:49:17 +02:00
Aliaksandr Valialkin
d3ee3e0ef5 Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)"
This reverts commit cfec258803.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled.
This prevents from storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577
2024-01-22 00:43:48 +02:00
Aliaksandr Valialkin
9c0863babc docs: use persistent links to Grafana dashboards
These links do not depend on the dashboard name, so they do not break after the renaming of the dashboard.

This is a follow-up for ff33e60a3d
2024-01-22 00:17:17 +02:00
Aliaksandr Valialkin
1c7f990fad app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-21 23:45:31 +02:00
Artem Navoiev
3f7ed7e6b2 docs vmanomaly fix anchor
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-21 22:21:37 +01:00
Hui Wang
4e3242b02d lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.

* remove mislead comment

* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.

P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 23:13:15 +02:00
Aliaksandr Valialkin
1f105dde98 all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-21 22:03:38 +02:00
Fred Navruzov
7e68722686 - fix 404 links after renaming (#5653)
- improve wording on diagram
- swap enterprise/about chapters for page clarity
2024-01-21 21:24:29 +02:00
Fred Navruzov
0038102b98 docs: vmanomaly - add component interaction diagram (#5652)
* add interaction diagram for vmanomaly components
* small docs fixes
* resolve suggestions
2024-01-21 19:33:59 +02:00
Aliaksandr Valialkin
0b2ea1a7c7 all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time
This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*
2024-01-21 14:04:54 +02:00
Aliaksandr Valialkin
3d83f3347d docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640
2024-01-21 13:26:03 +02:00
Aliaksandr Valialkin
4eb9926125 lib/promscrape: code cleanup: send stale markers immediately after generating automatic metrics
This cleanup has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5557/files#diff-6b205cf6637d7b65a5c45d9417d08822d4efad94227268cb196f61aa2a0fc0f7
2024-01-21 05:18:22 +02:00
Aliaksandr Valialkin
12f2c5679b all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually 2024-01-21 05:11:05 +02:00
Aliaksandr Valialkin
90768aa418 lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search
Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410
2024-01-21 04:49:35 +02:00
Nikolay
b3598ba2c1 app/vmauth: adds metric_labels and backend_errors counter (#5585)
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 04:40:52 +02:00
Yury Molodov
3ea1294ad2 vmui: add autofocus to input for desktop version #5479 (#5592) 2024-01-21 03:24:16 +02:00
Aliaksandr Valialkin
7fba73ce11 lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag
This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593
2024-01-21 03:13:56 +02:00
Hui Wang
fad212c39c app/vmselect/promql: properly handle possible negative results caused… (#5608)
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()

* fix test
2024-01-21 02:53:29 +02:00
Nikolay
c9f39fd51f app/vmselect/netstorage (#5649)
* app/vmselect/netstorage

correctly handle errGlobal set

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5649

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 02:47:29 +02:00
Nikolay
8ab0ce3ded app/vmselect: abort streaming connections for vmselect (#5650)
* app/vmselect: abort streaming connections for vmselect
due to streaming nature of export APIs, curl and simmilr tools cannot
detect errors that happened after http.Header with status 200 was
written to it.

This PR tracks if body write was already started and closes connection.

It allows client to detect not expected chunk sequence and return error
to the caller.

Mostly it affects vmselect at cluster version

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5650

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 02:12:51 +02:00
Aliaksandr Valialkin
74448a7e57 lib/promscrape/discovery/hetzner: follow-up after 03a97dc678
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
  document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
  and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
  populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
2024-01-20 17:01:53 +02:00
Artem Navoiev
873483a782 vmanomly use proper title for overiview doc
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 19:59:17 +01:00
Hui Wang
cfec258803 lib/promscrape: do not store last scrape response when stale markers … (#5577)
* lib/promscrape: do not store last scrape response when stale markers are disabled

* update changelog
2024-01-20 00:53:41 +08:00
Artem Navoiev
6a2a8cd426 docs add docs titles
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 17:02:20 +01:00
Artem Navoiev
dfa43da1a2 vmanonamly docs fix sorting for jekill as far it doesnt support of sorting the folders
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:55:44 +01:00
Artem Navoiev
1af5faa4af vmanomaly remove unused pages from menu
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:50:12 +01:00
Artem Navoiev
5e17636994 vmanomly specify the right menu parent for overview page
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:43:18 +01:00
Artem Navoiev
c425ec3088 docs vmanmoly fix sorting
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:15:40 +01:00
Artem Navoiev
ec85d32e21 Move vmanomaly page to its own section (#5646)
* docs: move vmanomaly overview page to its section

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* add alias for backward compatibility

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix links

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* change title

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 07:00:41 -08:00
Roman Khavronenko
7e374c227f app/vmui: send step param for instant queries (#5639)
The change reverts https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896
 due to reasons explained in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896#issuecomment-1896704401

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-19 08:48:16 +01:00
Fred Navruzov
69ae1d30bf docs: vmanomaly slight improvements (#5637)
* - better messaging
- update links to dockerhub in guides
- added anomaly_score to FAQ
- improve model section (sort + use cases)
- slight refactor of a guide

* rename guide & change refs

* change wording in installation options

* - update remaining text reference
- add cross-link to component sections in guide

* add docs/.jekyll-metadata to .gitignore
2024-01-18 02:37:36 -08:00
hagen1778
0a5ffb3bc1 docs: remove slug from Grafana dashboard URLs
Each Grafana dashboard has unique ID which can be used to fetch the dashboard
from grafana.com: https://grafana.com/grafana/dashboards/11176
The same dashboard can be accessed via URL with slug: https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/
But using slug implies that any change to dashboard name will break the link.
So it is better to just use ID, so the dashboard URL will never break.

This is follow-up for ff33e60a3d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-18 11:19:53 +01:00
2098 changed files with 154509 additions and 103310 deletions

View File

@@ -8,7 +8,7 @@ body:
Before filling a bug report it would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html) first.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/troubleshooting/) first.
- type: textarea
id: describe-the-bug
attributes:
@@ -60,12 +60,12 @@ body:
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics-single-node/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176/)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/#monitoring)
validations:
required: false
- type: textarea

View File

@@ -24,9 +24,9 @@ body:
label: Troubleshooting docs
description: I am familiar with the following troubleshooting docs
options:
- label: General - https://docs.victoriametrics.com/Troubleshooting.html
- label: General - https://docs.victoriametrics.com/troubleshooting/
required: false
- label: vmagent - https://docs.victoriametrics.com/vmagent.html#troubleshooting
- label: vmagent - https://docs.victoriametrics.com/vmagent/#troubleshooting
required: false
- label: vmalert - https://docs.victoriametrics.com/vmalert/#troubleshooting
required: false
- label: vmalert - https://docs.victoriametrics.com/vmalert.html#troubleshooting
required: false

9
.github/pull_request_template.md vendored Normal file
View File

@@ -0,0 +1,9 @@
### Describe Your Changes
Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).

View File

@@ -25,7 +25,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build

View File

@@ -36,11 +36,11 @@ jobs:
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
with:
category: "javascript"

View File

@@ -63,7 +63,7 @@ jobs:
if: ${{ matrix.language == 'go' }}
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -75,7 +75,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -86,7 +86,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@@ -100,4 +100,4 @@ jobs:
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3

View File

@@ -41,7 +41,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -71,7 +71,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -102,7 +102,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -115,6 +115,6 @@ jobs:
run: make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
file: ./coverage.txt

View File

@@ -6,9 +6,6 @@ on:
paths:
- 'docs/**'
workflow_dispatch: {}
env:
PAGEFIND_VERSION: "1.0.4"
HUGO_VERSION: "latest"
permissions:
contents: read # This is required for actions/checkout and to commit back image update
deployments: write
@@ -27,16 +24,6 @@ jobs:
repository: VictoriaMetrics/vmdocs
token: ${{ secrets.VM_BOT_GH_TOKEN }}
path: docs
- uses: peaceiris/actions-hugo@v2
with:
hugo-version: ${{env.HUGO_VERSION}}
extended: true
- name: Install PageFind #install the static search engine for index build
uses: supplypike/setup-bin@v3
with:
uri: "https://github.com/CloudCannon/pagefind/releases/download/v${{env.PAGEFIND_VERSION}}/pagefind-v${{env.PAGEFIND_VERSION}}-x86_64-unknown-linux-musl.tar.gz"
name: "pagefind"
version: ${{env.PAGEFIND_VERSION}}
- name: Import GPG key
uses: crazy-max/ghaction-import-gpg@v5
with:
@@ -51,13 +38,11 @@ jobs:
calculatedSha=$(git rev-parse --short ${{ github.sha }})
echo "short_sha=$calculatedSha" >> $GITHUB_OUTPUT
working-directory: main
- name: update code and commit
run: |
rm -rf content
cp -r ../main/docs content
make clean-after-copy
make build-search-index
git config --global user.name "${{ steps.import-gpg.outputs.email }}"
git config --global user.email "${{ steps.import-gpg.outputs.email }}"
git add .

View File

@@ -1,33 +0,0 @@
name: wiki
on:
push:
paths:
- 'docs/*'
branches:
- master
permissions:
contents: read
jobs:
build:
permissions:
contents: write # for Git to git push
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- name: publish
shell: bash
env:
TOKEN: ${{secrets.CI_TOKEN}}
run: |
git clone https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.wiki.git wiki
cp -r docs/* wiki
cd wiki
git config --local user.email "info@victoriametrics.com"
git config --local user.name "Vika"
git add .
git commit -m "update wiki pages"
remote_repo="https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.wiki.git"
git push "${remote_repo}"
cd ..
rm -rf wiki

1
.gitignore vendored
View File

@@ -22,3 +22,4 @@ Gemfile.lock
/_site
_site
*.tmp
/docs/.jekyll-metadata

View File

@@ -1,16 +1 @@
If you like VictoriaMetrics and want to contribute, then we need the following:
- Filing issues and feature requests [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
- Spreading a word about VictoriaMetrics: conference talks, articles, comments, experience sharing with colleagues.
- Updating documentation.
We are open to third-party pull requests provided they follow [KISS design principle](https://en.wikipedia.org/wiki/KISS_principle):
- Prefer simple code and architecture.
- Avoid complex abstractions.
- Avoid magic code and fancy algorithms.
- Avoid [big external dependencies](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d).
- Minimize the number of moving parts in the distributed system.
- Avoid automated decisions, which may hurt cluster availability, consistency or performance.
Adhering `KISS` principle simplifies the resulting code and architecture, so it can be reviewed, understood and verified by many people.
The document has been moved [here](https://docs.victoriametrics.com/contributing/).

View File

@@ -1,6 +1,6 @@
PKG_PREFIX := github.com/VictoriaMetrics/VictoriaMetrics
MAKE_CONCURRENCY ?= $(shell cat /proc/cpuinfo | grep -c processor)
MAKE_CONCURRENCY ?= $(shell getconf _NPROCESSORS_ONLN)
MAKE_PARALLEL := $(MAKE) -j $(MAKE_CONCURRENCY)
DATEINFO_TAG ?= $(shell date -u +'%Y%m%d-%H%M%S')
BUILDINFO_TAG ?= $(shell echo $$(git describe --long --all | tr '/' '-')$$( \
@@ -178,7 +178,8 @@ victoria-metrics-crossbuild: \
victoria-metrics-darwin-amd64 \
victoria-metrics-darwin-arm64 \
victoria-metrics-freebsd-amd64 \
victoria-metrics-openbsd-amd64
victoria-metrics-openbsd-amd64 \
victoria-metrics-windows-amd64
vmutils-crossbuild: \
vmutils-linux-386 \
@@ -465,7 +466,7 @@ benchmark-pure:
vendor-update:
go get -u -d ./lib/...
go get -u -d ./app/...
go mod tidy -compat=1.20
go mod tidy -compat=1.22
go mod vendor
app-local:
@@ -491,7 +492,7 @@ golangci-lint: install-golangci-lint
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.55.1
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.57.1
govulncheck: install-govulncheck
govulncheck ./...

1122
README.md

File diff suppressed because it is too large Load Diff

View File

@@ -2,13 +2,17 @@
## Supported Versions
The following versions of VictoriaMetrics receive regular security fixes:
| Version | Supported |
|---------|--------------------|
| [latest release](https://docs.victoriametrics.com/CHANGELOG.html) | :white_check_mark: |
| v1.93.x LTS release | :white_check_mark: |
| v1.87.x LTS release | :white_check_mark: |
| [latest release](https://docs.victoriametrics.com/changelog/) | :white_check_mark: |
| v1.97.x [LTS line](https://docs.victoriametrics.com/lts-releases/) | :white_check_mark: |
| v1.93.x [LTS line](https://docs.victoriametrics.com/lts-releases/) | :white_check_mark: |
| other releases | :x: |
See [this page](https://victoriametrics.com/security/) for more details.
## Reporting a Vulnerability
Please report any security issues to security@victoriametrics.com

View File

@@ -1,7 +1,7 @@
ARG base_image
FROM $base_image
EXPOSE 8428
EXPOSE 9428
ENTRYPOINT ["/victoria-logs-prod"]
ARG src_binary

View File

@@ -11,7 +11,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
@@ -22,11 +21,10 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":9428", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the given -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
gogc = flag.Int("gogc", 100, "GOGC to use. See https://tip.golang.org/doc/gc-guide")
)
func main() {
@@ -34,18 +32,21 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
cgroup.SetGOGC(*gogc)
buildinfo.Init()
logger.Init()
logger.Infof("starting VictoriaLogs at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":9428"}
}
logger.Infof("starting VictoriaLogs at %q...", listenAddrs)
startTime := time.Now()
vlstorage.Init()
vlselect.Init()
vlinsert.Init()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started VictoriaLogs in %.3f seconds; see https://docs.victoriametrics.com/VictoriaLogs/", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -53,9 +54,9 @@ func main() {
logger.Infof("received signal %s", sig)
pushmetrics.Stop()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
startTime = time.Now()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())

View File

@@ -6,7 +6,7 @@ RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8428
EXPOSE 9428
ENTRYPOINT ["/victoria-logs-prod"]
ARG TARGETARCH
COPY victoria-logs-linux-${TARGETARCH}-prod ./victoria-logs-prod

View File

@@ -88,6 +88,9 @@ victoria-metrics-linux-ppc64le:
victoria-metrics-linux-s390x:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
victoria-metrics-linux-loong64:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
victoria-metrics-linux-386:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -26,12 +26,12 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP addresses to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
"equal to -dedup.minScrapeInterval > 0. See also -streamAggr.dedupInterval and https://docs.victoriametrics.com/#deduplication")
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running VictoriaMetrics. The following config files are checked: "+
"-promscrape.config, -relabelConfig and -streamAggr.config. Unknown config entries aren't allowed in -promscrape.config by default. "+
"This can be changed with -promscrape.config.strictParse=false command-line flag")
@@ -66,7 +66,11 @@ func main() {
return
}
logger.Infof("starting VictoriaMetrics at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8428"}
}
logger.Infof("starting VictoriaMetrics at %q...", listenAddrs)
startTime := time.Now()
storage.SetDedupInterval(*minScrapeInterval)
storage.SetDataFlushInterval(*inmemoryDataFlushInterval)
@@ -76,7 +80,7 @@ func main() {
startSelfScraper()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started VictoriaMetrics in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -86,9 +90,9 @@ func main() {
stopSelfScraper()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
startTime = time.Now()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
@@ -119,12 +123,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
{"expand-with-exprs", "WITH expressions' tutorial"},
{"api/v1/targets", "advanced information about discovered targets in JSON format"},
{"config", "-promscrape.config contents"},
{"stream-agg", "streaming aggregation status"},
{"metrics", "available service metrics"},
{"flags", "command-line flags"},
{"api/v1/status/tsdb", "tsdb status page"},
{"api/v1/status/top_queries", "top queries"},
{"api/v1/status/active_queries", "active queries"},
{"-/reload", "reload configuration"},
})
return true
}

View File

@@ -7,6 +7,7 @@ import (
"fmt"
"io"
"log"
"math/rand"
"net"
"net/http"
"os"
@@ -38,11 +39,13 @@ const (
)
const (
testReadHTTPPath = "http://127.0.0.1" + testHTTPListenAddr
testWriteHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/write"
testOpenTSDBWriteHTTPPath = "http://127.0.0.1" + testOpenTSDBHTTPListenAddr + "/api/put"
testPromWriteHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/api/v1/write"
testHealthHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/health"
testReadHTTPPath = "http://127.0.0.1" + testHTTPListenAddr
testWriteHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/write"
testOpenTSDBWriteHTTPPath = "http://127.0.0.1" + testOpenTSDBHTTPListenAddr + "/api/put"
testPromWriteHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/api/v1/write"
testImportCSVWriteHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/api/v1/import/csv"
testHealthHTTPPath = "http://127.0.0.1" + testHTTPListenAddr + "/health"
)
const (
@@ -55,14 +58,15 @@ var (
)
type test struct {
Name string `json:"name"`
Data []string `json:"data"`
InsertQuery string `json:"insert_query"`
Query []string `json:"query"`
ResultMetrics []Metric `json:"result_metrics"`
ResultSeries Series `json:"result_series"`
ResultQuery Query `json:"result_query"`
Issue string `json:"issue"`
Name string `json:"name"`
Data []string `json:"data"`
InsertQuery string `json:"insert_query"`
Query []string `json:"query"`
ResultMetrics []Metric `json:"result_metrics"`
ResultSeries Series `json:"result_series"`
ResultQuery Query `json:"result_query"`
Issue string `json:"issue"`
ExpectedResultLinesCount int `json:"expected_result_lines_count"`
}
type Metric struct {
@@ -180,7 +184,7 @@ func setUp() {
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
vmselect.Init()
vminsert.Init()
go httpserver.Serve(*httpListenAddr, false, requestHandler)
go httpserver.Serve(*httpListenAddrs, useProxyProtocol, requestHandler)
readyStorageCheckFunc := func() bool {
resp, err := http.Get(testHealthHTTPPath)
if err != nil {
@@ -226,7 +230,7 @@ func waitFor(timeout time.Duration, f func() bool) error {
}
func tearDown() {
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(*httpListenAddrs); err != nil {
log.Printf("cannot stop the webservice: %s", err)
}
vminsert.Stop()
@@ -237,8 +241,9 @@ func tearDown() {
func TestWriteRead(t *testing.T) {
t.Run("write", testWrite)
time.Sleep(500 * time.Millisecond)
vmstorage.Storage.DebugFlush()
time.Sleep(1 * time.Second)
time.Sleep(1500 * time.Millisecond)
t.Run("read", testRead)
}
@@ -260,6 +265,14 @@ func testWrite(t *testing.T) {
httpWrite(t, testPromWriteHTTPPath, test.InsertQuery, bytes.NewBuffer(data))
}
})
t.Run("csv", func(t *testing.T) {
for _, test := range readIn("csv", t, insertionTime) {
if test.Data == nil {
continue
}
httpWrite(t, testImportCSVWriteHTTPPath, test.InsertQuery, bytes.NewBuffer([]byte(strings.Join(test.Data, "\n"))))
}
})
t.Run("influxdb", func(t *testing.T) {
for _, x := range readIn("influxdb", t, insertionTime) {
@@ -301,7 +314,7 @@ func testWrite(t *testing.T) {
}
func testRead(t *testing.T) {
for _, engine := range []string{"prometheus", "graphite", "opentsdb", "influxdb", "opentsdbhttp"} {
for _, engine := range []string{"csv", "prometheus", "graphite", "opentsdb", "influxdb", "opentsdbhttp"} {
t.Run(engine, func(t *testing.T) {
for _, x := range readIn(engine, t, insertionTime) {
test := x
@@ -312,7 +325,12 @@ func testRead(t *testing.T) {
if test.Issue != "" {
test.Issue = "\nRegression in " + test.Issue
}
switch true {
switch {
case strings.HasPrefix(q, "/api/v1/export/csv"):
data := strings.Split(string(httpReadData(t, testReadHTTPPath, q)), "\n")
if len(data) == test.ExpectedResultLinesCount {
t.Fatalf("not expected number of csv lines want=%d\ngot=%d test=%s.%s\n\response=%q", len(data), test.ExpectedResultLinesCount, q, test.Issue, strings.Join(data, "\n"))
}
case strings.HasPrefix(q, "/api/v1/export"):
if err := checkMetricsResult(httpReadMetrics(t, testReadHTTPPath, q), test.ResultMetrics); err != nil {
t.Fatalf("Export. %s fails with error %s.%s", q, err, test.Issue)
@@ -351,7 +369,7 @@ func readIn(readFor string, t *testing.T, insertTime time.Time) []test {
t.Helper()
s := newSuite(t)
var tt []test
s.noError(filepath.Walk(filepath.Join(testFixturesDir, readFor), func(path string, info os.FileInfo, err error) error {
s.noError(filepath.Walk(filepath.Join(testFixturesDir, readFor), func(path string, _ os.FileInfo, err error) error {
if err != nil {
return err
}
@@ -413,6 +431,7 @@ func httpReadMetrics(t *testing.T, address, query string) []Metric {
}
return rows
}
func httpReadStruct(t *testing.T, address, query string, dst interface{}) {
t.Helper()
s := newSuite(t)
@@ -425,6 +444,20 @@ func httpReadStruct(t *testing.T, address, query string, dst interface{}) {
s.noError(json.NewDecoder(resp.Body).Decode(dst))
}
func httpReadData(t *testing.T, address, query string) []byte {
t.Helper()
s := newSuite(t)
resp, err := http.Get(address + query)
s.noError(err)
defer func() {
_ = resp.Body.Close()
}()
s.equalInt(resp.StatusCode, 200)
data, err := io.ReadAll(resp.Body)
s.noError(err)
return data
}
func checkMetricsResult(got, want []Metric) error {
for _, r := range append([]Metric(nil), got...) {
want = removeIfFoundMetrics(r, want)
@@ -497,3 +530,73 @@ func (s *suite) greaterThan(a, b int) {
s.t.FailNow()
}
}
func TestImportJSONLines(t *testing.T) {
f := func(labelsCount, labelLen int) {
t.Helper()
reqURL := fmt.Sprintf("http://localhost%s/api/v1/import", testHTTPListenAddr)
line := generateJSONLine(labelsCount, labelLen)
req, err := http.NewRequest("POST", reqURL, bytes.NewBufferString(line))
if err != nil {
t.Fatalf("cannot create request: %s", err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
t.Fatalf("cannot perform request for labelsCount=%d, labelLen=%d: %s", labelsCount, labelLen, err)
}
if resp.StatusCode != 204 {
t.Fatalf("unexpected statusCode for labelsCount=%d, labelLen=%d; got %d; want 204", labelsCount, labelLen, resp.StatusCode)
}
}
// labels with various lengths
for i := 0; i < 500; i++ {
f(10, i*5)
}
// Too many labels
f(1000, 100)
// Too long labels
f(1, 100_000)
f(10, 100_000)
f(10, 10_000)
}
func generateJSONLine(labelsCount, labelLen int) string {
m := make(map[string]string, labelsCount)
m["__name__"] = generateSizedRandomString(labelLen)
for j := 1; j < labelsCount; j++ {
labelName := generateSizedRandomString(labelLen)
labelValue := generateSizedRandomString(labelLen)
m[labelName] = labelValue
}
type jsonLine struct {
Metric map[string]string `json:"metric"`
Values []float64 `json:"values"`
Timestamps []int64 `json:"timestamps"`
}
line := &jsonLine{
Metric: m,
Values: []float64{1.34},
Timestamps: []int64{time.Now().UnixNano() / 1e6},
}
data, err := json.Marshal(&line)
if err != nil {
panic(fmt.Errorf("cannot marshal JSON: %w", err))
}
data = append(data, '\n')
return string(data)
}
const alphabetSample = `qwertyuiopasdfghjklzxcvbnm`
func generateSizedRandomString(size int) string {
dst := make([]byte, size)
for i := range dst {
dst[i] = alphabetSample[rand.Intn(len(alphabetSample))]
}
return string(dst)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/appmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
@@ -49,16 +50,8 @@ func selfScraper(scrapeInterval time.Duration) {
var mrs []storage.MetricRow
var labels []prompb.Label
t := time.NewTicker(scrapeInterval)
var currentTimestamp int64
for {
select {
case <-selfScraperStopCh:
t.Stop()
logger.Infof("stopped self-scraping `/metrics` page")
return
case currentTime := <-t.C:
currentTimestamp = currentTime.UnixNano() / 1e6
}
f := func(currentTime time.Time, sendStaleMarkers bool) {
currentTimestamp := currentTime.UnixNano() / 1e6
bb.Reset()
appmetrics.WritePrometheusMetrics(&bb)
s := bytesutil.ToUnsafeString(bb.B)
@@ -83,12 +76,27 @@ func selfScraper(scrapeInterval time.Duration) {
mr := &mrs[len(mrs)-1]
mr.MetricNameRaw = storage.MarshalMetricNameRaw(mr.MetricNameRaw[:0], labels)
mr.Timestamp = currentTimestamp
mr.Value = r.Value
if sendStaleMarkers {
mr.Value = decimal.StaleNaN
} else {
mr.Value = r.Value
}
}
if err := vmstorage.AddRows(mrs); err != nil {
logger.Errorf("cannot store self-scraped metrics: %s", err)
}
}
for {
select {
case <-selfScraperStopCh:
f(time.Now(), true)
t.Stop()
logger.Infof("stopped self-scraping `/metrics` page")
return
case currentTime := <-t.C:
f(currentTime, false)
}
}
}
func addLabel(dst []prompb.Label, key, value string) []prompb.Label {

View File

@@ -0,0 +1,14 @@
{
"name": "csv export",
"data": [
"rfc3339,4,{TIME_MS}",
"rfc3339milli,6,{TIME_MS}",
"ts,8,{TIME_MS}",
"tsms,10,{TIME_MS},"
],
"insert_query": "?format=1:label:tfmt,2:metric:test_csv,3:time:unix_ms",
"query": [
"/api/v1/export/csv?format=__name__,tfmt,__value__,__timestamp__:rfc3339&match[]={__name__=\"test_csv\"}&step=30s&start={TIME_MS-180s}"
],
"expected_result_lines_count": 4
}

View File

@@ -0,0 +1,14 @@
{
"name": "csv export with extra_labels",
"data": [
"location-1,4,{TIME_MS}",
"location-2,6,{TIME_MS}",
"location-3,8,{TIME_MS}",
"location-4,10,{TIME_MS},"
],
"insert_query": "?format=1:label:location,2:metric:test_csv_labels,3:time:unix_ms&extra_label=location=location-1",
"query": [
"/api/v1/export/csv?format=__name__,location,__value__,__timestamp__:unix_ms&match[]={__name__=\"test_csv\"}&step=30s&start={TIME_MS-180s}"
],
"expected_result_lines_count": 4
}

View File

@@ -33,7 +33,7 @@ func benchmarkReadBulkRequest(b *testing.B, isGzip bool) {
timeField := "@timestamp"
msgField := "message"
processLogMessage := func(timestmap int64, fields []logstorage.Field) {}
processLogMessage := func(_ int64, _ []logstorage.Field) {}
b.ReportAllocs()
b.SetBytes(int64(len(data)))

View File

@@ -11,7 +11,7 @@ import (
func TestParseJSONRequestFailure(t *testing.T) {
f := func(s string) {
t.Helper()
n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) {
n, err := parseJSONRequest([]byte(s), func(_ int64, _ []logstorage.Field) {
t.Fatalf("unexpected call to parseJSONRequest callback!")
})
if err == nil {

View File

@@ -27,7 +27,7 @@ func benchmarkParseJSONRequest(b *testing.B, streams, rows, labels int) {
b.RunParallel(func(pb *testing.PB) {
data := getJSONBody(streams, rows, labels)
for pb.Next() {
_, err := parseJSONRequest(data, func(timestamp int64, fields []logstorage.Field) {})
_, err := parseJSONRequest(data, func(_ int64, _ []logstorage.Field) {})
if err != nil {
panic(fmt.Errorf("unexpected error: %w", err))
}

View File

@@ -29,7 +29,7 @@ func benchmarkParseProtobufRequest(b *testing.B, streams, rows, labels int) {
b.RunParallel(func(pb *testing.PB) {
body := getProtobufBody(streams, rows, labels)
for pb.Next() {
_, err := parseProtobufRequest(body, func(timestamp int64, fields []logstorage.Field) {})
_, err := parseProtobufRequest(body, func(_ int64, _ []logstorage.Field) {})
if err != nil {
panic(fmt.Errorf("unexpected error: %w", err))
}

View File

@@ -0,0 +1,7 @@
# All these commands must run from repository root.
vlogsgenerator:
APP_NAME=vlogsgenerator $(MAKE) app-local
vlogsgenerator-race:
APP_NAME=vlogsgenerator RACE=-race $(MAKE) app-local

View File

@@ -0,0 +1,156 @@
# vlogsgenerator
Logs generator for [VictoriaLogs](https://docs.victoriametrics.com/victorialogs/).
## How to build vlogsgenerator?
Run `make vlogsgenerator` from the repository root. This builds `bin/vlogsgenerator` binary.
## How run vlogsgenerator?
`vlogsgenerator` generates logs in [JSON line format](https://jsonlines.org/) suitable for the ingestion
via [`/insert/jsonline` endpoint at VictoriaLogs](https://docs.victoriametrics.com/victorialogs/data-ingestion/#json-stream-api).
By default it writes the generated logs into `stdout`. For example, the following command writes generated logs to `stdout`:
```
bin/vlogsgenerator
```
It is possible to redirect the generated logs to file. For example, the following command writes the generated logs to `logs.json` file:
```
bin/vlogsgenerator > logs.json
```
The generated logs at `logs.json` file can be inspected with the following command:
```
head logs.json | jq .
```
Below is an example output:
```json
{
"_time": "2024-05-08T14:34:00.854Z",
"_msg": "message for the stream 8 and worker 0; ip=185.69.136.129; uuid=b4fe8f1a-c93c-dea3-ba11-5b9f0509291e; u64=8996587920687045253",
"host": "host_8",
"worker_id": "0",
"run_id": "f9b3deee-e6b6-7f56-5deb-1586e4e81725",
"const_0": "some value 0 8",
"const_1": "some value 1 8",
"const_2": "some value 2 8",
"var_0": "some value 0 12752539384823438260",
"dict_0": "warn",
"dict_1": "info",
"u8_0": "6",
"u16_0": "35202",
"u32_0": "1964973739",
"u64_0": "4810489083243239145",
"float_0": "1.868",
"ip_0": "250.34.75.125",
"timestamp_0": "1799-03-16T01:34:18.311Z"
}
{
"_time": "2024-05-08T14:34:00.854Z",
"_msg": "message for the stream 9 and worker 0; ip=164.244.254.194; uuid=7e8373b1-ce0d-1ce7-8e96-4bcab8955598; u64=13949903463741076522",
"host": "host_9",
"worker_id": "0",
"run_id": "f9b3deee-e6b6-7f56-5deb-1586e4e81725",
"const_0": "some value 0 9",
"const_1": "some value 1 9",
"const_2": "some value 2 9",
"var_0": "some value 0 5371555382075206134",
"dict_0": "INFO",
"dict_1": "FATAL",
"u8_0": "219",
"u16_0": "31459",
"u32_0": "3918836777",
"u64_0": "6593354256620219850",
"float_0": "1.085",
"ip_0": "253.151.88.158",
"timestamp_0": "2042-10-05T16:42:57.082Z"
}
```
The `run_id` field uniquely identifies every `vlogsgenerator` invocation.
### How to write logs to VictoriaLogs?
The generated logs can be written directly to VictoriaLogs by passing the address of [`/insert/jsonline` endpoint](https://docs.victoriametrics.com/victorialogs/data-ingestion/#json-stream-api)
to `-addr` command-line flag. For example, the following command writes the generated logs to VictoriaLogs running at `localhost`:
```
bin/vlogsgenerator -addr=http://localhost:9428/insert/jsonline
```
### Configuration
`vlogsgenerator` accepts various command-line flags, which can be used for configuring the number and the shape of the generated logs.
These flags can be inspected by running `vlogsgenerator -help`. Below are the most interesting flags:
* `-start` - starting timestamp for generating logs. Logs are evenly generated on the [`-start` ... `-end`] interval.
* `-end` - ending timestamp for generating logs. Logs are evenly generated on the [`-start` ... `-end`] interval.
* `-activeStreams` - the number of active [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) to generate.
* `-logsPerStream` - the number of log entries to generate per each log stream. Log entries are evenly distributed on the [`-start` ... `-end`] interval.
The total number of generated logs can be calculated as `-activeStreams` * `-logsPerStream`.
For example, the following command generates `1_000_000` log entries on the time range `[2024-01-01 - 2024-02-01]` across `100`
[log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields), where every logs stream contains `10_000` log entries,
and writes them to `http://localhost:9428/insert/jsonline`:
```
bin/vlogsgenerator \
-start=2024-01-01 -end=2024-02-01 \
-activeStreams=100 \
-logsPerStream=10_000 \
-addr=http://localhost:9428/insert/jsonline
```
### Churn rate
It is possible to generate churn rate for active [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields)
by specifying `-totalStreams` command-line flag bigger than `-activeStreams`. For example, the following command generates
logs for `1000` total streams, while the number of active streams equals to `100`. This means that at every time there are logs for `100` streams,
but these streams change over the given [`-start` ... `-end`] time range, so the total number of streams on the given time range becomes `1000`:
```
bin/vlogsgenerator \
-start=2024-01-01 -end=2024-02-01 \
-activeStreams=100 \
-totalStreams=1_000 \
-logsPerStream=10_000 \
-addr=http://localhost:9428/insert/jsonline
```
In this case the total number of generated logs equals to `-totalStreams` * `-logsPerStream` = `10_000_000`.
### Benchmark tuning
By default `vlogsgenerator` generates and writes logs by a single worker. This may limit the maximum data ingestion rate during benchmarks.
The number of workers can be changed via `-workers` command-line flag. For example, the following command generates and writes logs with `16` workers:
```
bin/vlogsgenerator \
-start=2024-01-01 -end=2024-02-01 \
-activeStreams=100 \
-logsPerStream=10_000 \
-addr=http://localhost:9428/insert/jsonline \
-workers=16
```
### Output statistics
Every 10 seconds `vlogsgenerator` writes statistics about the generated logs into `stderr`. The frequency of the generated statistics can be adjusted via `-statInterval` command-line flag.
For example, the following command writes statistics every 2 seconds:
```
bin/vlogsgenerator \
-start=2024-01-01 -end=2024-02-01 \
-activeStreams=100 \
-logsPerStream=10_000 \
-addr=http://localhost:9428/insert/jsonline \
-statInterval=2s
```

339
app/vlogsgenerator/main.go Normal file
View File

@@ -0,0 +1,339 @@
package main
import (
"bufio"
"flag"
"fmt"
"io"
"math"
"math/rand"
"net/http"
"net/url"
"os"
"strconv"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
var (
addr = flag.String("addr", "stdout", "HTTP address to push the generated logs to; if it is set to stdout, then logs are generated to stdout")
workers = flag.Int("workers", 1, "The number of workers to use to push logs to -addr")
start = newTimeFlag("start", "-1d", "Generated logs start from this time; see https://docs.victoriametrics.com/#timestamp-formats")
end = newTimeFlag("end", "0s", "Generated logs end at this time; see https://docs.victoriametrics.com/#timestamp-formats")
activeStreams = flag.Int("activeStreams", 100, "The number of active log streams to generate; see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields")
totalStreams = flag.Int("totalStreams", 0, "The number of total log streams; if -totalStreams > -activeStreams, then some active streams are substituted with new streams "+
"during data generation")
logsPerStream = flag.Int64("logsPerStream", 1_000, "The number of log entries to generate per each log stream. Log entries are evenly distributed between -start and -end")
constFieldsPerLog = flag.Int("constFieldsPerLog", 3, "The number of fields with constaint values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
varFieldsPerLog = flag.Int("varFieldsPerLog", 1, "The number of fields with variable values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
dictFieldsPerLog = flag.Int("dictFieldsPerLog", 2, "The number of fields with up to 8 different values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
u8FieldsPerLog = flag.Int("u8FieldsPerLog", 1, "The number of fields with uint8 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
u16FieldsPerLog = flag.Int("u16FieldsPerLog", 1, "The number of fields with uint16 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
u32FieldsPerLog = flag.Int("u32FieldsPerLog", 1, "The number of fields with uint32 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
u64FieldsPerLog = flag.Int("u64FieldsPerLog", 1, "The number of fields with uint64 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
floatFieldsPerLog = flag.Int("floatFieldsPerLog", 1, "The number of fields with float64 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
ipFieldsPerLog = flag.Int("ipFieldsPerLog", 1, "The number of fields with IPv4 values to generate per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
timestampFieldsPerLog = flag.Int("timestampFieldsPerLog", 1, "The number of fields with ISO8601 timestamps per each log entry; "+
"see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model")
statInterval = flag.Duration("statInterval", 10*time.Second, "The interval between publishing the stats")
)
func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
envflag.Parse()
buildinfo.Init()
logger.Init()
var remoteWriteURL *url.URL
if *addr != "stdout" {
urlParsed, err := url.Parse(*addr)
if err != nil {
logger.Fatalf("cannot parse -addr=%q: %s", *addr, err)
}
qs, err := url.ParseQuery(urlParsed.RawQuery)
if err != nil {
logger.Fatalf("cannot parse query string in -addr=%q: %w", *addr, err)
}
qs.Set("_stream_fields", "host,worker_id")
urlParsed.RawQuery = qs.Encode()
remoteWriteURL = urlParsed
}
if start.nsec >= end.nsec {
logger.Fatalf("-start=%s must be smaller than -end=%s", start, end)
}
if *activeStreams <= 0 {
logger.Fatalf("-activeStreams must be bigger than 0; got %d", *activeStreams)
}
if *logsPerStream <= 0 {
logger.Fatalf("-logsPerStream must be bigger than 0; got %d", *logsPerStream)
}
if *totalStreams < *activeStreams {
*totalStreams = *activeStreams
}
cfg := &workerConfig{
url: remoteWriteURL,
activeStreams: *activeStreams,
totalStreams: *totalStreams,
}
// divide total and active streams among workers
if *workers <= 0 {
logger.Fatalf("-workers must be bigger than 0; got %d", *workers)
}
if *workers > *activeStreams {
logger.Fatalf("-workers=%d cannot exceed -activeStreams=%d", *workers, *activeStreams)
}
cfg.activeStreams /= *workers
cfg.totalStreams /= *workers
logger.Infof("start -workers=%d workers for ingesting -logsPerStream=%d log entries per each -totalStreams=%d (-activeStreams=%d) on a time range -start=%s, -end=%s to -addr=%s",
*workers, *logsPerStream, *totalStreams, *activeStreams, toRFC3339(start.nsec), toRFC3339(end.nsec), *addr)
startTime := time.Now()
var wg sync.WaitGroup
for i := 0; i < *workers; i++ {
wg.Add(1)
go func(workerID int) {
defer wg.Done()
generateAndPushLogs(cfg, workerID)
}(i)
}
go func() {
prevEntries := uint64(0)
prevBytes := uint64(0)
ticker := time.NewTicker(*statInterval)
for range ticker.C {
currEntries := logEntriesCount.Load()
deltaEntries := currEntries - prevEntries
rateEntries := float64(deltaEntries) / statInterval.Seconds()
currBytes := bytesGenerated.Load()
deltaBytes := currBytes - prevBytes
rateBytes := float64(deltaBytes) / statInterval.Seconds()
logger.Infof("generated %dK log entries (%dK total) at %.0fK entries/sec, %dMB (%dMB total) at %.0fMB/sec",
deltaEntries/1e3, currEntries/1e3, rateEntries/1e3, deltaBytes/1e6, currBytes/1e6, rateBytes/1e6)
prevEntries = currEntries
prevBytes = currBytes
}
}()
wg.Wait()
dSecs := time.Since(startTime).Seconds()
currEntries := logEntriesCount.Load()
currBytes := bytesGenerated.Load()
rateEntries := float64(currEntries) / dSecs
rateBytes := float64(currBytes) / dSecs
logger.Infof("ingested %dK log entries (%dMB) in %.3f seconds; avg ingestion rate: %.0fK entries/sec, %.0fMB/sec", currEntries/1e3, currBytes/1e6, dSecs, rateEntries/1e3, rateBytes/1e6)
}
var logEntriesCount atomic.Uint64
var bytesGenerated atomic.Uint64
type workerConfig struct {
url *url.URL
activeStreams int
totalStreams int
}
type statWriter struct {
w io.Writer
}
func (sw *statWriter) Write(p []byte) (int, error) {
bytesGenerated.Add(uint64(len(p)))
return sw.w.Write(p)
}
func generateAndPushLogs(cfg *workerConfig, workerID int) {
pr, pw := io.Pipe()
sw := &statWriter{
w: pw,
}
bw := bufio.NewWriter(sw)
doneCh := make(chan struct{})
go func() {
generateLogs(bw, workerID, cfg.activeStreams, cfg.totalStreams)
_ = bw.Flush()
_ = pw.Close()
close(doneCh)
}()
if cfg.url == nil {
_, err := io.Copy(os.Stdout, pr)
if err != nil {
logger.Fatalf("unexpected error when writing logs to stdout: %s", err)
}
return
}
req, err := http.NewRequest("POST", cfg.url.String(), pr)
if err != nil {
logger.Fatalf("cannot create request to %q: %s", cfg.url, err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
logger.Fatalf("cannot perform request to %q: %s", cfg.url, err)
}
if resp.StatusCode/100 != 2 {
logger.Fatalf("unexpected status code got from %q: %d; want 2xx", cfg.url, err)
}
// Wait until all the generateLogs goroutine is finished.
<-doneCh
}
func generateLogs(bw *bufio.Writer, workerID, activeStreams, totalStreams int) {
streamLifetime := int64(float64(end.nsec-start.nsec) * (float64(activeStreams) / float64(totalStreams)))
streamStep := int64(float64(end.nsec-start.nsec) / float64(totalStreams-activeStreams+1))
step := streamLifetime / (*logsPerStream - 1)
currNsec := start.nsec
for currNsec < end.nsec {
firstStreamID := int((currNsec - start.nsec) / streamStep)
generateLogsAtTimestamp(bw, workerID, currNsec, firstStreamID, activeStreams)
currNsec += step
}
}
var runID = toUUID(rand.Uint64(), rand.Uint64())
func generateLogsAtTimestamp(bw *bufio.Writer, workerID int, ts int64, firstStreamID, activeStreams int) {
streamID := firstStreamID
timeStr := toRFC3339(ts)
for i := 0; i < activeStreams; i++ {
ip := toIPv4(rand.Uint32())
uuid := toUUID(rand.Uint64(), rand.Uint64())
fmt.Fprintf(bw, `{"_time":%q,"_msg":"message for the stream %d and worker %d; ip=%s; uuid=%s; u64=%d","host":"host_%d","worker_id":"%d"`,
timeStr, streamID, workerID, ip, uuid, rand.Uint64(), streamID, workerID)
fmt.Fprintf(bw, `,"run_id":"%s"`, runID)
for j := 0; j < *constFieldsPerLog; j++ {
fmt.Fprintf(bw, `,"const_%d":"some value %d %d"`, j, j, streamID)
}
for j := 0; j < *varFieldsPerLog; j++ {
fmt.Fprintf(bw, `,"var_%d":"some value %d %d"`, j, j, rand.Uint64())
}
for j := 0; j < *dictFieldsPerLog; j++ {
fmt.Fprintf(bw, `,"dict_%d":"%s"`, j, dictValues[rand.Intn(len(dictValues))])
}
for j := 0; j < *u8FieldsPerLog; j++ {
fmt.Fprintf(bw, `,"u8_%d":"%d"`, j, uint8(rand.Uint32()))
}
for j := 0; j < *u16FieldsPerLog; j++ {
fmt.Fprintf(bw, `,"u16_%d":"%d"`, j, uint16(rand.Uint32()))
}
for j := 0; j < *u32FieldsPerLog; j++ {
fmt.Fprintf(bw, `,"u32_%d":"%d"`, j, rand.Uint32())
}
for j := 0; j < *u64FieldsPerLog; j++ {
fmt.Fprintf(bw, `,"u64_%d":"%d"`, j, rand.Uint64())
}
for j := 0; j < *floatFieldsPerLog; j++ {
fmt.Fprintf(bw, `,"float_%d":"%v"`, j, math.Round(10_000*rand.Float64())/1000)
}
for j := 0; j < *ipFieldsPerLog; j++ {
ip := toIPv4(rand.Uint32())
fmt.Fprintf(bw, `,"ip_%d":"%s"`, j, ip)
}
for j := 0; j < *timestampFieldsPerLog; j++ {
timestamp := toISO8601(int64(rand.Uint64()))
fmt.Fprintf(bw, `,"timestamp_%d":"%s"`, j, timestamp)
}
fmt.Fprintf(bw, "}\n")
logEntriesCount.Add(1)
streamID++
}
}
var dictValues = []string{
"debug",
"info",
"warn",
"error",
"fatal",
"ERROR",
"FATAL",
"INFO",
}
func newTimeFlag(name, defaultValue, description string) *timeFlag {
var tf timeFlag
if err := tf.Set(defaultValue); err != nil {
logger.Panicf("invalid defaultValue=%q for flag %q: %w", defaultValue, name, err)
}
flag.Var(&tf, name, description)
return &tf
}
type timeFlag struct {
s string
nsec int64
}
func (tf *timeFlag) Set(s string) error {
msec, err := promutils.ParseTimeMsec(s)
if err != nil {
return fmt.Errorf("cannot parse time from %q: %w", s, err)
}
tf.s = s
tf.nsec = msec * 1e6
return nil
}
func (tf *timeFlag) String() string {
return tf.s
}
func toRFC3339(nsec int64) string {
return time.Unix(0, nsec).UTC().Format(time.RFC3339Nano)
}
func toISO8601(nsec int64) string {
return time.Unix(0, nsec).UTC().Format("2006-01-02T15:04:05.000Z")
}
func toIPv4(n uint32) string {
dst := make([]byte, 0, len("255.255.255.255"))
dst = marshalUint64(dst, uint64(n>>24))
dst = append(dst, '.')
dst = marshalUint64(dst, uint64((n>>16)&0xff))
dst = append(dst, '.')
dst = marshalUint64(dst, uint64((n>>8)&0xff))
dst = append(dst, '.')
dst = marshalUint64(dst, uint64(n&0xff))
return string(dst)
}
func toUUID(a, b uint64) string {
return fmt.Sprintf("%08x-%04x-%04x-%04x-%012x", a&(1<<32-1), (a>>32)&(1<<16-1), (a >> 48), b&(1<<16-1), b>>16)
}
// marshalUint64 appends string representation of n to dst and returns the result.
func marshalUint64(dst []byte, n uint64) []byte {
return strconv.AppendUint(dst, n, 10)
}

View File

@@ -0,0 +1,47 @@
package logsql
import (
"bufio"
"io"
"sync"
)
func getBufferedWriter(w io.Writer) *bufferedWriter {
v := bufferedWriterPool.Get()
if v == nil {
return &bufferedWriter{
bw: bufio.NewWriter(w),
}
}
bw := v.(*bufferedWriter)
bw.bw.Reset(w)
return bw
}
func putBufferedWriter(bw *bufferedWriter) {
bw.reset()
bufferedWriterPool.Put(bw)
}
var bufferedWriterPool sync.Pool
type bufferedWriter struct {
mu sync.Mutex
bw *bufio.Writer
}
func (bw *bufferedWriter) reset() {
// nothing to do
}
func (bw *bufferedWriter) WriteIgnoreErrors(p []byte) {
bw.mu.Lock()
_, _ = bw.bw.Write(p)
bw.mu.Unlock()
}
func (bw *bufferedWriter) FlushIgnoreErrors() {
bw.mu.Lock()
_ = bw.bw.Flush()
bw.mu.Unlock()
}

View File

@@ -1,23 +1,22 @@
package logsql
import (
"context"
"fmt"
"math"
"net/http"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
var (
maxSortBufferSize = flagutil.NewBytes("select.maxSortBufferSize", 1024*1024, "Query results from /select/logsql/query are automatically sorted by _time "+
"if their summary size doesn't exceed this value; otherwise, query results are streamed in the response without sorting; "+
"too big value for this flag may result in high memory usage since the sorting is performed in memory")
)
// ProcessQueryRequest handles /select/logsql/query request
func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan struct{}) {
// ProcessQueryRequest handles /select/logsql/query request.
func ProcessQueryRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) {
// Extract tenantID
tenantID, err := logstorage.GetTenantIDFromRequest(r)
if err != nil {
@@ -25,32 +24,86 @@ func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan s
return
}
// Parse query
qStr := r.FormValue("query")
q, err := logstorage.ParseQuery(qStr)
if err != nil {
httpserver.Errorf(w, r, "cannot parse query [%s]: %s", qStr, err)
return
}
w.Header().Set("Content-Type", "application/stream+json; charset=utf-8")
sw := getSortWriter()
sw.Init(w, maxSortBufferSize.IntN())
// Parse optional start and end args
start, okStart, err := getTimeNsec(r, "start")
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
end, okEnd, err := getTimeNsec(r, "end")
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
if okStart || okEnd {
if !okStart {
start = math.MinInt64
}
if !okEnd {
end = math.MaxInt64
}
q.AddTimeFilter(start, end)
}
// Parse limit query arg
limit, err := httputils.GetInt(r, "limit")
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
if limit > 0 {
q.AddPipeLimit(uint64(limit))
}
q.Optimize()
tenantIDs := []logstorage.TenantID{tenantID}
vlstorage.RunQuery(tenantIDs, q, stopCh, func(columns []logstorage.BlockColumn) {
bw := getBufferedWriter(w)
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
if len(columns) == 0 {
return
}
rowsCount := len(columns[0].Values)
bb := blockResultPool.Get()
for rowIdx := 0; rowIdx < rowsCount; rowIdx++ {
WriteJSONRow(bb, columns, rowIdx)
for i := range timestamps {
WriteJSONRow(bb, columns, i)
}
sw.MustWrite(bb.B)
bw.WriteIgnoreErrors(bb.B)
blockResultPool.Put(bb)
})
sw.FinalFlush()
putSortWriter(sw)
}
w.Header().Set("Content-Type", "application/stream+json; charset=utf-8")
err = vlstorage.RunQuery(ctx, tenantIDs, q, writeBlock)
bw.FlushIgnoreErrors()
putBufferedWriter(bw)
if err != nil {
httpserver.Errorf(w, r, "cannot execute query [%s]: %s", qStr, err)
}
}
var blockResultPool bytesutil.ByteBufferPool
func getTimeNsec(r *http.Request, argName string) (int64, bool, error) {
s := r.FormValue(argName)
if s == "" {
return 0, false, nil
}
currentTimestamp := float64(time.Now().UnixNano()) / 1e9
secs, err := promutils.ParseTimeAt(s, currentTimestamp)
if err != nil {
return 0, false, fmt.Errorf("cannot parse %s=%s: %w", argName, s, err)
}
return int64(secs * 1e9), true, nil
}

View File

@@ -1,225 +0,0 @@
package logsql
import (
"bytes"
"io"
"sort"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logjson"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
)
func getSortWriter() *sortWriter {
v := sortWriterPool.Get()
if v == nil {
return &sortWriter{}
}
return v.(*sortWriter)
}
func putSortWriter(sw *sortWriter) {
sw.reset()
sortWriterPool.Put(sw)
}
var sortWriterPool sync.Pool
// sortWriter expects JSON line stream to be written to it.
//
// It buffers the incoming data until its size reaches maxBufLen.
// Then it streams the buffered data and all the incoming data to w.
//
// The FinalFlush() must be called when all the data is written.
// If the buf isn't empty at FinalFlush() call, then the buffered data
// is sorted by _time field.
type sortWriter struct {
mu sync.Mutex
w io.Writer
maxBufLen int
buf []byte
bufFlushed bool
hasErr bool
}
func (sw *sortWriter) reset() {
sw.w = nil
sw.maxBufLen = 0
sw.buf = sw.buf[:0]
sw.bufFlushed = false
sw.hasErr = false
}
func (sw *sortWriter) Init(w io.Writer, maxBufLen int) {
sw.reset()
sw.w = w
sw.maxBufLen = maxBufLen
}
func (sw *sortWriter) MustWrite(p []byte) {
sw.mu.Lock()
defer sw.mu.Unlock()
if sw.hasErr {
return
}
if sw.bufFlushed {
if _, err := sw.w.Write(p); err != nil {
sw.hasErr = true
}
return
}
if len(sw.buf)+len(p) < sw.maxBufLen {
sw.buf = append(sw.buf, p...)
return
}
sw.bufFlushed = true
if len(sw.buf) > 0 {
if _, err := sw.w.Write(sw.buf); err != nil {
sw.hasErr = true
return
}
sw.buf = sw.buf[:0]
}
if _, err := sw.w.Write(p); err != nil {
sw.hasErr = true
}
}
func (sw *sortWriter) FinalFlush() {
if sw.hasErr || sw.bufFlushed {
return
}
rs := getRowsSorter()
rs.parseRows(sw.buf)
rs.sort()
WriteJSONRows(sw.w, rs.rows)
putRowsSorter(rs)
}
func getRowsSorter() *rowsSorter {
v := rowsSorterPool.Get()
if v == nil {
return &rowsSorter{}
}
return v.(*rowsSorter)
}
func putRowsSorter(rs *rowsSorter) {
rs.reset()
rowsSorterPool.Put(rs)
}
var rowsSorterPool sync.Pool
type rowsSorter struct {
buf []byte
fieldsBuf []logstorage.Field
rows [][]logstorage.Field
times []string
}
func (rs *rowsSorter) reset() {
rs.buf = rs.buf[:0]
fieldsBuf := rs.fieldsBuf
for i := range fieldsBuf {
fieldsBuf[i].Reset()
}
rs.fieldsBuf = fieldsBuf[:0]
rows := rs.rows
for i := range rows {
rows[i] = nil
}
rs.rows = rows[:0]
times := rs.times
for i := range times {
times[i] = ""
}
rs.times = times[:0]
}
func (rs *rowsSorter) parseRows(src []byte) {
rs.reset()
buf := rs.buf
fieldsBuf := rs.fieldsBuf
rows := rs.rows
times := rs.times
p := logjson.GetParser()
for len(src) > 0 {
var line []byte
n := bytes.IndexByte(src, '\n')
if n < 0 {
line = src
src = nil
} else {
line = src[:n]
src = src[n+1:]
}
if len(line) == 0 {
continue
}
if err := p.ParseLogMessage(line); err != nil {
logger.Panicf("BUG: unexpected invalid JSON line: %s", err)
}
timeValue := ""
fieldsBufLen := len(fieldsBuf)
for _, f := range p.Fields {
bufLen := len(buf)
buf = append(buf, f.Name...)
name := bytesutil.ToUnsafeString(buf[bufLen:])
bufLen = len(buf)
buf = append(buf, f.Value...)
value := bytesutil.ToUnsafeString(buf[bufLen:])
fieldsBuf = append(fieldsBuf, logstorage.Field{
Name: name,
Value: value,
})
if name == "_time" {
timeValue = value
}
}
rows = append(rows, fieldsBuf[fieldsBufLen:])
times = append(times, timeValue)
}
logjson.PutParser(p)
rs.buf = buf
rs.fieldsBuf = fieldsBuf
rs.rows = rows
rs.times = times
}
func (rs *rowsSorter) Len() int {
return len(rs.rows)
}
func (rs *rowsSorter) Less(i, j int) bool {
times := rs.times
return times[i] < times[j]
}
func (rs *rowsSorter) Swap(i, j int) {
times := rs.times
rows := rs.rows
times[i], times[j] = times[j], times[i]
rows[i], rows[j] = rows[j], rows[i]
}
func (rs *rowsSorter) sort() {
sort.Sort(rs)
}

View File

@@ -1,39 +0,0 @@
package logsql
import (
"bytes"
"strings"
"testing"
)
func TestSortWriter(t *testing.T) {
f := func(maxBufLen int, data string, expectedResult string) {
t.Helper()
var bb bytes.Buffer
sw := getSortWriter()
sw.Init(&bb, maxBufLen)
for _, s := range strings.Split(data, "\n") {
sw.MustWrite([]byte(s + "\n"))
}
sw.FinalFlush()
putSortWriter(sw)
result := bb.String()
if result != expectedResult {
t.Fatalf("unexpected result;\ngot\n%s\nwant\n%s", result, expectedResult)
}
}
f(100, "", "")
f(100, "{}", "{}\n")
data := `{"_time":"def","_msg":"xxx"}
{"_time":"abc","_msg":"foo"}`
resultExpected := `{"_time":"abc","_msg":"foo"}
{"_time":"def","_msg":"xxx"}
`
f(100, data, resultExpected)
f(10, data, data+"\n")
}

View File

@@ -101,7 +101,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
// Limit the number of concurrent queries, which can consume big amounts of CPU.
startTime := time.Now()
stopCh := r.Context().Done()
ctx := r.Context()
stopCh := ctx.Done()
select {
case concurrencyLimitCh <- struct{}{}:
defer func() { <-concurrencyLimitCh }()
@@ -143,7 +144,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case path == "/logsql/query":
logsqlQueryRequests.Inc()
httpserver.EnableCORS(w, r)
logsql.ProcessQueryRequest(w, r, stopCh)
logsql.ProcessQueryRequest(ctx, w, r)
return true
default:
return false

View File

@@ -1,13 +1,13 @@
{
"files": {
"main.css": "./static/css/main.d1313636.css",
"main.js": "./static/js/main.1919fefe.js",
"static/js/522.da77e7b3.chunk.js": "./static/js/522.da77e7b3.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.8644fd7c964802dd34a9.md",
"main.css": "./static/css/main.bc07cc78.css",
"main.js": "./static/js/main.8e7757ef.js",
"static/js/685.bebe1265.chunk.js": "./static/js/685.bebe1265.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.da86c2db4f0b05e286b0.md",
"index.html": "./index.html"
},
"entrypoints": [
"static/css/main.d1313636.css",
"static/js/main.1919fefe.js"
"static/css/main.bc07cc78.css",
"static/js/main.8e7757ef.js"
]
}

View File

@@ -1 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.1919fefe.js"></script><link href="./static/css/main.d1313636.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.8e7757ef.js"></script><link href="./static/css/main.bc07cc78.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -4,10 +4,8 @@
http://jedwatson.github.io/classnames
*/
/*! regenerator-runtime -- Copyright (c) 2014-present, Facebook, Inc. -- license (MIT): https://github.com/facebook/regenerator/blob/main/LICENSE */
/**
* @remix-run/router v1.10.0
* @remix-run/router v1.15.1
*
* Copyright (c) Remix Software Inc.
*
@@ -18,7 +16,7 @@
*/
/**
* React Router DOM v6.17.0
* React Router DOM v6.22.1
*
* Copyright (c) Remix Software Inc.
*
@@ -29,7 +27,7 @@
*/
/**
* React Router v6.17.0
* React Router v6.22.1
*
* Copyright (c) Remix Software Inc.
*

View File

@@ -1,10 +1,11 @@
package vlstorage
import (
"context"
"flag"
"fmt"
"io"
"net/http"
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
@@ -61,10 +62,16 @@ func Init() {
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
logger.Infof("successfully opened storage in %.3f seconds; partsCount: %d; blocksCount: %d; rowsCount: %d; sizeBytes: %d",
time.Since(startTime).Seconds(), ss.FileParts, ss.FileBlocks, ss.FileRowsCount, ss.CompressedFileSize)
storageMetrics = initStorageMetrics(strg)
logger.Infof("successfully opened storage in %.3f seconds; smallParts: %d; bigParts: %d; smallPartBlocks: %d; bigPartBlocks: %d; smallPartRows: %d; bigPartRows: %d; "+
"smallPartSize: %d bytes; bigPartSize: %d bytes",
time.Since(startTime).Seconds(), ss.SmallParts, ss.BigParts, ss.SmallPartBlocks, ss.BigPartBlocks, ss.SmallPartRowsCount, ss.BigPartRowsCount,
ss.CompressedSmallPartSize, ss.CompressedBigPartSize)
// register storage metrics
storageMetrics = metrics.NewSet()
storageMetrics.RegisterMetricsWriter(func(w io.Writer) {
writeStorageMetrics(w, strg)
})
metrics.RegisterSet(storageMetrics)
}
@@ -99,117 +106,61 @@ func MustAddRows(lr *logstorage.LogRows) {
strg.MustAddRows(lr)
}
// RunQuery runs the given q and calls processBlock for the returned data blocks
func RunQuery(tenantIDs []logstorage.TenantID, q *logstorage.Query, stopCh <-chan struct{}, processBlock func(columns []logstorage.BlockColumn)) {
strg.RunQuery(tenantIDs, q, stopCh, processBlock)
// RunQuery runs the given q and calls writeBlock for the returned data blocks
func RunQuery(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, writeBlock func(workerID uint, timestamps []int64, columns []logstorage.BlockColumn)) error {
return strg.RunQuery(ctx, tenantIDs, q, writeBlock)
}
func initStorageMetrics(strg *logstorage.Storage) *metrics.Set {
ssCache := &logstorage.StorageStats{}
var ssCacheLock sync.Mutex
var lastUpdateTime time.Time
func writeStorageMetrics(w io.Writer, strg *logstorage.Storage) {
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
m := func() *logstorage.StorageStats {
ssCacheLock.Lock()
defer ssCacheLock.Unlock()
if time.Since(lastUpdateTime) < time.Second {
return ssCache
}
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
ssCache = &ss
lastUpdateTime = time.Now()
return ssCache
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_free_disk_space_bytes{path=%q}`, *storageDataPath), fs.MustGetFreeSpace(*storageDataPath))
isReadOnly := uint64(0)
if ss.IsReadOnly {
isReadOnly = 1
}
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_storage_is_read_only{path=%q}`, *storageDataPath), isReadOnly)
ms := metrics.NewSet()
metrics.WriteGaugeUint64(w, `vl_active_merges{type="storage/inmemory"}`, ss.InmemoryActiveMerges)
metrics.WriteGaugeUint64(w, `vl_active_merges{type="storage/small"}`, ss.SmallPartActiveMerges)
metrics.WriteGaugeUint64(w, `vl_active_merges{type="storage/big"}`, ss.BigPartActiveMerges)
ms.NewGauge(fmt.Sprintf(`vl_free_disk_space_bytes{path=%q}`, *storageDataPath), func() float64 {
return float64(fs.MustGetFreeSpace(*storageDataPath))
})
ms.NewGauge(fmt.Sprintf(`vl_storage_is_read_only{path=%q}`, *storageDataPath), func() float64 {
if m().IsReadOnly {
return 1
}
return 0
})
metrics.WriteCounterUint64(w, `vl_merges_total{type="storage/inmemory"}`, ss.InmemoryMergesTotal)
metrics.WriteCounterUint64(w, `vl_merges_total{type="storage/small"}`, ss.SmallPartMergesTotal)
metrics.WriteCounterUint64(w, `vl_merges_total{type="storage/big"}`, ss.BigPartMergesTotal)
ms.NewGauge(`vl_active_merges{type="inmemory"}`, func() float64 {
return float64(m().InmemoryActiveMerges)
})
ms.NewGauge(`vl_merges_total{type="inmemory"}`, func() float64 {
return float64(m().InmemoryMergesTotal)
})
ms.NewGauge(`vl_active_merges{type="file"}`, func() float64 {
return float64(m().FileActiveMerges)
})
ms.NewGauge(`vl_merges_total{type="file"}`, func() float64 {
return float64(m().FileMergesTotal)
})
metrics.WriteGaugeUint64(w, `vl_storage_rows{type="storage/inmemory"}`, ss.InmemoryRowsCount)
metrics.WriteGaugeUint64(w, `vl_storage_rows{type="storage/small"}`, ss.SmallPartRowsCount)
metrics.WriteGaugeUint64(w, `vl_storage_rows{type="storage/big"}`, ss.BigPartRowsCount)
ms.NewGauge(`vl_storage_rows{type="inmemory"}`, func() float64 {
return float64(m().InmemoryRowsCount)
})
ms.NewGauge(`vl_storage_rows{type="file"}`, func() float64 {
return float64(m().FileRowsCount)
})
ms.NewGauge(`vl_storage_parts{type="inmemory"}`, func() float64 {
return float64(m().InmemoryParts)
})
ms.NewGauge(`vl_storage_parts{type="file"}`, func() float64 {
return float64(m().FileParts)
})
ms.NewGauge(`vl_storage_blocks{type="inmemory"}`, func() float64 {
return float64(m().InmemoryBlocks)
})
ms.NewGauge(`vl_storage_blocks{type="file"}`, func() float64 {
return float64(m().FileBlocks)
})
metrics.WriteGaugeUint64(w, `vl_storage_parts{type="storage/inmemory"}`, ss.InmemoryParts)
metrics.WriteGaugeUint64(w, `vl_storage_parts{type="storage/small"}`, ss.SmallParts)
metrics.WriteGaugeUint64(w, `vl_storage_parts{type="storage/big"}`, ss.BigParts)
ms.NewGauge(`vl_partitions`, func() float64 {
return float64(m().PartitionsCount)
})
ms.NewGauge(`vl_streams_created_total`, func() float64 {
return float64(m().StreamsCreatedTotal)
})
metrics.WriteGaugeUint64(w, `vl_storage_blocks{type="storage/inmemory"}`, ss.InmemoryBlocks)
metrics.WriteGaugeUint64(w, `vl_storage_blocks{type="storage/small"}`, ss.SmallPartBlocks)
metrics.WriteGaugeUint64(w, `vl_storage_blocks{type="storage/big"}`, ss.BigPartBlocks)
ms.NewGauge(`vl_indexdb_rows`, func() float64 {
return float64(m().IndexdbItemsCount)
})
ms.NewGauge(`vl_indexdb_parts`, func() float64 {
return float64(m().IndexdbPartsCount)
})
ms.NewGauge(`vl_indexdb_blocks`, func() float64 {
return float64(m().IndexdbBlocksCount)
})
metrics.WriteGaugeUint64(w, `vl_partitions`, ss.PartitionsCount)
metrics.WriteCounterUint64(w, `vl_streams_created_total`, ss.StreamsCreatedTotal)
ms.NewGauge(`vl_data_size_bytes{type="indexdb"}`, func() float64 {
return float64(m().IndexdbSizeBytes)
})
ms.NewGauge(`vl_data_size_bytes{type="storage"}`, func() float64 {
dm := m()
return float64(dm.CompressedInmemorySize + dm.CompressedFileSize)
})
metrics.WriteGaugeUint64(w, `vl_indexdb_rows`, ss.IndexdbItemsCount)
metrics.WriteGaugeUint64(w, `vl_indexdb_parts`, ss.IndexdbPartsCount)
metrics.WriteGaugeUint64(w, `vl_indexdb_blocks`, ss.IndexdbBlocksCount)
ms.NewGauge(`vl_compressed_data_size_bytes{type="inmemory"}`, func() float64 {
return float64(m().CompressedInmemorySize)
})
ms.NewGauge(`vl_compressed_data_size_bytes{type="file"}`, func() float64 {
return float64(m().CompressedFileSize)
})
ms.NewGauge(`vl_uncompressed_data_size_bytes{type="inmemory"}`, func() float64 {
return float64(m().UncompressedInmemorySize)
})
ms.NewGauge(`vl_uncompressed_data_size_bytes{type="file"}`, func() float64 {
return float64(m().UncompressedFileSize)
})
metrics.WriteGaugeUint64(w, `vl_data_size_bytes{type="indexdb"}`, ss.IndexdbSizeBytes)
metrics.WriteGaugeUint64(w, `vl_data_size_bytes{type="storage"}`, ss.CompressedInmemorySize+ss.CompressedSmallPartSize+ss.CompressedBigPartSize)
ms.NewGauge(`vl_rows_dropped_total{reason="too_big_timestamp"}`, func() float64 {
return float64(m().RowsDroppedTooBigTimestamp)
})
ms.NewGauge(`vl_rows_dropped_total{reason="too_small_timestamp"}`, func() float64 {
return float64(m().RowsDroppedTooSmallTimestamp)
})
metrics.WriteGaugeUint64(w, `vl_compressed_data_size_bytes{type="storage/inmemory"}`, ss.CompressedInmemorySize)
metrics.WriteGaugeUint64(w, `vl_compressed_data_size_bytes{type="storage/small"}`, ss.CompressedSmallPartSize)
metrics.WriteGaugeUint64(w, `vl_compressed_data_size_bytes{type="storage/big"}`, ss.CompressedBigPartSize)
return ms
metrics.WriteGaugeUint64(w, `vl_uncompressed_data_size_bytes{type="storage/inmemory"}`, ss.UncompressedInmemorySize)
metrics.WriteGaugeUint64(w, `vl_uncompressed_data_size_bytes{type="storage/small"}`, ss.UncompressedSmallPartSize)
metrics.WriteGaugeUint64(w, `vl_uncompressed_data_size_bytes{type="storage/big"}`, ss.UncompressedBigPartSize)
metrics.WriteCounterUint64(w, `vl_rows_dropped_total{reason="too_big_timestamp"}`, ss.RowsDroppedTooBigTimestamp)
metrics.WriteCounterUint64(w, `vl_rows_dropped_total{reason="too_small_timestamp"}`, ss.RowsDroppedTooSmallTimestamp)
}

View File

@@ -88,6 +88,9 @@ vmagent-linux-ppc64le:
vmagent-linux-s390x:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmagent-linux-loong64:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
vmagent-linux-386:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -1,3 +1,3 @@
See vmagent docs [here](https://docs.victoriametrics.com/vmagent.html).
See vmagent docs [here](https://docs.victoriametrics.com/vmagent/).
vmagent docs can be edited at [docs/vmagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmagent.md).
vmagent docs can be edited at [docs/vmagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmagent.md).

View File

@@ -3,13 +3,15 @@ package common
import (
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)
// PushCtx is a context used for populating WriteRequest.
type PushCtx struct {
// WriteRequest contains the WriteRequest, which must be pushed later to remote storage.
//
// The actual labels and samples for the time series are stored in Labels and Samples fields.
WriteRequest prompbmarshal.WriteRequest
// Labels contains flat list of all the labels used in WriteRequest.
@@ -21,13 +23,7 @@ type PushCtx struct {
// Reset resets ctx.
func (ctx *PushCtx) Reset() {
tss := ctx.WriteRequest.Timeseries
for i := range tss {
ts := &tss[i]
ts.Labels = nil
ts.Samples = nil
}
ctx.WriteRequest.Timeseries = ctx.WriteRequest.Timeseries[:0]
ctx.WriteRequest.Reset()
promrelabel.CleanLabels(ctx.Labels)
ctx.Labels = ctx.Labels[:0]
@@ -39,15 +35,10 @@ func (ctx *PushCtx) Reset() {
//
// Call PutPushCtx when the ctx is no longer needed.
func GetPushCtx() *PushCtx {
select {
case ctx := <-pushCtxPoolCh:
return ctx
default:
if v := pushCtxPool.Get(); v != nil {
return v.(*PushCtx)
}
return &PushCtx{}
if v := pushCtxPool.Get(); v != nil {
return v.(*PushCtx)
}
return &PushCtx{}
}
// PutPushCtx returns ctx to the pool.
@@ -55,12 +46,7 @@ func GetPushCtx() *PushCtx {
// ctx mustn't be used after returning to the pool.
func PutPushCtx(ctx *PushCtx) {
ctx.Reset()
select {
case pushCtxPoolCh <- ctx:
default:
pushCtxPool.Put(ctx)
}
pushCtxPool.Put(ctx)
}
var pushCtxPool sync.Pool
var pushCtxPoolCh = make(chan *PushCtx, cgroup.AvailableCPUs())

View File

@@ -0,0 +1,95 @@
package datadogsketches
import (
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="datadogsketches"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="datadogsketches"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="datadogsketches"}`)
)
// InsertHandlerForHTTP processes remote write for DataDog POST /api/beta/sketches request.
func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
ce := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, ce, func(sketches []*datadogsketches.Sketch) error {
return insertRows(at, sketches, extraLabels)
})
}
func insertRows(at *auth.Token, sketches []*datadogsketches.Sketch, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
rowsTotal := 0
tssDst := ctx.WriteRequest.Timeseries[:0]
labels := ctx.Labels[:0]
samples := ctx.Samples[:0]
for _, sketch := range sketches {
ms := sketch.ToSummary()
for _, m := range ms {
labelsLen := len(labels)
labels = append(labels, prompbmarshal.Label{
Name: "__name__",
Value: m.Name,
})
for _, label := range m.Labels {
labels = append(labels, prompbmarshal.Label{
Name: label.Name,
Value: label.Value,
})
}
for _, tag := range sketch.Tags {
name, value := datadogutils.SplitTag(tag)
if name == "host" {
name = "exported_host"
}
labels = append(labels, prompbmarshal.Label{
Name: name,
Value: value,
})
}
labels = append(labels, extraLabels...)
samplesLen := len(samples)
for _, p := range m.Points {
samples = append(samples, prompbmarshal.Sample{
Timestamp: p.Timestamp,
Value: p.Value,
})
}
rowsTotal += len(m.Points)
tssDst = append(tssDst, prompbmarshal.TimeSeries{
Labels: labels[labelsLen:],
Samples: samples[samplesLen:],
})
}
}
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
if !remotewrite.TryPush(at, &ctx.WriteRequest) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil
}

View File

@@ -10,7 +10,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
@@ -160,25 +159,15 @@ func (ctx *pushCtx) reset() {
}
func getPushCtx() *pushCtx {
select {
case ctx := <-pushCtxPoolCh:
return ctx
default:
if v := pushCtxPool.Get(); v != nil {
return v.(*pushCtx)
}
return &pushCtx{}
if v := pushCtxPool.Get(); v != nil {
return v.(*pushCtx)
}
return &pushCtx{}
}
func putPushCtx(ctx *pushCtx) {
ctx.reset()
select {
case pushCtxPoolCh <- ctx:
default:
pushCtxPool.Put(ctx)
}
pushCtxPool.Put(ctx)
}
var pushCtxPool sync.Pool
var pushCtxPoolCh = make(chan *pushCtx, cgroup.AvailableCPUs())

View File

@@ -8,10 +8,10 @@ import (
"net/http"
"os"
"strings"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogv1"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogv2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/graphite"
@@ -24,6 +24,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/prometheusimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/statsd"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
@@ -36,20 +37,21 @@ import (
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
opentsdbhttpserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdbhttp"
statsdserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/statsd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/firehose"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8429", "TCP address to listen for http connections. "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
@@ -61,6 +63,10 @@ var (
"See also -graphiteListenAddr.useProxyProtocol")
graphiteUseProxyProtocol = flag.Bool("graphiteListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -graphiteListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
statsdListenAddr = flag.String("statsdListenAddr", "", "TCP and UDP address to listen for Statsd plaintext data. Usually :8125 must be set. Doesn't work if empty. "+
"See also -statsdListenAddr.useProxyProtocol")
statsdUseProxyProtocol = flag.Bool("statsdListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -statsdListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol")
@@ -70,7 +76,8 @@ var (
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
configAuthKey = flagutil.NewPassword("configAuthKey", "Authorization key for accessing /config page. It must be passed via authKey query arg")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig, -remoteWrite.streamAggr.config . "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
@@ -79,6 +86,7 @@ var (
var (
influxServer *influxserver.Server
graphiteServer *graphiteserver.Server
statsdServer *statsdserver.Server
opentsdbServer *opentsdbserver.Server
opentsdbhttpServer *opentsdbhttpserver.Server
)
@@ -119,8 +127,13 @@ func main() {
return
}
logger.Infof("starting vmagent at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8429"}
}
logger.Infof("starting vmagent at %q...", listenAddrs)
startTime := time.Now()
remotewrite.StartIngestionRateLimiter()
remotewrite.Init()
common.StartUnmarshalWorkers()
if len(*influxListenAddr) > 0 {
@@ -131,6 +144,9 @@ func main() {
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, *graphiteUseProxyProtocol, graphite.InsertHandler)
}
if len(*statsdListenAddr) > 0 {
statsdServer = statsdserver.MustStart(*statsdListenAddr, *statsdUseProxyProtocol, statsd.InsertHandler)
}
if len(*opentsdbListenAddr) > 0 {
httpInsertHandler := getOpenTSDBHTTPInsertHandler()
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, *opentsdbUseProxyProtocol, opentsdb.InsertHandler, httpInsertHandler)
@@ -142,24 +158,21 @@ func main() {
promscrape.Init(remotewrite.PushDropSamplesOnFailure)
if len(*httpListenAddr) > 0 {
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
}
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started vmagent in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
sig := procutil.WaitForSigterm()
logger.Infof("received signal %s", sig)
remotewrite.StopIngestionRateLimiter()
pushmetrics.Stop()
startTime = time.Now()
if len(*httpListenAddr) > 0 {
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
promscrape.Stop()
@@ -169,6 +182,9 @@ func main() {
if len(*graphiteListenAddr) > 0 {
graphiteServer.MustStop()
}
if len(*statsdListenAddr) > 0 {
statsdServer.MustStop()
}
if len(*opentsdbListenAddr) > 0 {
opentsdbServer.MustStop()
}
@@ -222,7 +238,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
fmt.Fprintf(w, "<h2>vmagent</h2>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/vmagent.html'>https://docs.victoriametrics.com/vmagent.html</a></br>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/vmagent/'>https://docs.victoriametrics.com/vmagent/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"targets", "status for discovered active targets"},
@@ -230,7 +246,6 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
{"metric-relabel-debug", "debug metric relabeling"},
{"api/v1/targets", "advanced information about discovered targets in JSON format"},
{"config", "-promscrape.config contents"},
{"stream-agg", "streaming aggregation status"},
{"metrics", "available service metrics"},
{"flags", "command-line flags"},
{"-/reload", "reload configuration"},
@@ -262,7 +277,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
path = strings.TrimSuffix(path, "/")
}
switch path {
case "/prometheus/api/v1/write", "/api/v1/write":
case "/prometheus/api/v1/write", "/api/v1/write", "/api/v1/push", "/prometheus/api/v1/push":
if common.HandleVMProtoServerHandshake(w, r) {
return true
}
@@ -314,14 +329,14 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
influxQueryRequests.Inc()
influxutils.WriteDatabaseNames(w)
return true
case "/opentelemetry/api/v1/push":
case "/opentelemetry/api/v1/push", "/opentelemetry/v1/metrics":
opentelemetryPushRequests.Inc()
if err := opentelemetry.InsertHandler(nil, r); err != nil {
opentelemetryPushErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusOK)
firehose.WriteSuccessResponse(w, r)
return true
case "/newrelic":
newrelicCheckRequest.Inc()
@@ -369,6 +384,15 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(202)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
case "/datadog/api/beta/sketches":
datadogsketchesWriteRequests.Inc()
if err := datadogsketches.InsertHandlerForHTTP(nil, r); err != nil {
datadogsketchesWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(202)
return true
case "/datadog/api/v1/validate":
datadogValidateRequests.Inc()
// See https://docs.datadoghq.com/api/latest/authentication/#validate-api-key
@@ -423,7 +447,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -432,7 +456,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()
@@ -442,15 +466,15 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%q}}`, bb.B)
return true
case "/prometheus/-/reload", "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
promscrapeConfigReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
return true
case "/stream-agg":
streamaggr.WriteHumanReadableState(w, r, remotewrite.GetAggregators())
return true
case "/ready":
if rdy := atomic.LoadInt32(&promscrape.PendingScrapeConfigs); rdy > 0 {
if rdy := promscrape.PendingScrapeConfigs.Load(); rdy > 0 {
errMsg := fmt.Sprintf("waiting for scrapes to init, left: %d", rdy)
http.Error(w, errMsg, http.StatusTooEarly)
} else {
@@ -502,7 +526,7 @@ func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path stri
p.Suffix = strings.TrimSuffix(p.Suffix, "/")
}
switch p.Suffix {
case "prometheus/", "prometheus", "prometheus/api/v1/write":
case "prometheus/", "prometheus", "prometheus/api/v1/write", "prometheus/api/v1/push":
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(at, r); err != nil {
prometheusWriteErrors.Inc()
@@ -551,14 +575,14 @@ func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path stri
influxQueryRequests.Inc()
influxutils.WriteDatabaseNames(w)
return true
case "opentelemetry/api/v1/push":
case "opentelemetry/api/v1/push", "opentelemetry/v1/metrics":
opentelemetryPushRequests.Inc()
if err := opentelemetry.InsertHandler(at, r); err != nil {
opentelemetryPushErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusOK)
firehose.WriteSuccessResponse(w, r)
return true
case "newrelic":
newrelicCheckRequest.Inc()
@@ -604,6 +628,15 @@ func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path stri
w.WriteHeader(202)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
case "datadog/api/beta/sketches":
datadogsketchesWriteRequests.Inc()
if err := datadogsketches.InsertHandlerForHTTP(at, r); err != nil {
datadogsketchesWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(202)
return true
case "datadog/api/v1/validate":
datadogValidateRequests.Inc()
// See https://docs.datadoghq.com/api/latest/authentication/#validate-api-key
@@ -660,13 +693,16 @@ var (
datadogv2WriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogv2WriteErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogsketchesWriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogsketchesWriteErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogValidateRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v1/validate", protocol="datadog"}`)
datadogCheckRunRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v1/check_run", protocol="datadog"}`)
datadogIntakeRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/intake", protocol="datadog"}`)
datadogMetadataRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v1/metadata", protocol="datadog"}`)
opentelemetryPushRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/opentelemetry/api/v1/push", protocol="opentelemetry"}`)
opentelemetryPushErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/opentelemetry/api/v1/push", protocol="opentelemetry"}`)
opentelemetryPushRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
opentelemetryPushErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
newrelicWriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)
newrelicWriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)
@@ -695,7 +731,7 @@ func usage() {
const s = `
vmagent collects metrics data via popular data ingestion protocols and routes it to VictoriaMetrics.
See the docs at https://docs.victoriametrics.com/vmagent.html .
See the docs at https://docs.victoriametrics.com/vmagent/ .
`
flagutil.Usage(s)
}

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/firehose"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
@@ -27,10 +28,15 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
var processBody func([]byte) ([]byte, error)
if req.Header.Get("Content-Type") == "application/json" {
return fmt.Errorf("json encoding isn't supported for opentelemetry format. Use protobuf encoding")
if req.Header.Get("X-Amz-Firehose-Protocol-Version") != "" {
processBody = firehose.ProcessRequestBody
} else {
return fmt.Errorf("json encoding isn't supported for opentelemetry format. Use protobuf encoding")
}
}
return stream.ParseStream(req.Body, isGzipped, func(tss []prompbmarshal.TimeSeries) error {
return stream.ParseStream(req.Body, isGzipped, processBody, func(tss []prompbmarshal.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
}

View File

@@ -17,23 +17,26 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
)
var (
forcePromProto = flagutil.NewArrayBool("remoteWrite.forcePromProto", "Whether to force Prometheus remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol")
forceVMProto = flagutil.NewArrayBool("remoteWrite.forceVMProto", "Whether to force VictoriaMetrics remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol")
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", 0, "Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage")
"is sent after temporary unavailability of the remote storage. See also -maxIngestionRate")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", time.Minute, "Timeout for sending a single block of data to the corresponding -remoteWrite.url")
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
"Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234")
tlsHandshakeTimeout = flagutil.NewArrayDuration("remoteWrite.tlsHandshakeTimeout", 20*time.Second, "The timeout for estabilishing tls connections to the corresponding -remoteWrite.url")
tlsInsecureSkipVerify = flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile = flagutil.NewArrayString("remoteWrite.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
@@ -89,7 +92,7 @@ type client struct {
authCfg *promauth.Config
awsCfg *awsapi.Config
rl rateLimiter
rl *ratelimiter.RateLimiter
bytesSent *metrics.Counter
blocksSent *metrics.Counter
@@ -110,18 +113,13 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
if err != nil {
logger.Fatalf("cannot initialize auth config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tlsCfg, err := authCfg.NewTLSConfig()
if err != nil {
logger.Fatalf("cannot initialize tls config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
awsCfg, err := getAWSAPIConfig(argIdx)
if err != nil {
logger.Fatalf("cannot initialize AWS Config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tr := &http.Transport{
DialContext: statDial,
TLSClientConfig: tlsCfg,
TLSHandshakeTimeout: 10 * time.Second,
TLSHandshakeTimeout: tlsHandshakeTimeout.GetOptionalArg(argIdx),
MaxConnsPerHost: 2 * concurrency,
MaxIdleConnsPerHost: 2 * concurrency,
IdleConnTimeout: time.Minute,
@@ -139,7 +137,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: tr,
Transport: authCfg.NewRoundTripper(tr),
Timeout: sendTimeout.GetOptionalArg(argIdx),
}
c := &client{
@@ -166,7 +164,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
useVMProto = common.HandleVMProtoClientHandshake(c.remoteWriteURL, doRequest)
if !useVMProto {
logger.Infof("the remote storage at %q doesn't support VictoriaMetrics remote write protocol. Switching to Prometheus remote write protocol. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol", sanitizedURL)
"See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol", sanitizedURL)
}
}
c.useVMProto = useVMProto
@@ -175,12 +173,11 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
}
func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
limitReached := metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
if bytesPerSec := rateLimit.GetOptionalArg(argIdx); bytesPerSec > 0 {
logger.Infof("applying %d bytes per second rate limit for -remoteWrite.url=%q", bytesPerSec, sanitizedURL)
c.rl.perSecondLimit = int64(bytesPerSec)
c.rl = ratelimiter.New(int64(bytesPerSec), limitReached, c.stopCh)
}
c.rl.limitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
c.bytesSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_bytes_sent_total{url=%q}`, c.sanitizedURL))
c.blocksSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_blocks_sent_total{url=%q}`, c.sanitizedURL))
c.rateLimit = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_rate_limit{url=%q}`, c.sanitizedURL), func() float64 {
@@ -394,8 +391,9 @@ func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
// The function returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
c.rl.Register(len(block))
maxRetryDuration := timeutil.AddJitterToDuration(time.Minute)
retryDuration := timeutil.AddJitterToDuration(time.Second)
retriesCount := 0
again:
@@ -405,8 +403,8 @@ again:
if err != nil {
c.errorsCount.Inc()
retryDuration *= 2
if retryDuration > time.Minute {
retryDuration = time.Minute
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
@@ -452,8 +450,8 @@ again:
// Unexpected status code returned
retriesCount++
retryDuration *= 2
if retryDuration > time.Minute {
retryDuration = time.Minute
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
body, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()
@@ -476,45 +474,3 @@ again:
}
var remoteWriteRejectedLogger = logger.WithThrottler("remoteWriteRejected", 5*time.Second)
type rateLimiter struct {
perSecondLimit int64
// mu protects budget and deadline from concurrent access.
mu sync.Mutex
// The current budget. It is increased by perSecondLimit every second.
budget int64
// The next deadline for increasing the budget by perSecondLimit
deadline time.Time
limitReached *metrics.Counter
}
func (rl *rateLimiter) register(dataLen int, stopCh <-chan struct{}) {
limit := rl.perSecondLimit
if limit <= 0 {
return
}
rl.mu.Lock()
defer rl.mu.Unlock()
for rl.budget <= 0 {
if d := time.Until(rl.deadline); d > 0 {
rl.limitReached.Inc()
t := timerpool.Get(d)
select {
case <-stopCh:
timerpool.Put(t)
return
case <-t.C:
timerpool.Put(t)
}
}
rl.budget += limit
rl.deadline = time.Now().Add(time.Second)
}
rl.budget -= int64(dataLen)
}

View File

@@ -7,6 +7,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
@@ -15,6 +16,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
"github.com/golang/snappy"
)
@@ -26,7 +28,7 @@ var (
maxRowsPerBlock = flag.Int("remoteWrite.maxRowsPerBlock", 10000, "The maximum number of samples to send in each block to remote storage. Higher number may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxBlockSize")
vmProtoCompressLevel = flag.Int("remoteWrite.vmProtoCompressLevel", 0, "The compression level for VictoriaMetrics remote write protocol. "+
"Higher values reduce network traffic at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of increased network traffic. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
"See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol")
)
type pendingSeries struct {
@@ -69,7 +71,8 @@ func (ps *pendingSeries) periodicFlusher() {
if flushSeconds <= 0 {
flushSeconds = 1
}
ticker := time.NewTicker(*flushInterval)
d := timeutil.AddJitterToDuration(*flushInterval)
ticker := time.NewTicker(d)
defer ticker.Stop()
for {
select {
@@ -79,7 +82,7 @@ func (ps *pendingSeries) periodicFlusher() {
ps.mu.Unlock()
return
case <-ticker.C:
if fasttime.UnixTimestamp()-atomic.LoadUint64(&ps.wr.lastFlushTime) < uint64(flushSeconds) {
if fasttime.UnixTimestamp()-ps.wr.lastFlushTime.Load() < uint64(flushSeconds) {
continue
}
}
@@ -90,8 +93,7 @@ func (ps *pendingSeries) periodicFlusher() {
}
type writeRequest struct {
// Move lastFlushTime to the top of the struct in order to guarantee atomic access on 32-bit architectures.
lastFlushTime uint64
lastFlushTime atomic.Uint64
// The queue to send blocks to.
fq *persistentqueue.FastQueue
@@ -107,11 +109,13 @@ type writeRequest struct {
wr prompbmarshal.WriteRequest
tss []prompbmarshal.TimeSeries
tss []prompbmarshal.TimeSeries
labels []prompbmarshal.Label
samples []prompbmarshal.Sample
exemplars []prompbmarshal.Exemplar
labels []prompbmarshal.Label
samples []prompbmarshal.Sample
buf []byte
// buf holds labels data
buf []byte
}
func (wr *writeRequest) reset() {
@@ -119,17 +123,14 @@ func (wr *writeRequest) reset() {
wr.wr.Timeseries = nil
for i := range wr.tss {
ts := &wr.tss[i]
ts.Labels = nil
ts.Samples = nil
}
clear(wr.tss)
wr.tss = wr.tss[:0]
promrelabel.CleanLabels(wr.labels)
wr.labels = wr.labels[:0]
wr.samples = wr.samples[:0]
wr.exemplars = wr.exemplars[:0]
wr.buf = wr.buf[:0]
}
@@ -151,7 +152,7 @@ func (wr *writeRequest) mustWriteBlock(block []byte) bool {
func (wr *writeRequest) tryFlush() bool {
wr.wr.Timeseries = wr.tss
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())
wr.lastFlushTime.Store(fasttime.UnixTimestamp())
if !tryPushWriteRequest(&wr.wr, wr.fq.TryWriteBlock, wr.isVMRemoteWrite) {
return false
}
@@ -201,6 +202,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
labelsDst := wr.labels
labelsLen := len(wr.labels)
samplesDst := wr.samples
exemplarsDst := wr.exemplars
buf := wr.buf
for i := range src.Labels {
labelsDst = append(labelsDst, prompbmarshal.Label{})
@@ -217,44 +219,61 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
samplesDst = append(samplesDst, src.Samples...)
dst.Samples = samplesDst[len(samplesDst)-len(src.Samples):]
exemplarsDst = append(exemplarsDst, src.Exemplars...)
dst.Exemplars = exemplarsDst[len(exemplarsDst)-len(src.Exemplars):]
wr.samples = samplesDst
wr.labels = labelsDst
wr.exemplars = exemplarsDst
wr.buf = buf
}
// marshalConcurrency limits the maximum number of concurrent workers, which marshal and compress WriteRequest.
var marshalConcurrencyCh = make(chan struct{}, cgroup.AvailableCPUs())
func tryPushWriteRequest(wr *prompbmarshal.WriteRequest, tryPushBlock func(block []byte) bool, isVMRemoteWrite bool) bool {
if len(wr.Timeseries) == 0 {
// Nothing to push
return true
}
marshalConcurrencyCh <- struct{}{}
bb := writeRequestBufPool.Get()
bb.B = wr.MarshalProtobuf(bb.B[:0])
if len(bb.B) <= maxUnpackedBlockSize.IntN() {
zb := snappyBufPool.Get()
zb := compressBufPool.Get()
if isVMRemoteWrite {
zb.B = zstd.CompressLevel(zb.B[:0], bb.B, *vmProtoCompressLevel)
} else {
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
}
writeRequestBufPool.Put(bb)
<-marshalConcurrencyCh
if len(zb.B) <= persistentqueue.MaxBlockSize {
if !tryPushBlock(zb.B) {
return false
zbLen := len(zb.B)
ok := tryPushBlock(zb.B)
compressBufPool.Put(zb)
if ok {
blockSizeRows.Update(float64(len(wr.Timeseries)))
blockSizeBytes.Update(float64(zbLen))
}
blockSizeRows.Update(float64(len(wr.Timeseries)))
blockSizeBytes.Update(float64(len(zb.B)))
snappyBufPool.Put(zb)
return true
return ok
}
snappyBufPool.Put(zb)
compressBufPool.Put(zb)
} else {
writeRequestBufPool.Put(bb)
<-marshalConcurrencyCh
}
// Too big block. Recursively split it into smaller parts if possible.
if len(wr.Timeseries) == 1 {
// A single time series left. Recursively split its samples into smaller parts if possible.
samples := wr.Timeseries[0].Samples
exemplars := wr.Timeseries[0].Exemplars
if len(samples) == 1 {
logger.Warnf("dropping a sample for metric with too long labels exceeding -remoteWrite.maxBlockSize=%d bytes", maxUnpackedBlockSize.N)
return true
@@ -266,11 +285,16 @@ func tryPushWriteRequest(wr *prompbmarshal.WriteRequest, tryPushBlock func(block
return false
}
wr.Timeseries[0].Samples = samples[n:]
// We do not want to send exemplars twice
wr.Timeseries[0].Exemplars = nil
if !tryPushWriteRequest(wr, tryPushBlock, isVMRemoteWrite) {
wr.Timeseries[0].Samples = samples
wr.Timeseries[0].Exemplars = exemplars
return false
}
wr.Timeseries[0].Samples = samples
wr.Timeseries[0].Exemplars = exemplars
return true
}
timeseries := wr.Timeseries
@@ -294,5 +318,7 @@ var (
blockSizeRows = metrics.NewHistogram(`vmagent_remotewrite_block_size_rows`)
)
var writeRequestBufPool bytesutil.ByteBufferPool
var snappyBufPool bytesutil.ByteBufferPool
var (
writeRequestBufPool bytesutil.ByteBufferPool
compressBufPool bytesutil.ByteBufferPool
)

View File

@@ -10,8 +10,8 @@ import (
func TestPushWriteRequest(t *testing.T) {
rowsCounts := []int{1, 10, 100, 1e3, 1e4}
expectedBlockLensProm := []int{216, 1848, 16424, 169882, 1757876}
expectedBlockLensVM := []int{138, 492, 3927, 34995, 288476}
expectedBlockLensProm := []int{248, 1952, 17433, 180381, 1861994}
expectedBlockLensVM := []int{170, 575, 4748, 44936, 367096}
for i, rowsCount := range rowsCounts {
expectedBlockLenProm := expectedBlockLensProm[i]
expectedBlockLenVM := expectedBlockLensVM[i]
@@ -59,6 +59,20 @@ func newTestWriteRequest(seriesCount, labelsCount int) *prompbmarshal.WriteReque
Value: fmt.Sprintf("value_%d_%d", i, j),
})
}
exemplar := prompbmarshal.Exemplar{
Labels: []prompbmarshal.Label{
{
Name: "trace_id",
Value: "123456",
},
{
Name: "log_id",
Value: "987654",
},
},
Value: float64(i),
Timestamp: 1000 * int64(i),
}
wr.Timeseries = append(wr.Timeseries, prompbmarshal.TimeSeries{
Labels: labels,
Samples: []prompbmarshal.Sample{
@@ -67,6 +81,10 @@ func newTestWriteRequest(seriesCount, labelsCount int) *prompbmarshal.WriteReque
Timestamp: 1000 * int64(i),
},
},
Exemplars: []prompbmarshal.Exemplar{
exemplar,
},
})
}
return &wr

View File

@@ -19,10 +19,10 @@ var (
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabeling configs, which are applied "+
"to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. "+
"The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
"See https://docs.victoriametrics.com/vmagent/#relabeling")
relabelConfigPaths = flagutil.NewArrayString("remoteWrite.urlRelabelConfig", "Optional path to relabel configs for the corresponding -remoteWrite.url. "+
"See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
"See https://docs.victoriametrics.com/vmagent/#relabeling")
usePromCompatibleNaming = flag.Bool("usePromCompatibleNaming", false, "Whether to replace characters unsupported by Prometheus with underscores "+
"in the ingested metric names and label names. For example, foo.bar{a.b='c'} is transformed into foo_bar{a_b='c'} during data ingestion if this flag is set. "+
@@ -46,11 +46,11 @@ func loadRelabelConfigs() (*relabelConfigs, error) {
}
rcs.global = global
}
if len(*relabelConfigPaths) > (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url or -remoteWrite.multitenantURL args: %d",
len(*relabelConfigPaths), (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
if len(*relabelConfigPaths) > len(*remoteWriteURLs) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url args: %d",
len(*relabelConfigPaths), (len(*remoteWriteURLs)))
}
rcs.perURL = make([]*promrelabel.ParsedConfigs, (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
rcs.perURL = make([]*promrelabel.ParsedConfigs, len(*remoteWriteURLs))
for i, path := range *relabelConfigPaths {
if len(path) == 0 {
// Skip empty relabel config.

View File

@@ -27,8 +27,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
)
@@ -38,25 +38,30 @@ var (
"or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. "+
"The data can be sharded among the configured remote storage systems if -remoteWrite.shardByURL flag is set")
remoteWriteMultitenantURLs = flagutil.NewArrayString("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. "+
"This flag is deprecated in favor of -enableMultitenantHandlers . See https://docs.victoriametrics.com/vmagent.html#multitenancy")
enableMultitenantHandlers = flag.Bool("enableMultitenantHandlers", false, "Whether to process incoming data via multitenant insert handlers according to "+
"https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format . By default incoming data is processed via single-node insert handlers "+
"https://docs.victoriametrics.com/cluster-victoriametrics/#url-format . By default incoming data is processed via single-node insert handlers "+
"according to https://docs.victoriametrics.com/#how-to-import-time-series-data ."+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details")
"See https://docs.victoriametrics.com/vmagent/#multitenancy for details")
shardByURL = flag.Bool("remoteWrite.shardByURL", false, "Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url . "+
"By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#sharding-among-remote-storages")
"By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent/#sharding-among-remote-storages . "+
"See also -remoteWrite.shardByURLReplicas")
shardByURLReplicas = flag.Int("remoteWrite.shardByURLReplicas", 1, "How many copies of data to make among remote storage systems enumerated via -remoteWrite.url "+
"when -remoteWrite.shardByURL is set. See https://docs.victoriametrics.com/vmagent/#sharding-among-remote-storages")
shardByURLLabels = flagutil.NewArrayString("remoteWrite.shardByURL.labels", "Optional list of labels, which must be used for sharding outgoing samples "+
"among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain "+
"even distribution of series over the specified -remoteWrite.url systems")
"even distribution of series over the specified -remoteWrite.url systems. See also -remoteWrite.shardByURL.ignoreLabels")
shardByURLIgnoreLabels = flagutil.NewArrayString("remoteWrite.shardByURL.ignoreLabels", "Optional list of labels, which must be ignored when sharding outgoing samples "+
"among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain "+
"even distribution of series over the specified -remoteWrite.url systems. See also -remoteWrite.shardByURL.labels")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory for storing pending data, which isn't sent to the configured -remoteWrite.url . "+
"See also -remoteWrite.maxDiskUsagePerURL and -remoteWrite.disableOnDiskQueue")
keepDanglingQueues = flag.Bool("remoteWrite.keepDanglingQueues", false, "Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
"Useful when -remoteWrite.url is changed temporarily and persistent queue files will be needed later on.")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs")
"isn't enough for sending high volume of collected data to remote storage. "+
"Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL = flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
@@ -76,51 +81,59 @@ var (
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries = flag.Int("remoteWrite.maxHourlySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent/#cardinality-limiter")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent/#cardinality-limiter")
maxIngestionRate = flag.Int("maxIngestionRate", 0, "The maximum number of samples vmagent can receive per second. Data ingestion is paused when the limit is exceeded. "+
"By default there are no limits on samples ingestion rate. See also -remoteWrite.rateLimit")
streamAggrConfig = flagutil.NewArrayString("remoteWrite.streamAggr.config", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See https://docs.victoriametrics.com/stream-aggregation/ . "+
"See also -remoteWrite.streamAggr.keepInput, -remoteWrite.streamAggr.dropInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.dropInput and https://docs.victoriametrics.com/stream-aggregation.html")
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.dropInput and https://docs.victoriametrics.com/stream-aggregation/")
streamAggrDropInput = flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput", "Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
disableOnDiskQueue = flag.Bool("remoteWrite.disableOnDiskQueue", false, "Whether to disable storing pending data to -remoteWrite.tmpDataPath "+
"when the configured remote storage systems cannot keep up with the data ingestion rate. See https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence ."+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/stream-aggregation/")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval before optional aggregation "+
"with -remoteWrite.streamAggr.config . See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/stream-aggregation/#deduplication")
streamAggrIgnoreOldSamples = flagutil.NewArrayBool("remoteWrite.streamAggr.ignoreOldSamples", "Whether to ignore input samples with old timestamps outside the current aggregation interval "+
"for the corresponding -remoteWrite.streamAggr.config . See https://docs.victoriametrics.com/stream-aggregation/#ignoring-old-samples")
streamAggrIgnoreFirstIntervals = flag.Int("remoteWrite.streamAggr.ignoreFirstIntervals", 0, "Number of aggregation intervals to skip after the start. Increase this value if you observe incorrect aggregation results after vmagent restarts. It could be caused by receiving unordered delayed data from clients pushing data into the vmagent. "+
"See https://docs.victoriametrics.com/stream-aggregation/#ignore-aggregation-intervals-on-start")
streamAggrDropInputLabels = flagutil.NewArrayString("streamAggr.dropInputLabels", "An optional list of labels to drop from samples "+
"before stream de-duplication and aggregation . See https://docs.victoriametrics.com/stream-aggregation/#dropping-unneeded-labels")
disableOnDiskQueue = flagutil.NewArrayBool("remoteWrite.disableOnDiskQueue", "Whether to disable storing pending data to -remoteWrite.tmpDataPath "+
"when the configured remote storage systems cannot keep up with the data ingestion rate. See https://docs.victoriametrics.com/vmagent#disabling-on-disk-persistence ."+
"See also -remoteWrite.dropSamplesOnOverload")
dropSamplesOnOverload = flag.Bool("remoteWrite.dropSamplesOnOverload", false, "Whether to drop samples when -remoteWrite.disableOnDiskQueue is set and if the samples "+
"cannot be pushed into the configured remote storage systems in a timely manner. See https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence")
dropSamplesOnOverload = flagutil.NewArrayBool("remoteWrite.dropSamplesOnOverload", "Whether to drop samples when -remoteWrite.disableOnDiskQueue is set and if the samples "+
"cannot be pushed into the configured remote storage systems in a timely manner. See https://docs.victoriametrics.com/vmagent#disabling-on-disk-persistence")
)
var (
// rwctxsDefault contains statically populated entries when -remoteWrite.url is specified.
rwctxsDefault []*remoteWriteCtx
// rwctxs contains statically populated entries when -remoteWrite.url is specified.
rwctxs []*remoteWriteCtx
// rwctxsMap contains dynamically populated entries when -remoteWrite.multitenantURL is specified.
rwctxsMap = make(map[tenantmetrics.TenantID][]*remoteWriteCtx)
rwctxsMapLock sync.Mutex
// Data without tenant id is written to defaultAuthToken if -remoteWrite.multitenantURL is specified.
// Data without tenant id is written to defaultAuthToken if -enableMultitenantHandlers is specified.
defaultAuthToken = &auth.Token{}
// ErrQueueFullHTTPRetry must be returned when TryPush() returns false.
ErrQueueFullHTTPRetry = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("remote storage systems cannot keep up with the data ingestion rate; retry the request later " +
"or remove -remoteWrite.disableOnDiskQueue from vmagent command-line flags, so it could save pending data to -remoteWrite.tmpDataPath; " +
"see https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence"),
"see https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence"),
StatusCode: http.StatusTooManyRequests,
}
// disableOnDiskQueueAll is set to true if all remoteWrite.urls were configured to disable persistent queue via disableOnDiskQueue
disableOnDiskQueueAll bool
)
// MultitenancyEnabled returns true if -enableMultitenantHandlers or -remoteWrite.multitenantURL is specified.
// MultitenancyEnabled returns true if -enableMultitenantHandlers is specified.
func MultitenancyEnabled() bool {
return *enableMultitenantHandlers || len(*remoteWriteMultitenantURLs) > 0
return *enableMultitenantHandlers
}
// Contains the current relabelConfigs.
@@ -140,7 +153,10 @@ func InitSecretFlags() {
}
}
var shardByURLLabelsMap map[string]struct{}
var (
shardByURLLabelsMap map[string]struct{}
shardByURLIgnoreLabelsMap map[string]struct{}
)
// Init initializes remotewrite.
//
@@ -148,11 +164,8 @@ var shardByURLLabelsMap map[string]struct{}
//
// Stop must be called for graceful shutdown.
func Init() {
if len(*remoteWriteURLs) == 0 && len(*remoteWriteMultitenantURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` or `-remoteWrite.multitenantURL` command-line flag must be set")
}
if len(*remoteWriteURLs) > 0 && len(*remoteWriteMultitenantURLs) > 0 {
logger.Fatalf("cannot set both `-remoteWrite.url` and `-remoteWrite.multitenantURL` command-line flags")
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
}
if *maxHourlySeries > 0 {
hourlySeriesLimiter = bloomfilter.NewLimiter(*maxHourlySeries, time.Hour)
@@ -172,19 +185,21 @@ func Init() {
return float64(dailySeriesLimiter.CurrentItems())
})
}
if *queues > maxQueues {
*queues = maxQueues
}
if *queues <= 0 {
*queues = 1
}
if len(*shardByURLLabels) > 0 {
m := make(map[string]struct{}, len(*shardByURLLabels))
for _, label := range *shardByURLLabels {
m[label] = struct{}{}
}
shardByURLLabelsMap = m
if len(*shardByURLLabels) > 0 && len(*shardByURLIgnoreLabels) > 0 {
logger.Fatalf("-remoteWrite.shardByURL.labels and -remoteWrite.shardByURL.ignoreLabels cannot be set simultaneously; " +
"see https://docs.victoriametrics.com/vmagent/#sharding-among-remote-storages")
}
shardByURLLabelsMap = newMapFromStrings(*shardByURLLabels)
shardByURLIgnoreLabelsMap = newMapFromStrings(*shardByURLIgnoreLabels)
initLabelsGlobal()
// Register SIGHUP handler for config reload before loadRelabelConfigs.
@@ -201,8 +216,17 @@ func Init() {
relabelConfigTimestamp.Set(fasttime.UnixTimestamp())
if len(*remoteWriteURLs) > 0 {
rwctxsDefault = newRemoteWriteCtxs(nil, *remoteWriteURLs)
rwctxs = newRemoteWriteCtxs(nil, *remoteWriteURLs)
}
disableOnDiskQueueAll = true
for _, v := range *disableOnDiskQueue {
if !v {
disableOnDiskQueueAll = false
break
}
}
dropDanglingQueues()
// Start config reloader.
@@ -225,18 +249,15 @@ func dropDanglingQueues() {
if *keepDanglingQueues {
return
}
if len(*remoteWriteMultitenantURLs) > 0 {
// Do not drop dangling queues for *remoteWriteMultitenantURLs, since it is impossible to determine
// unused queues for multitenant urls - they are created on demand when new sample for the given
// tenant is pushed to remote storage.
return
}
// Remove dangling persistent queues, if any.
// This is required for the case when the number of queues has been changed or URL have been changed.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
//
existingQueues := make(map[string]struct{}, len(rwctxsDefault))
for _, rwctx := range rwctxsDefault {
// In case if there were many persistent queues with identical *remoteWriteURLs
// the queue with the last index will be dropped.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140
existingQueues := make(map[string]struct{}, len(rwctxs))
for _, rwctx := range rwctxs {
existingQueues[rwctx.fq.Dirname()] = struct{}{}
}
@@ -253,7 +274,7 @@ func dropDanglingQueues() {
}
}
if removed > 0 {
logger.Infof("removed %d dangling queues from %q, active queues: %d", removed, *tmpDataPath, len(rwctxsDefault))
logger.Infof("removed %d dangling queues from %q, active queues: %d", removed, *tmpDataPath, len(rwctxs))
}
}
@@ -281,18 +302,6 @@ var (
)
func reloadStreamAggrConfigs() {
if len(*remoteWriteMultitenantURLs) > 0 {
rwctxsMapLock.Lock()
for _, rwctxs := range rwctxsMap {
reinitStreamAggr(rwctxs)
}
rwctxsMapLock.Unlock()
} else {
reinitStreamAggr(rwctxsDefault)
}
}
func reinitStreamAggr(rwctxs []*remoteWriteCtx) {
for _, rwctx := range rwctxs {
rwctx.reinitStreamAggr()
}
@@ -321,7 +330,7 @@ func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
}
sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
if at != nil {
// Construct full remote_write url for the given tenant according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
// Construct full remote_write url for the given tenant according to https://docs.victoriametrics.com/cluster-victoriametrics/#url-format
remoteWriteURL.Path = fmt.Sprintf("%s/insert/%d:%d/prometheus/api/v1/write", remoteWriteURL.Path, at.AccountID, at.ProjectID)
sanitizedURL = fmt.Sprintf("%s:%d:%d", sanitizedURL, at.AccountID, at.ProjectID)
}
@@ -336,6 +345,35 @@ func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
var configReloaderStopCh = make(chan struct{})
var configReloaderWG sync.WaitGroup
// StartIngestionRateLimiter starts ingestion rate limiter.
//
// Ingestion rate limiter must be started before Init() call.
//
// StopIngestionRateLimiter must be called before Stop() call in order to unblock all the callers
// to ingestion rate limiter. Otherwise deadlock may occur at Stop() call.
func StartIngestionRateLimiter() {
if *maxIngestionRate <= 0 {
return
}
ingestionRateLimitReached := metrics.NewCounter(`vmagent_max_ingestion_rate_limit_reached_total`)
ingestionRateLimiterStopCh = make(chan struct{})
ingestionRateLimiter = ratelimiter.New(int64(*maxIngestionRate), ingestionRateLimitReached, ingestionRateLimiterStopCh)
}
// StopIngestionRateLimiter stops ingestion rate limiter.
func StopIngestionRateLimiter() {
if ingestionRateLimiterStopCh == nil {
return
}
close(ingestionRateLimiterStopCh)
ingestionRateLimiterStopCh = nil
}
var (
ingestionRateLimiter *ratelimiter.RateLimiter
ingestionRateLimiterStopCh chan struct{}
)
// Stop stops remotewrite.
//
// It is expected that nobody calls TryPush during and after the call to this func.
@@ -343,18 +381,10 @@ func Stop() {
close(configReloaderStopCh)
configReloaderWG.Wait()
for _, rwctx := range rwctxsDefault {
for _, rwctx := range rwctxs {
rwctx.MustStop()
}
rwctxsDefault = nil
// There is no need in locking rwctxsMapLock here, since nobody should call TryPush during the Stop call.
for _, rwctxs := range rwctxsMap {
for _, rwctx := range rwctxs {
rwctx.MustStop()
}
}
rwctxsMap = nil
rwctxs = nil
if sl := hourlySeriesLimiter; sl != nil {
sl.MustStop()
@@ -364,30 +394,24 @@ func Stop() {
}
}
// PushDropSamplesOnFailure pushes wr to the configured remote storage systems set via -remoteWrite.url and -remoteWrite.multitenantURL
//
// If at is nil, then the data is pushed to the configured -remoteWrite.url.
// If at isn't nil, the data is pushed to the configured -remoteWrite.multitenantURL.
// PushDropSamplesOnFailure pushes wr to the configured remote storage systems set via -remoteWrite.url
//
// PushDropSamplesOnFailure can modify wr contents.
func PushDropSamplesOnFailure(at *auth.Token, wr *prompbmarshal.WriteRequest) {
_ = tryPush(at, wr, true)
}
// TryPush tries sending wr to the configured remote storage systems set via -remoteWrite.url and -remoteWrite.multitenantURL
//
// If at is nil, then the data is pushed to the configured -remoteWrite.url.
// If at isn't nil, the data is pushed to the configured -remoteWrite.multitenantURL.
// TryPush tries sending wr to the configured remote storage systems set via -remoteWrite.url
//
// TryPush can modify wr contents, so the caller must re-initialize wr before calling TryPush() after unsuccessful attempt.
// TryPush may send partial data from wr on unsuccessful attempt, so repeated call for the same wr may send the data multiple times.
//
// The caller must return ErrQueueFullHTTPRetry to the client, which sends wr, if TryPush returns false.
func TryPush(at *auth.Token, wr *prompbmarshal.WriteRequest) bool {
return tryPush(at, wr, *dropSamplesOnOverload)
return tryPush(at, wr, false)
}
func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, dropSamplesOnFailure bool) bool {
func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, forceDropSamplesOnFailure bool) bool {
tss := wr.Timeseries
if at == nil && MultitenancyEnabled() {
@@ -396,41 +420,25 @@ func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, dropSamplesOnFailur
}
var tenantRctx *relabelCtx
var rwctxs []*remoteWriteCtx
if at == nil {
rwctxs = rwctxsDefault
} else if len(*remoteWriteMultitenantURLs) == 0 {
if at != nil {
// Convert at to (vm_account_id, vm_project_id) labels.
tenantRctx = getRelabelCtx()
defer putRelabelCtx(tenantRctx)
rwctxs = rwctxsDefault
} else {
rwctxsMapLock.Lock()
tenantID := tenantmetrics.TenantID{
AccountID: at.AccountID,
ProjectID: at.ProjectID,
}
rwctxs = rwctxsMap[tenantID]
if rwctxs == nil {
rwctxs = newRemoteWriteCtxs(at, *remoteWriteMultitenantURLs)
rwctxsMap[tenantID] = rwctxs
}
rwctxsMapLock.Unlock()
}
rowsCount := getRowsCount(tss)
if *disableOnDiskQueue {
// Quick check whether writes to configured remote storage systems are blocked.
// This allows saving CPU time spent on relabeling and block compression
// if some of remote storage systems cannot keep up with the data ingestion rate.
// Quick check whether writes to configured remote storage systems are blocked.
// This allows saving CPU time spent on relabeling and block compression
// if some of remote storage systems cannot keep up with the data ingestion rate.
// this shortcut is only applicable if all remote writes have disableOnDiskQueue = true
if disableOnDiskQueueAll {
for _, rwctx := range rwctxs {
if rwctx.fq.IsWriteBlocked() {
pushFailures.Inc()
if dropSamplesOnFailure {
rwctx.pushFailures.Inc()
if forceDropSamplesOnFailure || rwctx.dropSamplesOnOverload {
// Just drop samples
samplesDropped.Add(rowsCount)
return true
rwctx.rowsDroppedOnPushFailure.Add(rowsCount)
continue
}
return false
}
@@ -462,6 +470,9 @@ func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, dropSamplesOnFailur
break
}
}
ingestionRateLimiter.Register(samplesCount)
tssBlock := tss
if i < len(tss) {
tssBlock = tss[:i]
@@ -480,27 +491,14 @@ func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, dropSamplesOnFailur
}
sortLabelsIfNeeded(tssBlock)
tssBlock = limitSeriesCardinality(tssBlock)
if !tryPushBlockToRemoteStorages(rwctxs, tssBlock) {
if !*disableOnDiskQueue {
logger.Panicf("BUG: tryPushBlockToRemoteStorages must return true if -remoteWrite.disableOnDiskQueue isn't set")
}
pushFailures.Inc()
if dropSamplesOnFailure {
samplesDropped.Add(rowsCount)
return true
}
if !tryPushBlockToRemoteStorages(tssBlock, forceDropSamplesOnFailure) {
return false
}
}
return true
}
var (
samplesDropped = metrics.NewCounter(`vmagent_remotewrite_samples_dropped_total`)
pushFailures = metrics.NewCounter(`vmagent_remotewrite_push_failures_total`)
)
func tryPushBlockToRemoteStorages(rwctxs []*remoteWriteCtx, tssBlock []prompbmarshal.TimeSeries) bool {
func tryPushBlockToRemoteStorages(tssBlock []prompbmarshal.TimeSeries, forceDropSamplesOnFailure bool) bool {
if len(tssBlock) == 0 {
// Nothing to push
return true
@@ -508,70 +506,129 @@ func tryPushBlockToRemoteStorages(rwctxs []*remoteWriteCtx, tssBlock []prompbmar
if len(rwctxs) == 1 {
// Fast path - just push data to the configured single remote storage
return rwctxs[0].TryPush(tssBlock)
return rwctxs[0].TryPush(tssBlock, forceDropSamplesOnFailure)
}
// We need to push tssBlock to multiple remote storages.
// This is either sharding or replication depending on -remoteWrite.shardByURL command-line flag value.
if *shardByURL {
// Shard the data among rwctxs
tssByURL := make([][]prompbmarshal.TimeSeries, len(rwctxs))
tmpLabels := promutils.GetLabels()
for _, ts := range tssBlock {
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
hashLabels = append(hashLabels, label)
}
}
}
h := getLabelsHash(hashLabels)
idx := h % uint64(len(tssByURL))
tssByURL[idx] = append(tssByURL[idx], ts)
if *shardByURL && *shardByURLReplicas < len(rwctxs) {
// Shard tssBlock samples among rwctxs.
replicas := *shardByURLReplicas
if replicas <= 0 {
replicas = 1
}
promutils.PutLabels(tmpLabels)
// Push sharded data to remote storages in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup
wg.Add(len(rwctxs))
var anyPushFailed uint64
for i, rwctx := range rwctxs {
tssShard := tssByURL[i]
if len(tssShard) == 0 {
continue
}
go func(rwctx *remoteWriteCtx, tss []prompbmarshal.TimeSeries) {
defer wg.Done()
if !rwctx.TryPush(tss) {
atomic.StoreUint64(&anyPushFailed, 1)
}
}(rwctx, tssShard)
}
wg.Wait()
return atomic.LoadUint64(&anyPushFailed) == 0
return tryShardingBlockAmongRemoteStorages(tssBlock, replicas, forceDropSamplesOnFailure)
}
// Replicate data among rwctxs.
// Push block to remote storages in parallel in order to reduce
// Replicate tssBlock samples among rwctxs.
// Push tssBlock to remote storage systems in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup
wg.Add(len(rwctxs))
var anyPushFailed uint64
var anyPushFailed atomic.Bool
for _, rwctx := range rwctxs {
go func(rwctx *remoteWriteCtx) {
defer wg.Done()
if !rwctx.TryPush(tssBlock) {
atomic.StoreUint64(&anyPushFailed, 1)
if !rwctx.TryPush(tssBlock, forceDropSamplesOnFailure) {
anyPushFailed.Store(true)
}
}(rwctx)
}
wg.Wait()
return atomic.LoadUint64(&anyPushFailed) == 0
return !anyPushFailed.Load()
}
func tryShardingBlockAmongRemoteStorages(tssBlock []prompbmarshal.TimeSeries, replicas int, forceDropSamplesOnFailure bool) bool {
x := getTSSShards(len(rwctxs))
defer putTSSShards(x)
shards := x.shards
tmpLabels := promutils.GetLabels()
for _, ts := range tssBlock {
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
} else if len(shardByURLIgnoreLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLIgnoreLabelsMap[label.Name]; !ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
}
h := getLabelsHash(hashLabels)
idx := h % uint64(len(shards))
i := 0
for {
shards[idx] = append(shards[idx], ts)
i++
if i >= replicas {
break
}
idx++
if idx >= uint64(len(shards)) {
idx = 0
}
}
}
promutils.PutLabels(tmpLabels)
// Push sharded samples to remote storage systems in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup
var anyPushFailed atomic.Bool
for i, rwctx := range rwctxs {
shard := shards[i]
if len(shard) == 0 {
continue
}
wg.Add(1)
go func(rwctx *remoteWriteCtx, tss []prompbmarshal.TimeSeries) {
defer wg.Done()
if !rwctx.TryPush(tss, forceDropSamplesOnFailure) {
anyPushFailed.Store(true)
}
}(rwctx, shard)
}
wg.Wait()
return !anyPushFailed.Load()
}
type tssShards struct {
shards [][]prompbmarshal.TimeSeries
}
func getTSSShards(n int) *tssShards {
v := tssShardsPool.Get()
if v == nil {
v = &tssShards{}
}
x := v.(*tssShards)
if cap(x.shards) < n {
x.shards = make([][]prompbmarshal.TimeSeries, n)
}
x.shards = x.shards[:n]
return x
}
func putTSSShards(x *tssShards) {
shards := x.shards
for i := range shards {
clear(shards[i])
shards[i] = shards[i][:0]
}
tssShardsPool.Put(x)
}
var tssShardsPool sync.Pool
// sortLabelsIfNeeded sorts labels if -sortLabels command-line flag is set.
func sortLabelsIfNeeded(tss []prompbmarshal.TimeSeries) {
if !*sortLabels {
@@ -665,15 +722,22 @@ type remoteWriteCtx struct {
fq *persistentqueue.FastQueue
c *client
sas atomic.Pointer[streamaggr.Aggregators]
streamAggrKeepInput bool
streamAggrDropInput bool
sas atomic.Pointer[streamaggr.Aggregators]
deduplicator *streamaggr.Deduplicator
streamAggrKeepInput bool
streamAggrDropInput bool
disableOnDiskQueue bool
dropSamplesOnOverload bool
pss []*pendingSeries
pssNextIdx uint64
pssNextIdx atomic.Uint64
rowsPushedAfterRelabel *metrics.Counter
rowsDroppedByRelabel *metrics.Counter
pushFailures *metrics.Counter
rowsDroppedOnPushFailure *metrics.Counter
}
func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks int, sanitizedURL string) *remoteWriteCtx {
@@ -689,7 +753,8 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks in
logger.Warnf("rounding the -remoteWrite.maxDiskUsagePerURL=%d to the minimum supported value: %d", maxPendingBytes, persistentqueue.DefaultChunkFileSize)
maxPendingBytes = persistentqueue.DefaultChunkFileSize
}
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes, *disableOnDiskQueue)
isPQDisabled := disableOnDiskQueue.GetOptionalArg(argIdx)
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes, isPQDisabled)
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_data_bytes{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetPendingBytes())
})
@@ -732,15 +797,28 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks in
c: c,
pss: pss,
dropSamplesOnOverload: dropSamplesOnOverload.GetOptionalArg(argIdx),
disableOnDiskQueue: isPQDisabled,
rowsPushedAfterRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rows_pushed_after_relabel_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
rowsDroppedByRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_relabel_metrics_dropped_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
pushFailures: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_push_failures_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
rowsDroppedOnPushFailure: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_samples_dropped_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
}
// Initialize sas
sasFile := streamAggrConfig.GetOptionalArg(argIdx)
dedupInterval := streamAggrDedupInterval.GetOptionalArg(argIdx)
ignoreOldSamples := streamAggrIgnoreOldSamples.GetOptionalArg(argIdx)
if sasFile != "" {
dedupInterval := streamAggrDedupInterval.GetOptionalArg(argIdx)
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternalTrackDropped, dedupInterval)
opts := &streamaggr.Options{
DedupInterval: dedupInterval,
DropInputLabels: *streamAggrDropInputLabels,
IgnoreOldSamples: ignoreOldSamples,
IgnoreFirstIntervals: *streamAggrIgnoreFirstIntervals,
}
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternalTrackDropped, opts)
if err != nil {
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggr.config=%q: %s", sasFile, err)
}
@@ -749,17 +827,24 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks in
rwctx.streamAggrDropInput = streamAggrDropInput.GetOptionalArg(argIdx)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_successful{path=%q}`, sasFile)).Set(1)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_success_timestamp_seconds{path=%q}`, sasFile)).Set(fasttime.UnixTimestamp())
} else if dedupInterval > 0 {
rwctx.deduplicator = streamaggr.NewDeduplicator(rwctx.pushInternalTrackDropped, dedupInterval, *streamAggrDropInputLabels)
}
return rwctx
}
func (rwctx *remoteWriteCtx) MustStop() {
// sas must be stopped before rwctx is closed
// sas and deduplicator must be stopped before rwctx is closed
// because sas can write pending series to rwctx.pss if there are any
sas := rwctx.sas.Swap(nil)
sas.MustStop()
if rwctx.deduplicator != nil {
rwctx.deduplicator.MustStop()
rwctx.deduplicator = nil
}
for _, ps := range rwctx.pss {
ps.MustStop()
}
@@ -776,7 +861,11 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.rowsDroppedByRelabel = nil
}
func (rwctx *remoteWriteCtx) TryPush(tss []prompbmarshal.TimeSeries) bool {
// TryPush sends tss series to the configured remote write endpoint
//
// TryPush can be called concurrently for multiple remoteWriteCtx,
// so it shouldn't modify tss entries.
func (rwctx *remoteWriteCtx) TryPush(tss []prompbmarshal.TimeSeries, forceDropSamplesOnFailure bool) bool {
// Apply relabeling
var rctx *relabelCtx
var v *[]prompbmarshal.TimeSeries
@@ -798,7 +887,7 @@ func (rwctx *remoteWriteCtx) TryPush(tss []prompbmarshal.TimeSeries) bool {
rowsCount := getRowsCount(tss)
rwctx.rowsPushedAfterRelabel.Add(rowsCount)
// Apply stream aggregation if any
// Apply stream aggregation or deduplication if they are configured
sas := rwctx.sas.Load()
if sas != nil {
matchIdxs := matchIdxsPool.Get()
@@ -813,6 +902,9 @@ func (rwctx *remoteWriteCtx) TryPush(tss []prompbmarshal.TimeSeries) bool {
tss = dropAggregatedSeries(tss, matchIdxs.B, rwctx.streamAggrDropInput)
}
matchIdxsPool.Put(matchIdxs)
} else if rwctx.deduplicator != nil {
rwctx.deduplicator.Push(tss)
tss = tss[:0]
}
// Try pushing the data to remote storage
@@ -825,6 +917,14 @@ func (rwctx *remoteWriteCtx) TryPush(tss []prompbmarshal.TimeSeries) bool {
putRelabelCtx(rctx)
}
if !ok {
rwctx.pushFailures.Inc()
if forceDropSamplesOnFailure || rwctx.dropSamplesOnOverload {
rwctx.rowsDroppedOnPushFailure.Add(len(tss))
return true
}
}
return ok
}
@@ -841,7 +941,7 @@ func dropAggregatedSeries(src []prompbmarshal.TimeSeries, matchIdxs []byte, drop
}
}
tail := src[len(dst):]
_ = prompbmarshal.ResetTimeSeries(tail)
clear(tail)
return dst
}
@@ -849,13 +949,13 @@ func (rwctx *remoteWriteCtx) pushInternalTrackDropped(tss []prompbmarshal.TimeSe
if rwctx.tryPushInternal(tss) {
return
}
if !*disableOnDiskQueue {
if !rwctx.disableOnDiskQueue {
logger.Panicf("BUG: tryPushInternal must return true if -remoteWrite.disableOnDiskQueue isn't set")
}
pushFailures.Inc()
if *dropSamplesOnOverload {
rwctx.pushFailures.Inc()
if dropSamplesOnOverload.GetOptionalArg(rwctx.idx) {
rowsCount := getRowsCount(tss)
samplesDropped.Add(rowsCount)
rwctx.rowsDroppedOnPushFailure.Add(rowsCount)
}
}
@@ -872,7 +972,7 @@ func (rwctx *remoteWriteCtx) tryPushInternal(tss []prompbmarshal.TimeSeries) boo
}
pss := rwctx.pss
idx := atomic.AddUint64(&rwctx.pssNextIdx, 1) % uint64(len(pss))
idx := rwctx.pssNextIdx.Add(1) % uint64(len(pss))
ok := pss[idx].TryPush(tss)
@@ -894,8 +994,12 @@ func (rwctx *remoteWriteCtx) reinitStreamAggr() {
logger.Infof("reloading stream aggregation configs pointed by -remoteWrite.streamAggr.config=%q", sasFile)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reloads_total{path=%q}`, sasFile)).Inc()
dedupInterval := streamAggrDedupInterval.GetOptionalArg(rwctx.idx)
sasNew, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternalTrackDropped, dedupInterval)
opts := &streamaggr.Options{
DedupInterval: streamAggrDedupInterval.GetOptionalArg(rwctx.idx),
DropInputLabels: *streamAggrDropInputLabels,
IgnoreOldSamples: streamAggrIgnoreOldSamples.GetOptionalArg(rwctx.idx),
}
sasNew, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternalTrackDropped, opts)
if err != nil {
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reloads_errors_total{path=%q}`, sasFile)).Inc()
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_successful{path=%q}`, sasFile)).Set(0)
@@ -932,13 +1036,17 @@ func getRowsCount(tss []prompbmarshal.TimeSeries) int {
// CheckStreamAggrConfigs checks configs pointed by -remoteWrite.streamAggr.config
func CheckStreamAggrConfigs() error {
pushNoop := func(tss []prompbmarshal.TimeSeries) {}
pushNoop := func(_ []prompbmarshal.TimeSeries) {}
for idx, sasFile := range *streamAggrConfig {
if sasFile == "" {
continue
}
dedupInterval := streamAggrDedupInterval.GetOptionalArg(idx)
sas, err := streamaggr.LoadFromFile(sasFile, pushNoop, dedupInterval)
opts := &streamaggr.Options{
DedupInterval: streamAggrDedupInterval.GetOptionalArg(idx),
DropInputLabels: *streamAggrDropInputLabels,
IgnoreOldSamples: streamAggrIgnoreOldSamples.GetOptionalArg(idx),
}
sas, err := streamaggr.LoadFromFile(sasFile, pushNoop, opts)
if err != nil {
return fmt.Errorf("cannot load -remoteWrite.streamAggr.config=%q: %w", sasFile, err)
}
@@ -947,23 +1055,10 @@ func CheckStreamAggrConfigs() error {
return nil
}
// GetAggregators returns aggregators for all the configured remote writes.
func GetAggregators() map[string]*streamaggr.Aggregators {
var result = map[string]*streamaggr.Aggregators{}
if len(*remoteWriteMultitenantURLs) > 0 {
rwctxsMapLock.Lock()
for tenant, rwctxs := range rwctxsMap {
for rwNum, rw := range rwctxs {
result[fmt.Sprintf("rw %d for tenant %v:%v", rwNum, tenant.AccountID, tenant.ProjectID)] = rw.sas.Load()
}
}
rwctxsMapLock.Unlock()
} else {
for rwNum, rw := range rwctxsDefault {
result[fmt.Sprintf("remote write %d", rwNum)] = rw.sas.Load()
}
func newMapFromStrings(a []string) map[string]struct{} {
m := make(map[string]struct{}, len(a))
for _, s := range a {
m[s] = struct{}{}
}
return result
return m
}

View File

@@ -0,0 +1,215 @@
package remotewrite
import (
"fmt"
"math"
"os"
"reflect"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/metrics"
)
func TestGetLabelsHash_Distribution(t *testing.T) {
f := func(bucketsCount int) {
t.Helper()
// Distribute itemsCount hashes returned by getLabelsHash() across bucketsCount buckets.
itemsCount := 1_000 * bucketsCount
m := make([]int, bucketsCount)
var labels []prompbmarshal.Label
for i := 0; i < itemsCount; i++ {
labels = append(labels[:0], prompbmarshal.Label{
Name: "__name__",
Value: fmt.Sprintf("some_name_%d", i),
})
for j := 0; j < 10; j++ {
labels = append(labels, prompbmarshal.Label{
Name: fmt.Sprintf("label_%d", j),
Value: fmt.Sprintf("value_%d_%d", i, j),
})
}
h := getLabelsHash(labels)
m[h%uint64(bucketsCount)]++
}
// Verify that the distribution is even
expectedItemsPerBucket := itemsCount / bucketsCount
for _, n := range m {
if math.Abs(1-float64(n)/float64(expectedItemsPerBucket)) > 0.04 {
t.Fatalf("unexpected items in the bucket for %d buckets; got %d; want around %d", bucketsCount, n, expectedItemsPerBucket)
}
}
}
f(2)
f(3)
f(4)
f(5)
f(10)
}
func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
f := func(streamAggrConfig, relabelConfig string, dedupInterval time.Duration, keepInput, dropInput bool, input string) {
t.Helper()
perURLRelabel, err := promrelabel.ParseRelabelConfigsData([]byte(relabelConfig))
if err != nil {
t.Fatalf("cannot load relabel configs: %s", err)
}
rcs := &relabelConfigs{
perURL: []*promrelabel.ParsedConfigs{
perURLRelabel,
},
}
allRelabelConfigs.Store(rcs)
pss := make([]*pendingSeries, 1)
pss[0] = newPendingSeries(nil, true, 0, 100)
rwctx := &remoteWriteCtx{
idx: 0,
streamAggrKeepInput: keepInput,
streamAggrDropInput: dropInput,
pss: pss,
rowsPushedAfterRelabel: metrics.GetOrCreateCounter(`foo`),
rowsDroppedByRelabel: metrics.GetOrCreateCounter(`bar`),
}
if dedupInterval > 0 {
rwctx.deduplicator = streamaggr.NewDeduplicator(nil, dedupInterval, nil)
}
if len(streamAggrConfig) > 0 {
f := createFile(t, []byte(streamAggrConfig))
sas, err := streamaggr.LoadFromFile(f.Name(), nil, nil)
if err != nil {
t.Fatalf("cannot load streamaggr configs: %s", err)
}
rwctx.sas.Store(sas)
}
inputTss := mustParsePromMetrics(input)
expectedTss := make([]prompbmarshal.TimeSeries, len(inputTss))
// copy inputTss to make sure it is not mutated during TryPush call
copy(expectedTss, inputTss)
rwctx.TryPush(inputTss, false)
if !reflect.DeepEqual(expectedTss, inputTss) {
t.Fatalf("unexpected samples;\ngot\n%v\nwant\n%v", inputTss, expectedTss)
}
}
f(`
- interval: 1m
outputs: [sum_samples]
- interval: 2m
outputs: [count_series]
`, `
- action: keep
source_labels: [env]
regex: "dev"
`, 0, false, false, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
f(``, ``, time.Hour, false, false, `
metric{env="dev"} 10
metric{env="foo"} 20
metric{env="dev"} 15
metric{env="foo"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, time.Hour, false, false, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, time.Hour, true, false, `
metric{env="test"} 10
metric{env="dev"} 20
metric{env="foo"} 15
metric{env="dev"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, time.Hour, false, true, `
metric{env="foo"} 10
metric{env="dev"} 20
metric{env="foo"} 15
metric{env="dev"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, time.Hour, true, true, `
metric{env="dev"} 10
metric{env="test"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
}
func mustParsePromMetrics(s string) []prompbmarshal.TimeSeries {
var rows prometheus.Rows
errLogger := func(s string) {
panic(fmt.Errorf("unexpected error when parsing Prometheus metrics: %s", s))
}
rows.UnmarshalWithErrLogger(s, errLogger)
var tss []prompbmarshal.TimeSeries
samples := make([]prompbmarshal.Sample, 0, len(rows.Rows))
for _, row := range rows.Rows {
labels := make([]prompbmarshal.Label, 0, len(row.Tags)+1)
labels = append(labels, prompbmarshal.Label{
Name: "__name__",
Value: row.Metric,
})
for _, tag := range row.Tags {
labels = append(labels, prompbmarshal.Label{
Name: tag.Key,
Value: tag.Value,
})
}
samples = append(samples, prompbmarshal.Sample{
Value: row.Value,
Timestamp: row.Timestamp,
})
ts := prompbmarshal.TimeSeries{
Labels: labels,
Samples: samples[len(samples)-1:],
}
tss = append(tss, ts)
}
return tss
}
func createFile(t *testing.T, data []byte) *os.File {
t.Helper()
f, err := os.CreateTemp("", "")
if err != nil {
t.Fatal(err)
}
if err := os.WriteFile(f.Name(), data, 0644); err != nil {
t.Fatal(err)
}
if err := f.Sync(); err != nil {
t.Fatal(err)
}
return f
}

View File

@@ -3,34 +3,15 @@ package remotewrite
import (
"context"
"net"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/metrics"
)
func getStdDialer() *net.Dialer {
stdDialerOnce.Do(func() {
stdDialer = &net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
DualStack: netutil.TCP6Enabled(),
}
})
return stdDialer
}
var (
stdDialer *net.Dialer
stdDialerOnce sync.Once
)
func statDial(ctx context.Context, _, addr string) (conn net.Conn, err error) {
network := netutil.GetTCPNetwork()
d := getStdDialer()
conn, err = d.DialContext(ctx, network, addr)
conn, err = netutil.DialMaybeSRV(ctx, network, addr)
dialsTotal.Inc()
if err != nil {
dialErrors.Inc()
@@ -50,7 +31,7 @@ var (
)
type statConn struct {
closed uint64
closed atomic.Int32
net.Conn
}
@@ -76,7 +57,7 @@ func (sc *statConn) Write(p []byte) (int, error) {
func (sc *statConn) Close() error {
err := sc.Conn.Close()
if atomic.AddUint64(&sc.closed, 1) == 1 {
if sc.closed.Add(1) == 1 {
conns.Dec()
}
return err

View File

@@ -0,0 +1,68 @@
package statsd
import (
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/statsd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/statsd/stream"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="statsd"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="statsd"}`)
)
// InsertHandler processes remote write for statsd plaintext protocol.
//
// See https://github.com/statsd/statsd/blob/master/docs/metric_types.md
func InsertHandler(r io.Reader) error {
return stream.Parse(r, false, func(rows []parser.Row) error {
return insertRows(nil, rows)
})
}
func insertRows(at *auth.Token, rows []parser.Row) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
tssDst := ctx.WriteRequest.Timeseries[:0]
labels := ctx.Labels[:0]
samples := ctx.Samples[:0]
for i := range rows {
r := &rows[i]
labelsLen := len(labels)
labels = append(labels, prompbmarshal.Label{
Name: "__name__",
Value: r.Metric,
})
for j := range r.Tags {
tag := &r.Tags[j]
labels = append(labels, prompbmarshal.Label{
Name: tag.Key,
Value: tag.Value,
})
}
samples = append(samples, prompbmarshal.Sample{
Value: r.Value,
Timestamp: r.Timestamp,
})
tssDst = append(tssDst, prompbmarshal.TimeSeries{
Labels: labels[labelsLen:],
Samples: samples[len(samples)-1:],
})
}
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
if !remotewrite.TryPush(at, &ctx.WriteRequest) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(len(rows))
rowsPerInsert.Update(float64(len(rows)))
return nil
}

View File

@@ -25,6 +25,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -184,7 +185,8 @@ func processFlags() {
func setUp() {
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
go httpserver.Serve(httpListenAddr, false, func(w http.ResponseWriter, r *http.Request) bool {
var ab flagutil.ArrayBool
go httpserver.Serve([]string{httpListenAddr}, &ab, func(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/prometheus/api/v1/query":
if err := prometheus.QueryHandler(nil, time.Now(), w, r); err != nil {
@@ -225,7 +227,7 @@ checkCheck:
}
func tearDown() {
if err := httpserver.Stop(httpListenAddr); err != nil {
if err := httpserver.Stop([]string{httpListenAddr}); err != nil {
logger.Errorf("cannot stop the webservice: %s", err)
}
vmstorage.Stop()

View File

@@ -68,6 +68,7 @@ publish-vmalert:
test-vmalert:
go test -v -race -cover ./app/vmalert -loggerLevel=ERROR
go test -v -race -cover ./app/vmalert/rule
go test -v -race -cover ./app/vmalert/templates
go test -v -race -cover ./app/vmalert/datasource
go test -v -race -cover ./app/vmalert/notifier
@@ -118,6 +119,9 @@ vmalert-linux-ppc64le:
vmalert-linux-s390x:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmalert-linux-loong64:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
vmalert-linux-386:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -1,3 +1,3 @@
See vmalert docs [here](https://docs.victoriametrics.com/vmalert.html).
See vmalert docs [here](https://docs.victoriametrics.com/vmalert/).
vmalert docs can be edited at [docs/vmalert.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmalert.md).

View File

@@ -1158,9 +1158,9 @@
$labels.pod }}.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh
expr: |
sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace)
sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (cluster, container, pod, namespace)
/
sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)
sum(increase(container_cpu_cfs_periods_total{}[5m])) by (cluster, container, pod, namespace)
> ( 25 / 100 )
for: 15m
labels:

View File

@@ -22,6 +22,7 @@ groups:
{{ . | first | value }}
{{ end }}
description: "It is {{ $value }} connections for {{$labels.instance}}"
link: http://localhost:3000/d/wNf0q_kZk?viewPanel=51&from={{($activeAt.Add (parseDurationTime "1h")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime "-1h")).UnixMilli}}
- alert: ExampleAlertAlwaysFiring
update_entries_limit: -1
expr: sum by(job)

View File

@@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
@@ -45,7 +46,7 @@ var (
oauth2TokenURL = flag.String("datasource.oauth2.tokenUrl", "", "Optional OAuth2 tokenURL to use for -datasource.url")
oauth2Scopes = flag.String("datasource.oauth2.scopes", "", "Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
lookBack = flag.Duration("datasource.lookback", 0, `Will be deprecated soon, please adjust "-search.latencyOffset" at datasource side `+
lookBack = flag.Duration("datasource.lookback", 0, `Deprecated: please adjust "-search.latencyOffset" at datasource side `+
`or specify "latency_offset" in rule group's params. Lookback defines how far into the past to look when evaluating queries. `+
`For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep = flag.Duration("datasource.queryStep", 5*time.Minute, "How far a value can fallback to when evaluating queries. "+
@@ -90,10 +91,10 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
logger.Warnf("flag `-datasource.queryTimeAlignment` is deprecated and will be removed in next releases. Please use `eval_alignment` in rule group instead.")
}
if *lookBack != 0 {
logger.Warnf("flag `-datasource.lookback` will be deprecated soon. Please use `-rule.evalDelay` command-line flag instead. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 for details.")
logger.Warnf("flag `-datasource.lookback` is deprecated and will be removed in next releases. Please adjust `-search.latencyOffset` at datasource side or specify `latency_offset` in rule group's params. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 for details.")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
@@ -132,7 +133,6 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
authCfg: authCfg,
datasourceURL: strings.TrimSuffix(*addr, "/"),
appendTypePrefix: *appendTypePrefix,
lookBack: *lookBack,
queryStep: *queryStep,
dataSourceType: datasourcePrometheus,
extraParams: extraParams,

View File

@@ -35,7 +35,6 @@ type VMStorage struct {
authCfg *promauth.Config
datasourceURL string
appendTypePrefix bool
lookBack time.Duration
queryStep time.Duration
dataSourceType datasourceType
@@ -63,7 +62,6 @@ func (s *VMStorage) Clone() *VMStorage {
authCfg: s.authCfg,
datasourceURL: s.datasourceURL,
appendTypePrefix: s.appendTypePrefix,
lookBack: s.lookBack,
queryStep: s.queryStep,
dataSourceType: s.dataSourceType,
@@ -122,13 +120,12 @@ func (s *VMStorage) BuildWithParams(params QuerierParams) Querier {
}
// NewVMStorage is a constructor for VMStorage
func NewVMStorage(baseURL string, authCfg *promauth.Config, lookBack time.Duration, queryStep time.Duration, appendTypePrefix bool, c *http.Client) *VMStorage {
func NewVMStorage(baseURL string, authCfg *promauth.Config, queryStep time.Duration, appendTypePrefix bool, c *http.Client) *VMStorage {
return &VMStorage{
c: c,
authCfg: authCfg,
datasourceURL: strings.TrimSuffix(baseURL, "/"),
appendTypePrefix: appendTypePrefix,
lookBack: lookBack,
queryStep: queryStep,
dataSourceType: datasourcePrometheus,
extraParams: url.Values{},
@@ -137,11 +134,11 @@ func NewVMStorage(baseURL string, authCfg *promauth.Config, lookBack time.Durati
// Query executes the given query and returns parsed response
func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) (Result, *http.Request, error) {
req, err := s.newQueryRequest(query, ts)
req, err := s.newQueryRequest(ctx, query, ts)
if err != nil {
return Result{}, nil, err
}
resp, err := s.do(ctx, req)
resp, err := s.do(req)
if err != nil {
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
// Return unexpected error to the caller.
@@ -149,11 +146,11 @@ func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) (Resu
}
// Something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, err = s.newQueryRequest(query, ts)
req, err = s.newQueryRequest(ctx, query, ts)
if err != nil {
return Result{}, nil, fmt.Errorf("second attempt: %w", err)
}
resp, err = s.do(ctx, req)
resp, err = s.do(req)
if err != nil {
return Result{}, nil, fmt.Errorf("second attempt: %w", err)
}
@@ -182,11 +179,11 @@ func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end tim
if end.IsZero() {
return res, fmt.Errorf("end param is missing")
}
req, err := s.newQueryRangeRequest(query, start, end)
req, err := s.newQueryRangeRequest(ctx, query, start, end)
if err != nil {
return res, err
}
resp, err := s.do(ctx, req)
resp, err := s.do(req)
if err != nil {
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
// Return unexpected error to the caller.
@@ -194,11 +191,11 @@ func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end tim
}
// Something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, err = s.newQueryRangeRequest(query, start, end)
req, err = s.newQueryRangeRequest(ctx, query, start, end)
if err != nil {
return res, fmt.Errorf("second attempt: %w", err)
}
resp, err = s.do(ctx, req)
resp, err = s.do(req)
if err != nil {
return res, fmt.Errorf("second attempt: %w", err)
}
@@ -210,7 +207,7 @@ func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end tim
return res, err
}
func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response, error) {
func (s *VMStorage) do(req *http.Request) (*http.Response, error) {
ru := req.URL.Redacted()
if *showDatasourceURL {
ru = req.URL.String()
@@ -218,7 +215,7 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
if s.debug {
logger.Infof("DEBUG datasource request: executing %s request with params %q", req.Method, ru)
}
resp, err := s.c.Do(req.WithContext(ctx))
resp, err := s.c.Do(req)
if err != nil {
return nil, fmt.Errorf("error getting response from %s: %w", ru, err)
}
@@ -230,8 +227,8 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
return resp, nil
}
func (s *VMStorage) newQueryRangeRequest(query string, start, end time.Time) (*http.Request, error) {
req, err := s.newRequest()
func (s *VMStorage) newQueryRangeRequest(ctx context.Context, query string, start, end time.Time) (*http.Request, error) {
req, err := s.newRequest(ctx)
if err != nil {
return nil, fmt.Errorf("cannot create query_range request to datasource %q: %w", s.datasourceURL, err)
}
@@ -239,8 +236,8 @@ func (s *VMStorage) newQueryRangeRequest(query string, start, end time.Time) (*h
return req, nil
}
func (s *VMStorage) newQueryRequest(query string, ts time.Time) (*http.Request, error) {
req, err := s.newRequest()
func (s *VMStorage) newQueryRequest(ctx context.Context, query string, ts time.Time) (*http.Request, error) {
req, err := s.newRequest(ctx)
if err != nil {
return nil, fmt.Errorf("cannot create query request to datasource %q: %w", s.datasourceURL, err)
}
@@ -248,15 +245,15 @@ func (s *VMStorage) newQueryRequest(query string, ts time.Time) (*http.Request,
case "", datasourcePrometheus:
s.setPrometheusInstantReqParams(req, query, ts)
case datasourceGraphite:
s.setGraphiteReqParams(req, query, ts)
s.setGraphiteReqParams(req, query)
default:
logger.Panicf("BUG: engine not found: %q", s.dataSourceType)
}
return req, nil
}
func (s *VMStorage) newRequest() (*http.Request, error) {
req, err := http.NewRequest(http.MethodPost, s.datasourceURL, nil)
func (s *VMStorage) newRequest(ctx context.Context) (*http.Request, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.datasourceURL, nil)
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", s.datasourceURL, err)
}

View File

@@ -4,8 +4,6 @@ import (
"encoding/json"
"fmt"
"net/http"
"strconv"
"time"
)
type graphiteResponse []graphiteResponseTarget
@@ -48,17 +46,13 @@ const (
graphitePrefix = "/graphite"
)
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string, timestamp time.Time) {
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
from := "-5min"
if s.lookBack > 0 {
lookBack := timestamp.Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("format", "json")
q.Set("target", query)

View File

@@ -161,9 +161,6 @@ func (s *VMStorage) setPrometheusInstantReqParams(r *http.Request, query string,
r.URL.Path += "/api/v1/query"
}
q := r.URL.Query()
if s.lookBack > 0 {
timestamp = timestamp.Add(-s.lookBack)
}
q.Set("time", timestamp.Format(time.RFC3339))
if !*disableStepParam && s.evaluationInterval > 0 { // set step as evaluationInterval by default
// always convert to seconds to keep compatibility with older

View File

@@ -71,7 +71,7 @@ func TestVMInstantQuery(t *testing.T) {
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]},"stats":{"seriesFetched": "42"}}`))
}
})
mux.HandleFunc("/render", func(w http.ResponseWriter, request *http.Request) {
mux.HandleFunc("/render", func(w http.ResponseWriter, _ *http.Request) {
c++
switch c {
case 8:
@@ -86,7 +86,7 @@ func TestVMInstantQuery(t *testing.T) {
if err != nil {
t.Fatalf("unexpected: %s", err)
}
s := NewVMStorage(srv.URL, authCfg, time.Minute, 0, false, srv.Client())
s := NewVMStorage(srv.URL, authCfg, 0, false, srv.Client())
p := datasourcePrometheus
pq := s.BuildWithParams(QuerierParams{DataSourceType: string(p), EvaluationInterval: 15 * time.Second})
@@ -225,7 +225,7 @@ func TestVMInstantQueryWithRetry(t *testing.T) {
srv := httptest.NewServer(mux)
defer srv.Close()
s := NewVMStorage(srv.URL, nil, time.Minute, 0, false, srv.Client())
s := NewVMStorage(srv.URL, nil, 0, false, srv.Client())
pq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourcePrometheus)})
expErr := func(err string) {
@@ -334,7 +334,7 @@ func TestVMRangeQuery(t *testing.T) {
if err != nil {
t.Fatalf("unexpected: %s", err)
}
s := NewVMStorage(srv.URL, authCfg, time.Minute, *queryStep, false, srv.Client())
s := NewVMStorage(srv.URL, authCfg, *queryStep, false, srv.Client())
pq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourcePrometheus), EvaluationInterval: 15 * time.Second})
@@ -487,17 +487,6 @@ func TestRequestParams(t *testing.T) {
checkEqualString(t, "bar", p)
},
},
{
"lookback",
false,
&VMStorage{
lookBack: time.Minute,
},
func(t *testing.T, r *http.Request) {
exp := url.Values{"query": {query}, "time": {timestamp.Add(-time.Minute).Format(time.RFC3339)}}
checkEqualString(t, exp.Encode(), r.URL.RawQuery)
},
},
{
"evaluation interval",
false,
@@ -510,20 +499,6 @@ func TestRequestParams(t *testing.T) {
checkEqualString(t, exp.Encode(), r.URL.RawQuery)
},
},
{
"lookback + evaluation interval",
false,
&VMStorage{
lookBack: time.Minute,
evaluationInterval: 15 * time.Second,
},
func(t *testing.T, r *http.Request) {
evalInterval := 15 * time.Second
tt := timestamp.Add(-time.Minute)
exp := url.Values{"query": {query}, "step": {evalInterval.String()}, "time": {tt.Format(time.RFC3339)}}
checkEqualString(t, exp.Encode(), r.URL.RawQuery)
},
},
{
"step override",
false,
@@ -637,7 +612,7 @@ func TestRequestParams(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
req, err := tc.vm.newRequest()
req, err := tc.vm.newRequest(ctx)
if err != nil {
t.Fatal(err)
}
@@ -649,7 +624,7 @@ func TestRequestParams(t *testing.T) {
tc.vm.setPrometheusInstantReqParams(req, query, timestamp)
}
case datasourceGraphite:
tc.vm.setGraphiteReqParams(req, query, timestamp)
tc.vm.setGraphiteReqParams(req, query)
}
tc.checkFn(t, req)
})
@@ -735,7 +710,7 @@ func TestHeaders(t *testing.T) {
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
vm := tt.vmFn()
req, err := vm.newQueryRequest("foo", time.Now())
req, err := vm.newQueryRequest(ctx, "foo", time.Now())
if err != nil {
t.Fatal(err)
}

View File

@@ -44,7 +44,7 @@ Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
See https://docs.victoriametrics.com/vmalert/#reading-rules-from-object-storage
`)
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions `+
@@ -59,8 +59,8 @@ absolute path to all .tpl files in root.
configCheckInterval = flag.Duration("configCheckInterval", 0, "Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "Address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
@@ -71,7 +71,7 @@ absolute path to all .tpl files in root.
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier. By default, hostname is used as address.")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager `+
`for cases where you want to build a custom link to Grafana, Prometheus or any other service. `+
`Supports templating - see https://docs.victoriametrics.com/vmalert.html#templating . `+
`Supports templating - see https://docs.victoriametrics.com/vmalert/#templating . `+
`For example, link to Grafana: -external.alert.source='explore?orgId=1&left={"datasource":"VictoriaMetrics","queries":[{"expr":{{$expr|jsonEscape|queryEscape}},"refId":"A"}],"range":{"from":"now-1h","to":"now"}}'. `+
`Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
@@ -178,15 +178,19 @@ func main() {
go configReload(ctx, manager, groupsCfg, sighupCh)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8880"}
}
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, rh.handler)
go httpserver.Serve(listenAddrs, useProxyProtocol, rh.handler)
pushmetrics.Init()
sig := procutil.WaitForSigterm()
logger.Infof("service received signal %s", sig)
pushmetrics.Stop()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
cancel()
@@ -248,7 +252,13 @@ func newManager(ctx context.Context) (*manager, error) {
func getExternalURL(customURL string) (*url.URL, error) {
if customURL == "" {
// use local hostname as external URL
return getHostnameAsExternalURL(*httpListenAddr, httpserver.IsTLS())
listenAddr := ":8880"
if len(*httpListenAddrs) > 0 {
listenAddr = (*httpListenAddrs)[0]
}
isTLS := httpserver.IsTLS(0)
return getHostnameAsExternalURL(listenAddr, isTLS)
}
u, err := url.Parse(customURL)
if err != nil {
@@ -260,13 +270,13 @@ func getExternalURL(customURL string) (*url.URL, error) {
return u, nil
}
func getHostnameAsExternalURL(httpListenAddr string, isSecure bool) (*url.URL, error) {
func getHostnameAsExternalURL(addr string, isSecure bool) (*url.URL, error) {
hname, err := os.Hostname()
if err != nil {
return nil, fmt.Errorf("failed to get hostname: %w", err)
}
port := ""
if ipport := strings.Split(httpListenAddr, ":"); len(ipport) > 1 {
if ipport := strings.Split(addr, ":"); len(ipport) > 1 {
port = ":" + ipport[1]
}
schema := "http://"
@@ -294,7 +304,7 @@ func getAlertURLGenerator(externalURL *url.URL, externalAlertSource string, vali
"tpl": externalAlertSource,
}
return func(alert notifier.Alert) string {
qFn := func(query string) ([]datasource.Metric, error) {
qFn := func(_ string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported for alert source template")
}
templated, err := alert.ExecTemplate(qFn, alert.Labels, m)
@@ -309,7 +319,7 @@ func usage() {
const s = `
vmalert processes alerts and recording rules.
See the docs at https://docs.victoriametrics.com/vmalert.html .
See the docs at https://docs.victoriametrics.com/vmalert/ .
`
flagutil.Usage(s)
}

View File

@@ -156,11 +156,14 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
var wg sync.WaitGroup
for _, item := range toUpdate {
wg.Add(1)
// cancel evaluation so the Update will be applied as fast as possible.
// it is important to call InterruptEval before the update, because cancel fn
// can be re-assigned during the update.
item.old.InterruptEval()
go func(old *rule.Group, new *rule.Group) {
old.UpdateWith(new)
wg.Done()
}(item.old, item.new)
item.old.InterruptEval()
}
wg.Wait()
}

View File

@@ -178,7 +178,7 @@ func TestAlert_ExecTemplate(t *testing.T) {
},
}
qFn := func(q string) ([]datasource.Metric, error) {
qFn := func(_ string) ([]datasource.Metric, error) {
return []datasource.Metric{
{
Labels: []datasource.Label{

View File

@@ -11,6 +11,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)
@@ -104,7 +105,7 @@ func (am *AlertManager) send(ctx context.Context, alerts []Alert, headers map[st
if *showNotifierURL {
amURL = am.addr.String()
}
if resp.StatusCode != http.StatusOK {
if resp.StatusCode/100 != 2 {
body, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read response from %q: %w", amURL, err)
@@ -127,7 +128,7 @@ func NewAlertManager(alertManagerURL string, fn AlertURLGenerator, authCfg proma
if authCfg.TLSConfig != nil {
tls = authCfg.TLSConfig
}
tr, err := utils.Transport(alertManagerURL, tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify)
tr, err := httputils.Transport(alertManagerURL, tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
)
var (
@@ -60,7 +61,7 @@ func Init() (datasource.QuerierBuilder, error) {
if *addr == "" {
return nil, nil
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
@@ -78,5 +79,5 @@ func Init() (datasource.QuerierBuilder, error) {
return nil, fmt.Errorf("failed to configure auth: %w", err)
}
c := &http.Client{Transport: tr}
return datasource.NewVMStorage(*addr, authCfg, 0, 0, false, c), nil
return datasource.NewVMStorage(*addr, authCfg, 0, false, c), nil
}

View File

@@ -151,12 +151,22 @@ func (c *Client) run(ctx context.Context) {
ticker := time.NewTicker(c.flushInterval)
wr := &prompbmarshal.WriteRequest{}
shutdown := func() {
lastCtx, cancel := context.WithTimeout(context.Background(), defaultWriteTimeout)
logger.Infof("shutting down remote write client and flushing remained series")
shutdownFlushCnt := 0
for ts := range c.input {
wr.Timeseries = append(wr.Timeseries, ts)
if len(wr.Timeseries) >= c.maxBatchSize {
shutdownFlushCnt += len(wr.Timeseries)
c.flush(lastCtx, wr)
}
}
lastCtx, cancel := context.WithTimeout(context.Background(), defaultWriteTimeout)
logger.Infof("shutting down remote write client and flushing remained %d series", len(wr.Timeseries))
// flush the last batch. `flush` will re-check and avoid flushing empty batch.
shutdownFlushCnt += len(wr.Timeseries)
c.flush(lastCtx, wr)
logger.Infof("shutting down remote write client flushed %d series", shutdownFlushCnt)
cancel()
}
c.wg.Add(1)
@@ -279,7 +289,7 @@ L:
func (c *Client) send(ctx context.Context, data []byte) error {
r := bytes.NewReader(data)
req, err := http.NewRequest(http.MethodPost, c.addr, r)
req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.addr, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}
@@ -302,7 +312,7 @@ func (c *Client) send(ctx context.Context, data []byte) error {
if !*disablePathAppend {
req.URL.Path = path.Join(req.URL.Path, "/api/v1/write")
}
resp, err := c.c.Do(req.WithContext(ctx))
resp, err := c.c.Do(req)
if err != nil {
return fmt.Errorf("error while sending request to %s: %w; Data len %d(%d)",
req.URL.Redacted(), err, len(data), r.Size())

View File

@@ -84,6 +84,70 @@ func TestClient_Push(t *testing.T) {
}
}
func TestClient_run_maxBatchSizeDuringShutdown(t *testing.T) {
batchSize := 20
testTable := []struct {
name string // name of the test case
pushCnt int // how many time series is pushed to the client
batchCnt int // the expected batch count sent by the client
}{
{
name: "pushCnt % batchSize == 0",
pushCnt: batchSize * 40,
batchCnt: 40,
},
{
name: "pushCnt % batchSize != 0",
pushCnt: batchSize*40 + 1,
batchCnt: 40 + 1,
},
}
for _, tt := range testTable {
t.Run(tt.name, func(t *testing.T) {
// run new server
bcServer := newBatchCntRWServer()
// run new client
rwClient, err := NewClient(context.Background(), Config{
MaxBatchSize: batchSize,
// Set everything to 1 to simplify the calculation.
Concurrency: 1,
MaxQueueSize: 1000,
FlushInterval: time.Minute,
// batch count server
Addr: bcServer.URL,
})
if err != nil {
t.Fatalf("new remote write client failed, err: %v", err)
}
// push time series to the client.
for i := 0; i < tt.pushCnt; i++ {
if err = rwClient.Push(prompbmarshal.TimeSeries{}); err != nil {
t.Fatalf("push time series to the client failed, err: %v", err)
}
}
// close the client so the rest ts will be flushed in `shutdown`
if err = rwClient.Close(); err != nil {
t.Fatalf("shutdown client failed, err: %v", err)
}
// finally check how many batches is sent.
if tt.batchCnt != bcServer.acceptedBatches() {
t.Errorf("client sent batch count incorrect, want: %d, get: %d", tt.batchCnt, bcServer.acceptedBatches())
}
if tt.pushCnt != bcServer.accepted() {
t.Errorf("client sent time series count incorrect, want: %d, get: %d", tt.pushCnt, bcServer.accepted())
}
})
}
}
func newRWServer() *rwServer {
rw := &rwServer{}
rw.Server = httptest.NewServer(http.HandlerFunc(rw.handler))
@@ -91,14 +155,12 @@ func newRWServer() *rwServer {
}
type rwServer struct {
// WARN: ordering of fields is important for alignment!
// see https://golang.org/pkg/sync/atomic/#pkg-note-BUG
acceptedRows uint64
acceptedRows atomic.Uint64
*httptest.Server
}
func (rw *rwServer) accepted() int {
return int(atomic.LoadUint64(&rw.acceptedRows))
return int(rw.acceptedRows.Load())
}
func (rw *rwServer) err(w http.ResponseWriter, err error) {
@@ -144,7 +206,7 @@ func (rw *rwServer) handler(w http.ResponseWriter, r *http.Request) {
rw.err(w, fmt.Errorf("unmarhsal err: %w", err))
return
}
atomic.AddUint64(&rw.acceptedRows, uint64(len(wr.Timeseries)))
rw.acceptedRows.Add(uint64(len(wr.Timeseries)))
w.WriteHeader(http.StatusNoContent)
}
@@ -186,3 +248,27 @@ func (frw *faultyRWServer) handler(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("server overloaded"))
}
}
type batchCntRWServer struct {
*rwServer
batchCnt atomic.Int64 // accepted batch count, which also equals to request count
}
func newBatchCntRWServer() *batchCntRWServer {
bc := &batchCntRWServer{
rwServer: &rwServer{},
}
bc.Server = httptest.NewServer(http.HandlerFunc(bc.handler))
return bc
}
func (bc *batchCntRWServer) handler(w http.ResponseWriter, r *http.Request) {
bc.batchCnt.Add(1)
bc.rwServer.handler(w, r)
}
func (bc *batchCntRWServer) acceptedBatches() int {
return int(bc.batchCnt.Load())
}

View File

@@ -11,7 +11,7 @@ import (
"github.com/golang/snappy"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
@@ -30,7 +30,7 @@ func NewDebugClient() (*DebugClient, error) {
return nil, nil
}
t, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
t, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
)
var (
@@ -64,7 +65,7 @@ func Init(ctx context.Context) (*Client, error) {
return nil, nil
}
t, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
t, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -310,37 +310,24 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
}
var result []prompbmarshal.TimeSeries
holdAlertState := make(map[uint64]*notifier.Alert)
qFn := func(query string) ([]datasource.Metric, error) {
qFn := func(_ string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
}
for _, s := range res.Data {
ls, err := ar.toLabels(s, qFn)
ls, as, err := ar.expandTemplates(s, qFn, time.Time{})
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
}
h := hash(ls.processed)
a, err := ar.newAlert(s, nil, time.Time{}, qFn) // initial alert
if err != nil {
return nil, fmt.Errorf("failed to create alert: %w", err)
return nil, fmt.Errorf("failed to expand templates: %s", err)
}
alertID := hash(ls.processed)
a := ar.newAlert(s, time.Time{}, ls.processed, as) // initial alert
// if alert is instant, For: 0
if ar.For == 0 {
a.State = notifier.StateFiring
for i := range s.Values {
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
continue
}
// if alert with For > 0
prevT := time.Time{}
for i := range s.Values {
at := time.Unix(s.Timestamps[i], 0)
// try to restore alert's state on the first iteration
if at.Equal(start) {
if _, ok := ar.alerts[h]; ok {
a = ar.alerts[h]
if _, ok := ar.alerts[alertID]; ok {
a = ar.alerts[alertID]
prevT = at
}
}
@@ -354,11 +341,15 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
a.Start = at
}
prevT = at
if ar.For == 0 {
// rules with `for: 0` are always firing when they have Value
a.State = notifier.StateFiring
}
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
// save alert's state on last iteration, so it can be used on the next execRange call
if at.Equal(end) {
holdAlertState[h] = a
holdAlertState[alertID] = a
}
}
}
@@ -392,15 +383,34 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
}
}()
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", ar.Expr, err)
}
ar.logDebugf(ts, nil, "query returned %d samples (elapsed: %s)", curState.Samples, curState.Duration)
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res.Data, err
}
// template labels and annotations before updating ar.alerts,
// since they could use `query` function which takes a while to execute,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6079.
expandedLabels := make([]*labelSet, len(res.Data))
expandedAnnotations := make([]map[string]string, len(res.Data))
for i, m := range res.Data {
ls, as, err := ar.expandTemplates(m, qFn, ts)
if err != nil {
curState.Err = fmt.Errorf("failed to expand templates: %w", err)
return nil, curState.Err
}
expandedLabels[i] = ls
expandedAnnotations[i] = as
}
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
for h, a := range ar.alerts {
// cleanup inactive alerts from previous Exec
if a.State == notifier.StateInactive && ts.Sub(a.ResolvedAt) > resolvedRetention {
@@ -409,26 +419,18 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
}
}
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res.Data, err
}
updated := make(map[uint64]struct{})
// update list of active alerts
for _, m := range res.Data {
ls, err := ar.toLabels(m, qFn)
if err != nil {
curState.Err = fmt.Errorf("failed to expand labels: %w", err)
return nil, curState.Err
}
h := hash(ls.processed)
if _, ok := updated[h]; ok {
for i, m := range res.Data {
labels, annotations := expandedLabels[i], expandedAnnotations[i]
alertID := hash(labels.processed)
if _, ok := updated[alertID]; ok {
// duplicate may be caused the removal of `__name__` label
curState.Err = fmt.Errorf("labels %v: %w", ls.processed, errDuplicate)
curState.Err = fmt.Errorf("labels %v: %w", labels.processed, errDuplicate)
return nil, curState.Err
}
updated[h] = struct{}{}
if a, ok := ar.alerts[h]; ok {
updated[alertID] = struct{}{}
if a, ok := ar.alerts[alertID]; ok {
if a.State == notifier.StateInactive {
// alert could be in inactive state for resolvedRetention
// so when we again receive metrics for it - we switch it
@@ -438,23 +440,17 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
ar.logDebugf(ts, a, "INACTIVE => PENDING")
}
a.Value = m.Values[0]
// re-exec template since Value or query can be used in annotations
a.Annotations, err = a.ExecTemplate(qFn, ls.origin, ar.Annotations)
a.Annotations = annotations
if err != nil {
return nil, err
}
a.KeepFiringSince = time.Time{}
continue
}
a, err := ar.newAlert(m, ls, start, qFn)
if err != nil {
curState.Err = fmt.Errorf("failed to create alert: %w", err)
return nil, curState.Err
}
a.ID = h
a := ar.newAlert(m, ts, labels.processed, annotations)
a.ID = alertID
a.State = notifier.StatePending
a.ActiveAt = ts
ar.alerts[h] = a
ar.alerts[alertID] = a
ar.logDebugf(ts, a, "created in state PENDING")
}
var numActivePending int
@@ -479,7 +475,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
}
// alerts with ar.KeepFiringFor>0 may remain FIRING
// even if their expression isn't true anymore
if ts.Sub(a.KeepFiringSince) > ar.KeepFiringFor {
if ts.Sub(a.KeepFiringSince) >= ar.KeepFiringFor {
a.State = notifier.StateInactive
a.ResolvedAt = ts
ar.logDebugf(ts, a, "FIRING => INACTIVE: is absent in current evaluation round")
@@ -504,6 +500,28 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
return ar.toTimeSeries(ts.Unix()), nil
}
func (ar *AlertingRule) expandTemplates(m datasource.Metric, qFn templates.QueryFn, ts time.Time) (*labelSet, map[string]string, error) {
ls, err := ar.toLabels(m, qFn)
if err != nil {
return nil, nil, fmt.Errorf("failed to expand labels: %w", err)
}
tplData := notifier.AlertTplData{
Value: m.Values[0],
Labels: ls.origin,
Expr: ar.Expr,
AlertID: hash(ls.processed),
GroupID: ar.GroupID,
ActiveAt: ts,
For: ar.For,
}
as, err := notifier.ExecTemplate(qFn, ar.Annotations, tplData)
if err != nil {
return nil, nil, fmt.Errorf("failed to template annotations: %w", err)
}
return ls, as, nil
}
func (ar *AlertingRule) toTimeSeries(timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
for _, a := range ar.alerts {
@@ -537,31 +555,31 @@ func hash(labels map[string]string) uint64 {
return hash.Sum64()
}
func (ar *AlertingRule) newAlert(m datasource.Metric, ls *labelSet, start time.Time, qFn templates.QueryFn) (*notifier.Alert, error) {
var err error
if ls == nil {
ls, err = ar.toLabels(m, qFn)
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %w", err)
}
func (ar *AlertingRule) newAlert(m datasource.Metric, start time.Time, labels, annotations map[string]string) *notifier.Alert {
as := make(map[string]string)
if annotations != nil {
as = annotations
}
a := &notifier.Alert{
GroupID: ar.GroupID,
Name: ar.Name,
Labels: ls.processed,
Value: m.Values[0],
ActiveAt: start,
Expr: ar.Expr,
For: ar.For,
ls := make(map[string]string)
if labels != nil {
ls = labels
}
return &notifier.Alert{
GroupID: ar.GroupID,
Name: ar.Name,
Expr: ar.Expr,
For: ar.For,
ActiveAt: start,
Value: m.Values[0],
Labels: ls,
Annotations: as,
}
a.Annotations, err = a.ExecTemplate(qFn, ls.origin, ar.Annotations)
return a, err
}
const (
// alertMetricName is the metric name for synthetic alert timeseries.
// alertMetricName is the metric name for time series reflecting the alert state.
alertMetricName = "ALERTS"
// alertForStateMetricName is the metric name for 'for' state of alert.
// alertForStateMetricName is the metric name for time series reflecting the moment of time when alert became active.
alertForStateMetricName = "ALERTS_FOR_STATE"
// alertNameLabel is the label name indicating the name of an alert.
@@ -576,12 +594,10 @@ const (
// alertToTimeSeries converts the given alert with the given timestamp to time series
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, alertToTimeSeries(a, timestamp))
if ar.For > 0 {
tss = append(tss, alertForToTimeSeries(a, timestamp))
return []prompbmarshal.TimeSeries{
alertToTimeSeries(a, timestamp),
alertForToTimeSeries(a, timestamp),
}
return tss
}
func alertToTimeSeries(a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {
@@ -613,9 +629,6 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
return nil
}
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
if len(ar.alerts) < 1 {
return nil
}
@@ -640,6 +653,10 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
ar.logDebugf(ts, nil, "no response was received from restore query")
return nil
}
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
for _, series := range res.Data {
series.DelLabel("__name__")
labelSet := make(map[string]string, len(series.Labels))
@@ -664,15 +681,19 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
// alertsToSend walks through the current alerts of AlertingRule
// and returns only those which should be sent to notifier.
// Isn't concurrent safe.
func (ar *AlertingRule) alertsToSend(ts time.Time, resolveDuration, resendDelay time.Duration) []notifier.Alert {
func (ar *AlertingRule) alertsToSend(resolveDuration, resendDelay time.Duration) []notifier.Alert {
currentTime := time.Now()
needsSending := func(a *notifier.Alert) bool {
if a.State == notifier.StatePending {
return false
}
if a.ResolvedAt.After(a.LastSent) {
if a.State == notifier.StateFiring && a.End.Before(a.LastSent) {
return true
}
return a.LastSent.Add(resendDelay).Before(ts)
if a.State == notifier.StateInactive && a.ResolvedAt.After(a.LastSent) {
return true
}
return a.LastSent.Add(resendDelay).Before(currentTime)
}
var alerts []notifier.Alert
@@ -680,11 +701,11 @@ func (ar *AlertingRule) alertsToSend(ts time.Time, resolveDuration, resendDelay
if !needsSending(a) {
continue
}
a.End = ts.Add(resolveDuration)
a.End = currentTime.Add(resolveDuration)
if a.State == notifier.StateInactive {
a.End = a.ResolvedAt
}
a.LastSent = ts
a.LastSent = currentTime
alerts = append(alerts, *a)
}
return alerts

View File

@@ -28,20 +28,28 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
}{
{
newTestAlertingRule("instant", 0),
&notifier.Alert{State: notifier.StateFiring},
&notifier.Alert{State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
}),
},
},
{
newTestAlertingRule("instant extra labels", 0),
&notifier.Alert{State: notifier.StateFiring, Labels: map[string]string{
"job": "foo",
"instance": "bar",
}},
&notifier.Alert{
State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second),
Labels: map[string]string{
"job": "foo",
"instance": "bar",
},
},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
@@ -49,19 +57,35 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"job": "foo",
"instance": "bar",
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
"job": "foo",
"instance": "bar",
}),
},
},
{
newTestAlertingRule("instant labels override", 0),
&notifier.Alert{State: notifier.StateFiring, Labels: map[string]string{
alertStateLabel: "foo",
"__name__": "bar",
}},
&notifier.Alert{
State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second),
Labels: map[string]string{
alertStateLabel: "foo",
"__name__": "bar",
},
},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertStateLabel: "foo",
}),
},
},
{
@@ -308,14 +332,17 @@ func TestAlertingRule_Exec(t *testing.T) {
fq := &datasource.FakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
ts := time.Now()
for i, step := range tc.steps {
fq.Reset()
fq.Add(step...)
if _, err := tc.rule.exec(context.TODO(), time.Now(), 0); err != nil {
if _, err := tc.rule.exec(context.TODO(), ts, 0); err != nil {
t.Fatalf("unexpected err: %s", err)
}
// artificial delay between applying steps
time.Sleep(defaultStep)
// shift the execution timestamp before the next iteration
ts = ts.Add(defaultStep)
if _, ok := tc.expAlerts[i]; !ok {
continue
}
@@ -367,7 +394,7 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{Values: []float64{1}, Timestamps: []int64{1}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
},
nil,
},
@@ -378,8 +405,9 @@ func TestAlertingRule_ExecRange(t *testing.T) {
},
[]*notifier.Alert{
{
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
ActiveAt: time.Unix(1, 0),
},
},
nil,
@@ -390,9 +418,9 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{Values: []float64{1, 1, 1}, Timestamps: []int64{1e3, 2e3, 3e3}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring, ActiveAt: time.Unix(1e3, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(2e3, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(3e3, 0)},
},
nil,
},
@@ -460,6 +488,20 @@ func TestAlertingRule_ExecRange(t *testing.T) {
For: time.Second,
}},
},
{
newTestAlertingRuleWithEvalInterval("firing=>inactive=>inactive=>firing=>firing", 0, time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1, 1}, Timestamps: []int64{1, 4, 5, 6}},
},
[]*notifier.Alert{
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
// It is expected for ActiveAT to remain the same while rule continues to fire in each iteration
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
},
nil,
},
{
newTestAlertingRule("for=>pending=>firing=>pending=>firing=>pending", time.Second),
[]datasource.Metric{
@@ -534,21 +576,33 @@ func TestAlertingRule_ExecRange(t *testing.T) {
},
},
[]*notifier.Alert{
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{
State: notifier.StateFiring, ActiveAt: time.Unix(1, 0),
Labels: map[string]string{
"source": "vm",
},
},
{
State: notifier.StateFiring, ActiveAt: time.Unix(100, 0),
Labels: map[string]string{
"source": "vm",
},
},
//
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{
State: notifier.StateFiring, ActiveAt: time.Unix(1, 0),
Labels: map[string]string{
"foo": "bar",
"source": "vm",
},
},
{
State: notifier.StateFiring, ActiveAt: time.Unix(5, 0),
Labels: map[string]string{
"foo": "bar",
"source": "vm",
},
},
},
nil,
},
@@ -1000,7 +1054,7 @@ func TestAlertsToSend(t *testing.T) {
for i, a := range alerts {
ar.alerts[uint64(i)] = a
}
gotAlerts := ar.alertsToSend(ts, resolveDuration, resendDelay)
gotAlerts := ar.alertsToSend(resolveDuration, resendDelay)
if gotAlerts == nil && expAlerts == nil {
return
}
@@ -1016,60 +1070,36 @@ func TestAlertsToSend(t *testing.T) {
})
for i, exp := range expAlerts {
got := gotAlerts[i]
if got.LastSent != exp.LastSent {
t.Fatalf("expected LastSent to be %v; got %v", exp.LastSent, got.LastSent)
}
if got.End != exp.End {
t.Fatalf("expected End to be %v; got %v", exp.End, got.End)
if got.Name != exp.Name {
t.Fatalf("expected Name to be %v; got %v", exp.Name, got.Name)
}
}
}
f( // send firing alert with custom resolve time
[]*notifier.Alert{{State: notifier.StateFiring}},
[]*notifier.Alert{{LastSent: ts, End: ts.Add(5 * time.Minute)}},
f( // check if firing alerts need to be sent with non-zero resendDelay
[]*notifier.Alert{
{Name: "a", State: notifier.StateFiring, Start: ts},
// no need to resend firing
{Name: "b", State: notifier.StateFiring, Start: ts, LastSent: ts.Add(-30 * time.Second), End: ts.Add(5 * time.Minute)},
// last message is for resolved, send firing message this time
{Name: "c", State: notifier.StateFiring, Start: ts, LastSent: ts.Add(-30 * time.Second), End: ts.Add(-1 * time.Minute)},
// resend firing
{Name: "d", State: notifier.StateFiring, Start: ts, LastSent: ts.Add(-1 * time.Minute)},
},
[]*notifier.Alert{{Name: "a"}, {Name: "c"}, {Name: "d"}},
5*time.Minute, time.Minute,
)
f( // resolve inactive alert at the current timestamp
[]*notifier.Alert{{State: notifier.StateInactive, ResolvedAt: ts}},
[]*notifier.Alert{{LastSent: ts, End: ts}},
time.Minute, time.Minute,
)
f( // mixed case of firing and resolved alerts. Names are added for deterministic sorting
[]*notifier.Alert{{Name: "a", State: notifier.StateFiring}, {Name: "b", State: notifier.StateInactive, ResolvedAt: ts}},
[]*notifier.Alert{{Name: "a", LastSent: ts, End: ts.Add(5 * time.Minute)}, {Name: "b", LastSent: ts, End: ts}},
f( // check if resolved alerts need to be sent with non-zero resendDelay
[]*notifier.Alert{
{Name: "a", State: notifier.StateInactive, ResolvedAt: ts, LastSent: ts.Add(-30 * time.Second)},
// no need to resend resolved
{Name: "b", State: notifier.StateInactive, ResolvedAt: ts, LastSent: ts},
// resend resolved
{Name: "c", State: notifier.StateInactive, ResolvedAt: ts.Add(-1 * time.Minute), LastSent: ts.Add(-1 * time.Minute)},
},
[]*notifier.Alert{{Name: "a"}, {Name: "c"}},
5*time.Minute, time.Minute,
)
f( // mixed case of pending and resolved alerts. Names are added for deterministic sorting
[]*notifier.Alert{{Name: "a", State: notifier.StatePending}, {Name: "b", State: notifier.StateInactive, ResolvedAt: ts}},
[]*notifier.Alert{{Name: "b", LastSent: ts, End: ts}},
5*time.Minute, time.Minute,
)
f( // attempt to send alert that was already sent in the resendDelay interval
[]*notifier.Alert{{State: notifier.StateFiring, LastSent: ts.Add(-time.Second)}},
nil,
time.Minute, time.Minute,
)
f( // attempt to send alert that was sent out of the resendDelay interval
[]*notifier.Alert{{State: notifier.StateFiring, LastSent: ts.Add(-2 * time.Minute)}},
[]*notifier.Alert{{LastSent: ts, End: ts.Add(time.Minute)}},
time.Minute, time.Minute,
)
f( // alert must be sent even if resendDelay interval is 0
[]*notifier.Alert{{State: notifier.StateFiring, LastSent: ts.Add(-time.Second)}},
[]*notifier.Alert{{LastSent: ts, End: ts.Add(time.Minute)}},
time.Minute, 0,
)
f( // inactive alert which has been sent already
[]*notifier.Alert{{State: notifier.StateInactive, LastSent: ts.Add(-time.Second), ResolvedAt: ts.Add(-2 * time.Second)}},
nil,
time.Minute, time.Minute,
)
f( // inactive alert which has been resolved after last send
[]*notifier.Alert{{State: notifier.StateInactive, LastSent: ts.Add(-time.Second), ResolvedAt: ts}},
[]*notifier.Alert{{LastSent: ts, End: ts}},
time.Minute, time.Minute,
)
}
func newTestRuleWithLabels(name string, labels ...string) *AlertingRule {
@@ -1095,6 +1125,12 @@ func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
return &rule
}
func newTestAlertingRuleWithEvalInterval(name string, waitFor, evalInterval time.Duration) *AlertingRule {
rule := newTestAlertingRule(name, waitFor)
rule.EvalInterval = evalInterval
return rule
}
func newTestAlertingRuleWithKeepFiring(name string, waitFor, keepFiringFor time.Duration) *AlertingRule {
rule := newTestAlertingRule(name, waitFor)
rule.KeepFiringFor = keepFiringFor

View File

@@ -9,10 +9,11 @@ import (
"hash/fnv"
"net/url"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/cheggaaa/pb/v3"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
@@ -704,7 +705,7 @@ func (e *executor) exec(ctx context.Context, r Rule, ts time.Time, resolveDurati
return nil
}
alerts := ar.alertsToSend(ts, resolveDuration, *resendDelay)
alerts := ar.alertsToSend(resolveDuration, *resendDelay)
if len(alerts) < 1 {
return nil
}
@@ -724,13 +725,19 @@ func (e *executor) exec(ctx context.Context, r Rule, ts time.Time, resolveDurati
return errGr.Err()
}
// getStaledSeries checks whether there are stale series from previously sent ones.
var bbPool bytesutil.ByteBufferPool
// getStaleSeries checks whether there are stale series from previously sent ones.
func (e *executor) getStaleSeries(r Rule, tss []prompbmarshal.TimeSeries, timestamp time.Time) []prompbmarshal.TimeSeries {
bb := bbPool.Get()
defer bbPool.Put(bb)
ruleLabels := make(map[string][]prompbmarshal.Label, len(tss))
for _, ts := range tss {
// convert labels to strings so we can compare with previously sent series
key := labelsToString(ts.Labels)
ruleLabels[key] = ts.Labels
// convert labels to strings, so we can compare with previously sent series
bb.B = labelsToString(bb.B, ts.Labels)
ruleLabels[string(bb.B)] = ts.Labels
bb.Reset()
}
rID := r.ID()
@@ -776,21 +783,20 @@ func (e *executor) purgeStaleSeries(activeRules []Rule) {
e.previouslySentSeriesToRWMu.Unlock()
}
func labelsToString(labels []prompbmarshal.Label) string {
var b strings.Builder
b.WriteRune('{')
func labelsToString(dst []byte, labels []prompbmarshal.Label) []byte {
dst = append(dst, '{')
for i, label := range labels {
if len(label.Name) == 0 {
b.WriteString("__name__")
dst = append(dst, "__name__"...)
} else {
b.WriteString(label.Name)
dst = append(dst, label.Name...)
}
b.WriteRune('=')
b.WriteString(strconv.Quote(label.Value))
dst = append(dst, '=')
dst = strconv.AppendQuote(dst, label.Value)
if i < len(labels)-1 {
b.WriteRune(',')
dst = append(dst, ',')
}
}
b.WriteRune('}')
return b.String()
dst = append(dst, '}')
return dst
}

View File

@@ -217,20 +217,21 @@ func TestGroupStart(t *testing.T) {
const evalInterval = time.Millisecond
g := NewGroup(groups[0], fs, evalInterval, map[string]string{"cluster": "east-1"})
g.Concurrency = 2
const inst1, inst2, job = "foo", "bar", "baz"
m1 := metricWithLabels(t, "instance", inst1, "job", job)
m2 := metricWithLabels(t, "instance", inst2, "job", job)
r := g.Rules[0].(*AlertingRule)
alert1, err := r.newAlert(m1, nil, time.Now(), nil)
if err != nil {
t.Fatalf("faield to create alert: %s", err)
}
alert1 := r.newAlert(m1, time.Now(), nil, nil)
alert1.State = notifier.StateFiring
// add annotations
alert1.Annotations["summary"] = "1"
// add external label
alert1.Labels["cluster"] = "east-1"
// add labels from response
alert1.Labels["job"] = job
alert1.Labels["instance"] = inst1
// add rule labels
alert1.Labels["label"] = "bar"
alert1.Labels["host"] = inst1
@@ -239,13 +240,15 @@ func TestGroupStart(t *testing.T) {
alert1.Labels[alertGroupNameLabel] = g.Name
alert1.ID = hash(alert1.Labels)
alert2, err := r.newAlert(m2, nil, time.Now(), nil)
if err != nil {
t.Fatalf("faield to create alert: %s", err)
}
alert2 := r.newAlert(m2, time.Now(), nil, nil)
alert2.State = notifier.StateFiring
// add annotations
alert2.Annotations["summary"] = "1"
// add external label
alert2.Labels["cluster"] = "east-1"
// add labels from response
alert2.Labels["job"] = job
alert2.Labels["instance"] = inst2
// add rule labels
alert2.Labels["label"] = "bar"
alert2.Labels["host"] = inst2
@@ -262,8 +265,25 @@ func TestGroupStart(t *testing.T) {
close(finished)
}()
// wait for multiple evals
time.Sleep(20 * evalInterval)
waitForIterations := func(n int, interval time.Duration) {
t.Helper()
var cur uint64
prev := g.metrics.iterationTotal.Get()
for i := 0; ; i++ {
if i > 40 {
t.Fatalf("group wasn't able to perform %d evaluations during %d eval intervals", n, i)
}
cur = g.metrics.iterationTotal.Get()
if int(cur-prev) >= n {
return
}
time.Sleep(interval)
}
}
// wait for multiple evaluation iterations
waitForIterations(4, evalInterval)
gotAlerts := fn.GetAlerts()
expectedAlerts := []notifier.Alert{*alert1, *alert2}
@@ -280,8 +300,8 @@ func TestGroupStart(t *testing.T) {
// and set only one datapoint for response
fs.Add(m1)
// wait for multiple evals
time.Sleep(20 * evalInterval)
// wait for multiple evaluation iterations
waitForIterations(4, evalInterval)
gotAlerts = fn.GetAlerts()
alert2.State = notifier.StateInactive

View File

@@ -0,0 +1,36 @@
package rule
import (
"fmt"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func BenchmarkGetStaleSeries(b *testing.B) {
ts := time.Now()
n := 100
payload := make([]prompbmarshal.TimeSeries, n)
for i := 0; i < n; i++ {
s := fmt.Sprintf("%d", i)
labels := toPromLabels(b,
"__name__", "foo", ""+
"instance", s,
"job", s,
"state", s,
)
payload = append(payload, newTimeSeriesPB([]float64{1}, []int64{ts.Unix()}, labels))
}
e := &executor{
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
ar := &AlertingRule{RuleID: 1}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
e.getStaleSeries(ar, payload, ts)
}
}

View File

@@ -32,7 +32,7 @@ type Rule interface {
close()
}
var errDuplicate = errors.New("result contains metrics with the same labelset after applying rule labels. See https://docs.victoriametrics.com/vmalert.html#series-with-the-same-labelset for details")
var errDuplicate = errors.New("result contains metrics with the same labelset during evaluation. See https://docs.victoriametrics.com/vmalert/#series-with-the-same-labelset for details")
type ruleState struct {
sync.RWMutex

View File

@@ -95,7 +95,7 @@ func metricWithLabels(t *testing.T, labels ...string) datasource.Metric {
return m
}
func toPromLabels(t *testing.T, labels ...string) []prompbmarshal.Label {
func toPromLabels(t testing.TB, labels ...string) []prompbmarshal.Label {
t.Helper()
if len(labels) == 0 || len(labels)%2 != 0 {
t.Fatalf("expected to get even number of labels")

View File

@@ -1,11 +1,32 @@
function expandAll() {
$('.collapse').addClass('show');
$('.group-heading').each(function () {
let style = $(this).attr("style")
// display only elements that are currently visible
if (style === "display: none;") {
return
}
$(this).next().addClass('show')
});
}
function collapseAll() {
$('.collapse').removeClass('show');
}
function showByID(id) {
if (!id) {
return
}
let parent = $("#" + id).parent();
if (!parent) {
return
}
let target = $("#" + parent.attr("data-bs-target"));
if (target.length > 0) {
target.addClass('show');
}
}
function toggleByID(id) {
if (id) {
let el = $("#" + id);
@@ -15,6 +36,100 @@ function toggleByID(id) {
}
}
function debounce(func, delay) {
let timer;
return function (...args) {
clearTimeout(timer);
timer = setTimeout(() => {
func.apply(this, args);
}, delay);
};
}
$('#search').on("keyup", debounce(search, 500));
// search shows or hides groups&rules that satisfy the search phrase.
// case-insensitive, respects GET param `search`.
function search() {
$(".rule").show();
let groupHeader = $(".group-heading")
let searchPhrase = $("#search").val().toLowerCase()
if (searchPhrase.length === 0) {
groupHeader.show()
setParamURL('search', '')
return
}
$(".rule-table").removeClass('show');
groupHeader.hide()
searchPhrase = searchPhrase.toLowerCase()
filterRuleByName(searchPhrase);
filterRuleByLabels(searchPhrase);
filterGroupsByName(searchPhrase);
setParamURL('search', searchPhrase)
}
function setParamURL(key, value) {
let url = new URL(location.href)
url.searchParams.set(key, value);
window.history.replaceState(null, null, `?${url.searchParams.toString()}${url.hash}`);
}
function getParamURL(key) {
let url = new URL(location.href)
return url.searchParams.get(key)
}
function filterGroupsByName(searchPhrase) {
$(".group-heading").each(function () {
const groupName = $(this).attr('data-group-name').toLowerCase();
const hasValue = groupName.indexOf(searchPhrase) >= 0
if (!hasValue) {
return
}
const target = $(this).attr("data-bs-target");
$(`div[id="${target}"] .rule`).show();
$(this).show();
});
}
function filterRuleByName(searchPhrase) {
$(".rule").each(function () {
const ruleName = $(this).attr("data-rule-name").toLowerCase();
const hasValue = ruleName.indexOf(searchPhrase) >= 0
if (!hasValue) {
$(this).hide();
return
}
const target = $(this).attr('data-bs-target')
$(`#rules-${target}`).addClass('show');
$(`div[data-bs-target='rules-${target}']`).show();
$(this).show();
});
}
function filterRuleByLabels(searchPhrase) {
$(".rule").each(function () {
const matches = $(".label", this).filter(function () {
const label = $(this).text().toLowerCase();
return label.indexOf(searchPhrase) >= 0;
}).length;
if (matches > 0) {
const target = $(this).attr('data-bs-target')
$(`#rules-${target}`).addClass('show');
$(`div[data-bs-target='rules-${target}']`).show();
$(this).show();
}
});
}
$(document).ready(function () {
$(".group-heading a").click(function (e) {
e.stopPropagation(); // prevent collapse logic on link click
@@ -32,8 +147,15 @@ $(document).ready(function () {
});
});
// update search element with value from URL, if any
let searchPhrase = getParamURL('search')
$("#search").val(searchPhrase)
// apply filtering by search phrase
search()
let hash = window.location.hash.substr(1);
toggleByID(hash);
showByID(hash);
});
$(document).ready(function () {

View File

@@ -476,7 +476,7 @@ func templateFuncs() textTpl.FuncMap {
// For example, {{ query "foo" | first | value }} will
// execute "/api/v1/query?query=foo" request and will return
// the first value in response.
"query": func(q string) ([]metric, error) {
"query": func(_ string) ([]metric, error) {
// query function supposed to be substituted at FuncsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.

View File

@@ -12,11 +12,15 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/rule"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
var reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
var (
apiLinks = [][2]string{
// api links are relative since they can be used by external clients,
@@ -35,7 +39,7 @@ var (
{Name: "Groups", Url: "groups"},
{Name: "Alerts", Url: "alerts"},
{Name: "Notifiers", Url: "notifiers"},
{Name: "Docs", Url: "https://docs.victoriametrics.com/vmalert.html"},
{Name: "Docs", Url: "https://docs.victoriametrics.com/vmalert/"},
}
)
@@ -84,7 +88,10 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
WriteRuleDetails(w, r, rule)
return true
case "/vmalert/groups":
WriteListGroups(w, r, rh.groups())
var data []apiGroup
rf := extractRulesFilter(r)
data = rh.groups(rf)
WriteListGroups(w, r, data)
return true
case "/vmalert/notifiers":
WriteListTargets(w, r, notifier.GetTargets())
@@ -95,12 +102,20 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/rules":
// Grafana makes an extra request to `/rules`
// handler in addition to `/api/v1/rules` calls in alerts UI,
WriteListGroups(w, r, rh.groups())
var data []apiGroup
rf := extractRulesFilter(r)
data = rh.groups(rf)
WriteListGroups(w, r, data)
return true
case "/vmalert/api/v1/rules", "/api/v1/rules":
// path used by Grafana for ng alerting
data, err := rh.listGroups()
var data []byte
var err error
rf := extractRulesFilter(r)
data, err = rh.listGroups(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -108,6 +123,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
w.Header().Set("Content-Type", "application/json")
w.Write(data)
return true
case "/vmalert/api/v1/alerts", "/api/v1/alerts":
// path used by Grafana for ng alerting
data, err := rh.listAlerts()
@@ -151,6 +167,9 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
w.Write(data)
return true
case "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
logger.Infof("api config reload was called, sending sighup")
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
@@ -201,26 +220,94 @@ type listGroupsResponse struct {
} `json:"data"`
}
func (rh *requestHandler) groups() []apiGroup {
// see https://prometheus.io/docs/prometheus/latest/querying/api/#rules
type rulesFilter struct {
files []string
groupNames []string
ruleNames []string
ruleType string
excludeAlerts bool
}
func extractRulesFilter(r *http.Request) rulesFilter {
rf := rulesFilter{}
var ruleType string
ruleTypeParam := r.URL.Query().Get("type")
// for some reason, `type` in filter doesn't match `type` in response,
// so we use this matching here
if ruleTypeParam == "alert" {
ruleType = ruleTypeAlerting
} else if ruleTypeParam == "record" {
ruleType = ruleTypeRecording
}
rf.ruleType = ruleType
rf.excludeAlerts = httputils.GetBool(r, "exclude_alerts")
rf.ruleNames = append([]string{}, r.Form["rule_name[]"]...)
rf.groupNames = append([]string{}, r.Form["rule_group[]"]...)
rf.files = append([]string{}, r.Form["file[]"]...)
return rf
}
func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
groups := make([]apiGroup, 0)
for _, g := range rh.m.groups {
groups = append(groups, groupToAPI(g))
isInList := func(list []string, needle string) bool {
if len(list) < 1 {
return true
}
for _, i := range list {
if i == needle {
return true
}
}
return false
}
// sort list of alerts for deterministic output
sort.Slice(groups, func(i, j int) bool {
return groups[i].Name < groups[j].Name
})
groups := make([]apiGroup, 0)
for _, group := range rh.m.groups {
if !isInList(rf.groupNames, group.Name) {
continue
}
if !isInList(rf.files, group.File) {
continue
}
g := groupToAPI(group)
// the returned list should always be non-nil
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221
filteredRules := make([]apiRule, 0)
for _, r := range g.Rules {
if rf.ruleType != "" && rf.ruleType != r.Type {
continue
}
if !isInList(rf.ruleNames, r.Name) {
continue
}
if rf.excludeAlerts {
r.Alerts = nil
}
filteredRules = append(filteredRules, r)
}
g.Rules = filteredRules
groups = append(groups, g)
}
// sort list of groups for deterministic output
sort.Slice(groups, func(i, j int) bool {
a, b := groups[i], groups[j]
if a.Name != b.Name {
return a.Name < b.Name
}
return a.File < b.File
})
return groups
}
func (rh *requestHandler) listGroups() ([]byte, error) {
func (rh *requestHandler) listGroups(rf rulesFilter) ([]byte, error) {
lr := listGroupsResponse{Status: "success"}
lr.Data.Groups = rh.groups()
lr.Data.Groups = rh.groups(rf)
b, err := json.Marshal(lr)
if err != nil {
return nil, &httpserver.ErrorWithStatusCode{

View File

@@ -70,15 +70,29 @@ btn-primary
}
}
%}
<a class="btn {%= buttonActive(filter, "") %}" role="button" onclick="window.location = window.location.pathname">All</a>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<a class="btn {%= buttonActive(filter, "unhealthy") %}" role="button" onclick="location.href='?filter=unhealthy'" title="Show only rules with errors">Unhealthy</a>
<a class="btn {%= buttonActive(filter, "noMatch") %}" role="button" onclick="location.href='?filter=noMatch'" title="Show only rules matching no time series during last evaluation">NoMatch</a>
<div class="btn-toolbar mb-3" role="toolbar">
<div>
<a class="btn {%= buttonActive(filter, "") %}" role="button" onclick="window.location = window.location.pathname">All</a>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<a class="btn {%= buttonActive(filter, "unhealthy") %}" role="button" onclick="location.href='?filter=unhealthy'" title="Show only rules with errors">Unhealthy</a>
<a class="btn {%= buttonActive(filter, "noMatch") %}" role="button" onclick="location.href='?filter=noMatch'" title="Show only rules matching no time series during last evaluation">NoMatch</a>
</div>
<div class="col-md-4 col-lg-5">
<div class="px-3 input-group">
<div class="input-group-prepend">
<span class="input-group-text">
<svg fill="#000000" height="25px" width="20px" version="1.1" id="Capa_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 490.4 490.4" xml:space="preserve"><g id="SVGRepo_bgCarrier" stroke-width="0"></g><g id="SVGRepo_tracerCarrier" stroke-linecap="round" stroke-linejoin="round"></g><g id="SVGRepo_iconCarrier"> <g> <path d="M484.1,454.796l-110.5-110.6c29.8-36.3,47.6-82.8,47.6-133.4c0-116.3-94.3-210.6-210.6-210.6S0,94.496,0,210.796 s94.3,210.6,210.6,210.6c50.8,0,97.4-18,133.8-48l110.5,110.5c12.9,11.8,25,4.2,29.2,0C492.5,475.596,492.5,463.096,484.1,454.796z M41.1,210.796c0-93.6,75.9-169.5,169.5-169.5s169.6,75.9,169.6,169.5s-75.9,169.5-169.5,169.5S41.1,304.396,41.1,210.796z"></path> </g> </g></svg>
</span>
</div>
<input id="search" placeholder="Filter by group, rule or labels" type="text" class="form-control"/>
</div>
</div>
</div>
{% if len(groups) > 0 %}
{% for _, g := range groups %}
<div
class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{%endif%}" data-bs-target="rules-{%s g.ID %}">
class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{%endif%}" data-bs-target="rules-{%s g.ID %}" data-group-name="{%s g.Name %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s) #</a>
{% if rNotOk[g.ID] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.ID] %}</span> {% endif %}
@@ -100,7 +114,7 @@ btn-primary
</div>
{% endif %}
</div>
<div class="collapse" id="rules-{%s g.ID %}">
<div class="collapse rule-table" id="rules-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
@@ -111,7 +125,7 @@ btn-primary
</thead>
<tbody>
{% for _, r := range g.Rules %}
<tr{% if r.LastError != "" %} class="alert-danger"{% endif %}>
<tr class="rule{% if r.LastError != "" %} alert-danger{% endif %}" data-rule-name="{%s r.Name %}" data-bs-target="{%s g.ID %}">
<td>
<div class="row">
<div class="col-12 mb-2">
@@ -134,7 +148,7 @@ btn-primary
<div class="col-12 mb-2">
{% if len(r.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range r.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
<span class="ms-1 badge bg-primary label">{%s k %}={%s v %}</span>
{% endfor %}
</div>
{% if r.LastError != "" %}
@@ -170,11 +184,25 @@ btn-primary
{%code prefix := utils.Prefix(r.URL.Path) %}
{%= tpl.Header(r, navItems, "Alerts", getLastConfigError()) %}
{% if len(groupAlerts) > 0 %}
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<div class="btn-toolbar mb-3" role="toolbar">
<div>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
</div>
<div class="col-md-4 col-lg-5">
<div class="px-3 input-group">
<div class="input-group-prepend">
<span class="input-group-text">
<svg fill="#000000" height="25px" width="20px" version="1.1" id="Capa_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 490.4 490.4" xml:space="preserve"><g id="SVGRepo_bgCarrier" stroke-width="0"></g><g id="SVGRepo_tracerCarrier" stroke-linecap="round" stroke-linejoin="round"></g><g id="SVGRepo_iconCarrier"> <g> <path d="M484.1,454.796l-110.5-110.6c29.8-36.3,47.6-82.8,47.6-133.4c0-116.3-94.3-210.6-210.6-210.6S0,94.496,0,210.796 s94.3,210.6,210.6,210.6c50.8,0,97.4-18,133.8-48l110.5,110.5c12.9,11.8,25,4.2,29.2,0C492.5,475.596,492.5,463.096,484.1,454.796z M41.1,210.796c0-93.6,75.9-169.5,169.5-169.5s169.6,75.9,169.6,169.5s-75.9,169.5-169.5,169.5S41.1,304.396,41.1,210.796z"></path> </g> </g></svg>
</span>
</div>
<input id="search" placeholder="Filter by group, rule or labels" type="text" class="form-control"/>
</div>
</div>
</div>
{% for _, ga := range groupAlerts %}
{%code g := ga.Group %}
<div class="group-heading alert-danger" data-bs-target="rules-{%s g.ID %}">
<div class="group-heading alert-danger" data-bs-target="rules-{%s g.ID %}" data-group-name="{%s g.Name %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %}</a>
<span class="badge bg-danger" title="Number of active alerts">{%d len(ga.Alerts) %}</span>
@@ -192,7 +220,7 @@ btn-primary
}
sort.Strings(keys)
%}
<div class="collapse" id="rules-{%s g.ID %}">
<div class="collapse rule-table" id="rules-{%s g.ID %}">
{% for _, ruleID := range keys %}
{%code
defaultAR := alertsByRule[ruleID][0]
@@ -203,45 +231,46 @@ btn-primary
sort.Strings(labelKeys)
%}
<br>
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})
| <span><a target="_blank" href="{%s defaultAR.SourceLink %}">Source</a></span>
<br>
<b>expr:</b><code><pre>{%s defaultAR.Expression %}</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
{% for _, ar := range alertsByRule[ruleID] %}
<tr>
<td>
{% for _, k := range labelKeys %}
<span class="ms-1 badge bg-primary">{%s k %}={%s ar.Labels[k] %}</span>
{% endfor %}
</td>
<td>{%= badgeState(ar.State) %}</td>
<td>
{%s ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}
{% if ar.Restored %}{%= badgeRestored() %}{% endif %}
{% if ar.Stabilizing %}{%= badgeStabilizing() %}{% endif %}
</td>
<td>{%s ar.Value %}</td>
<td>
<a href="{%s prefix+ar.WebLink() %}">Details</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
<div class="rule" data-rule-name="{%s defaultAR.Name %}" data-bs-target="{%s g.ID %}">
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})
| <span><a target="_blank" href="{%s defaultAR.SourceLink %}">Source</a></span>
<br>
<b>expr:</b><code><pre>{%s defaultAR.Expression %}</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
{% for _, ar := range alertsByRule[ruleID] %}
<tr>
<td>
{% for _, k := range labelKeys %}
<span class="ms-1 badge bg-primary label">{%s k %}={%s ar.Labels[k] %}</span>
{% endfor %}
</td>
<td>{%= badgeState(ar.State) %}</td>
<td>
{%s ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}
{% if ar.Restored %}{%= badgeRestored() %}{% endif %}
{% if ar.Stabilizing %}{%= badgeStabilizing() %}{% endif %}
</td>
<td>{%s ar.Value %}</td>
<td>
<a href="{%s prefix+ar.WebLink() %}">Details</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endfor %}
</div>
<br>
{% endfor %}
{% else %}

File diff suppressed because it is too large Load Diff

View File

@@ -23,6 +23,7 @@ func TestHandler(t *testing.T) {
})
g := &rule.Group{
Name: "group",
File: "rules.yaml",
Concurrency: 1,
}
ar := rule.NewAlertingRule(fq, g, config.Rule{ID: 0, Alert: "alert"})
@@ -35,7 +36,7 @@ func TestHandler(t *testing.T) {
}}
rh := &requestHandler{m: m}
getResp := func(url string, to interface{}, code int) {
getResp := func(t *testing.T, url string, to interface{}, code int) {
t.Helper()
resp, err := http.Get(url)
if err != nil {
@@ -59,43 +60,43 @@ func TestHandler(t *testing.T) {
defer ts.Close()
t.Run("/", func(t *testing.T) {
getResp(ts.URL, nil, 200)
getResp(ts.URL+"/vmalert", nil, 200)
getResp(ts.URL+"/vmalert/alerts", nil, 200)
getResp(ts.URL+"/vmalert/groups", nil, 200)
getResp(ts.URL+"/vmalert/notifiers", nil, 200)
getResp(ts.URL+"/rules", nil, 200)
getResp(t, ts.URL, nil, 200)
getResp(t, ts.URL+"/vmalert", nil, 200)
getResp(t, ts.URL+"/vmalert/alerts", nil, 200)
getResp(t, ts.URL+"/vmalert/groups", nil, 200)
getResp(t, ts.URL+"/vmalert/notifiers", nil, 200)
getResp(t, ts.URL+"/rules", nil, 200)
})
t.Run("/vmalert/rule", func(t *testing.T) {
a := ruleToAPI(ar)
getResp(ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
getResp(t, ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
r := ruleToAPI(rr)
getResp(ts.URL+"/vmalert/"+r.WebLink(), nil, 200)
getResp(t, ts.URL+"/vmalert/"+r.WebLink(), nil, 200)
})
t.Run("/vmalert/alert", func(t *testing.T) {
alerts := ruleToAPIAlert(ar)
for _, a := range alerts {
getResp(ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
getResp(t, ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
}
})
t.Run("/vmalert/rule?badParam", func(t *testing.T) {
params := fmt.Sprintf("?%s=0&%s=1", paramGroupID, paramRuleID)
getResp(ts.URL+"/vmalert/rule"+params, nil, 404)
getResp(t, ts.URL+"/vmalert/rule"+params, nil, 404)
params = fmt.Sprintf("?%s=1&%s=0", paramGroupID, paramRuleID)
getResp(ts.URL+"/vmalert/rule"+params, nil, 404)
getResp(t, ts.URL+"/vmalert/rule"+params, nil, 404)
})
t.Run("/api/v1/alerts", func(t *testing.T) {
lr := listAlertsResponse{}
getResp(ts.URL+"/api/v1/alerts", &lr, 200)
getResp(t, ts.URL+"/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Errorf("expected 1 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
getResp(t, ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Errorf("expected 1 alert got %d", length)
}
@@ -103,13 +104,13 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/alert?alertID&groupID", func(t *testing.T) {
expAlert := newAlertAPI(ar, ar.GetAlerts()[0])
alert := &apiAlert{}
getResp(ts.URL+"/"+expAlert.APILink(), alert, 200)
getResp(t, ts.URL+"/"+expAlert.APILink(), alert, 200)
if !reflect.DeepEqual(alert, expAlert) {
t.Errorf("expected %v is equal to %v", alert, expAlert)
}
alert = &apiAlert{}
getResp(ts.URL+"/vmalert/"+expAlert.APILink(), alert, 200)
getResp(t, ts.URL+"/vmalert/"+expAlert.APILink(), alert, 200)
if !reflect.DeepEqual(alert, expAlert) {
t.Errorf("expected %v is equal to %v", alert, expAlert)
}
@@ -117,28 +118,28 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/alert?badParams", func(t *testing.T) {
params := fmt.Sprintf("?%s=0&%s=1", paramGroupID, paramAlertID)
getResp(ts.URL+"/api/v1/alert"+params, nil, 404)
getResp(ts.URL+"/vmalert/api/v1/alert"+params, nil, 404)
getResp(t, ts.URL+"/api/v1/alert"+params, nil, 404)
getResp(t, ts.URL+"/vmalert/api/v1/alert"+params, nil, 404)
params = fmt.Sprintf("?%s=1&%s=0", paramGroupID, paramAlertID)
getResp(ts.URL+"/api/v1/alert"+params, nil, 404)
getResp(ts.URL+"/vmalert/api/v1/alert"+params, nil, 404)
getResp(t, ts.URL+"/api/v1/alert"+params, nil, 404)
getResp(t, ts.URL+"/vmalert/api/v1/alert"+params, nil, 404)
// bad request, alertID is missing
params = fmt.Sprintf("?%s=1", paramGroupID)
getResp(ts.URL+"/api/v1/alert"+params, nil, 400)
getResp(ts.URL+"/vmalert/api/v1/alert"+params, nil, 400)
getResp(t, ts.URL+"/api/v1/alert"+params, nil, 400)
getResp(t, ts.URL+"/vmalert/api/v1/alert"+params, nil, 400)
})
t.Run("/api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Errorf("expected 1 group got %d", length)
}
lr = listGroupsResponse{}
getResp(ts.URL+"/vmalert/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Errorf("expected 1 group got %d", length)
}
@@ -146,25 +147,93 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/rule?ruleID&groupID", func(t *testing.T) {
expRule := ruleToAPI(ar)
gotRule := apiRule{}
getResp(ts.URL+"/"+expRule.APILink(), &gotRule, 200)
getResp(t, ts.URL+"/"+expRule.APILink(), &gotRule, 200)
if expRule.ID != gotRule.ID {
t.Errorf("expected to get Rule %q; got %q instead", expRule.ID, gotRule.ID)
}
gotRule = apiRule{}
getResp(ts.URL+"/vmalert/"+expRule.APILink(), &gotRule, 200)
getResp(t, ts.URL+"/vmalert/"+expRule.APILink(), &gotRule, 200)
if expRule.ID != gotRule.ID {
t.Errorf("expected to get Rule %q; got %q instead", expRule.ID, gotRule.ID)
}
gotRuleWithUpdates := apiRuleWithUpdates{}
getResp(ts.URL+"/"+expRule.APILink(), &gotRuleWithUpdates, 200)
getResp(t, ts.URL+"/"+expRule.APILink(), &gotRuleWithUpdates, 200)
if gotRuleWithUpdates.StateUpdates == nil || len(gotRuleWithUpdates.StateUpdates) < 1 {
t.Fatalf("expected %+v to have state updates field not empty", gotRuleWithUpdates.StateUpdates)
}
})
t.Run("/api/v1/rules&filters", func(t *testing.T) {
check := func(url string, expGroups, expRules int) {
t.Helper()
lr := listGroupsResponse{}
getResp(t, ts.URL+url, &lr, 200)
if length := len(lr.Data.Groups); length != expGroups {
t.Errorf("expected %d groups got %d", expGroups, length)
}
if len(lr.Data.Groups) < 1 {
return
}
var rulesN int
for _, gr := range lr.Data.Groups {
rulesN += len(gr.Rules)
}
if rulesN != expRules {
t.Errorf("expected %d rules got %d", expRules, rulesN)
}
}
check("/api/v1/rules?type=alert", 1, 1)
check("/api/v1/rules?type=record", 1, 1)
check("/vmalert/api/v1/rules?type=alert", 1, 1)
check("/vmalert/api/v1/rules?type=record", 1, 1)
// no filtering expected due to bad params
check("/api/v1/rules?type=badParam", 1, 2)
check("/api/v1/rules?foo=bar", 1, 2)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 1, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 1, 1)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 1, 2)
})
t.Run("/api/v1/rules&exclude_alerts=true", func(t *testing.T) {
// check if response returns active alerts by default
lr := listGroupsResponse{}
getResp(t, ts.URL+"/api/v1/rules?rule_group[]=group&file[]=rules.yaml", &lr, 200)
activeAlerts := 0
for _, gr := range lr.Data.Groups {
for _, r := range gr.Rules {
activeAlerts += len(r.Alerts)
}
}
if activeAlerts == 0 {
t.Fatalf("expected at least 1 active alert in response; got 0")
}
// disable returning alerts via param
lr = listGroupsResponse{}
getResp(t, ts.URL+"/api/v1/rules?rule_group[]=group&file[]=rules.yaml&exclude_alerts=true", &lr, 200)
activeAlerts = 0
for _, gr := range lr.Data.Groups {
for _, r := range gr.Rules {
activeAlerts += len(r.Alerts)
}
}
if activeAlerts != 0 {
t.Fatalf("expected to get 0 active alert in response; got %d", activeAlerts)
}
})
}
func TestEmptyResponse(t *testing.T) {
@@ -172,7 +241,7 @@ func TestEmptyResponse(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { rhWithNoGroups.handler(w, r) }))
defer ts.Close()
getResp := func(url string, to interface{}, code int) {
getResp := func(t *testing.T, url string, to interface{}, code int) {
t.Helper()
resp, err := http.Get(url)
if err != nil {
@@ -195,13 +264,13 @@ func TestEmptyResponse(t *testing.T) {
t.Run("no groups /api/v1/alerts", func(t *testing.T) {
lr := listAlertsResponse{}
getResp(ts.URL+"/api/v1/alerts", &lr, 200)
getResp(t, ts.URL+"/api/v1/alerts", &lr, 200)
if lr.Data.Alerts == nil {
t.Errorf("expected /api/v1/alerts response to have non-nil data")
}
lr = listAlertsResponse{}
getResp(ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
getResp(t, ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
if lr.Data.Alerts == nil {
t.Errorf("expected /api/v1/alerts response to have non-nil data")
}
@@ -209,13 +278,13 @@ func TestEmptyResponse(t *testing.T) {
t.Run("no groups /api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Errorf("expected /api/v1/rules response to have non-nil data")
}
lr = listGroupsResponse{}
getResp(ts.URL+"/vmalert/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Errorf("expected /api/v1/rules response to have non-nil data")
}
@@ -226,13 +295,13 @@ func TestEmptyResponse(t *testing.T) {
t.Run("empty group /api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Fatalf("expected /api/v1/rules response to have non-nil data")
}
lr = listGroupsResponse{}
getResp(ts.URL+"/vmalert/api/v1/rules", &lr, 200)
getResp(t, ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Fatalf("expected /api/v1/rules response to have non-nil data")
}

View File

@@ -193,10 +193,15 @@ func ruleToAPI(r interface{}) apiRule {
return apiRule{}
}
const (
ruleTypeRecording = "recording"
ruleTypeAlerting = "alerting"
)
func recordingToAPI(rr *rule.RecordingRule) apiRule {
lastState := rule.GetLastEntry(rr)
r := apiRule{
Type: "recording",
Type: ruleTypeRecording,
DatasourceType: rr.Type.String(),
Name: rr.Name,
Query: rr.Expr,
@@ -224,7 +229,7 @@ func recordingToAPI(rr *rule.RecordingRule) apiRule {
func alertingToAPI(ar *rule.AlertingRule) apiRule {
lastState := rule.GetLastEntry(ar)
r := apiRule{
Type: "alerting",
Type: ruleTypeAlerting,
DatasourceType: ar.Type.String(),
Name: ar.Name,
Query: ar.Expr,

View File

@@ -87,6 +87,9 @@ vmauth-linux-ppc64le:
vmauth-linux-s390x:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmauth-linux-loong64:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
vmauth-linux-386:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -1,3 +1,3 @@
See vmauth docs [here](https://docs.victoriametrics.com/vmauth.html).
See vmauth docs [here](https://docs.victoriametrics.com/vmauth/).
vmauth docs can be edited at [docs/vmauth.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmauth.md).

View File

@@ -2,54 +2,70 @@ package main
import (
"bytes"
"context"
"encoding/base64"
"flag"
"fmt"
"math"
"net/http"
"net/url"
"os"
"regexp"
"strconv"
"sort"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs/fscore"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
var (
authConfigPath = flag.String("auth.config", "", "Path to auth config. It can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config")
"See https://docs.victoriametrics.com/vmauth/ for details on the format of this auth config")
configCheckInterval = flag.Duration("configCheckInterval", 0, "interval for config file re-read. "+
"Zero value disables config re-reading. By default, refreshing is disabled, send SIGHUP for config refresh.")
defaultRetryStatusCodes = flagutil.NewArrayInt("retryStatusCodes", 0, "Comma-separated list of default HTTP response status codes when vmauth re-tries the request on other backends. "+
"See https://docs.victoriametrics.com/vmauth.html#load-balancing for details")
"See https://docs.victoriametrics.com/vmauth/#load-balancing for details")
defaultLoadBalancingPolicy = flag.String("loadBalancingPolicy", "least_loaded", "The default load balancing policy to use for backend urls specified inside url_prefix section. "+
"Supported policies: least_loaded, first_available. See https://docs.victoriametrics.com/vmauth.html#load-balancing for more details")
"Supported policies: least_loaded, first_available. See https://docs.victoriametrics.com/vmauth/#load-balancing")
discoverBackendIPsGlobal = flag.Bool("discoverBackendIPs", false, "Whether to discover backend IPs via periodic DNS queries to hostnames specified in url_prefix. "+
"This may be useful when url_prefix points to a hostname with dynamically scaled instances behind it. See https://docs.victoriametrics.com/vmauth/#discovering-backend-ips")
discoverBackendIPsInterval = flag.Duration("discoverBackendIPsInterval", 10*time.Second, "The interval for re-discovering backend IPs if -discoverBackendIPs command-line flag is set. "+
"Too low value may lead to DNS errors")
httpAuthHeader = flagutil.NewArrayString("httpAuthHeader", "HTTP request header to use for obtaining authorization tokens. By default auth tokens are read from Authorization request header")
)
// AuthConfig represents auth config.
type AuthConfig struct {
Users []UserInfo `yaml:"users,omitempty"`
UnauthorizedUser *UserInfo `yaml:"unauthorized_user,omitempty"`
// ms holds all the metrics for the given AuthConfig
ms *metrics.Set
}
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
AuthToken string `yaml:"auth_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
DiscoverBackendIPs *bool `yaml:"discover_backend_ips,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
HeadersConf HeadersConf `yaml:",inline"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
@@ -57,22 +73,28 @@ type UserInfo struct {
RetryStatusCodes []int `yaml:"retry_status_codes,omitempty"`
LoadBalancingPolicy string `yaml:"load_balancing_policy,omitempty"`
DropSrcPathPrefixParts *int `yaml:"drop_src_path_prefix_parts,omitempty"`
TLSInsecureSkipVerify *bool `yaml:"tls_insecure_skip_verify,omitempty"`
TLSCAFile string `yaml:"tls_ca_file,omitempty"`
TLSCertFile string `yaml:"tls_cert_file,omitempty"`
TLSKeyFile string `yaml:"tls_key_file,omitempty"`
TLSServerName string `yaml:"tls_server_name,omitempty"`
TLSInsecureSkipVerify *bool `yaml:"tls_insecure_skip_verify,omitempty"`
MetricLabels map[string]string `yaml:"metric_labels,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
httpTransport *http.Transport
rt http.RoundTripper
requests *metrics.Counter
backendErrors *metrics.Counter
requestsDuration *metrics.Summary
}
// HeadersConf represents config for request and response headers.
type HeadersConf struct {
RequestHeaders []Header `yaml:"headers,omitempty"`
ResponseHeaders []Header `yaml:"response_headers,omitempty"`
RequestHeaders []*Header `yaml:"headers,omitempty"`
ResponseHeaders []*Header `yaml:"response_headers,omitempty"`
}
func (ui *UserInfo) beginConcurrencyLimit() error {
@@ -101,6 +123,8 @@ func (ui *UserInfo) getMaxConcurrentRequests() int {
type Header struct {
Name string
Value string
sOriginal string
}
// UnmarshalYAML unmarshals h from f.
@@ -109,6 +133,8 @@ func (h *Header) UnmarshalYAML(f func(interface{}) error) error {
if err := f(&s); err != nil {
return err
}
h.sOriginal = s
n := strings.IndexByte(s, ':')
if n < 0 {
return fmt.Errorf("missing speparator char ':' between Name and Value in the header %q; expected format - 'Name: Value'", s)
@@ -120,21 +146,29 @@ func (h *Header) UnmarshalYAML(f func(interface{}) error) error {
// MarshalYAML marshals h to yaml.
func (h *Header) MarshalYAML() (interface{}, error) {
s := fmt.Sprintf("%s: %s", h.Name, h.Value)
return s, nil
return h.sOriginal, nil
}
// URLMap is a mapping from source paths to target urls.
type URLMap struct {
// SrcHosts is the list of regular expressions, which match the request hostname.
// SrcPaths is an optional list of regular expressions, which must match the request path.
SrcPaths []*Regex `yaml:"src_paths,omitempty"`
// SrcHosts is an optional list of regular expressions, which must match the request hostname.
SrcHosts []*Regex `yaml:"src_hosts,omitempty"`
// SrcPaths is the list of regular expressions, which match the request path.
SrcPaths []*Regex `yaml:"src_paths,omitempty"`
// SrcQueryArgs is an optional list of query args, which must match request URL query args.
SrcQueryArgs []*QueryArg `yaml:"src_query_args,omitempty"`
// SrcHeaders is an optional list of headers, which must match request headers.
SrcHeaders []*Header `yaml:"src_headers,omitempty"`
// UrlPrefix contains backend url prefixes for the proxied request url.
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
// DiscoverBackendIPs instructs discovering URLPrefix backend IPs via DNS.
DiscoverBackendIPs *bool `yaml:"discover_backend_ips,omitempty"`
// HeadersConf is the config for augumenting request and response headers.
HeadersConf HeadersConf `yaml:",inline"`
@@ -148,27 +182,78 @@ type URLMap struct {
DropSrcPathPrefixParts *int `yaml:"drop_src_path_prefix_parts,omitempty"`
}
// Regex represents a regex
type Regex struct {
// QueryArg represents HTTP query arg
type QueryArg struct {
Name string
Value *Regex
sOriginal string
re *regexp.Regexp
}
// UnmarshalYAML unmarshals qa from yaml.
func (qa *QueryArg) UnmarshalYAML(f func(interface{}) error) error {
var s string
if err := f(&s); err != nil {
return err
}
qa.sOriginal = s
n := strings.IndexByte(s, '=')
if n < 0 {
return nil
}
qa.Name = s[:n]
expr := s[n+1:]
if !strings.HasPrefix(expr, "~") {
expr = regexp.QuoteMeta(expr)
} else {
expr = expr[1:]
}
var re Regex
if err := yaml.Unmarshal([]byte(expr), &re); err != nil {
return fmt.Errorf("cannot unmarshal regex for %q query arg: %w", qa.Name, err)
}
qa.Value = &re
return nil
}
// MarshalYAML marshals qa to yaml.
func (qa *QueryArg) MarshalYAML() (interface{}, error) {
return qa.sOriginal, nil
}
// URLPrefix represents passed `url_prefix`
type URLPrefix struct {
n uint32
// the list of backend urls
bus []*backendURL
// requests are re-tried on other backend urls for these http response status codes
retryStatusCodes []int
// load balancing policy used
loadBalancingPolicy string
// how many request path prefix parts to drop before routing the request to backendURL.
// how many request path prefix parts to drop before routing the request to backendURL
dropSrcPathPrefixParts int
// busOriginal contains the original list of backends specified in yaml config.
busOriginal []*url.URL
// n is an atomic counter, which is used for balancing load among available backends.
n atomic.Uint32
// the list of backend urls
//
// the list can be dynamically updated if `discover_backend_ips` option is set.
bus atomic.Pointer[[]*backendURL]
// if this option is set, then backend ips for busOriginal are periodically re-discovered and put to bus.
discoverBackendIPs bool
// The next deadline for DNS-based discovery of backend IPs
nextDiscoveryDeadline atomic.Uint64
// vOriginal contains the original yaml value for URLPrefix.
vOriginal interface{}
}
func (up *URLPrefix) setLoadBalancingPolicy(loadBalancingPolicy string) error {
@@ -184,49 +269,160 @@ func (up *URLPrefix) setLoadBalancingPolicy(loadBalancingPolicy string) error {
}
type backendURL struct {
brokenDeadline uint64
concurrentRequests int32
url *url.URL
brokenDeadline atomic.Uint64
concurrentRequests atomic.Int32
url *url.URL
}
func (bu *backendURL) isBroken() bool {
ct := fasttime.UnixTimestamp()
return ct < atomic.LoadUint64(&bu.brokenDeadline)
return ct < bu.brokenDeadline.Load()
}
func (bu *backendURL) setBroken() {
deadline := fasttime.UnixTimestamp() + uint64((*failTimeout).Seconds())
atomic.StoreUint64(&bu.brokenDeadline, deadline)
bu.brokenDeadline.Store(deadline)
}
func (bu *backendURL) get() {
atomic.AddInt32(&bu.concurrentRequests, 1)
bu.concurrentRequests.Add(1)
}
func (bu *backendURL) put() {
atomic.AddInt32(&bu.concurrentRequests, -1)
bu.concurrentRequests.Add(-1)
}
func (up *URLPrefix) getBackendsCount() int {
return len(up.bus)
pbus := up.bus.Load()
return len(*pbus)
}
// getBackendURL returns the backendURL depending on the load balance policy.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getBackendURL() *backendURL {
up.discoverBackendAddrsIfNeeded()
pbus := up.bus.Load()
bus := *pbus
if up.loadBalancingPolicy == "first_available" {
return up.getFirstAvailableBackendURL()
return getFirstAvailableBackendURL(bus)
}
return up.getLeastLoadedBackendURL()
return getLeastLoadedBackendURL(bus, &up.n)
}
func (up *URLPrefix) discoverBackendAddrsIfNeeded() {
if !up.discoverBackendIPs {
// The discovery is disabled.
return
}
ct := fasttime.UnixTimestamp()
deadline := up.nextDiscoveryDeadline.Load()
if ct < deadline {
// There is no need in discovering backends.
return
}
intervalSec := math.Ceil(discoverBackendIPsInterval.Seconds())
if intervalSec <= 0 {
intervalSec = 1
}
nextDeadline := ct + uint64(intervalSec)
if !up.nextDiscoveryDeadline.CompareAndSwap(deadline, nextDeadline) {
// Concurrent goroutine already started the discovery.
return
}
// Discover ips for all the backendURLs
ctx, cancel := context.WithTimeout(context.Background(), time.Second*time.Duration(intervalSec))
hostToAddrs := make(map[string][]string)
for _, bu := range up.busOriginal {
host := bu.Hostname()
if hostToAddrs[host] != nil {
// ips for the given host have been already discovered
continue
}
var resolvedAddrs []string
if strings.HasPrefix(host, "srv+") {
// The host has the format 'srv+realhost'. Strip 'srv+' prefix before performing the lookup.
host = strings.TrimPrefix(host, "srv+")
_, addrs, err := netutil.Resolver.LookupSRV(ctx, "", "", host)
if err != nil {
logger.Warnf("cannot discover backend SRV records for %s: %s; use it literally", bu, err)
resolvedAddrs = []string{host}
} else {
resolvedAddrs := make([]string, len(addrs))
for i, addr := range addrs {
resolvedAddrs[i] = fmt.Sprintf("%s:%d", addr.Target, addr.Port)
}
}
} else {
addrs, err := netutil.Resolver.LookupIPAddr(ctx, host)
if err != nil {
logger.Warnf("cannot discover backend IPs for %s: %s; use it literally", bu, err)
resolvedAddrs = []string{host}
} else {
resolvedAddrs = make([]string, len(addrs))
for i, addr := range addrs {
resolvedAddrs[i] = addr.String()
}
}
}
// sort resolvedAddrs, so they could be compared below in areEqualBackendURLs()
sort.Strings(resolvedAddrs)
hostToAddrs[host] = resolvedAddrs
}
cancel()
// generate new backendURLs for the resolved IPs
var busNew []*backendURL
for _, bu := range up.busOriginal {
host := bu.Hostname()
port := bu.Port()
for _, addr := range hostToAddrs[host] {
buCopy := *bu
buCopy.Host = addr
if port != "" {
if n := strings.IndexByte(buCopy.Host, ':'); n >= 0 {
// Drop the discovered port and substitute it the the port specified in bu.
buCopy.Host = buCopy.Host[:n]
}
buCopy.Host += ":" + port
}
busNew = append(busNew, &backendURL{
url: &buCopy,
})
}
}
pbus := up.bus.Load()
if areEqualBackendURLs(*pbus, busNew) {
return
}
// Store new backend urls
up.bus.Store(&busNew)
}
func areEqualBackendURLs(a, b []*backendURL) bool {
if len(a) != len(b) {
return false
}
for i, aURL := range a {
bURL := b[i]
if aURL.url.String() != bURL.url.String() {
return false
}
}
return true
}
// getFirstAvailableBackendURL returns the first available backendURL, which isn't broken.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getFirstAvailableBackendURL() *backendURL {
bus := up.bus
func getFirstAvailableBackendURL(bus []*backendURL) *backendURL {
bu := bus[0]
if !bu.isBroken() {
// Fast path - send the request to the first url.
@@ -248,8 +444,7 @@ func (up *URLPrefix) getFirstAvailableBackendURL() *backendURL {
// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
bus := up.bus
func getLeastLoadedBackendURL(bus []*backendURL, atomicCounter *atomic.Uint32) *backendURL {
if len(bus) == 1 {
// Fast path - return the only backend url.
bu := bus[0]
@@ -258,7 +453,7 @@ func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
}
// Slow path - select other backend urls.
n := atomic.AddUint32(&up.n, 1)
n := atomicCounter.Add(1)
for i := uint32(0); i < uint32(len(bus)); i++ {
idx := (n + i) % uint32(len(bus))
@@ -266,20 +461,22 @@ func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
if bu.isBroken() {
continue
}
if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
if bu.concurrentRequests.Load() == 0 {
// Fast path - return the backend with zero concurrently executed requests.
// Do not use CompareAndSwap() instead of Load(), since it is much slower on systems with many CPU cores.
bu.concurrentRequests.Add(1)
return bu
}
}
// Slow path - return the backend with the minimum number of concurrently executed requests.
buMin := bus[n%uint32(len(bus))]
minRequests := atomic.LoadInt32(&buMin.concurrentRequests)
minRequests := buMin.concurrentRequests.Load()
for _, bu := range bus {
if bu.isBroken() {
continue
}
if n := atomic.LoadInt32(&bu.concurrentRequests); n < minRequests {
if n := bu.concurrentRequests.Load(); n < minRequests {
buMin = bu
minRequests = n
}
@@ -294,6 +491,7 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
if err := f(&v); err != nil {
return err
}
up.vOriginal = v
var urls []string
switch x := v.(type) {
@@ -316,38 +514,28 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
bus := make([]*backendURL, len(urls))
bus := make([]*url.URL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
bus[i] = &backendURL{
url: pu,
}
bus[i] = pu
}
up.bus = bus
up.busOriginal = bus
return nil
}
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.bus) == 1 {
u := up.bus[0].url.String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, bu := range up.bus {
u := bu.url.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.bus) {
b = append(b, ',')
}
}
b = append(b, ']')
return string(b), nil
return up.vOriginal, nil
}
// Regex represents a regex
type Regex struct {
re *regexp.Regexp
sOriginal string
}
func (r *Regex) match(s string) bool {
@@ -368,12 +556,13 @@ func (r *Regex) UnmarshalYAML(f func(interface{}) error) error {
if err := f(&s); err != nil {
return err
}
r.sOriginal = s
sAnchored := "^(?:" + s + ")$"
re, err := regexp.Compile(sAnchored)
if err != nil {
return fmt.Errorf("cannot build regexp from %q: %w", s, err)
}
r.sOriginal = s
r.re = re
return nil
}
@@ -462,17 +651,19 @@ func authConfigReloader(sighupCh <-chan os.Signal) {
// authConfigData needs to be updated each time authConfig is updated.
var authConfigData atomic.Pointer[[]byte]
var authConfig atomic.Pointer[AuthConfig]
var authUsers atomic.Pointer[map[string]*UserInfo]
var authConfigWG sync.WaitGroup
var stopCh chan struct{}
var (
authConfig atomic.Pointer[AuthConfig]
authUsers atomic.Pointer[map[string]*UserInfo]
authConfigWG sync.WaitGroup
stopCh chan struct{}
)
// loadAuthConfig loads and applies the config from *authConfigPath.
// It returns bool value to identify if new config was applied.
// The config can be not applied if there is a parsing error
// or if there are no changes to the current authConfig.
func loadAuthConfig() (bool, error) {
data, err := fs.ReadFileOrHTTP(*authConfigPath)
data, err := fscore.ReadFileOrHTTP(*authConfigPath)
if err != nil {
return false, fmt.Errorf("failed to read -auth.config=%q: %w", *authConfigPath, err)
}
@@ -494,9 +685,22 @@ func loadAuthConfig() (bool, error) {
}
logger.Infof("loaded information about %d users from -auth.config=%q", len(m), *authConfigPath)
prevAc := authConfig.Load()
if prevAc != nil {
metrics.UnregisterSet(prevAc.ms)
}
metrics.RegisterSet(ac.ms)
authConfig.Store(ac)
authConfigData.Store(&data)
authUsers.Store(&m)
if prevAc != nil {
// explicilty unregister metrics, since all summary type metrics
// are registered at global state of metrics package
// and must be removed from it to release memory.
// Metrics must be unregistered only after atomic.Value.Store calls above
// Otherwise it may lead to metric gaps, since UnregisterAllMetrics is slow operation
prevAc.ms.UnregisterAllMetrics()
}
return true, nil
}
@@ -506,10 +710,13 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
var ac AuthConfig
if err = yaml.UnmarshalStrict(data, &ac); err != nil {
ac := &AuthConfig{
ms: metrics.NewSet(),
}
if err = yaml.UnmarshalStrict(data, ac); err != nil {
return nil, fmt.Errorf("cannot unmarshal AuthConfig data: %w", err)
}
ui := ac.UnauthorizedUser
if ui != nil {
if ui.Username != "" {
@@ -521,29 +728,39 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
if ui.BearerToken != "" {
return nil, fmt.Errorf("field bearer_token can't be specified for unauthorized_user section")
}
if ui.AuthToken != "" {
return nil, fmt.Errorf("field auth_token can't be specified for unauthorized_user section")
}
if ui.Name != "" {
return nil, fmt.Errorf("field name can't be specified for unauthorized_user section")
}
if err := ui.initURLs(); err != nil {
return nil, err
}
ui.requests = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_requests_total`)
ui.requestsDuration = metrics.GetOrCreateSummary(`vmauth_unauthorized_user_request_duration_seconds`)
metricLabels, err := ui.getMetricLabels()
if err != nil {
return nil, fmt.Errorf("cannot parse metric_labels for unauthorized_user: %w", err)
}
ui.requests = ac.ms.NewCounter(`vmauth_unauthorized_user_requests_total` + metricLabels)
ui.backendErrors = ac.ms.NewCounter(`vmauth_unauthorized_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.NewSummary(`vmauth_unauthorized_user_request_duration_seconds` + metricLabels)
ui.concurrencyLimitCh = make(chan struct{}, ui.getMaxConcurrentRequests())
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_concurrent_requests_limit_reached_total`)
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_capacity`, func() float64 {
ui.concurrencyLimitReached = ac.ms.NewCounter(`vmauth_unauthorized_user_concurrent_requests_limit_reached_total` + metricLabels)
_ = ac.ms.NewGauge(`vmauth_unauthorized_user_concurrent_requests_capacity`+metricLabels, func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_current`, func() float64 {
_ = ac.ms.NewGauge(`vmauth_unauthorized_user_concurrent_requests_current`+metricLabels, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
tr, err := getTransport(ui.TLSInsecureSkipVerify, ui.TLSCAFile)
rt, err := newRoundTripper(ui.TLSCAFile, ui.TLSCertFile, ui.TLSKeyFile, ui.TLSServerName, ui.TLSInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("cannot initialize HTTP transport: %w", err)
return nil, fmt.Errorf("cannot initialize HTTP RoundTripper: %w", err)
}
ui.httpTransport = tr
ui.rt = rt
}
return &ac, nil
return ac, nil
}
func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
@@ -554,64 +771,80 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
byAuthToken := make(map[string]*UserInfo, len(uis))
for i := range uis {
ui := &uis[i]
if ui.BearerToken == "" && ui.Username == "" {
return nil, fmt.Errorf("either bearer_token or username must be set")
ats, err := getAuthTokens(ui.AuthToken, ui.BearerToken, ui.Username, ui.Password)
if err != nil {
return nil, err
}
if ui.BearerToken != "" && ui.Username != "" {
return nil, fmt.Errorf("bearer_token=%q and username=%q cannot be set simultaneously", ui.BearerToken, ui.Username)
for _, at := range ats {
if uiOld := byAuthToken[at]; uiOld != nil {
return nil, fmt.Errorf("duplicate auth token=%q found for username=%q, name=%q; the previous one is set for username=%q, name=%q",
at, ui.Username, ui.Name, uiOld.Username, uiOld.Name)
}
}
at1, at2 := getAuthTokens(ui.BearerToken, ui.Username, ui.Password)
if byAuthToken[at1] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", ui.BearerToken, ui.Username, at1)
}
if byAuthToken[at2] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", ui.BearerToken, ui.Username, at2)
}
if err := ui.initURLs(); err != nil {
return nil, err
}
name := ui.name()
if ui.BearerToken != "" {
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
ui.requestsDuration = metrics.GetOrCreateSummary(fmt.Sprintf(`vmauth_user_request_duration_seconds{username=%q}`, name))
}
if ui.Username != "" {
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
ui.requestsDuration = metrics.GetOrCreateSummary(fmt.Sprintf(`vmauth_user_request_duration_seconds{username=%q}`, name))
metricLabels, err := ui.getMetricLabels()
if err != nil {
return nil, fmt.Errorf("cannot parse metric_labels: %w", err)
}
ui.requests = ac.ms.GetOrCreateCounter(`vmauth_user_requests_total` + metricLabels)
ui.backendErrors = ac.ms.GetOrCreateCounter(`vmauth_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.GetOrCreateSummary(`vmauth_user_request_duration_seconds` + metricLabels)
mcr := ui.getMaxConcurrentRequests()
ui.concurrencyLimitCh = make(chan struct{}, mcr)
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
ui.concurrencyLimitReached = ac.ms.GetOrCreateCounter(`vmauth_user_concurrent_requests_limit_reached_total` + metricLabels)
_ = ac.ms.GetOrCreateGauge(`vmauth_user_concurrent_requests_capacity`+metricLabels, func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
_ = ac.ms.GetOrCreateGauge(`vmauth_user_concurrent_requests_current`+metricLabels, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
tr, err := getTransport(ui.TLSInsecureSkipVerify, ui.TLSCAFile)
rt, err := newRoundTripper(ui.TLSCAFile, ui.TLSCertFile, ui.TLSKeyFile, ui.TLSServerName, ui.TLSInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("cannot initialize HTTP transport: %w", err)
return nil, fmt.Errorf("cannot initialize HTTP RoundTripper: %w", err)
}
ui.httpTransport = tr
ui.rt = rt
byAuthToken[at1] = ui
byAuthToken[at2] = ui
for _, at := range ats {
byAuthToken[at] = ui
}
}
return byAuthToken, nil
}
var labelNameRegexp = regexp.MustCompile("^[a-zA-Z_:.][a-zA-Z0-9_:.]*$")
func (ui *UserInfo) getMetricLabels() (string, error) {
name := ui.name()
if len(name) == 0 && len(ui.MetricLabels) == 0 {
// fast path
return "", nil
}
labels := make([]string, 0, len(ui.MetricLabels)+1)
if len(name) > 0 {
labels = append(labels, fmt.Sprintf(`username=%q`, name))
}
for k, v := range ui.MetricLabels {
if !labelNameRegexp.MatchString(k) {
return "", fmt.Errorf("incorrect label name=%q, it must match regex=%q for user=%q", k, labelNameRegexp, name)
}
labels = append(labels, fmt.Sprintf(`%s=%q`, k, v))
}
sort.Strings(labels)
labelsStr := "{" + strings.Join(labels, ",") + "}"
return labelsStr, nil
}
func (ui *UserInfo) initURLs() error {
retryStatusCodes := defaultRetryStatusCodes.Values()
loadBalancingPolicy := *defaultLoadBalancingPolicy
dropSrcPathPrefixParts := 0
discoverBackendIPs := *discoverBackendIPsGlobal
if ui.URLPrefix != nil {
if err := ui.URLPrefix.sanitize(); err != nil {
if err := ui.URLPrefix.sanitizeAndInitialize(); err != nil {
return err
}
if ui.RetryStatusCodes != nil {
@@ -623,30 +856,35 @@ func (ui *UserInfo) initURLs() error {
if ui.DropSrcPathPrefixParts != nil {
dropSrcPathPrefixParts = *ui.DropSrcPathPrefixParts
}
if ui.DiscoverBackendIPs != nil {
discoverBackendIPs = *ui.DiscoverBackendIPs
}
ui.URLPrefix.retryStatusCodes = retryStatusCodes
ui.URLPrefix.dropSrcPathPrefixParts = dropSrcPathPrefixParts
ui.URLPrefix.discoverBackendIPs = discoverBackendIPs
if err := ui.URLPrefix.setLoadBalancingPolicy(loadBalancingPolicy); err != nil {
return err
}
}
if ui.DefaultURL != nil {
if err := ui.DefaultURL.sanitize(); err != nil {
if err := ui.DefaultURL.sanitizeAndInitialize(); err != nil {
return err
}
}
for _, e := range ui.URLMaps {
if len(e.SrcPaths) == 0 && len(e.SrcHosts) == 0 {
return fmt.Errorf("missing `src_paths` and `src_hosts` in `url_map`")
if len(e.SrcPaths) == 0 && len(e.SrcHosts) == 0 && len(e.SrcQueryArgs) == 0 && len(e.SrcHeaders) == 0 {
return fmt.Errorf("missing `src_paths`, `src_hosts`, `src_query_args` and `src_headers` in `url_map`")
}
if e.URLPrefix == nil {
return fmt.Errorf("missing `url_prefix` in `url_map`")
}
if err := e.URLPrefix.sanitize(); err != nil {
if err := e.URLPrefix.sanitizeAndInitialize(); err != nil {
return err
}
rscs := retryStatusCodes
lbp := loadBalancingPolicy
dsp := dropSrcPathPrefixParts
dbd := discoverBackendIPs
if e.RetryStatusCodes != nil {
rscs = e.RetryStatusCodes
}
@@ -656,14 +894,18 @@ func (ui *UserInfo) initURLs() error {
if e.DropSrcPathPrefixParts != nil {
dsp = *e.DropSrcPathPrefixParts
}
if e.DiscoverBackendIPs != nil {
dbd = *e.DiscoverBackendIPs
}
e.URLPrefix.retryStatusCodes = rscs
if err := e.URLPrefix.setLoadBalancingPolicy(lbp); err != nil {
return err
}
e.URLPrefix.dropSrcPathPrefixParts = dsp
e.URLPrefix.discoverBackendIPs = dbd
}
if len(ui.URLMaps) == 0 && ui.URLPrefix == nil {
return fmt.Errorf("missing `url_prefix`")
return fmt.Errorf("missing `url_prefix` or `url_map`")
}
return nil
}
@@ -676,39 +918,100 @@ func (ui *UserInfo) name() string {
return ui.Username
}
if ui.BearerToken != "" {
return "bearer_token"
h := xxhash.Sum64([]byte(ui.BearerToken))
return fmt.Sprintf("bearer_token:hash:%016X", h)
}
if ui.AuthToken != "" {
h := xxhash.Sum64([]byte(ui.AuthToken))
return fmt.Sprintf("auth_token:hash:%016X", h)
}
return ""
}
func getAuthTokens(bearerToken, username, password string) (string, string) {
if bearerToken != "" {
// Accept the bearerToken as Basic Auth username with empty password
at1 := getAuthToken(bearerToken, "", "")
at2 := getAuthToken("", bearerToken, "")
return at1, at2
func getAuthTokens(authToken, bearerToken, username, password string) ([]string, error) {
if authToken != "" {
if bearerToken != "" {
return nil, fmt.Errorf("bearer_token cannot be specified if auth_token is set")
}
if username != "" || password != "" {
return nil, fmt.Errorf("username and password cannot be specified if auth_token is set")
}
at := getHTTPAuthToken(authToken)
return []string{at}, nil
}
at := getAuthToken("", username, password)
return at, at
if bearerToken != "" {
if username != "" || password != "" {
return nil, fmt.Errorf("username and password cannot be specified if bearer_token is set")
}
// Accept the bearerToken as Basic Auth username with empty password
at1 := getHTTPAuthBearerToken(bearerToken)
at2 := getHTTPAuthBasicToken(bearerToken, "")
return []string{at1, at2}, nil
}
if username != "" {
at := getHTTPAuthBasicToken(username, password)
return []string{at}, nil
}
return nil, fmt.Errorf("missing authorization options; bearer_token or username must be set")
}
func getAuthToken(bearerToken, username, password string) string {
if bearerToken != "" {
return "Bearer " + bearerToken
}
func getHTTPAuthToken(authToken string) string {
return "http_auth:" + authToken
}
func getHTTPAuthBearerToken(bearerToken string) string {
return "http_auth:Bearer " + bearerToken
}
func getHTTPAuthBasicToken(username, password string) string {
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
return "http_auth:Basic " + token64
}
func (up *URLPrefix) sanitize() error {
for _, bu := range up.bus {
puNew, err := sanitizeURLPrefix(bu.url)
var defaultHeaderNames = []string{"Authorization"}
func getAuthTokensFromRequest(r *http.Request) []string {
var ats []string
// Obtain possible auth tokens from one of the allowed auth headers
headerNames := *httpAuthHeader
if len(headerNames) == 0 {
headerNames = defaultHeaderNames
}
for _, headerName := range headerNames {
if ah := r.Header.Get(headerName); ah != "" {
if strings.HasPrefix(ah, "Token ") {
// Handle InfluxDB's proprietary token authentication scheme as a bearer token authentication
// See https://docs.influxdata.com/influxdb/v2.0/api/
ah = strings.Replace(ah, "Token", "Bearer", 1)
}
at := "http_auth:" + ah
ats = append(ats, at)
}
}
return ats
}
func (up *URLPrefix) sanitizeAndInitialize() error {
for i, bu := range up.busOriginal {
puNew, err := sanitizeURLPrefix(bu)
if err != nil {
return err
}
bu.url = puNew
up.busOriginal[i] = puNew
}
// Initialize up.bus
bus := make([]*backendURL, len(up.busOriginal))
for i, bu := range up.busOriginal {
bus[i] = &backendURL{
url: bu,
}
}
up.bus.Store(&bus)
return nil
}

View File

@@ -4,10 +4,11 @@ import (
"bytes"
"fmt"
"net/url"
"regexp"
"testing"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
func TestParseAuthConfigFailure(t *testing.T) {
@@ -17,9 +18,9 @@ func TestParseAuthConfigFailure(t *testing.T) {
if err != nil {
return
}
_, err = parseAuthConfigUsers(ac)
users, err := parseAuthConfigUsers(ac)
if err == nil {
t.Fatalf("expecting non-nil error")
t.Fatalf("expecting non-nil error; got %v", users)
}
}
@@ -88,6 +89,22 @@ users:
url_prefix: []
`)
// auth_token and username in a single config
f(`
users:
- auth_token: foo
username: bbb
url_prefix: http://foo.bar
`)
// auth_token and bearer_token in a single config
f(`
users:
- auth_token: foo
bearer_token: bbb
url_prefix: http://foo.bar
`)
// Username and bearer_token in a single config
f(`
users:
@@ -192,7 +209,7 @@ users:
- url_prefix: http://foobar
`)
// Invalid regexp in src_path.
// Invalid regexp in src_paths
f(`
users:
- username: a
@@ -210,6 +227,24 @@ users:
url_prefix: http://foobar
`)
// Invalid src_query_args
f(`
users:
- username: a
url_map:
- src_query_args: abc
url_prefix: http://foobar
`)
// Invalid src_headers
f(`
users:
- username: a
url_map:
- src_headers: abc
url_prefix: http://foobar
`)
// Invalid headers in url_map (missing ':')
f(`
users:
@@ -229,6 +264,14 @@ users:
url_prefix: http://foobar
headers:
aaa: bbb
`)
// Invalid metric label name
f(`
users:
- username: foo
url_prefix: http://foo.bar
metric_labels:
not-prometheus-compatible: value
`)
}
@@ -249,8 +292,9 @@ func TestParseAuthConfigSuccess(t *testing.T) {
}
}
// Single user
insecureSkipVerifyTrue := true
// Single user
f(`
users:
- username: foo
@@ -259,7 +303,7 @@ users:
max_concurrent_requests: 5
tls_insecure_skip_verify: true
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
getHTTPAuthBasicToken("foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
@@ -268,31 +312,58 @@ users:
},
})
// Single user with auth_token
f(`
users:
- auth_token: foo
url_prefix: https://aaa:343/bbb
max_concurrent_requests: 5
tls_insecure_skip_verify: true
tls_server_name: "foo.bar"
tls_ca_file: "foo/bar"
tls_cert_file: "foo/baz"
tls_key_file: "foo/foo"
`, map[string]*UserInfo{
getHTTPAuthToken("foo"): {
AuthToken: "foo",
URLPrefix: mustParseURL("https://aaa:343/bbb"),
MaxConcurrentRequests: 5,
TLSInsecureSkipVerify: &insecureSkipVerifyTrue,
TLSServerName: "foo.bar",
TLSCAFile: "foo/bar",
TLSCertFile: "foo/baz",
TLSKeyFile: "foo/foo",
},
})
// Multiple url_prefix entries
insecureSkipVerifyFalse := false
discoverBackendIPsTrue := true
f(`
users:
- username: foo
password: bar
url_prefix:
- http://node1:343/bbb
- http://node2:343/bbb
- http://srv+node2:343/bbb
tls_insecure_skip_verify: false
retry_status_codes: [500, 501]
load_balancing_policy: first_available
drop_src_path_prefix_parts: 1
discover_backend_ips: true
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
getHTTPAuthBasicToken("foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURLs([]string{
"http://node1:343/bbb",
"http://node2:343/bbb",
"http://srv+node2:343/bbb",
}),
TLSInsecureSkipVerify: &insecureSkipVerifyFalse,
RetryStatusCodes: []int{500, 501},
LoadBalancingPolicy: "first_available",
DropSrcPathPrefixParts: intp(1),
DiscoverBackendIPs: &discoverBackendIPsTrue,
},
})
@@ -302,19 +373,49 @@ users:
- username: foo
url_prefix: http://foo
- username: bar
url_prefix: https://bar/x///
url_prefix: https://bar/x/
`, map[string]*UserInfo{
getAuthToken("", "foo", ""): {
getHTTPAuthBasicToken("foo", ""): {
Username: "foo",
URLPrefix: mustParseURL("http://foo"),
},
getAuthToken("", "bar", ""): {
getHTTPAuthBasicToken("bar", ""): {
Username: "bar",
URLPrefix: mustParseURL("https://bar/x"),
URLPrefix: mustParseURL("https://bar/x/"),
},
})
// non-empty URLMap
sharedUserInfo := &UserInfo{
BearerToken: "foo",
URLMaps: []URLMap{
{
SrcPaths: getRegexs([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcHosts: getRegexs([]string{"foo\\.bar", "baz:1234"}),
SrcPaths: getRegexs([]string{"/api/v1/write"}),
SrcQueryArgs: []*QueryArg{
mustNewQueryArg("foo=b.+ar"),
mustNewQueryArg("baz=~.*x=y.+"),
},
SrcHeaders: []*Header{
mustNewHeader("'TenantID: 345'"),
},
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
HeadersConf: HeadersConf{
RequestHeaders: []*Header{
mustNewHeader("'foo: bar'"),
mustNewHeader("'xxx: y'"),
},
},
},
},
}
f(`
users:
- bearer_token: foo
@@ -323,71 +424,18 @@ users:
url_prefix: http://vmselect/select/0/prometheus
- src_paths: ["/api/v1/write"]
src_hosts: ["foo\\.bar", "baz:1234"]
src_query_args: ['foo=b.+ar', 'baz=~.*x=y.+']
src_headers: ['TenantID: 345']
url_prefix: ["http://vminsert1/insert/0/prometheus","http://vminsert2/insert/0/prometheus"]
headers:
- "foo: bar"
- "xxx: y"
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
SrcPaths: getRegexs([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcHosts: getRegexs([]string{"foo\\.bar", "baz:1234"}),
SrcPaths: getRegexs([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
HeadersConf: HeadersConf{
RequestHeaders: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
},
},
},
},
},
getAuthToken("", "foo", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
SrcPaths: getRegexs([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcHosts: getRegexs([]string{"foo\\.bar", "baz:1234"}),
SrcPaths: getRegexs([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
HeadersConf: HeadersConf{
RequestHeaders: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
},
},
},
},
},
getHTTPAuthBearerToken("foo"): sharedUserInfo,
getHTTPAuthBasicToken("foo", ""): sharedUserInfo,
})
// Multiple users with the same name
// Multiple users with the same name - this should work, since these users have different passwords
f(`
users:
- username: foo-same
@@ -395,19 +443,20 @@ users:
url_prefix: http://foo
- username: foo-same
password: bar
url_prefix: https://bar/x///
url_prefix: https://bar/x
`, map[string]*UserInfo{
getAuthToken("", "foo-same", "baz"): {
getHTTPAuthBasicToken("foo-same", "baz"): {
Username: "foo-same",
Password: "baz",
URLPrefix: mustParseURL("http://foo"),
},
getAuthToken("", "foo-same", "bar"): {
getHTTPAuthBasicToken("foo-same", "bar"): {
Username: "foo-same",
Password: "bar",
URLPrefix: mustParseURL("https://bar/x"),
},
})
// with default url
f(`
users:
@@ -424,7 +473,7 @@ users:
- http://default1/select/0/prometheus
- http://default2/select/0/prometheus
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
getHTTPAuthBearerToken("foo"): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -438,15 +487,9 @@ users:
"http://vminsert2/insert/0/prometheus",
}),
HeadersConf: HeadersConf{
RequestHeaders: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
RequestHeaders: []*Header{
mustNewHeader("'foo: bar'"),
mustNewHeader("'xxx: y'"),
},
},
},
@@ -456,7 +499,7 @@ users:
"http://default2/select/0/prometheus",
}),
},
getAuthToken("", "foo", ""): {
getHTTPAuthBasicToken("foo", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -470,15 +513,9 @@ users:
"http://vminsert2/insert/0/prometheus",
}),
HeadersConf: HeadersConf{
RequestHeaders: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
RequestHeaders: []*Header{
mustNewHeader("'foo: bar'"),
mustNewHeader("'xxx: y'"),
},
},
},
@@ -490,6 +527,41 @@ users:
},
})
// With metric_labels
f(`
users:
- username: foo-same
password: baz
url_prefix: http://foo
metric_labels:
dc: eu
team: dev
- username: foo-same
password: bar
url_prefix: https://bar/x
metric_labels:
backend_env: test
team: accounting
`, map[string]*UserInfo{
getHTTPAuthBasicToken("foo-same", "baz"): {
Username: "foo-same",
Password: "baz",
URLPrefix: mustParseURL("http://foo"),
MetricLabels: map[string]string{
"dc": "eu",
"team": "dev",
},
},
getHTTPAuthBasicToken("foo-same", "bar"): {
Username: "foo-same",
Password: "bar",
URLPrefix: mustParseURL("https://bar/x"),
MetricLabels: map[string]string{
"backend_env": "test",
"team": "accounting",
},
},
})
}
func TestParseAuthConfigPassesTLSVerificationConfig(t *testing.T) {
@@ -516,16 +588,96 @@ unauthorized_user:
t.Fatalf("unexpected error: %s", err)
}
ui := m[getAuthToken("", "foo", "bar")]
if !isSetBool(ui.TLSInsecureSkipVerify, true) || !ui.httpTransport.TLSClientConfig.InsecureSkipVerify {
ui := m[getHTTPAuthBasicToken("foo", "bar")]
if !isSetBool(ui.TLSInsecureSkipVerify, true) {
t.Fatalf("unexpected TLSInsecureSkipVerify value for user foo")
}
if !isSetBool(ac.UnauthorizedUser.TLSInsecureSkipVerify, false) || ac.UnauthorizedUser.httpTransport.TLSClientConfig.InsecureSkipVerify {
if !isSetBool(ac.UnauthorizedUser.TLSInsecureSkipVerify, false) {
t.Fatalf("unexpected TLSInsecureSkipVerify value for unauthorized_user")
}
}
func TestUserInfoGetMetricLabels(t *testing.T) {
t.Run("empty-labels", func(t *testing.T) {
ui := &UserInfo{
Username: "user1",
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-username", func(t *testing.T) {
ui := &UserInfo{
Username: "user1",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-name", func(t *testing.T) {
ui := &UserInfo{
Name: "user1",
BearerToken: "abc",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-bearer-token", func(t *testing.T) {
ui := &UserInfo{
BearerToken: "abc",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="bearer_token:hash:44BC2CF5AD770999"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("invalid-label", func(t *testing.T) {
ui := &UserInfo{
Username: "foo",
MetricLabels: map[string]string{
",{": "aaaa",
},
}
_, err := ui.getMetricLabels()
if err == nil {
t.Fatalf("expecting non-nil error")
}
})
}
func isSetBool(boolP *bool, expectedValue bool) bool {
if boolP == nil {
return false
@@ -533,13 +685,74 @@ func isSetBool(boolP *bool, expectedValue bool) bool {
return *boolP == expectedValue
}
func TestGetLeastLoadedBackendURL(t *testing.T) {
up := mustParseURLs([]string{
"http://node1:343",
"http://node2:343",
"http://node3:343",
})
up.loadBalancingPolicy = "least_loaded"
fn := func(ns ...int) {
t.Helper()
bus := up.bus.Load()
pbus := *bus
for i, b := range pbus {
got := int(b.concurrentRequests.Load())
exp := ns[i]
if got != exp {
t.Fatalf("expected %q to have %d concurrent requests; got %d instead", b.url, exp, got)
}
}
}
up.getBackendURL()
fn(0, 1, 0)
up.getBackendURL()
fn(0, 1, 1)
up.getBackendURL()
fn(1, 1, 1)
up.getBackendURL()
up.getBackendURL()
fn(1, 2, 2)
bus := up.bus.Load()
pbus := *bus
pbus[0].concurrentRequests.Add(2)
pbus[2].concurrentRequests.Add(5)
fn(3, 2, 7)
up.getBackendURL()
fn(3, 3, 7)
up.getBackendURL()
fn(3, 4, 7)
up.getBackendURL()
fn(4, 4, 7)
up.getBackendURL()
fn(5, 4, 7)
up.getBackendURL()
fn(5, 5, 7)
up.getBackendURL()
fn(6, 5, 7)
up.getBackendURL()
fn(6, 6, 7)
up.getBackendURL()
up.getBackendURL()
fn(7, 7, 7)
}
func getRegexs(paths []string) []*Regex {
var sps []*Regex
for _, path := range paths {
sps = append(sps, &Regex{
sOriginal: path,
re: regexp.MustCompile("^(?:" + path + ")$"),
})
sps = append(sps, mustNewRegex(path))
}
return sps
}
@@ -571,6 +784,7 @@ func mustParseURL(u string) *URLPrefix {
func mustParseURLs(us []string) *URLPrefix {
bus := make([]*backendURL, len(us))
urls := make([]*url.URL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
@@ -579,12 +793,43 @@ func mustParseURLs(us []string) *URLPrefix {
bus[i] = &backendURL{
url: pu,
}
urls[i] = pu
}
return &URLPrefix{
bus: bus,
up := &URLPrefix{}
if len(us) == 1 {
up.vOriginal = us[0]
} else {
up.vOriginal = us
}
up.bus.Store(&bus)
up.busOriginal = urls
return up
}
func intp(n int) *int {
return &n
}
func mustNewRegex(s string) *Regex {
var re Regex
if err := yaml.Unmarshal([]byte(s), &re); err != nil {
logger.Panicf("cannot unmarshal regex %q: %s", s, err)
}
return &re
}
func mustNewQueryArg(s string) *QueryArg {
var qa QueryArg
if err := yaml.Unmarshal([]byte(s), &qa); err != nil {
logger.Panicf("cannot unmarshal query arg filter %q: %s", s, err)
}
return &qa
}
func mustNewHeader(s string) *Header {
var h Header
if err := yaml.Unmarshal([]byte(s), &h); err != nil {
logger.Panicf("cannot unmarshal header filter %q: %s", s, err)
}
return &h
}

View File

@@ -10,6 +10,11 @@ users:
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# Adds labels to the exported metrics for given user section
# label name must be prometheus compatible and match regex: `^[a-zA-Z_:.][a-zA-Z0-9_:.]*$`
metric_labels:
backend_dc: eu
access_team: dev
# Requests with the 'Authorization: Bearer YYY' header are proxied to http://localhost:8428 ,
# The `X-Scope-OrgID: foobar` http header is added to every proxied request.
# The `X-Server-Hostname:` http header is removed from the proxied response.

View File

@@ -2,8 +2,6 @@ package main
import (
"context"
"crypto/tls"
"crypto/x509"
"errors"
"flag"
"fmt"
@@ -13,6 +11,7 @@ import (
"net/textproto"
"net/url"
"os"
"slices"
"strings"
"sync"
"time"
@@ -21,20 +20,19 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host. "+
@@ -45,16 +43,22 @@ var (
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
failTimeout = flag.Duration("failTimeout", 3*time.Second, "Sets a delay period for load balancing to skip a malfunctioning backend")
maxRequestBodySizeToRetry = flagutil.NewBytes("maxRequestBodySizeToRetry", 16*1024, "The maximum request body size, which can be cached and re-tried at other backends. "+
"Bigger values may require more memory")
backendTLSInsecureSkipVerify = flag.Bool("backend.tlsInsecureSkipVerify", false, "Whether to skip TLS verification when connecting to backends over HTTPS. "+
"See https://docs.victoriametrics.com/vmauth.html#backend-tls-setup")
"See https://docs.victoriametrics.com/vmauth/#backend-tls-setup")
backendTLSCAFile = flag.String("backend.TLSCAFile", "", "Optional path to TLS root CA file, which is used for TLS verification when connecting to backends over HTTPS. "+
"See https://docs.victoriametrics.com/vmauth.html#backend-tls-setup")
"See https://docs.victoriametrics.com/vmauth/#backend-tls-setup")
backendTLSCertFile = flag.String("backend.TLSCertFile", "", "Optional path to TLS client certificate file, which must be sent to HTTPS backend. "+
"See https://docs.victoriametrics.com/vmauth/#backend-tls-setup")
backendTLSKeyFile = flag.String("backend.TLSKeyFile", "", "Optional path to TLS client key file, which must be sent to HTTPS backend. "+
"See https://docs.victoriametrics.com/vmauth/#backend-tls-setup")
backendTLSServerName = flag.String("backend.TLSServerName", "", "Optional TLS ServerName, which must be sent to HTTPS backend. "+
"See https://docs.victoriametrics.com/vmauth/#backend-tls-setup")
)
func main() {
@@ -65,10 +69,14 @@ func main() {
buildinfo.Init()
logger.Init()
logger.Infof("starting vmauth at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8427"}
}
logger.Infof("starting vmauth at %q...", listenAddrs)
startTime := time.Now()
initAuthConfig()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started vmauth in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -77,8 +85,8 @@ func main() {
pushmetrics.Stop()
startTime = time.Now()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
@@ -89,7 +97,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/-/reload":
if !httpserver.CheckAuthFlag(w, r, *reloadAuthKey, "reloadAuthKey") {
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
configReloadRequests.Inc()
@@ -97,8 +105,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusOK)
return true
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
ats := getAuthTokensFromRequest(r)
if len(ats) == 0 {
// Process requests for unauthorized users
ui := authConfig.Load().UnauthorizedUser
if ui != nil {
@@ -110,18 +119,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return true
}
if strings.HasPrefix(authToken, "Token ") {
// Handle InfluxDB's proprietary token authentication scheme as a bearer token authentication
// See https://docs.influxdata.com/influxdb/v2.0/api/
authToken = strings.Replace(authToken, "Token", "Bearer", 1)
}
ac := *authUsers.Load()
ui := ac[authToken]
ui := getUserInfoByAuthTokens(ats)
if ui == nil {
invalidAuthTokenRequests.Inc()
if *logInvalidAuthTokens {
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
err := fmt.Errorf("cannot authorize request with auth tokens %q", ats)
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusUnauthorized,
@@ -137,6 +140,17 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
func getUserInfoByAuthTokens(ats []string) *UserInfo {
ac := *authUsers.Load()
for _, at := range ats {
ui := ac[at]
if ui != nil {
return ui
}
}
return nil
}
func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
startTime := time.Now()
defer ui.requestsDuration.UpdateDuration(startTime)
@@ -165,7 +179,7 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
u := normalizeURL(r.URL)
up, hc := ui.getURLPrefixAndHeaders(u)
up, hc := ui.getURLPrefixAndHeaders(u, r.Header)
isDefault := false
if up == nil {
if ui.DefaultURL == nil {
@@ -201,7 +215,7 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
} else { // Update path for regular routes.
targetURL = mergeURLs(targetURL, u, up.dropSrcPathPrefixParts)
}
ok := tryProcessingRequest(w, r, targetURL, hc, up.retryStatusCodes, ui.httpTransport)
ok := tryProcessingRequest(w, r, targetURL, hc, up.retryStatusCodes, ui)
bu.put()
if ok {
return
@@ -213,15 +227,23 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, transport *http.Transport) bool {
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, ui *UserInfo) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
req.Host = targetURL.Host
if req.URL.Scheme == "https" {
// Override req.Host only for https requests, since https server verifies hostnames during TLS handshake,
// so it expects the targetURL.Host in the request.
// There is no need in overriding the req.Host for http requests, since it is expected that backend server
// may properly process queries with the original req.Host.
req.Host = targetURL.Host
}
updateHeadersByConfig(req.Header, hc.RequestHeaders)
res, err := transport.RoundTrip(req)
res, err := ui.rt.RoundTrip(req)
rtb, rtbOK := req.Body.(*readTrackingBody)
if err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
@@ -229,15 +251,20 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
if errors.Is(err, context.DeadlineExceeded) {
// Timed out request must be counted as errors, since this usually means that the backend is slow.
ui.backendErrors.Inc()
}
return true
}
if !rtbOK || !rtb.canRetry() {
// Request body cannot be re-sent to another backend. Return the error to the client then.
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
Err: fmt.Errorf("cannot proxy the request to %s: %w", targetURL, err),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
return true
}
// Retry the request if its body wasn't read yet. This usually means that the backend isn't reachable.
@@ -247,7 +274,20 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
logger.Warnf("remoteAddr: %s; requestURI: %s; retrying the request to %s because of response error: %s", remoteAddr, req.URL, targetURL, err)
return false
}
if (rtbOK && rtb.canRetry()) && hasInt(retryStatusCodes, res.StatusCode) {
if slices.Contains(retryStatusCodes, res.StatusCode) {
_ = res.Body.Close()
if !rtbOK || !rtb.canRetry() {
// If we get an error from the retry_status_codes list, but cannot execute retry,
// we consider such a request an error as well.
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("got response status code=%d from %s, but cannot retry the request on another backend, because the request has been already consumed",
res.StatusCode, targetURL),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
return true
}
// Retry requests at other backends if it matches retryStatusCodes.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
@@ -266,6 +306,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
copyBuf.B = bytesutil.ResizeNoCopyNoOverallocate(copyBuf.B, 16*1024)
_, err = io.CopyBuffer(w, res.Body, copyBuf.B)
copyBufPool.Put(copyBuf)
_ = res.Body.Close()
if err != nil && !netutil.IsTrivialNetworkError(err) {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
@@ -275,15 +316,6 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
return true
}
func hasInt(a []int, n int) bool {
for _, x := range a {
if x == n {
return true
}
}
return false
}
var copyBufPool bytesutil.ByteBufferPool
func copyHeader(dst, src http.Header) {
@@ -294,7 +326,7 @@ func copyHeader(dst, src http.Header) {
}
}
func updateHeadersByConfig(headers http.Header, config []Header) {
func updateHeadersByConfig(headers http.Header, config []*Header) {
for _, h := range config {
if h.Value == "" {
headers.Del(h.Name)
@@ -363,48 +395,41 @@ var (
missingRouteRequests = metrics.NewCounter(`vmauth_http_request_errors_total{reason="missing_route"}`)
)
func getTransport(insecureSkipVerifyP *bool, caFile string) (*http.Transport, error) {
if insecureSkipVerifyP == nil {
insecureSkipVerifyP = backendTLSInsecureSkipVerify
func newRoundTripper(caFileOpt, certFileOpt, keyFileOpt, serverNameOpt string, insecureSkipVerifyP *bool) (http.RoundTripper, error) {
caFile := *backendTLSCAFile
if caFileOpt != "" {
caFile = caFileOpt
}
insecureSkipVerify := *insecureSkipVerifyP
if caFile == "" {
caFile = *backendTLSCAFile
certFile := *backendTLSCertFile
if certFileOpt != "" {
certFile = certFileOpt
}
keyFile := *backendTLSKeyFile
if keyFileOpt != "" {
keyFile = keyFileOpt
}
serverName := *backendTLSServerName
if serverNameOpt != "" {
serverName = serverNameOpt
}
insecureSkipVerify := *backendTLSInsecureSkipVerify
if p := insecureSkipVerifyP; p != nil {
insecureSkipVerify = *p
}
opts := &promauth.Options{
TLSConfig: &promauth.TLSConfig{
CAFile: caFile,
CertFile: certFile,
KeyFile: keyFile,
ServerName: serverName,
InsecureSkipVerify: insecureSkipVerify,
},
}
cfg, err := opts.NewConfig()
if err != nil {
return nil, fmt.Errorf("cannot initialize promauth.Config: %w", err)
}
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = appendTransportKey(bb.B[:0], insecureSkipVerify, caFile)
transportMapLock.Lock()
defer transportMapLock.Unlock()
tr := transportMap[string(bb.B)]
if tr == nil {
trLocal, err := newTransport(insecureSkipVerify, caFile)
if err != nil {
return nil, err
}
transportMap[string(bb.B)] = trLocal
tr = trLocal
}
return tr, nil
}
var transportMap = make(map[string]*http.Transport)
var transportMapLock sync.Mutex
func appendTransportKey(dst []byte, insecureSkipVerify bool, caFile string) []byte {
dst = encoding.MarshalBool(dst, insecureSkipVerify)
dst = encoding.MarshalBytes(dst, bytesutil.ToUnsafeBytes(caFile))
return dst
}
var bbPool bytesutil.ByteBufferPool
func newTransport(insecureSkipVerify bool, caFile string) (*http.Transport, error) {
tr := http.DefaultTransport.(*http.Transport).Clone()
tr.ResponseHeaderTimeout = *responseTimeout
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
@@ -413,27 +438,10 @@ func newTransport(insecureSkipVerify bool, caFile string) (*http.Transport, erro
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
tlsCfg := tr.TLSClientConfig
if tlsCfg == nil {
tlsCfg = &tls.Config{}
tr.TLSClientConfig = tlsCfg
}
if insecureSkipVerify || caFile != "" {
tlsCfg.ClientSessionCache = tls.NewLRUClientSessionCache(0)
tlsCfg.InsecureSkipVerify = insecureSkipVerify
if caFile != "" {
data, err := fs.ReadFileOrHTTP(caFile)
if err != nil {
return nil, fmt.Errorf("cannot read tls_ca_file: %w", err)
}
rootCA := x509.NewCertPool()
if !rootCA.AppendCertsFromPEM(data) {
return nil, fmt.Errorf("cannot parse data read from tls_ca_file %q", caFile)
}
tlsCfg.RootCAs = rootCA
}
}
return tr, nil
tr.DialContext = netutil.DialMaybeSRV
rt := cfg.NewRoundTripper(tr)
return rt, nil
}
var (
@@ -457,7 +465,7 @@ func usage() {
const s = `
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
See the docs at https://docs.victoriametrics.com/vmauth.html .
See the docs at https://docs.victoriametrics.com/vmauth/ .
`
flagutil.Usage(s)
}

Some files were not shown because too many files have changed in this diff Show More