Compare commits

...

195 Commits

Author SHA1 Message Date
hagen1778
a6d68f859f docs: improve wording of query stats
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-28 13:02:48 +02:00
Artem Fetishev
b911f721b3 docs: Fix the definition of active time series (#8824)
Based on the code, an active time series is one that has received at
least one sample during the last hour. It does not includes querying.

The first time the docs were updated with this was in
d06aae9454. But even at that time, the
code shows that querying a metric did not make it active.



- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-28 12:11:47 +02:00
hagen1778
3f39bd08cf docs: tidy up last changelog
* put important changes on top;
* replace `this issue` with corresponding issue number to make it more sense;
* consistently format components names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-28 09:02:06 +02:00
Alexander Marshalov
cddf36af43 vmcloud docs: add information about access tokens (#8780)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Jose Gómez-Sellés <14234281+jgomezselles@users.noreply.github.com>
2025-04-27 21:40:40 +02:00
hagen1778
0e555b421d dashboards: update query stats
* display max values instead of cumulative values
to outline anomalies;
* remove unnecessary overrides on column width;
* add corresponding formatting units to displayed values.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-27 17:34:26 +02:00
Aliaksandr Valialkin
ec6f33f526 lib/logstorage: prevent from slow memory leak at datadb.rb
datadb.rb contains logRows shards, which weren't freed up after the data ingestion
for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs
for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards
after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order
to reduce the pressure on GC.
2025-04-26 22:40:32 +02:00
Aliaksandr Valialkin
e316ab878c all: fix broken links to *.md files inside docs package after the commit f152021521
This commit moved all the docs/*.md files to docs/victoriametrics/ folder. This broke direct links to these files
such as https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmagent.md .
Add the missing `/victoriametrics/` part here, so the link becomes https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmagent.md .

The move of all the docs/*.md files into docs/victoriametrics/ folder is dubious because of the following reasons:

- It breaks direct links to *.md files like mentioned above. It is impossible to fix such links all over the Internet,
  so they will remain broken :(
  The best thing we can do is to fix them on the resources we control such as VictoriaMetrics repository.

- It breaks Google indexing of VictoriaMetrics docs. Google index contains old links such as https://docs.victoriametrics.com/vmagent/#features .
  Now such links are automatically redirected to https://docs.victoriametrics.com/victoriametrics/vmagent/#features by a javascript on the https://docs.victoriametrics.com/vmagent/ page.
  Google doesn't like redirects at javascript, since they are frequently used by black hat SEO purposes. This leads to pessimization of VictoriaMetrics docs
  in Google search result :( We cannot update the old links all over the Internet in order to avoid the redirect by javascript :(
  The best thing we can do is to add <meta rel="canonical"> header with the new location of the page at the old url,
  and hope Google won't remove VictoriaMetrics docs from its search results. @AndrewChubatiuk , please do this.

- It breaks backwards navigation. When you click the https://docs.victoriametrics.com/vmagent/ link on some page and then press `back` button
  in the web browser, it won't return back you to the original page because of the intermediate redirect :( The broken navigation cannot be fixed
  for old links located all over the Internet.

- It increases chances of breaking old links left on the Internet in the future, which will lead to 404 Not Found pages
  and angry users :(

The sad thing is that we hit the same wall with harmful redirects again :( In the beginning the VictoriaMetrics docs links had .html suffix,
and they were case-sensitive. For example, http://docs.victoriametrics.com/MetricsQL.html#rate . It has been decided for some unknown reason
that it is a good idea to remove the .html suffix and to make all the links lowercase. So now such links are automatically redirected by javascript
to https://docs.victoriametrics.com/metricsql/#rate and then redirected again by another javasript to https://docs.victoriametrics.com/victoriametrics/metricsql/#rate :(
There are many old links all over the Internet (for example, at Reddit, StackOverflow, some internal Wiki pages, etc.). We cannot fix all of them,
so we need to pray these links won't break in the future.

@hagen1778, @tenmozes, @makasim , please make sure we won't hit the same wall in a third time.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595
2025-04-26 00:52:24 +02:00
Aliaksandr Valialkin
4aa0f93085 docs/victoriametrics/Articles.md: add a link to https://blog.ogenki.io/post/series/observability/alerts/ 2025-04-26 00:26:42 +02:00
Aliaksandr Valialkin
764fc6010d vendor: run make vendor-update 2025-04-25 23:25:32 +02:00
Aliaksandr Valialkin
3ec70a4b6d Makefile: run go mod tidy with -compat=1.24 instead of -compat=1.23
VictoriaMetrics source code depends on Go 1.24, so it is better to run `go mod tidy` in Go 1.24 compatibility mode.
2025-04-25 23:16:57 +02:00
Aliaksandr Valialkin
3b7039679f lib/logstorage: make golangc-lint happy by substituting unused function arg with _ 2025-04-25 23:14:36 +02:00
Aliaksandr Valialkin
f3197dc5d7 deployment: update VictoriaLogs Docker image tags from v1.20.0-victorialogs to v1.21.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.21.0-victorialogs
2025-04-25 19:48:48 +02:00
Aliaksandr Valialkin
9d82de9385 docs/victorialogs: cut v1.21.0-victorialogs 2025-04-25 19:41:56 +02:00
Aliaksandr Valialkin
c34f6db5c0 app/vlselect/vmui: run make vmui-logs-update after 202fc0652c 2025-04-25 19:40:42 +02:00
Aliaksandr Valialkin
8ad81220d3 lib/logstorage: increase scalability of datadb.mustAddRows() on hosts with many CPU cores
Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts.
Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time.

This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows own-by-one
from concurrently running goroutines, by 2x.
2025-04-25 19:35:33 +02:00
Aliaksandr Valialkin
7455e6c0a5 lib/logstorage: re-use newTestLogRows() for creating LogRows inside BenchmarkStorageMustAddRows 2025-04-25 19:35:32 +02:00
Alexander Marshalov
22c85a149e vmcloud docs: add intergrations section (#8744)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-25 19:28:47 +02:00
f41gh7
1e4238ede7 make vmui-update 2025-04-25 15:51:11 +03:00
Jose Gómez-Sellés
1d218f6aeb docs/cloud: add organizations documents (#8814)
Here a description of important terms related to Organizations is
provided. The intention is to leverage on a good UI design rather than
explaining all steps needed to perfomr certain actions.

However, an effort on explaining the meaning of internal states and
terms has been done.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-25 14:32:21 +02:00
f41gh7
ba4c9133ee lib/handshake: log client network errors during handshake as warnings
This commit modifies the logging behavior for client network errors
(e.g., EOFs, timeouts) during the handshake process. They are now logged
as warnings instead of errors, as they are not actionable from the
server’s perspective. Here's some examples of such errors.

Timeouts during the initial read phase:

2025-04-09T07:08:59.323Z	error
VictoriaMetrics/lib/vmselectapi/server.go:204	cannot perform
vmselect handshake with client "<REDACTED>": cannot read hello: cannot
read message with size 11: read tcp4 <REDACTED>-><REDACTED>: i/o
timeout; read only 0 bytes

EOFs occurring later in the handshake process:

2025-04-08T18:01:30.783Z	error
VictoriaMetrics/lib/vmselectapi/server.go:204	cannot perform
vmselect handshake with client "<REDACTED>": cannot read isCompressed
flag: cannot read message with size 1: EOF; read only 0 bytes

By logging these as warnings, we reduce noise in error logs while
preserving valuble information for debug.
2025-04-25 12:03:03 +03:00
Nikolay
dcadc65681 lib/cgroup: properly parse cpu limit
Previously, if `cpu.max` file has only `max` resource defined without
`period`, it was parsed incorrectly and silently drop error. While this
syntax is valid and actually used by some container runtimes. If period
is not defined, default value for it 100_000 must be used.

 This commit fixes parsing function by using default value for period.
In addition, it adds zero value check, which fixes possible panic if
period has 0 value.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8808
2025-04-25 11:48:05 +03:00
Zhu Jiekun
ae9126fae7 changelog: fix update note & sort upcomming changelog
sort the upcoming changelog in alphabet order.
2025-04-25 11:38:09 +03:00
hagen1778
bf3b98954e docs: tidy up changelog before release
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-25 10:03:30 +02:00
Andrii Chubatiuk
814a37e57e docs: fixed docs-debug target (#8791)
### Describe Your Changes

related https://github.com/VictoriaMetrics/vmdocs/issues/129

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-25 09:39:18 +02:00
f41gh7
0afa51a156 fixes test after 1572e1e5c3 2025-04-24 23:32:37 +03:00
Max Kotliar
b89cce6bd6 docs\stream-aggregation: Describe dropping unneeded labels in more details
Follow-up on
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8715 and
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8681#issuecomment-2796127921
2025-04-24 20:26:37 +03:00
Artur Minchukou
fdfa68bc2d app/vmui/logs: fix zoom behavior on VictoriaLogs chart
Fix zoom behavior on VictoriaLogs chart. The problem was that after
zooming, the chart was recreated and the cursor position on the graph
was lost, so further zooming occurred relative to the zero position. So
to fix this problem:
- removed unnecessary dependencies from useEffect to avoid recreating
the chart;
- moved the chart with the legend to a separate component to optimize
the code when the graph is hidden;
- also this PR contains changes from [this
PR](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8718), which
preserve zoom selection on autorefresh

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8557
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8558
2025-04-24 20:25:48 +03:00
Zhu Jiekun
1572e1e5c3 app/vmagent: add consistent hashing for the remote write sharding
This commit adds the following changes:
* use consistent hashing  for the remote write sharding.
* properly count metric of remote write samples drop rate  when `shardByURL` was
enabled.

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8546
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8702
2025-04-24 20:22:59 +03:00
Aliaksandr Valialkin
55d75b911c app/vlstorage: add /internal/force_flush endpoint for flusing all the recently ingested logs to searchable parts
This endpoint is needed for integration tests, which need querying the recently ingested logs.
2025-04-24 18:17:30 +02:00
Aliaksandr Valialkin
f9e312f35a app/vlselect/logsql: allow passing multiple extra_filters and extra_stream_filters query args to all the querying APIs
This should help the proper implementation of extra filters in VictoriaLogs Plugin for Grafana and
in VictoriaLogs web UI. See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/293#issuecomment-2827285583
2025-04-24 17:55:33 +02:00
Aliaksandr Valialkin
0cfe28c2fc Revert "ci: temporary disable vlogs tests for i386 "
This reverts commit fa6a32a39d.

Reason for revert: the broken tests were fixed on GOARCH=386 by skipping the check for the state size
after improting the state of stats function, since the state size depends on the hardware architecture.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710
2025-04-24 17:31:40 +02:00
Yury Molodov
1c39853164 vmui: show read usage in Cardinality Explorer (#8757)
### Describe Your Changes

Two new columns were added to the **Cardinality Explorer** table:

`Requests count`
- Displays how many times a metric was queried since stats collection
began
- Highlighted **red** when the value is `0` or missing

`Last request`
- Shows when the metric was last queried
- Highlighted:
  - **Red** if older than **30 days**
  - **Yellow** if between **7 and 30 days**
- Shows exact timestamp on hover in `YYYY-MM-DD HH:mm:ss` format

Related issue: #6145


![image](https://github.com/user-attachments/assets/fcf6f33c-e2d0-45c0-9f20-177cf6fa9dde)

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-24 14:07:29 +02:00
Andrii Chubatiuk
7390f99897 app/vmalert: fixed rule group ID in UI (#8803)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2025-04-24 13:46:42 +02:00
Hui Wang
2d279750a9 lib/storage/downsampling: allow using -downsampling.period=filter:0s:0s to skip downsampling for time series that match the specified filter
* downsampling: allow using `-downsampling.period=filter:0s:0s` to skip downsampling for time series that match the specified `filter`

* doc minor fix

* lib/storage/downsampling: fix a typo in error message

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-24 14:41:37 +04:00
Artem Navoiev
e9c577132d docs: tag all documentation pages (#8793)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-24 02:58:02 -07:00
Yury Molodov
899d70a93e vmui: fix /flags API requests to respect custom URL prefixes (#8777)
### Describe Your Changes

Fixes incorrect /flags request in vmui when using URL prefixes.
 
 Related issue: #8641 

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-24 11:33:43 +02:00
Yury Molodov
1d13718fcf vmui/logs: fix oversized Query History button on mobile (#8794)
### Describe Your Changes

Fixes oversized `Query History` button on mobile in logs UI.
Related issue: #8788 

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-24 11:25:17 +02:00
Yury Molodov
fa4708178f vmui: hide duplicate legend items in Raw Query (#8784)
### Describe Your Changes

Hide identical series in the legend on the Raw Query page when
deduplication is disabled.

Related issue: #8688

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-24 11:20:41 +02:00
Harilou
e0ff58ce5e app/vmlogs: Replace elasticsearch plugin with opensearch plugin of logstash
### Describe your changes

1. When I used the opensearch plugin in the reference example, I found a
bug that caused memory exhaustion and stopped working. There is a
similar issue in the opensearch community:
https://github.com/opensearch-project/logstash-output-opensearch/issues/186,
@chadmyers raised this issue, but it has not been fixed yet. It has been
verified that the elasticsearch plugin does not have this problem.

![image](https://github.com/user-attachments/assets/cc2cbbfc-6dde-43af-8aca-3c1e371e1890)


2. In the vmlogs logstash docking example, it is consistent with the
official document's example of using the elasticsearch plugin.
https://docs.victoriametrics.com/victorialogs/data-ingestion/logstash/

### Checklist

The following checks are **required**:

- [x] My changes follow the [VictoriaMetrics Contribution
Guide](https://docs.victoriametrics.com/contributing/).
2025-04-24 12:07:56 +04:00
Yury Molodov
a9f03605d3 vmui/logs: handle ANSI escape sequences (#8548)
### Describe Your Changes

1. Added a toggle to enable parsing of ANSI escape sequences (e.g.,
color codes).
    Related issue:  #6614
2. Fixed the display of raw fields when the `_msg` field is missing. 
    Related issue: #8205
3. Fixed log display in Group view when Markdown parsing is enabled. 
    Related issue: #8575
4. Moved the Markdown parsing toggle to **Group View Settings**.
5. Configured testing setup.

<details>
<summary>Demo UI:</summary>

<br/>

**Disabled ANSI parsing:**

![image](https://github.com/user-attachments/assets/7f466051-ea6c-4ff7-b386-a3df4bb503c9)

**Enabled ANSI parsing:**

![image](https://github.com/user-attachments/assets/04074b6e-b1c2-4d5e-a8c2-5f58ca88494b)


</details>


### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-23 15:25:07 +02:00
hagen1778
19025d5a19 docs: update metric names tracker
* update wording;
* add more details about returned response;
* cover details on counter increments;

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-23 15:02:57 +02:00
hagen1778
1f8c115428 docs: move metric usage doc closer to similar APIs
Before, this doc section was related to vmui, by mistake.
Moved it closer to tsdb stats, where it makes more sense to be.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-23 14:37:02 +02:00
hagen1778
1863c5b95b docs: remove badges from the main page
Badges is more a decorative element rather than source of useful information.
Some badges break from time to time, giving misleading impression to users.
It is better to remove them to reduce visual load and confusion.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-23 14:35:03 +02:00
Hui Wang
7756046865 vmalert: correctly update the debug param for recording rule when u… (#8792)
…pdating the rule group

Refactor the `TestUpdateWith` a bit in preparation for
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8658
2025-04-23 14:32:07 +02:00
Zakhar Bessarab
7f2d194948 lib/backup/s3remote: enable HTTP/2 for S3 connections
### Describe Your Changes

HTTP/2 support is used by some S3-compatible storage providers, so
disabling it default leads to unexpected errors when trying to connect
to S3 endpoint.
For example, using MinIO as S3 storage backend: `net/http: HTTP/1.x
transport connection broken: malformed HTTP response`.

HTTP/2 was enabled by default previously, but while fixing inconsistency
e5f4826 commit disabled this by default.
cc: @valyala 

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-23 09:45:07 +04:00
Zakhar Bessarab
f224b60410 fix: compatibility for FIPS builds
### Describe Your Changes

Fixes which are required in order to build FIPS-compliant binaries.
These changes were originally added for enterprise version and synced to
opensource for consistency and easier maintenance.

- consistently use `hash/fnv` at `app/vmalert` when calculating
checksums. Usage of md5 is not allowed in FIPS mode.
- increase encryption keys size used in testing in order to allow tests
to successfully run in FIPS mode

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-23 09:43:07 +04:00
Aliaksandr Valialkin
46d32af89a lib/logstorage: add sample N for returning a random 1/Nth sample of matching logs 2025-04-22 16:37:07 +02:00
Aliaksandr Valialkin
3a7ea02d41 deployment: update VictoriaLogs Docker image from v1.19.0-victorialogs to v1.20.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.20.0-victorialogs
2025-04-22 14:30:17 +02:00
Aliaksandr Valialkin
84d2922f5e all: consistently use Grafana dashboard links ending with dashboard ID
The suffix after the ID may change in dashboard settings.
If it changes, the link becomes broken (dubious decision at grafana.com/grafana/dashboards/ ).
That's why it is better to drop all the suffixes and use links for Grafana dashbooards ending with IDs.
In this case they are automatically redirected to the url with the proper suffix.

This is a follow-up for 3e4c38c56c

See also the previous commits 9c0863babc and 0a5ffb3bc1
2025-04-22 14:20:15 +02:00
hagen1778
39b27cb397 docs: fix typo
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-22 14:13:01 +02:00
Zakhar Bessarab
b1523f650d lib/promrelabel/debug: use stricter format for labels (#8770)
### Describe Your Changes

Previously, it was possible to use any UTF-8 string to specify list of
labels. While this makes it easier to use it also leads to unexpected
parsing results in some cases (see
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8584 as an
example).

Enforce specifying metric in format {label="value"...} in order to avoid
issues with unexpected parsing results.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-22 14:10:25 +02:00
Aliaksandr Valialkin
297bc28e3d docs/victorialogs/CHANGELOG.md: cut v1.20.0-victorialogs 2025-04-22 13:53:58 +02:00
Aliaksandr Valialkin
54bb834c16 app/vlselect/vmui: run make vmui-logs-update after faf95b4a58 2025-04-22 13:53:40 +02:00
Aliaksandr Valialkin
edee80f215 docs/victorialogs/CHANGELOG.md: move the description of the bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8647 into the appropriate place
The bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8647 isn't released yet,
so it must be placed under the `tip` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8648
2025-04-22 13:49:19 +02:00
Aliaksandr Valialkin
5491d54c11 lib/logstorage: buffer the ingested log entries before converting them into searchable parts
This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts
when small number of log entries are passed to Storage.MustAddRows().

The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1,
up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100.

This should reduce CPU usage during data ingestion when every request contains small number of rows.
2025-04-22 13:49:17 +02:00
Aliaksandr Valialkin
14561a7ed3 lib/logstorage: add a benchmark for different number of rows added to the storage via Storage.MustAddRows() 2025-04-22 13:49:15 +02:00
Artem Navoiev
cd4ec0e739 dashboards: update statistic by tenant dashboard add instant churn ra… (#8779)
…te panel

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-22 13:46:46 +02:00
hagen1778
8e76b41081 docs: fix broken links after docker-compose updates
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-22 13:39:33 +02:00
f41gh7
9751ea1098 fixes tests after 795d3fe722
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-22 11:59:22 +03:00
Roman Khavronenko
787e6a9b24 app/vmalert/remotewrite: rm noisy shutdown logs
On shutdown, rw client printed two log messages about shutting down and
how many series it flushed on shut down.

This commit removes these log messages for the following reasons:
1. These messages were printed for each `client.cc`, which is set to x2
of available CPUs. This behavior generated a lot of useless logs on
shutdown.
2. Number of flushed series on shutdown doesn't matter to the user.

Instead, it now prints only two log messages: when shutdown starts and
when it ends:
```
^C2025-04-22T07:37:11.932Z      info    VictoriaMetrics/app/vmalert/main.go:189 service received signal interrupt
2025-04-22T07:37:11.932Z        info    VictoriaMetrics/app/vmalert/remotewrite/client.go:152   shutting down remote write client: flushing remained series
2025-04-22T07:37:11.933Z        info    VictoriaMetrics/app/vmalert/remotewrite/client.go:155   shutting down remote write client: finished in 210.917µs
```

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-22 11:50:23 +03:00
f41gh7
73a69bd655 app/vmselect/netstorage: properly set max read size for metric name
Previously, metric names stats API had a false assumption, that max
size of metric name is 256 byte. But this is configurable parameter with
4096 bytes max size. It triggered errors during API requests.

 This commit replaces hard-coded 256 byte limit with common constant:
maxLabelValueSize. It has 16 MB limit.

 In addition, this commit adds check for metric name stats tracker,
if metric name size exceeds default buffer limit, it will be allocated
directly on heap. It must be rare case, since most metric names has
16-64 byte size.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8759
2025-04-22 11:36:16 +03:00
f41gh7
ec6ed3cb48 lib/storage/metricnamestats: allow regex for match_pattern
This commit allows regex syntax for match_pattern query param.
It improves API usability.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6145
2025-04-22 11:35:23 +03:00
Yury Molodov
faf95b4a58 vmui/logs: update button styles to improve hover performance
Refactors button styles to fix high CPU usage on hover in the logs UI.

 Related issue: #
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8135
2025-04-22 11:28:19 +03:00
Artur Minchukou
a7106040de app/vmui/logs: add 0 label to the Y axis
Added 0 label to the Y axis on VictoriaLogs chart.


Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8409
2025-04-22 11:26:34 +03:00
Phuong Le
792d0f8bb8 vmagent: reduce log noise from remote write retries
Throttle remote write retry warning logs to one message per 5 seconds.
This reduces log noise during connectivity issues.

Users can monitor `vmagent_remotewrite_retries_count_total` and
`vmagent_remotewrite_errors_total` metrics for detailed retry rates per
destination if needed. This change prioritizes reducing log volume over
immediate destination-specific error logging in the retry warnings.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8723
2025-04-22 11:20:33 +03:00
Phuong Le
8628f90e6b lib/storage: put the unused in-memory part back into the pool 2025-04-22 11:19:50 +03:00
Georgy Torquemada
89237a04b9 fix: add missed severity levels (warn) for protobuff parser
Closes
[8647](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8647)

### Describe Your Changes

Added missed OTEL severities levels, added test for severity, fix some
severity in given tests

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-21 16:39:10 +04:00
Aliaksandr Valialkin
d6c156727b docs: remove misleading requirements for the minimum supported Go version needed for building VictoriaMetrics executables
VictoriaMetrics executables need the latest available stable release of Go (1.24 currently),
since they use its features. See, for example, GOEXPERIMENT=synctest from the commit 06c26315df .

There is no need in specifying the minimum supported Go version for building VictoriaMetrics products,
since Go automatically downloads and uses the version specified at `go ...` directive inside go.mod
starting from Go1.21. See https://go.dev/doc/toolchain for details.

So the actual minimum needed Go version is Go1.21, which has been released 1.5 years ago. It should automatically
install newer Go versions specified at `go ...` directive inside go.mod.
2025-04-21 13:06:09 +02:00
Aliaksandr Valialkin
f1fdc8610d lib/promscrape: prevent from excess memory allocation during scrapes when sample_limit is exceeded
Do not reset wc.labels in order to properly keep track of the number of used labels for the scrape,
and properly re-use the same number of wc.labels on subsequent scrapes.

See 12f26668a6 (r155481168)
2025-04-21 13:06:08 +02:00
Artem Navoiev
611450a188 docs: kuma sd actualize link
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-19 20:37:44 +02:00
Artem Navoiev
3b1a41c014 docs: victorialogs fix link to external collectiors
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-19 10:53:17 +02:00
hagen1778
cd9bb6b315 docs: rm links to swiftstack
Swiftstack s3 service and docs seems unavailable anymore.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-19 07:35:27 +02:00
Artem Navoiev
e58d3f03c7 docs: victoriametrics home page use relative anchor
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 22:22:44 +02:00
Artem Navoiev
3e4c38c56c docs: changelog fix link to grafana dashboard as far we rename it. Point to 2.53 (LTS) release of prometheus in vmalert as far link to 2.49 doesn't exist anymore
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 22:21:00 +02:00
Artem Navoiev
c3becae96b docs: actualize link to timescale compression in faq
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 22:12:33 +02:00
Artem Navoiev
84ee713dc6 docs: home victoriametrics fix link to readme.md file
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 22:06:01 +02:00
Artem Navoiev
7dc58894d1 docs: changelog remove deadlink to mesosphere marathon as far project in achive since october 2024 see https://github.com/d2iq-archive/marathon
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 21:55:47 +02:00
Artem Navoiev
30376722c1 docs: fix wrong link in cluster documentation, point to the new era5 page in data examples
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 21:44:11 +02:00
Artem Navoiev
6674691a58 docs: victoriametrics list current urls for redict to simplify the feature changes. More urls
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 21:10:46 +02:00
Artem Navoiev
55e3ccb5f0 docs: victoriametrics list current urls for redict to simplify the feature changes
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-18 20:44:01 +02:00
hagen1778
2c97c8841c deployment/docker: fix routing for vlogs vmui
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-18 13:55:25 +02:00
Andrii Chubatiuk
f38736343d deployment/docker: added victorialogs cluster docker compose setup (#8725)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8694

additionally removed container_name, docker network, renamed all
compose, config files for consistency

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-18 13:47:53 +02:00
Aliaksandr Valialkin
33315f1ece deployment/docker: update VictoriaLogs from v1.18.0-victorialogs to v1.19.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.19.0-victorialogs
2025-04-17 20:25:47 +02:00
Aliaksandr Valialkin
680cbef0cb docs/victorialogs/CHANGELOG.md: cut v1.19.0-victorialogs release 2025-04-17 20:15:20 +02:00
Aliaksandr Valialkin
2d3e048f59 docs/victorialogs/CHANGELOG.md: mention @arturminchukov as the author of the recent Web UI changes 2025-04-17 20:13:18 +02:00
Aliaksandr Valialkin
024a40a799 app/{vmselect,vlselect} run make vmui-update vmui-logs-update after 5fa40ee631 2025-04-17 20:08:28 +02:00
Aliaksandr Valialkin
225c6b6f52 docs/victorialogs/CHANGELOG.md: clarify the description of the bugfix in the commit 0fee22e91a
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8743
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8707
2025-04-17 20:05:39 +02:00
Andrii Chubatiuk
0fee22e91a lib/logstorage: expect message in a field with empty and _msg name (#8743)
### Describe Your Changes

fixes #8707

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-04-17 19:55:37 +02:00
Max Kotliar
96200a9ec4 vmagent: use tmp dir in integrations tests (#8748)
Before the change, the vmagent integration tests created their directory
and files inside apptest/tests.
After the change, vmagent is instructed to store all files in a real
temporary directory, which is automatically deleted after the tests
complete.
2025-04-17 18:23:42 +02:00
Artem Fetishev
06c26315df lib/storage: test wasMetricIDsMissingBefore with "testing/synctest" (#8740)
Using this package lets to manipulate time. In this particular case, it
lets to advance the time 61 second forward instantly.

A few side changes were necessary:

- Do not use fasttime in unit tests. The fasttime package starts a
goroutine outside the test bubble which causes the clock to be real, not
fake.
- Stop the time.Ticker explicitly and also stop idbNext. These two
create goroutines with infinite loops which causes the unit tests that
use synctest to hang forever. All goroutines created inside the bubble
must exit in order for the syntest to finish.
- synctest is an experimental package and requires an environment
variable to be set. The Makefile was changed to set it.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-17 16:58:47 +02:00
Max Kotliar
231810fe49 lib/protoparser/protoparserutil: restore write concurrency limiter in ReadUncompressedData due to performance regressions (#8742)
### Describe Your Changes

The write concurrency limiter in ReadUncompressedData was previously
removed in

22d1b916bf
to avoid suboptimal behavior in certain scenarios. However, follow-up
reports—including issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8674 and
production feedback from VictoriaMetrics Cloud—indicated a noticeable
degradation in performance after its removal.

To mitigate these regressions, this commit reintroduces the concurrency
limiter. A long-term, more optimal solution will be explored separately
in issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8728.

TODO:

* [x] Changelog


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 14:05:41 +02:00
hagen1778
60ef715c79 docs: fix newline typo in query-stats.md
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 12:25:59 +02:00
hagen1778
8071dabe58 docs: update query-stats.md
* fix typos
* improve wording
* link grafana dashboard

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 12:23:15 +02:00
hagen1778
3d9b160fce docs: make query-stats.md available in docs
The doc was incorrectly ported into wrong directory after
a113516588
This change moves it to the victoriametrics dir.

While there, updated the order of some pages to couple them by meaning.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 12:15:17 +02:00
Yury Molodov
5fa40ee631 vmui/logs: fix incorrect table sorting for numeric (#8646)
### Describe Your Changes

Fixed table sorting logic and added unit tests for descendingComparator.
Values are now correctly sorted by type: number, date, or string.

Related issue: #8606

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 09:40:46 +02:00
Artur Minchukou
53814b1ea6 app/vmui: add query history to VictoriaLogs (#8703)
### Describe Your Changes
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8500

Added query history to VictoriaLogs

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 09:34:15 +02:00
Artur Minchukou
9ca2a246a9 app/vmui: fix mobile layout on the VictoriaLogs page, fix width of groups and table settings modal (#8679)
### Describe Your Changes
 - Fix mobile layout on the VictoriaLogs page
<img width="361" alt="image"
src="https://github.com/user-attachments/assets/2e09b999-34d5-4059-ba09-95a21b3e8ab3"
/>
<img width="353" alt="image"
src="https://github.com/user-attachments/assets/b9048d41-5972-4290-988f-e9b0a0472b99"
/>
 
- Fix width of groups settings modal
<img width="372" alt="image"
src="https://github.com/user-attachments/assets/e1cb1902-dc93-4bfd-8b32-eaf0d8e54253"
/>
<img width="352" alt="image"
src="https://github.com/user-attachments/assets/a7c7000f-2c4a-41d9-a3c1-a515fd077b9b"
/>

- Fix width of table settings modal
<img width="361" alt="image"
src="https://github.com/user-attachments/assets/12921936-6824-47e9-aff8-d0bb4de2e7f7"
/>
<img width="352" alt="image"
src="https://github.com/user-attachments/assets/77c857ee-85f4-4992-8113-4e252b40f1a6"
/>

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 09:27:10 +02:00
Artur Minchukou
4517425da8 app/vmui: add export logs button to VictoriaLogs (#8671)
### Describe Your Changes
Related issue:
[#8604](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8604)

Added a download logs button to VictoriaLogs, which allows you to export
logs in the following formats: csv, json.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-04-17 09:14:41 +02:00
Yury Molodov
953ed46680 vmui: update package.json
### Describe Your Changes

Bumped project dependencies in `package.json` to the latest stable
versions.
2025-04-16 20:19:17 +02:00
f41gh7
795d3fe722 lib/storage: enhance TSDB status response
This commit adds new fields - `requestsCount` and `lastRequestTimestamp`
to series count be metric names stats.
It allows to display an additional stats at explore cardinality page.
Stats will only be added if `storage.trackMetricNameStats` flag is set.

 This change requires an update to RPC protocol in order to properly
marshal data.

 In addition, this commit adds integration tests to TSDB stats API.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6145
2025-04-16 20:09:35 +02:00
f41gh7
536b40c06c make linter happy after a113516588 2025-04-16 19:55:54 +02:00
Roman Khavronenko
a113516588 app/vmselect: log search query request stats
This commit adds `search.logSlowQueryStats=<duration>` cmd-line flag on vmselect.
It reads stats from eval function, and doesn't slow down the query execution.

 Log line has the following structure:

 vm_slow_query_stats type=%s query=%q query_hash=%d start_ms=%d end_ms=%d step_ms=%d range_ms=%d tenant=%q execution_duration_ms=%v series_fetched=%d samples_fetched=%d bytes=%d memory_estimated_bytes=%d

 This feature is only available for enterprise version.
2025-04-16 19:39:13 +02:00
Fred Navruzov
b68f9ea9e3 docs/vmanomaly - release 1.22.0 - experimental (#8717)
### Describe Your Changes

Changelog note about experimental v1.22.0 release that solves deadlock
issue on multicore systems by complete parallelization turned off until
proper refactor is made to return it back w/o reintroducing the risk of
the deadlock in child processes.

P.s. other references and guides were not updated to experimental tag,
as long as it has downsides of dropped speed. The links will be updated
once we have parallelization properly turned on.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-14 15:56:10 +03:00
Roman Khavronenko
fa6a32a39d ci: temporary disable vlogs tests for i386
This change unblocks testing pipelines in CI for other contributions.
The tests are commented because I don't have full understanding of
fixing them.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710

---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-14 11:11:43 +02:00
Max Kotliar
477635e00f Follow up to "vmagent/client: Use VictoriaMetrics remote write protocol by default, downgrade to Prometheus if needed" (#8706)
### Describe Your Changes

Follow-up to
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Addressed review comments:
- Log panic with FATAL prefix to indicate possible on-disk data
corruption
- Moved version bump line to the tip block (v1.114.0 has already been
released) in changelog
- Removed duplicate vmagent entry from targets list from Makefile


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-04-14 11:55:01 +03:00
Max Kotliar
8e92cd3b2d lib/writeconcurrencylimiter: add some hints to unexpected EOF error message. (#8704)
### Describe Your Changes

Under heavy load, vmagent's wirte concurrency limiter

(2ab53acce4/lib/writeconcurrencylimiter/concurrencylimiter.go (L111))
queues incoming requests. If a client's timeout is shorter than the wait
time in the
queue, the client may close the connection before vmagent starts
processing it. When vmagent then tries to read the request body, it
encounters an ambiguous `unexpected EOF` error
(https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8675).

This commit adds more context to such errors to help users diagnose and
resolve
the issue when it's related to vmagent's own load and queuing behavior.

Possible user actions include:
- Lowering `-insert.maxQueueDuration` below the client's timeout.
- Increasing the client-side timeout, if applicable.
- Scaling up vmagent (e.g., adding more CPU resources).
- Increasing `-maxConcurrentInserts` if CPU capacity allows.

Steps to reproduce:
https://gist.github.com/makasim/6984e20f57bfd944411f56a7ebe5b6bf

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-13 11:22:12 +03:00
Max Kotliar
2ab53acce4 vmagent/client: Use VictoriaMetrics remote write protocol by default, downgrade to Prometheus if needed (#8462)
### Describe Your Changes

This commit improves how vmagent selects the remote write protocol.
Previously, vmagent [performed a handshake
probe](0ff1a3b154/lib/protoparser/protoparserutil/vmproto_handshake.go (L11))
at
[startup](0ff1a3b154/app/vmagent/remotewrite/client.go (L173)):

- If the probe succeeded, it used the VictoriaMetrics (VM) protocol.

- If the probe failed, it downgraded to the Prometheus protocol.

- No protocol changes occurred after the initial probe at runtime.

However, this approach had limitations:

- If vmstorage was unavailable during vmagent startup, vmagent would
immediately downgrade to the Prometheus protocol, leading to higher
network usage unitl vmagent restarted. This case has been reported in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615.

- If the remote write server was updated or downgraded (e.g., during a
fallback or migration), vmagent would not detect the protocol change. It
would continue retrying failed requests and eventually drop them.
Require a restart of vmagent to pick up the new protocol.

This commit introduces a more adaptive mechanism.
vmagent always starts with the VM protocol and downgrades to the
Prometheus protocol only if an unsupported media type or bad request
response is received.
When this happens, the protocol is downgraded for all future requests.
In-flight requests are re-packed from Zstd to Snappy and retried
immediately.
Snappy-encoded requests are dropped if an unsupported media type or bad
request is received (no retrying).

Additionally, the in-memory and persisted queues could mix snappy and
zstd encoded blocks. The proper encoding is decided before sending by
encoding.IsZstd function.

TODO:
* [x] Add tests
* [x] Update documentation
* [x] Changelog
* [x] Research on
[content-type](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054),
[accept-encoding](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786923382)

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615#top
issue.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-12 10:39:47 +03:00
Aliaksandr Valialkin
2f4680d8f3 docs/victoriametrics/Articles.md: add a link to https://chronicles.mad-scientist.club/tales/grepping-logs-remains-terrible/ 2025-04-10 23:10:24 +02:00
Aliaksandr Valialkin
1f1afeb06e lib/logstorage: add support for <duration_seconds:field> formatting option for format pipe
This option formats duration values as floating-point seconds.
2025-04-10 22:55:08 +02:00
Aliaksandr Valialkin
b2c075191e docs/victorialogs/cluster.md: add an example on how to query every vmstorage node as a single-node VictoriaLogs 2025-04-10 22:16:31 +02:00
f41gh7
1191c6453b app/vmagent: properly init kafka consumer
Previously, if vmagent was built with CGO_ENABLED=0, vmagent cannot start and reported runtime error:

 `Kafka client is not supported at systems without CGO`

 This error was trigger even if `-kafka.consumer.topic` was not
 provided. CGO_ENABLED=0 is default build option for linux/arm and some other archs.

 This commit properly inits kafka consumer by checking if `-kafka.consumer.topic` is set.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6019
2025-04-10 18:15:36 +02:00
Aliaksandr Valialkin
5d61122fd5 docs/victorialogs/cluster.md: add a link to the changelog for the latest available release 2025-04-10 17:09:23 +02:00
Aliaksandr Valialkin
e9c04879ce docs/victorialogs/CHANGELOG.md: add release date for v1.18.0-victorialogs 2025-04-10 17:07:58 +02:00
Aliaksandr Valialkin
5f4205a050 deployment: update VictoriaLogs Docker image tag from v1.17.0-victorialogs to v1.18.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.18.0-victorialogs
2025-04-10 17:04:30 +02:00
Aliaksandr Valialkin
7a46af3920 victorialogs: add cluster mode
Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs.
In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag.
It also queries storage nodes during `select` queries.

Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters
and get global querying view.

See https://docs.victoriametrics.com/victorialogs/cluster/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223
2025-04-10 16:55:23 +02:00
Aliaksandr Valialkin
ff967a8e65 lib/protoparser: support for identity encoding in a generic way inside protoparserutil.GetUncompressedReader
This should help avoiding future issues when `identity` encoding isn't replaced to `` encoding
by the caller of protoparserutil.GetUncompressedReader().

This is a follow-up for 303b425fa3

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8652
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-10 13:52:45 +02:00
Artem Navoiev
3108376d95 docs: changelog fix the link to cluster version in 114 release.2
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-10 09:39:15 +02:00
Artem Navoiev
494fe4403a docs: changelog fix the link to cluster version in 114 release
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-09 21:28:40 +02:00
Andrii Chubatiuk
303b425fa3 lib/protoparser/datadog*: support Content-Encoding: identity value
introduction of common decompression logic in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416 removed
ability to treat unsupported compression algorithms as uncompressed data
for datadog v1 endpoint. This PR adds support of `identity`
Content-Encoding header value, though according to RFC 2616 this value
is only expected in `Accept-Encoding` header

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-08 16:17:19 +02:00
Nikolay
8f3efde55d lib/httpserver: mask authKey at PostFrom
'authKey' is well-known url and form param for VictoriaMetrics
components authorization. Previously, it could be printed into stdout
via httpserver error logger. It makes this authKey insecure and hard to
use.

This commit prevents from logging authKey defined at PostForm or as part
of url.Query.

It's recommneded to transfer authKey via PostForm and it should be
implemented at separate PRs.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-08 16:15:48 +02:00
Nikolay
f16938bba9 lib/backup/s3: properly set ProfileName
Previously, if ProfileName is set to empty value (as default). AWS s3
lib ignored any profile config defined with `-configProfilePath`.

This commit correctly configure client options and set profile name only
if it's set to non-empty value.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8668
2025-04-08 16:15:07 +02:00
nemobis
65ff04bc09 docs: Fix typo in changelog for v113
Fix a typo `scrapped` for `scraped`.
2025-04-08 16:14:50 +02:00
Zakhar Bessarab
e2715f94af docs/guides/vmgateway-grafana-oidc: update guide for recent versions of components
- update grafana & keycloak to latest versions
- update UI images with the latest screenshots
- update wording to reflect UI changes
2025-04-08 16:13:58 +02:00
Zakhar Bessarab
582160f566 make: fix make package for vmalert-tool
`make package` relies on presence of `APP_NAME/deployment/Dockerfile`
which was missing for vmalert-tool.
2025-04-08 16:13:28 +02:00
nemobis
638f9839d5 docs: fix typo in pull request template
The verb is _adhere to_, see https://en.wiktionary.org/wiki/adhere .
2025-04-08 16:12:51 +02:00
Max Kotliar
3f5bf4bd03 vmagent/remotewrite: set content encoding header based on actual body
Improve remote write handling in vmagent by setting the
`Content-Encoding` header based on the actual request body, rather than
relying on configuration.

- Detects Zstd compression via the Zstd magic number.
- Falls back to Snappy if Zstd is not detected.
- Persistent queue may now contain mixed-encoding content.
- Add basic vmagent integration tests

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5344 and
12cd32fd75.

Extracted from
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2025-04-08 16:12:06 +02:00
f41gh7
038419663b docs: release follow-up
* mention lts release changes
* update vm apps versions at docs and deployment examples

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-07 12:59:53 +02:00
f41gh7
123f373537 CHANGELOG.md: cut v1.115.0 release 2025-04-04 14:30:16 +02:00
f41gh7
57121c828f make docs-update-version 2025-04-04 14:23:08 +02:00
f41gh7
aa5edbc706 make vmui-update 2025-04-04 14:20:37 +02:00
Andrii Chubatiuk
f9d8c86b0a lib/streamaggr: fix panic in rate output
This commit properly reset aggregator state. Previously, it was not checked for `nil` and it lead to the panic on access.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8634
2025-04-04 14:14:52 +02:00
hansemschnokeloch
b733fc5b83 docs/vlogs: fix typo in README 2025-04-04 14:10:59 +02:00
Zakhar Bessarab
f2eaad62dc docs/changelog: correct entry location after 298f862f
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-04 12:24:16 +04:00
Aliaksandr Valialkin
adae788b18 lib/logstorage: pad pipeStatsProcessorShard.groupMapShards in order to avoid false sharing when merging these shards in parallel on many CPU cores 2025-04-03 22:21:18 +02:00
Aliaksandr Valialkin
a65d10fcce lib/logstorage: add padding between hitsMap items at hitsMapAdaptive.shards in order to avoid false sharing when processing the hitsMapAdaptive.shards on multiple CPU cores 2025-04-03 20:14:20 +02:00
Zakhar Bessarab
298f862fc0 deps: downgrade AWS dependencies
Pin AWS libraries to version before 2025-01-15 (see
https://github.com/aws/aws-sdk-go-v2/releases/tag/release-2025-01-15).

This version enabled request and response checksum verification by
default which breaks compatibility with non-AWS S3-compatible storage
providers.

See: https://github.com/victoriaMetrics/victoriaMetrics/issues/8622

Supersedes https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8630

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 18:05:07 +04:00
Zakhar Bessarab
aff1580a1d app/vmauth: return non-OK response for timeouts and request cancellation
Currently, requests failing due to network timeout would receive "200
OK" while producing a warning log message about the timeout. This
behaviour is confusing and might produce unexpected issues as it is not
possible to retry errors properly.

Change this to return "502 Bad Gateway" response so that error can be
handled by the client.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8621

Config for testing:
```
unauthorized_user:
  url_prefix: "http://example.com:9800"
```

Before the change:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
/* NOTE: 30 seconds timeout passes */
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 08:54:05 GMT
Date: Tue, 01 Apr 2025 08:54:05 GMT
<

* Connection #0 to host 127.0.0.1 left intact
```

After:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 502 Bad Gateway
HTTP/1.1 502 Bad Gateway
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 09:13:57 GMT
Date: Tue, 01 Apr 2025 09:13:57 GMT
< Content-Length: 109
Content-Length: 109
<

* Connection #0 to host 127.0.0.1 left intact
```

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 13:44:51 +04:00
hagen1778
d4c0a42c1b docs: improve wording for recent vmalert changes
follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8522

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3cbc3eb19f)
2025-04-03 09:47:34 +01:00
Emre Yazıcı
a9736a5bfb app/vmalert: show partial responses in debug logs (#8522)
### Describe Your Changes

Log when the data response from vmselect is partial during
rule(recording, alertingrule) evaluations.

vmselect returns `isPartial: true` in case data is not fully fetched
from scattered vmstorages. At the time of rule evals, it may be drifting
apart from real values due to missing points. This is an important event
that should be logged to inform users to see how often that happens as
it may lead to false positive alerts.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: emreya <emre.yazici@adyen.com>
Signed-off-by: emreya <e.yazici1990@gmail.com>
Signed-off-by: Emre Yazici <e.yazici1990@gmail.com>
(cherry picked from commit 56f60e8be9)
2025-04-03 09:47:34 +01:00
Artem Fetishev
2e4beeefb1 Update series count docs (#8631)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-03 10:37:35 +02:00
Aliaksandr Valialkin
635c5c9feb app/vlselect: run /select/logsql/tail queries without concurrency limit
The concurrency limit is intended for short-running queries. If it is applied to tail queries,
then this can affect short-running queries.
2025-04-02 20:22:27 +02:00
Aliaksandr Valialkin
4e1260e189 app/vlselect: do not log canceled requests, since they are expected and legal 2025-04-02 19:14:44 +02:00
Aliaksandr Valialkin
ca3910748f deployment: update Go builder from Go1.24.1 to Go1.24.2
See https://github.com/golang/go/issues?q=milestone%3AGo1.24.2+label%3ACherryPickApproved
2025-04-02 18:01:21 +02:00
Artem Fetishev
2f0796ff40 lib/storage: When creating and listing snapshots, panic instead of returning an error (#8585)
When creating and listing snapshots, panic instead of returning an error
since errors are not recoverable anyway.
Also do not cleanup the filesystem on panic. Leave as is for further
manual inspection.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 15:47:23 +02:00
Artem Fetishev
cdba6dbc0e lib/storage: Pass the partition time range during the partition creation and opening (#8571)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 14:57:59 +02:00
Aliaksandr Valialkin
f18daaeac5 app/vmui: replace old-style links to https://docs.victoriametrics.com/MetricsQL.html with https://docs.victoriametrics.com/metricsql/
Replace also https://docs.victoriametrics.com/keyConcepts.html with https://docs.victoriametrics.com/keyconcepts/

This is the follow-up for the commit ee1da35071
2025-04-02 13:22:58 +02:00
Artem Fetishev
a9f124388f lib/storage: mergeBlockStreams(): replace the dependency on Storage with dependency on the set of deleted metricIDs (#8569)
This should narrow down the function dependencies and simplify testing.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 13:16:26 +02:00
Aliaksandr Valialkin
4b2276608b docs/victoriametrics/vmagent.md: mention that increasing scrape_interval can reduce CPU usage 2025-04-02 12:41:57 +02:00
Aliaksandr Valialkin
b352470ae1 docs/victoriametrics/vmagent.md: mention that -promscrape.disableKeepAlive option can reduce RAM usage when scraping thousands of targets 2025-04-01 23:22:13 +02:00
Aliaksandr Valialkin
b3261a1b87 lib/promscrape: do not clutter logs with cannot scrape target ...: context canceled errors when vmagent is stopped 2025-04-01 23:20:43 +02:00
Aliaksandr Valialkin
6d5973dcb0 docs/victoriametrics/vmagent.md: change GOGC from 50 to 100 in the example of optimized config for vmagent
This is a follow-up after bf024d3dce,
2025-04-01 21:36:04 +02:00
Aliaksandr Valialkin
74f17bb67e docs/victoriametrics/vmagent.md: remove the recommendation to set GOGC to 50 at vmagent in order to reduce CPU usage
The default GOGC is set to 50 at vmagent after bf024d3dce,
so this recommendation makes no sense. Leave the recommendation to increase GOGC to 100.
2025-04-01 21:14:33 +02:00
Aliaksandr Valialkin
bf024d3dce app/vmagent: increase the default GOGC from 30 to 50
This reduces CPU usage by up to 30% in exchange of the increased RAM usage by 10%
when scraping thousands of targets, which expose millions of metrics in summary.

This looks like a good tradeoff after the commit edac875179 ,
which reduced RAM usage by more than 10%, so the final RAM usage for vmagent
is still lower than the RAM usage at v1.114.0 by ~15%, while CPU usage drops by 30%.
2025-04-01 21:04:28 +02:00
Aliaksandr Valialkin
5b87aff830 lib/promscrape: use chunkedbuffer.Buffer instead of bytesutil.ByteBuffer for reading response body from scrape targets
This reduces memory usage when reading large response bodies because the underlying buffer
doesn't need to be re-allocated during the read of large response body in the buffer.

Also decompress response body under the processScrapedDataConcurrencyLimitCh .
This reduces CPU usage and RAM usage a bit when scraping thousands of targets.
2025-04-01 20:30:39 +02:00
Aliaksandr Valialkin
34d35869fa docs/victoriametrics/vmagent.md: add Performance optimizations chapter
Enumerate the most commonly used options for reducing CPU usage and RAM usage
for vmagent, which scrapes thousands of targets.

See https://docs.victoriametrics.com/vmagent/#performance-optimizations
2025-04-01 18:35:43 +02:00
Max Kotliar
b1d1f1f461 vmagent/remotewrite: fix golangci-lint code style issue
### Describe Your Changes

Fixes golangci-lint issues introduced in
98f1e32e39

```
--- a/app/vmagent/remotewrite/pendingseries.go
+++ b/app/vmagent/remotewrite/pendingseries.go
@@ -202,7 +202,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {

 	// Pre-allocate memory for labels.
 	labelsLen := len(wr.labels)
-	wr.labels = slicesutil.SetLength(wr.labels, labelsLen + len(labelsSrc))
+	wr.labels = slicesutil.SetLength(wr.labels, labelsLen+len(labelsSrc))
 	labelsDst := wr.labels[labelsLen:]

 	// Pre-allocate memory for byte slice needed for storing label names and values.
@@ -212,7 +212,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
 		neededBufLen += len(label.Name) + len(label.Value)
 	}
 	bufLen := len(wr.buf)
-	wr.buf = slicesutil.SetLength(wr.buf, bufLen + neededBufLen)
+	wr.buf = slicesutil.SetLength(wr.buf, bufLen+neededBufLen) buf := wr.buf[:bufLen]

 	// Copy labels

```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-01 18:37:02 +04:00
Aliaksandr Valialkin
98f1e32e39 app/vmagent/remotewrite: optimize writeRequest.copyTimeSeries a bit
Pre-allocate memory for labels and for the needed byte buffer used
for holding the copied label names and values.
2025-04-01 15:58:04 +02:00
Aliaksandr Valialkin
edac875179 lib/promscrape: always store the last response per every scrape target in compressed form
This reduces memory usage for vmagent when scraping big number of targets at the cost of slightly higher CPU usage.

The increased CPU usage can be decreased by disabling tracking of stale markers either via -promscrape.noStaleMarkers
command-line flag or via `no_stale_markers: true` option at the scrape config pointed by -promscrape.config command-line flag.
See https://docs.victoriametrics.com/vmagent/#prometheus-staleness-markers
2025-04-01 15:27:11 +02:00
Aliaksandr Valialkin
0ff1a3b154 lib/leveledbytebufferpool: start with the pools[0] for byte slices up to 256 bytes
The pool is used mostly for obtaining byte buffers for responses from scrape targets.
There are no responses smaller than 256 bytes in practice, so there is no sense in maintaining
pools for byte slices up to 64 and 128 bytes.
2025-04-01 12:01:21 +02:00
Aliaksandr Valialkin
bbe58cc37b lib/promscrape: make sure that the maxLabelsLen contains really the maximum len(wc.labels) among concurrently running callbacks at stream.Parse
Previously the maxLabelsLen could be updated with smaller value after it is updated to bigger value by concurrently running goroutines.
Prevent this by loading the latest maxLabelsLen value and updating it only if it is smaller than the current len(wc.labels)
before the exit from callback passed to stream.Parse.

While at it, return early from the callback on the sample_limit exceeding error,
since the rest of the code in the callback becomes no-op after wc.reset().
This simplifies following the logic in the code a bit.

Also remove outdated misleading comment in front of sw.pushData() call inside callbacks passed to stream.Parse.
This comment has no sense after every callback start working with its own goroutine-local wc.
2025-04-01 11:49:35 +02:00
Aliaksandr Valialkin
78dca6ee6e lib/promscrape: tune leveledWriteRequestCtxPool a bit
Start with writeRequestCtx containing up to 256 labels instead of 8 labels,
since a typical response from scrape target contains much more than 8 labels across all the exposed metrics.

Do not pre-allocate labels at writeRequestCtx, since they are pre-allocated inside writeRequestCtx.addRows(),
together with the pre-allocation of samples and writeRequest.Timeseries.
2025-04-01 02:11:14 +02:00
Aliaksandr Valialkin
12f26668a6 lib/promscrape: make sure that the writeRequestCtxPool is efficiently used when sending automatically generated metrics to remote storage 2025-04-01 01:56:07 +02:00
Aliaksandr Valialkin
6c90843aab lib/protoparser/prometheus: use clear() instead of for { ... } loops for clearing Rows.Rows and Rows.tagsPool at Rows.Reset()
This simplifies the code a bit.
2025-04-01 01:39:19 +02:00
Aliaksandr Valialkin
4aad4c64bb lib/promscrape: attach applySeriesLimit to writeRequestCtx instead of scrapeWork
The applySeriesLimit applies the limit to samples stored at writeRequestCtx,
while the scrapeWork is used as read-only configuration source.
That's why it is better from maintainability PoV to attach the applySeriesLimit
method to writeRequestCtx.

While at it, clarify docs for the applySeriesLimit function.
2025-04-01 01:11:06 +02:00
Aliaksandr Valialkin
5630c0108e lib/promscrape: remove writeRequestCtx.resetNoRows() funtion
This function can be safely replaced with writeRequestCtx.reset() after the commit 188325f0fc,
which makes sure that all the rows inside writeRequestCtx.rows are pushed to the remote storage before returning
from stream parsing callback.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/753
2025-04-01 01:11:06 +02:00
Aliaksandr Valialkin
732a549cff lib/promscrape: clarify the comment for scrapeWork.pushData() 2025-04-01 01:11:05 +02:00
Aliaksandr Valialkin
af637bc2a2 lib/promscrape: remove the remaining writeRequestCtx.reset() calls before writeRequestCtxPool.Put() calls
These calls aren't needed, since they are performed by the writeRequestCtxPool.Put()
2025-04-01 01:11:05 +02:00
Aliaksandr Valialkin
6600916344 lib/promscrape: pass cfg *ScrapeWork as an arg to areIdentialSeries instead of attaching it to the ScrapeWork struct
This makes the code more consistent with other functions, which accept `cfg *ScrapeWork` as the first arg.
2025-04-01 01:11:04 +02:00
Aliaksandr Valialkin
e9cb95c5d4 lib/promscrape: replace scrapeWork.addRowToTimeseries with writeRequestCtx.addRows
The rows are added to writeRequestCtx, while the scrapeWork is used only as a read-only configuration source.
So it is better from maintainability PoV to attach addRows function to writeRequestCtx instead of scrapeWork.

Also attach addAutoMetrics to writeRequestCtx instead of scrapeWork due to the same reason:
addAutoMetrics adds metrics to the writeRequestCtx, while using scrapeWork as a read-only configuration source.

While at it, remove tmpRow from scrapeWork struct in order to reduce the complexity of this struct.
2025-04-01 01:11:04 +02:00
Aliaksandr Valialkin
8617faa160 lib/promscrape: remove wc.resetNoRows() call before returning wc to the pool, since this function is called inside writeRequestCtxPool.Put() 2025-04-01 01:11:03 +02:00
Aliaksandr Valialkin
fe888be58c lib/promscrape: remove at *auth.Token arg from scrapeWork.pushData(), since it always equals to sw.Config.AuthToken
This simplifies the code a bit.

While at it, mention that scrapeWork.PushData callback must be safe for calling from concurrently running goroutines.
2025-04-01 01:11:03 +02:00
Aliaksandr Valialkin
a38de1c242 lib/promscrape: attach areIdentialSeries method to ScrapeWork instead of scrapeWork
areIdenticalSeries doesn't access scrapeWork members except of sw.Config of *ScrapeWork type.
It is better from maintainability PoV to attach this methos to ScrapeWork then.

While at it, replace sw.Config with cfg shortcut at scrapeWork.processDataOneShot()
and scrapeWork.processDataInStreamMode().
2025-04-01 01:11:03 +02:00
Aliaksandr Valialkin
8dc0f6cfab lib/promscrape: remove unused scrapeWork arg from getSeriesAdded 2025-04-01 01:11:02 +02:00
Phuong Le
346db8a606 vmui: fix auto-suggestion doesn't work inside functions (#8473)
Fixes #8379

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-03-31 16:22:25 +02:00
hagen1778
b277a62e94 docs: mention pull request checklist in doc guides
Checklist is a more practical list of actions than a full Contributing doc.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-31 15:48:10 +02:00
hagen1778
e2535fcb28 app/vmui: fix path to metricsql doc
This is follow-up after f152021521

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-31 14:47:38 +02:00
Roman Khavronenko
66e7b908ec dashboards: drop all dashboards tags except victoriametrics or victorialogs tags for consistency (#8620)
Having `victoriametrics` or `victorialogs` tags should be enough for
filtering dashboards related to VictoriaMetrics components.

Related ticket
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8618

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-31 14:27:05 +02:00
Roman Khavronenko
a2ba37be68 lib/promscrape: support filtering targets via scrapePool GET param in /api/v1/targets API (#8611)
This improves compatibility with Prometheus `/api/v1/targets` API.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5343

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-31 14:26:25 +02:00
Aliaksandr Valialkin
004e5ff38b lib/promscrape: hide sw.seriesLimiter behind sw.getSeriesLimiter()
This guarantees that the sw.seriesLimiter is always read after the initialization.
2025-03-29 02:05:20 +01:00
Aliaksandr Valialkin
4160fbecc0 lib/promscrape: pass a string instead of a byte slice to scrapeWork.storeLastScrape
This removes superflouos references to the "body" variable.

While at it, remove obsolete misleading comment.
2025-03-29 01:59:20 +01:00
Aliaksandr Valialkin
44844b7fbe lib/promscrape: use "time.Time.UnixMilli()" instead of "time.Time.UnixNano() / 1e6"
This improves readability a bit
2025-03-29 01:43:31 +01:00
Aliaksandr Valialkin
f60458d5fa lib/protoparser/prometheus: add a fast path to AreIdenticalSeriesFast when two identical strings are passed to it
This may be the case when repeated scrapes return the same set of metrics with the same values
2025-03-29 01:43:31 +01:00
Zakhar Bessarab
e242dd5bf2 make vendor-update
Support of the latest prometheus/common is not released yet so pin to previous version.
Related commit at prometheus/prometheus: 95f49dd84b

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-28 18:31:49 +04:00
Zakhar Bessarab
f863b331c1 app/vmalert/rule: follow-up for d8fe739aba
Remove tenancy-related part of the commit as it is not relevant to OS vmalert version.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-28 18:29:41 +04:00
Aliaksandr Valialkin
a98779770c lib/promscrape: run BenchmarkScrapeWorkScrapeInternalStreamBigData on all the available CPU cores
This allows verifying how the benchmark performance scales with the number of available CPU cores
and makes the results of the benchmark consistent with other BenchmarkScrapeWorkScrapeInternal* benchmarks.

Also reduce the amounts of memory allocations inside generateScrape() function in order to reduce
measurement noise during the BenchmarkScrapeWorkScrapeInternalStreamBigData run.

This is a follow-up after c05ffa906d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8515
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
2025-03-28 13:38:13 +01:00
Aliaksandr Valialkin
bcbcae2309 lib/promscrape: improve the performance of getLabelsHash() after c05ffa906d
Before the commit:

BenchmarkScrapeWorkGetLabelsHash-16    	23226468	       249.5 ns/op	   4.01 MB/s	       0 B/op	       0 allocs/op

After the commit:

BenchmarkScrapeWorkGetLabelsHash-16    	39100964	       154.7 ns/op	   6.46 MB/s	       0 B/op	       0 allocs/op

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8515
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
2025-03-28 13:38:12 +01:00
Aliaksandr Valialkin
5b8c10d08e lib/promscrape: run the BenchmarkScrapeWorkGetLabelsHash benchmark in parallel on all the available CPU cores
It is always better to run benchmarks in parallel on all the available CPU cores
in order to see how their performance scales with the number of CPU cores (GOMAXPROCS).

The commit also performs the following modifications:

- Removes the dependency of on the scrapeWork from getLabelsHash() function.

- Makes sure that the benchmark cannot be optimized out by the compiler, by introducing a dependency
  on a global Sink variable. Previously the getLabelsHash() function call could be optimized out
  by the compiler, since this call has no side effects, and the returned result is ignored.

- Reduces the amounts of memory allocations inside the BenchmarkScrapeWorkGetLabelsHash
  when preparing the labels for the benchmark. This should reduce measurements' noise during the benchmark.

This is a follow-up for c05ffa906d

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8515
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
2025-03-28 13:38:12 +01:00
Aliaksandr Valialkin
588fa4d90d lib/promscrape: consistently use io.LimitReader across all the VictoriaMetrics repository 2025-03-28 13:38:11 +01:00
Hui Wang
b9c777a578 vmgateway: properly set the Host header when routing requests to `-… (#869)
* vmgateway: properly set the `Host` header when routing requests to `-write.url` and `-read.url`

* Update docs/victoriametrics/changelog/CHANGELOG.md

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-28 12:27:19 +01:00
Hui Wang
d8fe739aba vmalert: properly attach tenant labels vm_account_id and `vm_projec… (#866)
* vmalert: properly attach tenant labels `vm_account_id` and `vm_project_id` to alerting rules when enabling `-clusterMode`

Previously, these labels were lost in alert messages to Alertmanager. Bug was introduced in [v1.112.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.112.0).
2025-03-28 12:26:55 +01:00
Aliaksandr Valialkin
e5ddf475b8 lib/promscrape/scrapework.go: typo fix in the comment: replace 'parsing parsing' with 'parsing' 2025-03-27 15:03:53 +01:00
Aliaksandr Valialkin
3c3c8668d6 lib/bytesutil: grow the buffer at ByteBuffer.ReadFrom more smoothly
Previously the buffer was increased by 30% after it became 50% full.
For example, if more than 5MB of data is read into 10MB buffer, then its' size
was increased to 13MB, leading to 13MB-5MB = 8MB of waste.
This translates to 8MB/5MB = 160% waste in the worst case.

The updated algorithm increases the buffer by 30% after it becomes ~94% full.
This means that if more than 9.4MB of data is read into 10MB buffer,
then its' size is increased to 13MB, leading to 13MB-9.4MB = 3.6MB of waste.
This translates to 3.6MB / 9.4MB = ~38% waste in the worst case.

This should reduce memory usage when vmagent reads big responses from scrape targets.

While at it, properly append the data to buffer if it already has more than 4KiB of data.
Previously the data over 4KiB in the buffer was lost after ReadFrom call.

This is a follow-up for f28f496a9d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6761
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759
2025-03-27 15:03:53 +01:00
Aliaksandr Valialkin
22d1b916bf lib/protoparser/protoparserutil: optimize ReadUncompressedData for zstd and snappy
It is faster to read the whole data and then decompress it in one go for zstd and snappy encodings.
This reduces the number of potential read() syscalls and decompress CGO calls needed
for reading and decompressing the data.
2025-03-27 15:03:52 +01:00
Aliaksandr Valialkin
35b31f904d lib/httputil: automatically initialize data transfer metrics for the created HTTP transports via NewTransport() 2025-03-27 15:03:52 +01:00
Dmytro Kozlov
7c05ec42fe app/vmctl: fix show logs for prometheus migration mode (#8529)
### Describe Your Changes
Fixed issue an issue with show stats at the end of the process. Please
check the images below
Before the fix

![image](https://github.com/user-attachments/assets/d549c327-ed2b-46c5-965c-4f3581f54d83)


After the fix


![image](https://github.com/user-attachments/assets/c3200aff-dd50-40cf-92a9-b09800a25834)

I fixed it by moving logic to the function. Now it works correctly.

Added the tests for the Prometheus migration mode (make tests great
again).

The main discussion was introduced in this
[PR](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7863).

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-03-27 14:12:10 +01:00
1056 changed files with 45279 additions and 10717 deletions

View File

@@ -6,4 +6,4 @@ Please provide a brief description of the changes you made. Be as specific as po
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).
- [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).

1
.gitignore vendored
View File

@@ -27,3 +27,4 @@ _site
coverage.txt
cspell.json
*~
deployment/docker/provisioning/plugins/

View File

@@ -504,7 +504,7 @@ fmt:
gofmt -l -w -s ./apptest
vet:
go vet ./lib/...
GOEXPERIMENT=synctest go vet ./lib/...
go vet ./app/...
go vet ./apptest/...
@@ -513,35 +513,35 @@ check-all: fmt vet golangci-lint govulncheck
clean-checkers: remove-golangci-lint remove-govulncheck
test:
go test ./lib/... ./app/...
GOEXPERIMENT=synctest go test ./lib/... ./app/...
test-race:
go test -race ./lib/... ./app/...
GOEXPERIMENT=synctest go test -race ./lib/... ./app/...
test-pure:
CGO_ENABLED=0 go test ./lib/... ./app/...
GOEXPERIMENT=synctest CGO_ENABLED=0 go test ./lib/... ./app/...
test-full:
go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
GOEXPERIMENT=synctest go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
test-full-386:
GOARCH=386 go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
GOEXPERIMENT=synctest GOARCH=386 go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
integration-test: victoria-metrics vmagent vmalert vmauth
go test ./apptest/... -skip="^TestCluster.*"
benchmark:
go test -bench=. ./lib/...
GOEXPERIMENT=synctest go test -bench=. ./lib/...
go test -bench=. ./app/...
benchmark-pure:
CGO_ENABLED=0 go test -bench=. ./lib/...
GOEXPERIMENT=synctest CGO_ENABLED=0 go test -bench=. ./lib/...
CGO_ENABLED=0 go test -bench=. ./app/...
vendor-update:
go get -u ./lib/...
go get -u ./app/...
go mod tidy -compat=1.23
go mod tidy -compat=1.24
go mod vendor
app-local:
@@ -564,7 +564,7 @@ install-qtc:
golangci-lint: install-golangci-lint
golangci-lint run
GOEXPERIMENT=synctest golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.64.7

View File

@@ -219,6 +219,42 @@ func (lmp *logMessageProcessor) AddRow(timestamp int64, fields, streamFields []l
}
}
// InsertRowProcessor is used by native data ingestion protocol parser.
type InsertRowProcessor interface {
// AddInsertRow must add r to the underlying storage.
AddInsertRow(r *logstorage.InsertRow)
}
// AddInsertRow adds r to lmp.
func (lmp *logMessageProcessor) AddInsertRow(r *logstorage.InsertRow) {
lmp.rowsIngestedTotal.Inc()
n := logstorage.EstimatedJSONRowLen(r.Fields)
lmp.bytesIngestedTotal.Add(n)
if len(r.Fields) > *MaxFieldsPerLine {
line := logstorage.MarshalFieldsToJSON(nil, r.Fields)
logger.Warnf("dropping log line with %d fields; it exceeds -insert.maxFieldsPerLine=%d; %s", len(r.Fields), *MaxFieldsPerLine, line)
rowsDroppedTotalTooManyFields.Inc()
return
}
lmp.mu.Lock()
defer lmp.mu.Unlock()
lmp.lr.MustAddInsertRow(r)
if lmp.cp.Debug {
s := lmp.lr.GetRowString(0)
lmp.lr.ResetKeepSettings()
logger.Infof("remoteAddr=%s; requestURI=%s; ignoring log entry because of `debug` arg: %s", lmp.cp.DebugRemoteAddr, lmp.cp.DebugRequestURI, s)
rowsDroppedTotalDebug.Inc()
return
}
if lmp.lr.NeedFlush() {
lmp.flushLocked()
}
}
// flushLocked must be called under locked lmp.mu.
func (lmp *logMessageProcessor) flushLocked() {
lmp.lastFlushTime = time.Now()

View File

@@ -0,0 +1,96 @@
package internalinsert
import (
"flag"
"fmt"
"net/http"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
)
var (
disableInsert = flag.Bool("internalinsert.disable", false, "Whether to disable /internal/insert HTTP endpoint")
maxRequestSize = flagutil.NewBytes("internalinsert.maxRequestSize", 64*1024*1024, "The maximum size in bytes of a single request, which can be accepted at /internal/insert HTTP endpoint")
)
// RequestHandler processes /internal/insert requests.
func RequestHandler(w http.ResponseWriter, r *http.Request) {
if *disableInsert {
httpserver.Errorf(w, r, "requests to /internal/insert are disabled with -internalinsert.disable command-line flag")
return
}
startTime := time.Now()
if r.Method != "POST" {
w.WriteHeader(http.StatusMethodNotAllowed)
return
}
version := r.FormValue("version")
if version != netinsert.ProtocolVersion {
httpserver.Errorf(w, r, "unsupported protocol version=%q; want %q", version, netinsert.ProtocolVersion)
return
}
requestsTotal.Inc()
cp, err := insertutil.GetCommonParams(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
encoding := r.Header.Get("Content-Encoding")
err = protoparserutil.ReadUncompressedData(r.Body, encoding, maxRequestSize, func(data []byte) error {
lmp := cp.NewLogMessageProcessor("internalinsert", false)
irp := lmp.(insertutil.InsertRowProcessor)
err := parseData(irp, data)
lmp.MustClose()
return err
})
if err != nil {
errorsTotal.Inc()
httpserver.Errorf(w, r, "cannot parse internal insert request: %s", err)
return
}
requestDuration.UpdateDuration(startTime)
}
func parseData(irp insertutil.InsertRowProcessor, data []byte) error {
r := logstorage.GetInsertRow()
src := data
i := 0
for len(src) > 0 {
tail, err := r.UnmarshalInplace(src)
if err != nil {
return fmt.Errorf("cannot parse row #%d: %s", i, err)
}
src = tail
i++
irp.AddInsertRow(r)
}
logstorage.PutInsertRow(r)
return nil
}
var (
requestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/internal/insert"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/internal/insert"}`)
requestDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/internal/insert"}`)
)

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/elasticsearch"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/internalinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/journald"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/jsonline"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/loki"
@@ -28,6 +29,11 @@ func Stop() {
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
if path == "/internal/insert" {
internalinsert.RequestHandler(w, r)
return true
}
if !strings.HasPrefix(path, "/insert/") {
// Skip requests, which do not start with /insert/, since these aren't our requests.
return false

View File

@@ -41,6 +41,26 @@ func TestPushProtoOk(t *testing.T) {
`{"_msg":"log-line-message","severity":"Trace"}`,
)
// severities mapping
f([]pb.ResourceLogs{
{
ScopeLogs: []pb.ScopeLogs{
{
LogRecords: []pb.LogRecord{
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1234, SeverityNumber: 1, Body: pb.AnyValue{StringValue: ptrTo("log-line-message")}},
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1234, SeverityNumber: 13, Body: pb.AnyValue{StringValue: ptrTo("log-line-message")}},
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1234, SeverityNumber: 24, Body: pb.AnyValue{StringValue: ptrTo("log-line-message")}},
},
},
},
},
},
[]int64{1234, 1234, 1234},
`{"_msg":"log-line-message","severity":"Trace"}
{"_msg":"log-line-message","severity":"Warn"}
{"_msg":"log-line-message","severity":"Fatal4"}`,
)
// multi-line with resource attributes
f([]pb.ResourceLogs{
{
@@ -60,7 +80,7 @@ func TestPushProtoOk(t *testing.T) {
{
LogRecords: []pb.LogRecord{
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1234, SeverityNumber: 1, Body: pb.AnyValue{StringValue: ptrTo("log-line-message")}},
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1235, SeverityNumber: 21, Body: pb.AnyValue{StringValue: ptrTo("log-line-message-msg-2")}},
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1235, SeverityNumber: 25, Body: pb.AnyValue{StringValue: ptrTo("log-line-message-msg-2")}},
{Attributes: []*pb.KeyValue{}, TimeUnixNano: 1236, SeverityNumber: -1, Body: pb.AnyValue{StringValue: ptrTo("log-line-message-msg-2")}},
},
},

View File

@@ -0,0 +1,324 @@
package internalselect
import (
"context"
"flag"
"fmt"
"net/http"
"strconv"
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netselect"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/atomicutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
)
var disableSelect = flag.Bool("internalselect.disable", false, "Whether to disable /internal/select/* HTTP endpoints")
// RequestHandler processes requests to /internal/select/*
func RequestHandler(ctx context.Context, w http.ResponseWriter, r *http.Request) {
if *disableSelect {
httpserver.Errorf(w, r, "requests to /internal/select/* are disabled with -internalselect.disable command-line flag")
return
}
startTime := time.Now()
path := r.URL.Path
rh := requestHandlers[path]
if rh == nil {
httpserver.Errorf(w, r, "unsupported endpoint requested: %s", path)
return
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vl_http_requests_total{path=%q}`, path)).Inc()
if err := rh(ctx, w, r); err != nil && !netutil.IsTrivialNetworkError(err) {
metrics.GetOrCreateCounter(fmt.Sprintf(`vl_http_request_errors_total{path=%q}`, path)).Inc()
httpserver.Errorf(w, r, "%s", err)
// The return is skipped intentionally in order to track the duration of failed queries.
}
metrics.GetOrCreateSummary(fmt.Sprintf(`vl_http_request_duration_seconds{path=%q}`, path)).UpdateDuration(startTime)
}
var requestHandlers = map[string]func(ctx context.Context, w http.ResponseWriter, r *http.Request) error{
"/internal/select/query": processQueryRequest,
"/internal/select/field_names": processFieldNamesRequest,
"/internal/select/field_values": processFieldValuesRequest,
"/internal/select/stream_field_names": processStreamFieldNamesRequest,
"/internal/select/stream_field_values": processStreamFieldValuesRequest,
"/internal/select/streams": processStreamsRequest,
"/internal/select/stream_ids": processStreamIDsRequest,
}
func processQueryRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.QueryProtocolVersion)
if err != nil {
return err
}
w.Header().Set("Content-Type", "application/octet-stream")
var wLock sync.Mutex
var dataLenBuf []byte
sendBuf := func(bb *bytesutil.ByteBuffer) error {
if len(bb.B) == 0 {
return nil
}
data := bb.B
if !cp.DisableCompression {
bufLen := len(bb.B)
bb.B = zstd.CompressLevel(bb.B, bb.B, 1)
data = bb.B[bufLen:]
}
wLock.Lock()
dataLenBuf = encoding.MarshalUint64(dataLenBuf[:0], uint64(len(data)))
_, err := w.Write(dataLenBuf)
if err == nil {
_, err = w.Write(data)
}
wLock.Unlock()
// Reset the sent buf
bb.Reset()
return err
}
var bufs atomicutil.Slice[bytesutil.ByteBuffer]
var errGlobalLock sync.Mutex
var errGlobal error
writeBlock := func(workerID uint, db *logstorage.DataBlock) {
if errGlobal != nil {
return
}
bb := bufs.Get(workerID)
bb.B = db.Marshal(bb.B)
if len(bb.B) < 1024*1024 {
// Fast path - the bb is too small to be sent to the client yet.
return
}
// Slow path - the bb must be sent to the client.
if err := sendBuf(bb); err != nil {
errGlobalLock.Lock()
if errGlobal != nil {
errGlobal = err
}
errGlobalLock.Unlock()
}
}
if err := vlstorage.RunQuery(ctx, cp.TenantIDs, cp.Query, writeBlock); err != nil {
return err
}
if errGlobal != nil {
return errGlobal
}
// Send the remaining data
for _, bb := range bufs.GetSlice() {
if err := sendBuf(bb); err != nil {
return err
}
}
return nil
}
func processFieldNamesRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.FieldNamesProtocolVersion)
if err != nil {
return err
}
fieldNames, err := vlstorage.GetFieldNames(ctx, cp.TenantIDs, cp.Query)
if err != nil {
return fmt.Errorf("cannot obtain field names: %w", err)
}
return writeValuesWithHits(w, fieldNames, cp.DisableCompression)
}
func processFieldValuesRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.FieldValuesProtocolVersion)
if err != nil {
return err
}
fieldName := r.FormValue("field")
limit, err := getInt64FromRequest(r, "limit")
if err != nil {
return err
}
fieldValues, err := vlstorage.GetFieldValues(ctx, cp.TenantIDs, cp.Query, fieldName, uint64(limit))
if err != nil {
return fmt.Errorf("cannot obtain field values: %w", err)
}
return writeValuesWithHits(w, fieldValues, cp.DisableCompression)
}
func processStreamFieldNamesRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.StreamFieldNamesProtocolVersion)
if err != nil {
return err
}
fieldNames, err := vlstorage.GetStreamFieldNames(ctx, cp.TenantIDs, cp.Query)
if err != nil {
return fmt.Errorf("cannot obtain stream field names: %w", err)
}
return writeValuesWithHits(w, fieldNames, cp.DisableCompression)
}
func processStreamFieldValuesRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.StreamFieldValuesProtocolVersion)
if err != nil {
return err
}
fieldName := r.FormValue("field")
limit, err := getInt64FromRequest(r, "limit")
if err != nil {
return err
}
fieldValues, err := vlstorage.GetStreamFieldValues(ctx, cp.TenantIDs, cp.Query, fieldName, uint64(limit))
if err != nil {
return fmt.Errorf("cannot obtain stream field values: %w", err)
}
return writeValuesWithHits(w, fieldValues, cp.DisableCompression)
}
func processStreamsRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.StreamsProtocolVersion)
if err != nil {
return err
}
limit, err := getInt64FromRequest(r, "limit")
if err != nil {
return err
}
streams, err := vlstorage.GetStreams(ctx, cp.TenantIDs, cp.Query, uint64(limit))
if err != nil {
return fmt.Errorf("cannot obtain streams: %w", err)
}
return writeValuesWithHits(w, streams, cp.DisableCompression)
}
func processStreamIDsRequest(ctx context.Context, w http.ResponseWriter, r *http.Request) error {
cp, err := getCommonParams(r, netselect.StreamIDsProtocolVersion)
if err != nil {
return err
}
limit, err := getInt64FromRequest(r, "limit")
if err != nil {
return err
}
streamIDs, err := vlstorage.GetStreamIDs(ctx, cp.TenantIDs, cp.Query, uint64(limit))
if err != nil {
return fmt.Errorf("cannot obtain streams: %w", err)
}
return writeValuesWithHits(w, streamIDs, cp.DisableCompression)
}
type commonParams struct {
TenantIDs []logstorage.TenantID
Query *logstorage.Query
DisableCompression bool
}
func getCommonParams(r *http.Request, expectedProtocolVersion string) (*commonParams, error) {
version := r.FormValue("version")
if version != expectedProtocolVersion {
return nil, fmt.Errorf("unexpected version=%q; want %q", version, expectedProtocolVersion)
}
tenantIDsStr := r.FormValue("tenant_ids")
tenantIDs, err := logstorage.UnmarshalTenantIDs([]byte(tenantIDsStr))
if err != nil {
return nil, fmt.Errorf("cannot unmarshal tenant_ids=%q: %w", tenantIDsStr, err)
}
timestamp, err := getInt64FromRequest(r, "timestamp")
if err != nil {
return nil, err
}
qStr := r.FormValue("query")
q, err := logstorage.ParseQueryAtTimestamp(qStr, timestamp)
if err != nil {
return nil, fmt.Errorf("cannot unmarshal query=%q: %w", qStr, err)
}
s := r.FormValue("disable_compression")
disableCompression, err := strconv.ParseBool(s)
if err != nil {
return nil, fmt.Errorf("cannot parse disable_compression=%q: %w", s, err)
}
cp := &commonParams{
TenantIDs: tenantIDs,
Query: q,
DisableCompression: disableCompression,
}
return cp, nil
}
func writeValuesWithHits(w http.ResponseWriter, vhs []logstorage.ValueWithHits, disableCompression bool) error {
var b []byte
for i := range vhs {
b = vhs[i].Marshal(b)
}
if !disableCompression {
b = zstd.CompressLevel(nil, b, 1)
}
w.Header().Set("Content-Type", "application/octet-stream")
if _, err := w.Write(b); err != nil {
return fmt.Errorf("cannot send response to the client: %w", err)
}
return nil
}
func getInt64FromRequest(r *http.Request, argName string) (int64, error) {
s := r.FormValue(argName)
n, err := strconv.ParseInt(s, 10, 64)
if err != nil {
return 0, fmt.Errorf("cannot parse %s=%q: %w", argName, s, err)
}
return n, nil
}

View File

@@ -1,47 +0,0 @@
package logsql
import (
"bufio"
"io"
"sync"
)
func getBufferedWriter(w io.Writer) *bufferedWriter {
v := bufferedWriterPool.Get()
if v == nil {
return &bufferedWriter{
bw: bufio.NewWriter(w),
}
}
bw := v.(*bufferedWriter)
bw.bw.Reset(w)
return bw
}
func putBufferedWriter(bw *bufferedWriter) {
bw.reset()
bufferedWriterPool.Put(bw)
}
var bufferedWriterPool sync.Pool
type bufferedWriter struct {
mu sync.Mutex
bw *bufio.Writer
}
func (bw *bufferedWriter) reset() {
// nothing to do
}
func (bw *bufferedWriter) WriteIgnoreErrors(p []byte) {
bw.mu.Lock()
_, _ = bw.bw.Write(p)
bw.mu.Unlock()
}
func (bw *bufferedWriter) FlushIgnoreErrors() {
bw.mu.Lock()
_ = bw.bw.Flush()
bw.mu.Unlock()
}

View File

@@ -3,6 +3,7 @@ package logsql
import (
"context"
"fmt"
"io"
"math"
"net/http"
"regexp"
@@ -11,12 +12,14 @@ import (
"strconv"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/valyala/fastjson"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/atomicutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
@@ -57,10 +60,13 @@ func ProcessFacetsRequest(ctx context.Context, w http.ResponseWriter, r *http.Re
var mLock sync.Mutex
m := make(map[string][]facetEntry)
writeBlock := func(_ uint, _ []int64, columns []logstorage.BlockColumn) {
if len(columns) == 0 || len(columns[0].Values) == 0 {
writeBlock := func(_ uint, db *logstorage.DataBlock) {
rowsCount := db.RowsCount()
if rowsCount == 0 {
return
}
columns := db.Columns
if len(columns) != 3 {
logger.Panicf("BUG: expecting 3 columns; got %d columns", len(columns))
}
@@ -156,17 +162,19 @@ func ProcessHitsRequest(ctx context.Context, w http.ResponseWriter, r *http.Requ
var mLock sync.Mutex
m := make(map[string]*hitsSeries)
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
if len(columns) == 0 || len(columns[0].Values) == 0 {
writeBlock := func(_ uint, db *logstorage.DataBlock) {
rowsCount := db.RowsCount()
if rowsCount == 0 {
return
}
columns := db.Columns
timestampValues := columns[0].Values
hitsValues := columns[len(columns)-1].Values
columns = columns[1 : len(columns)-1]
bb := blockResultPool.Get()
for i := range timestamps {
for i := 0; i < rowsCount; i++ {
timestampStr := strings.Clone(timestampValues[i])
hitsStr := strings.Clone(hitsValues[i])
hits, err := strconv.ParseUint(hitsStr, 10, 64)
@@ -205,6 +213,8 @@ func ProcessHitsRequest(ctx context.Context, w http.ResponseWriter, r *http.Requ
WriteHitsSeries(w, m)
}
var blockResultPool bytesutil.ByteBufferPool
func getTopHitsSeries(m map[string]*hitsSeries, fieldsLimit int) map[string]*hitsSeries {
if fieldsLimit <= 0 || fieldsLimit >= len(m) {
return m
@@ -536,7 +546,7 @@ var liveTailRequests = metrics.NewCounter(`vl_live_tailing_requests`)
const tailOffsetNsecs = 5e9
type logRow struct {
timestamp int64
timestamp string
fields []logstorage.Field
}
@@ -552,7 +562,7 @@ type tailProcessor struct {
mu sync.Mutex
perStreamRows map[string][]logRow
lastTimestamps map[string]int64
lastTimestamps map[string]string
err error
}
@@ -562,12 +572,12 @@ func newTailProcessor(cancel func()) *tailProcessor {
cancel: cancel,
perStreamRows: make(map[string][]logRow),
lastTimestamps: make(map[string]int64),
lastTimestamps: make(map[string]string),
}
}
func (tp *tailProcessor) writeBlock(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
if len(timestamps) == 0 {
func (tp *tailProcessor) writeBlock(_ uint, db *logstorage.DataBlock) {
if db.RowsCount() == 0 {
return
}
@@ -579,14 +589,8 @@ func (tp *tailProcessor) writeBlock(_ uint, timestamps []int64, columns []logsto
}
// Make sure columns contain _time field, since it is needed for proper tail work.
hasTime := false
for _, c := range columns {
if c.Name == "_time" {
hasTime = true
break
}
}
if !hasTime {
timestamps, ok := db.GetTimestamps()
if !ok {
tp.err = fmt.Errorf("missing _time field")
tp.cancel()
return
@@ -595,8 +599,8 @@ func (tp *tailProcessor) writeBlock(_ uint, timestamps []int64, columns []logsto
// Copy block rows to tp.perStreamRows
for i, timestamp := range timestamps {
streamID := ""
fields := make([]logstorage.Field, len(columns))
for j, c := range columns {
fields := make([]logstorage.Field, len(db.Columns))
for j, c := range db.Columns {
name := strings.Clone(c.Name)
value := strings.Clone(c.Values[i])
@@ -688,12 +692,15 @@ func ProcessStatsQueryRangeRequest(ctx context.Context, w http.ResponseWriter, r
m := make(map[string]*statsSeries)
var mLock sync.Mutex
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
writeBlock := func(_ uint, db *logstorage.DataBlock) {
rowsCount := db.RowsCount()
columns := db.Columns
clonedColumnNames := make([]string, len(columns))
for i, c := range columns {
clonedColumnNames[i] = strings.Clone(c.Name)
}
for i := range timestamps {
for i := 0; i < rowsCount; i++ {
// Do not move q.GetTimestamp() outside writeBlock, since ts
// must be initialized to query timestamp for every processed log row.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8312
@@ -802,12 +809,14 @@ func ProcessStatsQueryRequest(ctx context.Context, w http.ResponseWriter, r *htt
var rowsLock sync.Mutex
timestamp := q.GetTimestamp()
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
writeBlock := func(_ uint, db *logstorage.DataBlock) {
rowsCount := db.RowsCount()
columns := db.Columns
clonedColumnNames := make([]string, len(columns))
for i, c := range columns {
clonedColumnNames[i] = strings.Clone(c.Name)
}
for i := range timestamps {
for i := 0; i < rowsCount; i++ {
labels := make([]logstorage.Field, 0, len(byFields))
for j, c := range columns {
if slices.Contains(byFields, c.Name) {
@@ -869,11 +878,21 @@ func ProcessQueryRequest(ctx context.Context, w http.ResponseWriter, r *http.Req
return
}
bw := getBufferedWriter(w)
sw := &syncWriter{
w: w,
}
var bwShards atomicutil.Slice[bufferedWriter]
bwShards.Init = func(shard *bufferedWriter) {
shard.sw = sw
}
defer func() {
bw.FlushIgnoreErrors()
putBufferedWriter(bw)
shards := bwShards.GetSlice()
for _, shard := range shards {
shard.FlushIgnoreErrors()
}
}()
w.Header().Set("Content-Type", "application/stream+json")
if limit > 0 {
@@ -883,32 +902,34 @@ func ProcessQueryRequest(ctx context.Context, w http.ResponseWriter, r *http.Req
httpserver.Errorf(w, r, "%s", err)
return
}
bb := blockResultPool.Get()
b := bb.B
bw := bwShards.Get(0)
for i := range rows {
b = logstorage.MarshalFieldsToJSON(b[:0], rows[i].fields)
b = append(b, '\n')
bw.WriteIgnoreErrors(b)
bw.buf = logstorage.MarshalFieldsToJSON(bw.buf, rows[i].fields)
bw.buf = append(bw.buf, '\n')
if len(bw.buf) > 16*1024 {
bw.FlushIgnoreErrors()
}
}
bb.B = b
blockResultPool.Put(bb)
return
}
q.AddPipeLimit(uint64(limit))
}
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
if len(columns) == 0 || len(columns[0].Values) == 0 {
writeBlock := func(workerID uint, db *logstorage.DataBlock) {
rowsCount := db.RowsCount()
if rowsCount == 0 {
return
}
columns := db.Columns
bb := blockResultPool.Get()
for i := range timestamps {
WriteJSONRow(bb, columns, i)
bw := bwShards.Get(workerID)
for i := 0; i < rowsCount; i++ {
WriteJSONRow(bw, columns, i)
if len(bw.buf) > 16*1024 {
bw.FlushIgnoreErrors()
}
}
bw.WriteIgnoreErrors(bb.B)
blockResultPool.Put(bb)
}
if err := vlstorage.RunQuery(ctx, tenantIDs, q, writeBlock); err != nil {
@@ -917,14 +938,37 @@ func ProcessQueryRequest(ctx context.Context, w http.ResponseWriter, r *http.Req
}
}
var blockResultPool bytesutil.ByteBufferPool
type row struct {
timestamp int64
fields []logstorage.Field
type syncWriter struct {
mu sync.Mutex
w io.Writer
}
func getLastNQueryResults(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit int) ([]row, error) {
func (sw *syncWriter) Write(p []byte) (int, error) {
sw.mu.Lock()
n, err := sw.w.Write(p)
sw.mu.Unlock()
return n, err
}
type bufferedWriter struct {
buf []byte
sw *syncWriter
}
func (bw *bufferedWriter) Write(p []byte) (int, error) {
bw.buf = append(bw.buf, p...)
// Do not send bw.buf to bw.sw here, since the data at bw.buf may be incomplete (it must end with '\n')
return len(p), nil
}
func (bw *bufferedWriter) FlushIgnoreErrors() {
_, _ = bw.sw.Write(bw.buf)
bw.buf = bw.buf[:0]
}
func getLastNQueryResults(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit int) ([]logRow, error) {
limitUpper := 2 * limit
q.AddPipeLimit(uint64(limitUpper))
@@ -993,7 +1037,7 @@ func getLastNQueryResults(ctx context.Context, tenantIDs []logstorage.TenantID,
}
}
func getLastNRows(rows []row, limit int) []row {
func getLastNRows(rows []logRow, limit int) []logRow {
sort.Slice(rows, func(i, j int) bool {
return rows[i].timestamp < rows[j].timestamp
})
@@ -1003,18 +1047,31 @@ func getLastNRows(rows []row, limit int) []row {
return rows
}
func getQueryResultsWithLimit(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit int) ([]row, error) {
func getQueryResultsWithLimit(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit int) ([]logRow, error) {
ctxWithCancel, cancel := context.WithCancel(ctx)
defer cancel()
var rows []row
var missingTimeColumn atomic.Bool
var rows []logRow
var rowsLock sync.Mutex
writeBlock := func(_ uint, timestamps []int64, columns []logstorage.BlockColumn) {
writeBlock := func(_ uint, db *logstorage.DataBlock) {
if missingTimeColumn.Load() {
return
}
columns := db.Columns
clonedColumnNames := make([]string, len(columns))
for i, c := range columns {
clonedColumnNames[i] = strings.Clone(c.Name)
}
timestamps, ok := db.GetTimestamps()
if !ok {
missingTimeColumn.Store(true)
cancel()
return
}
for i, timestamp := range timestamps {
fields := make([]logstorage.Field, len(columns))
for j := range columns {
@@ -1024,7 +1081,7 @@ func getQueryResultsWithLimit(ctx context.Context, tenantIDs []logstorage.Tenant
}
rowsLock.Lock()
rows = append(rows, row{
rows = append(rows, logRow{
timestamp: timestamp,
fields: fields,
})
@@ -1035,11 +1092,13 @@ func getQueryResultsWithLimit(ctx context.Context, tenantIDs []logstorage.Tenant
cancel()
}
}
if err := vlstorage.RunQuery(ctxWithCancel, tenantIDs, q, writeBlock); err != nil {
return nil, err
err := vlstorage.RunQuery(ctxWithCancel, tenantIDs, q, writeBlock)
if missingTimeColumn.Load() {
return nil, fmt.Errorf("missing _time column in the result for the query [%s]", q)
}
return rows, nil
return rows, err
}
func parseCommonArgs(r *http.Request) (*logstorage.Query, []logstorage.TenantID, error) {
@@ -1098,20 +1157,22 @@ func parseCommonArgs(r *http.Request) (*logstorage.Query, []logstorage.TenantID,
}
// Parse optional extra_filters
extraFiltersStr := r.FormValue("extra_filters")
extraFilters, err := parseExtraFilters(extraFiltersStr)
if err != nil {
return nil, nil, err
for _, extraFiltersStr := range r.Form["extra_filters"] {
extraFilters, err := parseExtraFilters(extraFiltersStr)
if err != nil {
return nil, nil, err
}
q.AddExtraFilters(extraFilters)
}
q.AddExtraFilters(extraFilters)
// Parse optional extra_stream_filters
extraStreamFiltersStr := r.FormValue("extra_stream_filters")
extraStreamFilters, err := parseExtraStreamFilters(extraStreamFiltersStr)
if err != nil {
return nil, nil, err
for _, extraStreamFiltersStr := range r.Form["extra_stream_filters"] {
extraStreamFilters, err := parseExtraStreamFilters(extraStreamFiltersStr)
if err != nil {
return nil, nil, err
}
q.AddExtraFilters(extraStreamFilters)
}
q.AddExtraFilters(extraStreamFilters)
return q, tenantIDs, nil
}

View File

@@ -9,6 +9,7 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlselect/internalselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlselect/logsql"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -72,7 +73,7 @@ var vmuiFileServer = http.FileServer(http.FS(vmuiFiles))
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
if !strings.HasPrefix(path, "/select/") {
if !strings.HasPrefix(path, "/select/") && !strings.HasPrefix(path, "/internal/select/") {
// Skip requests, which do not start with /select/, since these aren't our requests.
return false
}
@@ -99,9 +100,17 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
ctx := r.Context()
if path == "/select/logsql/tail" {
logsqlTailRequests.Inc()
// Process live tailing request without timeout, since it is OK to run live tailing requests for very long time.
// Also do not apply concurrency limit to tail requests, since these limits are intended for non-tail requests.
logsql.ProcessLiveTailRequest(ctx, w, r)
return true
}
// Limit the number of concurrent queries, which can consume big amounts of CPU time.
startTime := time.Now()
ctx := r.Context()
d := getMaxQueryDuration(r)
ctxWithTimeout, cancel := context.WithTimeout(ctx, d)
defer cancel()
@@ -111,11 +120,10 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
defer decRequestConcurrency()
if path == "/select/logsql/tail" {
logsqlTailRequests.Inc()
// Process live tailing request without timeout (e.g. use ctx instead of ctxWithTimeout),
// since it is OK to run live tailing requests for very long time.
logsql.ProcessLiveTailRequest(ctx, w, r)
if strings.HasPrefix(path, "/internal/select/") {
// Process internal request from vlselect without timeout (e.g. use ctx instead of ctxWithTimeout),
// since the timeout must be controlled by the vlselect.
internalselect.RequestHandler(ctx, w, r)
return true
}
@@ -124,15 +132,17 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return false
}
err := ctxWithTimeout.Err()
logRequestErrorIfNeeded(ctxWithTimeout, w, r, startTime)
return true
}
func logRequestErrorIfNeeded(ctx context.Context, w http.ResponseWriter, r *http.Request, startTime time.Time) {
err := ctx.Err()
switch err {
case nil:
// nothing to do
case context.Canceled:
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Infof("client has canceled the request after %.3f seconds: remoteAddr=%s, requestURI: %q",
time.Since(startTime).Seconds(), remoteAddr, requestURI)
// do not log canceled requests, since they are expected and legal.
case context.DeadlineExceeded:
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("the request couldn't be executed in %.3f seconds; possible solutions: "+
@@ -143,8 +153,6 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
default:
httpserver.Errorf(w, r, "unexpected error: %s", err)
}
return true
}
func incRequestConcurrency(ctx context.Context, w http.ResponseWriter, r *http.Request) bool {

View File

@@ -5,9 +5,13 @@ menu:
docs:
parent: 'victoriametrics'
weight: 23
tags:
- metrics
aliases:
- /ExtendedPromQL.html
- /MetricsQL.html
- /metricsql/index.html
- /metricsql/
---
[VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) implements MetricsQL -
query language inspired by [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -35,10 +35,10 @@
<meta property="og:title" content="UI for VictoriaLogs">
<meta property="og:url" content="https://victoriametrics.com/products/victorialogs/">
<meta property="og:description" content="Explore your log data with VictoriaLogs UI">
<script type="module" crossorigin src="./assets/index-BgdvCSTM.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-DojlIpLz.js">
<script type="module" crossorigin src="./assets/index-Wi3UrT0H.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-C-vZmbyg.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
<link rel="stylesheet" crossorigin href="./assets/index-u4IOGr0E.css">
<link rel="stylesheet" crossorigin href="./assets/index-Brup_hCI.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -10,11 +10,14 @@ import (
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netselect"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
var (
@@ -38,15 +41,59 @@ var (
minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 10e6, "The minimum free disk space at -storageDataPath after which "+
"the storage stops accepting new data")
forceMergeAuthKey = flagutil.NewPassword("forceMergeAuthKey", "authKey, which must be passed in query string to /internal/force_merge pages. It overrides -httpAuth.*")
forceMergeAuthKey = flagutil.NewPassword("forceMergeAuthKey", "authKey, which must be passed in query string to /internal/force_merge . It overrides -httpAuth.* . "+
"See https://docs.victoriametrics.com/victorialogs/#forced-merge")
forceFlushAuthKey = flagutil.NewPassword("forceFlushAuthKey", "authKey, which must be passed in query string to /internal/force_flush . It overrides -httpAuth.* . "+
"See https://docs.victoriametrics.com/victorialogs/#forced-flush")
storageNodeAddrs = flagutil.NewArrayString("storageNode", "Comma-separated list of TCP addresses for storage nodes to route the ingested logs to and to send select queries to. "+
"If the list is empty, then the ingested logs are stored and queried locally from -storageDataPath")
insertConcurrency = flag.Int("insert.concurrency", 2, "The average number of concurrent data ingestion requests, which can be sent to every -storageNode")
insertDisableCompression = flag.Bool("insert.disableCompression", false, "Whether to disable compression when sending the ingested data to -storageNode nodes. "+
"Disabled compression reduces CPU usage at the cost of higher network usage")
selectDisableCompression = flag.Bool("select.disableCompression", false, "Whether to disable compression for select query responses received from -storageNode nodes. "+
"Disabled compression reduces CPU usage at the cost of higher network usage")
storageNodeUsername = flagutil.NewArrayString("storageNode.username", "Optional basic auth username to use for the corresponding -storageNode")
storageNodePassword = flagutil.NewArrayString("storageNode.password", "Optional basic auth password to use for the corresponding -storageNode")
storageNodePasswordFile = flagutil.NewArrayString("storageNode.passwordFile", "Optional path to basic auth password to use for the corresponding -storageNode. "+
"The file is re-read every second")
storageNodeBearerToken = flagutil.NewArrayString("storageNode.bearerToken", "Optional bearer auth token to use for the corresponding -storageNode")
storageNodeBearerTokenFile = flagutil.NewArrayString("storageNode.bearerTokenFile", "Optional path to bearer token file to use for the corresponding -storageNode. "+
"The token is re-read from the file every second")
storageNodeTLS = flagutil.NewArrayBool("storageNode.tls", "Whether to use TLS (HTTPS) protocol for communicating with the corresponding -storageNode. "+
"By default communication is performed via HTTP")
storageNodeTLSCAFile = flagutil.NewArrayString("storageNode.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to the corresponding -storageNode. "+
"By default, system CA is used")
storageNodeTLSCertFile = flagutil.NewArrayString("storageNode.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -storageNode")
storageNodeTLSKeyFile = flagutil.NewArrayString("storageNode.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to the corresponding -storageNode")
storageNodeTLSServerName = flagutil.NewArrayString("storageNode.tlsServerName", "Optional TLS server name to use for connections to the corresponding -storageNode. "+
"By default, the server name from -storageNode is used")
storageNodeTLSInsecureSkipVerify = flagutil.NewArrayBool("storageNode.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to the corresponding -storageNode")
)
var localStorage *logstorage.Storage
var localStorageMetrics *metrics.Set
var netstorageInsert *netinsert.Storage
var netstorageSelect *netselect.Storage
// Init initializes vlstorage.
//
// Stop must be called when vlstorage is no longer needed
func Init() {
if strg != nil {
logger.Panicf("BUG: Init() has been already called")
if len(*storageNodeAddrs) == 0 {
initLocalStorage()
} else {
initNetworkStorage()
}
}
func initLocalStorage() {
if localStorage != nil {
logger.Panicf("BUG: initLocalStorage() has been already called")
}
if retentionPeriod.Duration() < 24*time.Hour {
@@ -63,60 +110,157 @@ func Init() {
}
logger.Infof("opening storage at -storageDataPath=%s", *storageDataPath)
startTime := time.Now()
strg = logstorage.MustOpenStorage(*storageDataPath, cfg)
localStorage = logstorage.MustOpenStorage(*storageDataPath, cfg)
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
localStorage.UpdateStats(&ss)
logger.Infof("successfully opened storage in %.3f seconds; smallParts: %d; bigParts: %d; smallPartBlocks: %d; bigPartBlocks: %d; smallPartRows: %d; bigPartRows: %d; "+
"smallPartSize: %d bytes; bigPartSize: %d bytes",
time.Since(startTime).Seconds(), ss.SmallParts, ss.BigParts, ss.SmallPartBlocks, ss.BigPartBlocks, ss.SmallPartRowsCount, ss.BigPartRowsCount,
ss.CompressedSmallPartSize, ss.CompressedBigPartSize)
// register storage metrics
storageMetrics = metrics.NewSet()
storageMetrics.RegisterMetricsWriter(func(w io.Writer) {
writeStorageMetrics(w, strg)
// register local storage metrics
localStorageMetrics = metrics.NewSet()
localStorageMetrics.RegisterMetricsWriter(func(w io.Writer) {
writeStorageMetrics(w, localStorage)
})
metrics.RegisterSet(storageMetrics)
metrics.RegisterSet(localStorageMetrics)
}
func initNetworkStorage() {
if netstorageInsert != nil || netstorageSelect != nil {
logger.Panicf("BUG: initNetworkStorage() has been already called")
}
authCfgs := make([]*promauth.Config, len(*storageNodeAddrs))
isTLSs := make([]bool, len(*storageNodeAddrs))
for i := range authCfgs {
authCfgs[i] = newAuthConfigForStorageNode(i)
isTLSs[i] = storageNodeTLS.GetOptionalArg(i)
}
logger.Infof("starting insert service for nodes %s", *storageNodeAddrs)
netstorageInsert = netinsert.NewStorage(*storageNodeAddrs, authCfgs, isTLSs, *insertConcurrency, *insertDisableCompression)
logger.Infof("initializing select service for nodes %s", *storageNodeAddrs)
netstorageSelect = netselect.NewStorage(*storageNodeAddrs, authCfgs, isTLSs, *selectDisableCompression)
logger.Infof("initialized all the network services")
}
func newAuthConfigForStorageNode(argIdx int) *promauth.Config {
username := storageNodeUsername.GetOptionalArg(argIdx)
password := storageNodePassword.GetOptionalArg(argIdx)
passwordFile := storageNodePasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: promauth.NewSecret(password),
PasswordFile: passwordFile,
}
}
token := storageNodeBearerToken.GetOptionalArg(argIdx)
tokenFile := storageNodeBearerTokenFile.GetOptionalArg(argIdx)
tlsCfg := &promauth.TLSConfig{
CAFile: storageNodeTLSCAFile.GetOptionalArg(argIdx),
CertFile: storageNodeTLSCertFile.GetOptionalArg(argIdx),
KeyFile: storageNodeTLSKeyFile.GetOptionalArg(argIdx),
ServerName: storageNodeTLSServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: storageNodeTLSInsecureSkipVerify.GetOptionalArg(argIdx),
}
opts := &promauth.Options{
BasicAuth: basicAuthCfg,
BearerToken: token,
BearerTokenFile: tokenFile,
TLSConfig: tlsCfg,
}
ac, err := opts.NewConfig()
if err != nil {
logger.Panicf("FATAL: cannot populate auth config for storage node #%d: %s", argIdx, err)
}
return ac
}
// Stop stops vlstorage.
func Stop() {
metrics.UnregisterSet(storageMetrics, true)
storageMetrics = nil
if localStorage != nil {
metrics.UnregisterSet(localStorageMetrics, true)
localStorageMetrics = nil
strg.MustClose()
strg = nil
localStorage.MustClose()
localStorage = nil
} else {
netstorageInsert.MustStop()
netstorageInsert = nil
netstorageSelect.MustStop()
netstorageSelect = nil
}
}
// RequestHandler is a storage request handler.
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
if path == "/internal/force_merge" {
if !httpserver.CheckAuthFlag(w, r, forceMergeAuthKey) {
return true
}
// Run force merge in background
partitionNamePrefix := r.FormValue("partition_prefix")
go func() {
activeForceMerges.Inc()
defer activeForceMerges.Dec()
logger.Infof("forced merge for partition_prefix=%q has been started", partitionNamePrefix)
startTime := time.Now()
strg.MustForceMerge(partitionNamePrefix)
logger.Infof("forced merge for partition_prefix=%q has been successfully finished in %.3f seconds", partitionNamePrefix, time.Since(startTime).Seconds())
}()
return true
switch path {
case "/internal/force_merge":
return processForceMerge(w, r)
case "/internal/force_flush":
return processForceFlush(w, r)
}
return false
}
var strg *logstorage.Storage
var storageMetrics *metrics.Set
func processForceMerge(w http.ResponseWriter, r *http.Request) bool {
if localStorage == nil {
// Force merge isn't supported by non-local storage
return false
}
// CanWriteData returns non-nil error if it cannot write data to vlstorage.
if !httpserver.CheckAuthFlag(w, r, forceMergeAuthKey) {
return true
}
// Run force merge in background
partitionNamePrefix := r.FormValue("partition_prefix")
go func() {
activeForceMerges.Inc()
defer activeForceMerges.Dec()
logger.Infof("forced merge for partition_prefix=%q has been started", partitionNamePrefix)
startTime := time.Now()
localStorage.MustForceMerge(partitionNamePrefix)
logger.Infof("forced merge for partition_prefix=%q has been successfully finished in %.3f seconds", partitionNamePrefix, time.Since(startTime).Seconds())
}()
return true
}
func processForceFlush(w http.ResponseWriter, r *http.Request) bool {
if localStorage == nil {
// Force merge isn't supported by non-local storage
return false
}
if !httpserver.CheckAuthFlag(w, r, forceFlushAuthKey) {
return true
}
logger.Infof("flushing storage to make pending data available for reading")
localStorage.DebugFlush()
return true
}
// CanWriteData returns non-nil error if it cannot write data to vlstorage
func CanWriteData() error {
if strg.IsReadOnly() {
if localStorage == nil {
// The data can be always written in non-local mode.
return nil
}
if localStorage.IsReadOnly() {
return &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot add rows into storage in read-only mode; the storage can be in read-only mode "+
"because of lack of free disk space at -storageDataPath=%s", *storageDataPath),
@@ -130,50 +274,77 @@ func CanWriteData() error {
//
// It is advised to call CanWriteData() before calling MustAddRows()
func MustAddRows(lr *logstorage.LogRows) {
strg.MustAddRows(lr)
if localStorage != nil {
// Store lr in the local storage.
localStorage.MustAddRows(lr)
} else {
// Store lr across the remote storage nodes.
lr.ForEachRow(netstorageInsert.AddRow)
}
}
// RunQuery runs the given q and calls writeBlock for the returned data blocks
func RunQuery(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, writeBlock logstorage.WriteBlockFunc) error {
return strg.RunQuery(ctx, tenantIDs, q, writeBlock)
func RunQuery(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, writeBlock logstorage.WriteDataBlockFunc) error {
if localStorage != nil {
return localStorage.RunQuery(ctx, tenantIDs, q, writeBlock)
}
return netstorageSelect.RunQuery(ctx, tenantIDs, q, writeBlock)
}
// GetFieldNames executes q and returns field names seen in results.
func GetFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
return strg.GetFieldNames(ctx, tenantIDs, q)
if localStorage != nil {
return localStorage.GetFieldNames(ctx, tenantIDs, q)
}
return netstorageSelect.GetFieldNames(ctx, tenantIDs, q)
}
// GetFieldValues executes q and returns unique values for the fieldName seen in results.
//
// If limit > 0, then up to limit unique values are returned.
func GetFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
return strg.GetFieldValues(ctx, tenantIDs, q, fieldName, limit)
if localStorage != nil {
return localStorage.GetFieldValues(ctx, tenantIDs, q, fieldName, limit)
}
return netstorageSelect.GetFieldValues(ctx, tenantIDs, q, fieldName, limit)
}
// GetStreamFieldNames executes q and returns stream field names seen in results.
func GetStreamFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
return strg.GetStreamFieldNames(ctx, tenantIDs, q)
if localStorage != nil {
return localStorage.GetStreamFieldNames(ctx, tenantIDs, q)
}
return netstorageSelect.GetStreamFieldNames(ctx, tenantIDs, q)
}
// GetStreamFieldValues executes q and returns stream field values for the given fieldName seen in results.
//
// If limit > 0, then up to limit unique stream field values are returned.
func GetStreamFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
return strg.GetStreamFieldValues(ctx, tenantIDs, q, fieldName, limit)
if localStorage != nil {
return localStorage.GetStreamFieldValues(ctx, tenantIDs, q, fieldName, limit)
}
return netstorageSelect.GetStreamFieldValues(ctx, tenantIDs, q, fieldName, limit)
}
// GetStreams executes q and returns streams seen in query results.
//
// If limit > 0, then up to limit unique streams are returned.
func GetStreams(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
return strg.GetStreams(ctx, tenantIDs, q, limit)
if localStorage != nil {
return localStorage.GetStreams(ctx, tenantIDs, q, limit)
}
return netstorageSelect.GetStreams(ctx, tenantIDs, q, limit)
}
// GetStreamIDs executes q and returns streamIDs seen in query results.
//
// If limit > 0, then up to limit unique streamIDs are returned.
func GetStreamIDs(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
return strg.GetStreamIDs(ctx, tenantIDs, q, limit)
if localStorage != nil {
return localStorage.GetStreamIDs(ctx, tenantIDs, q, limit)
}
return netstorageSelect.GetStreamIDs(ctx, tenantIDs, q, limit)
}
func writeStorageMetrics(w io.Writer, strg *logstorage.Storage) {

View File

@@ -0,0 +1,369 @@
package netinsert
import (
"errors"
"fmt"
"io"
"net/http"
"net/url"
"sync"
"sync/atomic"
"time"
"github.com/valyala/fastrand"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/contextutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
)
// the maximum size of a single data block sent to storage node.
const maxInsertBlockSize = 2 * 1024 * 1024
// ProtocolVersion is the version of the data ingestion protocol.
//
// It must be changed every time the data encoding at /internal/insert HTTP endpoint is changed.
const ProtocolVersion = "v1"
// Storage is a network storage for sending data to remote storage nodes in the cluster.
type Storage struct {
sns []*storageNode
disableCompression bool
srt *streamRowsTracker
pendingDataBuffers chan *bytesutil.ByteBuffer
stopCh chan struct{}
wg sync.WaitGroup
}
type storageNode struct {
// scheme is http or https scheme to communicate with addr
scheme string
// addr is TCP address of storage node to send the ingested data to
addr string
// s is a storage, which holds the given storageNode
s *Storage
// c is an http client used for sending data blocks to addr.
c *http.Client
// ac is auth config used for setting request headers such as Authorization and Host.
ac *promauth.Config
// pendingData contains pending data, which must be sent to the storage node at the addr.
pendingDataMu sync.Mutex
pendingData *bytesutil.ByteBuffer
pendingDataLastFlush time.Time
// the unix timestamp until the storageNode is disabled for data writing.
disabledUntil atomic.Uint64
}
func newStorageNode(s *Storage, addr string, ac *promauth.Config, isTLS bool) *storageNode {
tr := httputil.NewTransport(false, "vlinsert_backend")
tr.TLSHandshakeTimeout = 20 * time.Second
tr.DisableCompression = true
scheme := "http"
if isTLS {
scheme = "https"
}
sn := &storageNode{
scheme: scheme,
addr: addr,
s: s,
c: &http.Client{
Transport: ac.NewRoundTripper(tr),
},
ac: ac,
pendingData: &bytesutil.ByteBuffer{},
}
s.wg.Add(1)
go func() {
defer s.wg.Done()
sn.backgroundFlusher()
}()
return sn
}
func (sn *storageNode) backgroundFlusher() {
t := time.NewTicker(time.Second)
defer t.Stop()
for {
select {
case <-sn.s.stopCh:
return
case <-t.C:
sn.flushPendingData()
}
}
}
func (sn *storageNode) flushPendingData() {
sn.pendingDataMu.Lock()
if time.Since(sn.pendingDataLastFlush) < time.Second {
// nothing to flush
sn.pendingDataMu.Unlock()
return
}
pendingData := sn.grabPendingDataForFlushLocked()
sn.pendingDataMu.Unlock()
sn.mustSendInsertRequest(pendingData)
}
func (sn *storageNode) addRow(r *logstorage.InsertRow) {
bb := bbPool.Get()
b := bb.B
b = r.Marshal(b)
if len(b) > maxInsertBlockSize {
logger.Warnf("skipping too long log entry, since its length exceeds %d bytes; the actual log entry length is %d bytes; log entry contents: %s", maxInsertBlockSize, len(b), b)
return
}
var pendingData *bytesutil.ByteBuffer
sn.pendingDataMu.Lock()
if sn.pendingData.Len()+len(b) > maxInsertBlockSize {
pendingData = sn.grabPendingDataForFlushLocked()
}
sn.pendingData.MustWrite(b)
sn.pendingDataMu.Unlock()
bb.B = b
bbPool.Put(bb)
if pendingData != nil {
sn.mustSendInsertRequest(pendingData)
}
}
var bbPool bytesutil.ByteBufferPool
func (sn *storageNode) grabPendingDataForFlushLocked() *bytesutil.ByteBuffer {
sn.pendingDataLastFlush = time.Now()
pendingData := sn.pendingData
sn.pendingData = <-sn.s.pendingDataBuffers
return pendingData
}
func (sn *storageNode) mustSendInsertRequest(pendingData *bytesutil.ByteBuffer) {
defer func() {
pendingData.Reset()
sn.s.pendingDataBuffers <- pendingData
}()
err := sn.sendInsertRequest(pendingData)
if err == nil {
return
}
if !errors.Is(err, errTemporarilyDisabled) {
logger.Warnf("%s; re-routing the data block to the remaining nodes", err)
}
for !sn.s.sendInsertRequestToAnyNode(pendingData) {
logger.Errorf("cannot send pending data to all storage nodes, since all of them are unavailable; re-trying to send the data in a second")
t := timerpool.Get(time.Second)
select {
case <-sn.s.stopCh:
timerpool.Put(t)
logger.Errorf("dropping %d bytes of data, since there are no available storage nodes", pendingData.Len())
return
case <-t.C:
timerpool.Put(t)
}
}
}
func (sn *storageNode) sendInsertRequest(pendingData *bytesutil.ByteBuffer) error {
dataLen := pendingData.Len()
if dataLen == 0 {
// Nothing to send.
return nil
}
if sn.disabledUntil.Load() > fasttime.UnixTimestamp() {
return errTemporarilyDisabled
}
ctx, cancel := contextutil.NewStopChanContext(sn.s.stopCh)
defer cancel()
var body io.Reader
if !sn.s.disableCompression {
bb := zstdBufPool.Get()
defer zstdBufPool.Put(bb)
bb.B = zstd.CompressLevel(bb.B[:0], pendingData.B, 1)
body = bb.NewReader()
} else {
body = pendingData.NewReader()
}
reqURL := sn.getRequestURL("/internal/insert")
req, err := http.NewRequestWithContext(ctx, "POST", reqURL, body)
if err != nil {
logger.Panicf("BUG: unexpected error when creating an http request: %s", err)
}
req.Header.Set("Content-Type", "application/octet-stream")
if !sn.s.disableCompression {
req.Header.Set("Content-Encoding", "zstd")
}
if err := sn.ac.SetHeaders(req, true); err != nil {
return fmt.Errorf("cannot set auth headers for %q: %w", reqURL, err)
}
resp, err := sn.c.Do(req)
if err != nil {
// Disable sn for data writing for 10 seconds.
sn.disabledUntil.Store(fasttime.UnixTimestamp() + 10)
return fmt.Errorf("cannot send data block with the length %d to %q: %s", pendingData.Len(), reqURL, err)
}
defer resp.Body.Close()
if resp.StatusCode/100 == 2 {
return nil
}
respBody, err := io.ReadAll(resp.Body)
if err != nil {
respBody = []byte(fmt.Sprintf("%s", err))
}
// Disable sn for data writing for 10 seconds.
sn.disabledUntil.Store(fasttime.UnixTimestamp() + 10)
return fmt.Errorf("unexpected status code returned when sending data block to %q: %d; want 2xx; response body: %q", reqURL, resp.StatusCode, respBody)
}
func (sn *storageNode) getRequestURL(path string) string {
return fmt.Sprintf("%s://%s%s?version=%s", sn.scheme, sn.addr, path, url.QueryEscape(ProtocolVersion))
}
var zstdBufPool bytesutil.ByteBufferPool
// NewStorage returns new Storage for the given addrs with the given authCfgs.
//
// The concurrency is the average number of concurrent connections per every addr.
//
// If disableCompression is set, then the data is sent uncompressed to the remote storage.
//
// Call MustStop on the returned storage when it is no longer needed.
func NewStorage(addrs []string, authCfgs []*promauth.Config, isTLSs []bool, concurrency int, disableCompression bool) *Storage {
pendingDataBuffers := make(chan *bytesutil.ByteBuffer, concurrency*len(addrs))
for i := 0; i < cap(pendingDataBuffers); i++ {
pendingDataBuffers <- &bytesutil.ByteBuffer{}
}
s := &Storage{
disableCompression: disableCompression,
pendingDataBuffers: pendingDataBuffers,
stopCh: make(chan struct{}),
}
sns := make([]*storageNode, len(addrs))
for i, addr := range addrs {
sns[i] = newStorageNode(s, addr, authCfgs[i], isTLSs[i])
}
s.sns = sns
s.srt = newStreamRowsTracker(len(sns))
return s
}
// MustStop stops the s.
func (s *Storage) MustStop() {
close(s.stopCh)
s.wg.Wait()
s.sns = nil
}
// AddRow adds the given log row into s.
func (s *Storage) AddRow(streamHash uint64, r *logstorage.InsertRow) {
idx := s.srt.getNodeIdx(streamHash)
sn := s.sns[idx]
sn.addRow(r)
}
func (s *Storage) sendInsertRequestToAnyNode(pendingData *bytesutil.ByteBuffer) bool {
startIdx := int(fastrand.Uint32n(uint32(len(s.sns))))
for i := range s.sns {
idx := (startIdx + i) % len(s.sns)
sn := s.sns[idx]
err := sn.sendInsertRequest(pendingData)
if err == nil {
return true
}
if !errors.Is(err, errTemporarilyDisabled) {
logger.Warnf("cannot send pending data to the storage node %q: %s; trying to send it to another storage node", sn.addr, err)
}
}
return false
}
var errTemporarilyDisabled = fmt.Errorf("writing to the node is temporarily disabled")
type streamRowsTracker struct {
mu sync.Mutex
nodesCount int64
rowsPerStream map[uint64]uint64
}
func newStreamRowsTracker(nodesCount int) *streamRowsTracker {
return &streamRowsTracker{
nodesCount: int64(nodesCount),
rowsPerStream: make(map[uint64]uint64),
}
}
func (srt *streamRowsTracker) getNodeIdx(streamHash uint64) uint64 {
if srt.nodesCount == 1 {
// Fast path for a single node.
return 0
}
srt.mu.Lock()
defer srt.mu.Unlock()
streamRows := srt.rowsPerStream[streamHash] + 1
srt.rowsPerStream[streamHash] = streamRows
if streamRows <= 1000 {
// Write the initial rows for the stream to a single storage node for better locality.
// This should work great for log streams containing small number of logs, since will be distributed
// evenly among available storage nodes because they have different streamHash.
return streamHash % uint64(srt.nodesCount)
}
// The log stream contains more than 1000 rows. Distribute them among storage nodes at random
// in order to improve query performance over this stream (the data for the log stream
// can be processed in parallel on all the storage nodes).
//
// The random distribution is preferred over round-robin distribution in order to avoid possible
// dependency between the order of the ingested logs and the number of storage nodes,
// which may lead to non-uniform distribution of logs among storage nodes.
return uint64(fastrand.Uint32n(uint32(srt.nodesCount)))
}

View File

@@ -0,0 +1,57 @@
package netinsert
import (
"fmt"
"math"
"math/rand"
"testing"
"github.com/cespare/xxhash/v2"
)
func TestStreamRowsTracker(t *testing.T) {
f := func(rowsCount, streamsCount, nodesCount int) {
t.Helper()
// generate stream hashes
streamHashes := make([]uint64, streamsCount)
for i := range streamHashes {
streamHashes[i] = xxhash.Sum64([]byte(fmt.Sprintf("stream %d.", i)))
}
srt := newStreamRowsTracker(nodesCount)
rng := rand.New(rand.NewSource(0))
rowsPerNode := make([]uint64, nodesCount)
for i := 0; i < rowsCount; i++ {
streamIdx := rng.Intn(streamsCount)
h := streamHashes[streamIdx]
nodeIdx := srt.getNodeIdx(h)
rowsPerNode[nodeIdx]++
}
// Verify that rows are uniformly distributed among nodes.
expectedRowsPerNode := float64(rowsCount) / float64(nodesCount)
for nodeIdx, nodeRows := range rowsPerNode {
if math.Abs(float64(nodeRows)-expectedRowsPerNode)/expectedRowsPerNode > 0.15 {
t.Fatalf("non-uniform distribution of rows among nodes; node %d has %d rows, while it must have %v rows; rowsPerNode=%d",
nodeIdx, nodeRows, expectedRowsPerNode, rowsPerNode)
}
}
}
rowsCount := 10000
streamsCount := 9
nodesCount := 2
f(rowsCount, streamsCount, nodesCount)
rowsCount = 10000
streamsCount = 100
nodesCount = 2
f(rowsCount, streamsCount, nodesCount)
rowsCount = 100000
streamsCount = 1000
nodesCount = 9
f(rowsCount, streamsCount, nodesCount)
}

View File

@@ -0,0 +1,469 @@
package netselect
import (
"context"
"errors"
"fmt"
"io"
"math"
"net/http"
"net/url"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/contextutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
)
const (
// FieldNamesProtocolVersion is the version of the protocol used for /internal/select/field_names HTTP endpoint.
//
// It must be updated every time the protocol changes.
FieldNamesProtocolVersion = "v1"
// FieldValuesProtocolVersion is the version of the protocol used for /internal/select/field_values HTTP endpoint.
//
// It must be updated every time the protocol changes.
FieldValuesProtocolVersion = "v1"
// StreamFieldNamesProtocolVersion is the version of the protocol used for /internal/select/stream_field_names HTTP endpoint.
//
// It must be updated every time the protocol changes.
StreamFieldNamesProtocolVersion = "v1"
// StreamFieldValuesProtocolVersion is the version of the protocol used for /internal/select/stream_field_values HTTP endpoint.
//
// It must be updated every time the protocol changes.
StreamFieldValuesProtocolVersion = "v1"
// StreamsProtocolVersion is the version of the protocol used for /internal/select/streams HTTP endpoint.
//
// It must be updated every time the protocol changes.
StreamsProtocolVersion = "v1"
// StreamIDsProtocolVersion is the version of the protocol used for /internal/select/stream_ids HTTP endpoint.
//
// It must be updated every time the protocol changes.
StreamIDsProtocolVersion = "v1"
// QueryProtocolVersion is the version of the protocol used for /internal/select/query HTTP endpoint.
//
// It must be updated every time the protocol changes.
QueryProtocolVersion = "v1"
)
// Storage is a network storage for querying remote storage nodes in the cluster.
type Storage struct {
sns []*storageNode
disableCompression bool
}
type storageNode struct {
// scheme is http or https scheme to communicate with addr
scheme string
// addr is TCP address of the storage node to query
addr string
// s is a storage, which holds the given storageNode
s *Storage
// c is an http client used for querying storage node at addr.
c *http.Client
// ac is auth config used for setting request headers such as Authorization and Host.
ac *promauth.Config
}
func newStorageNode(s *Storage, addr string, ac *promauth.Config, isTLS bool) *storageNode {
tr := httputil.NewTransport(false, "vlselect_backend")
tr.TLSHandshakeTimeout = 20 * time.Second
tr.DisableCompression = true
scheme := "http"
if isTLS {
scheme = "https"
}
sn := &storageNode{
scheme: scheme,
addr: addr,
s: s,
c: &http.Client{
Transport: ac.NewRoundTripper(tr),
},
ac: ac,
}
return sn
}
func (sn *storageNode) runQuery(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, processBlock func(db *logstorage.DataBlock)) error {
args := sn.getCommonArgs(QueryProtocolVersion, tenantIDs, q)
reqURL := sn.getRequestURL("/internal/select/query", args)
req, err := http.NewRequestWithContext(ctx, "GET", reqURL, nil)
if err != nil {
logger.Panicf("BUG: unexpected error when creating a request: %s", err)
}
if err := sn.ac.SetHeaders(req, true); err != nil {
return fmt.Errorf("cannot set auth headers for %q: %w", reqURL, err)
}
// send the request to the storage node
resp, err := sn.c.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
responseBody, err := io.ReadAll(resp.Body)
if err != nil {
responseBody = []byte(err.Error())
}
return fmt.Errorf("unexpected status code for the request to %q: %d; want %d; response: %q", reqURL, resp.StatusCode, http.StatusOK, responseBody)
}
// read the response
var dataLenBuf [8]byte
var buf []byte
var db logstorage.DataBlock
var valuesBuf []string
for {
if _, err := io.ReadFull(resp.Body, dataLenBuf[:]); err != nil {
if errors.Is(err, io.EOF) {
// The end of response stream
return nil
}
return fmt.Errorf("cannot read block size from %q: %w", reqURL, err)
}
blockLen := encoding.UnmarshalUint64(dataLenBuf[:])
if blockLen > math.MaxInt {
return fmt.Errorf("too big data block: %d bytes; mustn't exceed %v bytes", blockLen, math.MaxInt)
}
buf = slicesutil.SetLength(buf, int(blockLen))
if _, err := io.ReadFull(resp.Body, buf); err != nil {
return fmt.Errorf("cannot read block with size of %d bytes from %q: %w", blockLen, reqURL, err)
}
src := buf
if !sn.s.disableCompression {
bufLen := len(buf)
var err error
buf, err = zstd.Decompress(buf, buf)
if err != nil {
return fmt.Errorf("cannot decompress data block: %w", err)
}
src = buf[bufLen:]
}
for len(src) > 0 {
tail, vb, err := db.UnmarshalInplace(src, valuesBuf[:0])
if err != nil {
return fmt.Errorf("cannot unmarshal data block received from %q: %w", reqURL, err)
}
valuesBuf = vb
src = tail
processBlock(&db)
clear(valuesBuf)
}
}
}
func (sn *storageNode) getFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(FieldNamesProtocolVersion, tenantIDs, q)
return sn.getValuesWithHits(ctx, "/internal/select/field_names", args)
}
func (sn *storageNode) getFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(FieldValuesProtocolVersion, tenantIDs, q)
args.Set("field", fieldName)
args.Set("limit", fmt.Sprintf("%d", limit))
return sn.getValuesWithHits(ctx, "/internal/select/field_values", args)
}
func (sn *storageNode) getStreamFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(StreamFieldNamesProtocolVersion, tenantIDs, q)
return sn.getValuesWithHits(ctx, "/internal/select/stream_field_names", args)
}
func (sn *storageNode) getStreamFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(StreamFieldValuesProtocolVersion, tenantIDs, q)
args.Set("field", fieldName)
args.Set("limit", fmt.Sprintf("%d", limit))
return sn.getValuesWithHits(ctx, "/internal/select/stream_field_values", args)
}
func (sn *storageNode) getStreams(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(StreamsProtocolVersion, tenantIDs, q)
args.Set("limit", fmt.Sprintf("%d", limit))
return sn.getValuesWithHits(ctx, "/internal/select/streams", args)
}
func (sn *storageNode) getStreamIDs(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
args := sn.getCommonArgs(StreamIDsProtocolVersion, tenantIDs, q)
args.Set("limit", fmt.Sprintf("%d", limit))
return sn.getValuesWithHits(ctx, "/internal/select/stream_ids", args)
}
func (sn *storageNode) getCommonArgs(version string, tenantIDs []logstorage.TenantID, q *logstorage.Query) url.Values {
args := url.Values{}
args.Set("version", version)
args.Set("tenant_ids", string(logstorage.MarshalTenantIDs(nil, tenantIDs)))
args.Set("query", q.String())
args.Set("timestamp", fmt.Sprintf("%d", q.GetTimestamp()))
args.Set("disable_compression", fmt.Sprintf("%v", sn.s.disableCompression))
return args
}
func (sn *storageNode) getValuesWithHits(ctx context.Context, path string, args url.Values) ([]logstorage.ValueWithHits, error) {
data, err := sn.executeRequestAt(ctx, path, args)
if err != nil {
return nil, err
}
return unmarshalValuesWithHits(data)
}
func (sn *storageNode) executeRequestAt(ctx context.Context, path string, args url.Values) ([]byte, error) {
reqURL := sn.getRequestURL(path, args)
req, err := http.NewRequestWithContext(ctx, "GET", reqURL, nil)
if err != nil {
logger.Panicf("BUG: unexpected error when creating a request: %s", err)
}
// send the request to the storage node
resp, err := sn.c.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
responseBody, err := io.ReadAll(resp.Body)
if err != nil {
responseBody = []byte(err.Error())
}
return nil, fmt.Errorf("unexpected status code for the request to %q: %d; want %d; response: %q", reqURL, resp.StatusCode, http.StatusOK, responseBody)
}
// read the response
var bb bytesutil.ByteBuffer
if _, err := bb.ReadFrom(resp.Body); err != nil {
return nil, fmt.Errorf("cannot read response from %q: %w", reqURL, err)
}
if sn.s.disableCompression {
return bb.B, nil
}
bbLen := len(bb.B)
bb.B, err = zstd.Decompress(bb.B, bb.B)
if err != nil {
return nil, err
}
return bb.B[bbLen:], nil
}
func (sn *storageNode) getRequestURL(path string, args url.Values) string {
return fmt.Sprintf("%s://%s%s?%s", sn.scheme, sn.addr, path, args.Encode())
}
// NewStorage returns new Storage for the given addrs and the given authCfgs.
//
// If disableCompression is set, then uncompressed responses are received from storage nodes.
//
// Call MustStop on the returned storage when it is no longer needed.
func NewStorage(addrs []string, authCfgs []*promauth.Config, isTLSs []bool, disableCompression bool) *Storage {
s := &Storage{
disableCompression: disableCompression,
}
sns := make([]*storageNode, len(addrs))
for i, addr := range addrs {
sns[i] = newStorageNode(s, addr, authCfgs[i], isTLSs[i])
}
s.sns = sns
return s
}
// MustStop stops the s.
func (s *Storage) MustStop() {
s.sns = nil
}
// RunQuery runs the given q and calls writeBlock for the returned data blocks
func (s *Storage) RunQuery(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, writeBlock logstorage.WriteDataBlockFunc) error {
nqr, err := logstorage.NewNetQueryRunner(ctx, tenantIDs, q, s.RunQuery, writeBlock)
if err != nil {
return err
}
search := func(stopCh <-chan struct{}, q *logstorage.Query, writeBlock logstorage.WriteDataBlockFunc) error {
return s.runQuery(stopCh, tenantIDs, q, writeBlock)
}
concurrency := q.GetConcurrency()
return nqr.Run(ctx, concurrency, search)
}
func (s *Storage) runQuery(stopCh <-chan struct{}, tenantIDs []logstorage.TenantID, q *logstorage.Query, writeBlock logstorage.WriteDataBlockFunc) error {
ctxWithCancel, cancel := contextutil.NewStopChanContext(stopCh)
defer cancel()
errs := make([]error, len(s.sns))
var wg sync.WaitGroup
for i := range s.sns {
wg.Add(1)
go func(nodeIdx int) {
defer wg.Done()
sn := s.sns[nodeIdx]
err := sn.runQuery(ctxWithCancel, tenantIDs, q, func(db *logstorage.DataBlock) {
writeBlock(uint(nodeIdx), db)
})
if err != nil {
// Cancel the remaining parallel queries
cancel()
}
errs[nodeIdx] = err
}(i)
}
wg.Wait()
return getFirstNonCancelError(errs)
}
// GetFieldNames executes q and returns field names seen in results.
func (s *Storage) GetFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, 0, false, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getFieldNames(ctx, tenantIDs, q)
})
}
// GetFieldValues executes q and returns unique values for the fieldName seen in results.
//
// If limit > 0, then up to limit unique values are returned.
func (s *Storage) GetFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, limit, true, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getFieldValues(ctx, tenantIDs, q, fieldName, limit)
})
}
// GetStreamFieldNames executes q and returns stream field names seen in results.
func (s *Storage) GetStreamFieldNames(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, 0, false, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getStreamFieldNames(ctx, tenantIDs, q)
})
}
// GetStreamFieldValues executes q and returns stream field values for the given fieldName seen in results.
//
// If limit > 0, then up to limit unique stream field values are returned.
func (s *Storage) GetStreamFieldValues(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, fieldName string, limit uint64) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, limit, true, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getStreamFieldValues(ctx, tenantIDs, q, fieldName, limit)
})
}
// GetStreams executes q and returns streams seen in query results.
//
// If limit > 0, then up to limit unique streams are returned.
func (s *Storage) GetStreams(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, limit, true, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getStreams(ctx, tenantIDs, q, limit)
})
}
// GetStreamIDs executes q and returns streamIDs seen in query results.
//
// If limit > 0, then up to limit unique streamIDs are returned.
func (s *Storage) GetStreamIDs(ctx context.Context, tenantIDs []logstorage.TenantID, q *logstorage.Query, limit uint64) ([]logstorage.ValueWithHits, error) {
return s.getValuesWithHits(ctx, limit, true, func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error) {
return sn.getStreamIDs(ctx, tenantIDs, q, limit)
})
}
func (s *Storage) getValuesWithHits(ctx context.Context, limit uint64, resetHitsOnLimitExceeded bool,
callback func(ctx context.Context, sn *storageNode) ([]logstorage.ValueWithHits, error)) ([]logstorage.ValueWithHits, error) {
ctxWithCancel, cancel := context.WithCancel(ctx)
defer cancel()
results := make([][]logstorage.ValueWithHits, len(s.sns))
errs := make([]error, len(s.sns))
var wg sync.WaitGroup
for i := range s.sns {
wg.Add(1)
go func(nodeIdx int) {
defer wg.Done()
sn := s.sns[nodeIdx]
vhs, err := callback(ctxWithCancel, sn)
results[nodeIdx] = vhs
errs[nodeIdx] = err
if err != nil {
// Cancel the remaining parallel requests
cancel()
}
}(i)
}
wg.Wait()
if err := getFirstNonCancelError(errs); err != nil {
return nil, err
}
vhs := logstorage.MergeValuesWithHits(results, limit, resetHitsOnLimitExceeded)
return vhs, nil
}
func getFirstNonCancelError(errs []error) error {
for _, err := range errs {
if err != nil && !errors.Is(err, context.Canceled) {
return err
}
}
return nil
}
func unmarshalValuesWithHits(src []byte) ([]logstorage.ValueWithHits, error) {
var vhs []logstorage.ValueWithHits
for len(src) > 0 {
var vh logstorage.ValueWithHits
tail, err := vh.UnmarshalInplace(src)
if err != nil {
return nil, fmt.Errorf("cannot unmarshal ValueWithHits #%d: %w", len(vhs), err)
}
src = tail
// Clone vh.Value, since it points to src.
vh.Value = strings.Clone(vh.Value)
vhs = append(vhs, vh)
}
return vhs, nil
}

View File

@@ -1,3 +1,3 @@
See vmagent docs [here](https://docs.victoriametrics.com/vmagent/).
See vmagent docs [here](https://docs.victoriametrics.com/victoriametrics/vmagent/).
vmagent docs can be edited at [docs/vmagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmagent.md).
vmagent docs can be edited at [docs/vmagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmagent.md).

View File

@@ -105,7 +105,7 @@ func main() {
// Some workloads may need increased GOGC values. Then such values can be set via GOGC environment variable.
// It is recommended increasing GOGC if go_memstats_gc_cpu_fraction metric exposed at /metrics page
// exceeds 0.05 for extended periods of time.
cgroup.SetGOGC(30)
cgroup.SetGOGC(50)
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
@@ -443,8 +443,10 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/prometheus/api/v1/targets", "/api/v1/targets":
promscrapeAPIV1TargetsRequests.Inc()
w.Header().Set("Content-Type", "application/json")
// https://prometheus.io/docs/prometheus/latest/querying/api/#targets
state := r.FormValue("state")
promscrape.WriteAPIV1Targets(w, state)
scrapePool := r.FormValue("scrapePool")
promscrape.WriteAPIV1Targets(w, state, scrapePool)
return true
case "/prometheus/target_response", "/target_response":
promscrapeTargetResponseRequests.Inc()

View File

@@ -10,20 +10,22 @@ import (
"strconv"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/awsapi"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
"github.com/golang/snappy"
)
var (
@@ -88,7 +90,8 @@ type client struct {
remoteWriteURL string
// Whether to use VictoriaMetrics remote write protocol for sending the data to remoteWriteURL
useVMProto bool
useVMProto atomic.Bool
canDowngradeVMProto atomic.Bool
fq *persistentqueue.FastQueue
hc *http.Client
@@ -126,8 +129,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
logger.Fatalf("cannot initialize AWS Config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tr := httputil.NewTransport(false)
tr.DialContext = netutil.NewStatDialFunc("vmagent_remotewrite")
tr := httputil.NewTransport(false, "vmagent_remotewrite")
tr.TLSHandshakeTimeout = tlsHandshakeTimeout.GetOptionalArg(argIdx)
tr.MaxConnsPerHost = 2 * concurrency
tr.MaxIdleConnsPerHost = 2 * concurrency
@@ -168,17 +170,11 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
logger.Fatalf("-remoteWrite.useVMProto and -remoteWrite.usePromProto cannot be set simultaneously for -remoteWrite.url=%s", sanitizedURL)
}
if !useVMProto && !usePromProto {
// Auto-detect whether the remote storage supports VictoriaMetrics remote write protocol.
doRequest := func(url string) (*http.Response, error) {
return c.doRequest(url, nil)
}
useVMProto = protoparserutil.HandleVMProtoClientHandshake(c.remoteWriteURL, doRequest)
if !useVMProto {
logger.Infof("the remote storage at %q doesn't support VictoriaMetrics remote write protocol. Switching to Prometheus remote write protocol. "+
"See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol", sanitizedURL)
}
// The VM protocol could be downgraded later at runtime if unsupported media type response status is received.
useVMProto = true
c.canDowngradeVMProto.Store(true)
}
c.useVMProto = useVMProto
c.useVMProto.Store(useVMProto)
return c
}
@@ -386,7 +382,7 @@ func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
h := req.Header
h.Set("User-Agent", "vmagent")
h.Set("Content-Type", "application/x-protobuf")
if c.useVMProto {
if encoding.IsZstd(body) {
h.Set("Content-Encoding", "zstd")
h.Set("X-VictoriaMetrics-Remote-Write-Version", "1")
} else {
@@ -422,7 +418,7 @@ again:
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
remoteWriteRetryLogger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := timerpool.Get(retryDuration)
select {
@@ -435,6 +431,7 @@ again:
c.retriesCount.Inc()
goto again
}
statusCode := resp.StatusCode
if statusCode/100 == 2 {
_ = resp.Body.Close()
@@ -443,24 +440,46 @@ again:
c.blocksSent.Inc()
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 409 || statusCode == 400 {
body, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()
if err != nil {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; "+
"failed to read response body: %s",
len(block), c.sanitizedURL, statusCode, err)
} else {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; response body: %s",
len(block), c.sanitizedURL, statusCode, string(body))
}
// Just drop block on 409 and 400 status codes like Prometheus does.
if statusCode == 409 {
logBlockRejected(block, c.sanitizedURL, resp)
// Just drop block on 409 status code like Prometheus does.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true
// - Remote Write v1 specification implicitly expects a `400 Bad Request` when the encoding is not supported.
// - Remote Write v2 specification explicitly specifies a `415 Unsupported Media Type` for unsupported encodings.
// - Real-world implementations of v1 use both 400 and 415 status codes.
// See more in research: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054
} else if statusCode == 415 || statusCode == 400 {
if c.canDowngradeVMProto.Swap(false) {
logger.Infof("received unsupported media type or bad request from remote storage at %q. Downgrading protocol from VictoriaMetrics to Prometheus remote write for all future requests. "+
"See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
c.useVMProto.Store(false)
}
if encoding.IsZstd(block) {
logger.Infof("received unsupported media type or bad request from remote storage at %q. Re-packing the block to Prometheus remote write and retrying."+
"See https://docs.victoriametrics.com/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
block = mustRepackBlockFromZstdToSnappy(block)
c.retriesCount.Inc()
_ = resp.Body.Close()
goto again
}
// Just drop snappy blocks on 400 or 415 status codes like Prometheus does.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
logBlockRejected(block, c.sanitizedURL, resp)
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true
}
// Unexpected status code returned
@@ -490,6 +509,7 @@ again:
}
var remoteWriteRejectedLogger = logger.WithThrottler("remoteWriteRejected", 5*time.Second)
var remoteWriteRetryLogger = logger.WithThrottler("remoteWriteRetry", 5*time.Second)
// getRetryDuration returns retry duration.
// retryAfterDuration has the highest priority.
@@ -512,6 +532,28 @@ func getRetryDuration(retryAfterDuration, retryDuration, maxRetryDuration time.D
return retryDuration
}
func mustRepackBlockFromZstdToSnappy(zstdBlock []byte) []byte {
plainBlock := make([]byte, 0, len(zstdBlock)*2)
plainBlock, err := zstd.Decompress(plainBlock, zstdBlock)
if err != nil {
logger.Panicf("FATAL: cannot re-pack block with size %d bytes from Zstd to Snappy: %s", len(zstdBlock), err)
}
return snappy.Encode(nil, plainBlock)
}
func logBlockRejected(block []byte, sanitizedURL string, resp *http.Response) {
body, err := io.ReadAll(resp.Body)
if err != nil {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; "+
"failed to read response body: %s",
len(block), sanitizedURL, resp.StatusCode, err)
} else {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; response body: %s",
len(block), sanitizedURL, resp.StatusCode, string(body))
}
}
// parseRetryAfterHeader parses `Retry-After` value retrieved from HTTP response header.
// retryAfterString should be in either HTTP-date or a number of seconds.
// It will return time.Duration(0) if `retryAfterString` does not follow RFC 7231.

View File

@@ -5,6 +5,9 @@ import (
"net/http"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/golang/snappy"
)
func TestCalculateRetryDuration(t *testing.T) {
@@ -97,3 +100,19 @@ func helper(d time.Duration) time.Duration {
return d + dv
}
func TestRepackBlockFromZstdToSnappy(t *testing.T) {
expectedPlainBlock := []byte(`foobar`)
zstdBlock := encoding.CompressZSTDLevel(nil, expectedPlainBlock, 1)
snappyBlock := mustRepackBlockFromZstdToSnappy(zstdBlock)
actualPlainBlock, err := snappy.Decode(nil, snappyBlock)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
if string(actualPlainBlock) != string(expectedPlainBlock) {
t.Fatalf("unexpected plain block; got %q; want %q", actualPlainBlock, expectedPlainBlock)
}
}

View File

@@ -16,6 +16,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
"github.com/golang/snappy"
@@ -39,7 +40,7 @@ type pendingSeries struct {
periodicFlusherWG sync.WaitGroup
}
func newPendingSeries(fq *persistentqueue.FastQueue, isVMRemoteWrite bool, significantFigures, roundDigits int) *pendingSeries {
func newPendingSeries(fq *persistentqueue.FastQueue, isVMRemoteWrite *atomic.Bool, significantFigures, roundDigits int) *pendingSeries {
var ps pendingSeries
ps.wr.fq = fq
ps.wr.isVMRemoteWrite = isVMRemoteWrite
@@ -99,7 +100,7 @@ type writeRequest struct {
fq *persistentqueue.FastQueue
// Whether to encode the write request with VictoriaMetrics remote write protocol.
isVMRemoteWrite bool
isVMRemoteWrite *atomic.Bool
// How many significant figures must be left before sending the writeRequest to fq.
significantFigures int
@@ -137,7 +138,7 @@ func (wr *writeRequest) reset() {
// This is needed in order to properly save in-memory data to persistent queue on graceful shutdown.
func (wr *writeRequest) mustFlushOnStop() {
wr.wr.Timeseries = wr.tss
if !tryPushWriteRequest(&wr.wr, wr.mustWriteBlock, wr.isVMRemoteWrite) {
if !tryPushWriteRequest(&wr.wr, wr.mustWriteBlock, wr.isVMRemoteWrite.Load()) {
logger.Panicf("BUG: final flush must always return true")
}
wr.reset()
@@ -151,7 +152,7 @@ func (wr *writeRequest) mustWriteBlock(block []byte) bool {
func (wr *writeRequest) tryFlush() bool {
wr.wr.Timeseries = wr.tss
wr.lastFlushTime.Store(fasttime.UnixTimestamp())
if !tryPushWriteRequest(&wr.wr, wr.fq.TryWriteBlock, wr.isVMRemoteWrite) {
if !tryPushWriteRequest(&wr.wr, wr.fq.TryWriteBlock, wr.isVMRemoteWrite.Load()) {
return false
}
wr.reset()
@@ -197,28 +198,43 @@ func (wr *writeRequest) tryPush(src []prompbmarshal.TimeSeries) bool {
}
func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
labelsDst := wr.labels
labelsSrc := src.Labels
// Pre-allocate memory for labels.
labelsLen := len(wr.labels)
samplesDst := wr.samples
buf := wr.buf
for i := range src.Labels {
labelsDst = append(labelsDst, prompbmarshal.Label{})
dstLabel := &labelsDst[len(labelsDst)-1]
srcLabel := &src.Labels[i]
wr.labels = slicesutil.SetLength(wr.labels, labelsLen+len(labelsSrc))
labelsDst := wr.labels[labelsLen:]
buf = append(buf, srcLabel.Name...)
dstLabel.Name = bytesutil.ToUnsafeString(buf[len(buf)-len(srcLabel.Name):])
buf = append(buf, srcLabel.Value...)
dstLabel.Value = bytesutil.ToUnsafeString(buf[len(buf)-len(srcLabel.Value):])
// Pre-allocate memory for byte slice needed for storing label names and values.
neededBufLen := 0
for i := range labelsSrc {
label := &labelsSrc[i]
neededBufLen += len(label.Name) + len(label.Value)
}
dst.Labels = labelsDst[labelsLen:]
bufLen := len(wr.buf)
wr.buf = slicesutil.SetLength(wr.buf, bufLen+neededBufLen)
buf := wr.buf[:bufLen]
samplesDst = append(samplesDst, src.Samples...)
dst.Samples = samplesDst[len(samplesDst)-len(src.Samples):]
// Copy labels
for i := range labelsSrc {
dstLabel := &labelsDst[i]
srcLabel := &labelsSrc[i]
wr.samples = samplesDst
wr.labels = labelsDst
bufLen := len(buf)
buf = append(buf, srcLabel.Name...)
dstLabel.Name = bytesutil.ToUnsafeString(buf[bufLen:])
bufLen = len(buf)
buf = append(buf, srcLabel.Value...)
dstLabel.Value = bytesutil.ToUnsafeString(buf[bufLen:])
}
wr.buf = buf
dst.Labels = labelsDst
// Copy samples
samplesLen := len(wr.samples)
wr.samples = append(wr.samples, src.Samples...)
dst.Samples = wr.samples[samplesLen:]
}
// marshalConcurrency limits the maximum number of concurrent workers, which marshal and compress WriteRequest.

View File

@@ -15,6 +15,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/consistenthash"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
@@ -43,7 +44,7 @@ var (
"according to https://docs.victoriametrics.com/#how-to-import-time-series-data ."+
"See https://docs.victoriametrics.com/vmagent/#multitenancy for details")
shardByURL = flag.Bool("remoteWrite.shardByURL", false, "Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url . "+
shardByURL = flag.Bool("remoteWrite.shardByURL", false, "Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url. "+
"By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent/#sharding-among-remote-storages . "+
"See also -remoteWrite.shardByURLReplicas")
shardByURLReplicas = flag.Int("remoteWrite.shardByURLReplicas", 1, "How many copies of data to make among remote storage systems enumerated via -remoteWrite.url "+
@@ -54,7 +55,6 @@ var (
shardByURLIgnoreLabels = flagutil.NewArrayString("remoteWrite.shardByURL.ignoreLabels", "Optional list of labels, which must be ignored when sharding outgoing samples "+
"among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain "+
"even distribution of series over the specified -remoteWrite.url systems. See also -remoteWrite.shardByURL.labels")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory for storing pending data, which isn't sent to the configured -remoteWrite.url . "+
"See also -remoteWrite.maxDiskUsagePerURL and -remoteWrite.disableOnDiskQueue")
keepDanglingQueues = flag.Bool("remoteWrite.keepDanglingQueues", false, "Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
@@ -96,7 +96,9 @@ var (
var (
// rwctxsGlobal contains statically populated entries when -remoteWrite.url is specified.
rwctxsGlobal []*remoteWriteCtx
rwctxsGlobal []*remoteWriteCtx
rwctxsGlobalIdx []int
rwctxConsistentHashGlobal *consistenthash.ConsistentHash
// ErrQueueFullHTTPRetry must be returned when TryPush() returns false.
ErrQueueFullHTTPRetry = &httpserver.ErrorWithStatusCode{
@@ -205,7 +207,7 @@ func Init() {
initStreamAggrConfigGlobal()
rwctxsGlobal = newRemoteWriteCtxs(*remoteWriteURLs)
initRemoteWriteCtxs(*remoteWriteURLs)
disableOnDiskQueues := []bool(*disableOnDiskQueue)
disableOnDiskQueueAny = slices.Contains(disableOnDiskQueues, true)
@@ -290,7 +292,7 @@ var (
relabelConfigTimestamp = metrics.NewCounter(`vmagent_relabel_config_last_reload_success_timestamp_seconds`)
)
func newRemoteWriteCtxs(urls []string) []*remoteWriteCtx {
func initRemoteWriteCtxs(urls []string) {
if len(urls) == 0 {
logger.Panicf("BUG: urls must be non-empty")
}
@@ -306,6 +308,7 @@ func newRemoteWriteCtxs(urls []string) []*remoteWriteCtx {
maxInmemoryBlocks = 2
}
rwctxs := make([]*remoteWriteCtx, len(urls))
rwctxIdx := make([]int, len(urls))
for i, remoteWriteURLRaw := range urls {
remoteWriteURL, err := url.Parse(remoteWriteURLRaw)
if err != nil {
@@ -316,8 +319,19 @@ func newRemoteWriteCtxs(urls []string) []*remoteWriteCtx {
sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
}
rwctxs[i] = newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
rwctxIdx[i] = i
}
return rwctxs
if *shardByURL {
consistentHashNodes := make([]string, 0, len(urls))
for i, url := range urls {
consistentHashNodes = append(consistentHashNodes, fmt.Sprintf("%d:%s", i+1, url))
}
rwctxConsistentHashGlobal = consistenthash.NewConsistentHash(consistentHashNodes, 0)
}
rwctxsGlobal = rwctxs
rwctxsGlobalIdx = rwctxIdx
}
var (
@@ -501,6 +515,10 @@ func tryPush(at *auth.Token, wr *prompbmarshal.WriteRequest, forceDropSamplesOnF
return true
}
// getEligibleRemoteWriteCtxs checks whether writes to configured remote storage systems are blocked and
// returns only the unblocked rwctx.
//
// calculateHealthyRwctxIdx will rely on the order of rwctx to be in ascending order.
func getEligibleRemoteWriteCtxs(tss []prompbmarshal.TimeSeries, forceDropSamplesOnFailure bool) ([]*remoteWriteCtx, bool) {
if !disableOnDiskQueueAny {
return rwctxsGlobal, true
@@ -517,6 +535,12 @@ func getEligibleRemoteWriteCtxs(tss []prompbmarshal.TimeSeries, forceDropSamples
return nil, false
}
rowsCount := getRowsCount(tss)
if *shardByURL {
// Todo: When shardByURL is enabled, the following metrics won't be 100% accurate. Because vmagent don't know
// which rwctx should data be pushed to yet. Let's consider the hashing algorithm fair and will distribute
// data to all rwctxs evenly.
rowsCount = rowsCount / len(rwctxsGlobal)
}
rwctx.rowsDroppedOnPushFailure.Add(rowsCount)
}
}
@@ -528,6 +552,7 @@ func pushToRemoteStoragesTrackDropped(tss []prompbmarshal.TimeSeries) {
if len(rwctxs) == 0 {
return
}
if !tryPushBlockToRemoteStorages(rwctxs, tss, true) {
logger.Panicf("BUG: tryPushBlockToRemoteStorages() must return true when forceDropSamplesOnFailure=true")
}
@@ -578,42 +603,7 @@ func tryShardingBlockAmongRemoteStorages(rwctxs []*remoteWriteCtx, tssBlock []pr
defer putTSSShards(x)
shards := x.shards
tmpLabels := promutil.GetLabels()
for _, ts := range tssBlock {
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
} else if len(shardByURLIgnoreLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLIgnoreLabelsMap[label.Name]; !ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
}
h := getLabelsHash(hashLabels)
idx := h % uint64(len(shards))
i := 0
for {
shards[idx] = append(shards[idx], ts)
i++
if i >= replicas {
break
}
idx++
if idx >= uint64(len(shards)) {
idx = 0
}
}
}
promutil.PutLabels(tmpLabels)
shardAmountRemoteWriteCtx(tssBlock, shards, rwctxs, replicas)
// Push sharded samples to remote storage systems in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
@@ -636,6 +626,86 @@ func tryShardingBlockAmongRemoteStorages(rwctxs []*remoteWriteCtx, tssBlock []pr
return !anyPushFailed.Load()
}
// calculateHealthyRwctxIdx returns the index of healthyRwctxs in rwctxsGlobal.
// It relies on the order of rwctx in healthyRwctxs, which is appended by getEligibleRemoteWriteCtxs.
func calculateHealthyRwctxIdx(healthyRwctxs []*remoteWriteCtx) ([]int, []int) {
// fast path: all rwctxs are healthy.
if len(healthyRwctxs) == len(rwctxsGlobal) {
return rwctxsGlobalIdx, nil
}
unhealthyIdx := make([]int, 0, len(rwctxsGlobal))
healthyIdx := make([]int, 0, len(rwctxsGlobal))
var i int
for j := range rwctxsGlobal {
if i < len(healthyRwctxs) && rwctxsGlobal[j].idx == healthyRwctxs[i].idx {
healthyIdx = append(healthyIdx, j)
i++
} else {
unhealthyIdx = append(unhealthyIdx, j)
}
}
return healthyIdx, unhealthyIdx
}
// shardAmountRemoteWriteCtx distribute time series to shards by consistent hashing.
func shardAmountRemoteWriteCtx(tssBlock []prompbmarshal.TimeSeries, shards [][]prompbmarshal.TimeSeries, rwctxs []*remoteWriteCtx, replicas int) {
tmpLabels := promutil.GetLabels()
defer promutil.PutLabels(tmpLabels)
healthyIdx, unhealthyIdx := calculateHealthyRwctxIdx(rwctxs)
// shardsIdxMap is a map to find which the shard idx by rwctxs idx.
// rwctxConsistentHashGlobal will tell which the rwctxs idx a time series should be written to.
// And this time series should be appended to the shards by correct shard idx.
shardsIdxMap := make(map[int]int, len(healthyIdx))
for idx, rwctxsIdx := range healthyIdx {
shardsIdxMap[rwctxsIdx] = idx
}
for _, ts := range tssBlock {
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
} else if len(shardByURLIgnoreLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLIgnoreLabelsMap[label.Name]; !ok {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
}
h := getLabelsHash(hashLabels)
// Get the rwctxIdx through consistent hashing and then map it to the index in shards.
// The rwctxIdx is not always equal to the shardIdx, for example, when some rwctx are not available.
rwctxIdx := rwctxConsistentHashGlobal.GetNodeIdx(h, unhealthyIdx)
shardIdx := shardsIdxMap[rwctxIdx]
replicated := 0
for {
shards[shardIdx] = append(shards[shardIdx], ts)
replicated++
if replicated >= replicas {
break
}
shardIdx++
if shardIdx >= len(shards) {
shardIdx = 0
}
}
}
}
type tssShards struct {
shards [][]prompbmarshal.TimeSeries
}
@@ -807,7 +877,7 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks in
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq, c.useVMProto, sf, rd)
pss[i] = newPendingSeries(fq, &c.useVMProto, sf, rd)
}
rwctx := &remoteWriteCtx{

View File

@@ -4,13 +4,17 @@ import (
"fmt"
"math"
"reflect"
"strconv"
"sync/atomic"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/consistenthash"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/metrics"
)
@@ -68,7 +72,9 @@ func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
allRelabelConfigs.Store(rcs)
pss := make([]*pendingSeries, 1)
pss[0] = newPendingSeries(nil, true, 0, 100)
isVMProto := &atomic.Bool{}
isVMProto.Store(true)
pss[0] = newPendingSeries(nil, isVMProto, 0, 100)
rwctx := &remoteWriteCtx{
idx: 0,
streamAggrKeepInput: keepInput,
@@ -171,3 +177,173 @@ metric{env="dev"} 15
metric{env="bar"} 25
`)
}
func TestShardAmountRemoteWriteCtx(t *testing.T) {
// 1. distribute 100000 series to n nodes.
// 2. remove the last node from healthy list.
// 3. distribute the same 10000 series to (n-1) node again.
// 4. check active time series change rate:
// change rate must < (3/total nodes). e.g. +30% if 10 you have 10 nodes.
f := func(remoteWriteCount int, healthyIdx []int, replicas int) {
t.Helper()
defer func() {
rwctxsGlobal = nil
rwctxsGlobalIdx = nil
rwctxConsistentHashGlobal = nil
}()
rwctxsGlobal = make([]*remoteWriteCtx, remoteWriteCount)
rwctxsGlobalIdx = make([]int, remoteWriteCount)
rwctxs := make([]*remoteWriteCtx, 0, len(healthyIdx))
for i := range remoteWriteCount {
rwCtx := &remoteWriteCtx{
idx: i,
}
rwctxsGlobalIdx[i] = i
if i >= len(healthyIdx) {
rwctxsGlobal[i] = rwCtx
continue
}
hIdx := healthyIdx[i]
if hIdx != i {
rwctxs = append(rwctxs, &remoteWriteCtx{
idx: hIdx,
})
} else {
rwctxs = append(rwctxs, rwCtx)
}
rwctxsGlobal[i] = rwCtx
}
seriesCount := 100000
// build 1000000 series
tssBlock := make([]prompbmarshal.TimeSeries, 0, seriesCount)
for i := 0; i < seriesCount; i++ {
tssBlock = append(tssBlock, prompbmarshal.TimeSeries{
Labels: []prompbmarshal.Label{
{
Name: "label",
Value: strconv.Itoa(i),
},
},
Samples: []prompbmarshal.Sample{
{
Timestamp: 0,
Value: 0,
},
},
})
}
// build consistent hash for x remote write context
// build active time series set
nodes := make([]string, 0, remoteWriteCount)
activeTimeSeriesByNodes := make([]map[string]struct{}, remoteWriteCount)
for i := 0; i < remoteWriteCount; i++ {
nodes = append(nodes, fmt.Sprintf("node%d", i))
activeTimeSeriesByNodes[i] = make(map[string]struct{})
}
rwctxConsistentHashGlobal = consistenthash.NewConsistentHash(nodes, 0)
// create shards
x := getTSSShards(len(rwctxs))
shards := x.shards
// execute
shardAmountRemoteWriteCtx(tssBlock, shards, rwctxs, replicas)
for i, nodeIdx := range healthyIdx {
for _, ts := range shards[i] {
// add it to node[nodeIdx]'s active time series
activeTimeSeriesByNodes[nodeIdx][prompbmarshal.LabelsToString(ts.Labels)] = struct{}{}
}
}
totalActiveTimeSeries := 0
for _, activeTimeSeries := range activeTimeSeriesByNodes {
totalActiveTimeSeries += len(activeTimeSeries)
}
avgActiveTimeSeries1 := totalActiveTimeSeries / remoteWriteCount
putTSSShards(x)
// removed last node
rwctxs = rwctxs[:len(rwctxs)-1]
healthyIdx = healthyIdx[:len(healthyIdx)-1]
x = getTSSShards(len(rwctxs))
shards = x.shards
// execute
shardAmountRemoteWriteCtx(tssBlock, shards, rwctxs, replicas)
for i, nodeIdx := range healthyIdx {
for _, ts := range shards[i] {
// add it to node[nodeIdx]'s active time series
activeTimeSeriesByNodes[nodeIdx][prompbmarshal.LabelsToString(ts.Labels)] = struct{}{}
}
}
totalActiveTimeSeries = 0
for _, activeTimeSeries := range activeTimeSeriesByNodes {
totalActiveTimeSeries += len(activeTimeSeries)
}
avgActiveTimeSeries2 := totalActiveTimeSeries / remoteWriteCount
changed := math.Abs(float64(avgActiveTimeSeries2-avgActiveTimeSeries1) / float64(avgActiveTimeSeries1))
threshold := 3 / float64(remoteWriteCount)
if changed >= threshold {
t.Fatalf("average active time series before: %d, after: %d, changed: %.2f. threshold: %.2f", avgActiveTimeSeries1, avgActiveTimeSeries2, changed, threshold)
}
}
f(5, []int{0, 1, 2, 3, 4}, 1)
f(5, []int{0, 1, 2, 3, 4}, 2)
f(10, []int{0, 1, 2, 3, 4, 5, 6, 7, 9}, 1)
f(10, []int{0, 1, 2, 3, 4, 5, 6, 7, 9}, 3)
}
func TestCalculateHealthyRwctxIdx(t *testing.T) {
f := func(total int, healthyIdx []int, unhealthyIdx []int) {
t.Helper()
healthyMap := make(map[int]bool)
for _, idx := range healthyIdx {
healthyMap[idx] = true
}
rwctxsGlobal = make([]*remoteWriteCtx, total)
rwctxsGlobalIdx = make([]int, total)
rwctxs := make([]*remoteWriteCtx, 0, len(healthyIdx))
for i := range rwctxsGlobal {
rwctx := &remoteWriteCtx{idx: i}
rwctxsGlobal[i] = rwctx
if healthyMap[i] {
rwctxs = append(rwctxs, rwctx)
}
rwctxsGlobalIdx[i] = i
}
gotHealthyIdx, gotUnhealthyIdx := calculateHealthyRwctxIdx(rwctxs)
if !reflect.DeepEqual(healthyIdx, gotHealthyIdx) {
t.Errorf("calculateHealthyRwctxIdx want healthyIdx = %v, got %v", healthyIdx, gotHealthyIdx)
}
if !reflect.DeepEqual(unhealthyIdx, gotUnhealthyIdx) {
t.Errorf("calculateHealthyRwctxIdx want unhealthyIdx = %v, got %v", unhealthyIdx, gotUnhealthyIdx)
}
}
f(5, []int{0, 1, 2, 3, 4}, nil)
f(5, []int{0, 1, 2, 4}, []int{3})
f(5, []int{2, 4}, []int{0, 1, 3})
f(5, []int{0, 2, 4}, []int{1, 3})
f(5, []int{}, []int{0, 1, 2, 3, 4})
f(5, []int{4}, []int{0, 1, 2, 3})
f(1, []int{0}, nil)
f(1, []int{}, []int{0})
}

View File

@@ -1,3 +1,3 @@
See vmalert-tool docs [here](https://docs.victoriametrics.com/vmalert-tool.html).
See vmalert-tool docs [here](https://docs.victoriametrics.com/victoriametrics/vmalert-tool/).
vmalert-tool docs can be edited at [docs/vmalert-tool.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmalert-tool.md).
vmalert-tool docs can be edited at [docs/vmalert-tool.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmalert-tool.md).

View File

@@ -0,0 +1,8 @@
ARG base_image=non-existing
FROM $base_image
EXPOSE 8880
ENTRYPOINT ["/vmalert-tool-prod"]
ARG src_binary=non-existing
COPY $src_binary ./vmalert-tool-prod

View File

@@ -1,3 +1,3 @@
See vmalert docs [here](https://docs.victoriametrics.com/vmalert/).
See vmalert docs [here](https://docs.victoriametrics.com/victoriametrics/vmalert/).
vmalert docs can be edited at [docs/vmalert.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmalert.md).
vmalert docs can be edited at [docs/vmalert.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmalert.md).

View File

@@ -2,7 +2,6 @@ package config
import (
"bytes"
"crypto/md5"
"flag"
"fmt"
"hash/fnv"
@@ -11,11 +10,12 @@ import (
"sort"
"strings"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/log"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
"gopkg.in/yaml.v2"
)
var (
@@ -67,7 +67,7 @@ func (g *Group) UnmarshalYAML(unmarshal func(any) error) error {
if g.Type.Get() == "" {
g.Type = NewRawType(*defaultRuleType)
}
h := md5.New()
h := fnv.New64a()
h.Write(b)
g.Checksum = fmt.Sprintf("%x", h.Sum(nil))
return nil

View File

@@ -35,6 +35,8 @@ type promResponse struct {
Stats struct {
SeriesFetched *string `json:"seriesFetched,omitempty"`
} `json:"stats,omitempty"`
// IsPartial supported by VictoriaMetrics
IsPartial *bool `json:"isPartial,omitempty"`
}
// see https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries
@@ -209,7 +211,7 @@ func parsePrometheusResponse(req *http.Request, resp *http.Response) (res Result
if err != nil {
return res, err
}
res = Result{Data: ms}
res = Result{Data: ms, IsPartial: r.IsPartial}
if r.Stats.SeriesFetched != nil {
intV, err := strconv.Atoi(*r.Stats.SeriesFetched)
if err != nil {

View File

@@ -72,12 +72,14 @@ func TestVMInstantQuery(t *testing.T) {
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
case 7:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]},"stats":{"seriesFetched": "42"}}`))
case 8:
w.Write([]byte(`{"status":"success", "isPartial":true, "data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
}
})
mux.HandleFunc("/render", func(w http.ResponseWriter, _ *http.Request) {
c++
switch c {
case 8:
case 9:
w.Write([]byte(`[{"target":"constantLine(10)","tags":{"name":"constantLine(10)"},"datapoints":[[10,1611758343],[10,1611758373],[10,1611758403]]}]`))
}
})
@@ -100,9 +102,9 @@ func TestVMInstantQuery(t *testing.T) {
t.Fatalf("failed to parse 'time' query param %q: %s", timeParam, err)
}
switch c {
case 9:
w.Write([]byte("[]"))
case 10:
w.Write([]byte("[]"))
case 11:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"total","foo":"bar"},"value":[1583786142,"13763"]},{"metric":{"__name__":"total","foo":"baz"},"value":[1583786140,"2000"]}]}}`))
}
})
@@ -203,10 +205,18 @@ func TestVMInstantQuery(t *testing.T) {
*res.SeriesFetched)
}
res, _, err = pq.Query(ctx, vmQuery, ts) // 8
if err != nil {
t.Fatalf("unexpected %s", err)
}
if res.IsPartial != nil && !*res.IsPartial {
t.Fatalf("unexpected metric isPartial want %+v", true)
}
// test graphite
gq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourceGraphite)})
res, _, err = gq.Query(ctx, queryRender, ts) // 8 - graphite
res, _, err = gq.Query(ctx, queryRender, ts) // 9 - graphite
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -226,9 +236,9 @@ func TestVMInstantQuery(t *testing.T) {
vlogs := datasourceVLogs
pq = s.BuildWithParams(QuerierParams{DataSourceType: string(vlogs), EvaluationInterval: 15 * time.Second})
expErr(vlogsQuery, "error parsing response") // 9
expErr(vlogsQuery, "error parsing response") // 10
res, _, err = pq.Query(ctx, vlogsQuery, ts) // 10
res, _, err = pq.Query(ctx, vlogsQuery, ts) // 11
if err != nil {
t.Fatalf("unexpected %s", err)
}

View File

@@ -34,6 +34,9 @@ type Result struct {
// If nil, then this feature is not supported by the datasource.
// SeriesFetched is supported by VictoriaMetrics since v1.90.
SeriesFetched *int
// IsPartial is used by VictoriaMetrics to indicate
// whether response data is partial.
IsPartial *bool
}
// QuerierBuilder builds Querier with given params.

View File

@@ -10,8 +10,9 @@ import (
// FakeQuerier is a mock querier that return predefined results and error message
type FakeQuerier struct {
sync.Mutex
metrics []Metric
err error
metrics []Metric
err error
isPartial *bool
}
// SetErr sets query error message
@@ -21,11 +22,19 @@ func (fq *FakeQuerier) SetErr(err error) {
fq.Unlock()
}
// SetPartialResponse marks query response as partial
func (fq *FakeQuerier) SetPartialResponse(partial bool) {
fq.Lock()
fq.isPartial = &partial
fq.Unlock()
}
// Reset reset querier's error message and results
func (fq *FakeQuerier) Reset() {
fq.Lock()
fq.err = nil
fq.metrics = fq.metrics[:0]
fq.isPartial = nil
fq.Unlock()
}
@@ -57,7 +66,7 @@ func (fq *FakeQuerier) Query(_ context.Context, _ string, _ time.Time) (Result,
cp := make([]Metric, len(fq.metrics))
copy(cp, fq.metrics)
req, _ := http.NewRequest(http.MethodPost, "foo.com", nil)
return Result{Data: cp}, req, nil
return Result{Data: cp, IsPartial: fq.isPartial}, req, nil
}
// FakeQuerierWithRegistry can store different results for different query expr

View File

@@ -11,7 +11,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
@@ -83,11 +82,10 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
if err := httputil.CheckURL(*addr); err != nil {
return nil, fmt.Errorf("invalid -datasource.url: %w", err)
}
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify, "vmalert_datasource")
if err != nil {
return nil, fmt.Errorf("failed to create transport for -datasource.url=%q: %w", *addr, err)
}
tr.DialContext = netutil.NewStatDialFunc("vmalert_datasource")
tr.DisableKeepAlives = *disableKeepAlive
tr.MaxIdleConnsPerHost = *maxIdleConnections
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {

View File

@@ -161,7 +161,7 @@ func NewAlertManager(alertManagerURL string, fn AlertURLGenerator, authCfg proma
if authCfg.TLSConfig != nil {
tls = authCfg.TLSConfig
}
tr, err := promauth.NewTLSTransport(tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify)
tr, err := promauth.NewTLSTransport(tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify, "vmalert_notifier")
if err != nil {
return nil, fmt.Errorf("failed to create transport for alertmanager URL=%q: %w", alertManagerURL, err)
@@ -198,8 +198,10 @@ func NewAlertManager(alertManagerURL string, fn AlertURLGenerator, authCfg proma
argFunc: fn,
authCfg: aCfg,
relabelConfigs: relabelCfg,
client: &http.Client{Transport: tr},
timeout: timeout,
metrics: newNotifierMetrics(alertManagerURL),
client: &http.Client{
Transport: tr,
},
timeout: timeout,
metrics: newNotifierMetrics(alertManagerURL),
}, nil
}

View File

@@ -1,8 +1,8 @@
package notifier
import (
"crypto/md5"
"fmt"
"hash/fnv"
"net/url"
"os"
"path"
@@ -99,7 +99,7 @@ func (cfg *Config) UnmarshalYAML(unmarshal func(any) error) error {
if err != nil {
return fmt.Errorf("failed to marshal configuration for checksum: %w", err)
}
h := md5.New()
h := fnv.New64a()
h.Write(b)
cfg.Checksum = fmt.Sprintf("%x", h.Sum(nil))
return nil

View File

@@ -10,7 +10,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
@@ -70,12 +69,11 @@ func Init() (datasource.QuerierBuilder, error) {
if err := httputil.CheckURL(*addr); err != nil {
return nil, fmt.Errorf("invalid -remoteRead.url: %w", err)
}
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify, "vmalert_remoteread")
if err != nil {
return nil, fmt.Errorf("failed to create transport for -remoteRead.url=%q: %w", *addr, err)
}
tr.IdleConnTimeout = *idleConnectionTimeout
tr.DialContext = netutil.NewStatDialFunc("vmalert_remoteread")
endpointParams, err := flagutil.ParseJSONMap(*oauth2EndpointParams)
if err != nil {

View File

@@ -93,7 +93,7 @@ func NewClient(ctx context.Context, cfg Config) (*Client, error) {
cfg.FlushInterval = defaultFlushInterval
}
if cfg.Transport == nil {
cfg.Transport = httputil.NewTransport(false)
cfg.Transport = httputil.NewTransport(false, "vmalert_remotewrite")
}
cc := defaultConcurrency
if cfg.Concurrency > 0 {
@@ -146,8 +146,13 @@ func (c *Client) Close() error {
return fmt.Errorf("client is already closed")
}
close(c.input)
start := time.Now()
logger.Infof("shutting down remote write client: flushing remained series")
close(c.doneCh)
c.wg.Wait()
logger.Infof("shutting down remote write client: finished in %v", time.Since(start))
return nil
}
@@ -156,21 +161,16 @@ func (c *Client) run(ctx context.Context) {
wr := &prompbmarshal.WriteRequest{}
shutdown := func() {
lastCtx, cancel := context.WithTimeout(context.Background(), defaultWriteTimeout)
logger.Infof("shutting down remote write client and flushing remained series")
shutdownFlushCnt := 0
for ts := range c.input {
wr.Timeseries = append(wr.Timeseries, ts)
if len(wr.Timeseries) >= c.maxBatchSize {
shutdownFlushCnt += len(wr.Timeseries)
c.flush(lastCtx, wr)
}
}
// flush the last batch. `flush` will re-check and avoid flushing empty batch.
shutdownFlushCnt += len(wr.Timeseries)
c.flush(lastCtx, wr)
logger.Infof("shutting down remote write client flushed %d series", shutdownFlushCnt)
cancel()
}
c.wg.Add(1)

View File

@@ -33,7 +33,7 @@ func NewDebugClient() (*DebugClient, error) {
if err := httputil.CheckURL(*addr); err != nil {
return nil, fmt.Errorf("invalid -remoteWrite.url: %w", err)
}
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify, "vmalert_remotewrite_debug")
if err != nil {
return nil, fmt.Errorf("failed to create transport for -remoteWrite.url=%q: %w", *addr, err)
}

View File

@@ -9,7 +9,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
@@ -73,12 +72,11 @@ func Init(ctx context.Context) (*Client, error) {
if err := httputil.CheckURL(*addr); err != nil {
return nil, fmt.Errorf("invalid -remoteWrite.url: %w", err)
}
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(*tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify, "vmalert_remotewrite")
if err != nil {
return nil, fmt.Errorf("failed to create transport for -remoteWrite.url=%q: %w", *addr, err)
}
tr.IdleConnTimeout = *idleConnectionTimeout
tr.DialContext = netutil.NewStatDialFunc("vmalert_remotewrite")
endpointParams, err := flagutil.ParseJSONMap(*oauth2EndpointParams)
if err != nil {

View File

@@ -207,8 +207,8 @@ func (ar *AlertingRule) logDebugf(at time.Time, a *notifier.Alert, format string
if !ar.Debug {
return
}
prefix := fmt.Sprintf("DEBUG rule %q:%q (%d) at %v: ",
ar.GroupName, ar.Name, ar.RuleID, at.Format(time.RFC3339))
prefix := fmt.Sprintf("DEBUG alerting rule %q, %q:%q (%d) at %v: ",
ar.File, ar.GroupName, ar.Name, ar.RuleID, at.Format(time.RFC3339))
if a != nil {
labelKeys := make([]string, len(a.Labels))
@@ -408,8 +408,8 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", ar.Expr, err)
}
ar.logDebugf(ts, nil, "query returned %d samples (elapsed: %s)", curState.Samples, curState.Duration)
ar.logDebugf(ts, nil, "query returned %d samples (elapsed: %s, isPartial: %t)", curState.Samples, curState.Duration, isPartialResponse(res))
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res.Data, err

View File

@@ -22,6 +22,32 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
func TestNewAlertingRule(t *testing.T) {
f := func(group *Group, rule config.Rule, expectRule *AlertingRule) {
t.Helper()
r := NewAlertingRule(&datasource.FakeQuerier{}, group, rule)
if err := CompareRules(t, expectRule, r); err != nil {
t.Fatalf("unexpected rule mismatch: %s", err)
}
}
f(&Group{Name: "foo"},
config.Rule{
Alert: "health",
Expr: "up == 0",
Labels: map[string]string{
"foo": "bar",
},
}, &AlertingRule{
Name: "health",
Expr: "up == 0",
Labels: map[string]string{
"foo": "bar",
},
})
}
func TestAlertingRuleToTimeSeries(t *testing.T) {
timestamp := time.Now()
@@ -1322,3 +1348,23 @@ func TestAlertingRule_ToLabels(t *testing.T) {
t.Fatalf("processed labels mismatch, got: %v, want: %v", ls.processed, expectedProcessedLabels)
}
}
func TestAlertingRuleExec_Partial(t *testing.T) {
fq := &datasource.FakeQuerier{}
fq.Add(metricWithValueAndLabels(t, 10, "__name__", "bar"))
fq.SetPartialResponse(true)
ar := newTestAlertingRule("test", 0)
ar.Debug = true
ar.Labels = map[string]string{"job": "test"}
ar.q = fq
ar.For = time.Second
fq.Add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "bar"))
ts := time.Now()
_, err := ar.exec(context.TODO(), ts, 0)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
}

View File

@@ -471,6 +471,7 @@ func (g *Group) DeepCopy() *Group {
newG := Group{}
_ = json.Unmarshal(data, &newG)
newG.Rules = g.Rules
newG.id = g.id
return &newG
}

View File

@@ -36,144 +36,178 @@ func TestMain(m *testing.M) {
}
func TestUpdateWith(t *testing.T) {
f := func(currentRules, newRules []config.Rule) {
f := func(oldG, newG config.Group) {
t.Helper()
ns := metrics.NewSet()
g := &Group{
Name: "test",
metrics: &groupMetrics{set: ns},
}
qb := &datasource.FakeQuerier{}
for _, r := range currentRules {
r.ID = config.HashRule(r)
g.Rules = append(g.Rules, g.newRule(qb, r))
for i := range oldG.Rules {
oldG.Rules[i].ID = config.HashRule(oldG.Rules[i])
}
for i := range newG.Rules {
newG.Rules[i].ID = config.HashRule(newG.Rules[i])
}
ng := &Group{
Name: "test",
}
for _, r := range newRules {
r.ID = config.HashRule(r)
ng.Rules = append(ng.Rules, ng.newRule(qb, r))
}
g := NewGroup(oldG, qb, 0, nil)
g.metrics = &groupMetrics{set: ns}
expect := NewGroup(newG, qb, 0, nil)
err := g.updateWith(ng)
err := g.updateWith(expect)
if err != nil {
t.Fatalf("cannot update rule: %s", err)
}
if len(g.Rules) != len(newRules) {
t.Fatalf("expected to have %d rules; got: %d", len(g.Rules), len(newRules))
if len(g.Rules) != len(expect.Rules) {
t.Fatalf("expected to have %d rules; got: %d", len(expect.Rules), len(g.Rules))
}
sort.Slice(g.Rules, func(i, j int) bool {
return g.Rules[i].ID() < g.Rules[j].ID()
})
sort.Slice(ng.Rules, func(i, j int) bool {
return ng.Rules[i].ID() < ng.Rules[j].ID()
sort.Slice(expect.Rules, func(i, j int) bool {
return expect.Rules[i].ID() < expect.Rules[j].ID()
})
for i, r := range g.Rules {
got, want := r, ng.Rules[i]
got, want := r, expect.Rules[i]
if got.ID() != want.ID() {
t.Fatalf("expected to have rule %q; got %q", want, got)
}
if err := CompareRules(t, got, want); err != nil {
t.Fatalf("comparison error: %s", err)
t.Fatalf("comparison1 error: %s", err)
}
}
}
// new rule
f(nil, []config.Rule{
{Alert: "bar"},
})
f(config.Group{}, config.Group{
Rules: []config.Rule{
{Alert: "bar"},
}})
// update alerting rule
f([]config.Rule{
{
Alert: "foo",
Expr: "up > 0",
For: promutil.NewDuration(time.Second),
f(config.Group{
Rules: []config.Rule{
{
Alert: "foo",
Expr: "up > 0",
For: promutil.NewDuration(time.Second),
Labels: map[string]string{
"bar": "baz",
},
Annotations: map[string]string{
"summary": "{{ $value|humanize }}",
"description": "{{$labels}}",
},
},
{
Alert: "bar",
Expr: "up > 0",
For: promutil.NewDuration(time.Second),
Labels: map[string]string{
"bar": "baz",
},
},
}}, config.Group{
Rules: []config.Rule{
{
Alert: "foo",
Expr: "up > 10",
For: promutil.NewDuration(time.Second),
Labels: map[string]string{
"baz": "bar",
},
Annotations: map[string]string{
"summary": "none",
},
},
{
Alert: "bar",
Expr: "up > 0",
For: promutil.NewDuration(2 * time.Second),
KeepFiringFor: promutil.NewDuration(time.Minute),
Labels: map[string]string{
"bar": "baz",
},
},
}})
// update recording rule
f(config.Group{
Rules: []config.Rule{{
Record: "foo",
Expr: "max(up)",
Labels: map[string]string{
"bar": "baz",
},
Annotations: map[string]string{
"summary": "{{ $value|humanize }}",
"description": "{{$labels}}",
},
},
{
Alert: "bar",
Expr: "up > 0",
For: promutil.NewDuration(time.Second),
Labels: map[string]string{
"bar": "baz",
},
},
}, []config.Rule{
{
Alert: "foo",
Expr: "up > 10",
For: promutil.NewDuration(time.Second),
}}}, config.Group{
Rules: []config.Rule{{
Record: "foo",
Expr: "min(up)",
Debug: true,
Labels: map[string]string{
"baz": "bar",
},
Annotations: map[string]string{
"summary": "none",
},
},
{
Alert: "bar",
Expr: "up > 0",
For: promutil.NewDuration(2 * time.Second),
KeepFiringFor: promutil.NewDuration(time.Minute),
Labels: map[string]string{
"bar": "baz",
},
},
})
}}})
// update recording rule
f([]config.Rule{{
Record: "foo",
Expr: "max(up)",
Labels: map[string]string{
"bar": "baz",
},
}}, []config.Rule{{
Record: "foo",
Expr: "min(up)",
Labels: map[string]string{
"baz": "bar",
},
}})
// update debug
f(config.Group{
Rules: []config.Rule{
{
Record: "foo",
Expr: "max(up)",
},
{
Alert: "foo",
Expr: "up > 0",
Debug: true,
For: promutil.NewDuration(time.Second),
},
}}, config.Group{
Rules: []config.Rule{
{
Record: "foo",
Expr: "max(up)",
Debug: true,
},
{
Alert: "foo",
Expr: "up > 0",
For: promutil.NewDuration(time.Second),
},
}})
// empty rule
f([]config.Rule{{Alert: "foo"}, {Record: "bar"}}, nil)
f(config.Group{
Rules: []config.Rule{{Alert: "foo"}, {Record: "bar"}}}, config.Group{})
// multiple rules
f([]config.Rule{
{Alert: "bar"},
{Alert: "baz"},
{Alert: "foo"},
}, []config.Rule{
{Alert: "baz"},
{Record: "foo"},
})
f(config.Group{
Rules: []config.Rule{
{Alert: "bar"},
{Alert: "baz"},
{Alert: "foo"},
}}, config.Group{
Rules: []config.Rule{
{Alert: "baz"},
{Record: "foo"},
}})
// replace rule
f([]config.Rule{{Alert: "foo1"}}, []config.Rule{{Alert: "foo2"}})
f(config.Group{
Rules: []config.Rule{{Alert: "foo1"}}}, config.Group{
Rules: []config.Rule{{Alert: "foo2"}}})
// replace multiple rules
f([]config.Rule{
{Alert: "foo1"},
{Record: "foo2"},
{Alert: "foo3"},
}, []config.Rule{
{Alert: "foo3"},
{Alert: "foo4"},
{Record: "foo5"},
})
f(config.Group{
Rules: []config.Rule{
{Alert: "foo1"},
{Record: "foo2"},
{Alert: "foo3"},
}}, config.Group{
Rules: []config.Rule{
{Alert: "foo3"},
{Alert: "foo4"},
{Record: "foo5"},
}})
}
func TestUpdateDuringRandSleep(t *testing.T) {

View File

@@ -30,6 +30,7 @@ type RecordingRule struct {
GroupID uint64
GroupName string
File string
Debug bool
q datasource.Querier
@@ -91,12 +92,14 @@ func NewRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
GroupID: group.GetID(),
GroupName: group.Name,
File: group.File,
Debug: cfg.Debug,
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: group.Type.String(),
ApplyIntervalAsTimeFilter: setIntervalAsTimeFilter(group.Type.String(), cfg.Expr),
EvaluationInterval: group.Interval,
QueryParams: group.Params,
Headers: group.Headers,
Debug: cfg.Debug,
}),
}
@@ -169,6 +172,8 @@ func (rr *RecordingRule) exec(ctx context.Context, ts time.Time, limit int) ([]p
return nil, curState.Err
}
rr.logDebugf(ts, "query returned %d samples (elapsed: %s, isPartial: %t)", curState.Samples, curState.Duration, isPartialResponse(res))
qMetrics := res.Data
numSeries := len(qMetrics)
if limit > 0 && numSeries > limit {
@@ -202,6 +207,17 @@ func (rr *RecordingRule) exec(ctx context.Context, ts time.Time, limit int) ([]p
return tss, nil
}
func (rr *RecordingRule) logDebugf(at time.Time, format string, args ...any) {
if !rr.Debug {
return
}
prefix := fmt.Sprintf("DEBUG recording rule %q, %q:%q (%d) at %v: ",
rr.File, rr.GroupName, rr.Name, rr.RuleID, at.Format(time.RFC3339))
msg := fmt.Sprintf(format, args...)
logger.Infof("%s", prefix+msg)
}
func stringToLabels(s string) []prompbmarshal.Label {
labels := strings.Split(s, ",")
rLabels := make([]prompbmarshal.Label, 0, len(labels))
@@ -262,6 +278,7 @@ func (rr *RecordingRule) updateWith(r Rule) error {
rr.Expr = nr.Expr
rr.Labels = nr.Labels
rr.q = nr.q
rr.Debug = nr.Debug
return nil
}

View File

@@ -9,11 +9,34 @@ import (
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func TestNewRecordingRule(t *testing.T) {
f := func(group *Group, rule config.Rule, expectRule *AlertingRule) {
t.Helper()
r := NewAlertingRule(&datasource.FakeQuerier{}, group, rule)
if err := CompareRules(t, expectRule, r); err != nil {
t.Fatalf("unexpected rule mismatch: %s", err)
}
}
f(&Group{Name: "foo"},
config.Rule{
Alert: "health",
Expr: "up == 0",
Labels: map[string]string{},
}, &AlertingRule{
Name: "health",
Expr: "up == 0",
Labels: map[string]string{},
})
}
func TestRecordingRule_Exec(t *testing.T) {
ts, _ := time.Parse(time.RFC3339, "2024-10-29T00:00:00Z")
const defaultStep = 5 * time.Millisecond
@@ -307,7 +330,8 @@ func TestRecordingRuleLimit_Failure(t *testing.T) {
fq := &datasource.FakeQuerier{}
fq.Add(testMetrics...)
rule := &RecordingRule{Name: "job:foo",
rule := &RecordingRule{
Name: "job:foo",
state: &ruleState{entries: make([]StateEntry, 10)},
Labels: map[string]string{
"source": "test_limit",
@@ -342,7 +366,8 @@ func TestRecordingRuleLimit_Success(t *testing.T) {
fq := &datasource.FakeQuerier{}
fq.Add(testMetrics...)
rule := &RecordingRule{Name: "job:foo",
rule := &RecordingRule{
Name: "job:foo",
state: &ruleState{entries: make([]StateEntry, 10)},
Labels: map[string]string{
"source": "test_limit",
@@ -421,3 +446,36 @@ func TestSetIntervalAsTimeFilter(t *testing.T) {
f(`error AND _time:5m | count()`, "vlogs", false)
f(`* | error AND _time:5m | count()`, "vlogs", false)
}
func TestRecordingRuleExec_Partial(t *testing.T) {
ts, _ := time.Parse(time.RFC3339, "2024-10-29T00:00:00Z")
fq := &datasource.FakeQuerier{}
m := metricWithValueAndLabels(t, 10, "__name__", "bar")
fq.Add(m)
fq.SetPartialResponse(true)
rule := &RecordingRule{
GroupName: "Bar",
Name: "foo",
state: &ruleState{
entries: make([]StateEntry, 10),
},
}
rule.Debug = true
rule.q = fq
got, err := rule.exec(context.TODO(), ts, 0)
want := []prompbmarshal.TimeSeries{
newTimeSeries([]float64{10}, []int64{ts.UnixNano()}, []prompbmarshal.Label{
{
Name: "__name__",
Value: "foo",
},
}),
}
if err != nil {
t.Fatalf("fail to test rule %s: unexpected error: %s", rule.Name, err)
}
if err := compareTimeSeries(t, want, got); err != nil {
t.Fatalf("fail to test rule %s: time series mismatch: %s", rule.Name, err)
}
}

View File

@@ -38,6 +38,9 @@ func compareRecordingRules(t *testing.T, a, b *RecordingRule) error {
if a.Expr != b.Expr {
return fmt.Errorf("expected to have expression %q; got %q", a.Expr, b.Expr)
}
if a.Debug != b.Debug {
return fmt.Errorf("expected to have debug=%t; got %t", a.Debug, b.Debug)
}
if !reflect.DeepEqual(a.Labels, b.Labels) {
return fmt.Errorf("expected to have labels %#v; got %#v", a.Labels, b.Labels)
}
@@ -64,6 +67,9 @@ func compareAlertingRules(t *testing.T, a, b *AlertingRule) error {
if a.Type.String() != b.Type.String() {
return fmt.Errorf("expected to have Type %#v; got %#v", a.Type.String(), b.Type.String())
}
if a.Debug != b.Debug {
return fmt.Errorf("expected to have debug=%t; got %t", a.Debug, b.Debug)
}
return nil
}

View File

@@ -105,3 +105,10 @@ func isSecreteHeader(str string) bool {
}
return false
}
func isPartialResponse(res datasource.Result) bool {
if res.IsPartial != nil && *res.IsPartial {
return true
}
return false
}

View File

@@ -1,3 +1,3 @@
See vmauth docs [here](https://docs.victoriametrics.com/vmauth/).
See vmauth docs [here](https://docs.victoriametrics.com/victoriametrics/vmauth/).
vmauth docs can be edited at [docs/vmauth.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmauth.md).
vmauth docs can be edited at [docs/vmauth.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmauth.md).

View File

@@ -319,7 +319,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
// Timed out request must be counted as errors, since this usually means that the backend is slow.
ui.backendErrors.Inc()
}
return true, false
return false, false
}
if !rtbOK || !rtb.canRetry() {
// Request body cannot be re-sent to another backend. Return the error to the client then.
@@ -508,7 +508,7 @@ func newRoundTripper(caFileOpt, certFileOpt, keyFileOpt, serverNameOpt string, i
return nil, fmt.Errorf("cannot initialize promauth.Config: %w", err)
}
tr := httputil.NewTransport(false)
tr := httputil.NewTransport(false, "vmauth_backend")
tr.ResponseHeaderTimeout = *responseTimeout
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
@@ -517,7 +517,6 @@ func newRoundTripper(caFileOpt, certFileOpt, keyFileOpt, serverNameOpt string, i
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
tr.DialContext = netutil.NewStatDialFunc("vmauth_backend")
rt := cfg.NewRoundTripper(tr)
return rt, nil

View File

@@ -1,3 +1,3 @@
See vmbackup docs [here](https://docs.victoriametrics.com/vmbackup/).
See vmbackup docs [here](https://docs.victoriametrics.com/victoriametrics/vmbackup/).
vmbackup docs can be edited at [docs/vmbackup.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmbackup.md).
vmbackup docs can be edited at [docs/vmbackup.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmbackup.md).

View File

@@ -1,3 +1,3 @@
See vmbackupmanager docs [here](https://docs.victoriametrics.com/vmbackupmanager/).
See vmbackupmanager docs [here](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/).
vmbackupmanager docs can be edited at [docs/vmbackupmanager.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmbackupmanager.md).
vmbackupmanager docs can be edited at [docs/vmbackupmanager.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmbackupmanager.md).

View File

@@ -1,3 +1,3 @@
See vmctl docs [here](https://docs.victoriametrics.com/vmctl/).
See vmctl docs [here](https://docs.victoriametrics.com/victoriametrics/vmctl/).
vmctl docs can be edited at [docs/vmctl.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmctl.md).
vmctl docs can be edited at [docs/vmctl.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmctl.md).

View File

@@ -70,7 +70,7 @@ func main() {
return fmt.Errorf("invalid -%s: %w", otsdbAddr, err)
}
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify)
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify, "vmctl_opentsdb")
if err != nil {
return fmt.Errorf("failed to create transport for -%s=%q: %s", otsdbAddr, addr, err)
}
@@ -185,7 +185,7 @@ func main() {
serverName := c.String(remoteReadServerName)
insecureSkipVerify := c.Bool(remoteReadInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify)
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify, "vmctl_remoteread")
if err != nil {
return fmt.Errorf("failed to create transport for -%s=%q: %s", remoteReadSrcAddr, addr, err)
}
@@ -315,7 +315,7 @@ func main() {
return fmt.Errorf("failed to create TLS Config: %s", err)
}
trSrc := httputil.NewTransport(false)
trSrc := httputil.NewTransport(false, "vmctl_src")
trSrc.DisableKeepAlives = disableKeepAlive
trSrc.TLSClientConfig = srcTC
@@ -345,7 +345,7 @@ func main() {
return fmt.Errorf("failed to create TLS Config: %s", err)
}
trDst := httputil.NewTransport(false)
trDst := httputil.NewTransport(false, "vmctl_dst")
trDst.DisableKeepAlives = disableKeepAlive
trDst.TLSClientConfig = dstTC
@@ -455,7 +455,7 @@ func initConfigVM(c *cli.Context) (vm.Config, error) {
serverName := c.String(vmServerName)
insecureSkipVerify := c.Bool(vmInsecureSkipVerify)
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify)
tr, err := promauth.NewTLSTransport(certFile, keyFile, caFile, serverName, insecureSkipVerify, "vmctl_client")
if err != nil {
return vm.Config{}, fmt.Errorf("failed to create transport for -%s=%q: %s", vmAddr, addr, err)
}

View File

@@ -43,60 +43,12 @@ func (pp *prometheusProcessor) run() error {
return nil
}
bar := barpool.AddWithTemplate(fmt.Sprintf(barTpl, "Processing blocks"), len(blocks))
if err := barpool.Start(); err != nil {
return err
}
defer barpool.Stop()
blockReadersCh := make(chan tsdb.BlockReader)
errCh := make(chan error, pp.cc)
pp.im.ResetStats()
var wg sync.WaitGroup
wg.Add(pp.cc)
for i := 0; i < pp.cc; i++ {
go func() {
defer wg.Done()
for br := range blockReadersCh {
if err := pp.do(br); err != nil {
errCh <- fmt.Errorf("read failed for block %q: %s", br.Meta().ULID, err)
return
}
bar.Increment()
}
}()
}
// any error breaks the import
for _, br := range blocks {
select {
case promErr := <-errCh:
close(blockReadersCh)
return fmt.Errorf("prometheus error: %s", promErr)
case vmErr := <-pp.im.Errors():
close(blockReadersCh)
return fmt.Errorf("import process failed: %s", wrapErr(vmErr, pp.isVerbose))
case blockReadersCh <- br:
}
}
close(blockReadersCh)
wg.Wait()
// wait for all buffers to flush
pp.im.Close()
close(errCh)
// drain import errors channel
for vmErr := range pp.im.Errors() {
if vmErr.Err != nil {
return fmt.Errorf("import process failed: %s", wrapErr(vmErr, pp.isVerbose))
}
}
for err := range errCh {
return fmt.Errorf("import process failed: %s", err)
if err := pp.processBlocks(blocks); err != nil {
return fmt.Errorf("migration failed: %s", err)
}
log.Println("Import finished!")
log.Print(pp.im.Stats())
log.Println(pp.im.Stats())
return nil
}
@@ -156,3 +108,59 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
}
return ss.Err()
}
func (pp *prometheusProcessor) processBlocks(blocks []tsdb.BlockReader) error {
bar := barpool.AddWithTemplate(fmt.Sprintf(barTpl, "Processing blocks"), len(blocks))
if err := barpool.Start(); err != nil {
return err
}
defer barpool.Stop()
blockReadersCh := make(chan tsdb.BlockReader)
errCh := make(chan error, pp.cc)
pp.im.ResetStats()
var wg sync.WaitGroup
wg.Add(pp.cc)
for i := 0; i < pp.cc; i++ {
go func() {
defer wg.Done()
for br := range blockReadersCh {
if err := pp.do(br); err != nil {
errCh <- fmt.Errorf("read failed for block %q: %s", br.Meta().ULID, err)
return
}
bar.Increment()
}
}()
}
// any error breaks the import
for _, br := range blocks {
select {
case promErr := <-errCh:
close(blockReadersCh)
return fmt.Errorf("prometheus error: %s", promErr)
case vmErr := <-pp.im.Errors():
close(blockReadersCh)
return fmt.Errorf("import process failed: %s", wrapErr(vmErr, pp.isVerbose))
case blockReadersCh <- br:
}
}
close(blockReadersCh)
wg.Wait()
// wait for all buffers to flush
pp.im.Close()
close(errCh)
// drain import errors channel
for vmErr := range pp.im.Errors() {
if vmErr.Err != nil {
return fmt.Errorf("import process failed: %s", wrapErr(vmErr, pp.isVerbose))
}
}
for err := range errCh {
return fmt.Errorf("import process failed: %s", err)
}
return nil
}

View File

@@ -2,167 +2,214 @@ package main
import (
"context"
"fmt"
"log"
"os"
"testing"
"time"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/chunkenc"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/barpool"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
remote_read_integration "github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/testdata/servers_integration_test"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
)
// If you want to run this test:
// 1. provide test snapshot path in const testSnapshot
// 2. define httpAddr const with your victoriametrics address
// 3. run victoria metrics with defined address
// 4. remove t.Skip() from Test_prometheusProcessor_run
// 5. run tests one by one not all at one time
const (
httpAddr = "http://127.0.0.1:8428/"
testSnapshot = "./testdata/20220427T130947Z-70ba49b1093fd0bf"
testSnapshot = "./testdata/snapshots/20250118T124506Z-59d1b952d7eaf547"
blockData = "./testdata/snapshots/20250118T124506Z-59d1b952d7eaf547/01JHWQ445Y2P1TDYB05AEKD6MC"
)
// This test simulates close process if user abort it
func TestPrometheusProcessorRun(t *testing.T) {
t.Skip()
defer func() { isSilent = false }()
f := func(startStr, endStr string, numOfSeries int, resultExpected []vm.TimeSeries) {
t.Helper()
type fields struct {
cfg prometheus.Config
vmCfg vm.Config
cl func(prometheus.Config) *prometheus.Client
im func(vm.Config) *vm.Importer
closer func(importer *vm.Importer)
cc int
}
type args struct {
silent bool
verbose bool
}
dst := remote_read_integration.NewRemoteWriteServer(t)
tests := []struct {
name string
fields fields
args args
wantErr bool
}{
{
name: "simulate syscall.SIGINT",
fields: fields{
cfg: prometheus.Config{
Snapshot: testSnapshot,
Filter: prometheus.Filter{},
},
cl: func(cfg prometheus.Config) *prometheus.Client {
client, err := prometheus.NewClient(cfg)
if err != nil {
t.Fatalf("error init prometeus client: %s", err)
}
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
return importer
},
closer: func(importer *vm.Importer) {
// simulate syscall.SIGINT
time.Sleep(time.Second * 5)
if importer != nil {
importer.Close()
}
},
vmCfg: vm.Config{Addr: httpAddr, Concurrency: 1},
cc: 2,
},
args: args{
silent: false,
verbose: false,
},
wantErr: true,
},
{
name: "simulate correct work",
fields: fields{
cfg: prometheus.Config{
Snapshot: testSnapshot,
Filter: prometheus.Filter{},
},
cl: func(cfg prometheus.Config) *prometheus.Client {
client, err := prometheus.NewClient(cfg)
if err != nil {
t.Fatalf("error init prometeus client: %s", err)
}
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
return importer
},
closer: nil,
vmCfg: vm.Config{Addr: httpAddr, Concurrency: 5},
cc: 2,
},
args: args{
silent: true,
verbose: false,
},
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
client := tt.fields.cl(tt.fields.cfg)
importer := tt.fields.im(tt.fields.vmCfg)
isSilent = tt.args.silent
pp := &prometheusProcessor{
cl: client,
im: importer,
cc: tt.fields.cc,
isVerbose: tt.args.verbose,
}
defer func() {
dst.Close()
}()
// we should answer on prompt
if !tt.args.silent {
input := []byte("Y\n")
dst.Series(resultExpected)
dst.ExpectedSeries(resultExpected)
r, w, err := os.Pipe()
if err != nil {
t.Fatal(err)
}
if err := fillStorage(resultExpected); err != nil {
t.Fatalf("cannot fill storage: %s", err)
}
_, err = w.Write(input)
if err != nil {
t.Fatalf("cannot send 'Y' to importer: %s", err)
}
err = w.Close()
if err != nil {
t.Fatalf("cannot close writer: %s", err)
}
isSilent = true
defer func() { isSilent = false }()
stdin := os.Stdin
// Restore stdin right after the test.
defer func() {
os.Stdin = stdin
_ = r.Close()
}()
os.Stdin = r
}
bf, err := backoff.New(1, 1.8, time.Second*2)
if err != nil {
t.Fatalf("cannot create backoff: %s", err)
}
// simulate close if needed
if tt.fields.closer != nil {
go tt.fields.closer(importer)
}
importerCfg := vm.Config{
Addr: dst.URL(),
Transport: nil,
Concurrency: 1,
Backoff: bf,
}
if err := pp.run(); (err != nil) != tt.wantErr {
t.Fatalf("run() error = %v, wantErr %v", err, tt.wantErr)
}
ctx := context.Background()
importer, err := vm.NewImporter(ctx, importerCfg)
if err != nil {
t.Fatalf("cannot create importer: %s", err)
}
defer importer.Close()
matchName := "__name__"
matchValue := ".*"
filter := prometheus.Filter{
TimeMin: startStr,
TimeMax: endStr,
Label: matchName,
LabelValue: matchValue,
}
runnner, err := prometheus.NewClient(prometheus.Config{
Snapshot: testSnapshot,
Filter: filter,
})
if err != nil {
t.Fatalf("cannot create prometheus client: %s", err)
}
p := &prometheusProcessor{
cl: runnner,
im: importer,
cc: 1,
}
if err := p.run(); err != nil {
t.Fatalf("run() error: %s", err)
}
collectedTs := dst.GetCollectedTimeSeries()
t.Logf("collected timeseries: %d; expected timeseries: %d", len(collectedTs), len(resultExpected))
if len(collectedTs) != len(resultExpected) {
t.Fatalf("unexpected number of collected time series; got %d; want %d", len(collectedTs), numOfSeries)
}
deleted, err := deleteSeries(matchName, matchValue)
if err != nil {
t.Fatalf("cannot delete series: %s", err)
}
if deleted != numOfSeries {
t.Fatalf("unexpected number of deleted series; got %d; want %d", deleted, numOfSeries)
}
}
processFlags()
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
defer func() {
vmstorage.Stop()
if err := os.RemoveAll(storagePath); err != nil {
log.Fatalf("cannot remove %q: %s", storagePath, err)
}
}()
barpool.Disable(true)
defer func() {
barpool.Disable(false)
}()
b, err := tsdb.OpenBlock(nil, blockData, nil, nil)
if err != nil {
t.Fatalf("cannot open block: %s", err)
}
// timestamp is equal to minTime and maxTime from meta.json
ss, err := readBlock(b, 1737204082361, 1737204302539)
if err != nil {
t.Fatalf("cannot read block: %s", err)
}
resultExpected, err := prepareExpectedData(ss)
if err != nil {
t.Fatalf("cannot prepare expected data: %s", err)
}
f("2025-01-18T12:40:00Z", "2025-01-18T12:46:00Z", 2792, resultExpected)
}
func readBlock(b tsdb.BlockReader, timeMin int64, timeMax int64) (storage.SeriesSet, error) {
minTime, maxTime := b.Meta().MinTime, b.Meta().MaxTime
if timeMin != 0 {
minTime = timeMin
}
if timeMax != 0 {
maxTime = timeMax
}
q, err := tsdb.NewBlockQuerier(b, minTime, maxTime)
if err != nil {
return nil, err
}
matchName := "__name__"
matchValue := ".*"
ctx := context.Background()
ss := q.Select(ctx, false, nil, labels.MustNewMatcher(labels.MatchRegexp, matchName, matchValue))
return ss, nil
}
func prepareExpectedData(ss storage.SeriesSet) ([]vm.TimeSeries, error) {
var expectedSeriesSet []vm.TimeSeries
var it chunkenc.Iterator
for ss.Next() {
var name string
var labelPairs []vm.LabelPair
series := ss.At()
for _, label := range series.Labels() {
if label.Name == "__name__" {
name = label.Value
continue
}
labelPairs = append(labelPairs, vm.LabelPair{
Name: label.Name,
Value: label.Value,
})
}
if name == "" {
return nil, fmt.Errorf("failed to find `__name__` label in labelset for block")
}
var timestamps []int64
var values []float64
it = series.Iterator(it)
for {
typ := it.Next()
if typ == chunkenc.ValNone {
break
}
if typ != chunkenc.ValFloat {
// Skip unsupported values
continue
}
t, v := it.At()
timestamps = append(timestamps, t)
values = append(values, v)
}
if err := it.Err(); err != nil {
return nil, err
}
ts := vm.TimeSeries{
Name: name,
LabelPairs: labelPairs,
Timestamps: timestamps,
Values: values,
}
expectedSeriesSet = append(expectedSeriesSet, ts)
}
return expectedSeriesSet, nil
}

View File

@@ -73,7 +73,7 @@ func TestRemoteRead(t *testing.T) {
vmCfg: vm.Config{
Addr: "",
Concurrency: 1,
Transport: httputil.NewTransport(false),
Transport: httputil.NewTransport(false, "vmctl_test_read"),
},
start: "2022-09-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",

View File

@@ -41,6 +41,7 @@ type RemoteWriteServer struct {
server *httptest.Server
series []vm.TimeSeries
expectedSeries []vm.TimeSeries
tss []vm.TimeSeries
}
// NewRemoteWriteServer prepares test remote write server
@@ -73,6 +74,10 @@ func (rws *RemoteWriteServer) ExpectedSeries(series []vm.TimeSeries) {
rws.expectedSeries = append(rws.expectedSeries, series...)
}
func (rws *RemoteWriteServer) GetCollectedTimeSeries() []vm.TimeSeries {
return rws.tss
}
// URL returns server url
func (rws *RemoteWriteServer) URL() string {
return rws.server.URL
@@ -80,7 +85,6 @@ func (rws *RemoteWriteServer) URL() string {
func (rws *RemoteWriteServer) getWriteHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var tss []vm.TimeSeries
scanner := bufio.NewScanner(r.Body)
var rows parser.Rows
for scanner.Scan() {
@@ -102,17 +106,11 @@ func (rws *RemoteWriteServer) getWriteHandler(t *testing.T) http.Handler {
ts.Timestamps = append(ts.Timestamps, row.Timestamps...)
ts.Name = nameValue
ts.LabelPairs = labelPairs
tss = append(tss, ts)
rws.tss = append(rws.tss, ts)
}
rows.Reset()
}
if !reflect.DeepEqual(tss, rws.expectedSeries) {
w.WriteHeader(http.StatusInternalServerError)
t.Fatalf("datasets not equal, expected: %#v; \n got: %#v", rws.expectedSeries, tss)
return
}
w.WriteHeader(http.StatusNoContent)
return
})

View File

@@ -0,0 +1,17 @@
{
"ulid": "01JHWQ445Y2P1TDYB05AEKD6MC",
"minTime": 1737204082361,
"maxTime": 1737204302539,
"stats": {
"numSamples": 60275,
"numSeries": 2792,
"numChunks": 2792
},
"compaction": {
"level": 1,
"sources": [
"01JHWQ445Y2P1TDYB05AEKD6MC"
]
},
"version": 1
}

View File

@@ -23,8 +23,9 @@ import (
)
const (
storagePath = "TestStorage"
retentionPeriod = "100y"
storagePath = "TestStorage"
retentionPeriod = "100y"
deleteSeriesLimit = 3e3
)
func TestVMNativeProcessorRun(t *testing.T) {
@@ -66,7 +67,7 @@ func TestVMNativeProcessorRun(t *testing.T) {
t.Fatalf("cannot add series to storage: %s", err)
}
tr := httputil.NewTransport(false)
tr := httputil.NewTransport(false, "test_client")
tr.DisableKeepAlives = false
srcClient := &native.Client{
@@ -259,7 +260,7 @@ func deleteSeries(name, value string) (int, error) {
if err := tfs.Add([]byte(name), []byte(value), false, true); err != nil {
return 0, fmt.Errorf("unexpected error in TagFilters.Add: %w", err)
}
return vmstorage.DeleteSeries(nil, []*storage.TagFilters{tfs}, 1e3)
return vmstorage.DeleteSeries(nil, []*storage.TagFilters{tfs}, deleteSeriesLimit)
}
func TestBuildMatchWithFilter_Failure(t *testing.T) {

View File

@@ -1,3 +1,3 @@
See vmgateway docs [here](https://docs.victoriametrics.com/vmgateway/).
See vmgateway docs [here](https://docs.victoriametrics.com/victoriametrics/vmgateway/).
vmgateway docs can be edited at [docs/vmgateway.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmgateway.md).
vmgateway docs can be edited at [docs/vmgateway.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmgateway.md).

View File

@@ -320,8 +320,10 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/prometheus/api/v1/targets", "/api/v1/targets":
promscrapeAPIV1TargetsRequests.Inc()
w.Header().Set("Content-Type", "application/json")
// https://prometheus.io/docs/prometheus/latest/querying/api/#targets
state := r.FormValue("state")
promscrape.WriteAPIV1Targets(w, state)
scrapePool := r.FormValue("scrapePool")
promscrape.WriteAPIV1Targets(w, state, scrapePool)
return true
case "/prometheus/target_response", "/target_response":
promscrapeTargetResponseRequests.Inc()

View File

@@ -1,3 +1,3 @@
See vmrestore docs [here](https://docs.victoriametrics.com/vmrestore/).
See vmrestore docs [here](https://docs.victoriametrics.com/victoriametrics/vmrestore/).
vmrestore docs can be edited at [docs/vmrestore.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmrestore.md).
vmrestore docs can be edited at [docs/vmrestore.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmrestore.md).

View File

@@ -404,6 +404,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/api/v1/status/metric_names_stats":
metricNamesStatsRequests.Inc()
httpserver.EnableCORS(w, r)
if err := stats.MetricNamesStatsHandler(qt, w, r); err != nil {
metricNamesStatsErrors.Inc()
httpserver.Errorf(w, r, "%s", err)

View File

@@ -806,7 +806,6 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
} else {
queryOffset = 0
}
qs := &promql.QueryStats{}
ec := &promql.EvalConfig{
Start: start,
End: start,
@@ -822,9 +821,10 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
GetRequestURI: func() string {
return httpserver.GetRequestURI(r)
},
QueryStats: qs,
}
qs := promql.NewQueryStats(query, nil, ec)
ec.QueryStats = qs
result, err := promql.Exec(qt, ec, query, true)
if err != nil {
return fmt.Errorf("error when executing query=%q for (time=%d, step=%d): %w", query, start, step, err)
@@ -853,6 +853,7 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
if err := bw.Flush(); err != nil {
return fmt.Errorf("cannot flush query response to remote client: %w", err)
}
return nil
}
@@ -914,7 +915,6 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
start, end = promql.AdjustStartEnd(start, end, step)
}
qs := &promql.QueryStats{}
ec := &promql.EvalConfig{
Start: start,
End: end,
@@ -930,9 +930,10 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
GetRequestURI: func() string {
return httpserver.GetRequestURI(r)
},
QueryStats: qs,
}
qs := promql.NewQueryStats(query, nil, ec)
ec.QueryStats = qs
result, err := promql.Exec(qt, ec, query, false)
if err != nil {
return err
@@ -961,6 +962,7 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
if err := bw.Flush(); err != nil {
return fmt.Errorf("cannot send query range response to remote client: %w", err)
}
return nil
}

View File

@@ -34,7 +34,7 @@ See https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
%}
"seriesFetched": "{%dl qs.SeriesFetched.Load() %}",
"executionTimeMsec": {%dl qs.ExecutionTimeMsec.Load() %}
"executionTimeMsec": {%dl qs.ExecutionDuration.Load().Milliseconds() %}
}
{% code
qt.Printf("generate /api/v1/query_range response for series=%d, points=%d", seriesCount, pointsCount)

View File

@@ -71,9 +71,9 @@ func StreamQueryRangeResponse(qw422016 *qt422016.Writer, rs []netstorage.Result,
qw422016.N().DL(qs.SeriesFetched.Load())
//line app/vmselect/prometheus/query_range_response.qtpl:36
qw422016.N().S(`","executionTimeMsec":`)
//line app/vmselect/prometheus/query_range_response.qtpl:37
qw422016.N().DL(qs.ExecutionTimeMsec.Load())
//line app/vmselect/prometheus/query_range_response.qtpl:37
//line app/vmselect/prometheus/query_range_response.qtpl:38
qw422016.N().DL(qs.ExecutionDuration.Load().Milliseconds())
//line app/vmselect/prometheus/query_range_response.qtpl:38
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_range_response.qtpl:40
qt.Printf("generate /api/v1/query_range response for series=%d, points=%d", seriesCount, pointsCount)

View File

@@ -36,7 +36,7 @@ See https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
%}
"seriesFetched": "{%dl qs.SeriesFetched.Load() %}",
"executionTimeMsec": {%dl qs.ExecutionTimeMsec.Load() %}
"executionTimeMsec": {%dl qs.ExecutionDuration.Load().Milliseconds() %}
}
{% code
qt.Printf("generate /api/v1/query response for series=%d", seriesCount)

View File

@@ -81,9 +81,9 @@ func StreamQueryResponse(qw422016 *qt422016.Writer, rs []netstorage.Result, qt *
qw422016.N().DL(qs.SeriesFetched.Load())
//line app/vmselect/prometheus/query_response.qtpl:38
qw422016.N().S(`","executionTimeMsec":`)
//line app/vmselect/prometheus/query_response.qtpl:39
qw422016.N().DL(qs.ExecutionTimeMsec.Load())
//line app/vmselect/prometheus/query_response.qtpl:39
//line app/vmselect/prometheus/query_response.qtpl:40
qw422016.N().DL(qs.ExecutionDuration.Load().Milliseconds())
//line app/vmselect/prometheus/query_response.qtpl:40
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_response.qtpl:42
qt.Printf("generate /api/v1/query response for series=%d", seriesCount)

View File

@@ -11,7 +11,7 @@ TSDBStatusResponse generates response for /api/v1/status/tsdb .
"data":{
"totalSeries": {%dul= status.TotalSeries %},
"totalLabelValuePairs": {%dul= status.TotalLabelValuePairs %},
"seriesCountByMetricName":{%= tsdbStatusEntries(status.SeriesCountByMetricName) %},
"seriesCountByMetricName":{%= tsdbStatusMetricNameEntries(status.SeriesCountByMetricName,status.SeriesQueryStatsByMetricName) %},
"seriesCountByLabelName":{%= tsdbStatusEntries(status.SeriesCountByLabelName) %},
"seriesCountByFocusLabelValue":{%= tsdbStatusEntries(status.SeriesCountByFocusLabelValue) %},
"seriesCountByLabelValuePair":{%= tsdbStatusEntries(status.SeriesCountByLabelValuePair) %},
@@ -34,4 +34,32 @@ TSDBStatusResponse generates response for /api/v1/status/tsdb .
]
{% endfunc %}
{% func tsdbStatusMetricNameEntries(a []storage.TopHeapEntry, queryStats []storage.MetricNamesStatsRecord) %}
{% code
queryStatsByMetricName := make(map[string]storage.MetricNamesStatsRecord,len(queryStats))
for _, record := range queryStats{
queryStatsByMetricName[record.MetricName] = record
}
%}
[
{% for i, e := range a %}
{
{% code
entry, ok := queryStatsByMetricName[e.Name]
%}
"name":{%q= e.Name %},
{% if !ok %}
"value":{%d= int(e.Count) %}
{% else %}
"value":{%d= int(e.Count) %},
"requestsCount":{%d= int(entry.RequestsCount) %},
"lastRequestTimestamp":{%d= int(entry.LastRequestTs) %}
{% endif %}
}
{% if i+1 < len(a) %},{% endif %}
{% endfor %}
]
{% endfunc %}
{% endstripspace %}

View File

@@ -37,9 +37,9 @@ func StreamTSDBStatusResponse(qw422016 *qt422016.Writer, status *storage.TSDBSta
qw422016.N().DUL(status.TotalLabelValuePairs)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:13
qw422016.N().S(`,"seriesCountByMetricName":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:14
streamtsdbStatusEntries(qw422016, status.SeriesCountByMetricName)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:14
//line app/vmselect/prometheus/tsdb_status_response.qtpl:15
streamtsdbStatusMetricNameEntries(qw422016, status.SeriesCountByMetricName, status.SeriesQueryStatsByMetricName)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:15
qw422016.N().S(`,"seriesCountByLabelName":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:15
streamtsdbStatusEntries(qw422016, status.SeriesCountByLabelName)
@@ -147,3 +147,89 @@ func tsdbStatusEntries(a []storage.TopHeapEntry) string {
return qs422016
//line app/vmselect/prometheus/tsdb_status_response.qtpl:35
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:38
func streamtsdbStatusMetricNameEntries(qw422016 *qt422016.Writer, a []storage.TopHeapEntry, queryStats []storage.MetricNamesStatsRecord) {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:40
queryStatsByMetricName := make(map[string]storage.MetricNamesStatsRecord, len(queryStats))
for _, record := range queryStats {
queryStatsByMetricName[record.MetricName] = record
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:44
qw422016.N().S(`[`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:46
for i, e := range a {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:46
qw422016.N().S(`{`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:49
entry, ok := queryStatsByMetricName[e.Name]
//line app/vmselect/prometheus/tsdb_status_response.qtpl:50
qw422016.N().S(`"name":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:51
qw422016.N().Q(e.Name)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:51
qw422016.N().S(`,`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:52
if !ok {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:52
qw422016.N().S(`"value":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:53
qw422016.N().D(int(e.Count))
//line app/vmselect/prometheus/tsdb_status_response.qtpl:54
} else {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:54
qw422016.N().S(`"value":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:55
qw422016.N().D(int(e.Count))
//line app/vmselect/prometheus/tsdb_status_response.qtpl:55
qw422016.N().S(`,"requestsCount":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:56
qw422016.N().D(int(entry.RequestsCount))
//line app/vmselect/prometheus/tsdb_status_response.qtpl:56
qw422016.N().S(`,"lastRequestTimestamp":`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:57
qw422016.N().D(int(entry.LastRequestTs))
//line app/vmselect/prometheus/tsdb_status_response.qtpl:58
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:58
qw422016.N().S(`}`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:60
if i+1 < len(a) {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:60
qw422016.N().S(`,`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:60
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:61
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:61
qw422016.N().S(`]`)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
func writetsdbStatusMetricNameEntries(qq422016 qtio422016.Writer, a []storage.TopHeapEntry, queryStats []storage.MetricNamesStatsRecord) {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
streamtsdbStatusMetricNameEntries(qw422016, a, queryStats)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
}
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
func tsdbStatusMetricNameEntries(a []storage.TopHeapEntry, queryStats []storage.MetricNamesStatsRecord) string {
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
writetsdbStatusMetricNameEntries(qb422016, a, queryStats)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
return qs422016
//line app/vmselect/prometheus/tsdb_status_response.qtpl:63
}

View File

@@ -172,30 +172,6 @@ func copyEvalConfig(src *EvalConfig) *EvalConfig {
return &ec
}
// QueryStats contains various stats for the query.
type QueryStats struct {
// SeriesFetched contains the number of series fetched from storage during the query evaluation.
SeriesFetched atomic.Int64
// ExecutionTimeMsec contains the number of milliseconds the query took to execute.
ExecutionTimeMsec atomic.Int64
}
func (qs *QueryStats) addSeriesFetched(n int) {
if qs == nil {
return
}
qs.SeriesFetched.Add(int64(n))
}
func (qs *QueryStats) addExecutionTimeMsec(startTime time.Time) {
if qs == nil {
return
}
d := time.Since(startTime).Milliseconds()
qs.ExecutionTimeMsec.Add(d)
}
func (ec *EvalConfig) validate() {
if ec.Start > ec.End {
logger.Panicf("BUG: start cannot exceed end; got %d vs %d", ec.Start, ec.End)
@@ -1721,12 +1697,13 @@ func evalRollupFuncNoCache(qt *querytracer.Tracer, ec *EvalConfig, funcName stri
if err != nil {
return nil, err
}
qs := ec.QueryStats
rssLen := rss.Len()
if rssLen == 0 {
rss.Cancel()
return nil, nil
}
ec.QueryStats.addSeriesFetched(rssLen)
qs.addSeriesFetched(rssLen)
// Verify timeseries fit available memory during rollup calculations.
timeseriesLen := rssLen

View File

@@ -0,0 +1,55 @@
package promql
import (
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
)
// QueryStats contains various stats of the query evaluation.
type QueryStats struct {
// ExecutionDuration contains the time duration the query took to execute.
ExecutionDuration atomic.Pointer[time.Duration]
// SeriesFetched contains the number of series fetched from storage or cache.
SeriesFetched atomic.Int64
at *auth.Token
query string
queryType string
start int64
end int64
step int64
}
// NewQueryStats creates a new QueryStats object.
func NewQueryStats(query string, at *auth.Token, ec *EvalConfig) *QueryStats {
qs := &QueryStats{
at: at,
query: query,
step: ec.Step,
start: ec.Start,
end: ec.End,
queryType: "range",
}
if qs.start == qs.end {
qs.queryType = "instant"
}
return qs
}
func (qs *QueryStats) addSeriesFetched(n int) {
if qs == nil {
return
}
qs.SeriesFetched.Add(int64(n))
}
func (qs *QueryStats) addExecutionTimeMsec(startTime time.Time) {
if qs == nil {
return
}
d := time.Since(startTime)
qs.ExecutionDuration.Store(&d)
}

View File

@@ -3,6 +3,7 @@ package stats
import (
"fmt"
"net/http"
"regexp"
"strconv"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
@@ -33,6 +34,12 @@ func MetricNamesStatsHandler(qt *querytracer.Tracer, w http.ResponseWriter, r *h
le = n
}
matchPattern := r.FormValue("match_pattern")
if len(matchPattern) > 0 {
_, err := regexp.Compile(matchPattern)
if err != nil {
return fmt.Errorf("match_pattern=%q must be valid regex: %w", matchPattern, err)
}
}
stats, err := netstorage.GetMetricNamesStats(qt, limit, le, matchPattern)
if err != nil {
return err

View File

@@ -5,9 +5,13 @@ menu:
docs:
parent: 'victoriametrics'
weight: 23
tags:
- metrics
aliases:
- /ExtendedPromQL.html
- /MetricsQL.html
- /metricsql/index.html
- /metricsql/
---
[VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) implements MetricsQL -
query language inspired by [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -36,10 +36,10 @@
<meta property="og:title" content="UI for VictoriaMetrics">
<meta property="og:url" content="https://victoriametrics.com/">
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
<script type="module" crossorigin src="./assets/index-Bc6E56oa.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-DojlIpLz.js">
<script type="module" crossorigin src="./assets/index-BLLXzY70.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-C-vZmbyg.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
<link rel="stylesheet" crossorigin href="./assets/index-u4IOGr0E.css">
<link rel="stylesheet" crossorigin href="./assets/index-Brup_hCI.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -336,13 +336,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/create":
snapshotsCreateTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshotPath, err := Storage.CreateSnapshot()
if err != nil {
err = fmt.Errorf("cannot create snapshot: %w", err)
jsonResponseError(w, err)
snapshotsCreateErrorsTotal.Inc()
return true
}
snapshotPath := Storage.MustCreateSnapshot()
if prometheusCompatibleResponse {
fmt.Fprintf(w, `{"status":"success","data":{"name":%s}}`, stringsutil.JSONString(snapshotPath))
} else {
@@ -352,13 +346,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/list":
snapshotsListTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshots, err := Storage.ListSnapshots()
if err != nil {
err = fmt.Errorf("cannot list snapshots: %w", err)
jsonResponseError(w, err)
snapshotsListErrorsTotal.Inc()
return true
}
snapshots := Storage.MustListSnapshots()
fmt.Fprintf(w, `{"status":"ok","snapshots":[`)
if len(snapshots) > 0 {
for _, snapshot := range snapshots[:len(snapshots)-1] {
@@ -373,13 +361,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
w.Header().Set("Content-Type", "application/json")
snapshotName := r.FormValue("snapshot")
snapshots, err := Storage.ListSnapshots()
if err != nil {
err = fmt.Errorf("cannot list snapshots: %w", err)
jsonResponseError(w, err)
snapshotsDeleteErrorsTotal.Inc()
return true
}
snapshots := Storage.MustListSnapshots()
for _, snName := range snapshots {
if snName == snapshotName {
if err := Storage.DeleteSnapshot(snName); err != nil {
@@ -393,19 +375,13 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
}
err = fmt.Errorf("cannot find snapshot %q", snapshotName)
err := fmt.Errorf("cannot find snapshot %q", snapshotName)
jsonResponseError(w, err)
return true
case "/delete_all":
snapshotsDeleteAllTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshots, err := Storage.ListSnapshots()
if err != nil {
err = fmt.Errorf("cannot list snapshots: %w", err)
jsonResponseError(w, err)
snapshotsDeleteAllErrorsTotal.Inc()
return true
}
snapshots := Storage.MustListSnapshots()
for _, snapshotName := range snapshots {
if err := Storage.DeleteSnapshot(snapshotName); err != nil {
err = fmt.Errorf("cannot delete snapshot %q: %w", snapshotName, err)
@@ -439,10 +415,7 @@ func initStaleSnapshotsRemover(strg *storage.Storage) {
return
case <-t.C:
}
if err := strg.DeleteStaleSnapshots(snapshotsMaxAgeDur); err != nil {
// Use logger.Errorf instead of logger.Fatalf in the hope the error is temporary.
logger.Errorf("cannot delete stale snapshots: %s", err)
}
strg.MustDeleteStaleSnapshots(snapshotsMaxAgeDur)
}
}()
}
@@ -460,11 +433,9 @@ var (
var (
activeForceMerges = metrics.NewCounter("vm_active_force_merges")
snapshotsCreateTotal = metrics.NewCounter(`vm_http_requests_total{path="/snapshot/create"}`)
snapshotsCreateErrorsTotal = metrics.NewCounter(`vm_http_request_errors_total{path="/snapshot/create"}`)
snapshotsCreateTotal = metrics.NewCounter(`vm_http_requests_total{path="/snapshot/create"}`)
snapshotsListTotal = metrics.NewCounter(`vm_http_requests_total{path="/snapshot/list"}`)
snapshotsListErrorsTotal = metrics.NewCounter(`vm_http_request_errors_total{path="/snapshot/list"}`)
snapshotsListTotal = metrics.NewCounter(`vm_http_requests_total{path="/snapshot/list"}`)
snapshotsDeleteTotal = metrics.NewCounter(`vm_http_requests_total{path="/snapshot/delete"}`)
snapshotsDeleteErrorsTotal = metrics.NewCounter(`vm_http_request_errors_total{path="/snapshot/delete"}`)

View File

@@ -1,4 +1,4 @@
FROM golang:1.24.1 AS build-web-stage
FROM golang:1.24.2 AS build-web-stage
COPY build /build
WORKDIR /build

View File

@@ -1,7 +1,7 @@
# All these commands must run from repository root.
copy-metricsql-docs:
cp docs/MetricsQL.md app/vmui/packages/vmui/src/assets/MetricsQL.md
cp docs/victoriametrics/MetricsQL.md app/vmui/packages/vmui/src/assets/MetricsQL.md
vmui-package-base-image:
docker build -t vmui-builder-image -f app/vmui/Dockerfile-build ./app/vmui

File diff suppressed because it is too large Load Diff

View File

@@ -8,21 +8,21 @@
"@types/lodash.debounce": "^4.0.9",
"@types/lodash.get": "^4.4.9",
"@types/qs": "^6.9.18",
"@types/react": "^19.0.12",
"@types/react": "^19.1.2",
"@types/react-input-mask": "^3.0.6",
"@types/react-router-dom": "^5.3.3",
"classnames": "^2.5.1",
"dayjs": "^1.11.13",
"lodash.debounce": "^4.0.8",
"lodash.get": "^4.4.2",
"marked": "^15.0.7",
"marked": "^15.0.8",
"marked-emoji": "^2.0.0",
"preact": "^10.26.4",
"preact": "^10.26.5",
"qs": "^6.14.0",
"react-input-mask": "^2.0.4",
"react-router-dom": "^7.4.0",
"react-router-dom": "^7.5.0",
"uplot": "^1.6.32",
"vite": "^6.2.3",
"vite": "^6.2.6",
"web-vitals": "^4.2.4"
},
"scripts": {
@@ -55,25 +55,25 @@
"@babel/plugin-proposal-nullish-coalescing-operator": "^7.18.6",
"@babel/plugin-proposal-private-property-in-object": "^7.21.11",
"@eslint/eslintrc": "^3.3.1",
"@eslint/js": "^9.23.0",
"@eslint/js": "^9.24.0",
"@preact/preset-vite": "^2.10.1",
"@testing-library/jest-dom": "^6.6.3",
"@testing-library/preact": "^3.2.4",
"@types/node": "^22.13.13",
"@typescript-eslint/eslint-plugin": "^8.28.0",
"@typescript-eslint/parser": "^8.28.0",
"@types/node": "^22.14.1",
"@typescript-eslint/eslint-plugin": "^8.30.1",
"@typescript-eslint/parser": "^8.30.1",
"cross-env": "^7.0.3",
"eslint": "^9.23.0",
"eslint-plugin-react": "^7.37.4",
"eslint": "^9.24.0",
"eslint-plugin-react": "^7.37.5",
"globals": "^16.0.0",
"http-proxy-middleware": "^3.0.3",
"jsdom": "^26.0.0",
"http-proxy-middleware": "^3.0.5",
"jsdom": "^26.1.0",
"postcss": "^8.5.3",
"rollup-plugin-visualizer": "^5.14.0",
"sass": "^1.86.0",
"sass-embedded": "^1.86.0",
"typescript": "^5.8.2",
"vitest": "^3.0.9",
"webpack": "^5.98.0"
"sass": "^1.86.3",
"sass-embedded": "^1.86.3",
"typescript": "^5.8.3",
"vitest": "^3.1.1",
"webpack": "^5.99.5"
}
}

View File

@@ -11,3 +11,19 @@ export const getCardinalityInfo = (server: string, requestsParam: CardinalityReq
return `${server}/api/v1/status/tsdb?topN=${requestsParam.topN}&date=${requestsParam.date}${match}${focusLabel}`;
};
interface MetricNamesStatsParams {
limit?: number,
le?: number, // less than or equal
match_pattern?: string, // a regex pattern to match metric names
}
export const getMetricNamesStats = (server: string, params: MetricNamesStatsParams) => {
const searchParams = new URLSearchParams(
Object.entries(params).filter(([_, value]) => value != null)
).toString();
console.log("searchParams", searchParams);
return `${server}/api/v1/status/metric_names_stats?${searchParams}`;
};

View File

@@ -5,9 +5,13 @@ menu:
docs:
parent: 'victoriametrics'
weight: 23
tags:
- metrics
aliases:
- /ExtendedPromQL.html
- /MetricsQL.html
- /metricsql/index.html
- /metricsql/
---
[VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) implements MetricsQL -
query language inspired by [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).

View File

@@ -1,23 +1,13 @@
import React, { FC, useCallback, useMemo, useRef, useState } from "preact/compat";
import React, { FC, useState } from "preact/compat";
import "./style.scss";
import "uplot/dist/uPlot.min.css";
import useElementSize from "../../../hooks/useElementSize";
import uPlot, { AlignedData } from "uplot";
import { useEffect } from "react";
import useBarHitsOptions, { getLabelFromLogHit } from "./hooks/useBarHitsOptions";
import BarHitsTooltip from "./BarHitsTooltip/BarHitsTooltip";
import { AlignedData } from "uplot";
import { TimeParams } from "../../../types";
import usePlotScale from "../../../hooks/uplot/usePlotScale";
import useReadyChart from "../../../hooks/uplot/useReadyChart";
import useZoomChart from "../../../hooks/uplot/useZoomChart";
import classNames from "classnames";
import { LegendLogHits, LogHits } from "../../../api/types";
import { addSeries, delSeries, setBand } from "../../../utils/uplot";
import { LogHits } from "../../../api/types";
import { GraphOptions, GRAPH_STYLES } from "./types";
import BarHitsOptions from "./BarHitsOptions/BarHitsOptions";
import stack from "../../../utils/uplot/stack";
import BarHitsLegend from "./BarHitsLegend/BarHitsLegend";
import { calculateTotalHits, sortLogHits } from "../../../utils/logs";
import BarHitsPlot from "./BarHitsPlot/BarHitsPlot";
interface Props {
logHits: LogHits[];
@@ -27,9 +17,6 @@ interface Props {
onApplyFilter: (value: string) => void;
}
const BarHitsChart: FC<Props> = ({ logHits, data: _data, period, setPeriod, onApplyFilter }) => {
const [containerRef, containerSize] = useElementSize();
const uPlotRef = useRef<HTMLDivElement>(null);
const [uPlotInst, setUPlotInst] = useState<uPlot>();
const [graphOptions, setGraphOptions] = useState<GraphOptions>({
graphStyle: GRAPH_STYLES.LINE_STEPPED,
stacked: false,
@@ -37,82 +24,6 @@ const BarHitsChart: FC<Props> = ({ logHits, data: _data, period, setPeriod, onAp
hideChart: false,
});
const { xRange, setPlotScale } = usePlotScale({ period, setPeriod });
const { onReadyChart, isPanning } = useReadyChart(setPlotScale);
useZoomChart({ uPlotInst, xRange, setPlotScale });
const isEmptyData = useMemo(() => _data.every(d => d.length === 0), [_data]);
const { data, bands } = useMemo(() => {
return graphOptions.stacked ? stack(_data, () => false) : { data: _data, bands: [] };
}, [graphOptions, _data]);
const { options, series, focusDataIdx } = useBarHitsOptions({
data,
logHits,
bands,
xRange,
containerSize,
onReadyChart,
setPlotScale,
graphOptions
});
const prepareLegend = useCallback((hits: LogHits[], totalHits: number): LegendLogHits[] => {
return hits.map((hit) => {
const label = getLabelFromLogHit(hit);
const legendItem: LegendLogHits = {
label,
isOther: hit._isOther,
fields: hit.fields,
total: hit.total || 0,
totalHits,
stroke: series.find((s) => s.label === label)?.stroke,
};
return legendItem;
}).sort(sortLogHits("total"));
}, [series]);
const legendDetails: LegendLogHits[] = useMemo(() => {
const totalHits = calculateTotalHits(logHits);
return prepareLegend(logHits, totalHits);
}, [logHits, prepareLegend]);
useEffect(() => {
if (!uPlotInst) return;
delSeries(uPlotInst);
addSeries(uPlotInst, series, true);
setBand(uPlotInst, series);
uPlotInst.redraw();
}, [series]);
useEffect(() => {
if (!uPlotRef.current) return;
const uplot = new uPlot(options, data, uPlotRef.current);
setUPlotInst(uplot);
return () => uplot.destroy();
}, [uPlotRef.current, options]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.scales.x.range = () => [xRange.min, xRange.max];
uPlotInst.redraw();
}, [xRange]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.setSize(containerSize);
uPlotInst.redraw();
}, [containerSize]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.setData(data);
uPlotInst.redraw();
}, [data]);
return (
<div
@@ -122,32 +33,16 @@ const BarHitsChart: FC<Props> = ({ logHits, data: _data, period, setPeriod, onAp
})}
>
{!graphOptions.hideChart && (
<div
className={classNames({
"vm-bar-hits-chart": true,
"vm-bar-hits-chart_panning": isPanning
})}
ref={containerRef}
>
<div
className="vm-line-chart__u-plot"
ref={uPlotRef}
/>
<BarHitsTooltip
uPlotInst={uPlotInst}
data={_data}
focusDataIdx={focusDataIdx}
/>
</div>
)}
<BarHitsOptions onChange={setGraphOptions}/>
{uPlotInst && !isEmptyData && !graphOptions.hideChart && (
<BarHitsLegend
uPlotInst={uPlotInst}
<BarHitsPlot
logHits={logHits}
data={_data}
period={period}
setPeriod={setPeriod}
onApplyFilter={onApplyFilter}
legendDetails={legendDetails}
graphOptions={graphOptions}
/>
)}
<BarHitsOptions onChange={setGraphOptions}/>
</div>
);
};

View File

@@ -21,11 +21,15 @@ const BarHitsLegend: FC<Props> = ({ uPlotInst, legendDetails, onApplyFilter }) =
const handleRedrawGraph = () => {
uPlotInst.redraw();
setSeries(getSeries());
};
useEffect(() => {
setSeries(getSeries());
if (!uPlotInst.hooks.draw) {
uPlotInst.hooks.draw = [];
}
uPlotInst.hooks.draw.push(() => {
setSeries(getSeries());
});
}, [uPlotInst]);
return (

View File

@@ -0,0 +1,153 @@
import React, { FC, useCallback } from "preact/compat";
import useElementSize from "../../../../hooks/useElementSize";
import { useEffect, useMemo, useRef, useState } from "react";
import uPlot, { AlignedData, Series } from "uplot";
import { GraphOptions } from "../types";
import usePlotScale from "../../../../hooks/uplot/usePlotScale";
import useReadyChart from "../../../../hooks/uplot/useReadyChart";
import useZoomChart from "../../../../hooks/uplot/useZoomChart";
import stack from "../../../../utils/uplot/stack";
import useBarHitsOptions, { getLabelFromLogHit } from "../hooks/useBarHitsOptions";
import { LegendLogHits, LogHits } from "../../../../api/types";
import { addSeries, delSeries, setBand } from "../../../../utils/uplot";
import classNames from "classnames";
import BarHitsTooltip from "../BarHitsTooltip/BarHitsTooltip";
import { TimeParams } from "../../../../types";
import BarHitsLegend from "../BarHitsLegend/BarHitsLegend";
import { calculateTotalHits, sortLogHits } from "../../../../utils/logs";
interface Props {
logHits: LogHits[];
data: AlignedData;
period: TimeParams;
setPeriod: ({ from, to }: { from: Date, to: Date }) => void;
onApplyFilter: (value: string) => void;
graphOptions: GraphOptions;
}
const BarHitsPlot: FC<Props> = ({ graphOptions, logHits, data: _data, period, setPeriod, onApplyFilter }: Props) => {
const [containerRef, containerSize] = useElementSize();
const uPlotRef = useRef<HTMLDivElement>(null);
const [uPlotInst, setUPlotInst] = useState<uPlot>();
const { xRange, setPlotScale } = usePlotScale({ period, setPeriod });
const { onReadyChart, isPanning } = useReadyChart(setPlotScale);
useZoomChart({ uPlotInst, xRange, setPlotScale });
const { data, bands } = useMemo(() => {
return graphOptions.stacked ? stack(_data, () => false) : { data: _data, bands: [] };
}, [graphOptions, _data]);
const { options, series, focusDataIdx } = useBarHitsOptions({
data,
logHits,
bands,
xRange,
containerSize,
onReadyChart,
setPlotScale,
graphOptions
});
const prepareLegend = useCallback((hits: LogHits[], totalHits: number): LegendLogHits[] => {
return hits.map((hit) => {
const label = getLabelFromLogHit(hit);
const legendItem: LegendLogHits = {
label,
isOther: hit._isOther,
fields: hit.fields,
total: hit.total || 0,
totalHits,
stroke: series.find((s) => s.label === label)?.stroke,
};
return legendItem;
}).sort(sortLogHits("total"));
}, [series]);
const legendDetails: LegendLogHits[] = useMemo(() => {
const totalHits = calculateTotalHits(logHits);
return prepareLegend(logHits, totalHits);
}, [logHits, prepareLegend]);
useEffect(() => {
if (!uPlotInst) return;
const oldSeriesMap = new Map(uPlotInst.series.map(s => [s.label, s]));
const syncedSeries = series.map(s => {
const old = oldSeriesMap.get(s.label);
return old ? { ...s, show: old.show } : s;
});
delSeries(uPlotInst);
addSeries(uPlotInst, syncedSeries, true);
setBand(uPlotInst, syncedSeries);
uPlotInst.redraw();
}, [series, uPlotInst]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.delBand();
bands.forEach(band => {
uPlotInst.addBand(band);
});
uPlotInst.redraw();
}, [bands]);
useEffect(() => {
if (!uPlotRef.current) return;
const uplot = new uPlot(options, data, uPlotRef.current);
setUPlotInst(uplot);
return () => uplot.destroy();
}, [uPlotRef.current]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.scales.x.range = () => [xRange.min, xRange.max];
uPlotInst.redraw();
}, [xRange]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.setSize(containerSize);
uPlotInst.redraw();
}, [containerSize]);
useEffect(() => {
if (!uPlotInst) return;
uPlotInst.setData(data);
uPlotInst.redraw();
}, [data]);
return (
<>
<div
className={classNames({
"vm-bar-hits-chart": true,
"vm-bar-hits-chart_panning": isPanning
})}
ref={containerRef}
>
<div
className="vm-line-chart__u-plot"
ref={uPlotRef}
/>
<BarHitsTooltip
uPlotInst={uPlotInst}
data={_data}
focusDataIdx={focusDataIdx}
/>
</div>
{uPlotInst && <BarHitsLegend
uPlotInst={uPlotInst}
onApplyFilter={onApplyFilter}
legendDetails={legendDetails}
/>}
</>
);
};
export default BarHitsPlot;

Some files were not shown because too many files have changed in this diff Show More