Compare commits

...

190 Commits

Author SHA1 Message Date
Max Kotliar
966eecc7cf wip 2025-07-03 21:36:48 +03:00
Max Kotliar
5d476dfcab wip 2025-07-03 19:42:54 +03:00
Max Kotliar
0f0bd28510 lib/netutil: connection pool based on sync.Cond syncronization 2025-07-03 18:49:24 +03:00
Max Kotliar
00b3189646 lib/netutil: try get conn from pool on dial or handshake error
Dial and handshake takes some time, during which a connection may appear
in the pool. So, if we failed establish a new connection we can try luck
getting one from pool, otherwise return the error.

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8922 and attempt
to partially address
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9345
2025-07-03 15:35:07 +03:00
Olexandr88
86f2ec69a8 docs: add links to codecov, release, A+, license icons (#9335)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit dd40ed2456)
2025-07-03 11:44:03 +02:00
Hui Wang
266535eacc vmalert: fix alerts state restoration for alerting rule using templat… (#9348)
…ing in the labels

fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9305

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 420f6b322e)
2025-07-03 11:18:35 +02:00
Andrii Chubatiuk
f5fd7a635f lib/streamaggr: properly clean quantiles output state on flush (#9351)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9350

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit 8eb88798ca)
2025-07-03 10:30:11 +02:00
Hui Wang
c3f92993a7 vmalert: respect group order defined in the rule file during replay m… (#9349)
…ode to allow chained group if needed

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9334

(cherry picked from commit 6a71c53c41)
2025-07-03 10:30:10 +02:00
Phuong Le
77bd1bed8a VictoriaLogs: Add metrics to track the retention.maxDiskSpaceUsageBytes flag (#9319)
- `vl_max_disk_space_usage_bytes{path}` reports the configured maximum
disk space usage limit for the storage path. It helps to show the limit
on the dashboard.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9320
2025-07-03 08:30:01 +02:00
Phuong Le
128fb696b0 VictoriaLogs: add message processor metrics (#9316)
Add metrics to track the behavior of the message processor:

- `vl_insert_processors_count` tracks the current number of active log
message processors.
- `vl_insert_flush_duration_seconds{type="protocol"}` measures the time
taken to flush log data from recently parsed logs to the underlying
storage system (in-memory buffering, remote storage nodes, or local
storage).

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9320

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-03 08:09:40 +02:00
Phuong Le
c51e2d2845 VictoriaLogs: fix rate_sum() inconsistency between normal queries and vmalert recording rules (#9304)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9303

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-07-03 07:26:00 +02:00
Mac G
3d4d88361d docs/readme: fix typo in metric name
Fix typo in metric name mentioned in doc: vm_rows_ignored_total
2025-07-02 19:38:14 +04:00
Zakhar Bessarab
1d278c9eb7 app/vmbackupmanager: use "lib/snapshot" for snapshot operations
Re-use existing `lib/snapshot` for snapshot operations in order to reduce code duplicates and unify supported options and error handling.

Previously, vmbackupmanager did not support `snapshot.tls*` options and did not handle errors in a same was as vmbackup does.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9340
2025-07-02 14:46:09 +02:00
Max Kotliar
155ccd83a6 lib/license: improve error message when an empty license is provided (#911) 2025-07-02 14:06:42 +02:00
Yury Molodov
2d6225e07a vmui: disable autocomplete popup on initial page load
When the page loads, the query field automatically receives focus, which
is expected. However, this also immediately opens the autocomplete
popup, triggering an unnecessary request and showing suggestions before
the user starts typing.

 This PR commit the autocomplete popup from opening on initial page
load.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9342
2025-07-02 13:59:01 +02:00
Nikolay
a72af279d4 docs/release-guide: mention release candidate tagging process
This commits changes release process by introducing new intermediate
docker image tag with `-rcY` suffix. It's needed to properly test
releases for possible regressions without need to erase release docker
images. Which could be cached by some proxy and actually used in
production.

 Release candidate image must re-tagged into final release via make
command.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9136
2025-07-02 13:59:01 +02:00
Benjamin Nichols-Farquhar
7d66b672da lib/storage: Introduce vm_cache_eviction_bytes_total metric
This commit adds the following new metric `vm_cache_eviction_bytes_total` for tsid, metricName and metricIDs caches backed by workingset cache.

 The problem this is attempting to solve is doing fine grained tuning for
caches on large high churn, high query load VictoriaMetrics deployments.

Cache performance in those scenarios is critical for cluster
performance, however for stability reasons its imperative not to provide
_too much_ cache space compared to available memory or there becomes a
risk of OOM which can easily destabilize a cluster.

Given the cost of memory, especially for large deployments, there is
therefore a need to optimize cache usage to be near the minimum required
to provide acceptable performance under churn, ingestion and query
loads.

VM provides an excellent set of flags for configuring this cache
performance(size and expiry and removal percentage).

However specifically when tuning `prevCacheRemovalPercent` and
`cacheExpireDuration` it can be hard to known exactly what impact
they're having. Sometimes its very clear when theres cyclical behavior
that cacheExpireDuration is involved, but in other scenarios it can be
very unclear why cache miss rate goes up.

Specifically when attempting to debug spikes in cache miss rate there
are questions that are hard to answer:

* Did a new workload come in that was poorly cached?
* Were caches rotated due to expiry and the `prev` cache contained many
active entries due to limited cache size?
* Where the caches rotated due to miss percentage and our tuning is too
aggressive?

In many of these debugging scenarios it would be greatly helpful to have
explicit metrics on how many bytes of key caches were evicted and for
what reasons, it then becomes trivial to associate spikes in eviction
for a specific reason with a miss rate spike that causes poor
performance. The appropriate tuning them becomes much more obvious.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
2025-07-02 13:59:00 +02:00
hagen1778
3f4cdd5acf docs: refresh FAQ.md
Partially based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9244

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 469b337707)
2025-07-01 14:23:48 +02:00
dependabot[bot]
e671f21ea2 build(deps): bump github.com/go-viper/mapstructure/v2 from 2.2.1 to 2.3.0 (#9299)
Bumps
[github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure)
from 2.2.1 to 2.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/go-viper/mapstructure/releases">github.com/go-viper/mapstructure/v2's
releases</a>.</em></p>
<blockquote>
<h2>v2.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>build(deps): bump actions/checkout from 4.1.7 to 4.2.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/46">go-viper/mapstructure#46</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.1.0 to 6.1.1
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/47">go-viper/mapstructure#47</a></li>
<li>[enhancement] Add check for <code>reflect.Value</code> in
<code>ComposeDecodeHookFunc</code> by <a
href="https://github.com/mahadzaryab1"><code>@​mahadzaryab1</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/52">go-viper/mapstructure#52</a></li>
<li>build(deps): bump actions/setup-go from 5.0.2 to 5.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/51">go-viper/mapstructure#51</a></li>
<li>build(deps): bump actions/checkout from 4.2.0 to 4.2.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/50">go-viper/mapstructure#50</a></li>
<li>build(deps): bump actions/setup-go from 5.1.0 to 5.2.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/55">go-viper/mapstructure#55</a></li>
<li>build(deps): bump actions/setup-go from 5.2.0 to 5.3.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/58">go-viper/mapstructure#58</a></li>
<li>ci: add Go 1.24 to the test matrix by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/74">go-viper/mapstructure#74</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.1.1 to 6.5.0
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/72">go-viper/mapstructure#72</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.5.0 to 6.5.1
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/76">go-viper/mapstructure#76</a></li>
<li>build(deps): bump actions/setup-go from 5.3.0 to 5.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/78">go-viper/mapstructure#78</a></li>
<li>feat: add decode hook for netip.Prefix by <a
href="https://github.com/tklauser"><code>@​tklauser</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/85">go-viper/mapstructure#85</a></li>
<li>Updates by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/86">go-viper/mapstructure#86</a></li>
<li>build(deps): bump github/codeql-action from 2.13.4 to 3.28.15 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/87">go-viper/mapstructure#87</a></li>
<li>build(deps): bump actions/setup-go from 5.4.0 to 5.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/93">go-viper/mapstructure#93</a></li>
<li>build(deps): bump github/codeql-action from 3.28.15 to 3.28.17 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/92">go-viper/mapstructure#92</a></li>
<li>build(deps): bump github/codeql-action from 3.28.17 to 3.28.19 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/97">go-viper/mapstructure#97</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/96">go-viper/mapstructure#96</a></li>
<li>Update README.md by <a
href="https://github.com/peczenyj"><code>@​peczenyj</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/90">go-viper/mapstructure#90</a></li>
<li>Add omitzero tag. by <a
href="https://github.com/Crystalix007"><code>@​Crystalix007</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/98">go-viper/mapstructure#98</a></li>
<li>Use error structs instead of duplicated strings by <a
href="https://github.com/m1k1o"><code>@​m1k1o</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/102">go-viper/mapstructure#102</a></li>
<li>build(deps): bump github/codeql-action from 3.28.19 to 3.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/101">go-viper/mapstructure#101</a></li>
<li>feat: add common error interface by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/105">go-viper/mapstructure#105</a></li>
<li>update linter by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/106">go-viper/mapstructure#106</a></li>
<li>Feature allow unset pointer by <a
href="https://github.com/rostislaved"><code>@​rostislaved</code></a> in
<a
href="https://redirect.github.com/go-viper/mapstructure/pull/80">go-viper/mapstructure#80</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/tklauser"><code>@​tklauser</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/85">go-viper/mapstructure#85</a></li>
<li><a href="https://github.com/peczenyj"><code>@​peczenyj</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/90">go-viper/mapstructure#90</a></li>
<li><a
href="https://github.com/Crystalix007"><code>@​Crystalix007</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/98">go-viper/mapstructure#98</a></li>
<li><a
href="https://github.com/rostislaved"><code>@​rostislaved</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/80">go-viper/mapstructure#80</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0">https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8c61ec1924"><code>8c61ec1</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/80">#80</a>
from rostislaved/feature-allow-unset-pointer</li>
<li><a
href="df765f469a"><code>df765f4</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/106">#106</a>
from go-viper/update-linter</li>
<li><a
href="5f34b05aa1"><code>5f34b05</code></a>
update linter</li>
<li><a
href="36de1e1d74"><code>36de1e1</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/105">#105</a>
from go-viper/error-refactor</li>
<li><a
href="6a283a390e"><code>6a283a3</code></a>
chore: update error type doc</li>
<li><a
href="599cb73236"><code>599cb73</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/101">#101</a>
from go-viper/dependabot/github_actions/github/codeql...</li>
<li><a
href="ed3f921815"><code>ed3f921</code></a>
feat: remove value from error messages</li>
<li><a
href="a3f8b227dc"><code>a3f8b22</code></a>
revert: error message change</li>
<li><a
href="9661f6d07c"><code>9661f6d</code></a>
feat: add common error interface</li>
<li><a
href="f12f6c76fe"><code>f12f6c7</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/102">#102</a>
from m1k1o/prettify-errors2</li>
<li>Additional commits viewable in <a
href="https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/go-viper/mapstructure/v2&package-manager=go_modules&previous-version=2.2.1&new-version=2.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 2b131c13b5)
2025-07-01 14:23:48 +02:00
Olexandr88
de09a53a1b docs: add a link to the main, twitter, reddit icon (#9331)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit aa342cea3b)
2025-07-01 14:23:47 +02:00
Roman Khavronenko
7cf6b12634 deployment/docker: add troubleshooting section (#9329)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9323

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
(cherry picked from commit 26a2f4c25c)
2025-07-01 14:23:47 +02:00
Andrii Chubatiuk
53ee5014ef app/vmalert: removed inline styles from UI, which are not working with recommended CSP (#9295)
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9236

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8a3a5e6867)
2025-07-01 14:23:47 +02:00
Phuong Le
0212a4a5c5 document/data-ingestion/journald: fix small typo (#9325)
s/`-journlad.streamFields`/`-journald.streamFields`

(cherry picked from commit 9f9a1e1996)
2025-07-01 10:47:55 +02:00
Roman Khavronenko
f86d35ca40 app/vmselect/promql: respect window interval for rollup functions when removing resets (#9273)
removeCounterResets was operating only using stalenessInterval (when set
by user) and didn't account for user-specified [window] in rollup
functions. Since in rollup functions MetricsQL could look behind for up
to [window] interval - it can capture those datapoints that weren't
accounted in removeCounterResets. With this change, we remove resets on
stalenessInterval+window interval to avoid this corner case.

Fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-2978728661

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 2d27404639)
2025-07-01 10:20:01 +02:00
Nils K
0138b81270 docs: Update journald ingestion docs regarding suggested fields (#9275)
### Describe Your Changes

Point out a some things to watch out for with the default stream field
set and explain what the `_TRANSPORT` field is useful for

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: Septatrix <24257556+septatrix@users.noreply.github.com>
(cherry picked from commit 91a05697fd)
2025-07-01 10:20:01 +02:00
hagen1778
4d793d9f0f docs: add alias for vlogs keyConcepts page
GA noticed requests to /victorialogs/keyConcepts/ not being served properly

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e9571576b8)
2025-07-01 10:20:00 +02:00
Nikolay
5a70a987b0 app/vmagent: introduce concurrency setting for kafka producer (#906)
This commit adds a new option `concurrency` for kafka producer, which adds additional workers
for publishing messages into the kafka cluster.

 This setting could be useful at networks with high latency. And it increases write throughput.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9249
2025-06-30 18:21:53 +02:00
f41gh7
ac6e54b5f4 app/vmselect/promql: support search.logSlowQueryStatsHeaders for whitelisting header keys
Headers sent together with queries could contain important information
about the source of the request. This could help to find sources that send specific queries.

Logging all headers could expose sensitive info in logs.

So introducing `-search.logSlowQueryStatsHeaders` to whitelist headers that need
to be logged. See test case for example.
2025-06-30 18:21:53 +02:00
Andrii Chubatiuk
664efe4d50 app/vmselect: proxy /api/v1/notifiers requests to vmalert (#9267)
Proxy /api/v1/notifiers requests to vmalert in the same manner, how it's done for other vmalert endpoints

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9267
2025-06-30 18:07:33 +02:00
Andrii Chubatiuk
58b5b7b1c1 lib/awsapi: fixed static AWS credentials precedence
Previously lib/awsapi used incorrect precedence of authorization methods.

 This commit aligns this behaviour  with aws-sdk.

 Also, it fixes usage of `AWS_ROLE_ARN` env variable. It can only be used with web identity file and it must be ignored for any other configurations. 32873b942d/config/resolve_credentials.go (L121)

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9168
2025-06-30 17:50:54 +02:00
Nikolay
9aba879620 lib/storage: properly optimise .+|^$ regex
Previously, regex `.+|^$` was optimised into `.+`. But in fact, it must
optimised into `.*`. Since this regex matches any non empty symbols and
empty value. Which combines into any input.

 This commit adds a special case of isDotStar check, which covers this
expression.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9290
2025-06-30 17:50:54 +02:00
Nikolay
e26340c385 app: add vlagent component
This commit introduces new component - VictoriaLogs Agent (vlagent).

It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
2025-06-30 17:50:54 +02:00
Alexander Frolov
9a7a71cf4e lib/mergeset: misuse of unsafe.Pointer
It appears the `lib/mergeset.Item` violates the 3rd rule
https://pkg.go.dev/unsafe#Pointer

> (3) Conversion of a Pointer to a uintptr and back, with arithmetic.
> 
> Unlike in C, it is not valid to advance a pointer just beyond the end
of its original allocation:
> ```
> // INVALID: end points outside allocated space.
> b := make([]byte, n)`
> end = unsafe.Pointer(uintptr(unsafe.Pointer(&b[0])) + uintptr(n))
> ```


`CGO_ENABLED=0 go test -trimpath -gcflags=all=-d=checkptr -run
TestTableAddItemsSerial`
```
fatal error: checkptr: pointer arithmetic result points to invalid allocation


 This commit adds a special path to item encoding, that prevents invalid memory references.

 See this PR for details:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9264
2025-06-30 17:50:37 +02:00
hagen1778
27932a344b docs: mention vmui changes separately in vlogs and vm changelogs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 21878f0760)
2025-06-30 15:02:47 +02:00
hagen1778
b47f26c81e docs: add changelog after 71eef98b0e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit aca321f235)
2025-06-30 14:56:31 +02:00
hagen1778
e005516154 dashboards: make dashboards-sync
follow-up after dc08b590c4

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 0b2e976706)
2025-06-30 14:27:57 +02:00
Vadim Rutkovsky
7d204c12c7 dashboards: set equal heights for each row (#9298)
### Describe Your Changes
Several rows in cluster dashboard have 7 height, although the rest are 8
height. Setting all rows to 8 to be consistent

![Untitled](https://github.com/user-attachments/assets/16ba9601-9005-45f2-a01f-4b5fd59cd5b0)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Depends on: #9253

(cherry picked from commit dc08b590c4)
2025-06-30 14:27:57 +02:00
Artur Minchukou
eaec2fcea1 app/vmui: add crossorigin attribute to manifest links
The change should solve the problem with loading manifest when vmui is behind reverse-proxy
with basic auth enabled.

(cherry picked from commit 71eef98b0e)
2025-06-30 14:27:56 +02:00
Artem Navoiev
43cdf2dec4 docs: enterprise - clarify FIPS builds (#9310)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
(cherry picked from commit 6bc48ce669)
2025-06-30 14:27:56 +02:00
hagen1778
caf2886f22 app/vmalert/notifier: follow-up a9b641531b
* reduce time wait on notifiers update
* deterministically sort notifiers() call. This improves and tests and output for users

a9b641531b
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 482ae8135f)
2025-06-30 14:27:56 +02:00
Andrii Chubatiuk
d1f75d06e8 docs/anomaly-detection: mark import_json_path as deprecated (#9283)
### Describe Your Changes

as noticed @f41gh7 in [operator
PR](https://github.com/VictoriaMetrics/operator/pull/1427#discussion_r2164171944)
vm writer `import_json_path` became deprecated starting
[v1.19.2](cc2cd0e084).
Adding this information to docs

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit 393398996a)
2025-06-27 15:55:55 +02:00
Mikhail
9dbdd67d01 docs: fix typo in multi-region setup guide (#9297)
### Describe Your Changes

Just a typo correction in docs

(cherry picked from commit 7703a95709)
2025-06-27 15:55:55 +02:00
Max Kotliar
ae9d92a981 docs/victoriametrics/changelog: Add changelog for merged PR "Log errors during cache restore from file" (#9296)
### Describe Your Changes

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8952

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit 114d657670)
2025-06-27 15:55:55 +02:00
Hui Wang
3a84a97d2c vmalert: fix duplicated metrics for dynamically discovered notifier… (#9285)
…s via Consul and DNS

fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9260.

Bug was introduced in
[v1.114.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.114.0).

(cherry picked from commit a9b641531b)
2025-06-27 15:55:54 +02:00
hagen1778
cfdb601841 dashboards: warn about correlation betweem concurrent requests and mem usage in panel description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 536df52682)
2025-06-27 11:36:28 +02:00
Zakhar Bessarab
93c8fdf28a lib/backup/s3remote: retry ExpiredToken errors (#9286)
Tolerate token expiration as it might be handled ty token rotation
automatically when using automatic token rotation with EKS Pod Identity
and similar options.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9280

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2025-06-27 13:08:56 +04:00
Roman Khavronenko
a96869c606 app/vmctl: print import error before pipe error (#9288)
See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9220#issuecomment-3001880833:
```
2025-06-24T17:27:47.062Z      error VictoriaMetrics/app/vmctl/backoff/backoff.go:60 got error: failed to write into "http://vminsert-vmcluster.metrics.svc.cluster.local:8480/insert/0/prometheus": io: read/write on closed pipe on attempt: 1; will retry in 2s
```

This error could happen only if pipe-reader was closed before we tried
to write into pipe-writer. But in this case we won't see the error
explaining why pipe-reader was closed. In this change, we checking if
there was a pipe-reader error before checking pipe-writer error.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6e7df578c4)
2025-06-27 10:57:04 +02:00
hagen1778
273b454aa7 docs: whitelist /admin/ requests in vmauth config for cluster VM
Besides `/select/`-prefixed requests vmauth also does `/admin/` requests
to fetch tenant IDs. Without whitelisting, such requests will fail to route.

I am adding this change only to generic vmauth configs where /admin/ access
is expected to be by default.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 4a7b4ae852)
2025-06-27 10:57:04 +02:00
Jose Gómez-Sellés
712c35b161 Migrate docs to cloud repo (#9230)
This is ready, since https://github.com/VictoriaMetrics/cloud/pull/3229
is merged .

This PR deletes the cloud docs folder after moving it into the private
cloud repo. The main reason is to keep things tidy and handle reviews in
a better way, as the helm charts repo is doing.

Future updates should be protected since the github actions file with
rsync command should not be messing with this folder when updating
everything.

### Checklist

The following checks are **mandatory**:

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit 65087f08c4)
2025-06-27 10:57:04 +02:00
Artur Minchukou
e8a9c9cd34 app/vmui/logs: update the color of the VictoriaLogs favicon to make the color different from VictoriaMetrics (#9270)
### Describe Your Changes

Updated the favicon color for Victoria Logs that the icon could be
distinguished from Victoria Metrics. Took the same color as in
victorialogs-datasource plugin.

![image](https://github.com/user-attachments/assets/521fc747-8ea7-43c7-a719-5434fe39ab06)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit ca0479fff3)
2025-06-26 10:41:39 +02:00
Yury Molodov
bfcfc78df2 victorialogs/docs: fix invalid stats example in "Comments" section (#9272)
### Describe Your Changes

The example in the VictoriaLogs docs (Comments section) is currently
missing an aggregate function in the `stats` pipe:

```logsql
error                       # find logs with `error` word
  | stats by (_stream) logs # then count the number of logs per `_stream` label
  | sort by (logs) desc     # then sort by the found logs in descending order
  | limit 5                 # and show top 5 streams with the biggest number of logs
```

However, `stats by (_stream) logs` is invalid syntax - `logs` is
interpreted as a function name, which causes a parsing error.

This fix replaces it with a valid version using `count()`:

```logsql
| stats by (_stream) count() as logs
```

Without `count()`, the query fails with:

```
 cannot parse 'stats' pipe: unknown stats func "logs"
```

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
(cherry picked from commit 644c7a97c8)
2025-06-26 10:41:38 +02:00
Roman Khavronenko
a04ef0fe98 dashboards: fix adhoc filters for vmalert and vmagent (#9271)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8657

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 098cba5b73)
2025-06-26 10:41:38 +02:00
Hui Wang
a55a133b2a vmalert: fix data race in replay ut (#9278)
see https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9265

(cherry picked from commit 5d0e8c0d1b)
2025-06-26 10:41:38 +02:00
Aliaksandr Valialkin
dedaa31f74 lib/logstorage: add tests, which verify that NaN and Inf values cannot be parsed by tryParseFloat64
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474
2025-06-25 23:01:08 +02:00
Aliaksandr Valialkin
aa73a281ff docs/victorialogs/LogsQL.md: document that sum() and avg() returns NaN when all the field values are non-numeric
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474
2025-06-25 22:52:17 +02:00
hagen1778
b7b2029682 lib/prombpmarshal: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit deec361a64)
2025-06-24 22:11:26 +02:00
Roman Khavronenko
c3b404c505 lib/prombp{marshal}: support metadata in remote write protocol (#9124)
This change adds support for parsing and sending Metadata field in
Prometheus Remote Write protocol. It implements the first step for
Metadata support.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit fd4dce81ce)
2025-06-24 22:01:21 +02:00
hagen1778
b4fe1e8be0 docs: fix various typos and grammar errors
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5431696d83)
2025-06-24 22:01:21 +02:00
hagen1778
e759749b99 docs: fix unresolved link in cluster docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1b71184bfb)
2025-06-24 22:01:21 +02:00
Phuong Le
af21ed2b56 VictoriaLogs: add a High Availability section to the cluster documentation. (#9247) 2025-06-24 18:55:18 +02:00
hagen1778
8d86829e9b docs: rm integrations section from single-node readme
* move netdata and carbon-api integrations to a dedicated `Integrations` page
* drop https://github.com/aorfanos/vmalert-cli as it seems stale

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 369f3f0da1)
2025-06-24 17:00:04 +02:00
hagen1778
6b07d7ce8e docs: minor typo fix
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6f93f0e1a7)
2025-06-24 17:00:04 +02:00
Roman Khavronenko
2981232aa0 docs: update vmctl docs (#9257)
* split migration mods in separate sub-section. This removes conflicting
#-anchors and makes it easier to read&modify in future
* remove duplicating wording
* simplify texts

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8964

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 96b773198f)
2025-06-24 17:00:03 +02:00
Andrii Chubatiuk
9601bdafb4 app/vmalert: add /api/v1/notifiers endpoint and datasource_type query argument filter for /api/v1/rules and /api/v1/alerts endpoints (#9046)
### Describe Your Changes

added /api/v1/notifiers endpoint and `datasource_type` query argument
for `/api/v1/rules` and `/api/v1/alerts` API endpoints to filter groups
and rules by datasource type. required for
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8989

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8537

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 006381266b)
2025-06-24 17:00:03 +02:00
Max Kotliar
5cda49df7b deployment/docker: Update grafana image version due to CVE issue (#9258)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9207

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit ae1dffe5d3)
2025-06-24 17:00:03 +02:00
Nikolay
43746d59bf lib/promscrape: remove duplicate targets from service-discovery
Previously, if annotation was update on object, it could result into
duplicate targets register for dropped targets service-discovery page.

 Mostly it affects endpoint annotations update. Endpoint holds
annotation with last update time. If any pod that belongs to the given
annotation changed, it causes duplication targets for all pod backed by
the endpoint. It makes service-discovery debug page hard to use.

 This commit excludes `__metadata_kubernetes_*_annotation_` from key
generation for dropped targets map. Instead it updates target with new
labels value.

  It may lead to some targets collision, but
since hash function already could produce collisions, it should not be a
problem.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8626
2025-06-24 11:41:01 +02:00
f41gh7
3f007655a5 docs: mention v1.120.0 release
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-06-23 16:37:57 +02:00
f41gh7
92f8255539 docs: mention new LTS releases
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-06-23 16:37:57 +02:00
Aliaksandr Valialkin
3b821d449c lib/storage: move testing/synctest usage into separate files ending with _synctest_test.go
The files with synctest usage are built only if the goexperiment.synctest build tag is set.
This allows running `go test ./lib/...` and `go vet ./lib/...` without the need to pass GOEXPERIMENT=synctest to them.

This is a follow-up for d33d7e20be
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8848
2025-06-21 12:38:16 +02:00
Aliaksandr Valialkin
48a0e0fa0d app/vlinsert/journald: add a benchark for the isValidFieldName() function 2025-06-21 12:31:28 +02:00
Aliaksandr Valialkin
2d4b2307de lib/fasttime: remove performance overhead when UnixTimestamp() is called from benchmark loops
Move the synctest-related implementation into a separate file protected with go:build goexperiment.synctest.

This is a follow-up for the commit 06c26315df
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8740
2025-06-21 12:31:27 +02:00
Aliaksandr Valialkin
0563e38a90 docs/victorialogs/data-ingestion/Journald.md: mention that _TRANSPORT and _SYSTEMD_USER_UNIT fields are also good candidates for log streams
Thanks to @septatrix for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9143#issuecomment-2983752442
2025-06-20 23:27:10 +02:00
Aliaksandr Valialkin
5c6c71c954 docs/victorialogs/CHANGELOG.md: typo fixes in links to stats functions 2025-06-20 22:26:08 +02:00
Aliaksandr Valialkin
87124a5166 deployment: update VictoriaLogs Docker image tags from v1.23.3-victorialogs to v1.24.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.24.0-victorialogs
2025-06-20 21:31:41 +02:00
Aliaksandr Valialkin
8b62a8b8b0 docs/victorialogs/CHANGELOG.md: cut v1.24.0-victorialogs 2025-06-20 21:20:47 +02:00
Aliaksandr Valialkin
cd7c919bac app/vlselect/vmui: run make vmui-logs-update after 9e60fc8fc8 2025-06-20 21:19:23 +02:00
Artur Minchukou
c3da5041ad app/vmui/logs: move compact mode of VictoriaLogs table to separate tab (#8745)
### Describe Your Changes

Related issue: #7047 and #8438

- removed compact mode of VictoriaLogs table
- added sorting by field to JSON tab 

![image](https://github.com/user-attachments/assets/c02bbbed-cf2c-41e3-86fe-97bc205654a5)
- added sorting of field names to JSON tab


### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-06-20 17:27:10 +02:00
f41gh7
3890fc9ef6 make vmui-update 2025-06-20 17:05:05 +02:00
f41gh7
c63efaffa6 CHANGELOG.md: cut v1.120.0 release 2025-06-20 16:41:06 +02:00
f41gh7
b6c9b23c22 docker/Makefile: add EXTRA_TAG_SUFFIX variable to buildx publish
New variable should help to change docker image tag name during release prepare.

Instead of publishing an exact version, like v1.1.0, it should use suffixed version v1.1.0-rc1.
Which should be re-tagged later, when release will be promoted to stable.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9136

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-06-20 16:39:40 +02:00
Fred Navruzov
f45be9205e docs/vmanomaly: release v1.24.1 (#9241)
### Describe Your Changes

Patch release doc updates for vmanomaly (v1.24.1)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-20 16:39:39 +02:00
Alexander Frolov
672c954166 lib/httpserver: option for disabling HTTP keep-alives (#9125)
### Describe Your Changes

Some network configurations may not work optimally with long-lived
connections. \
Address the issue described by #2395 for other components.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: Aleksandr Frolov <fxrlv@nebius.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit b91d249a29)
2025-06-20 12:43:55 +02:00
Hui Wang
c709ba8d91 vmalert: correct the rule evaluation timestamp if the system clock is… (#9228)
… changed during runtime

Strip the monotonic clock with `t.Round(d)` or `t.Truncate(d)` before
apply `Sub()`, to force the wall clock usage which corrects the rule
evaluation timestamp if the system clock is changed during runtime.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8790

---------

Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit 7b274e0d6d)
2025-06-20 12:42:20 +02:00
Benjamin Nichols-Farquhar
ac21580ec7 vmalert: respect group.concurrency in replay mode (#9214)
### Describe Your Changes

Revival and modification of original PR
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8762 after
discussion on
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7387.

`group.concurrency` is now respected if and only if
`-replay.rulesDelay=0` rather than always. This allows rules to be run
concurrently without ambiguity about rule chaining. If
`-replay.rulesDelay` is set greater than zero concurrency is still
ignored. This will be the default behavior since it defaults to 1s.

Implementation considerations:

I chose to add split some simple logic into a helper function in
preparation for adding `replay.singleRuleEvaluationConcurrency` in a
follow up PR as thats we're we can share that logic.

cc @Haleygo
### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Co-authored-by: Hui Wang <haley@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit b3c92540e5)
2025-06-20 12:42:20 +02:00
Artem Fetishev
9bfc2d8af2 lib/storage: Change BenchmarkHeadPostingForMatchers to use global index time range explicitly (#9233)
During the data retrieval, VictoriaMetrics switches to global index
search if the time range is > 40 days. Prior
ba0d7dc2fc, this switch was handled in
indexDB code. That commit moved that logic to the storage level. As the
result, indexDB will search per-day index for whatever time range that
is passed to it.

This broke the BenchmarkHeadPostingForMatchers. It now shows very slow
indexDB performance compared to v1.111.0 (the last release before that
commit). This is because now indexDB tries to search 55 years of data by
spawning a separate goroutine for each day. Even though the data was
inserted only for the current day, running that many goroutines
significantly slows down the search.

The fix is to use global index time range explicitly.

Fixes #9203 

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-06-20 11:57:54 +02:00
Benjamin Nichols-Farquhar
d562d8f4f0 vmstorage: add flag for storage metricName cache tuning (#9156)
Similar to other storage caches `storage/metricName` can be very
important to performance, however it is not tunable independently like
other caches.

In high cardinality setups where a large amount of that cardinality is
actively queried we can see a high `metricName` miss rate.

The only way to correct this is to increase available memory either by
provisioning more or by increasing `memory.allowedPercent`, which is
often expensive or undesirable for stability reasons.

Its possible to work around this by increasing `memory.allowedPercent`
and then adjusting `storage.cacheSizeIndexDBDataBlocks` and
`storage.cacheSizeStorageTSID` down as they are the largest caches, but
this is a less ideal solution than being able to directly control this
cache size.

Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8843
2025-06-20 11:37:11 +02:00
Artem Fetishev
6590c5639b lib/storage: extract adding metricIDs to the pendingHourEntries into a separate func (#9138)
This change has been requested to be done in a separate PR during the
partition index review. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8134#discussion_r2131692079

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-06-20 10:15:09 +02:00
Max Kotliar
d153b50767 lib/logstorage: clarify comment on writeBlockResultFunc usage constraints (#9235)
### Describe Your Changes

The `DataBlock` contains structs with string fields, and while the
original comment mentioned not holding references to `br`, it wasn't
immediately clear that this also applies to fields like strings within
the data.

This change clarifies that the `writeBlockResultFunc` must not retain
references to any part of `br`, including its fields. This makes it
explicit that even seemingly safe types like strings must be copied if
needed.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 16:31:54 +02:00
Phuong Le
67f679cb71 vlogs: fix inconsistent type for HTTP request duration metric (#9225)
The 'select' HTTP request duration metric is currently implemented as a
summary, while the 'insert' HTTP request duration uses a histogram.

All time series under the same metric name must use the same metric
type. Using the summary type for this metric is preferable, as it
significantly reduces the number of unique time series generated. While
summaries have the limitation of not supporting accurate aggregation
across multiple instances or paths, this trade-off is more manageable
than dealing with a high-cardinality explosion caused by histograms.
2025-06-19 14:49:57 +02:00
Aliaksandr Valialkin
66a177f7dc lib/logstorage: provide standard string representation for all the priority and severity levels in Syslog and Journald protocols inside the "level" field
It is better from usability PoV to provide string representation for all the priority and severity levels
instead of merging some of them into a common groups.

This is requested at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9209
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535
2025-06-19 14:40:24 +02:00
Aliaksandr Valialkin
536393a16c app/vlinsert/journald: follow-up for 01d413873e
- Add more tests, which cover various edge cases for binary encoding of log field value in journald format
- Moved common code for reading the next line to fieldsBuf.value. This simplifies the code a bit.
- Added more comments, which try explaining the braindead logic for parsing binary-encoded log field values in journald format

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9070
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9153
2025-06-19 13:54:18 +02:00
Andrii Chubatiuk
f6c5701321 app/vlinsert/journald: fixed binary value parsing (#9234)
### Describe Your Changes

- removed unneeded loop for binary field value size extraction, since it
should not be delimited by multiple `\n` symbols
- changed loop condition

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 13:48:33 +02:00
Aliaksandr Valialkin
0a5c6196bc app/vlinsert: do not trim prefixes from "/insert/*" URL paths
This improves maintainability and readability of the code, since it simplifies searching
for the particular request handler by its' full path.
2025-06-19 08:48:07 +02:00
Artur Minchukou
b03370185b app/vmui/logs: fix missing field values in auto-complete (#8799)
### Describe Your Changes

Related issue: #8749 
Added query to requests `/field_names` and `/field_values` to get more
precision results:
 
- If `filterName` is `_msg` or `_stream_id`, the query cannot be
generated specifically,
    so a wildcard query (`"*"`) is returned.
 
- If `filterName` is `_stream`, the query is generated using regexp
(`{type=~"value.*"}`).
 
- If `filterName` is `_time`, a simplified query is created by trimming
the value up
    to the first occurrence of a delimiter such as `-` or `:`.
 
- For all other values of `filterName`, a prefix query is returned using
    the `query` value with a `*` appended (e.g., `"value*"`).

Related issue: #8806 
Enhanced autocomplete with parsed field suggestions from unpack pipe.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 01:11:06 +02:00
Artur Minchukou
cd3e34b1fe app/vmui/logs: ensure the live tailing tab automatically reconnects on connection loss (#9162)
### Describe Your Changes

Related issue: #9129 

Added restarting the live tailing tab on connection loss

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 01:08:39 +02:00
Yury Molodov
86a4721cf9 vmui/logs: fix hits chart not updating on tenant change (#9171)
### Describe Your Changes

Adds `tenant` to the dependency array for log hits fetching to ensure
the chart updates when `AccountID` or `ProjectID` changes.
Related issue: #9157 

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 01:06:44 +02:00
Aliaksandr Valialkin
69b4decdd2 docs/victorialogs/CHANGELOG.md: document dc2da9a71b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9200
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9201
2025-06-19 01:03:21 +02:00
Vadim Alekseev
bad9284193 app/logstorage: optimize pipes after appending limit pipe (#9201)
### Describe Your Changes

This PR adds a call to optimize pipes after the `limit` pipe has been
appended.

Related: #9200

While this approach is not ideal, since it forces us to re-optimize all
pipes, but it is simpler.

An alternative would be to reapply only the relevant optimizations
specifically for this case, something like:

```go
func (q *Query) AddPipeLimit(n uint64) {
	if len(q.pipes) > 0 {
		ps, ok := q.pipes[len(q.pipes)-1].(*pipeSort)
		if ok {
			if ps.limit == 0 || n < ps.limit {
				ps.limit = n
			}
			return
		}
		pu, ok := q.pipes[len(q.pipes)-1].(*pipeUniq)
		if ok {
			if pu.limit == 0 || n < pu.limit {
				pu.limit = n
			}
			return
		}
	}
	q.pipes = append(q.pipes, &pipeLimit{
		limit: n,
	})
}
```

### Checklist

The following checks are **mandatory**:

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 00:58:03 +02:00
Artur Minchukou
95cc28fc22 app/vmui/logs: improve usability of the live tailing tab (#9194)
### Describe Your Changes

Related issue: #9130 

- add `ScrollToTopButton` component for better navigation UX and
integrate it into Live Tailing view;
 
<img width="1293" alt="image"
src="https://github.com/user-attachments/assets/98a96ac8-fe2b-43fa-a470-a51f68df2e01"
/>

- replace log throttling logic with the new `LogFlowAnalyzer` utility
for enhanced performance state tracking;
- enhance `GroupLogsItem` with a copy-to-clipboard feature, which allows
to copy only one log object;
 
<img width="1268" alt="image"
src="https://github.com/user-attachments/assets/c42ad5e1-0d02-456e-b231-d42064f48eb4"
/>

 - update styles for better responsiveness and aesthetics;
- change raw json view into expandable view in the live tailing tab by
default and keep the value of this setting in the local storage, it
became possible thanks to [this
issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9130#9083).
  
<img width="554" alt="image"
src="https://github.com/user-attachments/assets/6bcda0f1-dd2f-4608-8e0d-f9c0bc6efac3"
/>



Short demo: 


https://github.com/user-attachments/assets/351133a7-f3ef-41ad-b23d-cc12f030a357


### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 00:55:14 +02:00
Nikolay
dd35bd2682 app/vlinsert: remove vlstorage dependency for vlinsert (#9221)
vlinsert package could be used as an imported dependency. Mostly it's
needed for vlagent, but it could be a case for other applications.

 This commit introduces new interface for vlinsertutil package, which
represents actual storage for log rows ingestion.

 It includes CanWriteData, which also could be used to introduce
 back-pressure mechanism for vlinsert clients.

Related PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9034

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 00:34:37 +02:00
Nikolay
4704ef52cd lib/logstorage: properly iterate over ForEachRow (#9222)
Previously, ForEachRow always reset last row fields after iteration.
It makes impossible concurrent iteration with forEachRow, since
ForEachRow performed hidden mutation of LogRows.

 This commit resolves this issue by removal of fields reference.

Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076
2025-06-19 00:25:27 +02:00
Aliaksandr Valialkin
cc18a9d3ed lib/logstorage: follow-up for 5d06c74e2b
Move the lex.isQuotedToken() check to the top of the lexer.isInvalidQuotedString() function
in order to simplify understanding the code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9219
2025-06-19 00:19:54 +02:00
Andrii Chubatiuk
5e2ec11cbd lib/logstorage: fix panic when not paired quotes are passed as a pipe value (#9219)
### Describe Your Changes

nextToken method, which is called prior to getCompoundTokenExt already
unquotes string, paired quotes check inside getCompoundTokenExt is
redundant
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 00:19:53 +02:00
Aliaksandr Valialkin
7f37cf4f93 app/vlinsert/journald: parse journald logs in streaming manner
This allows parsing unlimited number of logs in a single HTTP request,
without the need to buffer the logs in memory.

This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9070

Thanks to @AndrewChubatiuk for the initial pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9153

This commit is based on the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9153 .
It contains the following changes comparing to the original pull request:

- Remove ugly function LineReader.NextLineWithLineFn(). Instead, uglify the Journald parser a bit
  with hacky calls to LineReader.NextLine() in order to parse binary-encoded field values.
  This should preserve the maintainability of the LineReader, which is shared among multiple protocol parsers,
  under control, while keeping the complexity of Journald parsing inside the app/vlinsert/journald package.

- Fix a typo bug inside isNameValid() - `(r < '0' && r > '9')` must be written as `(r < '0' || r > '9')`.
  Rewrite isNameValid() into easier to understand code and rename it to isValidJournaldFieldName() for better readability.
  Add tests for this function.

- Remove mentioning of the -journald.maxRequestSize command-line flag from VictoriaLogs docs.

- Add the description of the fix to VictoriaLogs changelog.

- Properly increment errorsTotal metric on every journald parse error.

- Add missing protoparserutil.PutUncompressedReader(reader) call, so the reader could be re-used between client requests.

- Remove improperly working code, which tries continuing parsing the request stream after parse errors.
  It is impossible to recover reliably from journald parse errors related to reading the data from the request stream,
  since the journald protocol format is completely braindead. So it is better to immediately return the error
  to the client instead of trying to recover. The only errors, which could be recovered, are related to invalid field names / values.
  Such errors are logged with the WARN level and the corresponding fields are skipped.

- Fix incorrect storage of the re-used name and value strings into fb.fields. The contents of the name and value strings
  must be copied per every loop, which reads these strings from the request stream. Otherwise the contents of the previously
  added Name and Value fields into fb.fields will be overwritten on the next loop.

- Ensure that LineReader.Line is set to nil after LineReader.NextLine() returns false. This should prevent from subtle bugs
  when the LineReader.Line is read after LineReader.NextLine() returns false.
2025-06-18 23:49:13 +02:00
Fred Navruzov
cd0781b7f1 docs/vmanomaly: release 1.24.0 post-release updates (#9224)
### Describe Your Changes

Fixing typos and missing links after v1.24.0 doc update in #9191 

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-18 23:49:13 +02:00
Fred Navruzov
dbab7106cd Docs: vmanomaly release v1.24.0 (#9191)
### Describe Your Changes

> ⚠️  still draft, don't merge even if already approved

Docs update for vmanomaly v1.24.0 release with stateful service option

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-18 23:49:12 +02:00
Aliaksandr Valialkin
a7e65edf97 lib/logstorage: optimize OR filters where one of these filters is *
Such filters can be optimized to `*`. This avoid executing other OR filters.
For example, `foo or * or bar` is optimized to `*`, while `foo` and `bar` filters aren't executed.

Such filters are frequently generated by Grafana, so this should improve query performance there.
2025-06-18 16:54:01 +02:00
Aliaksandr Valialkin
6c4b2b52c3 lib/logstorage: properly parse unquoted regexp filters ending with *
Use getCompoundToken() instead of getCompoundFuncArg() for obtaining regexp filter value,
since getCompoundFuncArg() skips trailing '*' chars.

This allows detecting invalid queries in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8582 .
2025-06-18 16:54:01 +02:00
Zakhar Bessarab
2ecc6432e4 docs/changelog: add link to an issue after 971c759a
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-18 18:08:29 +04:00
Roman Khavronenko
d080b3635b app/vmalert: rename samples to series (#9204)
`Samples` could be confusing for users, especially for alerting rules:
* we say in docs that each returned "series" will create a new alert
* we say that there are "series fetched", so both columns should be
either "series" or "samples".

Let's rename it to `series` for consistency.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit ddd686c026)
2025-06-18 14:51:58 +02:00
Hui Wang
91a616d8a6 vmalert: do not break vmalert process under replay mode when rule use… (#9206)
…s `query` template, but only logging a warning

The `query` template is not supported in replay mode, because we perform
range queries on the rule’s expression, but not on the `query` template.
Previously, if user see error `query template isn't supported in replay
mode`, they need remove the `query` template from the rule for replay
mode.
Also, templating is only used for alerting rules. Replaying alerting
rules don't send notifications(rule annotations are included here) and
users mainly focus on the generated `ALERTS`, the `query` result is
trivial. This pull request shouldn't break things but simplifies the
usage of replay mode for the case.

related https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8746

(cherry picked from commit 0e8007a02b)
2025-06-18 14:51:58 +02:00
Dmytro Kozlov
97b8e51614 docs/vmctl: add information about the prometheus snapshot folder structure (#9208)
### Describe Your Changes

Users try to migrate from systems like Thanos or Prometheus using
`vmctl` and `promtheus` migration protocol, with questions about the
problems when they try to specify the snapshot, but `vmctl` shows `0`
series to be found for migration.
This issue happens because users specify the block folder instead of the
snapshot folder.
Added clarification about snapshot structure and its appearance with
multiple blocks inside.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit cbd76ac4dc)
2025-06-18 14:51:58 +02:00
Andrii Chubatiuk
dbcdc690b9 lib/streamaggr: fixed rate_avg and rate_sum when scrape interval is bigger than aggregation interval (#9170)
### Describe Your Changes

fixes #9017
additionally introduced testing/synctest library to cover this and cases
from previous releases related to aggregation windows and panics in
outputs with state

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 96b8213b0d)
2025-06-18 14:51:57 +02:00
Zakhar Bessarab
5e14617b5d packaging/make: fix archiving release binaries for cluster (#903)
* packaging/make: fix archiving release binaries for cluster

Cluster binaries for non-windows platforms did not include fips binaries, fix that by properly including binaries.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-18 16:06:26 +04:00
Phuong Le
9887dbe4d0 vlinsert/journald: fix timestamp parsing when using default time field (#9150)
See #9144 

VictoriaLogs was not correctly parsing timestamps from journald data
when using the default time field configuration. This caused
VictoriaLogs to look for a `_time` field instead of
`__REALTIME_TIMESTAMP` in journald data by default, resulting in
timestamps falling back to ingestion time rather than the actual log
timestamps.

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-18 13:01:01 +02:00
Aliaksandr Valialkin
fe0849e284 docs/victorialogs/CHANGELOG.md: move the BUGFIX entry to the correct place after the commit 9b21dc5a30
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9062
2025-06-18 12:43:14 +02:00
Aliaksandr Valialkin
5055e40a7b app/vlinsert/journald: properly read timestamp from __REALTIME_TIMESTAMP field by default
The CommonParams.TimeFields is initialized to []{"_time"} by default. This prevent from the proper usage of the -journald.timeField
as the default field for reading log timestamps.

This bug has been introduced in the commit a1a731eb61

While at it, make sure that every parsed log entry has its own timestamp if the timestamp couldn't be read from the log entry.
This provides reliable sort order of the log fields.
2025-06-18 12:43:14 +02:00
Aliaksandr Valialkin
041e3959a6 app/vlinsert/journald: use (_MACHINE_ID, _HOSTNAME, _SYSTEMD_UNIT) as default log stream fields for logs ingested via journald data ingestion protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9143
2025-06-18 12:43:13 +02:00
Will Sargent
d0ca27c360 deployment/docker/victorialogs: Fix container name to fluentbit-oltp (#9213)
### Describe Your Changes

The container name for oltp has the same name as the loki container.
Fixed.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-18 11:59:44 +04:00
Aliaksandr Valialkin
bbed6b88b4 docs/victorialogs: document facility_keyword field, which is automatically added by Syslog parser
The facility_keyword field has been added in the commit ff9cb3f821
2025-06-17 10:57:18 +02:00
Aliaksandr Valialkin
0fcb16b029 app/vlinsert: use string representation of log level at logs ingested into VictoriaLogs via syslog and journald protocols
It is better from usability PoV to use string representation for the 'level' log field
instead of numeric representation.

Remove the -journald.priorityAsLevel and -syslog.severityAsLevel command-line flags,
since there are zero practical reasons when the `level` log field shouldn't be initialized automatically.

Move the CHANGELOG description for this feature into the correct place at docs/victorialogs/CHANGELOG.md,
and make it more human-readable.

Document the 'level' log field at https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/
and at https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/

This is a follow-up for 50969ca780

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8553
2025-06-17 10:50:06 +02:00
Andrii Chubatiuk
b4ec479b47 app/vlinsert: introduced flags, that enable syslog severity and journald priority fields casting to a level field (#8553)
### Describe Your Changes

fixes #8535 

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-06-17 08:30:10 +02:00
Zakhar Bessarab
383c3383b3 app/vmbackupmanager: increase storage healthcheck duration (#902)
Storage node with large number of partitions (e.g. 150+ partitions with 12Tb of data) will take more than 30 seconds to start. This was causing vmbackupmanager restarts when running vmstorage colocated with vmbackupmanager in a single pod.

Use exponential backoff for retries and increase overall timeout for storage node healthcheck to 3 minutes to avoid vmbackupmanager restarts during storage node startup.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-16 17:51:10 +04:00
Zakhar Bessarab
46451027db lib/backup/azremote: do not use SAS token for copying objects (#9172)
Using SAS token is not required when copying data within a single
storage account. Using a plain object URL is sufficient for this case. A
single storage account is normally used to store backups so it is safe
to remove SAS tokens usage.

This fixes support of server-side copy when using managed identity as
authentication source as SAS token can only be generated when using
"shared key" type of credentials.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9131

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-16 11:30:52 +04:00
Dmytro Kozlov
6bc60e109c apptest/vmctl: implement integration tests for the remote read protocol (#9164)
Implemented integration tests for the remote read protocol (for both mode stream and default).
Removed old implementation from the vmctl.

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7700

Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
2025-06-16 07:16:03 +02:00
Andrii Chubatiuk
c93bd59a66 apptest: validate relabelling after reload (#9175)
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8946

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-06-13 14:38:28 +02:00
hagen1778
827aaa7591 dashboards: set Y-min and units for re-processing panel
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3382bbf285)
2025-06-13 13:53:50 +02:00
Fred Navruzov
2dbb339c8e docs/vmanomaly: release 1.23.3 (#9180)
### Describe Your Changes

Docs update to vmanomaly v1.23.3

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

(cherry picked from commit 5ec7cc5dd4)
2025-06-13 13:53:49 +02:00
Nick Yang
54067786fc app/vlinsert: support - timestamp for rsyslog
rfc5424 allows for `-` for timestamps when the syslog application cannot
get the current time (e.g., embedded devices that have not yet NTP'd),
and the log ingester should apply its timestamp in that case

RFC5424:
https://datatracker.ietf.org/doc/html/rfc5424#section-6.2.3

Related PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9062
2025-06-11 14:22:11 +02:00
Roman Khavronenko
306448c756 app/vmselect/promql: add rate_prometheus
support
[rate_prometheus](https://docs.victoriametrics.com/victoriametrics/metricsql/#rate_prometheus)
function, an equivalent to `increase_prometheus(series_selector[d]) / d`

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8901
2025-06-11 14:22:11 +02:00
f41gh7
b55956e113 vendor: update metricsql 2025-06-11 14:22:11 +02:00
Roman Khavronenko
8196526f90 app/vmselect: respect staleness markers when calculating rate and increase functions
The new behavior will interrupt rate/increase calculation if last sample
on the selected time window is a staleness marker,
making the series to disappear immediately instead of slowly fading
away.

See more details in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8891#issuecomment-2883119388.
2025-06-11 14:22:11 +02:00
Nikolay
bae92bf1b2 apptest: add base test cases for victoria-logs
This commit adds key concepts and ingestion protocol tests for
victoria-logs component.
2025-06-11 14:22:11 +02:00
Phuong Le
9e7dc65b8d spellcheck: run 2025-06-11 14:21:11 +02:00
Max Kotliar
3432e4b571 .github/workflows: allow codecov to report without failing CI build
### Describe Your Changes

The `fail_ci_if_error` flag only affects the upload step and does not
control
Codecov's status checks (e.g., codecov/patch, codecov/project). My prior
tests
did not surface this behavior.

Switched to 'informational' mode per Codecov docs to avoid blocking CI.
See:

https://docs.codecov.com/docs/common-recipe-list#set-non-blocking-status-checks

Tested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9146

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9139,
52022e482c
2025-06-11 14:21:11 +02:00
Artem Fetishev
402fe76e13 apptest/vmctl: migrate vmctl test for the vm-native migration process (#9059)
Moved the migration process via the native protocol (vmsingle to
vmsingle) test to the apptest folder, where all integration tests are
held

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7700

Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-06-10 16:11:20 +02:00
Andrii Chubatiuk
2f55d6b3af deployment/docker/victorialogs: fixed journald ignore filters in example (#9154)
### Describe Your Changes

removed trailing comma in ignoreFields, which led to _msg field removal,
since it is converted to empty string before filtering

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-10 13:56:10 +02:00
sforests
1853bba452 lib/storage:fix Less Method in tagFilter Struct (#9127)
### Describe Your Changes

This pull request addresses a bug in the Less method of the tagFilter
struct. The original implementation incorrectly assigned the value of
isCompositeB by calling tf.isComposite() instead of other.isComposite().
This caused both isCompositeA and isCompositeB to always have the same
value, leading to incorrect comparisons when determining the order of
tagFilter instances.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
2025-06-10 13:53:03 +02:00
Zakhar Bessarab
e3e4253e69 docs/changelog: backport changes for 1.110.11
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-10 14:03:53 +04:00
hagen1778
238418db70 dashboards: add panel Partitions scheduled for re-processing to VM cluster dashboard
It shows the amount of data scheduled for [downsampling](https://docs.victoriametrics.com/#downsampling)
 or [retention filters](https://docs.victoriametrics.com/#retention-filters).
 The new panel should help to correlate resource usage with background re-processing of partitions.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 780c67d139)
2025-06-10 09:30:44 +02:00
hagen1778
d49bbbfc49 dashboards: use RSS anon memory in per-component sections
Anonymous RSS memory usage per component type is more useful to observe
for finding anomalies.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9244557b6e)
2025-06-10 09:30:44 +02:00
Aliaksandr Valialkin
542cfe6d8d lib/logstorage: improve performance for isTokenChar() by using 256-byte lookup table
This increases performance for the isTokenChar() by up to 30%.

Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69
2025-06-09 20:59:55 +02:00
Aliaksandr Valialkin
b163487d5b lib/logstorage: call isTokenChar() for ascii chars passed to isTokenRune()
This improves isTokenRune() performance for ascii chars by up to 30%.

Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69
2025-06-09 20:59:54 +02:00
Aliaksandr Valialkin
9cd08be63d lib/prompbmarshal: make size() private methods, since they arent used outside lib/prompbmarshal 2025-06-09 19:32:45 +02:00
Aliaksandr Valialkin
d299f0008b lib/prompbmarshal: make marshalToSizedBuffer() private methods, since they arent used outside lib/prompbmarshal 2025-06-09 19:32:44 +02:00
Max Kotliar
6fe5049fce .github/workflows: allow codecov to report without failing CI build (#9139)
### Describe Your Changes

Currently, some PRs have a failed CI due to low code coverage reported
by Codecov. However, the team typically ignore this and relies on other
quality indicators such as thorough code reviews.

This change configures Codecov to continue posting coverage reports
without marking the build as failed.

It also helps reduce confusion for external contributors, who might
otherwise feel pressured to add unnecessary tests just to satisfy
Codecov requirements (for example
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9002#discussion_r2111651046).

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-09 17:51:48 +02:00
Zakhar Bessarab
d28926f7f7 docs/release-guide: fix link to release follow-up
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-09 16:00:36 +04:00
Zakhar Bessarab
32c3df2d35 docs: update version of VictoriaMetrics components to v1.119.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-09 15:57:03 +04:00
Zakhar Bessarab
cea68db3c3 deployment/docker: update versions of VictoriaMetrics components to v1.119.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-09 15:57:03 +04:00
Zakhar Bessarab
e4c1b99599 docs/victoriametrics/changelog: backport LTS changelog
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-09 15:15:41 +04:00
Artem Fetishev
d43791257f apptest: Add basic backup/restore integration test (#9133)
This commit adds a basic backup/restore test for vmsingle and vmcluster. A
more sophisticated was originally added to the partition index PR
(#8134) and was aimed to test backup/restore when switching back and
forth between legacy and partition index. During the code review it was
decided that it would be good to have a separate test as well since
legacy code will be removed in future and so will the test.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-06-09 12:33:58 +02:00
Fred Navruzov
385efc119e docs/vmanomaly: release v1.23.2 (#9135)
### Describe Your Changes

Update docs to vmanomaly release v1.23.2

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-09 12:26:41 +02:00
Fred Navruzov
f9127ced94 docs/vmanomaly: release v1.23.1 (#9132)
### Describe Your Changes

Update the docs to release v1.23.1

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-09 12:26:37 +02:00
Aliaksandr Valialkin
9065665f7d deployment/docker/builder/Dockerfile: download musl archives for cross-compilation from the local repository instead of musl.cc
This speeds up building the Go builder image significantly (from hours to a few minutes),
since the build speed was limited by the download speed from https://musl.cc , and this speed
was extremely slow (e.g. 10kb/s and slower).

This also improves build security, since the local mirror of musl.cc is under our control.
2025-06-08 13:48:18 +02:00
Aliaksandr Valialkin
c5d3132546 docs/victorialogs/querying/README.md: mention that web UI supports live tailing
This is a follow-up for the commit 231bfcf4cf

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8882
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7046
2025-06-08 09:14:02 +02:00
Aliaksandr Valialkin
b6175623d3 deployment: update base Docker image from alpine:3.21.3 to alpine:3.22.0
See https://alpinelinux.org/posts/Alpine-3.22.0-released.html
2025-06-08 08:15:09 +02:00
Aliaksandr Valialkin
aaa010d258 go.mod: update Go from 1.24.3 to 1.24.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.24.4+label%3ACherryPickApproved

This is a follow-up for 54dc9cc322
2025-06-08 08:03:27 +02:00
Aliaksandr Valialkin
9ff9842799 deployment: update Go builder from Go1.24.3 to Go1.24.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.24.4+label%3ACherryPickApproved
2025-06-08 08:02:00 +02:00
Aliaksandr Valialkin
eda1dc9df9 docs/victoriametrics/goals.md: clarify the main goal 2025-06-08 07:58:18 +02:00
Zakhar Bessarab
b5b6fbb11f docs/victoriametrics/changelog: cut v1.119.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-06 16:48:18 +04:00
Zakhar Bessarab
bcbd458698 app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-06 16:38:26 +04:00
DmitrySafonov
c677e7e10f app/promql/rollup_result_cache: include extra_filters in rollupCache key for multi-tenant support
### Describe Your Changes

This PR addresses two issues:

When tenant labels (e.g. vm_account_id, vm_project_id) are passed via
extra_filters, they were not included in the rollupCache key. This could
cause cache entries to be reused across different tenants, resulting in
incorrect query results.
If a tenant is specified only via extra_filters, and that tenant does
not exist in TenantsCached, it gets silently filtered out by
GetTenantTokensFromFilters, causing the query to fall back to a global
(non-tenant) query — which is likely unexpected and potentially unsafe.
This fix ensures correct tenant scoping and avoids unintended data
exposure or cache pollution.

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9001

---------

Co-authored-by: Zakhar Bessarab <me@zekker.dev>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2025-06-06 15:51:45 +04:00
Aliaksandr Valialkin
92b85c8ded lib/atomicutil: add CacheLineSize const equal to the size of CPU cache line, and use this const for padding against false sharing across the code base
This should reduce the waste of memory on the padding from 128 bytes to 64 bytes on GOARCH=amd64,
while preserving bigger padding for platforms with bigger cache line sizes.

See https://stackoverflow.com/questions/68320687/why-are-most-cache-line-sizes-designed-to-be-64-byte-instead-of-32-128byte-now

Thanks to @tIGO for the hint
2025-06-06 10:24:43 +02:00
Peter Gervai
12719b2827 docs: update LogsQL "field pipe" typo (#9037)
### Describe Your Changes

Typo? It's called "fields" pipe, not "field".

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

(cherry picked from commit b3d22403eb)
2025-06-06 10:00:53 +02:00
hagen1778
943ae6a422 docs: prettify the changelog
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 0fce51e3b4)
2025-06-06 10:00:52 +02:00
hagen1778
8c0b0aa72b docs: rm 11831-victoria-metrics-cluster-ig1-version dashboard
Remove the community-provided dashboard as it remains without updates
for a few years already. Recommending it may hurt user's experience.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9e118fe1ee)
2025-06-06 10:00:52 +02:00
Zakhar Bessarab
3ae8806b3a app/vmctl: enable dual-stack mode by default (#9119)
Dual stack mode is disabled by default in order to avoid accidentally
exposing components via IPv6 networks.

vmctl does not expose any endpoints and does not allow using default Go
flags as it is using `urfave/cli` lib.

This commit enables IPv6 support by default since there is no security
risks related to network configuration and this make vmctl easier to use
with default configuration.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9116

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 3553c60399)
2025-06-06 10:00:52 +02:00
Dmitry Ponomaryov
761ebfc580 app/vlinsert: add logging of skipped bytes for log lines exceeding insert.maxLineSizeBytes (#9082)
### Describe Your Changes

This change adds logging of the number of skipped bytes when a log line
exceeds the configured `insert.maxLineSizeBytes`.

it helps diagnose and tune systems dealing with oversized log records by
showing how much to increase the parameter for the log to fit in
storage.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-06 09:09:20 +02:00
Artur Minchukou
aac06155dd app/vmui/logs: optimize live tailing performance by limiting logs to 200 and notifying users (#9083)
### Describe Your Changes

Added a log limit if the 200 logs per second limit is reached and a
notification for the user asking them to add a filter to the query

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-06 09:03:15 +02:00
Nils K
1563dd1345 Fix typo of journald in CLI flag (#9112)
### Describe Your Changes

Fix swapped letters

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: Septatrix <24257556+septatrix@users.noreply.github.com>
2025-06-06 08:59:44 +02:00
Jose Gómez-Sellés
dc2d4389c2 docs/cloud: add explore data page (#9113)
### Describe Your Changes

This PR adds the explore section to the docs. It emphasizes on
explaining and linking assets for VMUI and
MetricsQL

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-06 08:54:54 +02:00
Fred Navruzov
25f9f0f5d4 docs/vmanomaly - missing updates for v1.23.0 (#9114)
### Describe Your Changes

Some of the missing doc updates after 1.23.0 release

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-06 08:54:54 +02:00
Zakhar Bessarab
643ca33841 lib/netutil/netutil: fix strings index check
### Describe Your Changes

Properly check precense of `/`, previously it was 
ignoring a case where "/" would be at the beginning of the string.

This is a follow-up for 00712b18

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-06 09:02:26 +04:00
Aliaksandr Valialkin
5bd490a147 docs/victorialogs/cluster.md: added missing -storageNode option in the example on how to disable /insert/* requests at vlselect
This is a follow-up for 41558066db

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9061
2025-06-05 18:44:47 +02:00
Aliaksandr Valialkin
64d1c1ac52 app/vmauth: add tests for the case when url_prefix ends with / and the requested path starts with /
This is needed for verifying https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9096
2025-06-05 18:38:50 +02:00
Phuong Le
2ad3a65c7c vlselect/vlinsert: allow disabling the vlinsert and vlselect endpoints (#9067)
## Problem

In vlcluster evel setups, components like vlselect can still accept and
forward /insert requests. The lack of strict endpoint control increases
the risk of human error and undermines deployment security boundaries.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9061

## Fix

Add flags to disable the vlinsert and vlselect endpoints. The
`-insert.disable` flag also disables the internalinsert endpoint.
Similarly for vlselect.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-05 18:38:50 +02:00
Fred Navruzov
c76ce08d5e docs/vmanomaly - release v1.23.0 (#9111)
### Describe Your Changes

Docs update for vmanomaly v1.23.0 release

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-05 18:38:49 +02:00
Nikolay
ff442827cf lib/storage: properly load metric_usage_tracker file content
Previously, if metric_usage_tracker file was corrupted. It prevented
VictoriaMetrics from start and required manual action. Corruption may
happen in various reasons, such as unclean shutdown of the process.

 This commit changes panic into error message, in the same way as other
caches do.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9074
2025-06-05 14:11:22 +02:00
Aliaksandr Valialkin
ae0339b462 lib/{storage,mergeset}: reduce the multi-CPU contention on global stats vars, which are updated during background merge
Background merge updates the global stats on the number of merged / deleted items. This may result in slowdown
when multiple goroutines update these global stats at frequent rate, since every goroutine must fetch the actual value
for the updated stats from slow memory on every update. It is much faster to count the needed stats locally per every goroutine
and then periodically updating the global stats (once per ~second).

Thanks to @tIGO for the intial implementation of this idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683/files#diff-95e28ae911944708f94f3bb31fa9ba8bc185dedc23ae6fb02a272c34b8f83244

This should help improving scalability of background merges on multi-CPU systems.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8682
2025-06-05 12:30:50 +02:00
Aliaksandr Valialkin
324992e6ff lib: make sure that frequently updated global counters are padded in order to protect from false sharing issues on multi-CPU systems
Go linker packs global variables close to each other in the memory. This may lead to false sharing (https://en.wikipedia.org/wiki/False_sharing)
among these variables if frequently updated vars are put close to mostly read-only vars like described
at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8682 .

This commit adds padding to frequently updated global vars. This guarantees that these variables are put into distinct CPU cache lines
comparing to the rest of global variables. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683#issuecomment-2943254119

Thanks to @tIGO for the intial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683
2025-06-05 11:40:45 +02:00
Zakhar Bessarab
12454ce2ef app/vmselect/netstorage/tenant_cache: fix inconsistent fetching of tenants list
### Describe Your Changes

Previously, any case when cache returned items was skipping lookup of
tenants at vmstorage nodes. This leaded to inconsistent results for
cases when cache contained items to cover only some part of requested
time range.

Fix this by forcing a cache item to cover full requested time range.
This forces cache hits to always be "full hits".


See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9042

Target branch for this PR is another PR related to the same issue -
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9048, this is in
order to avoid additional rebasing/merge as this PR will conflict with
cluster branch after initial PR merge. GH will change target for this pr
to cluster brance once #9048 will be merged.

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-05 11:08:36 +04:00
f41gh7
3dedd1ecf6 app/vmgateway: add support of mTLS for read and write backends
This commit introduces new flags for mTLS configuration:

```
  -read.tlsCAFile
  -read.tlsCertFile
  -read.tlsInsecureSkipVerify
  -read.tlsKeyFile
  -read.tlsServerName

  -write.tlsCAFile
  -write.tlsCertFile
  -write.tlsInsecureSkipVerify
  -write.tlsKeyFile
  -write.tlsServerName
```

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8841
2025-06-04 19:36:58 +02:00
Andrii Chubatiuk
ee45e010bb app/vmgateway: added more select routes
This commits adds additional vmselect routes.
Such as `/static`, `/api/v1/status/metric_names_stats` and others.

 In addition it properly redirects `/vmui` and `/vmalert` access endpoints requests. Such endpoints require to preserve trailing `/`. Previously it was omitted and redirect requests failed.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9003
2025-06-04 19:36:58 +02:00
Dmytro Kozlov
1ee1e379b2 apptest/vmctl: migrate vmctl test for the prometheus migration process (#9047)
Moved Prometheus migration process test to the apptest folder, where all
integration tests are held

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7700

Other tests will be migrated one by one.
2025-06-04 17:01:26 +02:00
Zakhar Bessarab
3c4eb0301d app/vmselect/netstorage: allow disabling cache for list of tenants
### Describe Your Changes

Properly respect passing `nocache=1` or using `search.disableCache` when
executing a query. Also allow disabling tenant cache separately in order
to make debugging easier.

Related: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9042

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-04 17:00:03 +04:00
Zakhar Bessarab
6bb675752b deployment/docker: do not update stable tag
### Describe Your Changes

Stop updating `:stable` tags as those are exactly the same as `:latest`
but not used by default by docker/podman and other commands.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-06-04 16:58:46 +04:00
Hui Wang
9e2fb644ae docs: minor fixes (#9090) 2025-06-04 16:58:46 +04:00
Hui Wang
dd97ce359a docs: minor fixes (#9090)
(cherry picked from commit 3c85ffb1e6)
2025-06-04 10:05:30 +02:00
hagen1778
31d3717efc docs: typo fix
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 65cb6468ac)
2025-06-04 10:05:29 +02:00
Robin Hayer
aad6b7ed43 lib/workingsetcache: log error when restoring cache from file (#8952)
### What this PR does

log error returned by `fastcache.LoadFromFile` before falling back to
creating a new cache instance. this improves observability and helps
detect problems like file corruption or permission issues early.

this replaces `fastcache.LoadFromFileOrNew` with a custom function
`loadFromFileOrNewWithLog` that explicitly logs errors encountered
during cache restoration.

---

### Related Issue

Closes #8934

---

### Test Plan

- manually tested by simulating a missing file scenario
- ensured expected log output on cache load failure
- verified normal cache creation fallback path

---

### Changelog

log error when cache fails to restore from file during workingsetcache
initialization (#8934)

---

### Checklist

- [x] Signed commits
- [x] Follows coding and commit message conventions
- [x] Tested manually
- [x] Scope limited to relevant change
- [x] Changelog entry added

Co-authored-by: Robin Hayer <rshayer95@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
(cherry picked from commit 8e645ea708)
2025-06-04 10:05:29 +02:00
412 changed files with 19127 additions and 8291 deletions

1
.gitignore vendored
View File

@@ -12,6 +12,7 @@
/victoria-logs-data
/victoria-metrics-data
/vmagent-remotewrite-data
/vlagent-remotewritewrite
/vmstorage-data
/vmselect-cache
/package/temp-deb-*

View File

@@ -11,6 +11,8 @@ ifeq ($(PKG_TAG),)
PKG_TAG := $(BUILDINFO_TAG)
endif
EXTRA_DOCKER_TAG_SUFFIX ?= EXTRA_DOCKER_TAG_SUFFIX
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(DATEINFO_TAG)-$(BUILDINFO_TAG)'
TAR_OWNERSHIP ?= --owner=1000 --group=1000
@@ -107,6 +109,31 @@ package: \
package-vmselect \
package-vmstorage
publish-final-images:
PKG_TAG=$(TAG) APP_NAME=victoria-metrics $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmagent $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmalert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmalert-tool $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmauth $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmbackup $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmrestore $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmctl $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vminsert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vmselect $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vmstorage $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=victoria-metrics $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmagent $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmalert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmauth $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmbackup $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmrestore $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vminsert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vmselect $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vmstorage $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmgateway $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmbackupmanager $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) $(MAKE) publish-latest
publish-latest:
PKG_TAG=$(TAG) APP_NAME=victoria-metrics $(MAKE) publish-via-docker-latest && \
PKG_TAG=$(TAG) APP_NAME=vmagent $(MAKE) publish-via-docker-latest && \
@@ -236,7 +263,7 @@ test-full:
test-full-386:
GOEXPERIMENT=synctest GOARCH=386 go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
integration-test: all
integration-test: all vmctl vmbackup vmrestore victoria-logs
go test ./apptest/... -skip="^TestSingle.*"
benchmark:

View File

@@ -1,14 +1,14 @@
# VictoriaMetrics
![Latest Release](https://img.shields.io/github/v/release/VictoriaMetrics/VictoriaMetrics?sort=semver&label=&filter=!*-victorialogs&logo=github&labelColor=gray&color=gray&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Freleases%2Flatest)
[![Latest Release](https://img.shields.io/github/v/release/VictoriaMetrics/VictoriaMetrics?sort=semver&label=&filter=!*-victorialogs&logo=github&labelColor=gray&color=gray&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Freleases%2Flatest)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics?label=&logo=docker&logoColor=white&labelColor=2496ED&color=2496ED&link=https%3A%2F%2Fhub.docker.com%2Fr%2Fvictoriametrics%2Fvictoria-metrics)
![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics?link=https%3A%2F%2Fgoreportcard.com%2Freport%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics)
![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml/badge.svg?branch=master&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Factions)
![codecov](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics/branch/master/graph/badge.svg?link=https%3A%2F%2Fcodecov.io%2Fgh%2FVictoriaMetrics%2FVictoriaMetrics)
![License](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics?labelColor=green&label=&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Fblob%2Fmaster%2FLICENSE)
[![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics?link=https%3A%2F%2Fgoreportcard.com%2Freport%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
[![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml/badge.svg?branch=master&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Factions)](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics/branch/master/graph/badge.svg?link=https%3A%2F%2Fcodecov.io%2Fgh%2FVictoriaMetrics%2FVictoriaMetrics)](https://app.codecov.io/gh/VictoriaMetrics/VictoriaMetrics)
[![License](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics?labelColor=green&label=&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Fblob%2Fmaster%2FLICENSE)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
![Slack](https://img.shields.io/badge/Join-4A154B?logo=slack&link=https%3A%2F%2Fslack.victoriametrics.com)
![X](https://img.shields.io/twitter/follow/VictoriaMetrics?style=flat&label=Follow&color=black&logo=x&labelColor=black&link=https%3A%2F%2Fx.com%2FVictoriaMetrics)
![Reddit](https://img.shields.io/reddit/subreddit-subscribers/VictoriaMetrics?style=flat&label=Join&labelColor=red&logoColor=white&logo=reddit&link=https%3A%2F%2Fwww.reddit.com%2Fr%2FVictoriaMetrics)
[![X](https://img.shields.io/twitter/follow/VictoriaMetrics?style=flat&label=Follow&color=black&logo=x&labelColor=black&link=https%3A%2F%2Fx.com%2FVictoriaMetrics)](https://x.com/VictoriaMetrics/)
[![Reddit](https://img.shields.io/reddit/subreddit-subscribers/VictoriaMetrics?style=flat&label=Join&labelColor=red&logoColor=white&logo=reddit&link=https%3A%2F%2Fwww.reddit.com%2Fr%2FVictoriaMetrics)](https://www.reddit.com/r/VictoriaMetrics/)
<picture>
<source srcset="docs/victoriametrics/logo_white.webp" media="(prefers-color-scheme: dark)">

View File

@@ -8,6 +8,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
@@ -44,6 +45,8 @@ func main() {
vlstorage.Init()
vlselect.Init()
insertutil.SetLogRowsStorage(&vlstorage.Storage{})
vlinsert.Init()
go httpserver.Serve(listenAddrs, requestHandler, httpserver.ServeOptions{

106
app/vlagent/Makefile Normal file
View File

@@ -0,0 +1,106 @@
# All these commands must run from repository root.
vlagent:
APP_NAME=vlagent $(MAKE) app-local
vlagent-race:
APP_NAME=vlagent RACE=-race $(MAKE) app-local
vlagent-prod:
APP_NAME=vlagent $(MAKE) app-via-docker
vlagent-pure-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-pure
vlagent-linux-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-amd64
vlagent-linux-arm-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-arm
vlagent-linux-arm64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-arm64
vlagent-linux-ppc64le-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-ppc64le
vlagent-linux-386-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-386
vlagent-darwin-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-darwin-amd64
vlagent-darwin-arm64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-darwin-arm64
vlagent-freebsd-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-freebsd-amd64
vlagent-openbsd-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-openbsd-amd64
vlagent-windows-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-windows-amd64
package-vlagent:
APP_NAME=vlagent $(MAKE) package-via-docker
package-vlagent-pure:
APP_NAME=vlagent $(MAKE) package-via-docker-pure
package-vlagent-amd64:
APP_NAME=vlagent $(MAKE) package-via-docker-amd64
package-vlagent-arm:
APP_NAME=vlagent $(MAKE) package-via-docker-arm
package-vlagent-arm64:
APP_NAME=vlagent $(MAKE) package-via-docker-arm64
package-vlagent-ppc64le:
APP_NAME=vlagent $(MAKE) package-via-docker-ppc64le
package-vlagent-386:
APP_NAME=vlagent $(MAKE) package-via-docker-386
publish-vlagent:
APP_NAME=vlagent $(MAKE) publish-via-docker
vlagent-linux-amd64:
APP_NAME=vlagent CGO_ENABLED=1 GOOS=linux GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-linux-arm:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=arm $(MAKE) app-local-goos-goarch
vlagent-linux-arm64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=arm64 $(MAKE) app-local-goos-goarch
vlagent-linux-ppc64le:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vlagent-linux-s390x:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vlagent-linux-loong64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
vlagent-linux-386:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch
vlagent-darwin-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-darwin-arm64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 $(MAKE) app-local-goos-goarch
vlagent-freebsd-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-openbsd-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-windows-amd64:
GOARCH=amd64 APP_NAME=vlagent $(MAKE) app-local-windows-goarch
vlagent-pure:
APP_NAME=vlagent $(MAKE) app-local-pure

3
app/vlagent/README.md Normal file
View File

@@ -0,0 +1,3 @@
See vlagent docs [here](https://docs.victoriametrics.com/victorialogs/vlagent/).
vlagent docs can be edited at [docs/vlagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victorialogs/vlagent.md).

View File

@@ -0,0 +1,8 @@
ARG base_image=non-existing
FROM $base_image
EXPOSE 9429
ENTRYPOINT ["/vlagent-prod"]
ARG src_binary=non-existing
COPY $src_binary ./vlagent-prod

97
app/vlagent/main.go Normal file
View File

@@ -0,0 +1,97 @@
package main
import (
"flag"
"fmt"
"net/http"
"os"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
)
var (
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vlagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
)
func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
buildinfo.Init()
remotewrite.InitSecretFlags()
logger.Init()
remotewrite.Init()
vlinsert.Init()
insertutil.SetLogRowsStorage(&remotewrite.Storage{})
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":9429"}
}
logger.Infof("starting vlagent at %q...", listenAddrs)
startTime := time.Now()
go httpserver.Serve(listenAddrs, requestHandler, httpserver.ServeOptions{
UseProxyProtocol: useProxyProtocol,
})
logger.Infof("started vlagent in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
sig := procutil.WaitForSigterm()
logger.Infof("received signal %s", sig)
pushmetrics.Stop()
startTime = time.Now()
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
vlinsert.Stop()
remotewrite.Stop()
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
logger.Infof("successfully stopped vlagent in %.3f seconds", time.Since(startTime).Seconds())
}
// RequestHandler handles insert requests for VictoriaLogs
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
fmt.Fprintf(w, "<h2>vlagent</h2>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/victorialogs/vlagent/'>https://docs.victoriametrics.com/victorialogs/vlagent/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"metrics", "available service metrics"},
{"flags", "command-line flags"},
})
return true
}
return vlinsert.RequestHandler(w, r)
}
func usage() {
const s = `
vlagent collects logs via popular data ingestion protocols and routes it to VictoriaLogs.
See the docs at https://docs.victoriametrics.com/victorialogs/vlagent/ .
`
flagutil.Usage(s)
}

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image=non-existing
ARG root_image=non-existing
FROM $certs_image AS certs
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 9429
ENTRYPOINT ["/vlagent-prod"]
ARG TARGETARCH
COPY vlagent-linux-${TARGETARCH}-prod ./vlagent-prod

View File

@@ -0,0 +1,462 @@
package remotewrite
import (
"bytes"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)
var (
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", 0, "Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data ")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", time.Minute, "Timeout for sending a single block of data to the corresponding -remoteWrite.url")
retryMinInterval = flagutil.NewArrayDuration("remoteWrite.retryMinInterval", time.Second, "The minimum delay between retry attempts to send a block of data to the corresponding -remoteWrite.url. Every next retry attempt will double the delay to prevent hammering of remote database. See also -remoteWrite.retryMaxTime")
retryMaxTime = flagutil.NewArrayDuration("remoteWrite.retryMaxTime", time.Minute, "The max time spent on retry attempts to send a block of data to the corresponding -remoteWrite.url. Change this value if it is expected for -remoteWrite.url to be unreachable for more than -remoteWrite.retryMaxTime. See also -remoteWrite.retryMinInterval")
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
"Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234")
tlsHandshakeTimeout = flagutil.NewArrayDuration("remoteWrite.tlsHandshakeTimeout", 20*time.Second, "The timeout for establishing tls connections to the corresponding -remoteWrite.url")
tlsInsecureSkipVerify = flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile = flagutil.NewArrayString("remoteWrite.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
tlsKeyFile = flagutil.NewArrayString("remoteWrite.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to the corresponding -remoteWrite.url")
tlsCAFile = flagutil.NewArrayString("remoteWrite.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. "+
"By default, system CA is used")
tlsServerName = flagutil.NewArrayString("remoteWrite.tlsServerName", "Optional TLS server name to use for connections to the corresponding -remoteWrite.url. "+
"By default, the server name from -remoteWrite.url is used")
headers = flagutil.NewArrayString("remoteWrite.headers", "Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. "+
"For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. "+
"Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'")
basicAuthUsername = flagutil.NewArrayString("remoteWrite.basicAuth.username", "Optional basic auth username to use for the corresponding -remoteWrite.url")
basicAuthPassword = flagutil.NewArrayString("remoteWrite.basicAuth.password", "Optional basic auth password to use for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArrayString("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for the corresponding -remoteWrite.url. "+
"The file is re-read every second")
bearerToken = flagutil.NewArrayString("remoteWrite.bearerToken", "Optional bearer auth token to use for the corresponding -remoteWrite.url")
bearerTokenFile = flagutil.NewArrayString("remoteWrite.bearerTokenFile", "Optional path to bearer token file to use for the corresponding -remoteWrite.url. "+
"The token is re-read from the file every second")
oauth2ClientID = flagutil.NewArrayString("remoteWrite.oauth2.clientID", "Optional OAuth2 clientID to use for the corresponding -remoteWrite.url")
oauth2ClientSecret = flagutil.NewArrayString("remoteWrite.oauth2.clientSecret", "Optional OAuth2 clientSecret to use for the corresponding -remoteWrite.url")
oauth2ClientSecretFile = flagutil.NewArrayString("remoteWrite.oauth2.clientSecretFile", "Optional OAuth2 clientSecretFile to use for the corresponding -remoteWrite.url")
oauth2EndpointParams = flagutil.NewArrayString("remoteWrite.oauth2.endpointParams", "Optional OAuth2 endpoint parameters to use for the corresponding -remoteWrite.url . "+
`The endpoint parameters must be set in JSON format: {"param1":"value1",...,"paramN":"valueN"}`)
oauth2TokenURL = flagutil.NewArrayString("remoteWrite.oauth2.tokenUrl", "Optional OAuth2 tokenURL to use for the corresponding -remoteWrite.url")
oauth2Scopes = flagutil.NewArrayString("remoteWrite.oauth2.scopes", "Optional OAuth2 scopes to use for the corresponding -remoteWrite.url. Scopes must be delimited by ';'")
)
type client struct {
sanitizedURL string
remoteWriteURL string
fq *persistentqueue.FastQueue
hc *http.Client
retryMinInterval time.Duration
retryMaxTime time.Duration
sendBlock func(block []byte) bool
authCfg *promauth.Config
rl *ratelimiter.RateLimiter
bytesSent *metrics.Counter
blocksSent *metrics.Counter
requestDuration *metrics.Histogram
requestsOKCount *metrics.Counter
errorsCount *metrics.Counter
packetsDropped *metrics.Counter
rateLimit *metrics.Gauge
retriesCount *metrics.Counter
sendDuration *metrics.FloatCounter
wg sync.WaitGroup
stopCh chan struct{}
}
func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Fatalf("cannot initialize auth config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tr := httputil.NewTransport(false, "vlagent_remotewrite")
tr.TLSHandshakeTimeout = tlsHandshakeTimeout.GetOptionalArg(argIdx)
tr.MaxConnsPerHost = 2 * concurrency
tr.MaxIdleConnsPerHost = 2 * concurrency
tr.IdleConnTimeout = time.Minute
tr.WriteBufferSize = 64 * 1024
pURL := proxyURL.GetOptionalArg(argIdx)
if len(pURL) > 0 {
if !strings.Contains(pURL, "://") {
logger.Fatalf("cannot parse -remoteWrite.proxyURL=%q: it must start with `http://`, `https://` or `socks5://`", pURL)
}
pu, err := url.Parse(pURL)
if err != nil {
logger.Fatalf("cannot parse -remoteWrite.proxyURL=%q: %s", pURL, err)
}
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: authCfg.NewRoundTripper(tr),
Timeout: sendTimeout.GetOptionalArg(argIdx),
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authCfg: authCfg,
fq: fq,
hc: hc,
retryMinInterval: retryMinInterval.GetOptionalArg(argIdx),
retryMaxTime: retryMaxTime.GetOptionalArg(argIdx),
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
return c
}
func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
limitReached := metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
if bytesPerSec := rateLimit.GetOptionalArg(argIdx); bytesPerSec > 0 {
logger.Infof("applying %d bytes per second rate limit for -remoteWrite.url=%q", bytesPerSec, sanitizedURL)
c.rl = ratelimiter.New(int64(bytesPerSec), limitReached, c.stopCh)
}
c.bytesSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_bytes_sent_total{url=%q}`, c.sanitizedURL))
c.blocksSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_blocks_sent_total{url=%q}`, c.sanitizedURL))
c.rateLimit = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_rate_limit{url=%q}`, c.sanitizedURL), func() float64 {
return float64(rateLimit.GetOptionalArg(argIdx))
})
c.requestDuration = metrics.GetOrCreateHistogram(fmt.Sprintf(`vlagent_remotewrite_duration_seconds{url=%q}`, c.sanitizedURL))
c.requestsOKCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_requests_total{url=%q, status_code="2XX"}`, c.sanitizedURL))
c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
c.sendDuration = metrics.GetOrCreateFloatCounter(fmt.Sprintf(`vlagent_remotewrite_send_duration_seconds_total{url=%q}`, c.sanitizedURL))
metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_queues{url=%q}`, c.sanitizedURL), func() float64 {
return float64(*queues)
})
for i := 0; i < concurrency; i++ {
c.wg.Add(1)
go func() {
defer c.wg.Done()
c.runWorker()
}()
}
logger.Infof("initialized client for -remoteWrite.url=%q", c.sanitizedURL)
}
func (c *client) MustStop() {
close(c.stopCh)
c.wg.Wait()
logger.Infof("stopped client for -remoteWrite.url=%q", c.sanitizedURL)
}
func getAuthConfig(argIdx int) (*promauth.Config, error) {
headersValue := headers.GetOptionalArg(argIdx)
var hdrs []string
if headersValue != "" {
hdrs = strings.Split(headersValue, "^^")
}
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
passwordFile := basicAuthPasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: promauth.NewSecret(password),
PasswordFile: passwordFile,
}
}
token := bearerToken.GetOptionalArg(argIdx)
tokenFile := bearerTokenFile.GetOptionalArg(argIdx)
var oauth2Cfg *promauth.OAuth2Config
clientSecret := oauth2ClientSecret.GetOptionalArg(argIdx)
clientSecretFile := oauth2ClientSecretFile.GetOptionalArg(argIdx)
if clientSecretFile != "" || clientSecret != "" {
endpointParamsJSON := oauth2EndpointParams.GetOptionalArg(argIdx)
endpointParams, err := flagutil.ParseJSONMap(endpointParamsJSON)
if err != nil {
return nil, fmt.Errorf("cannot parse JSON for -remoteWrite.oauth2.endpointParams=%s: %w", endpointParamsJSON, err)
}
oauth2Cfg = &promauth.OAuth2Config{
ClientID: oauth2ClientID.GetOptionalArg(argIdx),
ClientSecret: promauth.NewSecret(clientSecret),
ClientSecretFile: clientSecretFile,
EndpointParams: endpointParams,
TokenURL: oauth2TokenURL.GetOptionalArg(argIdx),
Scopes: strings.Split(oauth2Scopes.GetOptionalArg(argIdx), ";"),
}
}
tlsCfg := &promauth.TLSConfig{
CAFile: tlsCAFile.GetOptionalArg(argIdx),
CertFile: tlsCertFile.GetOptionalArg(argIdx),
KeyFile: tlsKeyFile.GetOptionalArg(argIdx),
ServerName: tlsServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: tlsInsecureSkipVerify.GetOptionalArg(argIdx),
}
opts := &promauth.Options{
BasicAuth: basicAuthCfg,
BearerToken: token,
BearerTokenFile: tokenFile,
OAuth2: oauth2Cfg,
TLSConfig: tlsCfg,
Headers: hdrs,
}
authCfg, err := opts.NewConfig()
if err != nil {
return nil, fmt.Errorf("cannot populate auth config for remoteWrite idx: %d, err: %w", argIdx, err)
}
return authCfg, nil
}
func (c *client) runWorker() {
var ok bool
var block []byte
ch := make(chan bool, 1)
for {
block, ok = c.fq.MustReadBlock(block[:0])
if !ok {
return
}
if len(block) == 0 {
// skip empty data blocks from sending
continue
}
go func() {
startTime := time.Now()
ch <- c.sendBlock(block)
c.sendDuration.Add(time.Since(startTime).Seconds())
}()
select {
case ok := <-ch:
if ok {
// The block has been sent successfully
continue
}
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
return
case <-c.stopCh:
// c must be stopped. Wait for a while in the hope the block will be sent.
graceDuration := 5 * time.Second
select {
case ok := <-ch:
if !ok {
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
}
case <-time.After(graceDuration):
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
}
return
}
}
}
func (c *client) doRequest(url string, body []byte) (*http.Response, error) {
req, err := c.newRequest(url, body)
if err != nil {
return nil, err
}
resp, err := c.hc.Do(req)
if err == nil {
return resp, nil
}
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
return nil, err
}
// It is likely connection became stale or timed out during the first request.
// Make another attempt in hope request will succeed.
// If not, the error should be handled by the caller as usual.
// This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
req, err = c.newRequest(url, body)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
resp, err = c.hc.Do(req)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
return resp, nil
}
func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
reqBody := bytes.NewBuffer(body)
req, err := http.NewRequest(http.MethodPost, url, reqBody)
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", url, err)
}
err = c.authCfg.SetHeaders(req, true)
if err != nil {
return nil, err
}
h := req.Header
h.Set("User-Agent", "vlagent")
h.Set("Content-Encoding", "zstd")
h.Set("Content-Type", "application/octet-stream")
return req, nil
}
// sendBlockHTTP sends the given block to c.remoteWriteURL.
//
// The function returns false only if c.stopCh is closed.
// Otherwise, it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.Register(len(block))
maxRetryDuration := timeutil.AddJitterToDuration(c.retryMaxTime)
retryDuration := timeutil.AddJitterToDuration(c.retryMinInterval)
retriesCount := 0
again:
startTime := time.Now()
resp, err := c.doRequest(c.remoteWriteURL, block)
c.requestDuration.UpdateDuration(startTime)
if err != nil {
c.errorsCount.Inc()
retryDuration *= 2
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
remoteWriteRetryLogger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
}
statusCode := resp.StatusCode
if statusCode/100 == 2 {
_ = resp.Body.Close()
c.requestsOKCount.Inc()
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 400 || statusCode == 404 {
logBlockRejected(block, c.sanitizedURL, resp)
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true
}
// Unexpected status code returned
retriesCount++
retryAfterHeader := parseRetryAfterHeader(resp.Header.Get("Retry-After"))
retryDuration = getRetryDuration(retryAfterHeader, retryDuration, maxRetryDuration)
// Handle response
body, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()
if err != nil {
logger.Errorf("cannot read response body from %q during retry #%d: %s", c.sanitizedURL, retriesCount, err)
} else {
logger.Errorf("unexpected status code received after sending a block with size %d bytes to %q during retry #%d: %d; response body=%q; "+
"re-sending the block in %.3f seconds", len(block), c.sanitizedURL, retriesCount, statusCode, body, retryDuration.Seconds())
}
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
}
var remoteWriteRejectedLogger = logger.WithThrottler("remoteWriteRejected", 5*time.Second)
var remoteWriteRetryLogger = logger.WithThrottler("remoteWriteRetry", 5*time.Second)
// getRetryDuration returns retry duration.
// retryAfterDuration has the highest priority.
// If retryAfterDuration is not specified, retryDuration gets doubled.
// retryDuration can't exceed maxRetryDuration.
//
// Also see: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6097
func getRetryDuration(retryAfterDuration, retryDuration, maxRetryDuration time.Duration) time.Duration {
// retryAfterDuration has the highest priority duration
if retryAfterDuration > 0 {
return timeutil.AddJitterToDuration(retryAfterDuration)
}
// default backoff retry policy
retryDuration *= 2
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
return retryDuration
}
func logBlockRejected(block []byte, sanitizedURL string, resp *http.Response) {
body, err := io.ReadAll(resp.Body)
if err != nil {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; "+
"failed to read response body: %s",
len(block), sanitizedURL, resp.StatusCode, err)
} else {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; response body: %s",
len(block), sanitizedURL, resp.StatusCode, string(body))
}
}
// parseRetryAfterHeader parses `Retry-After` value retrieved from HTTP response header.
// retryAfterString should be in either HTTP-date or a number of seconds.
// It will return time.Duration(0) if `retryAfterString` does not follow RFC 7231.
func parseRetryAfterHeader(retryAfterString string) (retryAfterDuration time.Duration) {
if retryAfterString == "" {
return retryAfterDuration
}
defer func() {
v := retryAfterDuration.Seconds()
logger.Infof("'Retry-After: %s' parsed into %.2f second(s)", retryAfterString, v)
}()
// Retry-After could be in "Mon, 02 Jan 2006 15:04:05 GMT" format.
if parsedTime, err := time.Parse(http.TimeFormat, retryAfterString); err == nil {
return time.Duration(time.Until(parsedTime).Seconds()) * time.Second
}
// Retry-After could be in seconds.
if seconds, err := strconv.Atoi(retryAfterString); err == nil {
return time.Duration(seconds) * time.Second
}
return 0
}

View File

@@ -0,0 +1,99 @@
package remotewrite
import (
"math"
"net/http"
"testing"
"time"
)
func TestCalculateRetryDuration(t *testing.T) {
// `testFunc` call `calculateRetryDuration` for `n` times
// and evaluate if the result of `calculateRetryDuration` is
// 1. >= expectMinDuration
// 2. <= expectMinDuration + 10% (see timeutil.AddJitterToDuration)
f := func(retryAfterDuration, retryDuration time.Duration, n int, expectMinDuration time.Duration) {
t.Helper()
for i := 0; i < n; i++ {
retryDuration = getRetryDuration(retryAfterDuration, retryDuration, time.Minute)
}
expectMaxDuration := helper(expectMinDuration)
expectMinDuration = expectMinDuration - (1000 * time.Millisecond) // Avoid edge case when calculating time.Until(now)
if !(retryDuration >= expectMinDuration && retryDuration <= expectMaxDuration) {
t.Fatalf(
"incorrect retry duration, want (ms): [%d, %d], got (ms): %d",
expectMinDuration.Milliseconds(), expectMaxDuration.Milliseconds(),
retryDuration.Milliseconds(),
)
}
}
// Call calculateRetryDuration for 1 time.
{
// default backoff policy
f(0, time.Second, 1, 2*time.Second)
// default backoff policy exceed max limit"
f(0, 10*time.Minute, 1, time.Minute)
// retry after > default backoff policy
f(10*time.Second, 1*time.Second, 1, 10*time.Second)
// retry after < default backoff policy
f(1*time.Second, 10*time.Second, 1, 1*time.Second)
// retry after invalid and < default backoff policy
f(0, time.Second, 1, 2*time.Second)
}
// Call calculateRetryDuration for multiple times.
{
// default backoff policy 2 times
f(0, time.Second, 2, 4*time.Second)
// default backoff policy 3 times
f(0, time.Second, 3, 8*time.Second)
// default backoff policy N times exceed max limit
f(0, time.Second, 10, time.Minute)
// retry after 120s 1 times
f(120*time.Second, time.Second, 1, 120*time.Second)
// retry after 120s 2 times
f(120*time.Second, time.Second, 2, 120*time.Second)
}
}
func TestParseRetryAfterHeader(t *testing.T) {
f := func(retryAfterString string, expectResult time.Duration) {
t.Helper()
result := parseRetryAfterHeader(retryAfterString)
// expect `expectResult == result` when retryAfterString is in seconds or invalid
// expect the difference between result and expectResult to be lower than 10%
if !(expectResult == result || math.Abs(float64(expectResult-result))/float64(expectResult) < 0.10) {
t.Fatalf(
"incorrect retry after duration, want (ms): %d, got (ms): %d",
expectResult.Milliseconds(), result.Milliseconds(),
)
}
}
// retry after header in seconds
f("10", 10*time.Second)
// retry after header in date time
f(time.Now().Add(30*time.Second).UTC().Format(http.TimeFormat), 30*time.Second)
// retry after header invalid
f("invalid-retry-after", 0)
// retry after header not in GMT
f(time.Now().Add(10*time.Second).Format("Mon, 02 Jan 2006 15:04:05 FAKETZ"), 0)
}
// helper calculate the max possible time duration calculated by timeutil.AddJitterToDuration.
func helper(d time.Duration) time.Duration {
dv := d / 10
if dv > 10*time.Second {
dv = 10 * time.Second
}
return d + dv
}

View File

@@ -0,0 +1,158 @@
package remotewrite
import (
"flag"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)
var (
maxUnpackedBlockSize = flagutil.NewBytes("remoteWrite.maxBlockSize", 8*1024*1024, "The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage.")
flushInterval = flag.Duration("remoteWrite.flushInterval", time.Second, "Interval for flushing the data to remote storage. "+
"This option takes effect only when less than 2MB of data per second are pushed to -remoteWrite.url")
)
type pendingLogs struct {
lastFlushTime atomic.Uint64
// The queue to send blocks to.
fq *persistentqueue.FastQueue
// mu protects wr
mu sync.Mutex
wr writeRequest
stopCh chan struct{}
periodicFlusherWG sync.WaitGroup
}
func newPendingLogs(fq *persistentqueue.FastQueue) *pendingLogs {
pl := &pendingLogs{
fq: fq,
stopCh: make(chan struct{}),
}
pl.periodicFlusherWG.Add(1)
go func() {
defer pl.periodicFlusherWG.Done()
pl.periodicFlusher()
}()
return pl
}
func (pl *pendingLogs) add(lr *logstorage.LogRows) {
lr.ForEachRow(func(_ uint64, r *logstorage.InsertRow) {
pl.addLogRow(r)
})
}
func (pl *pendingLogs) addLogRow(r *logstorage.InsertRow) {
bb := bbPool.Get()
bb.B = r.Marshal(bb.B)
pl.mu.Lock()
_, _ = pl.wr.pendingData.Write(bb.B)
pl.wr.pendingLogRowsCount++
if len(pl.wr.pendingData.B) > maxUnpackedBlockSize.IntN() {
pl.mustFlushLocked()
}
pl.mu.Unlock()
bbPool.Put(bb)
}
func (pl *pendingLogs) mustFlushLocked() {
pl.lastFlushTime.Store(fasttime.UnixTimestamp())
pl.wr.push(func(b []byte) {
if !pl.fq.TryWriteBlock(b) {
logger.Fatalf("BUG: TryWriteBlock cannot return false")
}
})
pl.wr.reset()
}
func (pl *pendingLogs) periodicFlusher() {
flushSeconds := int64(flushInterval.Seconds())
if flushSeconds <= 0 {
flushSeconds = 1
}
d := timeutil.AddJitterToDuration(*flushInterval)
ticker := time.NewTicker(d)
defer ticker.Stop()
for {
select {
case <-pl.stopCh:
pl.mu.Lock()
pl.mustFlushOnStop()
pl.mu.Unlock()
return
case <-ticker.C:
if fasttime.UnixTimestamp()-pl.lastFlushTime.Load() < uint64(flushSeconds) {
continue
}
}
pl.mu.Lock()
pl.mustFlushLocked()
pl.mu.Unlock()
}
}
// mustFlushOnStop force pushes wr data
//
// This is needed in order to properly save in-memory data to persistent queue on graceful shutdown.
func (pl *pendingLogs) mustFlushOnStop() {
pl.wr.push(pl.fq.MustWriteBlockIgnoreDisabledPQ)
pl.wr.reset()
}
func (pl *pendingLogs) mustStop() {
close(pl.stopCh)
pl.periodicFlusherWG.Wait()
}
type writeRequest struct {
pendingData bytesutil.ByteBuffer
pendingLogRowsCount int64
}
func (wr *writeRequest) push(pushBlock func([]byte)) {
if len(wr.pendingData.B) == 0 {
return
}
b := wr.pendingData.B
zb := compressBufPool.Get()
zb.B = zstd.CompressLevel(zb.B[:0], b, 1)
zbLen := len(zb.B)
pushBlock(zb.B)
compressBufPool.Put(zb)
blockSizeBytes.Update(float64(zbLen))
blockSizeLogRows.Update(float64(wr.pendingLogRowsCount))
}
func (wr *writeRequest) reset() {
wr.pendingData.Reset()
wr.pendingLogRowsCount = 0
}
var (
blockSizeBytes = metrics.NewHistogram(`vlagent_remotewrite_block_size_bytes`)
blockSizeLogRows = metrics.NewHistogram(`vlagent_remotewrite_block_size_rows`)
)
var (
compressBufPool bytesutil.ByteBufferPool
bbPool bytesutil.ByteBufferPool
)

View File

@@ -0,0 +1,277 @@
package remotewrite
import (
"flag"
"fmt"
"net/url"
"path/filepath"
"sync"
"sync/atomic"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
)
var (
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support VictoriaLogs native protocol. "+
"Example url: http://<victorialogs-host>:9428/internal/insert. "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems.")
maxPendingBytesPerURL = flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
"for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. "+
"Buffered data is stored in ~500MB chunks. It is recommended to set the value for this flag to a multiple of the block size 500MB. "+
"Disk usage is unlimited if the value is set to 0")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vlagent-remotewrite-data", "Path to directory for storing pending data, which isn't sent to the configured -remoteWrite.url . "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. "+
"Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
)
// rwctxsGlobal contains statically populated entries when -remoteWrite.url is specified.
var rwctxsGlobal []*remoteWriteCtx
// Storage implements insertutil.LogRowsStorage interface
type Storage struct{}
// MustAddRows implements insertutil.LogRowsStorage interface
func (*Storage) MustAddRows(lr *logstorage.LogRows) {
pushToRemoteStorages(lr)
}
// CanWriteData implements insertutil.LogRowsStorage interface
func (*Storage) CanWriteData() error {
return nil
}
// maxQueues limits the maximum value for `-remoteWrite.queues`. There is no sense in setting too high value,
// since it may lead to high memory usage due to big number of buffers.
var maxQueues = cgroup.AvailableCPUs() * 16
const persistentQueueDirname = "persistent-queue"
// InitSecretFlags must be called after flag.Parse and before any logging.
func InitSecretFlags() {
if !*showRemoteWriteURL {
// remoteWrite.url can contain authentication codes, so hide it at `/metrics` output.
flagutil.RegisterSecretFlag("remoteWrite.url")
}
}
// Init initializes remotewrite.
//
// It must be called after flag.Parse().
//
// Stop must be called for graceful shutdown.
func Init() {
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
}
if *queues > maxQueues {
*queues = maxQueues
}
if *queues <= 0 {
*queues = 1
}
initRemoteWriteCtxs(*remoteWriteURLs)
dropDanglingQueues()
}
// Stop stops remotewrite.
//
// It is expected that nobody calls TryPush during and after the call to this func.
func Stop() {
for _, rwctx := range rwctxsGlobal {
rwctx.mustStop()
}
rwctxsGlobal = nil
}
func dropDanglingQueues() {
// Remove dangling persistent queues, if any.
// This is required for the case when the number of queues has been changed or URL have been changed.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
//
// In case if there were many persistent queues with identical *remoteWriteURLs
// the queue with the last index will be dropped.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140
existingQueues := make(map[string]struct{}, len(rwctxsGlobal))
for _, rwctx := range rwctxsGlobal {
existingQueues[rwctx.fq.Dirname()] = struct{}{}
}
queuesDir := filepath.Join(*tmpDataPath, persistentQueueDirname)
files := fs.MustReadDir(queuesDir)
removed := 0
for _, f := range files {
dirname := f.Name()
if _, ok := existingQueues[dirname]; !ok {
logger.Infof("removing dangling queue %q", dirname)
fullPath := filepath.Join(queuesDir, dirname)
fs.MustRemoveAll(fullPath)
removed++
}
}
if removed > 0 {
logger.Infof("removed %d dangling queues from %q, active queues: %d", removed, *tmpDataPath, len(rwctxsGlobal))
}
}
func initRemoteWriteCtxs(urls []string) {
if len(urls) == 0 {
logger.Panicf("BUG: urls must be non-empty")
}
maxInmemoryBlocks := memory.Allowed() / len(urls) / 10000
if maxInmemoryBlocks / *queues > 100 {
// There is no much sense in keeping higher number of blocks in memory,
// since this means that the producer outperforms consumer and the queue
// will continue growing. It is better storing the queue to file.
maxInmemoryBlocks = 100 * *queues
}
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
}
rwctxs := make([]*remoteWriteCtx, len(urls))
rwctxIdx := make([]int, len(urls))
for i, remoteWriteURLRaw := range urls {
remoteWriteURL, err := url.Parse(remoteWriteURLRaw)
if err != nil {
logger.Fatalf("invalid -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
if *showRemoteWriteURL {
sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
}
rwctxs[i] = newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
rwctxIdx[i] = i
}
rwctxsGlobal = rwctxs
}
func pushToRemoteStorages(lr *logstorage.LogRows) {
rwctxs := rwctxsGlobal
if len(rwctxs) == 1 {
// fast path
rwctxs[0].push(lr)
return
}
// Push samples to remote storage systems in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup
for _, rwctx := range rwctxs {
wg.Add(1)
go func(rwctx *remoteWriteCtx) {
defer wg.Done()
rwctx.push(lr)
}(rwctx)
}
wg.Wait()
}
type remoteWriteCtx struct {
idx int
fq *persistentqueue.FastQueue
c *client
pls []*pendingLogs
pssNextIdx atomic.Uint64
}
func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks int, sanitizedURL string) *remoteWriteCtx {
// protocol version is required by victoria-logs
q := remoteWriteURL.Query()
q.Set("version", netinsert.ProtocolVersion)
remoteWriteURL.RawQuery = q.Encode()
// strip query params, otherwise changing params resets pq
pqURL := *remoteWriteURL
pqURL.RawQuery = ""
pqURL.Fragment = ""
h := xxhash.Sum64([]byte(pqURL.String()))
queuePath := filepath.Join(*tmpDataPath, persistentQueueDirname, fmt.Sprintf("%d_%016X", argIdx+1, h))
maxPendingBytes := maxPendingBytesPerURL.GetOptionalArg(argIdx)
if maxPendingBytes != 0 && maxPendingBytes < persistentqueue.DefaultChunkFileSize {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4195
logger.Warnf("rounding the -remoteWrite.maxDiskUsagePerURL=%d to the minimum supported value: %d", maxPendingBytes, persistentqueue.DefaultChunkFileSize)
maxPendingBytes = persistentqueue.DefaultChunkFileSize
}
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes, false)
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_pending_data_bytes{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetPendingBytes())
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_pending_inmemory_blocks{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetInmemoryQueueLen())
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_queue_blocked{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
if fq.IsWriteBlocked() {
return 1
}
return 0
})
var c *client
switch remoteWriteURL.Scheme {
case "http", "https":
c = newHTTPClient(argIdx, remoteWriteURL.String(), sanitizedURL, fq, *queues)
default:
logger.Fatalf("unsupported scheme: %s for remoteWriteURL: %s, want `http`, `https`", remoteWriteURL.Scheme, sanitizedURL)
}
c.init(argIdx, *queues, sanitizedURL)
// Initialize pss
plsLen := *queues
if n := cgroup.AvailableCPUs(); plsLen > n {
// There is no sense in running more than availableCPUs concurrent pendingLogs,
// since every pendingLogs can saturate up to a single CPU.
plsLen = n
}
pls := make([]*pendingLogs, plsLen)
for i := range pls {
pls[i] = newPendingLogs(fq)
}
rwctx := &remoteWriteCtx{
idx: argIdx,
fq: fq,
c: c,
pls: pls,
}
return rwctx
}
func (rwctx *remoteWriteCtx) push(lr *logstorage.LogRows) {
pls := rwctx.pls
idx := rwctx.pssNextIdx.Add(1) % uint64(len(pls))
pls[idx].add(lr)
}
func (rwctx *remoteWriteCtx) mustStop() {
for _, ps := range rwctx.pls {
ps.mustStop()
}
rwctx.idx = 0
rwctx.pls = nil
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.fq.MustClose()
rwctx.fq = nil
}

View File

@@ -11,7 +11,6 @@ import (
"github.com/valyala/fastjson"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -33,10 +32,10 @@ var parserPool fastjson.ParserPool
// RequestHandler processes Datadog insert requests
func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
switch path {
case "/api/v1/validate":
case "/insert/datadog/api/v1/validate":
fmt.Fprintf(w, `{}`)
return true
case "/api/v2/logs":
case "/insert/datadog/api/v2/logs":
return datadogLogsIngestion(w, r)
default:
return false
@@ -74,7 +73,7 @@ func datadogLogsIngestion(w http.ResponseWriter, r *http.Request) bool {
cp.IgnoreFields = *datadogIgnoreFields
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
@@ -102,7 +101,7 @@ func datadogLogsIngestion(w http.ResponseWriter, r *http.Request) bool {
var (
v2LogsRequestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/datadog/api/v2/logs"}`)
v2LogsRequestDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/datadog/api/v2/logs"}`)
v2LogsRequestDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/datadog/api/v2/logs"}`)
)
// datadog message field has two formats:

View File

@@ -11,7 +11,6 @@ import (
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bufferedwriter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -31,36 +30,38 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
// This header is needed for Logstash
w.Header().Set("X-Elastic-Product", "Elasticsearch")
if strings.HasPrefix(path, "/_ilm/policy") {
if strings.HasPrefix(path, "/insert/elasticsearch/_ilm/policy") {
// Return fake response for Elasticsearch ilm request.
fmt.Fprintf(w, `{}`)
return true
}
if strings.HasPrefix(path, "/_index_template") {
if strings.HasPrefix(path, "/insert/elasticsearch/_index_template") {
// Return fake response for Elasticsearch index template request.
fmt.Fprintf(w, `{}`)
return true
}
if strings.HasPrefix(path, "/_ingest") {
if strings.HasPrefix(path, "/insert/elasticsearch/_ingest") {
// Return fake response for Elasticsearch ingest pipeline request.
// See: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/put-pipeline-api.html
fmt.Fprintf(w, `{}`)
return true
}
if strings.HasPrefix(path, "/_nodes") {
if strings.HasPrefix(path, "/insert/elasticsearch/_nodes") {
// Return fake response for Elasticsearch nodes discovery request.
// See: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/cluster.html
fmt.Fprintf(w, `{}`)
return true
}
if strings.HasPrefix(path, "/logstash") || strings.HasPrefix(path, "/_logstash") {
if strings.HasPrefix(path, "/insert/elasticsearch/logstash") || strings.HasPrefix(path, "/insert/elasticsearch/_logstash") {
// Return fake response for Logstash APIs requests.
// See: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/logstash-apis.html
fmt.Fprintf(w, `{}`)
return true
}
switch path {
case "/", "":
// some clients may omit trailing slash
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8353
case "/insert/elasticsearch/", "/insert/elasticsearch":
switch r.Method {
case http.MethodGet:
// Return fake response for Elasticsearch ping request.
@@ -75,7 +76,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/_license":
case "/insert/elasticsearch/_license":
// Return fake response for Elasticsearch license request.
fmt.Fprintf(w, `{
"license": {
@@ -86,7 +87,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
}
}`)
return true
case "/_bulk":
case "/insert/elasticsearch/_bulk":
startTime := time.Now()
bulkRequestsTotal.Inc()
@@ -95,7 +96,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
httpserver.Errorf(w, r, "%s", err)
return true
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
@@ -128,7 +129,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
var (
bulkRequestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/elasticsearch/_bulk"}`)
bulkRequestDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/elasticsearch/_bulk"}`)
bulkRequestDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/elasticsearch/_bulk"}`)
)
func readBulkRequest(streamName string, r io.Reader, encoding string, timeFields, msgFields []string, lmp insertutil.LogMessageProcessor) (int, error) {

View File

@@ -7,11 +7,11 @@ import (
"strconv"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -36,6 +36,7 @@ type CommonParams struct {
DecolorizeFields []string
ExtraFields []logstorage.Field
IsTimeFieldSet bool
Debug bool
DebugRequestURI string
DebugRemoteAddr string
@@ -49,8 +50,10 @@ func GetCommonParams(r *http.Request) (*CommonParams, error) {
return nil, err
}
var isTimeFieldSet bool
timeFields := []string{"_time"}
if tfs := httputil.GetArray(r, "_time_field", "VL-Time-Field"); len(tfs) > 0 {
isTimeFieldSet = true
timeFields = tfs
}
@@ -86,9 +89,11 @@ func GetCommonParams(r *http.Request) (*CommonParams, error) {
IgnoreFields: ignoreFields,
DecolorizeFields: decolorizeFields,
ExtraFields: extraFields,
Debug: debug,
DebugRequestURI: debugRequestURI,
DebugRemoteAddr: debugRemoteAddr,
IsTimeFieldSet: isTimeFieldSet,
Debug: debug,
DebugRequestURI: debugRequestURI,
DebugRemoteAddr: debugRemoteAddr,
}
return cp, nil
@@ -141,6 +146,29 @@ func GetCommonParamsForSyslog(tenantID logstorage.TenantID, streamFields, ignore
return cp
}
// LogRowsStorage is an interface for ingesting logs into the storage.
type LogRowsStorage interface {
// MustAddRows must add lr to the underlying storage.
MustAddRows(lr *logstorage.LogRows)
// CanWriteData must returns non-nil error if logs cannot be added to the underlying storage.
CanWriteData() error
}
var logRowsStorage LogRowsStorage
// SetLogRowsStorage sets the storage for writing data to via LogMessageProcessor.
//
// This function must be called before using LogMessageProcessor and CanWriteData from this package.
func SetLogRowsStorage(storage LogRowsStorage) {
logRowsStorage = storage
}
// CanWriteData returns non-nil error if data cannot be written to the underlying storage.
func CanWriteData() error {
return logRowsStorage.CanWriteData()
}
// LogMessageProcessor is an interface for log message processors.
type LogMessageProcessor interface {
// AddRow must add row to the LogMessageProcessor with the given timestamp and fields.
@@ -165,6 +193,7 @@ type logMessageProcessor struct {
rowsIngestedTotal *metrics.Counter
bytesIngestedTotal *metrics.Counter
flushDuration *metrics.Summary
}
func (lmp *logMessageProcessor) initPeriodicFlush() {
@@ -263,9 +292,11 @@ func (lmp *logMessageProcessor) AddInsertRow(r *logstorage.InsertRow) {
// flushLocked must be called under locked lmp.mu.
func (lmp *logMessageProcessor) flushLocked() {
lmp.lastFlushTime = time.Now()
vlstorage.MustAddRows(lmp.lr)
start := time.Now()
lmp.lastFlushTime = start
logRowsStorage.MustAddRows(lmp.lr)
lmp.lr.ResetKeepSettings()
lmp.flushDuration.UpdateDuration(start)
}
// MustClose flushes the remaining data to the underlying storage and closes lmp.
@@ -276,6 +307,7 @@ func (lmp *logMessageProcessor) MustClose() {
lmp.flushLocked()
logstorage.PutLogRows(lmp.lr)
lmp.lr = nil
messageProcessorCount.Add(-1)
}
// NewLogMessageProcessor returns new LogMessageProcessor for the given cp.
@@ -285,12 +317,14 @@ func (cp *CommonParams) NewLogMessageProcessor(protocolName string, isStreamMode
lr := logstorage.GetLogRows(cp.StreamFields, cp.IgnoreFields, cp.DecolorizeFields, cp.ExtraFields, *defaultMsgValue)
rowsIngestedTotal := metrics.GetOrCreateCounter(fmt.Sprintf("vl_rows_ingested_total{type=%q}", protocolName))
bytesIngestedTotal := metrics.GetOrCreateCounter(fmt.Sprintf("vl_bytes_ingested_total{type=%q}", protocolName))
flushDuration := metrics.GetOrCreateSummary(fmt.Sprintf("vl_insert_flush_duration_seconds{type=%q}", protocolName))
lmp := &logMessageProcessor{
cp: cp,
lr: lr,
rowsIngestedTotal: rowsIngestedTotal,
bytesIngestedTotal: bytesIngestedTotal,
flushDuration: flushDuration,
stopCh: make(chan struct{}),
}
@@ -299,10 +333,13 @@ func (cp *CommonParams) NewLogMessageProcessor(protocolName string, isStreamMode
lmp.initPeriodicFlush()
}
messageProcessorCount.Add(1)
return lmp
}
var (
rowsDroppedTotalDebug = metrics.NewCounter(`vl_rows_dropped_total{reason="debug"}`)
rowsDroppedTotalTooManyFields = metrics.NewCounter(`vl_rows_dropped_total{reason="too_many_fields"}`)
_ = metrics.NewGauge(`vl_insert_processors_count`, func() float64 { return float64(messageProcessorCount.Load()) })
messageProcessorCount atomic.Int64
)

View File

@@ -56,6 +56,7 @@ func NewLineReader(name string, r io.Reader) *LineReader {
// Check for Err in this case.
func (lr *LineReader) NextLine() bool {
for {
lr.Line = nil
if lr.bufOffset >= len(lr.buf) {
if lr.err != nil || lr.eofReached {
return false
@@ -101,9 +102,11 @@ func (lr *LineReader) readMoreData() bool {
bufLen := len(lr.buf)
if bufLen >= MaxLineSizeBytes.IntN() {
logger.Warnf("%s: the line length exceeds -insert.maxLineSizeBytes=%d; skipping it; line contents=%q", lr.name, MaxLineSizeBytes.IntN(), lr.buf)
ok, skippedBytes := lr.skipUntilNextLine()
logger.Warnf("%s: the line length exceeds -insert.maxLineSizeBytes=%d; skipping it; total skipped bytes=%d",
lr.name, MaxLineSizeBytes.IntN(), skippedBytes)
tooLongLinesSkipped.Inc()
return lr.skipUntilNextLine()
return ok
}
lr.buf = slicesutil.SetLength(lr.buf, MaxLineSizeBytes.IntN())
@@ -121,26 +124,35 @@ func (lr *LineReader) readMoreData() bool {
var tooLongLinesSkipped = metrics.NewCounter("vl_too_long_lines_skipped_total")
func (lr *LineReader) skipUntilNextLine() bool {
func (lr *LineReader) skipUntilNextLine() (bool, int) {
// Initialize skipped bytes count with MaxLineSizeBytes because
// we've already read that many bytes without encountering a newline,
// indicating the line size exceeds the maximum allowed limit.
skipSizeBytes := MaxLineSizeBytes.IntN()
for {
lr.buf = slicesutil.SetLength(lr.buf, MaxLineSizeBytes.IntN())
n, err := lr.r.Read(lr.buf)
skipSizeBytes += n
lr.buf = lr.buf[:n]
if err != nil {
if errors.Is(err, io.EOF) {
lr.eofReached = true
lr.buf = lr.buf[:0]
return true
return true, skipSizeBytes
}
lr.err = fmt.Errorf("cannot skip the current line: %s", err)
return false
return false, skipSizeBytes
}
if n := bytes.IndexByte(lr.buf, '\n'); n >= 0 {
// Include skipped bytes before \n, including the newline itself.
skipSizeBytes += n + 1 - len(lr.buf)
// Include \n in the buf, so too long line is replaced with an empty line.
// This is needed for maintaining synchorinzation consistency between lines
// in protocols such as Elasticsearch bulk import.
lr.buf = append(lr.buf[:0], lr.buf[n:]...)
return true
return true, skipSizeBytes
}
}
}

View File

@@ -24,6 +24,9 @@ func TestLineReader_Success(t *testing.T) {
if lr.NextLine() {
t.Fatalf("expecting error on the second call to NextLine()")
}
if len(lr.Line) > 0 {
t.Fatalf("unexpected non-empty line after failed NextLine(): %q", lr.Line)
}
if !reflect.DeepEqual(lines, linesExpected) {
t.Fatalf("unexpected lines\ngot\n%q\nwant\n%q", lines, linesExpected)
}

View File

@@ -38,7 +38,10 @@ func ExtractTimestampFromFields(timeFields []string, fields []logstorage.Field)
}
func parseTimestamp(s string) (int64, error) {
if s == "" || s == "0" {
// "-" is a nil timestamp value, if the syslog
// application is incapable of obtaining system time
// https://datatracker.ietf.org/doc/html/rfc5424#section-6.2.3
if s == "" || s == "0" || s == "-" {
return time.Now().UnixNano(), nil
}
if len(s) <= len("YYYY") || s[len("YYYY")] != '-' {

View File

@@ -133,6 +133,33 @@ func TestExtractTimestampFromFields_Success(t *testing.T) {
}, 1718773640000000000)
}
func TestExtractTimestampFromFields_Now(t *testing.T) {
f := func(timeField string, fields []logstorage.Field) {
t.Helper()
nsecs, err := ExtractTimestampFromFields([]string{timeField}, fields)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
if nsecs < 1 {
t.Fatalf("expected generated timestamp, got error: %s", err)
}
}
// RFC5424 allows `-` for nil timestamp (log ingestion time)
f("time", []logstorage.Field{
{Name: "time", Value: "-"},
})
f("time", []logstorage.Field{
{Name: "time", Value: ""},
})
f("time", []logstorage.Field{
{Name: "time", Value: "0"},
})
}
func TestExtractTimestampFromFields_Error(t *testing.T) {
f := func(s string) {
t.Helper()

View File

@@ -1,7 +1,6 @@
package internalinsert
import (
"flag"
"fmt"
"net/http"
"time"
@@ -9,7 +8,6 @@ import (
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -18,17 +16,11 @@ import (
)
var (
disableInsert = flag.Bool("internalinsert.disable", false, "Whether to disable /internal/insert HTTP endpoint")
maxRequestSize = flagutil.NewBytes("internalinsert.maxRequestSize", 64*1024*1024, "The maximum size in bytes of a single request, which can be accepted at /internal/insert HTTP endpoint")
)
// RequestHandler processes /internal/insert requests.
func RequestHandler(w http.ResponseWriter, r *http.Request) {
if *disableInsert {
httpserver.Errorf(w, r, "requests to /internal/insert are disabled with -internalinsert.disable command-line flag")
return
}
startTime := time.Now()
if r.Method != "POST" {
w.WriteHeader(http.StatusMethodNotAllowed)
@@ -47,7 +39,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) {
httpserver.Errorf(w, r, "%s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
@@ -92,5 +84,5 @@ var (
requestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/internal/insert"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/internal/insert"}`)
requestDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/internal/insert"}`)
requestDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/internal/insert"}`)
)

View File

@@ -3,29 +3,30 @@ package journald
import (
"bytes"
"encoding/binary"
"errors"
"flag"
"fmt"
"io"
"net/http"
"regexp"
"slices"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
// See https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-journal/journal-file.c#L1703
const journaldEntryMaxNameLen = 64
var allowedJournaldEntryNameChars = regexp.MustCompile(`^[A-Z_][A-Z0-9_]*`)
const maxFieldNameLen = 64
var (
journaldStreamFields = flagutil.NewArrayString("journald.streamFields", "Comma-separated list of fields to use as log stream fields for logs ingested over journald protocol. "+
@@ -36,9 +37,7 @@ var (
"See https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/#time-field")
journaldTenantID = flag.String("journald.tenantID", "0:0", "TenantID for logs ingested via the Journald endpoint. "+
"See https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/#multitenancy")
journaldIncludeEntryMetadata = flag.Bool("journald.includeEntryMetadata", false, "Include journal entry fields, which with double underscores.")
maxRequestSize = flagutil.NewBytes("journald.maxRequestSize", 64*1024*1024, "The maximum size in bytes of a single journald request")
journaldIncludeEntryMetadata = flag.Bool("journald.includeEntryMetadata", false, "Include Journald fields with double underscore prefixes")
)
func getCommonParams(r *http.Request) (*insertutil.CommonParams, error) {
@@ -53,11 +52,12 @@ func getCommonParams(r *http.Request) (*insertutil.CommonParams, error) {
}
cp.TenantID = tenantID
}
if len(cp.TimeFields) == 0 {
if !cp.IsTimeFieldSet {
cp.TimeFields = []string{*journaldTimeField}
}
if len(cp.StreamFields) == 0 {
cp.StreamFields = *journaldStreamFields
cp.StreamFields = getStreamFields()
}
if len(cp.IgnoreFields) == 0 {
cp.IgnoreFields = *journaldIgnoreFields
@@ -66,10 +66,23 @@ func getCommonParams(r *http.Request) (*insertutil.CommonParams, error) {
return cp, nil
}
func getStreamFields() []string {
if len(*journaldStreamFields) > 0 {
return *journaldStreamFields
}
return defaultStreamFields
}
var defaultStreamFields = []string{
"_MACHINE_ID",
"_HOSTNAME",
"_SYSTEMD_UNIT",
}
// RequestHandler processes Journald Export insert requests
func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
switch path {
case "/upload":
case "/insert/journald/upload":
if r.Header.Get("Content-Type") != "application/vnd.fdo.journal" {
httpserver.Errorf(w, r, "only application/vnd.fdo.journal encoding is supported for Journald")
return true
@@ -84,7 +97,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
// handleJournald parses Journal binary entries
func handleJournald(r *http.Request, w http.ResponseWriter) {
startTime := time.Now()
requestsJournaldTotal.Inc()
requestsTotal.Inc()
cp, err := getCommonParams(r)
if err != nil {
@@ -93,19 +106,25 @@ func handleJournald(r *http.Request, w http.ResponseWriter) {
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
errorsTotal.Inc()
httpserver.Errorf(w, r, "%s", err)
return
}
encoding := r.Header.Get("Content-Encoding")
err = protoparserutil.ReadUncompressedData(r.Body, encoding, maxRequestSize, func(data []byte) error {
lmp := cp.NewLogMessageProcessor("journald", false)
err := parseJournaldRequest(data, lmp, cp)
lmp.MustClose()
return err
})
reader, err := protoparserutil.GetUncompressedReader(r.Body, encoding)
if err != nil {
errorsTotal.Inc()
logger.Errorf("cannot decode journald request: %s", err)
return
}
lmp := cp.NewLogMessageProcessor("journald", true)
streamName := fmt.Sprintf("remoteAddr=%s, requestURI=%q", httpserver.GetQuotedRemoteAddr(r), r.RequestURI)
err = processStreamInternal(streamName, reader, lmp, cp)
protoparserutil.PutUncompressedReader(reader)
lmp.MustClose()
if err != nil {
errorsTotal.Inc()
httpserver.Errorf(w, r, "cannot read journald protocol data: %s", err)
@@ -117,102 +136,185 @@ func handleJournald(r *http.Request, w http.ResponseWriter) {
// See https://github.com/systemd/systemd/pull/34822
w.Header().Set("Accept-Encoding", "zstd")
// update requestJournaldDuration only for successfully parsed requests
// There is no need in updating requestJournaldDuration for request errors,
// update requestDuration only for successfully parsed requests
// There is no need in updating requestDuration for request errors,
// since their timings are usually much smaller than the timing for successful request parsing.
requestJournaldDuration.UpdateDuration(startTime)
requestDuration.UpdateDuration(startTime)
}
var (
requestsJournaldTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/journald/upload"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/insert/journald/upload"}`)
requestJournaldDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/journald/upload"}`)
requestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/journald/upload"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/insert/journald/upload"}`)
requestDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/journald/upload"}`)
)
func processStreamInternal(streamName string, r io.Reader, lmp insertutil.LogMessageProcessor, cp *insertutil.CommonParams) error {
wcr := writeconcurrencylimiter.GetReader(r)
defer writeconcurrencylimiter.PutReader(wcr)
lr := insertutil.NewLineReader("journald", wcr)
for {
err := readJournaldLogEntry(streamName, lr, lmp, cp)
wcr.DecConcurrency()
if err != nil {
if errors.Is(err, io.EOF) {
return nil
}
return fmt.Errorf("%s: %w", streamName, err)
}
}
}
type fieldsBuf struct {
fields []logstorage.Field
buf []byte
name []byte
value []byte
}
func (fb *fieldsBuf) reset() {
fb.fields = fb.fields[:0]
fb.buf = fb.buf[:0]
fb.name = fb.name[:0]
fb.value = fb.value[:0]
}
func (fb *fieldsBuf) addField(name, value string) {
bufLen := len(fb.buf)
fb.buf = append(fb.buf, name...)
nameCopy := bytesutil.ToUnsafeString(fb.buf[bufLen:])
bufLen = len(fb.buf)
fb.buf = append(fb.buf, value...)
valueCopy := bytesutil.ToUnsafeString(fb.buf[bufLen:])
fb.fields = append(fb.fields, logstorage.Field{
Name: nameCopy,
Value: valueCopy,
})
}
func (fb *fieldsBuf) appendNextLineToValue(lr *insertutil.LineReader) error {
if !lr.NextLine() {
if err := lr.Err(); err != nil {
return err
}
return fmt.Errorf("unexpected end of stream")
}
fb.value = append(fb.value, lr.Line...)
fb.value = append(fb.value, '\n')
return nil
}
func getFieldsBuf() *fieldsBuf {
fb := fieldsBufPool.Get()
if fb == nil {
return &fieldsBuf{}
}
return fb.(*fieldsBuf)
}
func putFieldsBuf(fb *fieldsBuf) {
fb.reset()
fieldsBufPool.Put(fb)
}
var fieldsBufPool sync.Pool
// readJournaldLogEntry reads a single log entry in Journald format.
//
// See https://systemd.io/JOURNAL_EXPORT_FORMATS/#journal-export-format
func parseJournaldRequest(data []byte, lmp insertutil.LogMessageProcessor, cp *insertutil.CommonParams) error {
var fields []logstorage.Field
func readJournaldLogEntry(streamName string, lr *insertutil.LineReader, lmp insertutil.LogMessageProcessor, cp *insertutil.CommonParams) error {
var ts int64
var size uint64
var name, value string
var line []byte
currentTimestamp := time.Now().UnixNano()
fb := getFieldsBuf()
defer putFieldsBuf(fb)
for len(data) > 0 {
idx := bytes.IndexByte(data, '\n')
switch {
case idx > 0:
// process fields
line = data[:idx]
data = data[idx+1:]
case idx == 0:
// next message or end of file
// double new line is a separator for the next message
if len(fields) > 0 {
if !lr.NextLine() {
if err := lr.Err(); err != nil {
return fmt.Errorf("cannot read the first field: %w", err)
}
return io.EOF
}
for {
line := lr.Line
if len(line) == 0 {
// The end of a single log entry. Write it to the storage
if len(fb.fields) > 0 {
if ts == 0 {
ts = currentTimestamp
ts = time.Now().UnixNano()
}
lmp.AddRow(ts, fields, nil)
fields = fields[:0]
lmp.AddRow(ts, fb.fields, nil)
}
// skip newline separator
data = data[1:]
continue
case idx < 0:
return fmt.Errorf("missing new line separator, unread data left=%d", len(data))
return nil
}
idx = bytes.IndexByte(line, '=')
// could b either e key=value\n pair
// or just key\n
// with binary data at the buffer
if idx > 0 {
name = bytesutil.ToUnsafeString(line[:idx])
value = bytesutil.ToUnsafeString(line[idx+1:])
// line could be either "key=value" or "key"
// according to https://systemd.io/JOURNAL_EXPORT_FORMATS/#journal-export-format
if n := bytes.IndexByte(line, '='); n >= 0 {
// line = "key=value"
fb.name = append(fb.name[:0], line[:n]...)
name = bytesutil.ToUnsafeString(fb.name)
fb.value = append(fb.value[:0], line[n+1:]...)
value = bytesutil.ToUnsafeString(fb.value)
} else {
name = bytesutil.ToUnsafeString(line)
if len(data) == 0 {
return fmt.Errorf("unexpected zero data for binary field value of key=%s", name)
// line = "key"
// Parse the binary-encoded value from the next line according to "key\n<little_endian_size_64>value\n" format
fb.name = append(fb.name[:0], line...)
name = bytesutil.ToUnsafeString(fb.name)
fb.value = fb.value[:0]
for len(fb.value) < 8 {
if err := fb.appendNextLineToValue(lr); err != nil {
return fmt.Errorf("cannot read value size: %w", err)
}
}
// size of binary data encoded as le i64 at the begging
idx, err := binary.Decode(data, binary.LittleEndian, &size)
if err != nil {
return fmt.Errorf("failed to extract binary field %q value size: %w", name, err)
size := binary.LittleEndian.Uint64(fb.value[:8])
// Read the value until its length exceeds the given size - the last char in the read value will always be '\n'
// because it is appended by appendNextLineToValue().
for uint64(len(fb.value[8:])) <= size {
if err := fb.appendNextLineToValue(lr); err != nil {
return fmt.Errorf("cannot read %q value with size %d bytes; read only %d bytes: %w", fb.name, size, len(fb.value[8:]), err)
}
}
// skip binary data size
data = data[idx:]
if size == 0 {
return fmt.Errorf("unexpected zero binary data size decoded %d", size)
value = bytesutil.ToUnsafeString(fb.value[8 : len(fb.value)-1])
if uint64(len(value)) != size {
return fmt.Errorf("unexpected %q value size; got %d bytes; want %d bytes; value: %q", fb.name, len(value), size, value)
}
if int(size) > len(data) {
return fmt.Errorf("binary data size=%d cannot exceed size of the data at buffer=%d", size, len(data))
}
value = bytesutil.ToUnsafeString(data[:size])
data = data[int(size):]
// binary data must has new line separator for the new line or next field
if len(data) == 0 {
return fmt.Errorf("unexpected empty buffer after binary field=%s read", name)
}
lastB := data[0]
if lastB != '\n' {
return fmt.Errorf("expected new line separator after binary field=%s, got=%s", name, string(lastB))
}
data = data[1:]
}
if len(name) > journaldEntryMaxNameLen {
return fmt.Errorf("journald entry name should not exceed %d symbols, got: %q", journaldEntryMaxNameLen, name)
if !lr.NextLine() {
if err := lr.Err(); err != nil {
return fmt.Errorf("cannot read the next log field: %w", err)
}
// add the last log field below before the return
}
if !allowedJournaldEntryNameChars.MatchString(name) {
return fmt.Errorf("journald entry name should consist of `A-Z0-9_` characters and must start from non-digit symbol")
if len(name) > maxFieldNameLen {
logger.Errorf("%s: field name size should not exceed %d bytes; got %d bytes: %q; skipping this field", streamName, maxFieldNameLen, len(name), name)
continue
}
if !isValidFieldName(name) {
logger.Errorf("%s: invalid field name %q; it must consist of `A-Z0-9_` chars and must start from non-digit char; skipping this field", streamName, name)
continue
}
if slices.Contains(cp.TimeFields, name) {
n, err := strconv.ParseInt(value, 10, 64)
t, err := strconv.ParseInt(value, 10, 64)
if err != nil {
return fmt.Errorf("failed to parse Journald timestamp, %w", err)
logger.Errorf("%s: cannot parse timestamp from the field %q: %w; using the current timestamp", streamName, name, err)
ts = 0
} else {
// Convert journald microsecond timestamp to nanoseconds
ts = t * 1e3
}
ts = n * 1e3
continue
}
@@ -220,18 +322,56 @@ func parseJournaldRequest(data []byte, lmp insertutil.LogMessageProcessor, cp *i
name = "_msg"
}
if *journaldIncludeEntryMetadata || !strings.HasPrefix(name, "__") {
fields = append(fields, logstorage.Field{
Name: name,
Value: value,
})
if name == "PRIORITY" {
priority := journaldPriorityToLevel(value)
fb.addField("level", priority)
}
if !strings.HasPrefix(name, "__") || *journaldIncludeEntryMetadata {
fb.addField(name, value)
}
}
if len(fields) > 0 {
if ts == 0 {
ts = currentTimestamp
}
lmp.AddRow(ts, fields, nil)
}
return nil
}
func journaldPriorityToLevel(priority string) string {
// See https://wiki.archlinux.org/title/Systemd/Journal#Priority_level
// and https://grafana.com/docs/grafana/latest/explore/logs-integration/#log-level
switch priority {
case "0":
return "emerg"
case "1":
return "alert"
case "2":
return "critical"
case "3":
return "error"
case "4":
return "warning"
case "5":
return "notice"
case "6":
return "info"
case "7":
return "debug"
default:
return priority
}
}
func isValidFieldName(s string) bool {
if len(s) == 0 {
return false
}
c := s[0]
if !(c >= 'A' && c <= 'Z' || c == '_') {
return false
}
for i := 1; i < len(s); i++ {
c := s[i]
if !(c >= 'A' && c <= 'Z' || c >= '0' && c <= '9' || c == '_') {
return false
}
}
return true
}

View File

@@ -1,20 +1,81 @@
package journald
import (
"bytes"
"net/http"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
)
func TestPushJournaldOk(t *testing.T) {
func TestIsValidFieldName(t *testing.T) {
f := func(name string, resultExpected bool) {
t.Helper()
result := isValidFieldName(name)
if result != resultExpected {
t.Fatalf("unexpected result for isValidJournaldFieldName(%q); got %v; want %v", name, result, resultExpected)
}
}
f("", false)
f("a", false)
f("1", false)
f("_", true)
f("X", true)
f("Xa", false)
f("X_343", true)
f("X_0123456789_AZ", true)
f("SDDFD sdf", false)
}
func TestGetCommonParams_TimeField(t *testing.T) {
f := func(timeFieldHeader, expectedTimeField string) {
t.Helper()
req, err := http.NewRequest("POST", "/insert/journald/upload", nil)
if err != nil {
t.Fatalf("unexpected error creating request: %s", err)
}
if timeFieldHeader != "" {
req.Header.Set("VL-Time-Field", timeFieldHeader)
}
cp, err := getCommonParams(req)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
if len(cp.TimeFields) != 1 || cp.TimeFields[0] != expectedTimeField {
t.Fatalf("unexpected TimeFields; got %v; want [%s]", cp.TimeFields, expectedTimeField)
}
}
// Test default behavior - when no custom time field is specified, journald uses __REALTIME_TIMESTAMP
f("", "__REALTIME_TIMESTAMP")
// Test custom time field - when a custom time field is specified via HTTP header, it's respected
f("custom_time", "custom_time")
}
func TestPushJournald_Success(t *testing.T) {
f := func(src string, timestampsExpected []int64, resultExpected string) {
t.Helper()
tlp := &insertutil.TestLogMessageProcessor{}
cp := &insertutil.CommonParams{
TimeFields: []string{"__REALTIME_TIMESTAMP"},
MsgFields: []string{"MESSAGE"},
r, err := http.NewRequest("GET", "https://foo.bar/baz", nil)
if err != nil {
t.Fatalf("cannot create request: %s", err)
}
if err := parseJournaldRequest([]byte(src), tlp, cp); err != nil {
cp, err := getCommonParams(r)
if err != nil {
t.Fatalf("cannot create commonParams: %s", err)
}
buf := bytes.NewBufferString(src)
if err := processStreamInternal("test", buf, tlp, cp); err != nil {
t.Fatalf("unexpected error: %s", err)
}
@@ -22,16 +83,17 @@ func TestPushJournaldOk(t *testing.T) {
t.Fatal(err)
}
}
// Single event
f("__REALTIME_TIMESTAMP=91723819283\nMESSAGE=Test message\n",
f("__REALTIME_TIMESTAMP=91723819283\nMESSAGE=Test message\n\n",
[]int64{91723819283000},
"{\"_msg\":\"Test message\"}",
)
// Multiple events
f("__REALTIME_TIMESTAMP=91723819283\nMESSAGE=Test message\n\n__REALTIME_TIMESTAMP=91723819284\nMESSAGE=Test message2\n",
f("__REALTIME_TIMESTAMP=91723819283\nPRIORITY=3\nMESSAGE=Test message\n\n__REALTIME_TIMESTAMP=91723819284\nMESSAGE=Test message2\n",
[]int64{91723819283000, 91723819284000},
"{\"_msg\":\"Test message\"}\n{\"_msg\":\"Test message2\"}",
"{\"level\":\"error\",\"PRIORITY\":\"3\",\"_msg\":\"Test message\"}\n{\"_msg\":\"Test message2\"}",
)
// Parse binary data
@@ -39,30 +101,66 @@ func TestPushJournaldOk(t *testing.T) {
[]int64{1729698775704404000},
"{\"E\":\"JobStateChanged\",\"_BOOT_ID\":\"f778b6e2f7584a77b991a2366612a7b5\",\"_UID\":\"0\",\"_GID\":\"0\",\"_MACHINE_ID\":\"a4a970370c30a925df02a13c67167847\",\"_HOSTNAME\":\"ecd5e4555787\",\"_RUNTIME_SCOPE\":\"system\",\"_TRANSPORT\":\"journal\",\"_CAP_EFFECTIVE\":\"1ffffffffff\",\"_SYSTEMD_CGROUP\":\"/init.scope\",\"_SYSTEMD_UNIT\":\"init.scope\",\"_SYSTEMD_SLICE\":\"-.slice\",\"CODE_FILE\":\"\\u003cstdin>\",\"CODE_LINE\":\"1\",\"CODE_FUNC\":\"\\u003cmodule>\",\"SYSLOG_IDENTIFIER\":\"python3\",\"_COMM\":\"python3\",\"_EXE\":\"/usr/bin/python3.12\",\"_CMDLINE\":\"python3\",\"_msg\":\"foo\\nbar\\n\\n\\nasda\\nasda\",\"_PID\":\"2763\",\"_SOURCE_REALTIME_TIMESTAMP\":\"1729698775704375\"}",
)
// Parse binary data with trailing newline
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x14\x00\x00\x00\x00\x00\x00\x00foo\nbar\n\n\nasda\nasda\n\n_PID=2763\n\n",
[]int64{1729698775704404000},
`{"_CMDLINE":"python3","_msg":"foo\nbar\n\n\nasda\nasda\n","_PID":"2763"}`,
)
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x00\x00\x00\x00\x00\x00\x00\x00\n_PID=2763\n\n",
[]int64{1729698775704404000},
`{"_CMDLINE":"python3","_PID":"2763"}`,
)
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x0A\x00\x00\x00\x00\x00\x00\x00123456789\n\n_PID=2763\n\n",
[]int64{1729698775704404000},
`{"_CMDLINE":"python3","_msg":"123456789\n","_PID":"2763"}`,
)
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x0A\x00\x00\x00\x00\x00\x00\x001234567890\n_PID=2763\n\n",
[]int64{1729698775704404000},
`{"_CMDLINE":"python3","_msg":"1234567890","_PID":"2763"}`,
)
// Empty field name must be ignored
f("__REALTIME_TIMESTAMP=91723819283\na=b\n=Test message", nil, "")
f("__REALTIME_TIMESTAMP=91723819284\nMESSAGE=Test message2\n\n__REALTIME_TIMESTAMP=91723819283\n=Test message\n", []int64{91723819284000}, `{"_msg":"Test message2"}`)
// field name starting with number must be ignored
f("__REALTIME_TIMESTAMP=91723819283\n1incorrect=Test message\n\n__REALTIME_TIMESTAMP=91723819284\nMESSAGE=Test message2\n\n", []int64{91723819284000}, `{"_msg":"Test message2"}`)
// field name exceeding 64 bytes limit must be ignored
f("__REALTIME_TIMESTAMP=91723819283\ntoolooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooongcorrecooooooooooooong=Test message\n", nil, "")
// field name with invalid chars must be ignored
f("__REALTIME_TIMESTAMP=91723819283\nbadC!@$!@$as=Test message\n", nil, "")
}
func TestPushJournald_Failure(t *testing.T) {
f := func(data string) {
t.Helper()
tlp := &insertutil.TestLogMessageProcessor{}
cp := &insertutil.CommonParams{
TimeFields: []string{"__REALTIME_TIMESTAMP"},
MsgFields: []string{"MESSAGE"},
r, err := http.NewRequest("GET", "https://foo.bar/baz", nil)
if err != nil {
t.Fatalf("cannot create request: %s", err)
}
if err := parseJournaldRequest([]byte(data), tlp, cp); err == nil {
t.Fatalf("expected non nil error")
cp, err := getCommonParams(r)
if err != nil {
t.Fatalf("cannot create commonParams: %s", err)
}
buf := bytes.NewBufferString(data)
if err := processStreamInternal("test", buf, tlp, cp); err == nil {
t.Fatalf("expecting non-nil error")
}
}
// missing new line terminator for binary encoded message
f("__CURSOR=s=e0afe8412a6a49d2bfcf66aa7927b588;i=1f06;b=f778b6e2f7584a77b991a2366612a7b5;m=300bdfd420;t=62526e1182354;x=930dc44b370963b7\n__REALTIME_TIMESTAMP=1729698775704404\nMESSAGE\n\x13\x00\x00\x00\x00\x00\x00\x00foo\nbar\n\n\nasdaasda2")
// missing new line terminator
f("__REALTIME_TIMESTAMP=91723819283\n=Test message")
// empty field name
f("__REALTIME_TIMESTAMP=91723819283\n=Test message\n")
// field name starting with number
f("__REALTIME_TIMESTAMP=91723819283\n1incorrect=Test message\n")
// field name exceeds 64 limit
f("__REALTIME_TIMESTAMP=91723819283\ntoolooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooongcorrecooooooooooooong=Test message\n")
// Only allow A-Z0-9 and '_'
f("__REALTIME_TIMESTAMP=91723819283\nbadC!@$!@$as=Test message\n")
// too short binary encoded message
f("__CURSOR=s=e0afe8412a6a49d2bfcf66aa7927b588;i=1f06;b=f778b6e2f7584a77b991a2366612a7b5;m=300bdfd420;t=62526e1182354;x=930dc44b370963b7\n__REALTIME_TIMESTAMP=1729698775704404\nMESSAGE\n\x13\x00\x00\x00\x00\x00\x00\x00foo\nbar\n\n\nasdaasd")
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x00\x00\x00\x00\x00\x00\x00\x00_PID=2763\n\n")
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x0A\x00\x00\x00\x00\x00\x00\x001234567890_PID=2763\n\n")
f("__REALTIME_TIMESTAMP=1729698775704404\n_CMDLINE=python3\nMESSAGE\n\x0A\x00\x00\x00\x00\x00\x00\x00123456789\n_PID=2763\n\n")
// too long binary encoded message
f("__CURSOR=s=e0afe8412a6a49d2bfcf66aa7927b588;i=1f06;b=f778b6e2f7584a77b991a2366612a7b5;m=300bdfd420;t=62526e1182354;x=930dc44b370963b7\n__REALTIME_TIMESTAMP=1729698775704404\nMESSAGE\n\x13\x00\x00\x00\x00\x00\x00\x00foo\nbar\n\n\nasdaasdakljlsfd")
}

View File

@@ -0,0 +1,82 @@
package journald
import (
"bytes"
"encoding/binary"
"fmt"
"strings"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
)
func BenchmarkIsValidFieldName(b *testing.B) {
b.ReportAllocs()
b.SetBytes(int64(len(benchmarkFields)))
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
for _, field := range benchmarkFields {
if !isValidFieldName(field) {
panic(fmt.Errorf("cannot validate field %q", field))
}
}
}
})
}
var benchmarkFields = strings.Split(
"E,_BOOT_ID,_UID,_GID,_MACHINE_ID,_HOSTNAME,_RUNTIME_SCOPE,_TRANSPORT,_CAP_EFFECTIVE,_SYSTEMD_CGROUP,_SYSTEMD_UNIT,"+
"_SYSTEMD_SLICE,CODE_FILE,CODE_LINE,CODE_FUNC,SYSLOG_IDENTIFIER,_COMM,_EXE,_CMDLINE,MESSAGE,_PID,_SOURCE_REALTIME_TIMESTAMP,_REALTIME_TIMESTAMP",
",")
func BenchmarkPushJournaldPerformance(b *testing.B) {
cp := &insertutil.CommonParams{
TimeFields: []string{"__REALTIME_TIMESTAMP"},
MsgFields: []string{"MESSAGE"},
}
const dataChunkSize = 1024 * 1024
data := generateJournaldData(dataChunkSize)
b.ReportAllocs()
b.SetBytes(int64(len(data)))
b.RunParallel(func(pb *testing.PB) {
r := &bytes.Reader{}
blp := &insertutil.BenchmarkLogMessageProcessor{}
for pb.Next() {
r.Reset(data)
if err := processStreamInternal("performance_test", r, blp, cp); err != nil {
panic(fmt.Errorf("unexpected error: %w", err))
}
}
})
}
func generateJournaldData(size int) []byte {
var buf []byte
timestamp := time.Now().UnixMicro()
binaryMsg := []byte("binary message data for performance test")
var sizeBuf [8]byte
for len(buf) < size {
timestamp++
var entry string
// Generate a mix of simple and binary messages
if timestamp%10 == 0 {
// Generate binary message
binary.LittleEndian.PutUint64(sizeBuf[:], uint64(len(binaryMsg)))
entry = fmt.Sprintf("__REALTIME_TIMESTAMP=%d\nMESSAGE\n%s%s\n\n",
timestamp,
sizeBuf[:],
binaryMsg,
)
} else {
// Generate simple message
entry = fmt.Sprintf("__REALTIME_TIMESTAMP=%d\nMESSAGE=Performance test message %d\n\n", timestamp, timestamp)
}
buf = append(buf, entry...)
}
return buf
}

View File

@@ -7,7 +7,6 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
@@ -33,7 +32,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) {
httpserver.Errorf(w, r, "%s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
@@ -120,5 +119,5 @@ var (
requestsTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/jsonline"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/insert/jsonline"}`)
requestDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/jsonline"}`)
requestDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/jsonline"}`)
)

View File

@@ -16,10 +16,10 @@ var disableMessageParsing = flag.Bool("loki.disableMessageParsing", false, "Whet
// RequestHandler processes Loki insert requests
func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
switch path {
case "/api/v1/push":
case "/insert/loki/api/v1/push":
handleInsert(r, w)
return true
case "/ready":
case "/insert/loki/ready":
// See https://grafana.com/docs/loki/latest/api/#identify-ready-loki-instance
w.WriteHeader(http.StatusOK)
w.Write([]byte("ready"))

View File

@@ -9,7 +9,6 @@ import (
"github.com/valyala/fastjson"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -30,7 +29,7 @@ func handleJSON(r *http.Request, w http.ResponseWriter) {
httpserver.Errorf(w, r, "cannot parse common params from request: %s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
@@ -59,7 +58,7 @@ func handleJSON(r *http.Request, w http.ResponseWriter) {
var (
requestsJSONTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/loki/api/v1/push",format="json"}`)
requestJSONDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="json"}`)
requestJSONDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="json"}`)
)
func parseJSONRequest(data []byte, lmp insertutil.LogMessageProcessor, msgFields []string, useDefaultStreamFields, parseMessage bool) error {

View File

@@ -9,7 +9,6 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
@@ -29,7 +28,7 @@ func handleProtobuf(r *http.Request, w http.ResponseWriter) {
httpserver.Errorf(w, r, "cannot parse common params from request: %s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
@@ -63,7 +62,7 @@ func handleProtobuf(r *http.Request, w http.ResponseWriter) {
var (
requestsProtobufTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/loki/api/v1/push",format="protobuf"}`)
requestProtobufDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="protobuf"}`)
requestProtobufDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="protobuf"}`)
)
func parseProtobufRequest(data []byte, lmp insertutil.LogMessageProcessor, msgFields []string, useDefaultStreamFields, parseMessage bool) error {

View File

@@ -1,6 +1,7 @@
package vlinsert
import (
"flag"
"fmt"
"net/http"
"strings"
@@ -13,6 +14,12 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/loki"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/opentelemetry"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/syslog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
)
var (
disableInsert = flag.Bool("insert.disable", false, "Whether to disable /insert/* HTTP endpoints")
disableInternal = flag.Bool("internalinsert.disable", false, "Whether to disable /internal/insert HTTP endpoint")
)
// Init initializes vlinsert
@@ -27,49 +34,55 @@ func Stop() {
// RequestHandler handles insert requests for VictoriaLogs
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
path := strings.ReplaceAll(r.URL.Path, "//", "/")
if strings.HasPrefix(path, "/insert/") {
if *disableInsert {
httpserver.Errorf(w, r, "requests to /insert/* are disabled with -insert.disable command-line flag")
return true
}
return insertHandler(w, r, path)
}
if path == "/internal/insert" {
if *disableInternal || *disableInsert {
httpserver.Errorf(w, r, "requests to /internal/insert are disabled with -internalinsert.disable or -insert.disable command-line flag")
return true
}
internalinsert.RequestHandler(w, r)
return true
}
if !strings.HasPrefix(path, "/insert/") {
// Skip requests, which do not start with /insert/, since these aren't our requests.
return false
}
path = strings.TrimPrefix(path, "/insert")
path = strings.ReplaceAll(path, "//", "/")
return false
}
func insertHandler(w http.ResponseWriter, r *http.Request, path string) bool {
switch path {
case "/jsonline":
case "/insert/jsonline":
jsonline.RequestHandler(w, r)
return true
case "/ready":
case "/insert/ready":
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(200)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
}
switch {
case strings.HasPrefix(path, "/elasticsearch"):
// some clients may omit trailing slash
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8353
path = strings.TrimPrefix(path, "/elasticsearch")
// some clients may omit trailing slash at elasticsearch protocol.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8353
case strings.HasPrefix(path, "/insert/elasticsearch"):
return elasticsearch.RequestHandler(path, w, r)
case strings.HasPrefix(path, "/loki/"):
path = strings.TrimPrefix(path, "/loki")
case strings.HasPrefix(path, "/insert/loki/"):
return loki.RequestHandler(path, w, r)
case strings.HasPrefix(path, "/opentelemetry/"):
path = strings.TrimPrefix(path, "/opentelemetry")
case strings.HasPrefix(path, "/insert/opentelemetry/"):
return opentelemetry.RequestHandler(path, w, r)
case strings.HasPrefix(path, "/journald/"):
path = strings.TrimPrefix(path, "/journald")
case strings.HasPrefix(path, "/insert/journald/"):
return journald.RequestHandler(path, w, r)
case strings.HasPrefix(path, "/datadog/"):
path = strings.TrimPrefix(path, "/datadog")
case strings.HasPrefix(path, "/insert/datadog/"):
return datadog.RequestHandler(path, w, r)
default:
return false
}
return false
}

View File

@@ -6,7 +6,6 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
@@ -22,7 +21,7 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
switch path {
// use the same path as opentelemetry collector
// https://opentelemetry.io/docs/specs/otlp/#otlphttp-request
case "/v1/logs":
case "/insert/opentelemetry/v1/logs":
if r.Header.Get("Content-Type") == "application/json" {
httpserver.Errorf(w, r, "json encoding isn't supported for opentelemetry format. Use protobuf encoding")
return true
@@ -43,7 +42,7 @@ func handleProtobuf(r *http.Request, w http.ResponseWriter) {
httpserver.Errorf(w, r, "cannot parse common params from request: %s", err)
return
}
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
@@ -71,7 +70,7 @@ var (
requestsProtobufTotal = metrics.NewCounter(`vl_http_requests_total{path="/insert/opentelemetry/v1/logs",format="protobuf"}`)
errorsTotal = metrics.NewCounter(`vl_http_errors_total{path="/insert/opentelemetry/v1/logs",format="protobuf"}`)
requestProtobufDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/opentelemetry/v1/logs",format="protobuf"}`)
requestProtobufDuration = metrics.NewSummary(`vl_http_request_duration_seconds{path="/insert/opentelemetry/v1/logs",format="protobuf"}`)
)
func pushProtobufRequest(data []byte, lmp insertutil.LogMessageProcessor, msgFields []string, useDefaultStreamFields bool) error {

View File

@@ -17,7 +17,6 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
@@ -385,7 +384,7 @@ func serveTCP(ln net.Listener, tenantID logstorage.TenantID, encoding string, us
// processStream parses a stream of syslog messages from r and ingests them into vlstorage.
func processStream(protocol string, r io.Reader, encoding string, useLocalTimestamp bool, cp *insertutil.CommonParams) error {
if err := vlstorage.CanWriteData(); err != nil {
if err := insertutil.CanWriteData(); err != nil {
return err
}

View File

@@ -101,8 +101,8 @@ func TestProcessStreamInternal_Success(t *testing.T) {
currentYear := 2023
timestampsExpected := []int64{1685794113000000000, 1685880513000000000, 1685814132345000000}
resultExpected := `{"format":"rfc3164","hostname":"abcd","app_name":"systemd","_msg":"Starting Update the local ESM caches..."}
{"priority":"165","facility":"20","severity":"5","format":"rfc3164","hostname":"abcd","app_name":"systemd","proc_id":"345","_msg":"abc defg"}
{"priority":"123","facility":"15","severity":"3","format":"rfc5424","hostname":"mymachine.example.com","app_name":"appname","proc_id":"12345","msg_id":"ID47","exampleSDID@32473.iut":"3","exampleSDID@32473.eventSource":"Application 123 = ] 56","exampleSDID@32473.eventID":"11211","_msg":"This is a test message with structured data."}`
{"priority":"165","facility_keyword":"local4","level":"notice","facility":"20","severity":"5","format":"rfc3164","hostname":"abcd","app_name":"systemd","proc_id":"345","_msg":"abc defg"}
{"priority":"123","facility_keyword":"solaris-cron","level":"error","facility":"15","severity":"3","format":"rfc5424","hostname":"mymachine.example.com","app_name":"appname","proc_id":"12345","msg_id":"ID47","exampleSDID@32473.iut":"3","exampleSDID@32473.eventSource":"Application 123 = ] 56","exampleSDID@32473.eventID":"11211","_msg":"This is a test message with structured data."}`
f(data, currentYear, timestampsExpected, resultExpected)
}

View File

@@ -2,7 +2,6 @@ package internalselect
import (
"context"
"flag"
"fmt"
"net/http"
"strconv"
@@ -22,15 +21,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
)
var disableSelect = flag.Bool("internalselect.disable", false, "Whether to disable /internal/select/* HTTP endpoints")
// RequestHandler processes requests to /internal/select/*
func RequestHandler(ctx context.Context, w http.ResponseWriter, r *http.Request) {
if *disableSelect {
httpserver.Errorf(w, r, "requests to /internal/select/* are disabled with -internalselect.disable command-line flag")
return
}
startTime := time.Now()
path := r.URL.Path

View File

@@ -25,6 +25,9 @@ var (
maxQueueDuration = flag.Duration("search.maxQueueDuration", 10*time.Second, "The maximum time the search request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
maxQueryDuration = flag.Duration("search.maxQueryDuration", time.Second*30, "The maximum duration for query execution. It can be overridden to a smaller value on a per-query basis via 'timeout' query arg")
disableSelect = flag.Bool("select.disable", false, "Whether to disable /select/* HTTP endpoints")
disableInternal = flag.Bool("internalselect.disable", false, "Whether to disable /internal/select/* HTTP endpoints")
)
func getDefaultMaxConcurrentRequests() int {
@@ -71,13 +74,31 @@ var vmuiFileServer = http.FileServer(http.FS(vmuiFiles))
// RequestHandler handles select requests for VictoriaLogs
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
path := strings.ReplaceAll(r.URL.Path, "//", "/")
if !strings.HasPrefix(path, "/select/") && !strings.HasPrefix(path, "/internal/select/") {
// Skip requests, which do not start with /select/, since these aren't our requests.
return false
if strings.HasPrefix(path, "/select/") {
if *disableSelect {
httpserver.Errorf(w, r, "requests to /select/* are disabled with -select.disable command-line flag")
return true
}
return selectHandler(w, r, path)
}
path = strings.ReplaceAll(path, "//", "/")
if strings.HasPrefix(path, "/internal/select/") {
if *disableInternal || *disableSelect {
httpserver.Errorf(w, r, "requests to /internal/select/* are disabled with -internalselect.disable or -select.disable command-line flag")
return true
}
internalselect.RequestHandler(r.Context(), w, r)
return true
}
return false
}
func selectHandler(w http.ResponseWriter, r *http.Request, path string) bool {
ctx := r.Context()
if path == "/select/vmui" {
// VMUI access via incomplete url without `/` in the end. Redirect to complete url.
@@ -100,7 +121,6 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
ctx := r.Context()
if path == "/select/logsql/tail" {
logsqlTailRequests.Inc()
// Process live tailing request without timeout, since it is OK to run live tailing requests for very long time.
@@ -120,13 +140,6 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
defer decRequestConcurrency()
if strings.HasPrefix(path, "/internal/select/") {
// Process internal request from vlselect without timeout (e.g. use ctx instead of ctxWithTimeout),
// since the timeout must be controlled by the vlselect.
internalselect.RequestHandler(ctx, w, r)
return true
}
ok := processSelectRequest(ctxWithTimeout, w, r, path)
if !ok {
return false

View File

@@ -742,7 +742,23 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
This function is supported by PromQL.
See also [irate](#irate) and [rollup_rate](#rollup_rate).
See also [irate](#irate), [rollup_rate](#rollup_rate) and [rate_prometheus](#rate_prometheus).
#### rate_prometheus
`rate_prometheus(series_selector[d])` {{% available_from "#" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering).
The resulting calculation is equivalent to `increase_prometheus(series_selector[d]) / d`.
It doesn't take into account the last sample before the given lookbehind window `d` when calculating the result in the same way as Prometheus does.
See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
This function is usually applied to [counters](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#counter).
See also [increase_prometheus](#increase_prometheus) and [rate](#rate).
#### rate_over_sum

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -35,10 +35,10 @@
<meta property="og:title" content="UI for VictoriaLogs">
<meta property="og:url" content="https://victoriametrics.com/products/victorialogs/">
<meta property="og:description" content="Explore your log data with VictoriaLogs UI">
<script type="module" crossorigin src="./assets/index-BaRvaPfA.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-D8IJGiEn.js">
<script type="module" crossorigin src="./assets/index-721xTF8u.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-V4vnRsM-.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
<link rel="stylesheet" crossorigin href="./assets/index-C85_NB5q.css">
<link rel="stylesheet" crossorigin href="./assets/index-C36SC0pJ.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -253,8 +253,11 @@ func processForceFlush(w http.ResponseWriter, r *http.Request) bool {
return true
}
// Storage implements insertutil.LogRowsStorage interface
type Storage struct{}
// CanWriteData returns non-nil error if it cannot write data to vlstorage
func CanWriteData() error {
func (*Storage) CanWriteData() error {
if localStorage == nil {
// The data can be always written in non-local mode.
return nil
@@ -273,7 +276,7 @@ func CanWriteData() error {
// MustAddRows adds lr to vlstorage
//
// It is advised to call CanWriteData() before calling MustAddRows()
func MustAddRows(lr *logstorage.LogRows) {
func (*Storage) MustAddRows(lr *logstorage.LogRows) {
if localStorage != nil {
// Store lr in the local storage.
localStorage.MustAddRows(lr)
@@ -351,6 +354,10 @@ func writeStorageMetrics(w io.Writer, strg *logstorage.Storage) {
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
if maxDiskSpaceUsageBytes.N > 0 {
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_max_disk_space_usage_bytes{path=%q}`, *storageDataPath), uint64(maxDiskSpaceUsageBytes.N))
}
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_free_disk_space_bytes{path=%q}`, *storageDataPath), fs.MustGetFreeSpace(*storageDataPath))
isReadOnly := uint64(0)

View File

@@ -18,9 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
var (
defaultRuleType = flag.String("rule.defaultRuleType", "prometheus", `Default type for rule expressions, can be overridden via "type" parameter on the group level, see https://docs.victoriametrics.com/victoriametrics/vmalert/#groups. Supported values: "graphite", "prometheus" and "vlogs".`)
)
var defaultRuleType = flag.String("rule.defaultRuleType", "prometheus", `Default type for rule expressions, can be overridden via "type" parameter on the group level, see https://docs.victoriametrics.com/victoriametrics/vmalert/#groups. Supported values: "graphite", "prometheus" and "vlogs".`)
// Group contains list of Rules grouped into
// entity with one name and evaluation interval
@@ -293,12 +291,6 @@ func parse(files map[string][]byte, validateTplFn ValidateTplFn, validateExpress
if err := errGroup.Err(); err != nil {
return nil, err
}
sort.SliceStable(groups, func(i, j int) bool {
if groups[i].File != groups[j].File {
return groups[i].File < groups[j].File
}
return groups[i].Name < groups[j].Name
})
return groups, nil
}

View File

@@ -89,15 +89,18 @@ func (t *Type) ValidateExpr(expr string) error {
return nil
}
// SupportedType is true if given datasource type is supported
func SupportedType(dsType string) bool {
return dsType == "graphite" || dsType == "prometheus" || dsType == "vlogs"
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (t *Type) UnmarshalYAML(unmarshal func(any) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
switch s {
case "graphite", "prometheus", "vlogs":
default:
if !SupportedType(s) {
return fmt.Errorf("unknown datasource type=%q, want prometheus, graphite or vlogs", s)
}
t.Name = s

View File

@@ -6,6 +6,7 @@ import (
"fmt"
"net/url"
"os"
"sort"
"strconv"
"strings"
"sync"
@@ -148,9 +149,13 @@ func main() {
if err != nil {
logger.Fatalf("failed to init datasource: %s", err)
}
if err := replay(groupsCfg, q, rw); err != nil {
totalRows, droppedRows, err := replay(groupsCfg, q, rw)
if err != nil {
logger.Fatalf("replay failed: %s", err)
}
if droppedRows > 0 {
logger.Fatalf("failed to push all generated samples to remote write url, dropped %d samples out of %d", droppedRows, totalRows)
}
logger.Infof("replay succeed!")
return
}
@@ -403,6 +408,21 @@ func configsEqual(a, b []config.Group) bool {
if len(a) != len(b) {
return false
}
// sort both slices by file and name before comparing them
sort.SliceStable(a, func(i, j int) bool {
if a[i].File != a[j].File {
return a[i].File < a[j].File
}
return a[i].Name < a[j].Name
})
sort.SliceStable(b, func(i, j int) bool {
if b[i].File != b[j].File {
return b[i].File < b[j].File
}
return b[i].Name < b[j].Name
})
for i := range a {
if a[i].Checksum != b[i].Checksum {
return false

View File

@@ -194,9 +194,10 @@ func mergeLabels(target string, metaLabels *promutil.Labels, cfg *Config) *promu
alertsPath = address[n:]
address = address[:n]
}
m.Add("__address__", address)
m.Add("__scheme__", scheme)
m.Add("__alerts_path__", alertsPath)
m.AddFrom(metaLabels)
// force labels
m.Set("__address__", address)
m.Set("__scheme__", scheme)
m.Set("__alerts_path__", alertsPath)
return m
}

View File

@@ -2,6 +2,7 @@ package notifier
import (
"fmt"
"sort"
"sync"
"time"
@@ -53,8 +54,11 @@ func (cw *configWatcher) notifiers() []Notifier {
for _, n := range ns {
notifiers = append(notifiers, n.Notifier)
}
}
// deterministically sort the output
sort.Slice(notifiers, func(i, j int) bool {
return notifiers[i].Addr() < notifiers[j].Addr()
})
return notifiers
}
@@ -85,12 +89,12 @@ func (cw *configWatcher) reload(path string) error {
}
func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn getLabels) error {
targets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
for _, err := range errors {
return fmt.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, targets)
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
cw.wg.Add(1)
go func() {
@@ -105,22 +109,22 @@ func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn
return
case <-ticker.C:
}
updateTargets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
for _, err := range errors {
logger.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, updateTargets)
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
}
}()
return nil
}
func targetsFromLabels(labelsFn getLabels, cfg *Config, genFn AlertURLGenerator) ([]Target, []error) {
func getTargetMetadata(labelsFn getLabels, cfg *Config) (map[string]*promutil.Labels, []error) {
metaLabels, err := labelsFn()
if err != nil {
return nil, []error{fmt.Errorf("failed to get labels: %w", err)}
}
var targets []Target
targetMetadata := make(map[string]*promutil.Labels, len(metaLabels))
var errors []error
duplicates := make(map[string]struct{})
for _, labels := range metaLabels {
@@ -143,18 +147,9 @@ func targetsFromLabels(labelsFn getLabels, cfg *Config, genFn AlertURLGenerator)
continue
}
duplicates[u] = struct{}{}
am, err := NewAlertManager(u, genFn, cfg.HTTPClientConfig, cfg.parsedAlertRelabelConfigs, cfg.Timeout.Duration())
if err != nil {
errors = append(errors, err)
continue
}
targets = append(targets, Target{
Notifier: am,
Labels: processedLabels,
})
targetMetadata[u] = processedLabels
}
return targets, errors
return targetMetadata, errors
}
type getLabels func() ([]*promutil.Labels, error)
@@ -241,21 +236,40 @@ func (cw *configWatcher) mustStop() {
func (cw *configWatcher) setTargets(key TargetType, targets []Target) {
cw.targetsMu.Lock()
newT := make(map[string]Target)
for _, t := range targets {
newT[t.Addr()] = t
}
oldT := cw.targets[key]
for _, ot := range oldT {
if _, ok := newT[ot.Addr()]; !ok {
ot.Notifier.Close()
}
}
cw.targets[key] = targets
cw.targetsMu.Unlock()
}
func (cw *configWatcher) updateTargets(key TargetType, targetMetadata map[string]*promutil.Labels, cfg *Config, genFn AlertURLGenerator) {
cw.targetsMu.Lock()
defer cw.targetsMu.Unlock()
oldTargets := cw.targets[key]
var updatedTargets []Target
for _, ot := range oldTargets {
if _, ok := targetMetadata[ot.Addr()]; !ok {
// if target not exists in currentTargets, close it
ot.Notifier.Close()
} else {
updatedTargets = append(updatedTargets, ot)
delete(targetMetadata, ot.Addr())
}
}
// create new resources for the new targets
for addr, labels := range targetMetadata {
am, err := NewAlertManager(addr, genFn, cfg.HTTPClientConfig, cfg.parsedAlertRelabelConfigs, cfg.Timeout.Duration())
if err != nil {
logger.Errorf("failed to init %s notifier with addr %q: %w", key, addr, err)
continue
}
updatedTargets = append(updatedTargets, Target{
Notifier: am,
Labels: labels,
})
}
cw.targets[key] = updatedTargets
}
// mergeHTTPClientConfigs merges fields between child and parent params
// by populating child from parent params if they're missing.
func mergeHTTPClientConfigs(parent, child promauth.HTTPClientConfig) promauth.HTTPClientConfig {

View File

@@ -8,8 +8,10 @@ import (
"os"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discovery/consul"
)
func TestConfigWatcherReload(t *testing.T) {
@@ -59,6 +61,11 @@ static_configs:
}
func TestConfigWatcherStart(t *testing.T) {
oldSDCheckInterval := consul.SDCheckInterval
defer func() { consul.SDCheckInterval = oldSDCheckInterval }()
consulCheckInterval := 100 * time.Millisecond
consul.SDCheckInterval = &consulCheckInterval
consulSDServer := newFakeConsulServer()
defer consulSDServer.Close()
@@ -97,6 +104,11 @@ consul_sd_configs:
if n2.Addr() != expAddr2 {
t.Fatalf("exp address %q; got %q", expAddr2, n2.Addr())
}
f := func() bool { return len(cw.notifiers()) == 1 }
if !waitFor(f, time.Second) {
t.Fatalf("expected to get 1 notifiers; got %d", len(cw.notifiers()))
}
}
// TestConfigWatcherReloadConcurrent supposed to test concurrent
@@ -193,6 +205,7 @@ const (
)
func newFakeConsulServer() *httptest.Server {
requestCount := 0
mux := http.NewServeMux()
mux.HandleFunc("/v1/agent/self", func(rw http.ResponseWriter, _ *http.Request) {
rw.Write([]byte(`{"Config": {"Datacenter": "dc1"}}`))
@@ -207,8 +220,9 @@ func newFakeConsulServer() *httptest.Server {
}`))
})
mux.HandleFunc("/v1/health/service/alertmanager", func(rw http.ResponseWriter, _ *http.Request) {
rw.Header().Set("X-Consul-Index", "1")
rw.Write([]byte(`
if requestCount == 0 {
rw.Header().Set("X-Consul-Index", "1")
rw.Write([]byte(`
[
{
"Node": {
@@ -297,6 +311,56 @@ func newFakeConsulServer() *httptest.Server {
}
}
]`))
} else {
rw.Header().Set("X-Consul-Index", "2")
rw.Write([]byte(`
[
{
"Node": {
"ID": "e8e3629a-3f50-9d6e-aaf8-f173b5b05c72",
"Node": "machine",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"lan_ipv4": "127.0.0.1",
"wan": "127.0.0.1",
"wan_ipv4": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 13,
"ModifyIndex": 14
},
"Service": {
"ID": "am3",
"Service": "alertmanager",
"Tags": [
"alertmanager",
"__scheme__=http"
],
"Address": "",
"Meta": null,
"Port": 9097,
"Weights": {
"Passing": 1,
"Warning": 1
},
"EnableTagOverride": false,
"Proxy": {
"Mode": "",
"MeshGateway": {},
"Expose": {}
},
"Connect": {},
"CreateIndex": 16,
"ModifyIndex": 16
}
}
]`))
}
requestCount++
})
return httptest.NewServer(mux)
@@ -357,3 +421,13 @@ func TestParseLabels_Success(t *testing.T) {
PathPrefix: "test",
}, "https://alertmanager:9093/api/v1/alerts")
}
func waitFor(f func() bool, timeout time.Duration) bool {
for start := time.Now(); time.Since(start) < timeout; {
if f() == true {
return true
}
time.Sleep(50 * time.Millisecond)
}
return false
}

View File

@@ -216,7 +216,7 @@ var (
)
// GetDroppedRows returns value of droppedRows metric
func GetDroppedRows() int64 { return int64(droppedRows.Get()) }
func GetDroppedRows() int { return int(droppedRows.Get()) }
// flush is a blocking function that marshals WriteRequest and sends
// it to remote-write endpoint. Flush performs limited amount of retries

View File

@@ -20,9 +20,8 @@ var (
"The time filter in RFC3339 format to finish the replay by. E.g. '2020-01-01T20:07:00Z'. "+
"By default, is set to the current time.")
replayRulesDelay = flag.Duration("replay.rulesDelay", time.Second,
"Delay between rules evaluation within the group. Could be important if there are chained rules inside the group "+
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
"Delay before evaluating the next rule within the group. Is important for chained rules. "+
"Keep it equal or bigger than -remoteWrite.flushInterval. When set to >0, replay ignores group's concurrency setting.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. It affects the max time range for every '/query_range' request during the replay. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
@@ -31,13 +30,13 @@ var (
"Progress bar rendering might be verbose or break the logs parsing, so it is recommended to be disabled when not used in interactive mode.")
)
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewrite.RWClient) error {
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewrite.RWClient) (totalRows, droppedRows int, err error) {
if *replayMaxDatapoints < 1 {
return fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
return 0, 0, fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
}
tFrom, err := time.Parse(time.RFC3339, *replayFrom)
if err != nil {
return fmt.Errorf("failed to parse replay.timeFrom=%q: %w", *replayFrom, err)
return 0, 0, fmt.Errorf("failed to parse replay.timeFrom=%q: %w", *replayFrom, err)
}
// use tFrom location for default value, otherwise filters could have different locations
@@ -45,12 +44,12 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
if *replayTo != "" {
tTo, err = time.Parse(time.RFC3339, *replayTo)
if err != nil {
return fmt.Errorf("failed to parse replay.timeTo=%q: %w", *replayTo, err)
return 0, 0, fmt.Errorf("failed to parse replay.timeTo=%q: %w", *replayTo, err)
}
}
if !tTo.After(tFrom) {
return fmt.Errorf("replay.timeTo=%v must be bigger than replay.timeFrom=%v", tTo, tFrom)
return 0, 0, fmt.Errorf("replay.timeTo=%v must be bigger than replay.timeFrom=%v", tTo, tFrom)
}
labels := make(map[string]string)
for _, s := range *externalLabels {
@@ -59,7 +58,7 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
}
n := strings.IndexByte(s, '=')
if n < 0 {
return fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
return 0, 0, fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
}
labels[s[:n]] = s[n+1:]
}
@@ -70,18 +69,14 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
"\nmax data points per request: %d\n",
tFrom, tTo, *replayMaxDatapoints)
var total int
for _, cfg := range groupsCfg {
ng := rule.NewGroup(cfg, qb, *evaluationInterval, labels)
total += ng.Replay(tFrom, tTo, rw, *replayMaxDatapoints, *replayRuleRetryAttempts, *replayRulesDelay, *disableProgressBar)
totalRows += ng.Replay(tFrom, tTo, rw, *replayMaxDatapoints, *replayRuleRetryAttempts, *replayRulesDelay, *disableProgressBar)
}
logger.Infof("replay evaluation finished, generated %d samples", total)
logger.Infof("replay evaluation finished, generated %d samples", totalRows)
if err := rw.Close(); err != nil {
return err
return 0, 0, err
}
droppedRows := remotewrite.GetDroppedRows()
if droppedRows > 0 {
return fmt.Errorf("failed to push all generated samples to remote write url, dropped %d samples out of %d", droppedRows, total)
}
return nil
droppedRows = remotewrite.GetDroppedRows()
return totalRows, droppedRows, nil
}

View File

@@ -8,38 +8,45 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
type fakeReplayQuerier struct {
datasource.FakeQuerier
registry map[string]map[string]struct{}
registry map[string]map[string][]datasource.Metric
}
func (fr *fakeReplayQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fr
}
type fakeRWClient struct{}
func (fc *fakeRWClient) Push(_ prompbmarshal.TimeSeries) error {
return nil
}
func (fc *fakeRWClient) Close() error {
return nil
}
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) (res datasource.Result, err error) {
key := fmt.Sprintf("%s+%s", from.Format("15:04:05"), to.Format("15:04:05"))
dps, ok := fr.registry[q]
if !ok {
return res, fmt.Errorf("unexpected query received: %q", q)
}
_, ok = dps[key]
metrics, ok := dps[key]
if !ok {
return res, fmt.Errorf("unexpected time range received: %q", key)
}
delete(dps, key)
if len(fr.registry[q]) < 1 {
delete(fr.registry, q)
}
res.Data = metrics
return res, nil
}
func TestReplay(t *testing.T) {
f := func(from, to string, maxDP int, cfg []config.Group, qb *fakeReplayQuerier) {
f := func(from, to string, maxDP int, ruleDelay time.Duration, cfg []config.Group, qb *fakeReplayQuerier, expectTotalRows int) {
t.Helper()
fromOrig, toOrig, maxDatapointsOrig := *replayFrom, *replayTo, *replayMaxDatapoints
@@ -51,90 +58,172 @@ func TestReplay(t *testing.T) {
}()
*replayRuleRetryAttempts = 1
*replayRulesDelay = time.Millisecond
rwb := &remotewrite.DebugClient{}
*replayRulesDelay = ruleDelay
rwb := &fakeRWClient{}
*replayFrom = from
*replayTo = to
*replayMaxDatapoints = maxDP
if err := replay(cfg, qb, rwb); err != nil {
totalRows, _, err := replay(cfg, qb, rwb)
if err != nil {
t.Fatalf("replay failed: %s", err)
}
if len(qb.registry) > 0 {
t.Fatalf("not all requests were sent: %#v", qb.registry)
if totalRows != expectTotalRows {
t.Fatalf("unexpected total rows count: got %d, want %d", totalRows, expectTotalRows)
}
}
// one rule + one response
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:00.000Z", 10, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:00.000Z", 10, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {"12:00:00+12:02:00": {}},
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {"12:00:00+12:02:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
}},
},
})
}, 1)
// one rule + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
"12:02:00+12:02:30": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
},
},
})
}, 2)
// datapoints per step
f("2021-01-01T12:00:00.000Z", "2021-01-01T15:02:30.000Z", 60, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T15:02:30.000Z", 60, time.Millisecond, []config.Group{
{Interval: promutil.NewDuration(time.Minute), Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+13:00:00": {},
"13:00:00+14:00:00": {},
"12:00:00+13:00:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"13:00:00+14:00:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"14:00:00+15:00:00": {},
"15:00:00+15:02:30": {},
},
},
})
}, 3)
// multiple recording rules + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
{Rules: []config.Rule{{Record: "bar", Expr: "max(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:01:00+12:02:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:02:00+12:02:30": {},
},
},
})
}, 4)
// multiple alerting rules + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, []config.Group{
// alerting rule generates two series `ALERTS` and `ALERTS_FOR_STATE` when triggered
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Alert: "foo", Expr: "sum(up) > 1"}}},
{Rules: []config.Rule{{Alert: "bar", Expr: "max(up) < 1"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up) > 1": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
})
}, 6)
// multiple recording rules in one group+ multiple responses + concurrency
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 0, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up) > 1"}, {Record: "bar", Expr: "max(up) < 1"}}, Concurrency: 2}}, &fakeReplayQuerier{
registry: map[string]map[string][]datasource.Metric{
"sum(up) > 1": {
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:02:00+12:02:30": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {{
Timestamps: []int64{1},
Values: []float64{1},
}},
"12:02:00+12:02:30": {},
},
},
}, 4)
}

View File

@@ -4,6 +4,7 @@ import (
"context"
"fmt"
"hash/fnv"
"math"
"sort"
"strings"
"sync"
@@ -335,7 +336,9 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
var result []prompbmarshal.TimeSeries
holdAlertState := make(map[uint64]*notifier.Alert)
qFn := func(_ string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
logger.Warnf("`query` template isn't supported in replay mode, mocked data is used")
// mock query results to allow common used template {{ query <$expr> | first | value }}
return []datasource.Metric{{Timestamps: []int64{0}, Values: []float64{math.NaN()}}}, nil
}
for _, s := range res.Data {
ls, as, err := ar.expandTemplates(s, qFn, time.Time{})
@@ -413,7 +416,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
return nil, fmt.Errorf("failed to execute query %q: %w", ar.Expr, err)
}
ar.logDebugf(ts, nil, "query returned %d samples (elapsed: %s, isPartial: %t)", curState.Samples, curState.Duration, isPartialResponse(res))
ar.logDebugf(ts, nil, "query returned %d series (elapsed: %s, isPartial: %t)", curState.Samples, curState.Duration, isPartialResponse(res))
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res.Data, err
@@ -738,6 +741,13 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
}
var labelsFilter string
for k, v := range ar.Labels {
if strings.Contains(v, "{{") && strings.Contains(v, "}}") {
// do not append label to the filter when value contains template,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9305.
// it's ok to do the simple check to skip some labels,
// since we verify the results' hash afterward to ensure the alerts match.
continue
}
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
// use `default_rollup()` instead of `last_over_time()` here to accounts for possible staleness markers

View File

@@ -815,10 +815,6 @@ func TestGroup_Restore(t *testing.T) {
t.Helper()
defer fqr.Reset()
for _, r := range rules {
fqr.Set(r.Expr, metricWithValueAndLabels(t, 0, "__name__", r.Alert))
}
fg := NewGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
fg.Init()
wg := sync.WaitGroup{}
@@ -871,6 +867,7 @@ func TestGroup_Restore(t *testing.T) {
}
// one active alert, no previous state
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutil.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
@@ -879,10 +876,10 @@ func TestGroup_Restore(t *testing.T) {
ActiveAt: defaultTS,
},
})
fqr.Reset()
// one active alert with state restore
ts := time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fn(
@@ -894,8 +891,33 @@ func TestGroup_Restore(t *testing.T) {
},
})
// one rule, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo",
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "prod"),
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "dev"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
// only env=prod has state metric, so only it will have state restore
stateMetric("foo", ts, "env", "prod"))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutil.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "prod"}): {
Name: "foo",
ActiveAt: ts,
},
})
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set("bar", metricWithValueAndLabels(t, 0, "__name__", "bar"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("bar", ts))
fn(
@@ -916,6 +938,8 @@ func TestGroup_Restore(t *testing.T) {
// two rules, two active alerts, two with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set("bar", metricWithValueAndLabels(t, 0, "__name__", "bar"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
@@ -938,6 +962,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert but wrong state restore
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertname="bar",alertgroup="TestRestore"}[3600s])`,
stateMetric("wrong alert", ts))
fn(
@@ -951,6 +976,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert with labels
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev"))
fn(
@@ -964,6 +990,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert with restore labels mismatch
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev", "team", "foo"))
fn(
@@ -974,6 +1001,27 @@ func TestGroup_Restore(t *testing.T) {
ActiveAt: defaultTS,
},
})
// two active alerts with dynamic labels and restore
ts = time.Now().Truncate(time.Hour)
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts, "env", "dev"),
stateMetric("foo", ts.Add(time.Second), "env", "prod"))
fqr.Set("foo",
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "dev"),
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "prod"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "{{$labels.env}}"}, For: promutil.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "prod"}): {
Name: "foo",
ActiveAt: ts.Add(time.Second),
},
})
}
func TestAlertingRule_Exec_Negative(t *testing.T) {

View File

@@ -445,11 +445,17 @@ func (g *Group) Start(ctx context.Context, nts func() []notifier.Notifier, rw re
g.infof("re-started")
case <-t.C:
missed := (time.Since(evalTS) / g.Interval) - 1
// calculate the real wall clock offset by stripping the monotonic clock first,
// then evalTS can be corrected when wall clock is adjusted.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8790#issuecomment-2986541829
offset := time.Now().Round(0).Sub(evalTS.Round(0))
missed := (offset / g.Interval) - 1
if missed < 0 {
// missed can become < 0 due to irregular delays during evaluation
// which can result in time.Since(evalTS) < g.Interval
// which can result in time.Since(evalTS) < g.Interval;
// or the system wall clock was changed backward
missed = 0
evalTS = time.Now()
}
if missed > 0 {
g.metrics.iterationMissed.Inc()
@@ -514,36 +520,84 @@ func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoi
iterations := int(end.Sub(start)/step) + 1
fmt.Printf("\nGroup %q"+
"\ninterval: \t%v"+
"\nrequests to make: \t%d"+
"\nconcurrency: \t %d"+
"\nrequests to make per rule: \t%d"+
"\nmax range per request: \t%v\n",
g.Name, g.Interval, iterations, step)
g.Name, g.Interval, g.Concurrency, iterations, step)
if g.Limit > 0 {
fmt.Printf("\nPlease note, `limit: %d` param has no effect during replay.\n",
fmt.Printf("\nWarning: `limit: %d` param has no effect during replay.\n",
g.Limit)
}
for _, rule := range g.Rules {
fmt.Printf("> Rule %q (ID: %d)\n", rule, rule.ID())
var bar *pb.ProgressBar
if !disableProgressBar {
bar = pb.StartNew(iterations)
}
ri.reset()
for ri.next() {
n, err := replayRule(rule, ri.s, ri.e, rw, replayRuleRetryAttempts)
if err != nil {
logger.Fatalf("rule %q: %s", rule, err)
concurrency := g.Concurrency
if g.Concurrency > 1 && replayDelay > 0 {
fmt.Printf("\nWarning: group concurrency %d will be ignored since `-replay.rulesDelay` is %.3f seconds."+
" Set -replay.rulesDelay=0 to enable concurrency for replay.\n", g.Concurrency, replayDelay.Seconds())
concurrency = 1
}
if concurrency == 1 {
for _, rule := range g.Rules {
var bar *pb.ProgressBar
if !disableProgressBar {
bar = pb.StartNew(iterations)
}
total += n
// pass ri as a copy, so it can be modified within the replayRuleRange
total += replayRuleRange(rule, ri, bar, rw, replayRuleRetryAttempts)
if bar != nil {
bar.Increment()
bar.Finish()
}
// sleep to let remote storage to flush data on-disk
// so chained rules could be calculated correctly
time.Sleep(replayDelay)
}
return total
}
sem := make(chan struct{}, g.Concurrency)
res := make(chan int, len(g.Rules)*iterations)
wg := sync.WaitGroup{}
var bar *pb.ProgressBar
if !disableProgressBar {
bar = pb.StartNew(iterations * len(g.Rules))
}
for _, r := range g.Rules {
sem <- struct{}{}
wg.Add(1)
go func(r Rule, ri rangeIterator) {
// pass ri as a copy, so it can be modified within the replayRuleRange
res <- replayRuleRange(r, ri, bar, rw, replayRuleRetryAttempts)
<-sem
wg.Done()
}(r, ri)
}
wg.Wait()
close(res)
close(sem)
if bar != nil {
bar.Finish()
}
total = 0
for n := range res {
total += n
}
return total
}
func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewrite.RWClient, replayRuleRetryAttempts int) int {
fmt.Printf("> Rule %q (ID: %d)\n", r, r.ID())
total := 0
for ri.next() {
n, err := replayRule(r, ri.s, ri.e, rw, replayRuleRetryAttempts)
if err != nil {
logger.Fatalf("rule %q: %s", r, err)
}
if bar != nil {
bar.Finish()
bar.Increment()
}
// sleep to let remote storage to flush data on-disk
// so chained rules could be calculated correctly
time.Sleep(replayDelay)
total += n
}
return total
}
@@ -570,11 +624,10 @@ type rangeIterator struct {
s, e time.Time
}
func (ri *rangeIterator) reset() {
ri.iter = 0
ri.s, ri.e = time.Time{}, time.Time{}
}
// next iterates with given step between start and end
// by modifying iter, s and e.
// Returns true until it reaches end.
// next modifies ri and isn't thread-safe.
func (ri *rangeIterator) next() bool {
ri.s = ri.start.Add(ri.step * time.Duration(ri.iter))
if !ri.end.After(ri.s) {

View File

@@ -99,3 +99,15 @@ textarea.curl-area {
padding: 0;
overflow: scroll;
}
.w-10 {
width: 10%;
}
.w-20 {
width: 20%;
}
.w-60 {
width: 60%;
}

View File

@@ -9,6 +9,7 @@ import (
"strconv"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/rule"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
@@ -27,6 +28,7 @@ var (
// such as Grafana, and proxied via vmselect.
{"api/v1/rules", "list all loaded groups and rules"},
{"api/v1/alerts", "list all active alerts"},
{"api/v1/notifiers", "list all notifiers"},
{fmt.Sprintf("api/v1/alert?%s=<int>&%s=<int>", paramGroupID, paramAlertID), "get alert status by group and alert ID"},
}
systemLinks = [][2]string{
@@ -42,6 +44,10 @@ var (
{Name: "Notifiers", URL: "notifiers"},
{Name: "Docs", URL: "https://docs.victoriametrics.com/victoriametrics/vmalert/"},
}
ruleTypeMap = map[string]string{
"alert": ruleTypeAlerting,
"record": ruleTypeRecording,
}
)
type requestHandler struct {
@@ -89,10 +95,13 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
WriteRuleDetails(w, r, rule)
return true
case "/vmalert/groups":
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data := rh.groups(rf)
WriteListGroups(w, r, data, filter)
WriteListGroups(w, r, data, rf.filter)
return true
case "/vmalert/notifiers":
WriteListTargets(w, r, notifier.GetTargets())
@@ -102,23 +111,35 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
// served without `vmalert` prefix:
case "/rules":
// Grafana makes an extra request to `/rules`
// handler in addition to `/api/v1/rules` calls in alerts UI,
var data []apiGroup
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
// handler in addition to `/api/v1/rules` calls in alerts UI
var data []*apiGroup
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data = rh.groups(rf)
WriteListGroups(w, r, data, filter)
WriteListGroups(w, r, data, rf.filter)
return true
case "/vmalert/api/v1/notifiers", "/api/v1/notifiers":
data, err := rh.listNotifiers()
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json")
w.Write(data)
return true
case "/vmalert/api/v1/rules", "/api/v1/rules":
// path used by Grafana for ng alerting
var data []byte
var err error
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data, err = rh.listGroups(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -129,7 +150,12 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/vmalert/api/v1/alerts", "/api/v1/alerts":
// path used by Grafana for ng alerting
data, err := rh.listAlerts()
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data, err := rh.listAlerts(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -218,7 +244,7 @@ func (rh *requestHandler) getAlert(r *http.Request) (*apiAlert, error) {
type listGroupsResponse struct {
Status string `json:"status"`
Data struct {
Groups []apiGroup `json:"groups"`
Groups []*apiGroup `json:"groups"`
} `json:"data"`
}
@@ -229,82 +255,102 @@ type rulesFilter struct {
ruleNames []string
ruleType string
excludeAlerts bool
onlyUnhealthy bool
onlyNoMatch bool
filter string
dsType config.Type
}
func extractRulesFilter(r *http.Request, filter string) rulesFilter {
rf := rulesFilter{}
func newRulesFilter(r *http.Request) (*rulesFilter, error) {
rf := &rulesFilter{}
query := r.URL.Query()
var ruleType string
ruleTypeParam := r.URL.Query().Get("type")
// for some reason, `type` in filter doesn't match `type` in response,
// so we use this matching here
if ruleTypeParam == "alert" {
ruleType = ruleTypeAlerting
} else if ruleTypeParam == "record" {
ruleType = ruleTypeRecording
ruleTypeParam := query.Get("type")
if len(ruleTypeParam) > 0 {
if ruleType, ok := ruleTypeMap[ruleTypeParam]; ok {
rf.ruleType = ruleType
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "type": not supported value %q`, ruleTypeParam), http.StatusBadRequest)
}
}
dsType := query.Get("datasource_type")
if len(dsType) > 0 {
if config.SupportedType(dsType) {
rf.dsType = config.NewRawType(dsType)
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "datasource_type": not supported value %q`, dsType), http.StatusBadRequest)
}
}
filter := strings.ToLower(query.Get("filter"))
if len(filter) > 0 {
if filter == "nomatch" || filter == "unhealthy" {
rf.filter = filter
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "filter": not supported value %q`, filter), http.StatusBadRequest)
}
}
rf.ruleType = ruleType
rf.excludeAlerts = httputil.GetBool(r, "exclude_alerts")
rf.ruleNames = append([]string{}, r.Form["rule_name[]"]...)
rf.groupNames = append([]string{}, r.Form["rule_group[]"]...)
rf.files = append([]string{}, r.Form["file[]"]...)
switch filter {
case "unhealthy":
rf.onlyUnhealthy = true
case "noMatch":
rf.onlyNoMatch = true
}
return rf
return rf, nil
}
func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
func (rf *rulesFilter) matchesGroup(group *rule.Group) bool {
if len(rf.groupNames) > 0 && !slices.Contains(rf.groupNames, group.Name) {
return false
}
if len(rf.files) > 0 && !slices.Contains(rf.files, group.File) {
return false
}
if len(rf.dsType.Name) > 0 && rf.dsType.String() != group.Type.String() {
return false
}
return true
}
func (rh *requestHandler) groups(rf *rulesFilter) []*apiGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
groups := make([]apiGroup, 0)
groups := make([]*apiGroup, 0)
for _, group := range rh.m.groups {
if len(rf.groupNames) > 0 && !slices.Contains(rf.groupNames, group.Name) {
if !rf.matchesGroup(group) {
continue
}
if len(rf.files) > 0 && !slices.Contains(rf.files, group.File) {
continue
}
g := groupToAPI(group)
// the returned list should always be non-nil
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221
filteredRules := make([]apiRule, 0)
for _, r := range g.Rules {
if rf.ruleType != "" && rf.ruleType != r.Type {
for _, rule := range g.Rules {
if rf.ruleType != "" && rf.ruleType != rule.Type {
continue
}
if len(rf.ruleNames) > 0 && !slices.Contains(rf.ruleNames, r.Name) {
if len(rf.ruleNames) > 0 && !slices.Contains(rf.ruleNames, rule.Name) {
continue
}
if (rule.LastError == "" && rf.filter == "unhealthy") || (!isNoMatch(rule) && rf.filter == "nomatch") {
continue
}
if rf.excludeAlerts {
r.Alerts = nil
rule.Alerts = nil
}
if (r.LastError == "" && rf.onlyUnhealthy) || (!isNoMatch(r) && rf.onlyNoMatch) {
continue
}
if r.LastError != "" {
if rule.LastError != "" {
g.Unhealthy++
} else {
g.Healthy++
}
if isNoMatch(r) {
if isNoMatch(rule) {
g.NoMatch++
}
filteredRules = append(filteredRules, r)
filteredRules = append(filteredRules, rule)
}
g.Rules = filteredRules
groups = append(groups, g)
}
// sort list of groups for deterministic output
slices.SortFunc(groups, func(a, b apiGroup) int {
slices.SortFunc(groups, func(a, b *apiGroup) int {
if a.Name != b.Name {
return strings.Compare(a.Name, b.Name)
}
@@ -313,7 +359,7 @@ func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
return groups
}
func (rh *requestHandler) listGroups(rf rulesFilter) ([]byte, error) {
func (rh *requestHandler) listGroups(rf *rulesFilter) ([]byte, error) {
lr := listGroupsResponse{Status: "success"}
lr.Data.Groups = rh.groups(rf)
b, err := json.Marshal(lr)
@@ -360,14 +406,17 @@ func (rh *requestHandler) groupAlerts() []groupAlerts {
return gAlerts
}
func (rh *requestHandler) listAlerts() ([]byte, error) {
func (rh *requestHandler) listAlerts(rf *rulesFilter) ([]byte, error) {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
lr := listAlertsResponse{Status: "success"}
lr.Data.Alerts = make([]*apiAlert, 0)
for _, g := range rh.m.groups {
for _, r := range g.Rules {
for _, group := range rh.m.groups {
if !rf.matchesGroup(group) {
continue
}
for _, r := range group.Rules {
a, ok := r.(*rule.AlertingRule)
if !ok {
continue
@@ -391,6 +440,42 @@ func (rh *requestHandler) listAlerts() ([]byte, error) {
return b, nil
}
type listNotifiersResponse struct {
Status string `json:"status"`
Data struct {
Notifiers []*apiNotifier `json:"notifiers"`
} `json:"data"`
}
func (rh *requestHandler) listNotifiers() ([]byte, error) {
targets := notifier.GetTargets()
lr := listNotifiersResponse{Status: "success"}
lr.Data.Notifiers = make([]*apiNotifier, 0)
for protoName, protoTargets := range targets {
notifier := &apiNotifier{
Kind: string(protoName),
Targets: make([]*apiTarget, 0, len(protoTargets)),
}
for _, target := range protoTargets {
notifier.Targets = append(notifier.Targets, &apiTarget{
Address: target.Notifier.Addr(),
Labels: target.Labels.ToMap(),
})
}
lr.Data.Notifiers = append(lr.Data.Notifiers, notifier)
}
b, err := json.Marshal(lr)
if err != nil {
return nil, &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf(`error encoding list of notifiers: %w`, err),
StatusCode: http.StatusInternalServerError,
}
}
return b, nil
}
func errResponse(err error, sc int) *httpserver.ErrorWithStatusCode {
return &httpserver.ErrorWithStatusCode{
Err: err,

View File

@@ -93,18 +93,18 @@
{%= tpl.Footer(r) %}
{% endfunc %}
{% func ListGroups(r *http.Request, groups []apiGroup, filter string) %}
{% func ListGroups(r *http.Request, groups []*apiGroup, filter string) %}
{%code
prefix := vmalertutil.Prefix(r.URL.Path)
filters := map[string]string{
"": "All",
"unhealthy": "Unhealthy",
"noMatch": "No Match",
"nomatch": "No Match",
}
icons := map[string]string{
"": "all",
"unhealthy": "unhealthy",
"noMatch": "nomatch",
"nomatch": "nomatch",
}
currentText := filters[filter]
currentIcon := icons[filter]
@@ -161,9 +161,9 @@
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many samples were produced by the rule">Samples</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
<th scope="col" class="w-60">Rule</th>
<th scope="col" class="w-20" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" class="w-20" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
@@ -594,9 +594,9 @@
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" style="width: 10%" class="text-center" title="How many samples were returned">Samples</th>
{% if seriesFetchedEnabled %}<th scope="col" style="width: 10%" class="text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" style="width: 10%" class="text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>

View File

@@ -316,7 +316,7 @@ func Welcome(r *http.Request) string {
}
//line app/vmalert/web.qtpl:96
func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGroup, filter string) {
func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []*apiGroup, filter string) {
//line app/vmalert/web.qtpl:96
qw422016.N().S(`
`)
@@ -325,12 +325,12 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
filters := map[string]string{
"": "All",
"unhealthy": "Unhealthy",
"noMatch": "No Match",
"nomatch": "No Match",
}
icons := map[string]string{
"": "all",
"unhealthy": "unhealthy",
"noMatch": "nomatch",
"nomatch": "nomatch",
}
currentText := filters[filter]
currentIcon := icons[filter]
@@ -523,9 +523,9 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many samples were produced by the rule">Samples</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
<th scope="col" class="w-60">Rule</th>
<th scope="col" class="w-20" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" class="w-20" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
@@ -722,7 +722,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
}
//line app/vmalert/web.qtpl:222
func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []apiGroup, filter string) {
func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []*apiGroup, filter string) {
//line app/vmalert/web.qtpl:222
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:222
@@ -733,7 +733,7 @@ func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []apiGr
}
//line app/vmalert/web.qtpl:222
func ListGroups(r *http.Request, groups []apiGroup, filter string) string {
func ListGroups(r *http.Request, groups []*apiGroup, filter string) string {
//line app/vmalert/web.qtpl:222
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:222
@@ -1697,17 +1697,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule apiRule)
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" style="width: 10%" class="text-center" title="How many samples were returned">Samples</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
`)
//line app/vmalert/web.qtpl:598
if seriesFetchedEnabled {
//line app/vmalert/web.qtpl:598
qw422016.N().S(`<th scope="col" style="width: 10%" class="text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>`)
qw422016.N().S(`<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>`)
//line app/vmalert/web.qtpl:598
}
//line app/vmalert/web.qtpl:598
qw422016.N().S(`
<th scope="col" style="width: 10%" class="text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>

View File

@@ -19,25 +19,34 @@ import (
func TestHandler(t *testing.T) {
fq := &datasource.FakeQuerier{}
fq.Add(datasource.Metric{
Values: []float64{1}, Timestamps: []int64{0},
Values: []float64{1},
Timestamps: []int64{0},
})
g := rule.NewGroup(config.Group{
Name: "group",
File: "rules.yaml",
Concurrency: 1,
Rules: []config.Rule{
{ID: 0, Alert: "alert"},
{ID: 1, Record: "record"},
},
}, fq, 1*time.Minute, nil)
ar := g.Rules[0].(*rule.AlertingRule)
rr := g.Rules[1].(*rule.RecordingRule)
g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, nil, time.Time{})
m := &manager{groups: map[uint64]*rule.Group{
g.CreateID(): g,
}}
m := &manager{groups: map[uint64]*rule.Group{}}
var ar *rule.AlertingRule
var rr *rule.RecordingRule
for _, dsType := range []string{"prometheus", "", "graphite"} {
g := rule.NewGroup(config.Group{
Name: "group",
File: "rules.yaml",
Type: config.NewRawType(dsType),
Concurrency: 1,
Rules: []config.Rule{
{
ID: 0,
Alert: "alert",
},
{
ID: 1,
Record: "record",
},
},
}, fq, 1*time.Minute, nil)
ar = g.Rules[0].(*rule.AlertingRule)
rr = g.Rules[1].(*rule.RecordingRule)
g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, nil, time.Time{})
m.groups[g.CreateID()] = g
}
rh := &requestHandler{m: m}
getResp := func(t *testing.T, url string, to any, code int) {
@@ -54,7 +63,7 @@ func TestHandler(t *testing.T) {
t.Fatalf("err closing body %s", err)
}
}()
if to != nil {
if to != nil && code < 300 {
if err = json.NewDecoder(resp.Body).Decode(to); err != nil {
t.Fatalf("unexpected err %s", err)
}
@@ -95,14 +104,23 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/alerts", func(t *testing.T) {
lr := listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Fatalf("expected 1 alert got %d", length)
if length := len(lr.Data.Alerts); length != 3 {
t.Fatalf("expected 3 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Fatalf("expected 1 alert got %d", length)
if length := len(lr.Data.Alerts); length != 3 {
t.Fatalf("expected 3 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts?datasource_type=test", &lr, 400)
lr = listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts?datasource_type=prometheus", &lr, 200)
if length := len(lr.Data.Alerts); length != 2 {
t.Fatalf("expected 2 alert got %d", length)
}
})
t.Run("/api/v1/alert?alertID&groupID", func(t *testing.T) {
@@ -138,14 +156,14 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(t, ts.URL+"/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Fatalf("expected 1 group got %d", length)
if length := len(lr.Data.Groups); length != 3 {
t.Fatalf("expected 3 group got %d", length)
}
lr = listGroupsResponse{}
getResp(t, ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Fatalf("expected 1 group got %d", length)
if length := len(lr.Data.Groups); length != 3 {
t.Fatalf("expected 3 group got %d", length)
}
})
t.Run("/api/v1/rule?ruleID&groupID", func(t *testing.T) {
@@ -172,10 +190,10 @@ func TestHandler(t *testing.T) {
})
t.Run("/api/v1/rules&filters", func(t *testing.T) {
check := func(url string, expGroups, expRules int) {
check := func(url string, statusCode, expGroups, expRules int) {
t.Helper()
lr := listGroupsResponse{}
getResp(t, ts.URL+url, &lr, 200)
getResp(t, ts.URL+url, &lr, statusCode)
if length := len(lr.Data.Groups); length != expGroups {
t.Fatalf("expected %d groups got %d", expGroups, length)
}
@@ -191,25 +209,31 @@ func TestHandler(t *testing.T) {
}
}
check("/api/v1/rules?type=alert", 1, 1)
check("/api/v1/rules?type=record", 1, 1)
check("/api/v1/rules?type=alert", 200, 3, 3)
check("/api/v1/rules?type=record", 200, 3, 3)
check("/api/v1/rules?type=records", 400, 0, 0)
check("/vmalert/api/v1/rules?type=alert", 1, 1)
check("/vmalert/api/v1/rules?type=record", 1, 1)
check("/vmalert/api/v1/rules?type=alert", 200, 3, 3)
check("/vmalert/api/v1/rules?type=record", 200, 3, 3)
check("/vmalert/api/v1/rules?type=recording", 400, 0, 0)
check("/vmalert/api/v1/rules?datasource_type=prometheus", 200, 2, 4)
check("/vmalert/api/v1/rules?datasource_type=graphite", 200, 1, 2)
check("/vmalert/api/v1/rules?datasource_type=graphiti", 400, 0, 0)
// no filtering expected due to bad params
check("/api/v1/rules?type=badParam", 1, 2)
check("/api/v1/rules?foo=bar", 1, 2)
check("/api/v1/rules?type=badParam", 400, 0, 0)
check("/api/v1/rules?foo=bar", 200, 3, 6)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 1, 2)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 200, 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 200, 3, 6)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 200, 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 200, 3, 6)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 1, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 1, 1)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 200, 3, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 200, 3, 3)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 200, 3, 6)
})
t.Run("/api/v1/rules&exclude_alerts=true", func(t *testing.T) {
// check if response returns active alerts by default
@@ -259,7 +283,7 @@ func TestEmptyResponse(t *testing.T) {
t.Fatalf("err closing body %s", err)
}
}()
if to != nil {
if to != nil && code < 300 {
if err = json.NewDecoder(resp.Body).Decode(to); err != nil {
t.Fatalf("unexpected err %s", err)
}

View File

@@ -20,6 +20,16 @@ const (
paramRuleID = "rule_id"
)
type apiNotifier struct {
Kind string `json:"kind"`
Targets []*apiTarget `json:"targets"`
}
type apiTarget struct {
Address string `json:"address"`
Labels map[string]string `json:"labels"`
}
// apiAlert represents a notifier.AlertingRule state
// for WEB view
// https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#get-apiv1rules
@@ -108,7 +118,7 @@ type apiGroup struct {
// groupAlerts represents a group of alerts for WEB view
type groupAlerts struct {
Group apiGroup
Group *apiGroup
Alerts []*apiAlert
}
@@ -327,7 +337,7 @@ func newAlertAPI(ar *rule.AlertingRule, a *notifier.Alert) *apiAlert {
return aa
}
func groupToAPI(g *rule.Group) apiGroup {
func groupToAPI(g *rule.Group) *apiGroup {
g = g.DeepCopy()
ag := apiGroup{
// encode as string to avoid rounding
@@ -353,7 +363,7 @@ func groupToAPI(g *rule.Group) apiGroup {
for _, r := range g.Rules {
ag.Rules = append(ag.Rules, ruleToAPI(r))
}
return ag
return &ag
}
func urlValuesToStrings(values url.Values) []string {

View File

@@ -120,6 +120,9 @@ func normalizeURL(uOrig *url.URL) *url.URL {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if u.Path == "." {
u.Path = "/"
}
if !strings.HasSuffix(u.Path, "/") && strings.HasSuffix(uOrig.Path, "/") {
// The path.Clean() removes trailing slash.
// Return it back if needed.

View File

@@ -128,7 +128,40 @@ func TestCreateTargetURLSuccess(t *testing.T) {
// Simple routing with `url_prefix`
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "", "http://foo.bar/.", "", "", nil, "least_loaded", 0)
}, "", "http://foo.bar", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "/", "http://foo.bar", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "http://aaa///", "http://foo.bar", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "/", "http://foo.bar/", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "/x", "http://foo.bar/x", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "/x/", "http://foo.bar/x/", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "http://abc///x/", "http://foo.bar/x/", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "http://foo//x", "http://foo.bar/x", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/baz"),
}, "", "http://foo.bar/baz", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/baz"),
}, "/", "http://foo.bar/baz", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/x/"),
}, "/abc", "http://foo.bar/x/abc", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/x/"),
}, "/abc/", "http://foo.bar/x/abc/", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
HeadersConf: HeadersConf{
@@ -149,6 +182,12 @@ func TestCreateTargetURLSuccess(t *testing.T) {
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "a/b?c=d", "http://foo.bar/a/b?c=d", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "/a/b?c=d", "http://foo.bar/a/b?c=d", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/"),
}, "/a/b?c=d", "http://foo.bar/a/b?c=d", "", "", nil, "least_loaded", 0)
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/z", "https://sss:3894/x/y/z", "", "", nil, "least_loaded", 0)

View File

@@ -51,6 +51,13 @@ func main() {
buildinfo.Init()
logger.Init()
ctx, cancel := context.WithCancel(context.Background())
go func() {
procutil.WaitForSigterm()
logger.Infof("received stop signal, canceling backup operation")
cancel()
}()
// Storing snapshot delete function to be able to call it in case
// of error since logger.Fatal will exit the program without
// calling deferred functions.
@@ -79,7 +86,7 @@ func main() {
}
logger.Infof("Snapshot delete url %s", deleteURL.Redacted())
name, err := snapshot.Create(createURL.String())
name, err := snapshot.Create(ctx, createURL.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -89,7 +96,9 @@ func main() {
}
deleteSnapshot = func() {
err := snapshot.Delete(deleteURL.String(), name)
// Do not use ctx here as it may be canceled by the time deleteSnapshot is called
// if process is interrupted.
err := snapshot.Delete(context.Background(), deleteURL.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
@@ -99,13 +108,6 @@ func main() {
listenAddrs := []string{*httpListenAddr}
go httpserver.Serve(listenAddrs, nil, httpserver.ServeOptions{})
ctx, cancel := context.WithCancel(context.Background())
go func() {
procutil.WaitForSigterm()
logger.Infof("received stop signal, canceling backup operation")
cancel()
}()
pushmetrics.Init()
err := makeBackup(ctx)
deleteSnapshot()

View File

@@ -70,7 +70,7 @@ var (
Usage: "VictoriaMetrics address to perform import requests. \n" +
"Should be the same as --httpListenAddr value for single-node version or vminsert component. \n" +
"When importing into the clustered version do not forget to set additionally --vm-account-id flag. \n" +
"Please note, that `vmctl` performs initial readiness check for the given address by checking `/health` endpoint.",
"Please note, that vmctl performs initial readiness check for the given address by checking /health endpoint.",
},
&cli.StringFlag{
Name: vmUser,
@@ -514,27 +514,27 @@ var (
},
&cli.StringFlag{
Name: vmNativeSrcBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-src-addr`",
Usage: "Optional bearer auth token to use for the corresponding --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to `--vm-native-src-addr`",
Usage: "Optional path to client-side TLS certificate file to use when connecting to --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to `--vm-native-src-addr`",
Usage: "Optional path to client-side TLS key to use when connecting to --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to `--vm-native-src-addr`. By default, system CA is used",
Usage: "Optional path to TLS CA file to use for verifying connections to --vm-native-src-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: vmNativeSrcServerName,
Usage: "Optional TLS server name to use for connections to `--vm-native-src-addr`. By default, the server name from `--vm-native-src-addr` is used",
Usage: "Optional TLS server name to use for connections to --vm-native-src-addr. By default, the server name from --vm-native-src-addr is used",
},
&cli.BoolFlag{
Name: vmNativeSrcInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to `--vm-native-src-addr`",
Usage: "Whether to skip TLS certificate verification when connecting to --vm-native-src-addr",
Value: false,
},
@@ -563,27 +563,27 @@ var (
},
&cli.StringFlag{
Name: vmNativeDstBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-dst-addr`",
Usage: "Optional bearer auth token to use for the corresponding --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to `--vm-native-dst-addr`",
Usage: "Optional path to client-side TLS certificate file to use when connecting to --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to `--vm-native-dst-addr`",
Usage: "Optional path to client-side TLS key to use when connecting to --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to `--vm-native-dst-addr`. By default, system CA is used",
Usage: "Optional path to TLS CA file to use for verifying connections to --vm-native-dst-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: vmNativeDstServerName,
Usage: "Optional TLS server name to use for connections to `--vm-native-dst-addr`. By default, the server name from `--vm-native-dst-addr` is used",
Usage: "Optional TLS server name to use for connections to --vm-native-dst-addr. By default, the server name from --vm-native-dst-addr is used",
},
&cli.BoolFlag{
Name: vmNativeDstInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to `--vm-native-dst-addr`",
Usage: "Whether to skip TLS certificate verification when connecting to --vm-native-dst-addr",
Value: false,
},
@@ -597,7 +597,7 @@ var (
Name: vmRateLimit,
Usage: "Optional data transfer rate limit in bytes per second.\n" +
"By default, the rate limit is disabled. It can be useful for limiting load on source or destination databases. \n" +
"Rate limit is applied per worker, see `--vm-concurrency`.",
"Rate limit is applied per worker, see --vm-concurrency.",
},
&cli.BoolFlag{
Name: vmInterCluster,

View File

@@ -19,6 +19,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/barpool"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/remoteread"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/opentsdb"
@@ -44,6 +45,7 @@ func main() {
if c.Bool(globalDisableProgressBar) {
barpool.Disable(true)
}
netutil.EnableIPv6()
return nil
}
app := &cli.App{

View File

@@ -1,168 +0,0 @@
package main
import (
"context"
"os"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
)
// If you want to run this test:
// 1. provide test snapshot path in const testSnapshot
// 2. define httpAddr const with your victoriametrics address
// 3. run victoria metrics with defined address
// 4. remove t.Skip() from Test_prometheusProcessor_run
// 5. run tests one by one not all at one time
const (
httpAddr = "http://127.0.0.1:8428/"
testSnapshot = "./testdata/20220427T130947Z-70ba49b1093fd0bf"
)
// This test simulates close process if user abort it
func TestPrometheusProcessorRun(t *testing.T) {
t.Skip()
defer func() { isSilent = false }()
type fields struct {
cfg prometheus.Config
vmCfg vm.Config
cl func(prometheus.Config) *prometheus.Client
im func(vm.Config) *vm.Importer
closer func(importer *vm.Importer)
cc int
}
type args struct {
silent bool
verbose bool
}
tests := []struct {
name string
fields fields
args args
wantErr bool
}{
{
name: "simulate syscall.SIGINT",
fields: fields{
cfg: prometheus.Config{
Snapshot: testSnapshot,
Filter: prometheus.Filter{},
},
cl: func(cfg prometheus.Config) *prometheus.Client {
client, err := prometheus.NewClient(cfg)
if err != nil {
t.Fatalf("error init prometeus client: %s", err)
}
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
return importer
},
closer: func(importer *vm.Importer) {
// simulate syscall.SIGINT
time.Sleep(time.Second * 5)
if importer != nil {
importer.Close()
}
},
vmCfg: vm.Config{Addr: httpAddr, Concurrency: 1},
cc: 2,
},
args: args{
silent: false,
verbose: false,
},
wantErr: true,
},
{
name: "simulate correct work",
fields: fields{
cfg: prometheus.Config{
Snapshot: testSnapshot,
Filter: prometheus.Filter{},
},
cl: func(cfg prometheus.Config) *prometheus.Client {
client, err := prometheus.NewClient(cfg)
if err != nil {
t.Fatalf("error init prometeus client: %s", err)
}
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
return importer
},
closer: nil,
vmCfg: vm.Config{Addr: httpAddr, Concurrency: 5},
cc: 2,
},
args: args{
silent: true,
verbose: false,
},
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
client := tt.fields.cl(tt.fields.cfg)
importer := tt.fields.im(tt.fields.vmCfg)
isSilent = tt.args.silent
pp := &prometheusProcessor{
cl: client,
im: importer,
cc: tt.fields.cc,
isVerbose: tt.args.verbose,
}
// we should answer on prompt
if !tt.args.silent {
input := []byte("Y\n")
r, w, err := os.Pipe()
if err != nil {
t.Fatal(err)
}
_, err = w.Write(input)
if err != nil {
t.Fatalf("cannot send 'Y' to importer: %s", err)
}
err = w.Close()
if err != nil {
t.Fatalf("cannot close writer: %s", err)
}
stdin := os.Stdin
// Restore stdin right after the test.
defer func() {
os.Stdin = stdin
_ = r.Close()
}()
os.Stdin = r
}
// simulate close if needed
if tt.fields.closer != nil {
go tt.fields.closer(importer)
}
if err := pp.run(); (err != nil) != tt.wantErr {
t.Fatalf("run() error = %v, wantErr %v", err, tt.wantErr)
}
})
}
}

View File

@@ -140,6 +140,15 @@ func (p *vmNativeProcessor) runSingle(ctx context.Context, f native.Filter, srcU
written, err := io.Copy(w, reader)
if err != nil {
// io.Copy could fail if ImportPipe will fail before and close the pr
// so we check if that's the case and to not ignore importErr if it exists.
select {
case importErr := <-importCh:
if importErr != nil {
return fmt.Errorf("failed to import %s: %w", p.dst.Addr, importErr)
}
default:
}
return fmt.Errorf("failed to write into %q: %s", p.dst.Addr, err)
}

View File

@@ -757,7 +757,16 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
w.Header().Set("Content-Type", "application/json")
fmt.Fprint(w, `{"status":"success","data":{"alerts":[]}}`)
return true
case "prometheus/api/v1/metadata":
case "/prometheus/api/v1/notifiers", "/notifiers":
notifiersRequests.Inc()
if len(*vmalertProxyURL) > 0 {
proxyVMAlertRequests(w, r, p.Suffix)
return true
}
w.Header().Set("Content-Type", "application/json")
fmt.Fprint(w, `{"status":"success","data":{"notifiers":[]}}`)
return true
case "/prometheus/api/v1/metadata":
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
metadataRequests.Inc()
w.Header().Set("Content-Type", "application/json")
@@ -909,9 +918,10 @@ var (
expandWithExprsRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/expand-with-exprs"}`)
prettifyQueryRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/prettify-query"}`)
vmalertRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/vmalert"}`)
rulesRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/rules"}`)
alertsRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/alerts"}`)
notifiersRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/notifiers"}`)
vmalertRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/vmalert"}`)
rulesRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/rules"}`)
alertsRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/alerts"}`)
metadataRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/metadata"}`)
buildInfoRequests = metrics.NewCounter(`vm_http_requests_total{path="/select/{}/prometheus/api/v1/buildinfo"}`)

View File

@@ -22,6 +22,7 @@ import (
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/atomicutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
@@ -1476,9 +1477,8 @@ func (tbfwLocal *tmpBlocksFileWrapperShard) init() {
type tmpBlocksFileWrapperShardWithPadding struct {
tmpBlocksFileWrapperShard
// The padding prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
_ [128 - unsafe.Sizeof(tmpBlocksFileWrapperShard{})%128]byte
// The padding prevents false sharing
_ [atomicutil.CacheLineSize - unsafe.Sizeof(tmpBlocksFileWrapperShard{})%atomicutil.CacheLineSize]byte
}
type blockAddrs struct {
@@ -1888,9 +1888,9 @@ func processBlocks(qt *querytracer.Tracer, sns []*storageNode, denyPartialRespon
}
type wgWithPadding struct {
wgStruct
// The padding prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
_ [128 - unsafe.Sizeof(wgStruct{})%128]byte
// The padding prevents false sharing
_ [atomicutil.CacheLineSize - unsafe.Sizeof(wgStruct{})%atomicutil.CacheLineSize]byte
}
wgs := make([]wgWithPadding, len(sns))
f := func(mb *storage.MetricBlock, workerID uint) error {
@@ -3270,9 +3270,9 @@ func applyGraphiteRegexpFilter(filter string, ss []string) ([]string, error) {
type uint64WithPadding struct {
n uint64
// The padding prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
_ [128 - unsafe.Sizeof(uint64(0))%128]byte
// The padding prevents false sharing
_ [atomicutil.CacheLineSize - unsafe.Sizeof(uint64(0))%atomicutil.CacheLineSize]byte
}
type perNodeCounter struct {

View File

@@ -17,28 +17,36 @@ import (
)
var (
tenantsCacheDuration = flag.Duration("search.tenantCacheExpireDuration", 5*time.Minute, "The expiry duration for list of tenants for multi-tenant queries.")
tenantsCacheDuration = flag.Duration("search.tenantCacheExpireDuration", 5*time.Minute, "Expiry duration for caching tenants in memory. A zero value disables caching, causing tenants to be fetched from storage nodes on every query.")
)
// TenantsCached returns the list of tenants available in the storage.
func TenantsCached(qt *querytracer.Tracer, tr storage.TimeRange, deadline searchutil.Deadline) ([]storage.TenantToken, error) {
qt.Printf("fetching tenants on timeRange=%s", tr.String())
func TenantsCached(qt *querytracer.Tracer, tr storage.TimeRange, deadline searchutil.Deadline, mayCache bool) ([]storage.TenantToken, error) {
qtL := qt.NewChild("fetching tenants on timeRange=%s", tr.String())
defer qtL.Done()
initTenantsCacheVOnce.Do(func() {
tenantsCacheV = newTenantsCache(*tenantsCacheDuration)
})
cached := tenantsCacheV.get(tr)
qt.Printf("fetched %d tenants from cache", len(cached))
if len(cached) > 0 {
return cached, nil
useCache := mayCache && tenantsCacheDuration.Seconds() > 0
if useCache {
cached := tenantsCacheV.get(tr)
qtL.Printf("fetched %d tenants from cache", len(cached))
if len(cached) > 0 {
return cached, nil
}
} else {
qtL.Printf("do not fetch list of tenants from cache")
}
tenants, err := Tenants(qt, tr, deadline)
tenants, err := Tenants(qtL, tr, deadline)
if err != nil {
return nil, fmt.Errorf("cannot obtain tenants: %w", err)
}
qt.Printf("fetched %d tenants from storage", len(tenants))
qtL.Printf("fetched %d tenants from storage", len(tenants))
tt := make([]storage.TenantToken, len(tenants))
for i, t := range tenants {
@@ -50,8 +58,12 @@ func TenantsCached(qt *querytracer.Tracer, tr storage.TimeRange, deadline search
tt[i].ProjectID = projectID
}
tenantsCacheV.put(tr, tt)
qt.Printf("put %d tenants into cache", len(tenants))
if useCache {
tenantsCacheV.put(tr, tt)
qtL.Printf("put %d tenants into cache", len(tenants))
} else {
qtL.Printf("do not put list of tenants into cache")
}
return tt, nil
}
@@ -151,6 +163,7 @@ func (tc *tenantsCache) getInternal(tr storage.TimeRange) []storage.TenantToken
tc.mu.Lock()
defer tc.mu.Unlock()
if len(tc.items) == 0 {
tc.misses.Add(1)
return nil
}
ct := time.Now()
@@ -164,7 +177,7 @@ func (tc *tenantsCache) getInternal(tr storage.TimeRange) []storage.TenantToken
continue
}
if hasIntersection(tr, ci.tr) {
if isWithin(tr, ci.tr) {
for _, t := range ci.tenants {
result[t] = struct{}{}
}
@@ -179,7 +192,9 @@ func (tc *tenantsCache) getInternal(tr storage.TimeRange) []storage.TenantToken
for t := range result {
tenants = append(tenants, t)
}
if len(tenants) == 0 {
tc.misses.Add(1)
}
return tenants
}
@@ -191,7 +206,7 @@ func alignTrToDay(tr *storage.TimeRange) {
tr.MaxTimestamp = timeutil.EndOfDay(tr.MaxTimestamp)
}
// hasIntersection checks if there is any intersection of the given time ranges
func hasIntersection(a, b storage.TimeRange) bool {
return a.MinTimestamp <= b.MaxTimestamp && a.MaxTimestamp >= b.MinTimestamp
// isWithin Check if range inner is fully within range outer
func isWithin(inner, outer storage.TimeRange) bool {
return inner.MinTimestamp >= outer.MinTimestamp && inner.MaxTimestamp <= outer.MaxTimestamp
}

View File

@@ -12,7 +12,7 @@ import (
func TestFetchingTenants(t *testing.T) {
tc := newTenantsCache(5 * time.Second)
dayMs := (time.Hour * 24 * 1000).Milliseconds()
dayMs := (time.Hour * 24).Milliseconds()
tc.put(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 0}, []storage.TenantToken{
{AccountID: 1, ProjectID: 1},
@@ -68,23 +68,22 @@ func TestFetchingTenants(t *testing.T) {
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: dayMs / 2}, []storage.TenantToken{{AccountID: 1, ProjectID: 1}, {AccountID: 1, ProjectID: 0}})
f(storage.TimeRange{MinTimestamp: dayMs / 2, MaxTimestamp: dayMs - 1}, []storage.TenantToken{{AccountID: 1, ProjectID: 1}, {AccountID: 1, ProjectID: 0}})
// Overlapping time ranges
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 2*dayMs - 1}, []storage.TenantToken{{AccountID: 1, ProjectID: 1}, {AccountID: 1, ProjectID: 0}, {AccountID: 2, ProjectID: 1}, {AccountID: 2, ProjectID: 0}})
f(storage.TimeRange{MinTimestamp: dayMs / 2, MaxTimestamp: dayMs + dayMs/2}, []storage.TenantToken{{AccountID: 1, ProjectID: 1}, {AccountID: 1, ProjectID: 0}, {AccountID: 2, ProjectID: 1}, {AccountID: 2, ProjectID: 0}})
// No single item to cover full range
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 2*dayMs - 1}, []storage.TenantToken{})
f(storage.TimeRange{MinTimestamp: dayMs / 2, MaxTimestamp: dayMs + dayMs/2}, []storage.TenantToken{})
}
func TestHasIntersection(t *testing.T) {
func TestIsWithin(t *testing.T) {
f := func(inner, outer storage.TimeRange, expected bool) {
t.Helper()
if hasIntersection(inner, outer) != expected {
if isWithin(inner, outer) != expected {
t.Fatalf("unexpected result for inner=%+v, outer=%+v", inner, outer)
}
}
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 150}, storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 0}, true)
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 150}, storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 100}, true)
f(storage.TimeRange{MinTimestamp: 50, MaxTimestamp: 150}, storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 100}, true)
f(storage.TimeRange{MinTimestamp: 50, MaxTimestamp: 150}, storage.TimeRange{MinTimestamp: 10, MaxTimestamp: 80}, true)
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 0}, storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 150}, true)
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 100}, storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 150}, true)
f(storage.TimeRange{MinTimestamp: 50, MaxTimestamp: 80}, storage.TimeRange{MinTimestamp: 50, MaxTimestamp: 150}, true)
f(storage.TimeRange{MinTimestamp: 0, MaxTimestamp: 50}, storage.TimeRange{MinTimestamp: 60, MaxTimestamp: 100}, false)
f(storage.TimeRange{MinTimestamp: 100, MaxTimestamp: 150}, storage.TimeRange{MinTimestamp: 60, MaxTimestamp: 80}, false)

View File

@@ -12,8 +12,8 @@ import (
)
// GetTenantTokensFromFilters returns the list of tenant tokens and the list of filters without tenant filters.
func GetTenantTokensFromFilters(qt *querytracer.Tracer, tr storage.TimeRange, tfs [][]storage.TagFilter, deadline searchutil.Deadline) ([]storage.TenantToken, [][]storage.TagFilter, error) {
tenants, err := TenantsCached(qt, tr, deadline)
func GetTenantTokensFromFilters(qt *querytracer.Tracer, tr storage.TimeRange, tfs [][]storage.TagFilter, deadline searchutil.Deadline, mayCache bool) ([]storage.TenantToken, [][]storage.TagFilter, error) {
tenants, err := TenantsCached(qt, tr, deadline, mayCache)
if err != nil {
return nil, nil, fmt.Errorf("cannot obtain tenants: %w", err)
}

View File

@@ -759,7 +759,7 @@ func getSearchQuery(qt *querytracer.Tracer, at *auth.Token, cp *commonParams, ma
if at != nil {
return storage.NewSearchQuery(at.AccountID, at.ProjectID, cp.start, cp.end, cp.filterss, maxSeries), nil
}
tt, tfs, err := netstorage.GetTenantTokensFromFilters(qt, storage.TimeRange{MinTimestamp: cp.start, MaxTimestamp: cp.end}, cp.filterss, cp.deadline)
tt, tfs, err := netstorage.GetTenantTokensFromFilters(qt, storage.TimeRange{MinTimestamp: cp.start, MaxTimestamp: cp.end}, cp.filterss, cp.deadline, true)
if err != nil {
return nil, fmt.Errorf("cannot obtain tenant tokens: %w", err)
}
@@ -845,7 +845,7 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Token, w
ct := startTime.UnixNano() / 1e6
deadline := searchutil.GetDeadlineForQuery(r, startTime)
mayCache := !httputil.GetBool(r, "nocache")
noCache := httputil.GetBool(r, "nocache")
query := r.FormValue("query")
if len(query) == 0 {
return fmt.Errorf("missing `query` arg")
@@ -948,10 +948,11 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Token, w
MaxSeries: *maxUniqueTimeseries,
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
NoCache: noCache,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilterss: etfs,
CacheTagFilters: etfs,
GetRequestURI: func() string {
return httpserver.GetRequestURI(r)
},
@@ -1035,7 +1036,7 @@ func QueryRangeHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Tok
func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Token, w http.ResponseWriter, query string,
start, end, step int64, r *http.Request, ct int64, etfs [][]storage.TagFilter) error {
deadline := searchutil.GetDeadlineForQuery(r, startTime)
mayCache := !httputil.GetBool(r, "nocache")
noCache := httputil.GetBool(r, "nocache")
lookbackDelta, err := getMaxLookback(r)
if err != nil {
return err
@@ -1051,7 +1052,7 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Tok
if err := promql.ValidateMaxPointsPerSeries(start, end, step, *maxPointsPerTimeseries); err != nil {
return fmt.Errorf("%w; (see -search.maxPointsPerTimeseries command-line flag)", err)
}
if mayCache {
if !noCache {
start, end = promql.AdjustStartEnd(start, end, step)
}
@@ -1063,10 +1064,11 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, at *auth.Tok
MaxSeries: *maxUniqueTimeseries,
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
NoCache: noCache,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilterss: etfs,
CacheTagFilters: etfs,
GetRequestURI: func() string {
return httpserver.GetRequestURI(r)
},
@@ -1118,7 +1120,7 @@ func populateAuthTokens(qt *querytracer.Tracer, ec *promql.EvalConfig, at *auth.
return nil
}
tt, tfs, err := netstorage.GetTenantTokensFromFilters(qt, storage.TimeRange{MinTimestamp: ec.Start, MaxTimestamp: ec.End}, ec.EnforcedTagFilterss, deadline)
tt, tfs, err := netstorage.GetTenantTokensFromFilters(qt, storage.TimeRange{MinTimestamp: ec.Start, MaxTimestamp: ec.End}, ec.EnforcedTagFilterss, deadline, ec.MayCache())
if err != nil {
return fmt.Errorf("cannot obtain tenant tokens for the given search query: %w", err)
}

View File

@@ -5,8 +5,10 @@ import (
"strings"
"unsafe"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/metricsql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/atomicutil"
)
// callbacks for optimized incremental calculations for aggregate functions
@@ -66,9 +68,8 @@ var incrementalAggrFuncCallbacksMap = map[string]*incrementalAggrFuncCallbacks{
type incrementalAggrContextMap struct {
m map[string]*incrementalAggrContext
// The padding prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
_ [128 - unsafe.Sizeof(map[string]*incrementalAggrContext{})%128]byte
// The padding prevents false sharing
_ [atomicutil.CacheLineSize - unsafe.Sizeof(map[string]*incrementalAggrContext{})%atomicutil.CacheLineSize]byte
}
type incrementalAggrFuncContext struct {

View File

@@ -17,6 +17,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/atomicutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
@@ -72,7 +73,7 @@ func ValidateMaxPointsPerSeries(start, end, step int64, maxPoints int) error {
// AdjustStartEnd adjusts start and end values, so response caching may be enabled.
//
// See EvalConfig.mayCache() for details.
// See EvalConfig.MayCache() for details.
func AdjustStartEnd(start, end, step int64) (int64, int64) {
if *disableCache {
// Do not adjust start and end values when cache is disabled.
@@ -86,7 +87,7 @@ func AdjustStartEnd(start, end, step int64) (int64, int64) {
}
// Round start and end to values divisible by step in order
// to enable response caching (see EvalConfig.mayCache).
// to enable response caching (see EvalConfig.MayCache).
start, end = alignStartEnd(start, end, step)
// Make sure that the new number of points is the same as the initial number of points.
@@ -131,8 +132,8 @@ type EvalConfig struct {
Deadline searchutil.Deadline
// Whether the response can be cached.
MayCache bool
// Whether the response must not be cached.
NoCache bool
// LookbackDelta is analog to `-query.lookback-delta` from Prometheus.
LookbackDelta int64
@@ -143,6 +144,13 @@ type EvalConfig struct {
// EnforcedTagFilterss may contain additional label filters to use in the query.
EnforcedTagFilterss [][]storage.TagFilter
// CacheTagFilters stores the original tag-filter sets and extra_label from the request.
// The slice is never modified after creation and is used only to build
// the query-cache key.
//
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9001
CacheTagFilters [][]storage.TagFilter
// The callback, which returns the request URI during logging.
// The request URI isn't stored here because its' construction may take non-trivial amounts of CPU.
GetRequestURI func() string
@@ -173,10 +181,11 @@ func copyEvalConfig(src *EvalConfig) *EvalConfig {
ec.MaxSeries = src.MaxSeries
ec.MaxPointsPerSeries = src.MaxPointsPerSeries
ec.Deadline = src.Deadline
ec.MayCache = src.MayCache
ec.NoCache = src.NoCache
ec.LookbackDelta = src.LookbackDelta
ec.RoundDigits = src.RoundDigits
ec.EnforcedTagFilterss = src.EnforcedTagFilterss
ec.CacheTagFilters = src.CacheTagFilters
ec.GetRequestURI = src.GetRequestURI
ec.DenyPartialResponse = src.DenyPartialResponse
ec.IsPartialResponse.Store(src.IsPartialResponse.Load())
@@ -199,11 +208,12 @@ func (ec *EvalConfig) validate() {
}
}
func (ec *EvalConfig) mayCache() bool {
// MayCache returns true if the query results can be cached.
func (ec *EvalConfig) MayCache() bool {
if *disableCache {
return false
}
if !ec.MayCache {
if ec.NoCache {
return false
}
if ec.Start == ec.End {
@@ -261,7 +271,7 @@ func evalExpr(qt *querytracer.Tracer, ec *EvalConfig, e metricsql.Expr) ([]*time
if qt.Enabled() {
query := string(e.AppendString(nil))
query = stringsutil.LimitStringLen(query, 300)
mayCache := ec.mayCache()
mayCache := ec.MayCache()
qt = qt.NewChild("eval: query=%s, timeRange=%s, step=%d, mayCache=%v", query, ec.timeRangeString(), ec.Step, mayCache)
}
rv, err := evalExprInternal(qt, ec, e)
@@ -853,7 +863,7 @@ func evalRollupFuncWithoutAt(qt *querytracer.Tracer, ec *EvalConfig, funcName st
ecNew = copyEvalConfig(ecNew)
ecNew.Start -= offset
ecNew.End -= offset
// There is no need in calling AdjustStartEnd() on ecNew if ecNew.MayCache is set to true,
// There is no need in calling AdjustStartEnd() on ecNew if ecNew.NoCache is set to true,
// since the time range alignment has been already performed by the caller,
// so cache hit rate should be quite good.
// See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/976
@@ -1160,7 +1170,7 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
return tssCached, offset, nil
}
if !ec.mayCache() {
if !ec.MayCache() {
qt.Printf("do not apply instant rollup optimization because of disabled cache")
return evalAt(qt, timestamp, window)
}
@@ -1641,7 +1651,7 @@ func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcNa
}
return tss, nil
}
if !ec.mayCache() {
if !ec.MayCache() {
qt.Printf("do not fetch series from cache, since it is disabled in the current context")
return evalWithConfig(ec)
}
@@ -1932,9 +1942,8 @@ func doRollupForTimeseries(funcName string, keepMetricNames bool, rc *rollupConf
type timeseriesWithPadding struct {
tss []*timeseries
// The padding prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
_ [128 - unsafe.Sizeof([]*timeseries{})%128]byte
// The padding prevents false sharing
_ [atomicutil.CacheLineSize - unsafe.Sizeof([]*timeseries{})%atomicutil.CacheLineSize]byte
}
type timeseriesByWorkerID struct {
@@ -2017,11 +2026,14 @@ func sumNoOverflow(a, b int64) int64 {
}
func dropStaleNaNs(funcName string, values []float64, timestamps []int64) ([]float64, []int64) {
if *noStaleMarkers || funcName == "default_rollup" || funcName == "stale_samples_over_time" {
if *noStaleMarkers || funcName == "stale_samples_over_time" ||
funcName == "default_rollup" || funcName == "increase" || funcName == "rate" {
// Do not drop Prometheus staleness marks (aka stale NaNs) for default_rollup() function,
// since it uses them for Prometheus-style staleness detection.
// Do not drop staleness marks for stale_samples_over_time() function, since it needs
// to calculate the number of staleness markers.
// Do not drop staleness marks for increase() and rate() function, so they could stop
// returning results for stale series. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8891
return values, timestamps
}
// Remove Prometheus staleness marks, so non-default rollup functions don't hit NaN values.

View File

@@ -71,7 +71,8 @@ var rollupFuncs = map[string]newRollupFunc{
"quantile_over_time": newRollupQuantile,
"quantiles_over_time": newRollupQuantiles,
"range_over_time": newRollupFuncOneArg(rollupRange),
"rate": newRollupFuncOneArg(rollupDerivFast), // + rollupFuncsRemoveCounterResets
"rate": newRollupFuncOneArg(rollupDerivFast), // + rollupFuncsRemoveCounterResets
"rate_prometheus": newRollupFuncOneArg(rollupDerivFastPrometheus), // + rollupFuncsRemoveCounterResets
"rate_over_sum": newRollupFuncOneArg(rollupRateOverSum),
"resets": newRollupFuncOneArg(rollupResets),
"rollup": newRollupFuncOneOrTwoArgs(rollupFake),
@@ -195,7 +196,7 @@ var rollupAggrFuncs = map[string]rollupFunc{
"zscore_over_time": rollupZScoreOverTime,
}
// VictoriaMetrics can extends lookbehind window for these functions
// VictoriaMetrics can extend lookbehind window for these functions
// in order to make sure it contains enough points for returning non-empty results.
//
// This is needed for returning the expected non-empty graphs when zooming in the graph in Grafana,
@@ -225,6 +226,7 @@ var rollupFuncsRemoveCounterResets = map[string]bool{
"increase_pure": true,
"irate": true,
"rate": true,
"rate_prometheus": true,
"rollup_increase": true,
"rollup_rate": true,
}
@@ -252,6 +254,7 @@ var rollupFuncsSamplesScannedPerCall = map[string]int{
"lifetime": 2,
"present_over_time": 1,
"rate": 2,
"rate_prometheus": 2,
"scrape_interval": 2,
"tfirst_over_time": 1,
"timestamp": 1,
@@ -374,11 +377,13 @@ func getRollupConfigs(funcName string, rf rollupFunc, expr metricsql.Expr, start
preFunc := func(_ []float64, _ []int64) {}
funcName = strings.ToLower(funcName)
// window > lookbackDelta could result in negative delta.
// See issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8342
stalenessInterval := lookbackDelta
if stalenessInterval != 0 && stalenessInterval < window {
stalenessInterval = window
if stalenessInterval != 0 {
// If stalenessInterval was set, it should additionally account for [window] range to cover following cases:
// * window > stalenessInterval, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8342
// * window captures prevValue in doInternal while removeCounterResets does not,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-3000735468
stalenessInterval += window
}
if rollupFuncsRemoveCounterResets[funcName] {
@@ -918,15 +923,18 @@ func getMaxPrevInterval(scrapeInterval int64) int64 {
return scrapeInterval + scrapeInterval/8
}
// removeCounterResets removes resets for rollup functions over counters - see rollupFuncsRemoveCounterResets
// it doesn't remove resets between samples with staleNaNs, or samples that exceed maxStalenessInterval
func removeCounterResets(values []float64, timestamps []int64, maxStalenessInterval int64) {
// There is no need in handling NaNs here, since they are impossible
// on values from vmstorage.
if len(values) == 0 {
return
}
var correction float64
prevValue := values[0]
for i, v := range values {
if decimal.IsStaleNaN(v) {
continue
}
d := v - prevValue
if d < 0 {
if (-d * 8) < prevValue {
@@ -1858,8 +1866,13 @@ func rollupIncreasePure(rfa *rollupFuncArg) float64 {
func rollupDelta(rfa *rollupFuncArg) float64 {
// There is no need in handling NaNs here, since they must be cleaned up
// before calling rollup funcs.
// before calling rollup funcs. Only StaleNaNs could remain in values - see dropStaleNaNs().
values := rfa.values
if len(values) > 0 && decimal.IsStaleNaN(values[len(values)-1]) {
// if last sample on interval is staleness marker then the selected series is expected
// to stop rendering immediately. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8891
return nan
}
prevValue := rfa.prevValue
if math.IsNaN(prevValue) {
if len(values) == 0 {
@@ -1943,10 +1956,23 @@ func rollupDerivSlow(rfa *rollupFuncArg) float64 {
return k
}
func rollupDerivFastPrometheus(rfa *rollupFuncArg) float64 {
delta := rollupDeltaPrometheus(rfa)
if math.IsNaN(delta) || rfa.window == 0 {
return nan
}
return delta / (float64(rfa.window) / 1e3)
}
func rollupDerivFast(rfa *rollupFuncArg) float64 {
// There is no need in handling NaNs here, since they must be cleaned up
// before calling rollup funcs.
// before calling rollup funcs. Only StaleNaNs could remain in values - see - see dropStaleNaNs().
values := rfa.values
if len(values) > 0 && decimal.IsStaleNaN(values[len(values)-1]) {
// if last sample on interval is staleness marker then the selected series is expected
// to stop rendering immediately. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8891
return nan
}
timestamps := rfa.timestamps
prevValue := rfa.prevValue
prevTimestamp := rfa.prevTimestamp

View File

@@ -241,7 +241,7 @@ func (rrc *rollupResultCache) GetSeries(qt *querytracer.Tracer, ec *EvalConfig,
at = ec.AuthTokens[0]
}
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], at, expr, window, ec.Step, ec.EnforcedTagFilterss)
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], at, expr, window, ec.Step, ec.CacheTagFilters)
metainfoBuf := rrc.c.Get(nil, bb.B)
if len(metainfoBuf) == 0 {
qt.Printf("nothing found")
@@ -263,7 +263,7 @@ func (rrc *rollupResultCache) GetSeries(qt *querytracer.Tracer, ec *EvalConfig,
if !ok {
mi.RemoveKey(key)
metainfoBuf = mi.Marshal(metainfoBuf[:0])
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], at, expr, window, ec.Step, ec.EnforcedTagFilterss)
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], at, expr, window, ec.Step, ec.CacheTagFilters)
rrc.c.Set(bb.B, metainfoBuf)
return nil, ec.Start
}
@@ -373,7 +373,7 @@ func (rrc *rollupResultCache) PutSeries(qt *querytracer.Tracer, ec *EvalConfig,
if !ec.IsMultiTenant {
at = ec.AuthTokens[0]
}
metainfoKey.B = marshalRollupResultCacheKeyForSeries(metainfoKey.B[:0], at, expr, window, ec.Step, ec.EnforcedTagFilterss)
metainfoKey.B = marshalRollupResultCacheKeyForSeries(metainfoKey.B[:0], at, expr, window, ec.Step, ec.CacheTagFilters)
metainfoBuf.B = rrc.c.Get(metainfoBuf.B[:0], metainfoKey.B)
var mi rollupResultCacheMetainfo
if len(metainfoBuf.B) > 0 {

View File

@@ -46,7 +46,7 @@ func TestRollupResultCache(t *testing.T) {
ProjectID: 843,
}},
MayCache: true,
NoCache: false,
}
me := &metricsql.MetricExpr{
LabelFilterss: [][]metricsql.LabelFilter{

View File

@@ -156,6 +156,14 @@ func TestRemoveCounterResets(t *testing.T) {
removeCounterResets(values, timestamps, 10)
testRowsEqual(t, values, timestamps, valuesExpected, timestamps)
// verify that staleNaNs are respected
// it is important to have counter reset in values below to trigger correction logic
values = []float64{2, 4, 2, decimal.StaleNaN}
timestamps = []int64{10, 20, 30, 40}
valuesExpected = []float64{2, 4, 6, decimal.StaleNaN}
removeCounterResets(values, timestamps, 10)
testRowsEqual(t, values, timestamps, valuesExpected, timestamps)
// verify results always increase monotonically with possible float operations precision error
values = []float64{34.094223, 2.7518, 2.140669, 0.044878, 1.887095, 2.546569, 2.490149, 0.045, 0.035684, 0.062454, 0.058296}
timestampsExpected = []int64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
@@ -648,6 +656,7 @@ func TestRollupNewRollupFuncSuccess(t *testing.T) {
f("irate", 0)
f("outlier_iqr_over_time", nan)
f("rate", 2200)
f("rate_prometheus", 2200)
f("resets", 5)
f("range_over_time", 111)
f("avg_over_time", 47.083333333333336)
@@ -1525,16 +1534,31 @@ func testRowsEqual(t *testing.T, values []float64, timestamps []int64, valuesExp
i, ts, tsExpected, timestamps, timestampsExpected)
}
vExpected := valuesExpected[i]
if decimal.IsStaleNaN(v) {
if !decimal.IsStaleNaN(vExpected) {
t.Fatalf("unexpected stale NaN value at values[%d]; want %f\nvalues=\n%v\nvaluesExpected=\n%v",
i, vExpected, values, valuesExpected)
}
continue
}
// staleNaNBits == math.NaN(), but decimal.IsStaleNaN(math.NaN()) == false
// so we check for decimal.IsStaleNaN first.
if decimal.IsStaleNaN(vExpected) {
if !decimal.IsStaleNaN(v) {
t.Fatalf("unexpected value at values[%d]; got %f; want stale NaN\nvalues=\n%v\nvaluesExpected=\n%v",
i, v, values, valuesExpected)
}
}
if math.IsNaN(v) {
if !math.IsNaN(vExpected) {
t.Fatalf("unexpected nan value at values[%d]; want %f\nvalues=\n%v\nvaluesExpected=\n%v",
t.Fatalf("unexpected NaN value at values[%d]; want %f\nvalues=\n%v\nvaluesExpected=\n%v",
i, vExpected, values, valuesExpected)
}
continue
}
if math.IsNaN(vExpected) {
if !math.IsNaN(v) {
t.Fatalf("unexpected value at values[%d]; got %f; want nan\nvalues=\n%v\nvaluesExpected=\n%v",
t.Fatalf("unexpected value at values[%d]; got %f; want NaN\nvalues=\n%v\nvaluesExpected=\n%v",
i, v, values, valuesExpected)
}
continue
@@ -1608,6 +1632,33 @@ func TestRollupDelta(t *testing.T) {
f(100, nan, nan, nil, 0)
}
func TestRollupDerivFastPrometheus(t *testing.T) {
f := func(values []float64, window int64, resultExpected float64) {
t.Helper()
rfa := &rollupFuncArg{
values: values,
window: window,
}
result := rollupDerivFastPrometheus(rfa)
if math.IsNaN(result) {
if !math.IsNaN(resultExpected) {
t.Fatalf("unexpected result; got %v; want %v", result, resultExpected)
}
return
}
if result != resultExpected {
t.Fatalf("unexpected result; got %v; want %v", result, resultExpected)
}
}
f(nil, 0, nan)
f(nil, 10, nan)
f([]float64{0, 10}, 0, nan)
f([]float64{10}, 10, nan)
f([]float64{0, 20}, 10e3, 2)
f([]float64{0, 10, 20}, 10e3, 2)
}
func TestRollupDeltaWithStaleness(t *testing.T) {
// there is a gap between samples in the dataset below
timestamps := []int64{0, 15000, 30000, 70000}
@@ -1746,6 +1797,28 @@ func TestRollupDeltaWithStaleness(t *testing.T) {
timestampsExpected := []int64{0, 30e3, 60e3, 90e3}
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected)
})
// the last sample is stale NaN
timestamps = []int64{0, 10000, 20000, 30000, 40000}
values = []float64{0, 0, 0, 10, decimal.StaleNaN}
t.Run("last point is stale nan", func(t *testing.T) {
rc := rollupConfig{
Func: rollupDelta,
Start: 40001,
End: 40001,
Step: 50000,
Window: 0,
MaxPointsPerSeries: 1e4,
}
rc.Timestamps = rc.getTimestamps()
gotValues, samplesScanned := rc.Do(nil, values, timestamps)
if samplesScanned != 10 {
t.Fatalf("expecting 10 samplesScanned from rollupConfig.Do; got %d", samplesScanned)
}
valuesExpected := []float64{nan}
timestampsExpected := []int64{40001}
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected)
})
}
func TestRollupIncreasePureWithStaleness(t *testing.T) {
@@ -1860,3 +1933,48 @@ func TestRollupIncreasePureWithStaleness(t *testing.T) {
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected)
})
}
func TestRollupDerivFastWithStaleness(t *testing.T) {
timestamps := []int64{0, 10000, 20000, 30000, 40000}
values := []float64{0, 0, 0, 0, 10}
t.Run("no stale marker", func(t *testing.T) {
rc := rollupConfig{
Func: rollupDerivFast,
Start: 40001,
End: 40001,
Step: 50000,
Window: 0,
MaxPointsPerSeries: 1e4,
}
rc.Timestamps = rc.getTimestamps()
gotValues, samplesScanned := rc.Do(nil, values, timestamps)
if samplesScanned != 10 {
t.Fatalf("expecting 10 samplesScanned from rollupConfig.Do; got %d", samplesScanned)
}
valuesExpected := []float64{0.25}
timestampsExpected := []int64{40001}
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected)
})
// the last sample is stale NaN
timestamps = []int64{0, 10000, 20000, 30000, 40000}
values = []float64{0, 0, 0, 10, decimal.StaleNaN}
t.Run("last point is stale nan", func(t *testing.T) {
rc := rollupConfig{
Func: rollupDerivFast,
Start: 40001,
End: 40001,
Step: 50000,
Window: 0,
MaxPointsPerSeries: 1e4,
}
rc.Timestamps = rc.getTimestamps()
gotValues, samplesScanned := rc.Do(nil, values, timestamps)
if samplesScanned != 10 {
t.Fatalf("expecting 10 samplesScanned from rollupConfig.Do; got %d", samplesScanned)
}
valuesExpected := []float64{nan}
timestampsExpected := []int64{40001}
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected)
})
}

View File

@@ -742,7 +742,23 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
This function is supported by PromQL.
See also [irate](#irate) and [rollup_rate](#rollup_rate).
See also [irate](#irate), [rollup_rate](#rollup_rate) and [rate_prometheus](#rate_prometheus).
#### rate_prometheus
`rate_prometheus(series_selector[d])` {{% available_from "#" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering).
The resulting calculation is equivalent to `increase_prometheus(series_selector[d]) / d`.
It doesn't take into account the last sample before the given lookbehind window `d` when calculating the result in the same way as Prometheus does.
See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
This function is usually applied to [counters](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#counter).
See also [increase_prometheus](#increase_prometheus) and [rate](#rate).
#### rate_over_sum

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -36,10 +36,10 @@
<meta property="og:title" content="UI for VictoriaMetrics">
<meta property="og:url" content="https://victoriametrics.com/">
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
<script type="module" crossorigin src="./assets/index-xmjGcv4-.js"></script>
<script type="module" crossorigin src="./assets/index-BiQY-19a.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-D8IJGiEn.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
<link rel="stylesheet" crossorigin href="./assets/index-C85_NB5q.css">
<link rel="stylesheet" crossorigin href="./assets/index-ojCMu5lE.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -76,6 +76,8 @@ var (
cacheSizeStorageTSID = flagutil.NewBytes("storage.cacheSizeStorageTSID", 0, "Overrides max size for storage/tsid cache. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning")
cacheSizeStorageMetricName = flagutil.NewBytes("storage.cacheSizeStorageMetricName", 0, "Overrides max size for storage/metricName cache. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning")
cacheSizeIndexDBIndexBlocks = flagutil.NewBytes("storage.cacheSizeIndexDBIndexBlocks", 0, "Overrides max size for indexdb/indexBlocks cache. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning")
cacheSizeIndexDBDataBlocks = flagutil.NewBytes("storage.cacheSizeIndexDBDataBlocks", 0, "Overrides max size for indexdb/dataBlocks cache. "+
@@ -125,6 +127,7 @@ func main() {
}
storage.SetFinalDedupScheduleInterval(*finalDedupScheduleInterval)
storage.SetMetricNamesStatsCacheSize(cacheSizeMetricNamesStats.IntN())
storage.SetMetricNameCacheSize(cacheSizeStorageMetricName.IntN())
mergeset.SetIndexBlocksCacheSize(cacheSizeIndexDBIndexBlocks.IntN())
mergeset.SetDataBlocksCacheSize(cacheSizeIndexDBDataBlocks.IntN())
mergeset.SetDataBlocksSparseCacheSize(cacheSizeIndexDBDataBlocksSparse.IntN())
@@ -571,6 +574,15 @@ func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="indexdb/tagFiltersToMetricIDs"}`, idbm.TagFiltersToMetricIDsCacheMisses)
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="storage/regexps"}`, storage.RegexpCacheMisses())
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="storage/regexpPrefixes"}`, storage.RegexpPrefixesCacheMisses())
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="cache_size"}`, m.TSIDCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="miss_percentage"}`, m.TSIDCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="expiration"}`, m.TSIDCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="cache_size"}`, m.MetricNameCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="miss_percentage"}`, m.MetricNameCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="expiration"}`, m.MetricNameCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="cache_size"}`, m.MetricIDCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="miss_percentage"}`, m.MetricIDCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="expiration"}`, m.MetricIDCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_deleted_metrics_total{type="indexdb"}`, idbm.DeletedMetricsCount)

View File

@@ -1,4 +1,4 @@
FROM golang:1.24.3 AS build-web-stage
FROM golang:1.24.4 AS build-web-stage
COPY build /build
WORKDIR /build
@@ -6,7 +6,7 @@ COPY web/ /build/
RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o web-amd64 github.com/VictoriMetrics/vmui/ && \
GOOS=windows GOARCH=amd64 CGO_ENABLED=0 go build -o web-windows github.com/VictoriMetrics/vmui/
FROM alpine:3.21.3
FROM alpine:3.22.0
USER root
COPY --from=build-web-stage /build/web-amd64 /app/web

View File

@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -2,9 +2,9 @@
<html lang="en">
<head>
<meta charset="utf-8"/>
<link rel="icon" href="/favicon.svg" />
<link rel="apple-touch-icon" href="/favicon.svg" />
<link rel="mask-icon" href="/favicon.svg" color="#000000">
<link rel="icon" href="/favicon.victorialogs.svg" />
<link rel="apple-touch-icon" href="/favicon.victorialogs.svg" />
<link rel="mask-icon" href="/favicon.victorialogs.svg" color="#000000">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=5"/>
<meta name="theme-color" content="#000000"/>
@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -10,6 +10,8 @@
"dependencies": {
"@types/lodash.debounce": "^4.0.9",
"@types/lodash.get": "^4.4.9",
"@types/lodash.orderBy": "^4.6.9",
"@types/lodash.throttle": "^4.1.9",
"@types/qs": "^6.9.18",
"@types/react": "^19.1.2",
"@types/react-input-mask": "^3.0.6",
@@ -18,6 +20,8 @@
"dayjs": "^1.11.13",
"lodash.debounce": "^4.0.8",
"lodash.get": "^4.4.2",
"lodash.orderBy": "^4.6.0",
"lodash.throttle": "^4.1.1",
"marked": "^15.0.8",
"marked-emoji": "^2.0.0",
"preact": "^10.26.5",
@@ -2191,6 +2195,24 @@
"@types/lodash": "*"
}
},
"node_modules/@types/lodash.orderBy": {
"version": "4.6.9",
"resolved": "https://registry.npmjs.org/@types/lodash.orderby/-/lodash.orderby-4.6.9.tgz",
"integrity": "sha512-T9o2wkIJOmxXwVTPTmwJ59W6eTi2FseiLR369fxszG649Po/xe9vqFNhf/MtnvT5jrbDiyWKxPFPZbpSVK0SVQ==",
"license": "MIT",
"dependencies": {
"@types/lodash": "*"
}
},
"node_modules/@types/lodash.throttle": {
"version": "4.1.9",
"resolved": "https://registry.npmjs.org/@types/lodash.throttle/-/lodash.throttle-4.1.9.tgz",
"integrity": "sha512-PCPVfpfueguWZQB7pJQK890F2scYKoDUL3iM522AptHWn7d5NQmeS/LTEHIcLr5PaTzl3dK2Z0xSUHHTHwaL5g==",
"license": "MIT",
"dependencies": {
"@types/lodash": "*"
}
},
"node_modules/@types/node": {
"version": "22.14.1",
"resolved": "https://registry.npmjs.org/@types/node/-/node-22.14.1.tgz",
@@ -5742,6 +5764,18 @@
"dev": true,
"license": "MIT"
},
"node_modules/lodash.orderBy": {
"version": "4.6.0",
"resolved": "https://registry.npmjs.org/lodash.orderby/-/lodash.orderby-4.6.0.tgz",
"integrity": "sha512-T0rZxKmghOOf5YPnn8EY5iLYeWCpZq8G41FfqoVHH5QDTAFaghJRmAdLiadEDq+ztgM2q5PjA+Z1fOwGrLgmtg==",
"license": "MIT"
},
"node_modules/lodash.throttle": {
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/lodash.throttle/-/lodash.throttle-4.1.1.tgz",
"integrity": "sha512-wIkUCfVKpVsWo3JSZlc+8MB5it+2AN5W8J7YVMST30UrvcQNZ1Okbj+rbVniijTWE6FGYy4XJq/rHkas8qJMLQ==",
"license": "MIT"
},
"node_modules/loose-envify": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/loose-envify/-/loose-envify-1.4.0.tgz",

View File

@@ -7,6 +7,8 @@
"dependencies": {
"@types/lodash.debounce": "^4.0.9",
"@types/lodash.get": "^4.4.9",
"@types/lodash.orderBy": "^4.6.9",
"@types/lodash.throttle": "^4.1.9",
"@types/qs": "^6.9.18",
"@types/react": "^19.1.2",
"@types/react-input-mask": "^3.0.6",
@@ -15,6 +17,8 @@
"dayjs": "^1.11.13",
"lodash.debounce": "^4.0.8",
"lodash.get": "^4.4.2",
"lodash.orderBy": "^4.6.0",
"lodash.throttle": "^4.1.1",
"marked": "^15.0.8",
"marked-emoji": "^2.0.0",
"preact": "^10.26.5",

View File

@@ -0,0 +1,5 @@
<svg width="48" height="48" fill="#e94600" xmlns="http://www.w3.org/2000/svg">
<path d="M24.5475 0C10.3246.0265251 1.11379 3.06365 4.40623 6.10077c0 0 12.32997 11.23333 16.58217 14.84083.8131.6896 2.1728 1.1936 3.5191 1.2201h.1199c1.3463-.0265 2.706-.5305 3.5191-1.2201 4.2522-3.5942 16.5422-14.84083 16.5422-14.84083C48.0478 3.06365 38.8636.0265251 24.6674 0"/>
<path d="M28.1579 27.0159c-.8131.6896-2.1728 1.1936-3.5191 1.2201h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2201-2.9725-2.5067-13.35639-11.87-17.26201-15.3979v5.4112c0 .5968.22661 1.3793.6265 1.7506C7.00358 21.1936 17.2675 30.5437 20.9731 33.6737c.8132.6896 2.1728 1.1936 3.5191 1.2201h.12c1.3463-.0265 2.7059-.5305 3.519-1.2201 3.679-3.13 13.9429-12.4536 16.6089-14.8939.4132-.3713.6265-1.1538.6265-1.7506V11.618c-3.9323 3.5411-14.3162 12.931-17.2354 15.3979h.0267Z"/>
<path d="M28.1579 39.748c-.8131.6897-2.1728 1.1937-3.5191 1.2202h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2202-2.9725-2.4933-13.35639-11.8567-17.26201-15.3978v5.4111c0 .5969.22661 1.3793.6265 1.7507C7.00358 33.9258 17.2675 43.2759 20.9731 46.4058c.8132.6897 2.1728 1.1937 3.5191 1.2202h.12c1.3463-.0265 2.7059-.5305 3.519-1.2202 3.679-3.1299 13.9429-12.4535 16.6089-14.8938.4132-.3714.6265-1.1538.6265-1.7507v-5.4111c-3.9323 3.5411-14.3162 12.931-17.2354 15.3978h.0267Z"/>
</svg>

After

Width:  |  Height:  |  Size: 1.3 KiB

View File

@@ -742,7 +742,23 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
This function is supported by PromQL.
See also [irate](#irate) and [rollup_rate](#rollup_rate).
See also [irate](#irate), [rollup_rate](#rollup_rate) and [rate_prometheus](#rate_prometheus).
#### rate_prometheus
`rate_prometheus(series_selector[d])` {{% available_from "#" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering).
The resulting calculation is equivalent to `increase_prometheus(series_selector[d]) / d`.
It doesn't take into account the last sample before the given lookbehind window `d` when calculating the result in the same way as Prometheus does.
See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
This function is usually applied to [counters](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#counter).
See also [increase_prometheus](#increase_prometheus) and [rate](#rate).
#### rate_over_sum

View File

@@ -33,8 +33,12 @@ const LogsQueryEditorAutocomplete: FC<QueryEditorAutocompleteProps> = ({
const part = logicalParts.find(p => caretPosition[0] >= p.position[0] && caretPosition[0] <= p.position[1]);
if (!part) return;
const cursorStartPosition = caretPosition[0] - part.position[0];
const prevPart = logicalParts.find(p => p.id === part.id - 1);
const queryBeforeIncompleteFilter = prevPart ? value.substring(0, prevPart.position[1] + 1) : undefined;
return {
...part,
queryBeforeIncompleteFilter,
query: value,
...getContextData(part, cursorStartPosition)
};
}, [logicalParts, caretPosition]);
@@ -50,6 +54,8 @@ const LogsQueryEditorAutocomplete: FC<QueryEditorAutocompleteProps> = ({
return fieldValues;
case ContextType.PipeName:
return pipeList;
case ContextType.FilterOrPipeName:
return [...fieldNames, ...pipeList];
default:
return [];
}
@@ -58,7 +64,7 @@ const LogsQueryEditorAutocomplete: FC<QueryEditorAutocompleteProps> = ({
const getUpdatedValue = (insertValue: string, logicalParts: LogicalPart[], id?: number) => {
return logicalParts.reduce((acc, part) => {
const value = part.id === id ? insertValue : part.value;
const separator = part.type === LogicalPartType.Pipe ? " | " : " ";
const separator = part.separator === "|" ? " | " : " ";
return `${acc}${separator}${value}`;
}, "").trim();
};
@@ -70,7 +76,7 @@ const LogsQueryEditorAutocomplete: FC<QueryEditorAutocompleteProps> = ({
modifiedInsert += ":";
} else if (contextType === ContextType.FilterValue) {
const insertWithQuotes = value.startsWith("_stream:") ? modifiedInsert : `${JSON.stringify(modifiedInsert)}`;
modifiedInsert = `${contextData?.filterName || ""}:${insertWithQuotes}`;
modifiedInsert = `${contextData?.filterName || ""}${contextData?.operator || ":"}${insertWithQuotes}`;
}
return modifiedInsert;
@@ -86,7 +92,13 @@ const LogsQueryEditorAutocomplete: FC<QueryEditorAutocompleteProps> = ({
const insertValue = getModifyInsert(insert, contextType, value, item.type);
const newValue = getUpdatedValue(insertValue, logicalParts, id);
const updatedPosition = (position[0] || 1) + insertValue.length + (item.type === ContextType.PipeName ? 1 : 0);
const logicalPart = logicalParts.find(p => p.id === id);
const getPositionCorrection = () => {
if (logicalPart?.type === LogicalPartType.FilterOrPipe) return 1;
if (item.type === ContextType.PipeName) return 1;
return 0;
};
const updatedPosition = (position[0] || 1) + insertValue.length + getPositionCorrection();
onSelect(newValue, updatedPosition);
}, [contextData, logicalParts]);

View File

@@ -9,6 +9,7 @@ export const splitLogicalParts = (expr: string) => {
const input = expr; //.replace(/\s*:\s*/g, ":");
const parts: LogicalPart[] = [];
let currentPart = "";
let separator: undefined | " " | "|" = undefined;
let isPipePart = false;
const quotes = ["'", "\"", "`"];
@@ -43,8 +44,9 @@ export const splitLogicalParts = (expr: string) => {
isPipePart = true;
const countStartSpaces = currentPart.match(/^ */)?.[0].length || 0;
const countEndSpaces = currentPart.match(/ *$/)?.[0].length || 0;
pushPart(currentPart, true, [startIndex + countStartSpaces, i - countEndSpaces - 1], parts);
pushPart(currentPart, true, [startIndex + countStartSpaces, i - countEndSpaces - 1], parts, separator);
currentPart = "";
separator = "|";
startIndex = i + 1;
continue;
}
@@ -54,7 +56,8 @@ export const splitLogicalParts = (expr: string) => {
const nextStr = input.slice(i).replace(/^\s*/, "");
const prevStr = input.slice(0, i).replace(/\s*$/, "");
if (!nextStr.startsWith(":") && !prevStr.endsWith(":")) {
pushPart(currentPart, false, [startIndex, i - 1], parts);
pushPart(currentPart, false, [startIndex, i - 1], parts, separator);
separator = " ";
currentPart = "";
startIndex = i + 1;
continue;
@@ -65,26 +68,35 @@ export const splitLogicalParts = (expr: string) => {
}
// push the last part
pushPart(currentPart, isPipePart, [startIndex, input.length], parts);
pushPart(currentPart, isPipePart, [startIndex, input.length], parts, separator);
return parts;
};
const pushPart = (currentPart: string, isPipePart: boolean, position: LogicalPartPosition, parts: LogicalPart[]) => {
const pushPart = (currentPart: string, isPipePart: boolean, position: LogicalPartPosition, parts: LogicalPart[], separator: LogicalPart["separator"]) => {
const trimmedPart = currentPart.trim();
if (!trimmedPart) return;
const isOperator = BUILDER_OPERATORS.includes(trimmedPart.toUpperCase());
const pipesTypes = [LogicalPartType.Pipe, LogicalPartType.FilterOrPipe];
const isPreviousPartPipe = parts.length > 0 && pipesTypes.includes(parts[parts.length - 1].type);
const getType = () => {
if (isPreviousPartPipe) return LogicalPartType.FilterOrPipe;
if (isPipePart) return LogicalPartType.Pipe;
if (isOperator) return LogicalPartType.Operator;
return LogicalPartType.Filter;
};
parts.push({
id: parts.length,
value: trimmedPart,
position,
type: isPipePart
? LogicalPartType.Pipe
: isOperator ? LogicalPartType.Operator : LogicalPartType.Filter,
type: getType(),
separator,
});
};
export const getContextData = (part: LogicalPart, cursorPos: number) => {
export const getContextData = (part: LogicalPart, cursorPos: number): ContextData => {
const valueBeforeCursor = part.value.substring(0, cursorPos);
const valueAfterCursor = part.value.substring(cursorPos);
@@ -95,23 +107,91 @@ export const getContextData = (part: LogicalPart, cursorPos: number) => {
contextType: ContextType.Unknown,
};
if (part.type === LogicalPartType.Filter) {
const noColon = !valueBeforeCursor.includes(":") && !valueAfterCursor.includes(":");
if (noColon) {
metaData.contextType = ContextType.FilterUnknown;
} else if (valueBeforeCursor.includes(":")) {
const [filterName, filterValue] = valueBeforeCursor.split(":");
metaData.contextType = ContextType.FilterValue;
metaData.filterName = filterName;
metaData.valueContext = filterValue;
} else {
metaData.contextType = ContextType.FilterName;
}
} else if (part.type === LogicalPartType.Pipe) {
const valueStartWithPipe = PIPE_NAMES.some(p => part.value.startsWith(p));
metaData.contextType = valueStartWithPipe ? ContextType.PipeValue : ContextType.PipeName;
}
// Determine context type based on logical part type
determineContextType(part, valueBeforeCursor, valueAfterCursor, metaData);
// Clean up quotes in valueContext
metaData.valueContext = metaData.valueContext.replace(/^["']|["']$/g, "");
return metaData;
};
/** Helper function to determine if a string starts with any of the pipe names */
const startsWithPipe = (value: string): boolean => {
return PIPE_NAMES.some(p => value.startsWith(p));
};
/** Helper function to check for colon presence */
const hasNoColon = (before: string, after: string): boolean => {
return !before.includes(":") && !after.includes(":");
};
/** Helper function to extract filter name and update metadata for filter values */
const handleFilterValue = (valueBeforeCursor: string, metaData: ContextData): void => {
const [filterName, ...filterValue] = valueBeforeCursor.split(":");
metaData.contextType = ContextType.FilterValue;
metaData.filterName = filterName;
const enhanceOperators = ["=", "-", "!", "~", "<", ">", "<=", ">="] as const;
const enhanceOperator = enhanceOperators.find(op => op === filterValue[0]);
if (enhanceOperator) {
metaData.valueContext = filterValue.slice(1).join(":");
metaData.operator = `:${enhanceOperator}`;
} else {
metaData.valueContext = filterValue.join(":");
metaData.operator = ":";
}
};
/** Function to determine context type based on part type and value */
const determineContextType = (
part: LogicalPart,
valueBeforeCursor: string,
valueAfterCursor: string,
metaData: ContextData
): void => {
switch (part.type) {
case LogicalPartType.Filter:
handleFilterType(valueBeforeCursor, valueAfterCursor, metaData);
break;
case LogicalPartType.Pipe:
metaData.contextType = startsWithPipe(part.value)
? ContextType.PipeValue
: ContextType.PipeName;
break;
case LogicalPartType.FilterOrPipe:
handleFilterOrPipeType(part.value, valueBeforeCursor, metaData);
break;
}
};
/** Handle filter type context determination */
const handleFilterType = (
valueBeforeCursor: string,
valueAfterCursor: string,
metaData: ContextData
): void => {
if (hasNoColon(valueBeforeCursor, valueAfterCursor)) {
metaData.contextType = ContextType.FilterUnknown;
} else if (valueBeforeCursor.includes(":")) {
handleFilterValue(valueBeforeCursor, metaData);
} else {
metaData.contextType = ContextType.FilterName;
}
};
/** Handle FilterOrPipeType context determination */
const handleFilterOrPipeType = (
value: string,
valueBeforeCursor: string,
metaData: ContextData
): void => {
if (startsWithPipe(value)) {
metaData.contextType = ContextType.PipeValue;
} else if (valueBeforeCursor.includes(":")) {
handleFilterValue(valueBeforeCursor, metaData);
} else {
metaData.contextType = ContextType.FilterOrPipeName;
}
};

View File

@@ -2,15 +2,19 @@ export enum LogicalPartType {
Filter = "Filter",
Pipe = "Pipe",
Operator = "Operator",
FilterOrPipe = "FilterOrPipe",
}
export type LogicalPartPosition = [start: number, end: number];
export type LogicalPartSeparator = " " | "|";
export interface LogicalPart {
id: number;
value: string;
type: LogicalPartType;
position: LogicalPartPosition;
separator?: LogicalPartSeparator;
}
export interface ContextData {
@@ -19,6 +23,10 @@ export interface ContextData {
contextType: ContextType;
valueContext: string;
filterName?: string;
query?: string;
queryBeforeIncompleteFilter?: string;
separator?: LogicalPartSeparator;
operator?: ":" | ":!" | ":-" | ":=" | ":~" | ":<" | ":>" | ":<=" | ":>=";
}
export enum ContextType {
@@ -28,4 +36,5 @@ export enum ContextType {
PipeName = "Pipes",
PipeValue = "PipeValue",
Unknown = "Unknown",
FilterOrPipeName = "FilterOrPipeName",
}

View File

@@ -10,11 +10,11 @@ import { AUTOCOMPLETE_LIMITS } from "../../../../constants/queryAutocomplete";
import { LogsFiledValues } from "../../../../api/types";
import { useLogsDispatch, useLogsState } from "../../../../state/logsPanel/LogsStateContext";
import { useTenant } from "../../../../hooks/useTenant";
import { generateQuery } from "./utils";
type FetchDataArgs = {
urlSuffix: string;
setter: Dispatch<SetStateAction<AutocompleteOptions[]>>
type: ContextType;
setter: (value: LogsFiledValues[]) => void;
params?: URLSearchParams;
}
@@ -24,7 +24,8 @@ const icons = {
[ContextType.FilterValue]: <ValueIcon/>,
[ContextType.PipeName]: <FunctionIcon/>,
[ContextType.PipeValue]: <LabelIcon/>,
[ContextType.Unknown]: <ValueIcon/>
[ContextType.Unknown]: <ValueIcon/>,
[ContextType.FilterOrPipeName]: <FunctionIcon/>
};
export const useFetchLogsQLOptions = (contextData?: ContextData) => {
@@ -61,7 +62,7 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
}));
};
const fetchData = async ({ urlSuffix, setter, type, params }: FetchDataArgs) => {
const fetchData = async ({ urlSuffix, setter, params }: FetchDataArgs) => {
abortControllerRef.current.abort();
abortControllerRef.current = new AbortController();
const { signal } = abortControllerRef.current;
@@ -73,7 +74,7 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
try {
const cachedData = autocompleteCache.get(key);
if (cachedData) {
setter(processData(cachedData, type));
setter(cachedData);
setLoading(false);
return;
}
@@ -86,7 +87,7 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
if (response.ok) {
const data = await response.json();
const value = (data?.values || []) as LogsFiledValues[];
setter(value ? processData(value, type) : []);
setter(value || []);
dispatch({ type: "SET_AUTOCOMPLETE_CACHE", payload: { key, value } });
}
setLoading(false);
@@ -101,7 +102,7 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
// fetch field names
useEffect(() => {
const validContexts = [ContextType.FilterName, ContextType.FilterUnknown];
const validContexts = [ContextType.FilterName, ContextType.FilterUnknown, ContextType.FilterOrPipeName];
const isInvalidContext = !validContexts.includes(contextData?.contextType || ContextType.Unknown);
if (!serverUrl || isInvalidContext) {
return;
@@ -109,11 +110,14 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
setFieldNames([]);
const setter = (filterNames: LogsFiledValues[]) => {
setFieldNames(processData(filterNames, ContextType.FilterName));
};
fetchData({
urlSuffix: "field_names",
setter: setFieldNames,
type: ContextType.FilterName,
params: getQueryParams({ query: "*" })
setter: setter,
params: getQueryParams({ query: contextData?.queryBeforeIncompleteFilter || "*" })
});
return () => abortControllerRef.current?.abort();
@@ -128,11 +132,14 @@ export const useFetchLogsQLOptions = (contextData?: ContextData) => {
setFieldValues([]);
const setter = (filterValues: LogsFiledValues[]) => {
setFieldValues(processData(filterValues, ContextType.FilterValue));
};
fetchData({
urlSuffix: "field_values",
setter: setFieldValues,
type: ContextType.FilterValue,
params: getQueryParams({ query: "*", field: contextData.filterName })
setter: setter,
params: getQueryParams({ query: generateQuery(contextData), field: contextData.filterName })
});
return () => abortControllerRef.current?.abort();

View File

@@ -0,0 +1,131 @@
import { expect } from "vitest";
import { generateQuery } from "./utils";
import { ContextType } from "./types";
describe("utils", () => {
describe("_time", () => {
it("should return the trimmed value by `-`", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "_stream:{type=\"WatchEvent\"}",
contextType: ContextType.FilterValue,
filterName: "_time",
query: "_stream:{type=\"WatchEvent\"} _time:2025-04-1",
valueAfterCursor: "",
valueBeforeCursor: "_time=2025-04-1",
valueContext: "2025-04-1"
})).toStrictEqual("_stream:{type=\"WatchEvent\"} _time:2025-04");
});
it("should return the trimmed value by `:` if char `-` also exist in the query", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "_stream:{type=\"WatchEvent\"}",
contextType: ContextType.FilterValue,
filterName: "_time",
query: "_stream:{type=\"WatchEvent\"} _time:2025-04-10T23:45:5",
valueAfterCursor: "",
valueBeforeCursor: "_time=2025-04-10T23:45:5",
valueContext: "2025-04-10T23:45:5"
})).toStrictEqual("_stream:{type=\"WatchEvent\"} _time:2025-04-10T23:45");
});
it("should return default `*` instead of -time filter", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "_stream:{type=\"WatchEvent\"}",
contextType: ContextType.FilterValue,
filterName: "_time",
query: "_stream:{type=\"WatchEvent\"} _time:202",
valueAfterCursor: "",
valueBeforeCursor: "_time=202",
valueContext: "202"
})).toStrictEqual("_stream:{type=\"WatchEvent\"} *");
});
});
describe("_stream", () => {
it("should add regexp to filter value", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "",
contextType: ContextType.FilterValue,
filterName: "_stream",
query: "_stream:{type=\"WatchEve",
valueAfterCursor: "",
valueBeforeCursor: "_stream:{type=\"WatchEve",
valueContext: "{type=\"WatchEve"
})).toStrictEqual("_stream:{type=~\"WatchEve.*\"}");
});
it("should add regexp to filter value if cursor in the middle of value", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "",
contextType: ContextType.FilterValue,
filterName: "_stream",
query: "_stream:{type=\"WatchEve\"}",
valueAfterCursor: "",
valueBeforeCursor: "_stream:{type=\"WatchEve",
valueContext: "{type=\"WatchEve"
})).toStrictEqual("_stream:{type=~\"WatchEve.*\"}");
});
it("should return * if do not have value after =", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "",
contextType: ContextType.FilterValue,
filterName: "_stream",
query: "_stream:{type=",
valueAfterCursor: "",
valueBeforeCursor: "_stream:{type=",
valueContext: "{type="
})).toStrictEqual("*");
});
});
it("_msg", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "_stream:{type=\"WatchEvent\"}",
contextType: ContextType.FilterValue,
filterName: "_msg",
query: "_stream:{type=\"WatchEvent\"} _msg:453",
valueAfterCursor: "",
valueBeforeCursor: "_msg:453",
valueContext: "453"
})).toStrictEqual("_stream:{type=\"WatchEvent\"} *");
});
it("_stream_id", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "_stream:{type=\"WatchEvent\"}",
contextType: ContextType.FilterValue,
filterName: "_stream_id",
query: "_stream:{type=\"WatchEvent\"} _stream_id:453",
valueAfterCursor: "",
valueBeforeCursor: "_stream_id:453",
valueContext: "453"
})).toStrictEqual("_stream:{type=\"WatchEvent\"} *");
});
describe("other fields", () => {
it("should add prefix filter to other type of field names", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "",
contextType: ContextType.FilterValue,
filterName: "repo.name",
query: "repo.name:Victori",
valueAfterCursor: "",
valueBeforeCursor: "repo.name:Victori",
valueContext: "Victori"
})).toStrictEqual("repo.name:Victori*");
});
it("should add prefix filter to other type of field names with escaped via double quote", () => {
expect(generateQuery({
queryBeforeIncompleteFilter: "",
contextType: ContextType.FilterValue,
filterName: "repo.name",
query: "repo.name:\"Victori",
valueAfterCursor: "",
valueBeforeCursor: "repo.name:\"Victori",
valueContext: "Victori"
})).toStrictEqual("repo.name:Victori*");
});
});
});

View File

@@ -0,0 +1,61 @@
import { ContextData } from "./types";
const getStreamFieldQuery = (valueContext: string) => {
if (valueContext.includes("=")) {
const [fieldName, fieldValue] = valueContext.split("=");
if (fieldValue) {
return `_stream:${fieldName}=~${fieldValue}.*"}`;
}
}
return "*";
};
const getLastPartUntilDelimiter = (value: string, delimiter: string) => {
const lastIndexOfDelimiter = value.lastIndexOf(delimiter);
return lastIndexOfDelimiter !== -1 ? value.slice(0, lastIndexOfDelimiter) : "";
};
const getDateQuery = (contextData: ContextData) => {
let fieldValue = "";
if (contextData.valueContext.includes(":")) {
fieldValue = getLastPartUntilDelimiter(contextData.valueContext, ":");
} else if (contextData.valueContext.includes("-")) {
fieldValue = getLastPartUntilDelimiter(contextData.valueContext, "-");
}
return fieldValue ? `${contextData.filterName}:${fieldValue}` : "*";
};
/**
* Generates a query string based on the provided context data.
*
* The function processes the input based on the `filterName` property:
*
* - If `filterName` is `_msg` or `_stream_id`, the query cannot be generated specifically,
* so a wildcard query (`"*"`) is returned.
*
* - If `filterName` is `_stream`, the query is generated using regexp (`{type=~"value.*"}`).
*
* - If `filterName` is `_time`, a simplified query is created by trimming the value up
* to the first occurrence of a delimiter such as `-` or `:`.
*
* - For all other values of `filterName`, a prefix query is returned using
* the `query` value with a `*` appended (e.g., `"value*"`).
*
* @param {ContextData} contextData - The context object containing query parameters and metadata.
* @returns {string} The generated query string.
*/
export const generateQuery = (contextData: ContextData): string => {
let fieldQuery = "";
if (!contextData.filterName || !contextData.query || ["_msg", "_stream_id"].includes(contextData.filterName)) {
fieldQuery = "*";
} else if ("_stream" === contextData.filterName) {
fieldQuery = getStreamFieldQuery(contextData.valueContext);
} else if ("_time" === contextData.filterName) {
fieldQuery = getDateQuery(contextData);
} else {
fieldQuery = `${contextData.filterName}:${contextData.valueContext}*`;
}
return contextData.queryBeforeIncompleteFilter ? `${contextData.queryBeforeIncompleteFilter}${contextData.separator ?? " "}${fieldQuery}` : fieldQuery;
};

View File

@@ -57,7 +57,7 @@ const QueryEditor: FC<QueryEditorProps> = ({
const [caretPositionInput, setCaretPositionInput] = useState<[number, number]>([0, 0]);
const autocompleteAnchorEl = useRef<HTMLInputElement>(null);
const [showAutocomplete, setShowAutocomplete] = useState(!!AutocompleteEl);
const [showAutocomplete, setShowAutocomplete] = useState(false);
const debouncedSetShowAutocomplete = useRef(debounce(setShowAutocomplete, 500)).current;
const warning = [
@@ -128,7 +128,7 @@ const QueryEditor: FC<QueryEditorProps> = ({
useEffect(() => {
setShowAutocomplete(false);
debouncedSetShowAutocomplete(true);
debouncedSetShowAutocomplete(caretPositionAutocomplete.every(Boolean));
}, [caretPositionAutocomplete]);
return (

Some files were not shown because too many files have changed in this diff Show More