Compare commits

...

76 Commits

Author SHA1 Message Date
Max Kotliar
b1003c0e65 docs/CHANGELOG.md: cut v1.121.0 2025-07-04 13:07:59 +00:00
Max Kotliar
4c5cde71eb app/vmselect: run make vmui-update 2025-07-04 12:44:00 +00:00
Max Kotliar
62611ee384 docs/victoriametrics/changelog: fix typo 2025-07-04 11:18:35 +00:00
Max Kotliar
3d7420d135 docs/victoriametrics/changelog: Add an update note on -retryMaxTime flag deprecation 2025-07-04 10:53:42 +00:00
f41gh7
9e9d697a31 docs/changelog: adjust changelog entry ordering
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-07-04 10:35:25 +02:00
Andrii Chubatiuk
f0859403da lib/promscrape: added label_limit global and per job scrape param (#7795)
This commit introduces new scrape config option - `label_limit`. Which defines labels count limit for time series.
Whole scrape will be discarded In case of limit breach.
 This option could also be defined with label `__label_limit__` during scrape relabeling process and globally for all scrape targets.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7660 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3233
2025-07-04 10:18:23 +02:00
Benjamin Nichols-Farquhar
63e133fe04 app/vmalert: introduce new flag replay.ruleEvaluationConcurrency
As discussed in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7387 we want a way to speed up execution within the replay of a single rule in vmalert.

This commit adds the flag `replay.singleRuleEvaluationConcurrency` which
limits the concurrency for requests within the replay of a single
rule.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7387
2025-07-04 10:13:43 +02:00
leiwingqueen
e049bdcfbd app/vmagent: rename flag remoteWrite.retryMaxTime
This commit renames flag `remoteWrite.retryMaxTime` into `remoteWrite.retryMaxInterval`. New name aligns with corresponding `MinInterval` flag. Previous flag name still could be used, but vmagent will log warning message with suggested migration.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9169
2025-07-03 18:28:51 +02:00
Vadim Alekseev
53b2082f53 deployment/docker: add configuration examples of vlagent for Docker
Modified the existing HA setups to use vlagent instead of writing to two
sources

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9328
2025-07-03 18:18:54 +02:00
Max Kotliar
d6a7dc0ea0 lib/fs: enhance MustReadAt panic message
enhance `MustReadAt` panic message to include filename for easier debugging of out-of-range read

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9106
2025-07-03 18:17:54 +02:00
f41gh7
28fb87baea lib/fs: make linter happy for windows build
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-07-03 18:12:16 +02:00
Andrei Baidarov
1799478438 apptest: Add an integration test for /graphite/metrics/index.json (#9357)
While working on #8134 it has been discovered that
`/graphite/index.json` always returns an empty result. This PR adds a
test and a fix. This fix is a no-op for the master but adding it here in
order to reduce the diff in #8134.

Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
2025-07-03 15:14:11 +02:00
Olexandr88
dd40ed2456 docs: add links to codecov, release, A+, license icons (#9335)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-07-03 11:40:35 +02:00
Hui Wang
420f6b322e vmalert: fix alerts state restoration for alerting rule using templat… (#9348)
…ing in the labels

fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9305

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-07-03 11:16:00 +02:00
Andrii Chubatiuk
8eb88798ca lib/streamaggr: properly clean quantiles output state on flush (#9351)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9350

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-07-03 10:28:30 +02:00
Hui Wang
6a71c53c41 vmalert: respect group order defined in the rule file during replay m… (#9349)
…ode to allow chained group if needed

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9334
2025-07-03 10:26:09 +02:00
Phuong Le
746cea61d0 VictoriaLogs: Add metrics to track the retention.maxDiskSpaceUsageBytes flag (#9319)
- `vl_max_disk_space_usage_bytes{path}` reports the configured maximum
disk space usage limit for the storage path. It helps to show the limit
on the dashboard.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9320
2025-07-03 08:29:26 +02:00
Phuong Le
ef88302975 VictoriaLogs: add message processor metrics (#9316)
Add metrics to track the behavior of the message processor:

- `vl_insert_processors_count` tracks the current number of active log
message processors.
- `vl_insert_flush_duration_seconds{type="protocol"}` measures the time
taken to flush log data from recently parsed logs to the underlying
storage system (in-memory buffering, remote storage nodes, or local
storage).

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9320

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-03 08:07:39 +02:00
Phuong Le
857dd8edec VictoriaLogs: fix rate_sum() inconsistency between normal queries and vmalert recording rules (#9304)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9303

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-07-03 07:25:08 +02:00
Mac G
fedb5bbc9a docs/readme: fix typo in metric name
Fix typo in metric name mentioned in doc: vm_rows_ignored_total
2025-07-02 19:37:16 +04:00
Zakhar Bessarab
96633a3c4e app/vmbackupmanager: use "lib/snapshot" for snapshot operations
Re-use existing `lib/snapshot` for snapshot operations in order to reduce code duplicates and unify supported options and error handling.

Previously, vmbackupmanager did not support `snapshot.tls*` options and did not handle errors in a same was as vmbackup does.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9340
2025-07-02 14:44:23 +02:00
Max Kotliar
1a1e4fe2d2 lib/license: improve error message when an empty license is provided (#911) 2025-07-02 14:06:36 +02:00
Yury Molodov
d455c8afa1 vmui: disable autocomplete popup on initial page load
When the page loads, the query field automatically receives focus, which
is expected. However, this also immediately opens the autocomplete
popup, triggering an unnecessary request and showing suggestions before
the user starts typing.

 This PR commit the autocomplete popup from opening on initial page
load.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9342
2025-07-02 13:53:55 +02:00
Nikolay
93843a1faf docs/release-guide: mention release candidate tagging process
This commits changes release process by introducing new intermediate
docker image tag with `-rcY` suffix. It's needed to properly test
releases for possible regressions without need to erase release docker
images. Which could be cached by some proxy and actually used in
production.

 Release candidate image must re-tagged into final release via make
command.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9136
2025-07-02 13:48:04 +02:00
Benjamin Nichols-Farquhar
994dadb4d5 lib/storage: Introduce vm_cache_eviction_bytes_total metric
This commit adds the following new metric `vm_cache_eviction_bytes_total` for tsid, metricName and metricIDs caches backed by workingset cache.

 The problem this is attempting to solve is doing fine grained tuning for
caches on large high churn, high query load VictoriaMetrics deployments.

Cache performance in those scenarios is critical for cluster
performance, however for stability reasons its imperative not to provide
_too much_ cache space compared to available memory or there becomes a
risk of OOM which can easily destabilize a cluster.

Given the cost of memory, especially for large deployments, there is
therefore a need to optimize cache usage to be near the minimum required
to provide acceptable performance under churn, ingestion and query
loads.

VM provides an excellent set of flags for configuring this cache
performance(size and expiry and removal percentage).

However specifically when tuning `prevCacheRemovalPercent` and
`cacheExpireDuration` it can be hard to known exactly what impact
they're having. Sometimes its very clear when theres cyclical behavior
that cacheExpireDuration is involved, but in other scenarios it can be
very unclear why cache miss rate goes up.

Specifically when attempting to debug spikes in cache miss rate there
are questions that are hard to answer:

* Did a new workload come in that was poorly cached?
* Were caches rotated due to expiry and the `prev` cache contained many
active entries due to limited cache size?
* Where the caches rotated due to miss percentage and our tuning is too
aggressive?

In many of these debugging scenarios it would be greatly helpful to have
explicit metrics on how many bytes of key caches were evicted and for
what reasons, it then becomes trivial to associate spikes in eviction
for a specific reason with a miss rate spike that causes poor
performance. The appropriate tuning them becomes much more obvious.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
2025-07-02 13:41:54 +02:00
hagen1778
469b337707 docs: refresh FAQ.md
Partially based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9244

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-07-01 14:20:42 +02:00
dependabot[bot]
2b131c13b5 build(deps): bump github.com/go-viper/mapstructure/v2 from 2.2.1 to 2.3.0 (#9299)
Bumps
[github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure)
from 2.2.1 to 2.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/go-viper/mapstructure/releases">github.com/go-viper/mapstructure/v2's
releases</a>.</em></p>
<blockquote>
<h2>v2.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>build(deps): bump actions/checkout from 4.1.7 to 4.2.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/46">go-viper/mapstructure#46</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.1.0 to 6.1.1
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/47">go-viper/mapstructure#47</a></li>
<li>[enhancement] Add check for <code>reflect.Value</code> in
<code>ComposeDecodeHookFunc</code> by <a
href="https://github.com/mahadzaryab1"><code>@​mahadzaryab1</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/52">go-viper/mapstructure#52</a></li>
<li>build(deps): bump actions/setup-go from 5.0.2 to 5.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/51">go-viper/mapstructure#51</a></li>
<li>build(deps): bump actions/checkout from 4.2.0 to 4.2.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/50">go-viper/mapstructure#50</a></li>
<li>build(deps): bump actions/setup-go from 5.1.0 to 5.2.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/55">go-viper/mapstructure#55</a></li>
<li>build(deps): bump actions/setup-go from 5.2.0 to 5.3.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/58">go-viper/mapstructure#58</a></li>
<li>ci: add Go 1.24 to the test matrix by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/74">go-viper/mapstructure#74</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.1.1 to 6.5.0
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/72">go-viper/mapstructure#72</a></li>
<li>build(deps): bump golangci/golangci-lint-action from 6.5.0 to 6.5.1
by <a href="https://github.com/dependabot"><code>@​dependabot</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/76">go-viper/mapstructure#76</a></li>
<li>build(deps): bump actions/setup-go from 5.3.0 to 5.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/78">go-viper/mapstructure#78</a></li>
<li>feat: add decode hook for netip.Prefix by <a
href="https://github.com/tklauser"><code>@​tklauser</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/85">go-viper/mapstructure#85</a></li>
<li>Updates by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/86">go-viper/mapstructure#86</a></li>
<li>build(deps): bump github/codeql-action from 2.13.4 to 3.28.15 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/87">go-viper/mapstructure#87</a></li>
<li>build(deps): bump actions/setup-go from 5.4.0 to 5.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/93">go-viper/mapstructure#93</a></li>
<li>build(deps): bump github/codeql-action from 3.28.15 to 3.28.17 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/92">go-viper/mapstructure#92</a></li>
<li>build(deps): bump github/codeql-action from 3.28.17 to 3.28.19 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/97">go-viper/mapstructure#97</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/96">go-viper/mapstructure#96</a></li>
<li>Update README.md by <a
href="https://github.com/peczenyj"><code>@​peczenyj</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/90">go-viper/mapstructure#90</a></li>
<li>Add omitzero tag. by <a
href="https://github.com/Crystalix007"><code>@​Crystalix007</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/98">go-viper/mapstructure#98</a></li>
<li>Use error structs instead of duplicated strings by <a
href="https://github.com/m1k1o"><code>@​m1k1o</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/102">go-viper/mapstructure#102</a></li>
<li>build(deps): bump github/codeql-action from 3.28.19 to 3.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/101">go-viper/mapstructure#101</a></li>
<li>feat: add common error interface by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/105">go-viper/mapstructure#105</a></li>
<li>update linter by <a
href="https://github.com/sagikazarmark"><code>@​sagikazarmark</code></a>
in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/106">go-viper/mapstructure#106</a></li>
<li>Feature allow unset pointer by <a
href="https://github.com/rostislaved"><code>@​rostislaved</code></a> in
<a
href="https://redirect.github.com/go-viper/mapstructure/pull/80">go-viper/mapstructure#80</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/tklauser"><code>@​tklauser</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/85">go-viper/mapstructure#85</a></li>
<li><a href="https://github.com/peczenyj"><code>@​peczenyj</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/90">go-viper/mapstructure#90</a></li>
<li><a
href="https://github.com/Crystalix007"><code>@​Crystalix007</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/98">go-viper/mapstructure#98</a></li>
<li><a
href="https://github.com/rostislaved"><code>@​rostislaved</code></a>
made their first contribution in <a
href="https://redirect.github.com/go-viper/mapstructure/pull/80">go-viper/mapstructure#80</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0">https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8c61ec1924"><code>8c61ec1</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/80">#80</a>
from rostislaved/feature-allow-unset-pointer</li>
<li><a
href="df765f469a"><code>df765f4</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/106">#106</a>
from go-viper/update-linter</li>
<li><a
href="5f34b05aa1"><code>5f34b05</code></a>
update linter</li>
<li><a
href="36de1e1d74"><code>36de1e1</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/105">#105</a>
from go-viper/error-refactor</li>
<li><a
href="6a283a390e"><code>6a283a3</code></a>
chore: update error type doc</li>
<li><a
href="599cb73236"><code>599cb73</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/101">#101</a>
from go-viper/dependabot/github_actions/github/codeql...</li>
<li><a
href="ed3f921815"><code>ed3f921</code></a>
feat: remove value from error messages</li>
<li><a
href="a3f8b227dc"><code>a3f8b22</code></a>
revert: error message change</li>
<li><a
href="9661f6d07c"><code>9661f6d</code></a>
feat: add common error interface</li>
<li><a
href="f12f6c76fe"><code>f12f6c7</code></a>
Merge pull request <a
href="https://redirect.github.com/go-viper/mapstructure/issues/102">#102</a>
from m1k1o/prettify-errors2</li>
<li>Additional commits viewable in <a
href="https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/go-viper/mapstructure/v2&package-manager=go_modules&previous-version=2.2.1&new-version=2.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-01 12:59:21 +02:00
Olexandr88
aa342cea3b docs: add a link to the main, twitter, reddit icon (#9331)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-07-01 12:55:00 +02:00
Roman Khavronenko
26a2f4c25c deployment/docker: add troubleshooting section (#9329)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9323

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-07-01 12:53:02 +02:00
Andrii Chubatiuk
8a3a5e6867 app/vmalert: removed inline styles from UI, which are not working with recommended CSP (#9295)
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9236

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-07-01 12:50:49 +02:00
Phuong Le
9f9a1e1996 document/data-ingestion/journald: fix small typo (#9325)
s/`-journlad.streamFields`/`-journald.streamFields`
2025-07-01 10:46:05 +02:00
Roman Khavronenko
2d27404639 app/vmselect/promql: respect window interval for rollup functions when removing resets (#9273)
removeCounterResets was operating only using stalenessInterval (when set
by user) and didn't account for user-specified [window] in rollup
functions. Since in rollup functions MetricsQL could look behind for up
to [window] interval - it can capture those datapoints that weren't
accounted in removeCounterResets. With this change, we remove resets on
stalenessInterval+window interval to avoid this corner case.

Fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-2978728661

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-07-01 10:17:39 +02:00
Nils K
91a05697fd docs: Update journald ingestion docs regarding suggested fields (#9275)
### Describe Your Changes

Point out a some things to watch out for with the default stream field
set and explain what the `_TRANSPORT` field is useful for

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: Septatrix <24257556+septatrix@users.noreply.github.com>
2025-07-01 10:14:01 +02:00
hagen1778
e9571576b8 docs: add alias for vlogs keyConcepts page
GA noticed requests to /victorialogs/keyConcepts/ not being served properly

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-07-01 10:11:58 +02:00
Nikolay
314fed78cc app/vmagent: introduce concurrency setting for kafka producer (#906)
This commit adds a new option `concurrency` for kafka producer, which adds additional workers
for publishing messages into the kafka cluster.

 This setting could be useful at networks with high latency. And it increases write throughput.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9249
2025-06-30 18:21:47 +02:00
f41gh7
6751e43975 app/vmselect/promql: support search.logSlowQueryStatsHeaders for whitelisting header keys
Headers sent together with queries could contain important information
about the source of the request. This could help to find sources that send specific queries.

Logging all headers could expose sensitive info in logs.

So introducing `-search.logSlowQueryStatsHeaders` to whitelist headers that need
to be logged. See test case for example.
2025-06-30 18:21:39 +02:00
Andrii Chubatiuk
aaefe469c1 app/vmselect: proxy /api/v1/notifiers requests to vmalert (#9267)
Proxy /api/v1/notifiers requests to vmalert in the same manner, how it's done for other vmalert endpoints

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9267
2025-06-30 17:48:41 +02:00
Andrii Chubatiuk
fc33411d87 lib/awsapi: fixed static AWS credentials precedence
Previously lib/awsapi used incorrect precedence of authorization methods.

 This commit aligns this behaviour  with aws-sdk.

 Also, it fixes usage of `AWS_ROLE_ARN` env variable. It can only be used with web identity file and it must be ignored for any other configurations. 32873b942d/config/resolve_credentials.go (L121)

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9168
2025-06-30 17:47:02 +02:00
Nikolay
07574bdf7f lib/storage: properly optimise .+|^$ regex
Previously, regex `.+|^$` was optimised into `.+`. But in fact, it must
optimised into `.*`. Since this regex matches any non empty symbols and
empty value. Which combines into any input.

 This commit adds a special case of isDotStar check, which covers this
expression.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9290
2025-06-30 17:27:58 +02:00
Nikolay
e553a41fa0 app: add vlagent component
This commit introduces new component - VictoriaLogs Agent (vlagent).

It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
2025-06-30 17:01:05 +02:00
Alexander Frolov
f9aa74c367 lib/mergeset: misuse of unsafe.Pointer
It appears the `lib/mergeset.Item` violates the 3rd rule
https://pkg.go.dev/unsafe#Pointer

> (3) Conversion of a Pointer to a uintptr and back, with arithmetic.
> 
> Unlike in C, it is not valid to advance a pointer just beyond the end
of its original allocation:
> ```
> // INVALID: end points outside allocated space.
> b := make([]byte, n)`
> end = unsafe.Pointer(uintptr(unsafe.Pointer(&b[0])) + uintptr(n))
> ```


`CGO_ENABLED=0 go test -trimpath -gcflags=all=-d=checkptr -run
TestTableAddItemsSerial`
```
fatal error: checkptr: pointer arithmetic result points to invalid allocation


 This commit adds a special path to item encoding, that prevents invalid memory references.

 See this PR for details:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9264
2025-06-30 16:56:28 +02:00
hagen1778
21878f0760 docs: mention vmui changes separately in vlogs and vm changelogs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-30 15:02:38 +02:00
hagen1778
aca321f235 docs: add changelog after 71eef98b0e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-30 14:56:20 +02:00
hagen1778
0b2e976706 dashboards: make dashboards-sync
follow-up after dc08b590c4

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-30 14:27:17 +02:00
Vadim Rutkovsky
dc08b590c4 dashboards: set equal heights for each row (#9298)
### Describe Your Changes
Several rows in cluster dashboard have 7 height, although the rest are 8
height. Setting all rows to 8 to be consistent


![Untitled](https://github.com/user-attachments/assets/16ba9601-9005-45f2-a01f-4b5fd59cd5b0)


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Depends on: #9253
2025-06-30 14:26:32 +02:00
Artur Minchukou
71eef98b0e app/vmui: add crossorigin attribute to manifest links
The change should solve the problem with loading manifest when vmui is behind reverse-proxy
with basic auth enabled.
2025-06-30 14:25:40 +02:00
Artem Navoiev
6bc48ce669 docs: enterprise - clarify FIPS builds (#9310)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-06-30 14:24:20 +02:00
hagen1778
482ae8135f app/vmalert/notifier: follow-up a9b641531b
* reduce time wait on notifiers update
* deterministically sort notifiers() call. This improves and tests and output for users

a9b641531b
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-30 14:22:54 +02:00
Andrii Chubatiuk
393398996a docs/anomaly-detection: mark import_json_path as deprecated (#9283)
### Describe Your Changes

as noticed @f41gh7 in [operator
PR](https://github.com/VictoriaMetrics/operator/pull/1427#discussion_r2164171944)
vm writer `import_json_path` became deprecated starting
[v1.19.2](cc2cd0e084).
Adding this information to docs

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-27 15:51:53 +02:00
Mikhail
7703a95709 docs: fix typo in multi-region setup guide (#9297)
### Describe Your Changes

Just a typo correction in docs
2025-06-27 15:51:26 +02:00
Max Kotliar
114d657670 docs/victoriametrics/changelog: Add changelog for merged PR "Log errors during cache restore from file" (#9296)
### Describe Your Changes

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8952

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-27 15:49:56 +02:00
Hui Wang
a9b641531b vmalert: fix duplicated metrics for dynamically discovered notifier… (#9285)
…s via Consul and DNS

fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9260.

Bug was introduced in
[v1.114.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.114.0).
2025-06-27 15:49:10 +02:00
hagen1778
536df52682 dashboards: warn about correlation betweem concurrent requests and mem usage in panel description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-27 11:36:05 +02:00
Zakhar Bessarab
5143ad7674 lib/backup/s3remote: retry ExpiredToken errors (#9286)
Tolerate token expiration as it might be handled ty token rotation
automatically when using automatic token rotation with EKS Pod Identity
and similar options.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9280

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2025-06-27 13:06:55 +04:00
Roman Khavronenko
6e7df578c4 app/vmctl: print import error before pipe error (#9288)
See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9220#issuecomment-3001880833:
```
2025-06-24T17:27:47.062Z      error VictoriaMetrics/app/vmctl/backoff/backoff.go:60 got error: failed to write into "http://vminsert-vmcluster.metrics.svc.cluster.local:8480/insert/0/prometheus": io: read/write on closed pipe on attempt: 1; will retry in 2s
```

This error could happen only if pipe-reader was closed before we tried
to write into pipe-writer. But in this case we won't see the error
explaining why pipe-reader was closed. In this change, we checking if
there was a pipe-reader error before checking pipe-writer error.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-27 09:23:12 +02:00
hagen1778
4a7b4ae852 docs: whitelist /admin/ requests in vmauth config for cluster VM
Besides `/select/`-prefixed requests vmauth also does `/admin/` requests
to fetch tenant IDs. Without whitelisting, such requests will fail to route.

I am adding this change only to generic vmauth configs where /admin/ access
is expected to be by default.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-27 09:18:54 +02:00
Jose Gómez-Sellés
65087f08c4 Migrate docs to cloud repo (#9230)
This is ready, since https://github.com/VictoriaMetrics/cloud/pull/3229
is merged .

This PR deletes the cloud docs folder after moving it into the private
cloud repo. The main reason is to keep things tidy and handle reviews in
a better way, as the helm charts repo is doing.

Future updates should be protected since the github actions file with
rsync command should not be messing with this folder when updating
everything.

### Checklist

The following checks are **mandatory**:

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-26 12:55:25 +02:00
Artur Minchukou
ca0479fff3 app/vmui/logs: update the color of the VictoriaLogs favicon to make the color different from VictoriaMetrics (#9270)
### Describe Your Changes

Updated the favicon color for Victoria Logs that the icon could be
distinguished from Victoria Metrics. Took the same color as in
victorialogs-datasource plugin.

![image](https://github.com/user-attachments/assets/521fc747-8ea7-43c7-a719-5434fe39ab06)


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-26 10:39:15 +02:00
Yury Molodov
644c7a97c8 victorialogs/docs: fix invalid stats example in "Comments" section (#9272)
### Describe Your Changes

The example in the VictoriaLogs docs (Comments section) is currently
missing an aggregate function in the `stats` pipe:

```logsql
error                       # find logs with `error` word
  | stats by (_stream) logs # then count the number of logs per `_stream` label
  | sort by (logs) desc     # then sort by the found logs in descending order
  | limit 5                 # and show top 5 streams with the biggest number of logs
```

However, `stats by (_stream) logs` is invalid syntax - `logs` is
interpreted as a function name, which causes a parsing error.

This fix replaces it with a valid version using `count()`:

```logsql
| stats by (_stream) count() as logs
```

Without `count()`, the query fails with:

```
 cannot parse 'stats' pipe: unknown stats func "logs"
```

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
2025-06-26 10:34:57 +02:00
Roman Khavronenko
098cba5b73 dashboards: fix adhoc filters for vmalert and vmagent (#9271)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8657

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-26 10:33:55 +02:00
Hui Wang
5d0e8c0d1b vmalert: fix data race in replay ut (#9278)
see https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9265
2025-06-26 15:15:35 +08:00
Aliaksandr Valialkin
442bfa6c35 lib/logstorage: add tests, which verify that NaN and Inf values cannot be parsed by tryParseFloat64
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474
2025-06-25 23:00:56 +02:00
Aliaksandr Valialkin
ee031b21a7 docs/victorialogs/LogsQL.md: document that sum() and avg() returns NaN when all the field values are non-numeric
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474
2025-06-25 22:51:42 +02:00
hagen1778
deec361a64 lib/prombpmarshal: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 22:06:45 +02:00
Roman Khavronenko
fd4dce81ce lib/prombp{marshal}: support metadata in remote write protocol (#9124)
This change adds support for parsing and sending Metadata field in
Prometheus Remote Write protocol. It implements the first step for
Metadata support.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 22:00:34 +02:00
hagen1778
5431696d83 docs: fix various typos and grammar errors
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 21:45:55 +02:00
hagen1778
1b71184bfb docs: fix unresolved link in cluster docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 21:39:35 +02:00
Phuong Le
29c06b9543 VictoriaLogs: add a High Availability section to the cluster documentation. (#9247) 2025-06-24 18:53:24 +02:00
hagen1778
369f3f0da1 docs: rm integrations section from single-node readme
* move netdata and carbon-api integrations to a dedicated `Integrations` page
* drop https://github.com/aorfanos/vmalert-cli as it seems stale

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 16:58:56 +02:00
hagen1778
6f93f0e1a7 docs: minor typo fix
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 16:51:29 +02:00
Roman Khavronenko
96b773198f docs: update vmctl docs (#9257)
* split migration mods in separate sub-section. This removes conflicting
#-anchors and makes it easier to read&modify in future
* remove duplicating wording
* simplify texts

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8964

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 16:48:13 +02:00
Andrii Chubatiuk
006381266b app/vmalert: add /api/v1/notifiers endpoint and datasource_type query argument filter for /api/v1/rules and /api/v1/alerts endpoints (#9046)
### Describe Your Changes

added /api/v1/notifiers endpoint and `datasource_type` query argument
for `/api/v1/rules` and `/api/v1/alerts` API endpoints to filter groups
and rules by datasource type. required for
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8989

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8537

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-06-24 16:41:38 +02:00
Max Kotliar
ae1dffe5d3 deployment/docker: Update grafana image version due to CVE issue (#9258)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9207

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-24 16:31:53 +02:00
Nikolay
80e508eac7 lib/promscrape: remove duplicate targets from service-discovery
Previously, if annotation was update on object, it could result into
duplicate targets register for dropped targets service-discovery page.

 Mostly it affects endpoint annotations update. Endpoint holds
annotation with last update time. If any pod that belongs to the given
annotation changed, it causes duplication targets for all pod backed by
the endpoint. It makes service-discovery debug page hard to use.

 This commit excludes `__metadata_kubernetes_*_annotation_` from key
generation for dropped targets map. Instead it updates target with new
labels value.

  It may lead to some targets collision, but
since hash function already could produce collisions, it should not be a
problem.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8626
2025-06-24 11:32:29 +02:00
f41gh7
86d8095417 docs: mention v1.120.0 release
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-06-23 16:36:30 +02:00
f41gh7
df847e62c0 docs: mention new LTS releases
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-06-23 16:32:35 +02:00
259 changed files with 11774 additions and 5933 deletions

1
.gitignore vendored
View File

@@ -12,6 +12,7 @@
/victoria-logs-data
/victoria-metrics-data
/vmagent-remotewrite-data
/vlagent-remotewritewrite
/vmstorage-data
/vmselect-cache
/package/temp-deb-*

View File

@@ -11,6 +11,8 @@ ifeq ($(PKG_TAG),)
PKG_TAG := $(BUILDINFO_TAG)
endif
EXTRA_DOCKER_TAG_SUFFIX ?= EXTRA_DOCKER_TAG_SUFFIX
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(DATEINFO_TAG)-$(BUILDINFO_TAG)'
TAR_OWNERSHIP ?= --owner=1000 --group=1000
@@ -195,6 +197,31 @@ vmutils-crossbuild: \
vmutils-openbsd-amd64 \
vmutils-windows-amd64
publish-final-images:
PKG_TAG=$(TAG) APP_NAME=victoria-metrics $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmagent $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmalert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmalert-tool $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmauth $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmbackup $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmrestore $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) APP_NAME=vmctl $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vminsert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vmselect $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-cluster APP_NAME=vmstorage $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=victoria-metrics $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmagent $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmalert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmauth $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmbackup $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmrestore $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vminsert $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vmselect $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise-cluster APP_NAME=vmstorage $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmgateway $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG)-enterprise APP_NAME=vmbackupmanager $(MAKE) publish-via-docker-from-rc && \
PKG_TAG=$(TAG) $(MAKE) publish-latest
publish-latest:
PKG_TAG=$(TAG) APP_NAME=victoria-metrics $(MAKE) publish-via-docker-latest && \
PKG_TAG=$(TAG) APP_NAME=vmagent $(MAKE) publish-via-docker-latest && \
@@ -545,7 +572,7 @@ test-full:
test-full-386:
GOEXPERIMENT=synctest GOARCH=386 go test -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...
integration-test: victoria-metrics vmagent vmalert vmauth vmctl vmbackup vmrestore victoria-logs
integration-test: victoria-metrics vmagent vmalert vmauth vmctl vmbackup vmrestore victoria-logs vlagent
go test ./apptest/... -skip="^TestCluster.*"
benchmark:

View File

@@ -1,14 +1,14 @@
# VictoriaMetrics
![Latest Release](https://img.shields.io/github/v/release/VictoriaMetrics/VictoriaMetrics?sort=semver&label=&filter=!*-victorialogs&logo=github&labelColor=gray&color=gray&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Freleases%2Flatest)
[![Latest Release](https://img.shields.io/github/v/release/VictoriaMetrics/VictoriaMetrics?sort=semver&label=&filter=!*-victorialogs&logo=github&labelColor=gray&color=gray&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Freleases%2Flatest)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics?label=&logo=docker&logoColor=white&labelColor=2496ED&color=2496ED&link=https%3A%2F%2Fhub.docker.com%2Fr%2Fvictoriametrics%2Fvictoria-metrics)
![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics?link=https%3A%2F%2Fgoreportcard.com%2Freport%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics)
![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml/badge.svg?branch=master&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Factions)
![codecov](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics/branch/master/graph/badge.svg?link=https%3A%2F%2Fcodecov.io%2Fgh%2FVictoriaMetrics%2FVictoriaMetrics)
![License](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics?labelColor=green&label=&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Fblob%2Fmaster%2FLICENSE)
[![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics?link=https%3A%2F%2Fgoreportcard.com%2Freport%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
[![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml/badge.svg?branch=master&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Factions)](https://github.com/VictoriaMetrics/VictoriaMetrics/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics/branch/master/graph/badge.svg?link=https%3A%2F%2Fcodecov.io%2Fgh%2FVictoriaMetrics%2FVictoriaMetrics)](https://app.codecov.io/gh/VictoriaMetrics/VictoriaMetrics)
[![License](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics?labelColor=green&label=&link=https%3A%2F%2Fgithub.com%2FVictoriaMetrics%2FVictoriaMetrics%2Fblob%2Fmaster%2FLICENSE)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
![Slack](https://img.shields.io/badge/Join-4A154B?logo=slack&link=https%3A%2F%2Fslack.victoriametrics.com)
![X](https://img.shields.io/twitter/follow/VictoriaMetrics?style=flat&label=Follow&color=black&logo=x&labelColor=black&link=https%3A%2F%2Fx.com%2FVictoriaMetrics)
![Reddit](https://img.shields.io/reddit/subreddit-subscribers/VictoriaMetrics?style=flat&label=Join&labelColor=red&logoColor=white&logo=reddit&link=https%3A%2F%2Fwww.reddit.com%2Fr%2FVictoriaMetrics)
[![X](https://img.shields.io/twitter/follow/VictoriaMetrics?style=flat&label=Follow&color=black&logo=x&labelColor=black&link=https%3A%2F%2Fx.com%2FVictoriaMetrics)](https://x.com/VictoriaMetrics/)
[![Reddit](https://img.shields.io/reddit/subreddit-subscribers/VictoriaMetrics?style=flat&label=Join&labelColor=red&logoColor=white&logo=reddit&link=https%3A%2F%2Fwww.reddit.com%2Fr%2FVictoriaMetrics)](https://www.reddit.com/r/VictoriaMetrics/)
<picture>
<source srcset="docs/victoriametrics/logo_white.webp" media="(prefers-color-scheme: dark)">

106
app/vlagent/Makefile Normal file
View File

@@ -0,0 +1,106 @@
# All these commands must run from repository root.
vlagent:
APP_NAME=vlagent $(MAKE) app-local
vlagent-race:
APP_NAME=vlagent RACE=-race $(MAKE) app-local
vlagent-prod:
APP_NAME=vlagent $(MAKE) app-via-docker
vlagent-pure-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-pure
vlagent-linux-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-amd64
vlagent-linux-arm-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-arm
vlagent-linux-arm64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-arm64
vlagent-linux-ppc64le-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-ppc64le
vlagent-linux-386-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-linux-386
vlagent-darwin-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-darwin-amd64
vlagent-darwin-arm64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-darwin-arm64
vlagent-freebsd-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-freebsd-amd64
vlagent-openbsd-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-openbsd-amd64
vlagent-windows-amd64-prod:
APP_NAME=vlagent $(MAKE) app-via-docker-windows-amd64
package-vlagent:
APP_NAME=vlagent $(MAKE) package-via-docker
package-vlagent-pure:
APP_NAME=vlagent $(MAKE) package-via-docker-pure
package-vlagent-amd64:
APP_NAME=vlagent $(MAKE) package-via-docker-amd64
package-vlagent-arm:
APP_NAME=vlagent $(MAKE) package-via-docker-arm
package-vlagent-arm64:
APP_NAME=vlagent $(MAKE) package-via-docker-arm64
package-vlagent-ppc64le:
APP_NAME=vlagent $(MAKE) package-via-docker-ppc64le
package-vlagent-386:
APP_NAME=vlagent $(MAKE) package-via-docker-386
publish-vlagent:
APP_NAME=vlagent $(MAKE) publish-via-docker
vlagent-linux-amd64:
APP_NAME=vlagent CGO_ENABLED=1 GOOS=linux GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-linux-arm:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=arm $(MAKE) app-local-goos-goarch
vlagent-linux-arm64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=arm64 $(MAKE) app-local-goos-goarch
vlagent-linux-ppc64le:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vlagent-linux-s390x:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vlagent-linux-loong64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=loong64 $(MAKE) app-local-goos-goarch
vlagent-linux-386:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch
vlagent-darwin-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-darwin-arm64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 $(MAKE) app-local-goos-goarch
vlagent-freebsd-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-openbsd-amd64:
APP_NAME=vlagent CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
vlagent-windows-amd64:
GOARCH=amd64 APP_NAME=vlagent $(MAKE) app-local-windows-goarch
vlagent-pure:
APP_NAME=vlagent $(MAKE) app-local-pure

3
app/vlagent/README.md Normal file
View File

@@ -0,0 +1,3 @@
See vlagent docs [here](https://docs.victoriametrics.com/victorialogs/vlagent/).
vlagent docs can be edited at [docs/vlagent.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victorialogs/vlagent.md).

View File

@@ -0,0 +1,8 @@
ARG base_image=non-existing
FROM $base_image
EXPOSE 9429
ENTRYPOINT ["/vlagent-prod"]
ARG src_binary=non-existing
COPY $src_binary ./vlagent-prod

97
app/vlagent/main.go Normal file
View File

@@ -0,0 +1,97 @@
package main
import (
"flag"
"fmt"
"net/http"
"os"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
)
var (
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vlagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
)
func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
buildinfo.Init()
remotewrite.InitSecretFlags()
logger.Init()
remotewrite.Init()
vlinsert.Init()
insertutil.SetLogRowsStorage(&remotewrite.Storage{})
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":9429"}
}
logger.Infof("starting vlagent at %q...", listenAddrs)
startTime := time.Now()
go httpserver.Serve(listenAddrs, requestHandler, httpserver.ServeOptions{
UseProxyProtocol: useProxyProtocol,
})
logger.Infof("started vlagent in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
sig := procutil.WaitForSigterm()
logger.Infof("received signal %s", sig)
pushmetrics.Stop()
startTime = time.Now()
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
vlinsert.Stop()
remotewrite.Stop()
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
logger.Infof("successfully stopped vlagent in %.3f seconds", time.Since(startTime).Seconds())
}
// RequestHandler handles insert requests for VictoriaLogs
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
fmt.Fprintf(w, "<h2>vlagent</h2>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/victorialogs/vlagent/'>https://docs.victoriametrics.com/victorialogs/vlagent/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"metrics", "available service metrics"},
{"flags", "command-line flags"},
})
return true
}
return vlinsert.RequestHandler(w, r)
}
func usage() {
const s = `
vlagent collects logs via popular data ingestion protocols and routes it to VictoriaLogs.
See the docs at https://docs.victoriametrics.com/victorialogs/vlagent/ .
`
flagutil.Usage(s)
}

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image=non-existing
ARG root_image=non-existing
FROM $certs_image AS certs
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 9429
ENTRYPOINT ["/vlagent-prod"]
ARG TARGETARCH
COPY vlagent-linux-${TARGETARCH}-prod ./vlagent-prod

View File

@@ -0,0 +1,462 @@
package remotewrite
import (
"bytes"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)
var (
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", 0, "Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data ")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", time.Minute, "Timeout for sending a single block of data to the corresponding -remoteWrite.url")
retryMinInterval = flagutil.NewArrayDuration("remoteWrite.retryMinInterval", time.Second, "The minimum delay between retry attempts to send a block of data to the corresponding -remoteWrite.url. Every next retry attempt will double the delay to prevent hammering of remote database. See also -remoteWrite.retryMaxTime")
retryMaxTime = flagutil.NewArrayDuration("remoteWrite.retryMaxTime", time.Minute, "The max time spent on retry attempts to send a block of data to the corresponding -remoteWrite.url. Change this value if it is expected for -remoteWrite.url to be unreachable for more than -remoteWrite.retryMaxTime. See also -remoteWrite.retryMinInterval")
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
"Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234")
tlsHandshakeTimeout = flagutil.NewArrayDuration("remoteWrite.tlsHandshakeTimeout", 20*time.Second, "The timeout for establishing tls connections to the corresponding -remoteWrite.url")
tlsInsecureSkipVerify = flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile = flagutil.NewArrayString("remoteWrite.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
tlsKeyFile = flagutil.NewArrayString("remoteWrite.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to the corresponding -remoteWrite.url")
tlsCAFile = flagutil.NewArrayString("remoteWrite.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. "+
"By default, system CA is used")
tlsServerName = flagutil.NewArrayString("remoteWrite.tlsServerName", "Optional TLS server name to use for connections to the corresponding -remoteWrite.url. "+
"By default, the server name from -remoteWrite.url is used")
headers = flagutil.NewArrayString("remoteWrite.headers", "Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. "+
"For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. "+
"Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'")
basicAuthUsername = flagutil.NewArrayString("remoteWrite.basicAuth.username", "Optional basic auth username to use for the corresponding -remoteWrite.url")
basicAuthPassword = flagutil.NewArrayString("remoteWrite.basicAuth.password", "Optional basic auth password to use for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArrayString("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for the corresponding -remoteWrite.url. "+
"The file is re-read every second")
bearerToken = flagutil.NewArrayString("remoteWrite.bearerToken", "Optional bearer auth token to use for the corresponding -remoteWrite.url")
bearerTokenFile = flagutil.NewArrayString("remoteWrite.bearerTokenFile", "Optional path to bearer token file to use for the corresponding -remoteWrite.url. "+
"The token is re-read from the file every second")
oauth2ClientID = flagutil.NewArrayString("remoteWrite.oauth2.clientID", "Optional OAuth2 clientID to use for the corresponding -remoteWrite.url")
oauth2ClientSecret = flagutil.NewArrayString("remoteWrite.oauth2.clientSecret", "Optional OAuth2 clientSecret to use for the corresponding -remoteWrite.url")
oauth2ClientSecretFile = flagutil.NewArrayString("remoteWrite.oauth2.clientSecretFile", "Optional OAuth2 clientSecretFile to use for the corresponding -remoteWrite.url")
oauth2EndpointParams = flagutil.NewArrayString("remoteWrite.oauth2.endpointParams", "Optional OAuth2 endpoint parameters to use for the corresponding -remoteWrite.url . "+
`The endpoint parameters must be set in JSON format: {"param1":"value1",...,"paramN":"valueN"}`)
oauth2TokenURL = flagutil.NewArrayString("remoteWrite.oauth2.tokenUrl", "Optional OAuth2 tokenURL to use for the corresponding -remoteWrite.url")
oauth2Scopes = flagutil.NewArrayString("remoteWrite.oauth2.scopes", "Optional OAuth2 scopes to use for the corresponding -remoteWrite.url. Scopes must be delimited by ';'")
)
type client struct {
sanitizedURL string
remoteWriteURL string
fq *persistentqueue.FastQueue
hc *http.Client
retryMinInterval time.Duration
retryMaxTime time.Duration
sendBlock func(block []byte) bool
authCfg *promauth.Config
rl *ratelimiter.RateLimiter
bytesSent *metrics.Counter
blocksSent *metrics.Counter
requestDuration *metrics.Histogram
requestsOKCount *metrics.Counter
errorsCount *metrics.Counter
packetsDropped *metrics.Counter
rateLimit *metrics.Gauge
retriesCount *metrics.Counter
sendDuration *metrics.FloatCounter
wg sync.WaitGroup
stopCh chan struct{}
}
func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Fatalf("cannot initialize auth config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tr := httputil.NewTransport(false, "vlagent_remotewrite")
tr.TLSHandshakeTimeout = tlsHandshakeTimeout.GetOptionalArg(argIdx)
tr.MaxConnsPerHost = 2 * concurrency
tr.MaxIdleConnsPerHost = 2 * concurrency
tr.IdleConnTimeout = time.Minute
tr.WriteBufferSize = 64 * 1024
pURL := proxyURL.GetOptionalArg(argIdx)
if len(pURL) > 0 {
if !strings.Contains(pURL, "://") {
logger.Fatalf("cannot parse -remoteWrite.proxyURL=%q: it must start with `http://`, `https://` or `socks5://`", pURL)
}
pu, err := url.Parse(pURL)
if err != nil {
logger.Fatalf("cannot parse -remoteWrite.proxyURL=%q: %s", pURL, err)
}
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: authCfg.NewRoundTripper(tr),
Timeout: sendTimeout.GetOptionalArg(argIdx),
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authCfg: authCfg,
fq: fq,
hc: hc,
retryMinInterval: retryMinInterval.GetOptionalArg(argIdx),
retryMaxTime: retryMaxTime.GetOptionalArg(argIdx),
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
return c
}
func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
limitReached := metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
if bytesPerSec := rateLimit.GetOptionalArg(argIdx); bytesPerSec > 0 {
logger.Infof("applying %d bytes per second rate limit for -remoteWrite.url=%q", bytesPerSec, sanitizedURL)
c.rl = ratelimiter.New(int64(bytesPerSec), limitReached, c.stopCh)
}
c.bytesSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_bytes_sent_total{url=%q}`, c.sanitizedURL))
c.blocksSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_blocks_sent_total{url=%q}`, c.sanitizedURL))
c.rateLimit = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_rate_limit{url=%q}`, c.sanitizedURL), func() float64 {
return float64(rateLimit.GetOptionalArg(argIdx))
})
c.requestDuration = metrics.GetOrCreateHistogram(fmt.Sprintf(`vlagent_remotewrite_duration_seconds{url=%q}`, c.sanitizedURL))
c.requestsOKCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_requests_total{url=%q, status_code="2XX"}`, c.sanitizedURL))
c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
c.sendDuration = metrics.GetOrCreateFloatCounter(fmt.Sprintf(`vlagent_remotewrite_send_duration_seconds_total{url=%q}`, c.sanitizedURL))
metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_queues{url=%q}`, c.sanitizedURL), func() float64 {
return float64(*queues)
})
for i := 0; i < concurrency; i++ {
c.wg.Add(1)
go func() {
defer c.wg.Done()
c.runWorker()
}()
}
logger.Infof("initialized client for -remoteWrite.url=%q", c.sanitizedURL)
}
func (c *client) MustStop() {
close(c.stopCh)
c.wg.Wait()
logger.Infof("stopped client for -remoteWrite.url=%q", c.sanitizedURL)
}
func getAuthConfig(argIdx int) (*promauth.Config, error) {
headersValue := headers.GetOptionalArg(argIdx)
var hdrs []string
if headersValue != "" {
hdrs = strings.Split(headersValue, "^^")
}
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
passwordFile := basicAuthPasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: promauth.NewSecret(password),
PasswordFile: passwordFile,
}
}
token := bearerToken.GetOptionalArg(argIdx)
tokenFile := bearerTokenFile.GetOptionalArg(argIdx)
var oauth2Cfg *promauth.OAuth2Config
clientSecret := oauth2ClientSecret.GetOptionalArg(argIdx)
clientSecretFile := oauth2ClientSecretFile.GetOptionalArg(argIdx)
if clientSecretFile != "" || clientSecret != "" {
endpointParamsJSON := oauth2EndpointParams.GetOptionalArg(argIdx)
endpointParams, err := flagutil.ParseJSONMap(endpointParamsJSON)
if err != nil {
return nil, fmt.Errorf("cannot parse JSON for -remoteWrite.oauth2.endpointParams=%s: %w", endpointParamsJSON, err)
}
oauth2Cfg = &promauth.OAuth2Config{
ClientID: oauth2ClientID.GetOptionalArg(argIdx),
ClientSecret: promauth.NewSecret(clientSecret),
ClientSecretFile: clientSecretFile,
EndpointParams: endpointParams,
TokenURL: oauth2TokenURL.GetOptionalArg(argIdx),
Scopes: strings.Split(oauth2Scopes.GetOptionalArg(argIdx), ";"),
}
}
tlsCfg := &promauth.TLSConfig{
CAFile: tlsCAFile.GetOptionalArg(argIdx),
CertFile: tlsCertFile.GetOptionalArg(argIdx),
KeyFile: tlsKeyFile.GetOptionalArg(argIdx),
ServerName: tlsServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: tlsInsecureSkipVerify.GetOptionalArg(argIdx),
}
opts := &promauth.Options{
BasicAuth: basicAuthCfg,
BearerToken: token,
BearerTokenFile: tokenFile,
OAuth2: oauth2Cfg,
TLSConfig: tlsCfg,
Headers: hdrs,
}
authCfg, err := opts.NewConfig()
if err != nil {
return nil, fmt.Errorf("cannot populate auth config for remoteWrite idx: %d, err: %w", argIdx, err)
}
return authCfg, nil
}
func (c *client) runWorker() {
var ok bool
var block []byte
ch := make(chan bool, 1)
for {
block, ok = c.fq.MustReadBlock(block[:0])
if !ok {
return
}
if len(block) == 0 {
// skip empty data blocks from sending
continue
}
go func() {
startTime := time.Now()
ch <- c.sendBlock(block)
c.sendDuration.Add(time.Since(startTime).Seconds())
}()
select {
case ok := <-ch:
if ok {
// The block has been sent successfully
continue
}
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
return
case <-c.stopCh:
// c must be stopped. Wait for a while in the hope the block will be sent.
graceDuration := 5 * time.Second
select {
case ok := <-ch:
if !ok {
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
}
case <-time.After(graceDuration):
// Return unsent block to the queue.
c.fq.MustWriteBlockIgnoreDisabledPQ(block)
}
return
}
}
}
func (c *client) doRequest(url string, body []byte) (*http.Response, error) {
req, err := c.newRequest(url, body)
if err != nil {
return nil, err
}
resp, err := c.hc.Do(req)
if err == nil {
return resp, nil
}
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
return nil, err
}
// It is likely connection became stale or timed out during the first request.
// Make another attempt in hope request will succeed.
// If not, the error should be handled by the caller as usual.
// This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
req, err = c.newRequest(url, body)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
resp, err = c.hc.Do(req)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
return resp, nil
}
func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
reqBody := bytes.NewBuffer(body)
req, err := http.NewRequest(http.MethodPost, url, reqBody)
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", url, err)
}
err = c.authCfg.SetHeaders(req, true)
if err != nil {
return nil, err
}
h := req.Header
h.Set("User-Agent", "vlagent")
h.Set("Content-Encoding", "zstd")
h.Set("Content-Type", "application/octet-stream")
return req, nil
}
// sendBlockHTTP sends the given block to c.remoteWriteURL.
//
// The function returns false only if c.stopCh is closed.
// Otherwise, it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.Register(len(block))
maxRetryDuration := timeutil.AddJitterToDuration(c.retryMaxTime)
retryDuration := timeutil.AddJitterToDuration(c.retryMinInterval)
retriesCount := 0
again:
startTime := time.Now()
resp, err := c.doRequest(c.remoteWriteURL, block)
c.requestDuration.UpdateDuration(startTime)
if err != nil {
c.errorsCount.Inc()
retryDuration *= 2
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
remoteWriteRetryLogger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
}
statusCode := resp.StatusCode
if statusCode/100 == 2 {
_ = resp.Body.Close()
c.requestsOKCount.Inc()
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vlagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 400 || statusCode == 404 {
logBlockRejected(block, c.sanitizedURL, resp)
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true
}
// Unexpected status code returned
retriesCount++
retryAfterHeader := parseRetryAfterHeader(resp.Header.Get("Retry-After"))
retryDuration = getRetryDuration(retryAfterHeader, retryDuration, maxRetryDuration)
// Handle response
body, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()
if err != nil {
logger.Errorf("cannot read response body from %q during retry #%d: %s", c.sanitizedURL, retriesCount, err)
} else {
logger.Errorf("unexpected status code received after sending a block with size %d bytes to %q during retry #%d: %d; response body=%q; "+
"re-sending the block in %.3f seconds", len(block), c.sanitizedURL, retriesCount, statusCode, body, retryDuration.Seconds())
}
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
}
var remoteWriteRejectedLogger = logger.WithThrottler("remoteWriteRejected", 5*time.Second)
var remoteWriteRetryLogger = logger.WithThrottler("remoteWriteRetry", 5*time.Second)
// getRetryDuration returns retry duration.
// retryAfterDuration has the highest priority.
// If retryAfterDuration is not specified, retryDuration gets doubled.
// retryDuration can't exceed maxRetryDuration.
//
// Also see: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6097
func getRetryDuration(retryAfterDuration, retryDuration, maxRetryDuration time.Duration) time.Duration {
// retryAfterDuration has the highest priority duration
if retryAfterDuration > 0 {
return timeutil.AddJitterToDuration(retryAfterDuration)
}
// default backoff retry policy
retryDuration *= 2
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
return retryDuration
}
func logBlockRejected(block []byte, sanitizedURL string, resp *http.Response) {
body, err := io.ReadAll(resp.Body)
if err != nil {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; "+
"failed to read response body: %s",
len(block), sanitizedURL, resp.StatusCode, err)
} else {
remoteWriteRejectedLogger.Errorf("sending a block with size %d bytes to %q was rejected (skipping the block): status code %d; response body: %s",
len(block), sanitizedURL, resp.StatusCode, string(body))
}
}
// parseRetryAfterHeader parses `Retry-After` value retrieved from HTTP response header.
// retryAfterString should be in either HTTP-date or a number of seconds.
// It will return time.Duration(0) if `retryAfterString` does not follow RFC 7231.
func parseRetryAfterHeader(retryAfterString string) (retryAfterDuration time.Duration) {
if retryAfterString == "" {
return retryAfterDuration
}
defer func() {
v := retryAfterDuration.Seconds()
logger.Infof("'Retry-After: %s' parsed into %.2f second(s)", retryAfterString, v)
}()
// Retry-After could be in "Mon, 02 Jan 2006 15:04:05 GMT" format.
if parsedTime, err := time.Parse(http.TimeFormat, retryAfterString); err == nil {
return time.Duration(time.Until(parsedTime).Seconds()) * time.Second
}
// Retry-After could be in seconds.
if seconds, err := strconv.Atoi(retryAfterString); err == nil {
return time.Duration(seconds) * time.Second
}
return 0
}

View File

@@ -0,0 +1,99 @@
package remotewrite
import (
"math"
"net/http"
"testing"
"time"
)
func TestCalculateRetryDuration(t *testing.T) {
// `testFunc` call `calculateRetryDuration` for `n` times
// and evaluate if the result of `calculateRetryDuration` is
// 1. >= expectMinDuration
// 2. <= expectMinDuration + 10% (see timeutil.AddJitterToDuration)
f := func(retryAfterDuration, retryDuration time.Duration, n int, expectMinDuration time.Duration) {
t.Helper()
for i := 0; i < n; i++ {
retryDuration = getRetryDuration(retryAfterDuration, retryDuration, time.Minute)
}
expectMaxDuration := helper(expectMinDuration)
expectMinDuration = expectMinDuration - (1000 * time.Millisecond) // Avoid edge case when calculating time.Until(now)
if !(retryDuration >= expectMinDuration && retryDuration <= expectMaxDuration) {
t.Fatalf(
"incorrect retry duration, want (ms): [%d, %d], got (ms): %d",
expectMinDuration.Milliseconds(), expectMaxDuration.Milliseconds(),
retryDuration.Milliseconds(),
)
}
}
// Call calculateRetryDuration for 1 time.
{
// default backoff policy
f(0, time.Second, 1, 2*time.Second)
// default backoff policy exceed max limit"
f(0, 10*time.Minute, 1, time.Minute)
// retry after > default backoff policy
f(10*time.Second, 1*time.Second, 1, 10*time.Second)
// retry after < default backoff policy
f(1*time.Second, 10*time.Second, 1, 1*time.Second)
// retry after invalid and < default backoff policy
f(0, time.Second, 1, 2*time.Second)
}
// Call calculateRetryDuration for multiple times.
{
// default backoff policy 2 times
f(0, time.Second, 2, 4*time.Second)
// default backoff policy 3 times
f(0, time.Second, 3, 8*time.Second)
// default backoff policy N times exceed max limit
f(0, time.Second, 10, time.Minute)
// retry after 120s 1 times
f(120*time.Second, time.Second, 1, 120*time.Second)
// retry after 120s 2 times
f(120*time.Second, time.Second, 2, 120*time.Second)
}
}
func TestParseRetryAfterHeader(t *testing.T) {
f := func(retryAfterString string, expectResult time.Duration) {
t.Helper()
result := parseRetryAfterHeader(retryAfterString)
// expect `expectResult == result` when retryAfterString is in seconds or invalid
// expect the difference between result and expectResult to be lower than 10%
if !(expectResult == result || math.Abs(float64(expectResult-result))/float64(expectResult) < 0.10) {
t.Fatalf(
"incorrect retry after duration, want (ms): %d, got (ms): %d",
expectResult.Milliseconds(), result.Milliseconds(),
)
}
}
// retry after header in seconds
f("10", 10*time.Second)
// retry after header in date time
f(time.Now().Add(30*time.Second).UTC().Format(http.TimeFormat), 30*time.Second)
// retry after header invalid
f("invalid-retry-after", 0)
// retry after header not in GMT
f(time.Now().Add(10*time.Second).Format("Mon, 02 Jan 2006 15:04:05 FAKETZ"), 0)
}
// helper calculate the max possible time duration calculated by timeutil.AddJitterToDuration.
func helper(d time.Duration) time.Duration {
dv := d / 10
if dv > 10*time.Second {
dv = 10 * time.Second
}
return d + dv
}

View File

@@ -0,0 +1,158 @@
package remotewrite
import (
"flag"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)
var (
maxUnpackedBlockSize = flagutil.NewBytes("remoteWrite.maxBlockSize", 8*1024*1024, "The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage.")
flushInterval = flag.Duration("remoteWrite.flushInterval", time.Second, "Interval for flushing the data to remote storage. "+
"This option takes effect only when less than 2MB of data per second are pushed to -remoteWrite.url")
)
type pendingLogs struct {
lastFlushTime atomic.Uint64
// The queue to send blocks to.
fq *persistentqueue.FastQueue
// mu protects wr
mu sync.Mutex
wr writeRequest
stopCh chan struct{}
periodicFlusherWG sync.WaitGroup
}
func newPendingLogs(fq *persistentqueue.FastQueue) *pendingLogs {
pl := &pendingLogs{
fq: fq,
stopCh: make(chan struct{}),
}
pl.periodicFlusherWG.Add(1)
go func() {
defer pl.periodicFlusherWG.Done()
pl.periodicFlusher()
}()
return pl
}
func (pl *pendingLogs) add(lr *logstorage.LogRows) {
lr.ForEachRow(func(_ uint64, r *logstorage.InsertRow) {
pl.addLogRow(r)
})
}
func (pl *pendingLogs) addLogRow(r *logstorage.InsertRow) {
bb := bbPool.Get()
bb.B = r.Marshal(bb.B)
pl.mu.Lock()
_, _ = pl.wr.pendingData.Write(bb.B)
pl.wr.pendingLogRowsCount++
if len(pl.wr.pendingData.B) > maxUnpackedBlockSize.IntN() {
pl.mustFlushLocked()
}
pl.mu.Unlock()
bbPool.Put(bb)
}
func (pl *pendingLogs) mustFlushLocked() {
pl.lastFlushTime.Store(fasttime.UnixTimestamp())
pl.wr.push(func(b []byte) {
if !pl.fq.TryWriteBlock(b) {
logger.Fatalf("BUG: TryWriteBlock cannot return false")
}
})
pl.wr.reset()
}
func (pl *pendingLogs) periodicFlusher() {
flushSeconds := int64(flushInterval.Seconds())
if flushSeconds <= 0 {
flushSeconds = 1
}
d := timeutil.AddJitterToDuration(*flushInterval)
ticker := time.NewTicker(d)
defer ticker.Stop()
for {
select {
case <-pl.stopCh:
pl.mu.Lock()
pl.mustFlushOnStop()
pl.mu.Unlock()
return
case <-ticker.C:
if fasttime.UnixTimestamp()-pl.lastFlushTime.Load() < uint64(flushSeconds) {
continue
}
}
pl.mu.Lock()
pl.mustFlushLocked()
pl.mu.Unlock()
}
}
// mustFlushOnStop force pushes wr data
//
// This is needed in order to properly save in-memory data to persistent queue on graceful shutdown.
func (pl *pendingLogs) mustFlushOnStop() {
pl.wr.push(pl.fq.MustWriteBlockIgnoreDisabledPQ)
pl.wr.reset()
}
func (pl *pendingLogs) mustStop() {
close(pl.stopCh)
pl.periodicFlusherWG.Wait()
}
type writeRequest struct {
pendingData bytesutil.ByteBuffer
pendingLogRowsCount int64
}
func (wr *writeRequest) push(pushBlock func([]byte)) {
if len(wr.pendingData.B) == 0 {
return
}
b := wr.pendingData.B
zb := compressBufPool.Get()
zb.B = zstd.CompressLevel(zb.B[:0], b, 1)
zbLen := len(zb.B)
pushBlock(zb.B)
compressBufPool.Put(zb)
blockSizeBytes.Update(float64(zbLen))
blockSizeLogRows.Update(float64(wr.pendingLogRowsCount))
}
func (wr *writeRequest) reset() {
wr.pendingData.Reset()
wr.pendingLogRowsCount = 0
}
var (
blockSizeBytes = metrics.NewHistogram(`vlagent_remotewrite_block_size_bytes`)
blockSizeLogRows = metrics.NewHistogram(`vlagent_remotewrite_block_size_rows`)
)
var (
compressBufPool bytesutil.ByteBufferPool
bbPool bytesutil.ByteBufferPool
)

View File

@@ -0,0 +1,277 @@
package remotewrite
import (
"flag"
"fmt"
"net/url"
"path/filepath"
"sync"
"sync/atomic"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage/netinsert"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
)
var (
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support VictoriaLogs native protocol. "+
"Example url: http://<victorialogs-host>:9428/internal/insert. "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems.")
maxPendingBytesPerURL = flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
"for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. "+
"Buffered data is stored in ~500MB chunks. It is recommended to set the value for this flag to a multiple of the block size 500MB. "+
"Disk usage is unlimited if the value is set to 0")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vlagent-remotewrite-data", "Path to directory for storing pending data, which isn't sent to the configured -remoteWrite.url . "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. "+
"Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
)
// rwctxsGlobal contains statically populated entries when -remoteWrite.url is specified.
var rwctxsGlobal []*remoteWriteCtx
// Storage implements insertutil.LogRowsStorage interface
type Storage struct{}
// MustAddRows implements insertutil.LogRowsStorage interface
func (*Storage) MustAddRows(lr *logstorage.LogRows) {
pushToRemoteStorages(lr)
}
// CanWriteData implements insertutil.LogRowsStorage interface
func (*Storage) CanWriteData() error {
return nil
}
// maxQueues limits the maximum value for `-remoteWrite.queues`. There is no sense in setting too high value,
// since it may lead to high memory usage due to big number of buffers.
var maxQueues = cgroup.AvailableCPUs() * 16
const persistentQueueDirname = "persistent-queue"
// InitSecretFlags must be called after flag.Parse and before any logging.
func InitSecretFlags() {
if !*showRemoteWriteURL {
// remoteWrite.url can contain authentication codes, so hide it at `/metrics` output.
flagutil.RegisterSecretFlag("remoteWrite.url")
}
}
// Init initializes remotewrite.
//
// It must be called after flag.Parse().
//
// Stop must be called for graceful shutdown.
func Init() {
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
}
if *queues > maxQueues {
*queues = maxQueues
}
if *queues <= 0 {
*queues = 1
}
initRemoteWriteCtxs(*remoteWriteURLs)
dropDanglingQueues()
}
// Stop stops remotewrite.
//
// It is expected that nobody calls TryPush during and after the call to this func.
func Stop() {
for _, rwctx := range rwctxsGlobal {
rwctx.mustStop()
}
rwctxsGlobal = nil
}
func dropDanglingQueues() {
// Remove dangling persistent queues, if any.
// This is required for the case when the number of queues has been changed or URL have been changed.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
//
// In case if there were many persistent queues with identical *remoteWriteURLs
// the queue with the last index will be dropped.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140
existingQueues := make(map[string]struct{}, len(rwctxsGlobal))
for _, rwctx := range rwctxsGlobal {
existingQueues[rwctx.fq.Dirname()] = struct{}{}
}
queuesDir := filepath.Join(*tmpDataPath, persistentQueueDirname)
files := fs.MustReadDir(queuesDir)
removed := 0
for _, f := range files {
dirname := f.Name()
if _, ok := existingQueues[dirname]; !ok {
logger.Infof("removing dangling queue %q", dirname)
fullPath := filepath.Join(queuesDir, dirname)
fs.MustRemoveAll(fullPath)
removed++
}
}
if removed > 0 {
logger.Infof("removed %d dangling queues from %q, active queues: %d", removed, *tmpDataPath, len(rwctxsGlobal))
}
}
func initRemoteWriteCtxs(urls []string) {
if len(urls) == 0 {
logger.Panicf("BUG: urls must be non-empty")
}
maxInmemoryBlocks := memory.Allowed() / len(urls) / 10000
if maxInmemoryBlocks / *queues > 100 {
// There is no much sense in keeping higher number of blocks in memory,
// since this means that the producer outperforms consumer and the queue
// will continue growing. It is better storing the queue to file.
maxInmemoryBlocks = 100 * *queues
}
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
}
rwctxs := make([]*remoteWriteCtx, len(urls))
rwctxIdx := make([]int, len(urls))
for i, remoteWriteURLRaw := range urls {
remoteWriteURL, err := url.Parse(remoteWriteURLRaw)
if err != nil {
logger.Fatalf("invalid -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
if *showRemoteWriteURL {
sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
}
rwctxs[i] = newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
rwctxIdx[i] = i
}
rwctxsGlobal = rwctxs
}
func pushToRemoteStorages(lr *logstorage.LogRows) {
rwctxs := rwctxsGlobal
if len(rwctxs) == 1 {
// fast path
rwctxs[0].push(lr)
return
}
// Push samples to remote storage systems in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup
for _, rwctx := range rwctxs {
wg.Add(1)
go func(rwctx *remoteWriteCtx) {
defer wg.Done()
rwctx.push(lr)
}(rwctx)
}
wg.Wait()
}
type remoteWriteCtx struct {
idx int
fq *persistentqueue.FastQueue
c *client
pls []*pendingLogs
pssNextIdx atomic.Uint64
}
func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, maxInmemoryBlocks int, sanitizedURL string) *remoteWriteCtx {
// protocol version is required by victoria-logs
q := remoteWriteURL.Query()
q.Set("version", netinsert.ProtocolVersion)
remoteWriteURL.RawQuery = q.Encode()
// strip query params, otherwise changing params resets pq
pqURL := *remoteWriteURL
pqURL.RawQuery = ""
pqURL.Fragment = ""
h := xxhash.Sum64([]byte(pqURL.String()))
queuePath := filepath.Join(*tmpDataPath, persistentQueueDirname, fmt.Sprintf("%d_%016X", argIdx+1, h))
maxPendingBytes := maxPendingBytesPerURL.GetOptionalArg(argIdx)
if maxPendingBytes != 0 && maxPendingBytes < persistentqueue.DefaultChunkFileSize {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4195
logger.Warnf("rounding the -remoteWrite.maxDiskUsagePerURL=%d to the minimum supported value: %d", maxPendingBytes, persistentqueue.DefaultChunkFileSize)
maxPendingBytes = persistentqueue.DefaultChunkFileSize
}
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes, false)
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_pending_data_bytes{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetPendingBytes())
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_pending_inmemory_blocks{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetInmemoryQueueLen())
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vlagent_remotewrite_queue_blocked{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
if fq.IsWriteBlocked() {
return 1
}
return 0
})
var c *client
switch remoteWriteURL.Scheme {
case "http", "https":
c = newHTTPClient(argIdx, remoteWriteURL.String(), sanitizedURL, fq, *queues)
default:
logger.Fatalf("unsupported scheme: %s for remoteWriteURL: %s, want `http`, `https`", remoteWriteURL.Scheme, sanitizedURL)
}
c.init(argIdx, *queues, sanitizedURL)
// Initialize pss
plsLen := *queues
if n := cgroup.AvailableCPUs(); plsLen > n {
// There is no sense in running more than availableCPUs concurrent pendingLogs,
// since every pendingLogs can saturate up to a single CPU.
plsLen = n
}
pls := make([]*pendingLogs, plsLen)
for i := range pls {
pls[i] = newPendingLogs(fq)
}
rwctx := &remoteWriteCtx{
idx: argIdx,
fq: fq,
c: c,
pls: pls,
}
return rwctx
}
func (rwctx *remoteWriteCtx) push(lr *logstorage.LogRows) {
pls := rwctx.pls
idx := rwctx.pssNextIdx.Add(1) % uint64(len(pls))
pls[idx].add(lr)
}
func (rwctx *remoteWriteCtx) mustStop() {
for _, ps := range rwctx.pls {
ps.mustStop()
}
rwctx.idx = 0
rwctx.pls = nil
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.fq.MustClose()
rwctx.fq = nil
}

View File

@@ -7,6 +7,7 @@ import (
"strconv"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
@@ -192,6 +193,7 @@ type logMessageProcessor struct {
rowsIngestedTotal *metrics.Counter
bytesIngestedTotal *metrics.Counter
flushDuration *metrics.Summary
}
func (lmp *logMessageProcessor) initPeriodicFlush() {
@@ -290,9 +292,11 @@ func (lmp *logMessageProcessor) AddInsertRow(r *logstorage.InsertRow) {
// flushLocked must be called under locked lmp.mu.
func (lmp *logMessageProcessor) flushLocked() {
lmp.lastFlushTime = time.Now()
start := time.Now()
lmp.lastFlushTime = start
logRowsStorage.MustAddRows(lmp.lr)
lmp.lr.ResetKeepSettings()
lmp.flushDuration.UpdateDuration(start)
}
// MustClose flushes the remaining data to the underlying storage and closes lmp.
@@ -303,6 +307,7 @@ func (lmp *logMessageProcessor) MustClose() {
lmp.flushLocked()
logstorage.PutLogRows(lmp.lr)
lmp.lr = nil
messageProcessorCount.Add(-1)
}
// NewLogMessageProcessor returns new LogMessageProcessor for the given cp.
@@ -312,12 +317,14 @@ func (cp *CommonParams) NewLogMessageProcessor(protocolName string, isStreamMode
lr := logstorage.GetLogRows(cp.StreamFields, cp.IgnoreFields, cp.DecolorizeFields, cp.ExtraFields, *defaultMsgValue)
rowsIngestedTotal := metrics.GetOrCreateCounter(fmt.Sprintf("vl_rows_ingested_total{type=%q}", protocolName))
bytesIngestedTotal := metrics.GetOrCreateCounter(fmt.Sprintf("vl_bytes_ingested_total{type=%q}", protocolName))
flushDuration := metrics.GetOrCreateSummary(fmt.Sprintf("vl_insert_flush_duration_seconds{type=%q}", protocolName))
lmp := &logMessageProcessor{
cp: cp,
lr: lr,
rowsIngestedTotal: rowsIngestedTotal,
bytesIngestedTotal: bytesIngestedTotal,
flushDuration: flushDuration,
stopCh: make(chan struct{}),
}
@@ -326,10 +333,13 @@ func (cp *CommonParams) NewLogMessageProcessor(protocolName string, isStreamMode
lmp.initPeriodicFlush()
}
messageProcessorCount.Add(1)
return lmp
}
var (
rowsDroppedTotalDebug = metrics.NewCounter(`vl_rows_dropped_total{reason="debug"}`)
rowsDroppedTotalTooManyFields = metrics.NewCounter(`vl_rows_dropped_total{reason="too_many_fields"}`)
_ = metrics.NewGauge(`vl_insert_processors_count`, func() float64 { return float64(messageProcessorCount.Load()) })
messageProcessorCount atomic.Int64
)

View File

@@ -276,7 +276,7 @@ func readJournaldLogEntry(streamName string, lr *insertutil.LineReader, lmp inse
}
size := binary.LittleEndian.Uint64(fb.value[:8])
// Read the value until its lenth exceeds the given size - the last char in the read value will always be '\n'
// Read the value until its length exceeds the given size - the last char in the read value will always be '\n'
// because it is appended by appendNextLineToValue().
for uint64(len(fb.value[8:])) <= size {
if err := fb.appendNextLineToValue(lr); err != nil {

View File

@@ -354,6 +354,10 @@ func writeStorageMetrics(w io.Writer, strg *logstorage.Storage) {
var ss logstorage.StorageStats
strg.UpdateStats(&ss)
if maxDiskSpaceUsageBytes.N > 0 {
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_max_disk_space_usage_bytes{path=%q}`, *storageDataPath), uint64(maxDiskSpaceUsageBytes.N))
}
metrics.WriteGaugeUint64(w, fmt.Sprintf(`vl_free_disk_space_bytes{path=%q}`, *storageDataPath), fs.MustGetFreeSpace(*storageDataPath))
isReadOnly := uint64(0)

View File

@@ -38,8 +38,10 @@ var (
"By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage. See also -maxIngestionRate")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", time.Minute, "Timeout for sending a single block of data to the corresponding -remoteWrite.url")
retryMinInterval = flagutil.NewArrayDuration("remoteWrite.retryMinInterval", time.Second, "The minimum delay between retry attempts to send a block of data to the corresponding -remoteWrite.url. Every next retry attempt will double the delay to prevent hammering of remote database. See also -remoteWrite.retryMaxTime")
retryMaxTime = flagutil.NewArrayDuration("remoteWrite.retryMaxTime", time.Minute, "The max time spent on retry attempts to send a block of data to the corresponding -remoteWrite.url. Change this value if it is expected for -remoteWrite.url to be unreachable for more than -remoteWrite.retryMaxTime. See also -remoteWrite.retryMinInterval")
retryMinInterval = flagutil.NewArrayDuration("remoteWrite.retryMinInterval", time.Second, "The minimum delay between retry attempts to send a block of data to the corresponding -remoteWrite.url. Every next retry attempt will double the delay to prevent hammering of remote database. See also -remoteWrite.retryMaxInterval")
// deprecated in the future. use -remoteWrite.retryMaxInterval instead
retryMaxTime = flagutil.NewArrayDuration("remoteWrite.retryMaxTime", time.Minute, "The max time spent on retry attempts to send a block of data to the corresponding -remoteWrite.url. This flag is deprecated, use -remoteWrite.retryMaxInterval instead")
retryMaxInterval = flagutil.NewArrayDuration("remoteWrite.retryMaxInterval", time.Minute, "The maximum delay between retry attempts to send a block of data to the corresponding -remoteWrite.url. The delay doubles with each retry until this maximum is reached, after which it remains constant. See also -remoteWrite.retryMinInterval")
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
"Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234")
@@ -97,7 +99,7 @@ type client struct {
hc *http.Client
retryMinInterval time.Duration
retryMaxTime time.Duration
retryMaxInterval time.Duration
sendBlock func(block []byte) bool
authCfg *promauth.Config
@@ -151,6 +153,10 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
Transport: authCfg.NewRoundTripper(tr),
Timeout: sendTimeout.GetOptionalArg(argIdx),
}
retryMaxIntervalFlag := retryMaxTime
if retryMaxInterval.String() != "" {
retryMaxIntervalFlag = retryMaxInterval
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
@@ -159,7 +165,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
fq: fq,
hc: hc,
retryMinInterval: retryMinInterval.GetOptionalArg(argIdx),
retryMaxTime: retryMaxTime.GetOptionalArg(argIdx),
retryMaxInterval: retryMaxIntervalFlag.GetOptionalArg(argIdx),
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
@@ -404,7 +410,7 @@ func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
// Otherwise, it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.Register(len(block))
maxRetryDuration := timeutil.AddJitterToDuration(c.retryMaxTime)
maxRetryDuration := timeutil.AddJitterToDuration(c.retryMaxInterval)
retryDuration := timeutil.AddJitterToDuration(c.retryMinInterval)
retriesCount := 0

View File

@@ -279,6 +279,9 @@ func initRemoteWriteCtxs(urls []string) {
}
rwctxs := make([]*remoteWriteCtx, len(urls))
rwctxIdx := make([]int, len(urls))
if retryMaxTime.String() != "" {
logger.Warnf("-remoteWrite.retryMaxTime is deprecated; use -remoteWrite.retryMaxInterval instead")
}
for i, remoteWriteURLRaw := range urls {
remoteWriteURL, err := url.Parse(remoteWriteURLRaw)
if err != nil {

View File

@@ -18,9 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
var (
defaultRuleType = flag.String("rule.defaultRuleType", "prometheus", `Default type for rule expressions, can be overridden via "type" parameter on the group level, see https://docs.victoriametrics.com/victoriametrics/vmalert/#groups. Supported values: "graphite", "prometheus" and "vlogs".`)
)
var defaultRuleType = flag.String("rule.defaultRuleType", "prometheus", `Default type for rule expressions, can be overridden via "type" parameter on the group level, see https://docs.victoriametrics.com/victoriametrics/vmalert/#groups. Supported values: "graphite", "prometheus" and "vlogs".`)
// Group contains list of Rules grouped into
// entity with one name and evaluation interval
@@ -293,12 +291,6 @@ func parse(files map[string][]byte, validateTplFn ValidateTplFn, validateExpress
if err := errGroup.Err(); err != nil {
return nil, err
}
sort.SliceStable(groups, func(i, j int) bool {
if groups[i].File != groups[j].File {
return groups[i].File < groups[j].File
}
return groups[i].Name < groups[j].Name
})
return groups, nil
}

View File

@@ -89,15 +89,18 @@ func (t *Type) ValidateExpr(expr string) error {
return nil
}
// SupportedType is true if given datasource type is supported
func SupportedType(dsType string) bool {
return dsType == "graphite" || dsType == "prometheus" || dsType == "vlogs"
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (t *Type) UnmarshalYAML(unmarshal func(any) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
switch s {
case "graphite", "prometheus", "vlogs":
default:
if !SupportedType(s) {
return fmt.Errorf("unknown datasource type=%q, want prometheus, graphite or vlogs", s)
}
t.Name = s

View File

@@ -6,6 +6,7 @@ import (
"fmt"
"net/url"
"os"
"sort"
"strconv"
"strings"
"sync"
@@ -148,9 +149,13 @@ func main() {
if err != nil {
logger.Fatalf("failed to init datasource: %s", err)
}
if err := replay(groupsCfg, q, rw); err != nil {
totalRows, droppedRows, err := replay(groupsCfg, q, rw)
if err != nil {
logger.Fatalf("replay failed: %s", err)
}
if droppedRows > 0 {
logger.Fatalf("failed to push all generated samples to remote write url, dropped %d samples out of %d", droppedRows, totalRows)
}
logger.Infof("replay succeed!")
return
}
@@ -403,6 +408,21 @@ func configsEqual(a, b []config.Group) bool {
if len(a) != len(b) {
return false
}
// sort both slices by file and name before comparing them
sort.SliceStable(a, func(i, j int) bool {
if a[i].File != a[j].File {
return a[i].File < a[j].File
}
return a[i].Name < a[j].Name
})
sort.SliceStable(b, func(i, j int) bool {
if b[i].File != b[j].File {
return b[i].File < b[j].File
}
return b[i].Name < b[j].Name
})
for i := range a {
if a[i].Checksum != b[i].Checksum {
return false

View File

@@ -194,9 +194,10 @@ func mergeLabels(target string, metaLabels *promutil.Labels, cfg *Config) *promu
alertsPath = address[n:]
address = address[:n]
}
m.Add("__address__", address)
m.Add("__scheme__", scheme)
m.Add("__alerts_path__", alertsPath)
m.AddFrom(metaLabels)
// force labels
m.Set("__address__", address)
m.Set("__scheme__", scheme)
m.Set("__alerts_path__", alertsPath)
return m
}

View File

@@ -2,6 +2,7 @@ package notifier
import (
"fmt"
"sort"
"sync"
"time"
@@ -53,8 +54,11 @@ func (cw *configWatcher) notifiers() []Notifier {
for _, n := range ns {
notifiers = append(notifiers, n.Notifier)
}
}
// deterministically sort the output
sort.Slice(notifiers, func(i, j int) bool {
return notifiers[i].Addr() < notifiers[j].Addr()
})
return notifiers
}
@@ -85,12 +89,12 @@ func (cw *configWatcher) reload(path string) error {
}
func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn getLabels) error {
targets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
for _, err := range errors {
return fmt.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, targets)
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
cw.wg.Add(1)
go func() {
@@ -105,22 +109,22 @@ func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn
return
case <-ticker.C:
}
updateTargets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
for _, err := range errors {
logger.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, updateTargets)
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
}
}()
return nil
}
func targetsFromLabels(labelsFn getLabels, cfg *Config, genFn AlertURLGenerator) ([]Target, []error) {
func getTargetMetadata(labelsFn getLabels, cfg *Config) (map[string]*promutil.Labels, []error) {
metaLabels, err := labelsFn()
if err != nil {
return nil, []error{fmt.Errorf("failed to get labels: %w", err)}
}
var targets []Target
targetMetadata := make(map[string]*promutil.Labels, len(metaLabels))
var errors []error
duplicates := make(map[string]struct{})
for _, labels := range metaLabels {
@@ -143,18 +147,9 @@ func targetsFromLabels(labelsFn getLabels, cfg *Config, genFn AlertURLGenerator)
continue
}
duplicates[u] = struct{}{}
am, err := NewAlertManager(u, genFn, cfg.HTTPClientConfig, cfg.parsedAlertRelabelConfigs, cfg.Timeout.Duration())
if err != nil {
errors = append(errors, err)
continue
}
targets = append(targets, Target{
Notifier: am,
Labels: processedLabels,
})
targetMetadata[u] = processedLabels
}
return targets, errors
return targetMetadata, errors
}
type getLabels func() ([]*promutil.Labels, error)
@@ -241,21 +236,40 @@ func (cw *configWatcher) mustStop() {
func (cw *configWatcher) setTargets(key TargetType, targets []Target) {
cw.targetsMu.Lock()
newT := make(map[string]Target)
for _, t := range targets {
newT[t.Addr()] = t
}
oldT := cw.targets[key]
for _, ot := range oldT {
if _, ok := newT[ot.Addr()]; !ok {
ot.Notifier.Close()
}
}
cw.targets[key] = targets
cw.targetsMu.Unlock()
}
func (cw *configWatcher) updateTargets(key TargetType, targetMetadata map[string]*promutil.Labels, cfg *Config, genFn AlertURLGenerator) {
cw.targetsMu.Lock()
defer cw.targetsMu.Unlock()
oldTargets := cw.targets[key]
var updatedTargets []Target
for _, ot := range oldTargets {
if _, ok := targetMetadata[ot.Addr()]; !ok {
// if target not exists in currentTargets, close it
ot.Notifier.Close()
} else {
updatedTargets = append(updatedTargets, ot)
delete(targetMetadata, ot.Addr())
}
}
// create new resources for the new targets
for addr, labels := range targetMetadata {
am, err := NewAlertManager(addr, genFn, cfg.HTTPClientConfig, cfg.parsedAlertRelabelConfigs, cfg.Timeout.Duration())
if err != nil {
logger.Errorf("failed to init %s notifier with addr %q: %w", key, addr, err)
continue
}
updatedTargets = append(updatedTargets, Target{
Notifier: am,
Labels: labels,
})
}
cw.targets[key] = updatedTargets
}
// mergeHTTPClientConfigs merges fields between child and parent params
// by populating child from parent params if they're missing.
func mergeHTTPClientConfigs(parent, child promauth.HTTPClientConfig) promauth.HTTPClientConfig {

View File

@@ -8,8 +8,10 @@ import (
"os"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discovery/consul"
)
func TestConfigWatcherReload(t *testing.T) {
@@ -59,6 +61,11 @@ static_configs:
}
func TestConfigWatcherStart(t *testing.T) {
oldSDCheckInterval := consul.SDCheckInterval
defer func() { consul.SDCheckInterval = oldSDCheckInterval }()
consulCheckInterval := 100 * time.Millisecond
consul.SDCheckInterval = &consulCheckInterval
consulSDServer := newFakeConsulServer()
defer consulSDServer.Close()
@@ -97,6 +104,11 @@ consul_sd_configs:
if n2.Addr() != expAddr2 {
t.Fatalf("exp address %q; got %q", expAddr2, n2.Addr())
}
f := func() bool { return len(cw.notifiers()) == 1 }
if !waitFor(f, time.Second) {
t.Fatalf("expected to get 1 notifiers; got %d", len(cw.notifiers()))
}
}
// TestConfigWatcherReloadConcurrent supposed to test concurrent
@@ -193,6 +205,7 @@ const (
)
func newFakeConsulServer() *httptest.Server {
requestCount := 0
mux := http.NewServeMux()
mux.HandleFunc("/v1/agent/self", func(rw http.ResponseWriter, _ *http.Request) {
rw.Write([]byte(`{"Config": {"Datacenter": "dc1"}}`))
@@ -207,8 +220,9 @@ func newFakeConsulServer() *httptest.Server {
}`))
})
mux.HandleFunc("/v1/health/service/alertmanager", func(rw http.ResponseWriter, _ *http.Request) {
rw.Header().Set("X-Consul-Index", "1")
rw.Write([]byte(`
if requestCount == 0 {
rw.Header().Set("X-Consul-Index", "1")
rw.Write([]byte(`
[
{
"Node": {
@@ -297,6 +311,56 @@ func newFakeConsulServer() *httptest.Server {
}
}
]`))
} else {
rw.Header().Set("X-Consul-Index", "2")
rw.Write([]byte(`
[
{
"Node": {
"ID": "e8e3629a-3f50-9d6e-aaf8-f173b5b05c72",
"Node": "machine",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"lan_ipv4": "127.0.0.1",
"wan": "127.0.0.1",
"wan_ipv4": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 13,
"ModifyIndex": 14
},
"Service": {
"ID": "am3",
"Service": "alertmanager",
"Tags": [
"alertmanager",
"__scheme__=http"
],
"Address": "",
"Meta": null,
"Port": 9097,
"Weights": {
"Passing": 1,
"Warning": 1
},
"EnableTagOverride": false,
"Proxy": {
"Mode": "",
"MeshGateway": {},
"Expose": {}
},
"Connect": {},
"CreateIndex": 16,
"ModifyIndex": 16
}
}
]`))
}
requestCount++
})
return httptest.NewServer(mux)
@@ -357,3 +421,13 @@ func TestParseLabels_Success(t *testing.T) {
PathPrefix: "test",
}, "https://alertmanager:9093/api/v1/alerts")
}
func waitFor(f func() bool, timeout time.Duration) bool {
for start := time.Now(); time.Since(start) < timeout; {
if f() == true {
return true
}
time.Sleep(50 * time.Millisecond)
}
return false
}

View File

@@ -216,7 +216,7 @@ var (
)
// GetDroppedRows returns value of droppedRows metric
func GetDroppedRows() int64 { return int64(droppedRows.Get()) }
func GetDroppedRows() int { return int(droppedRows.Get()) }
// flush is a blocking function that marshals WriteRequest and sends
// it to remote-write endpoint. Flush performs limited amount of retries

View File

@@ -28,15 +28,17 @@ var (
"Defines how many retries to make before giving up on rule if request for it returns an error.")
disableProgressBar = flag.Bool("replay.disableProgressBar", false, "Whether to disable rendering progress bars during the replay. "+
"Progress bar rendering might be verbose or break the logs parsing, so it is recommended to be disabled when not used in interactive mode.")
ruleEvaluationConcurrency = flag.Int("replay.ruleEvaluationConcurrency", 1, "The maximum number of concurrent `/query_range` requests for a single rule. "+
"Increasing this value when replaying for a long time and a single request range is limited by `-replay.maxDatapointsPerQuery`.")
)
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewrite.RWClient) error {
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewrite.RWClient) (totalRows, droppedRows int, err error) {
if *replayMaxDatapoints < 1 {
return fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
return 0, 0, fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
}
tFrom, err := time.Parse(time.RFC3339, *replayFrom)
if err != nil {
return fmt.Errorf("failed to parse replay.timeFrom=%q: %w", *replayFrom, err)
return 0, 0, fmt.Errorf("failed to parse replay.timeFrom=%q: %w", *replayFrom, err)
}
// use tFrom location for default value, otherwise filters could have different locations
@@ -44,12 +46,12 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
if *replayTo != "" {
tTo, err = time.Parse(time.RFC3339, *replayTo)
if err != nil {
return fmt.Errorf("failed to parse replay.timeTo=%q: %w", *replayTo, err)
return 0, 0, fmt.Errorf("failed to parse replay.timeTo=%q: %w", *replayTo, err)
}
}
if !tTo.After(tFrom) {
return fmt.Errorf("replay.timeTo=%v must be bigger than replay.timeFrom=%v", tTo, tFrom)
return 0, 0, fmt.Errorf("replay.timeTo=%v must be bigger than replay.timeFrom=%v", tTo, tFrom)
}
labels := make(map[string]string)
for _, s := range *externalLabels {
@@ -58,7 +60,7 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
}
n := strings.IndexByte(s, '=')
if n < 0 {
return fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
return 0, 0, fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
}
labels[s[:n]] = s[n+1:]
}
@@ -69,18 +71,14 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
"\nmax data points per request: %d\n",
tFrom, tTo, *replayMaxDatapoints)
var total int
for _, cfg := range groupsCfg {
ng := rule.NewGroup(cfg, qb, *evaluationInterval, labels)
total += ng.Replay(tFrom, tTo, rw, *replayMaxDatapoints, *replayRuleRetryAttempts, *replayRulesDelay, *disableProgressBar)
totalRows += ng.Replay(tFrom, tTo, rw, *replayMaxDatapoints, *replayRuleRetryAttempts, *replayRulesDelay, *disableProgressBar, *ruleEvaluationConcurrency)
}
logger.Infof("replay evaluation finished, generated %d samples", total)
logger.Infof("replay evaluation finished, generated %d samples", totalRows)
if err := rw.Close(); err != nil {
return err
return 0, 0, err
}
droppedRows := remotewrite.GetDroppedRows()
if droppedRows > 0 {
return fmt.Errorf("failed to push all generated samples to remote write url, dropped %d samples out of %d", droppedRows, total)
}
return nil
droppedRows = remotewrite.GetDroppedRows()
return totalRows, droppedRows, nil
}

View File

@@ -8,38 +8,45 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
type fakeReplayQuerier struct {
datasource.FakeQuerier
registry map[string]map[string]struct{}
registry map[string]map[string][]datasource.Metric
}
func (fr *fakeReplayQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fr
}
type fakeRWClient struct{}
func (fc *fakeRWClient) Push(_ prompbmarshal.TimeSeries) error {
return nil
}
func (fc *fakeRWClient) Close() error {
return nil
}
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) (res datasource.Result, err error) {
key := fmt.Sprintf("%s+%s", from.Format("15:04:05"), to.Format("15:04:05"))
dps, ok := fr.registry[q]
if !ok {
return res, fmt.Errorf("unexpected query received: %q", q)
}
_, ok = dps[key]
metrics, ok := dps[key]
if !ok {
return res, fmt.Errorf("unexpected time range received: %q", key)
}
delete(dps, key)
if len(fr.registry[q]) < 1 {
delete(fr.registry, q)
}
res.Data = metrics
return res, nil
}
func TestReplay(t *testing.T) {
f := func(from, to string, maxDP int, ruleDelay time.Duration, cfg []config.Group, qb *fakeReplayQuerier) {
f := func(from, to string, maxDP, ruleConcurrency int, ruleDelay time.Duration, cfg []config.Group, qb *fakeReplayQuerier, expectTotalRows int) {
t.Helper()
fromOrig, toOrig, maxDatapointsOrig := *replayFrom, *replayTo, *replayMaxDatapoints
@@ -52,106 +59,211 @@ func TestReplay(t *testing.T) {
*replayRuleRetryAttempts = 1
*replayRulesDelay = ruleDelay
rwb := &remotewrite.DebugClient{}
rwb := &fakeRWClient{}
*replayFrom = from
*replayTo = to
*ruleEvaluationConcurrency = ruleConcurrency
*replayMaxDatapoints = maxDP
if err := replay(cfg, qb, rwb); err != nil {
totalRows, _, err := replay(cfg, qb, rwb)
if err != nil {
t.Fatalf("replay failed: %s", err)
}
if len(qb.registry) > 0 {
t.Fatalf("not all requests were sent: %#v", qb.registry)
if totalRows != expectTotalRows {
t.Fatalf("unexpected total rows count: got %d, want %d", totalRows, expectTotalRows)
}
}
// one rule + one response
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:00.000Z", 10, time.Millisecond, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:00.000Z", 10, 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {"12:00:00+12:02:00": {}},
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {"12:00:00+12:02:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
}},
},
})
}, 1)
// one rule + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
"12:02:00+12:02:30": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
},
},
})
}, 2)
// datapoints per step
f("2021-01-01T12:00:00.000Z", "2021-01-01T15:02:30.000Z", 60, time.Millisecond, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T15:02:30.000Z", 60, 1, time.Millisecond, []config.Group{
{Interval: promutil.NewDuration(time.Minute), Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+13:00:00": {},
"13:00:00+14:00:00": {},
"12:00:00+13:00:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"13:00:00+14:00:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"14:00:00+15:00:00": {},
"15:00:00+15:02:30": {},
},
},
})
}, 3)
// multiple recording rules + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
{Rules: []config.Rule{{Record: "bar", Expr: "max(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:01:00+12:02:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:02:00+12:02:30": {},
},
},
})
}, 4)
// multiple alerting rules + multiple responses
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, time.Millisecond, []config.Group{
// alerting rule generates two series `ALERTS` and `ALERTS_FOR_STATE` when triggered
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 1, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Alert: "foo", Expr: "sum(up) > 1"}}},
{Rules: []config.Rule{{Alert: "bar", Expr: "max(up) < 1"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
registry: map[string]map[string][]datasource.Metric{
"sum(up) > 1": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1, 2},
Values: []float64{1, 2},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
})
}, 6)
// multiple alerting rules in one group+ multiple responses + concurrency
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 0, []config.Group{
{Rules: []config.Rule{{Alert: "foo", Expr: "sum(up) > 1"}, {Alert: "bar", Expr: "max(up) < 1"}}, Concurrency: 2}}, &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
// multiple recording rules in one group+ multiple responses + concurrency
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 1, 0, []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up) > 1"}, {Record: "bar", Expr: "max(up) < 1"}}, Concurrency: 2}}, &fakeReplayQuerier{
registry: map[string]map[string][]datasource.Metric{
"sum(up) > 1": {
"12:00:00+12:01:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:01:00+12:02:00": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
"12:02:00+12:02:30": {
{
Timestamps: []int64{1},
Values: []float64{1},
},
},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {{
Timestamps: []int64{1},
Values: []float64{1},
}},
"12:02:00+12:02:30": {},
},
},
}, 4)
// single rule + rule concurrency
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 3, time.Millisecond, []config.Group{
{Rules: []config.Rule{{Record: "foo-concurrent", Expr: "sum(up)"}}},
}, &fakeReplayQuerier{
registry: map[string]map[string][]datasource.Metric{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {{
Timestamps: []int64{1},
Values: []float64{1},
}},
"12:02:00+12:02:30": {},
},
},
}, 1)
// multiple rules + rule concurrency + group concurrency
f("2021-01-01T12:00:00.000Z", "2021-01-01T12:02:30.000Z", 1, 3, 0, []config.Group{
{Rules: []config.Rule{{Alert: "foo-group-single-concurrent", Expr: "sum(up) > 1"}, {Alert: "bar-group-single-concurrent", Expr: "max(up) < 1"}}, Concurrency: 2}}, &fakeReplayQuerier{
registry: map[string]map[string][]datasource.Metric{
"sum(up) > 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:01:00+12:02:00": {{
Timestamps: []int64{1},
Values: []float64{1},
}},
"12:02:00+12:02:30": {},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:01:00+12:02:00": {{
Timestamps: []int64{1},
Values: []float64{1},
}},
"12:02:00+12:02:30": {},
},
},
})
}, 4)
}

View File

@@ -741,6 +741,13 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
}
var labelsFilter string
for k, v := range ar.Labels {
if strings.Contains(v, "{{") && strings.Contains(v, "}}") {
// do not append label to the filter when value contains template,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9305.
// it's ok to do the simple check to skip some labels,
// since we verify the results' hash afterward to ensure the alerts match.
continue
}
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
// use `default_rollup()` instead of `last_over_time()` here to accounts for possible staleness markers

View File

@@ -815,10 +815,6 @@ func TestGroup_Restore(t *testing.T) {
t.Helper()
defer fqr.Reset()
for _, r := range rules {
fqr.Set(r.Expr, metricWithValueAndLabels(t, 0, "__name__", r.Alert))
}
fg := NewGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
fg.Init()
wg := sync.WaitGroup{}
@@ -871,6 +867,7 @@ func TestGroup_Restore(t *testing.T) {
}
// one active alert, no previous state
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutil.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
@@ -879,10 +876,10 @@ func TestGroup_Restore(t *testing.T) {
ActiveAt: defaultTS,
},
})
fqr.Reset()
// one active alert with state restore
ts := time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fn(
@@ -894,8 +891,33 @@ func TestGroup_Restore(t *testing.T) {
},
})
// one rule, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo",
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "prod"),
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "dev"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
// only env=prod has state metric, so only it will have state restore
stateMetric("foo", ts, "env", "prod"))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutil.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "prod"}): {
Name: "foo",
ActiveAt: ts,
},
})
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set("bar", metricWithValueAndLabels(t, 0, "__name__", "bar"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("bar", ts))
fn(
@@ -916,6 +938,8 @@ func TestGroup_Restore(t *testing.T) {
// two rules, two active alerts, two with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set("bar", metricWithValueAndLabels(t, 0, "__name__", "bar"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
@@ -938,6 +962,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert but wrong state restore
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertname="bar",alertgroup="TestRestore"}[3600s])`,
stateMetric("wrong alert", ts))
fn(
@@ -951,6 +976,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert with labels
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev"))
fn(
@@ -964,6 +990,7 @@ func TestGroup_Restore(t *testing.T) {
// one active alert with restore labels mismatch
ts = time.Now().Truncate(time.Hour)
fqr.Set("foo", metricWithValueAndLabels(t, 0, "__name__", "foo"))
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev", "team", "foo"))
fn(
@@ -974,6 +1001,27 @@ func TestGroup_Restore(t *testing.T) {
ActiveAt: defaultTS,
},
})
// two active alerts with dynamic labels and restore
ts = time.Now().Truncate(time.Hour)
fqr.Set(`default_rollup(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts, "env", "dev"),
stateMetric("foo", ts.Add(time.Second), "env", "prod"))
fqr.Set("foo",
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "dev"),
metricWithValueAndLabels(t, 0, "__name__", "foo", "env", "prod"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "{{$labels.env}}"}, For: promutil.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "prod"}): {
Name: "foo",
ActiveAt: ts.Add(time.Second),
},
})
}
func TestAlertingRule_Exec_Negative(t *testing.T) {

View File

@@ -513,7 +513,7 @@ func (g *Group) infof(format string, args ...any) {
}
// Replay performs group replay
func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoint, replayRuleRetryAttempts int, replayDelay time.Duration, disableProgressBar bool) int {
func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoint, replayRuleRetryAttempts int, replayDelay time.Duration, disableProgressBar bool, ruleEvaluationConcurrency int) int {
var total int
step := g.Interval * time.Duration(maxDataPoint)
ri := rangeIterator{start: start, end: end, step: step}
@@ -541,8 +541,7 @@ func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoi
if !disableProgressBar {
bar = pb.StartNew(iterations)
}
// pass ri as a copy, so it can be modified within the replayRuleRange
total += replayRuleRange(rule, ri, bar, rw, replayRuleRetryAttempts)
total += replayRuleRange(rule, ri, bar, rw, replayRuleRetryAttempts, ruleEvaluationConcurrency)
if bar != nil {
bar.Finish()
}
@@ -565,7 +564,7 @@ func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoi
wg.Add(1)
go func(r Rule, ri rangeIterator) {
// pass ri as a copy, so it can be modified within the replayRuleRange
res <- replayRuleRange(r, ri, bar, rw, replayRuleRetryAttempts)
res <- replayRuleRange(r, ri, bar, rw, replayRuleRetryAttempts, ruleEvaluationConcurrency)
<-sem
wg.Done()
}(r, ri)
@@ -586,17 +585,34 @@ func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoi
return total
}
func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewrite.RWClient, replayRuleRetryAttempts int) int {
func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewrite.RWClient, replayRuleRetryAttempts, ruleEvaluationConcurrency int) int {
fmt.Printf("> Rule %q (ID: %d)\n", r, r.ID())
total := 0
sem := make(chan struct{}, ruleEvaluationConcurrency)
wg := sync.WaitGroup{}
res := make(chan int, int(ri.end.Sub(ri.start)/ri.step)+1)
for ri.next() {
n, err := replayRule(r, ri.s, ri.e, rw, replayRuleRetryAttempts)
if err != nil {
logger.Fatalf("rule %q: %s", r, err)
}
if bar != nil {
bar.Increment()
}
sem <- struct{}{}
wg.Add(1)
go func(s, e time.Time) {
n, err := replayRule(r, s, e, rw, replayRuleRetryAttempts)
if err != nil {
logger.Fatalf("rule %q: %s", r, err)
}
if bar != nil {
bar.Increment()
}
res <- n
<-sem
wg.Done()
}(ri.s, ri.e)
}
wg.Wait()
close(res)
close(sem)
total := 0
for n := range res {
total += n
}
return total

View File

@@ -99,3 +99,15 @@ textarea.curl-area {
padding: 0;
overflow: scroll;
}
.w-10 {
width: 10%;
}
.w-20 {
width: 20%;
}
.w-60 {
width: 60%;
}

View File

@@ -9,6 +9,7 @@ import (
"strconv"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/rule"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
@@ -27,6 +28,7 @@ var (
// such as Grafana, and proxied via vmselect.
{"api/v1/rules", "list all loaded groups and rules"},
{"api/v1/alerts", "list all active alerts"},
{"api/v1/notifiers", "list all notifiers"},
{fmt.Sprintf("api/v1/alert?%s=<int>&%s=<int>", paramGroupID, paramAlertID), "get alert status by group and alert ID"},
}
systemLinks = [][2]string{
@@ -42,6 +44,10 @@ var (
{Name: "Notifiers", URL: "notifiers"},
{Name: "Docs", URL: "https://docs.victoriametrics.com/victoriametrics/vmalert/"},
}
ruleTypeMap = map[string]string{
"alert": ruleTypeAlerting,
"record": ruleTypeRecording,
}
)
type requestHandler struct {
@@ -89,10 +95,13 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
WriteRuleDetails(w, r, rule)
return true
case "/vmalert/groups":
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data := rh.groups(rf)
WriteListGroups(w, r, data, filter)
WriteListGroups(w, r, data, rf.filter)
return true
case "/vmalert/notifiers":
WriteListTargets(w, r, notifier.GetTargets())
@@ -102,23 +111,35 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
// served without `vmalert` prefix:
case "/rules":
// Grafana makes an extra request to `/rules`
// handler in addition to `/api/v1/rules` calls in alerts UI,
var data []apiGroup
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
// handler in addition to `/api/v1/rules` calls in alerts UI
var data []*apiGroup
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data = rh.groups(rf)
WriteListGroups(w, r, data, filter)
WriteListGroups(w, r, data, rf.filter)
return true
case "/vmalert/api/v1/notifiers", "/api/v1/notifiers":
data, err := rh.listNotifiers()
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json")
w.Write(data)
return true
case "/vmalert/api/v1/rules", "/api/v1/rules":
// path used by Grafana for ng alerting
var data []byte
var err error
filter := r.URL.Query().Get("filter")
rf := extractRulesFilter(r, filter)
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data, err = rh.listGroups(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -129,7 +150,12 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/vmalert/api/v1/alerts", "/api/v1/alerts":
// path used by Grafana for ng alerting
data, err := rh.listAlerts()
rf, err := newRulesFilter(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
data, err := rh.listAlerts(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -218,7 +244,7 @@ func (rh *requestHandler) getAlert(r *http.Request) (*apiAlert, error) {
type listGroupsResponse struct {
Status string `json:"status"`
Data struct {
Groups []apiGroup `json:"groups"`
Groups []*apiGroup `json:"groups"`
} `json:"data"`
}
@@ -229,82 +255,102 @@ type rulesFilter struct {
ruleNames []string
ruleType string
excludeAlerts bool
onlyUnhealthy bool
onlyNoMatch bool
filter string
dsType config.Type
}
func extractRulesFilter(r *http.Request, filter string) rulesFilter {
rf := rulesFilter{}
func newRulesFilter(r *http.Request) (*rulesFilter, error) {
rf := &rulesFilter{}
query := r.URL.Query()
var ruleType string
ruleTypeParam := r.URL.Query().Get("type")
// for some reason, `type` in filter doesn't match `type` in response,
// so we use this matching here
if ruleTypeParam == "alert" {
ruleType = ruleTypeAlerting
} else if ruleTypeParam == "record" {
ruleType = ruleTypeRecording
ruleTypeParam := query.Get("type")
if len(ruleTypeParam) > 0 {
if ruleType, ok := ruleTypeMap[ruleTypeParam]; ok {
rf.ruleType = ruleType
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "type": not supported value %q`, ruleTypeParam), http.StatusBadRequest)
}
}
dsType := query.Get("datasource_type")
if len(dsType) > 0 {
if config.SupportedType(dsType) {
rf.dsType = config.NewRawType(dsType)
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "datasource_type": not supported value %q`, dsType), http.StatusBadRequest)
}
}
filter := strings.ToLower(query.Get("filter"))
if len(filter) > 0 {
if filter == "nomatch" || filter == "unhealthy" {
rf.filter = filter
} else {
return nil, errResponse(fmt.Errorf(`invalid parameter "filter": not supported value %q`, filter), http.StatusBadRequest)
}
}
rf.ruleType = ruleType
rf.excludeAlerts = httputil.GetBool(r, "exclude_alerts")
rf.ruleNames = append([]string{}, r.Form["rule_name[]"]...)
rf.groupNames = append([]string{}, r.Form["rule_group[]"]...)
rf.files = append([]string{}, r.Form["file[]"]...)
switch filter {
case "unhealthy":
rf.onlyUnhealthy = true
case "noMatch":
rf.onlyNoMatch = true
}
return rf
return rf, nil
}
func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
func (rf *rulesFilter) matchesGroup(group *rule.Group) bool {
if len(rf.groupNames) > 0 && !slices.Contains(rf.groupNames, group.Name) {
return false
}
if len(rf.files) > 0 && !slices.Contains(rf.files, group.File) {
return false
}
if len(rf.dsType.Name) > 0 && rf.dsType.String() != group.Type.String() {
return false
}
return true
}
func (rh *requestHandler) groups(rf *rulesFilter) []*apiGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
groups := make([]apiGroup, 0)
groups := make([]*apiGroup, 0)
for _, group := range rh.m.groups {
if len(rf.groupNames) > 0 && !slices.Contains(rf.groupNames, group.Name) {
if !rf.matchesGroup(group) {
continue
}
if len(rf.files) > 0 && !slices.Contains(rf.files, group.File) {
continue
}
g := groupToAPI(group)
// the returned list should always be non-nil
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221
filteredRules := make([]apiRule, 0)
for _, r := range g.Rules {
if rf.ruleType != "" && rf.ruleType != r.Type {
for _, rule := range g.Rules {
if rf.ruleType != "" && rf.ruleType != rule.Type {
continue
}
if len(rf.ruleNames) > 0 && !slices.Contains(rf.ruleNames, r.Name) {
if len(rf.ruleNames) > 0 && !slices.Contains(rf.ruleNames, rule.Name) {
continue
}
if (rule.LastError == "" && rf.filter == "unhealthy") || (!isNoMatch(rule) && rf.filter == "nomatch") {
continue
}
if rf.excludeAlerts {
r.Alerts = nil
rule.Alerts = nil
}
if (r.LastError == "" && rf.onlyUnhealthy) || (!isNoMatch(r) && rf.onlyNoMatch) {
continue
}
if r.LastError != "" {
if rule.LastError != "" {
g.Unhealthy++
} else {
g.Healthy++
}
if isNoMatch(r) {
if isNoMatch(rule) {
g.NoMatch++
}
filteredRules = append(filteredRules, r)
filteredRules = append(filteredRules, rule)
}
g.Rules = filteredRules
groups = append(groups, g)
}
// sort list of groups for deterministic output
slices.SortFunc(groups, func(a, b apiGroup) int {
slices.SortFunc(groups, func(a, b *apiGroup) int {
if a.Name != b.Name {
return strings.Compare(a.Name, b.Name)
}
@@ -313,7 +359,7 @@ func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
return groups
}
func (rh *requestHandler) listGroups(rf rulesFilter) ([]byte, error) {
func (rh *requestHandler) listGroups(rf *rulesFilter) ([]byte, error) {
lr := listGroupsResponse{Status: "success"}
lr.Data.Groups = rh.groups(rf)
b, err := json.Marshal(lr)
@@ -360,14 +406,17 @@ func (rh *requestHandler) groupAlerts() []groupAlerts {
return gAlerts
}
func (rh *requestHandler) listAlerts() ([]byte, error) {
func (rh *requestHandler) listAlerts(rf *rulesFilter) ([]byte, error) {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
lr := listAlertsResponse{Status: "success"}
lr.Data.Alerts = make([]*apiAlert, 0)
for _, g := range rh.m.groups {
for _, r := range g.Rules {
for _, group := range rh.m.groups {
if !rf.matchesGroup(group) {
continue
}
for _, r := range group.Rules {
a, ok := r.(*rule.AlertingRule)
if !ok {
continue
@@ -391,6 +440,42 @@ func (rh *requestHandler) listAlerts() ([]byte, error) {
return b, nil
}
type listNotifiersResponse struct {
Status string `json:"status"`
Data struct {
Notifiers []*apiNotifier `json:"notifiers"`
} `json:"data"`
}
func (rh *requestHandler) listNotifiers() ([]byte, error) {
targets := notifier.GetTargets()
lr := listNotifiersResponse{Status: "success"}
lr.Data.Notifiers = make([]*apiNotifier, 0)
for protoName, protoTargets := range targets {
notifier := &apiNotifier{
Kind: string(protoName),
Targets: make([]*apiTarget, 0, len(protoTargets)),
}
for _, target := range protoTargets {
notifier.Targets = append(notifier.Targets, &apiTarget{
Address: target.Notifier.Addr(),
Labels: target.Labels.ToMap(),
})
}
lr.Data.Notifiers = append(lr.Data.Notifiers, notifier)
}
b, err := json.Marshal(lr)
if err != nil {
return nil, &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf(`error encoding list of notifiers: %w`, err),
StatusCode: http.StatusInternalServerError,
}
}
return b, nil
}
func errResponse(err error, sc int) *httpserver.ErrorWithStatusCode {
return &httpserver.ErrorWithStatusCode{
Err: err,

View File

@@ -93,18 +93,18 @@
{%= tpl.Footer(r) %}
{% endfunc %}
{% func ListGroups(r *http.Request, groups []apiGroup, filter string) %}
{% func ListGroups(r *http.Request, groups []*apiGroup, filter string) %}
{%code
prefix := vmalertutil.Prefix(r.URL.Path)
filters := map[string]string{
"": "All",
"unhealthy": "Unhealthy",
"noMatch": "No Match",
"nomatch": "No Match",
}
icons := map[string]string{
"": "all",
"unhealthy": "unhealthy",
"noMatch": "nomatch",
"nomatch": "nomatch",
}
currentText := filters[filter]
currentIcon := icons[filter]
@@ -161,9 +161,9 @@
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
<th scope="col" class="w-60">Rule</th>
<th scope="col" class="w-20" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" class="w-20" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
@@ -594,9 +594,9 @@
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" style="width: 10%" class="text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" style="width: 10%" class="text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" style="width: 10%" class="text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>

View File

@@ -316,7 +316,7 @@ func Welcome(r *http.Request) string {
}
//line app/vmalert/web.qtpl:96
func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGroup, filter string) {
func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []*apiGroup, filter string) {
//line app/vmalert/web.qtpl:96
qw422016.N().S(`
`)
@@ -325,12 +325,12 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
filters := map[string]string{
"": "All",
"unhealthy": "Unhealthy",
"noMatch": "No Match",
"nomatch": "No Match",
}
icons := map[string]string{
"": "all",
"unhealthy": "unhealthy",
"noMatch": "nomatch",
"nomatch": "nomatch",
}
currentText := filters[filter]
currentIcon := icons[filter]
@@ -523,9 +523,9 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
<th scope="col" class="w-60">Rule</th>
<th scope="col" class="w-20" class="text-center" title="How many series were produced by the rule">Series</th>
<th scope="col" class="w-20" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
@@ -722,7 +722,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []apiGr
}
//line app/vmalert/web.qtpl:222
func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []apiGroup, filter string) {
func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []*apiGroup, filter string) {
//line app/vmalert/web.qtpl:222
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:222
@@ -733,7 +733,7 @@ func WriteListGroups(qq422016 qtio422016.Writer, r *http.Request, groups []apiGr
}
//line app/vmalert/web.qtpl:222
func ListGroups(r *http.Request, groups []apiGroup, filter string) string {
func ListGroups(r *http.Request, groups []*apiGroup, filter string) string {
//line app/vmalert/web.qtpl:222
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:222
@@ -1697,17 +1697,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule apiRule)
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" style="width: 10%" class="text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
`)
//line app/vmalert/web.qtpl:598
if seriesFetchedEnabled {
//line app/vmalert/web.qtpl:598
qw422016.N().S(`<th scope="col" style="width: 10%" class="text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>`)
qw422016.N().S(`<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>`)
//line app/vmalert/web.qtpl:598
}
//line app/vmalert/web.qtpl:598
qw422016.N().S(`
<th scope="col" style="width: 10%" class="text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>

View File

@@ -19,25 +19,34 @@ import (
func TestHandler(t *testing.T) {
fq := &datasource.FakeQuerier{}
fq.Add(datasource.Metric{
Values: []float64{1}, Timestamps: []int64{0},
Values: []float64{1},
Timestamps: []int64{0},
})
g := rule.NewGroup(config.Group{
Name: "group",
File: "rules.yaml",
Concurrency: 1,
Rules: []config.Rule{
{ID: 0, Alert: "alert"},
{ID: 1, Record: "record"},
},
}, fq, 1*time.Minute, nil)
ar := g.Rules[0].(*rule.AlertingRule)
rr := g.Rules[1].(*rule.RecordingRule)
g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, nil, time.Time{})
m := &manager{groups: map[uint64]*rule.Group{
g.CreateID(): g,
}}
m := &manager{groups: map[uint64]*rule.Group{}}
var ar *rule.AlertingRule
var rr *rule.RecordingRule
for _, dsType := range []string{"prometheus", "", "graphite"} {
g := rule.NewGroup(config.Group{
Name: "group",
File: "rules.yaml",
Type: config.NewRawType(dsType),
Concurrency: 1,
Rules: []config.Rule{
{
ID: 0,
Alert: "alert",
},
{
ID: 1,
Record: "record",
},
},
}, fq, 1*time.Minute, nil)
ar = g.Rules[0].(*rule.AlertingRule)
rr = g.Rules[1].(*rule.RecordingRule)
g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, nil, time.Time{})
m.groups[g.CreateID()] = g
}
rh := &requestHandler{m: m}
getResp := func(t *testing.T, url string, to any, code int) {
@@ -54,7 +63,7 @@ func TestHandler(t *testing.T) {
t.Fatalf("err closing body %s", err)
}
}()
if to != nil {
if to != nil && code < 300 {
if err = json.NewDecoder(resp.Body).Decode(to); err != nil {
t.Fatalf("unexpected err %s", err)
}
@@ -95,14 +104,23 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/alerts", func(t *testing.T) {
lr := listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Fatalf("expected 1 alert got %d", length)
if length := len(lr.Data.Alerts); length != 3 {
t.Fatalf("expected 3 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
if length := len(lr.Data.Alerts); length != 1 {
t.Fatalf("expected 1 alert got %d", length)
if length := len(lr.Data.Alerts); length != 3 {
t.Fatalf("expected 3 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts?datasource_type=test", &lr, 400)
lr = listAlertsResponse{}
getResp(t, ts.URL+"/api/v1/alerts?datasource_type=prometheus", &lr, 200)
if length := len(lr.Data.Alerts); length != 2 {
t.Fatalf("expected 2 alert got %d", length)
}
})
t.Run("/api/v1/alert?alertID&groupID", func(t *testing.T) {
@@ -138,14 +156,14 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(t, ts.URL+"/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Fatalf("expected 1 group got %d", length)
if length := len(lr.Data.Groups); length != 3 {
t.Fatalf("expected 3 group got %d", length)
}
lr = listGroupsResponse{}
getResp(t, ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if length := len(lr.Data.Groups); length != 1 {
t.Fatalf("expected 1 group got %d", length)
if length := len(lr.Data.Groups); length != 3 {
t.Fatalf("expected 3 group got %d", length)
}
})
t.Run("/api/v1/rule?ruleID&groupID", func(t *testing.T) {
@@ -172,10 +190,10 @@ func TestHandler(t *testing.T) {
})
t.Run("/api/v1/rules&filters", func(t *testing.T) {
check := func(url string, expGroups, expRules int) {
check := func(url string, statusCode, expGroups, expRules int) {
t.Helper()
lr := listGroupsResponse{}
getResp(t, ts.URL+url, &lr, 200)
getResp(t, ts.URL+url, &lr, statusCode)
if length := len(lr.Data.Groups); length != expGroups {
t.Fatalf("expected %d groups got %d", expGroups, length)
}
@@ -191,25 +209,31 @@ func TestHandler(t *testing.T) {
}
}
check("/api/v1/rules?type=alert", 1, 1)
check("/api/v1/rules?type=record", 1, 1)
check("/api/v1/rules?type=alert", 200, 3, 3)
check("/api/v1/rules?type=record", 200, 3, 3)
check("/api/v1/rules?type=records", 400, 0, 0)
check("/vmalert/api/v1/rules?type=alert", 1, 1)
check("/vmalert/api/v1/rules?type=record", 1, 1)
check("/vmalert/api/v1/rules?type=alert", 200, 3, 3)
check("/vmalert/api/v1/rules?type=record", 200, 3, 3)
check("/vmalert/api/v1/rules?type=recording", 400, 0, 0)
check("/vmalert/api/v1/rules?datasource_type=prometheus", 200, 2, 4)
check("/vmalert/api/v1/rules?datasource_type=graphite", 200, 1, 2)
check("/vmalert/api/v1/rules?datasource_type=graphiti", 400, 0, 0)
// no filtering expected due to bad params
check("/api/v1/rules?type=badParam", 1, 2)
check("/api/v1/rules?foo=bar", 1, 2)
check("/api/v1/rules?type=badParam", 400, 0, 0)
check("/api/v1/rules?foo=bar", 200, 3, 6)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 1, 2)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 200, 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 200, 3, 6)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 200, 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 200, 3, 6)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 1, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 1, 1)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 200, 3, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 200, 3, 3)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 200, 3, 6)
})
t.Run("/api/v1/rules&exclude_alerts=true", func(t *testing.T) {
// check if response returns active alerts by default
@@ -259,7 +283,7 @@ func TestEmptyResponse(t *testing.T) {
t.Fatalf("err closing body %s", err)
}
}()
if to != nil {
if to != nil && code < 300 {
if err = json.NewDecoder(resp.Body).Decode(to); err != nil {
t.Fatalf("unexpected err %s", err)
}

View File

@@ -20,6 +20,16 @@ const (
paramRuleID = "rule_id"
)
type apiNotifier struct {
Kind string `json:"kind"`
Targets []*apiTarget `json:"targets"`
}
type apiTarget struct {
Address string `json:"address"`
Labels map[string]string `json:"labels"`
}
// apiAlert represents a notifier.AlertingRule state
// for WEB view
// https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#get-apiv1rules
@@ -108,7 +118,7 @@ type apiGroup struct {
// groupAlerts represents a group of alerts for WEB view
type groupAlerts struct {
Group apiGroup
Group *apiGroup
Alerts []*apiAlert
}
@@ -327,7 +337,7 @@ func newAlertAPI(ar *rule.AlertingRule, a *notifier.Alert) *apiAlert {
return aa
}
func groupToAPI(g *rule.Group) apiGroup {
func groupToAPI(g *rule.Group) *apiGroup {
g = g.DeepCopy()
ag := apiGroup{
// encode as string to avoid rounding
@@ -353,7 +363,7 @@ func groupToAPI(g *rule.Group) apiGroup {
for _, r := range g.Rules {
ag.Rules = append(ag.Rules, ruleToAPI(r))
}
return ag
return &ag
}
func urlValuesToStrings(values url.Values) []string {

View File

@@ -51,6 +51,13 @@ func main() {
buildinfo.Init()
logger.Init()
ctx, cancel := context.WithCancel(context.Background())
go func() {
procutil.WaitForSigterm()
logger.Infof("received stop signal, canceling backup operation")
cancel()
}()
// Storing snapshot delete function to be able to call it in case
// of error since logger.Fatal will exit the program without
// calling deferred functions.
@@ -79,7 +86,7 @@ func main() {
}
logger.Infof("Snapshot delete url %s", deleteURL.Redacted())
name, err := snapshot.Create(createURL.String())
name, err := snapshot.Create(ctx, createURL.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -89,7 +96,9 @@ func main() {
}
deleteSnapshot = func() {
err := snapshot.Delete(deleteURL.String(), name)
// Do not use ctx here as it may be canceled by the time deleteSnapshot is called
// if process is interrupted.
err := snapshot.Delete(context.Background(), deleteURL.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
@@ -99,13 +108,6 @@ func main() {
listenAddrs := []string{*httpListenAddr}
go httpserver.Serve(listenAddrs, nil, httpserver.ServeOptions{})
ctx, cancel := context.WithCancel(context.Background())
go func() {
procutil.WaitForSigterm()
logger.Infof("received stop signal, canceling backup operation")
cancel()
}()
pushmetrics.Init()
err := makeBackup(ctx)
deleteSnapshot()

View File

@@ -70,7 +70,7 @@ var (
Usage: "VictoriaMetrics address to perform import requests. \n" +
"Should be the same as --httpListenAddr value for single-node version or vminsert component. \n" +
"When importing into the clustered version do not forget to set additionally --vm-account-id flag. \n" +
"Please note, that `vmctl` performs initial readiness check for the given address by checking `/health` endpoint.",
"Please note, that vmctl performs initial readiness check for the given address by checking /health endpoint.",
},
&cli.StringFlag{
Name: vmUser,
@@ -514,27 +514,27 @@ var (
},
&cli.StringFlag{
Name: vmNativeSrcBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-src-addr`",
Usage: "Optional bearer auth token to use for the corresponding --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to `--vm-native-src-addr`",
Usage: "Optional path to client-side TLS certificate file to use when connecting to --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to `--vm-native-src-addr`",
Usage: "Optional path to client-side TLS key to use when connecting to --vm-native-src-addr",
},
&cli.StringFlag{
Name: vmNativeSrcCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to `--vm-native-src-addr`. By default, system CA is used",
Usage: "Optional path to TLS CA file to use for verifying connections to --vm-native-src-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: vmNativeSrcServerName,
Usage: "Optional TLS server name to use for connections to `--vm-native-src-addr`. By default, the server name from `--vm-native-src-addr` is used",
Usage: "Optional TLS server name to use for connections to --vm-native-src-addr. By default, the server name from --vm-native-src-addr is used",
},
&cli.BoolFlag{
Name: vmNativeSrcInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to `--vm-native-src-addr`",
Usage: "Whether to skip TLS certificate verification when connecting to --vm-native-src-addr",
Value: false,
},
@@ -563,27 +563,27 @@ var (
},
&cli.StringFlag{
Name: vmNativeDstBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-dst-addr`",
Usage: "Optional bearer auth token to use for the corresponding --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to `--vm-native-dst-addr`",
Usage: "Optional path to client-side TLS certificate file to use when connecting to --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to `--vm-native-dst-addr`",
Usage: "Optional path to client-side TLS key to use when connecting to --vm-native-dst-addr",
},
&cli.StringFlag{
Name: vmNativeDstCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to `--vm-native-dst-addr`. By default, system CA is used",
Usage: "Optional path to TLS CA file to use for verifying connections to --vm-native-dst-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: vmNativeDstServerName,
Usage: "Optional TLS server name to use for connections to `--vm-native-dst-addr`. By default, the server name from `--vm-native-dst-addr` is used",
Usage: "Optional TLS server name to use for connections to --vm-native-dst-addr. By default, the server name from --vm-native-dst-addr is used",
},
&cli.BoolFlag{
Name: vmNativeDstInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to `--vm-native-dst-addr`",
Usage: "Whether to skip TLS certificate verification when connecting to --vm-native-dst-addr",
Value: false,
},
@@ -597,7 +597,7 @@ var (
Name: vmRateLimit,
Usage: "Optional data transfer rate limit in bytes per second.\n" +
"By default, the rate limit is disabled. It can be useful for limiting load on source or destination databases. \n" +
"Rate limit is applied per worker, see `--vm-concurrency`.",
"Rate limit is applied per worker, see --vm-concurrency.",
},
&cli.BoolFlag{
Name: vmInterCluster,

View File

@@ -140,6 +140,15 @@ func (p *vmNativeProcessor) runSingle(ctx context.Context, f native.Filter, srcU
written, err := io.Copy(w, reader)
if err != nil {
// io.Copy could fail if ImportPipe will fail before and close the pr
// so we check if that's the case and to not ignore importErr if it exists.
select {
case importErr := <-importCh:
if importErr != nil {
return fmt.Errorf("failed to import %s: %w", p.dst.Addr, importErr)
}
default:
}
return fmt.Errorf("failed to write into %q: %s", p.dst.Addr, err)
}

View File

@@ -3,6 +3,7 @@ package graphite
import (
"flag"
"fmt"
"math"
"net/http"
"regexp"
"sort"
@@ -202,7 +203,7 @@ func MetricsExpandHandler(startTime time.Time, w http.ResponseWriter, r *http.Re
func MetricsIndexHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) error {
deadline := searchutil.GetDeadlineForQuery(r, startTime)
jsonp := r.FormValue("jsonp")
sq := storage.NewSearchQuery(0, 0, nil, 0)
sq := storage.NewSearchQuery(0, math.MaxInt, nil, 0)
metricNames, err := netstorage.LabelValues(nil, "__name__", sq, 0, deadline)
if err != nil {
return fmt.Errorf(`cannot obtain metric names: %w`, err)

View File

@@ -562,6 +562,15 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
w.Header().Set("Content-Type", "application/json")
fmt.Fprint(w, `{"status":"success","data":{"alerts":[]}}`)
return true
case "/api/v1/notifiers", "/notifiers":
notifiersRequests.Inc()
if len(*vmalertProxyURL) > 0 {
proxyVMAlertRequests(w, r)
return true
}
w.Header().Set("Content-Type", "application/json")
fmt.Fprint(w, `{"status":"success","data":{"notifiers":[]}}`)
return true
case "/api/v1/metadata":
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
metadataRequests.Inc()
@@ -691,9 +700,10 @@ var (
expandWithExprsRequests = metrics.NewCounter(`vm_http_requests_total{path="/expand-with-exprs"}`)
prettifyQueryRequests = metrics.NewCounter(`vm_http_requests_total{path="/prettify-query"}`)
vmalertRequests = metrics.NewCounter(`vm_http_requests_total{path="/vmalert"}`)
rulesRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/rules"}`)
alertsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/alerts"}`)
vmalertRequests = metrics.NewCounter(`vm_http_requests_total{path="/vmalert"}`)
rulesRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/rules"}`)
alertsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/alerts"}`)
notifiersRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/notifiers"}`)
metadataRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/metadata"}`)
buildInfoRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/buildinfo"}`)

View File

@@ -377,11 +377,13 @@ func getRollupConfigs(funcName string, rf rollupFunc, expr metricsql.Expr, start
preFunc := func(_ []float64, _ []int64) {}
funcName = strings.ToLower(funcName)
// window > lookbackDelta could result in negative delta.
// See issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8342
stalenessInterval := lookbackDelta
if stalenessInterval != 0 && stalenessInterval < window {
stalenessInterval = window
if stalenessInterval != 0 {
// If stalenessInterval was set, it should additionally account for [window] range to cover following cases:
// * window > stalenessInterval, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8342
// * window captures prevValue in doInternal while removeCounterResets does not,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-3000735468
stalenessInterval += window
}
if rollupFuncsRemoveCounterResets[funcName] {

View File

@@ -746,7 +746,7 @@ See also [irate](#irate), [rollup_rate](#rollup_rate) and [rate_prometheus](#rat
#### rate_prometheus
`rate_prometheus(series_selector[d])` {{% available_from "#" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
`rate_prometheus(series_selector[d])` {{% available_from "v1.120.0" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering).
The resulting calculation is equivalent to `increase_prometheus(series_selector[d]) / d`.

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,5 @@
<svg width="48" height="48" fill="#e94600" xmlns="http://www.w3.org/2000/svg">
<path d="M24.5475 0C10.3246.0265251 1.11379 3.06365 4.40623 6.10077c0 0 12.32997 11.23333 16.58217 14.84083.8131.6896 2.1728 1.1936 3.5191 1.2201h.1199c1.3463-.0265 2.706-.5305 3.5191-1.2201 4.2522-3.5942 16.5422-14.84083 16.5422-14.84083C48.0478 3.06365 38.8636.0265251 24.6674 0"/>
<path d="M28.1579 27.0159c-.8131.6896-2.1728 1.1936-3.5191 1.2201h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2201-2.9725-2.5067-13.35639-11.87-17.26201-15.3979v5.4112c0 .5968.22661 1.3793.6265 1.7506C7.00358 21.1936 17.2675 30.5437 20.9731 33.6737c.8132.6896 2.1728 1.1936 3.5191 1.2201h.12c1.3463-.0265 2.7059-.5305 3.519-1.2201 3.679-3.13 13.9429-12.4536 16.6089-14.8939.4132-.3713.6265-1.1538.6265-1.7506V11.618c-3.9323 3.5411-14.3162 12.931-17.2354 15.3979h.0267Z"/>
<path d="M28.1579 39.748c-.8131.6897-2.1728 1.1937-3.5191 1.2202h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2202-2.9725-2.4933-13.35639-11.8567-17.26201-15.3978v5.4111c0 .5969.22661 1.3793.6265 1.7507C7.00358 33.9258 17.2675 43.2759 20.9731 46.4058c.8132.6897 2.1728 1.1937 3.5191 1.2202h.12c1.3463-.0265 2.7059-.5305 3.519-1.2202 3.679-3.1299 13.9429-12.4535 16.6089-14.8938.4132-.3714.6265-1.1538.6265-1.7507v-5.4111c-3.9323 3.5411-14.3162 12.931-17.2354 15.3978h.0267Z"/>
</svg>

After

Width:  |  Height:  |  Size: 1.3 KiB

View File

@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="./manifest.json"/>
<link rel="manifest" href="./manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.
@@ -36,10 +36,10 @@
<meta property="og:title" content="UI for VictoriaMetrics">
<meta property="og:url" content="https://victoriametrics.com/">
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
<script type="module" crossorigin src="./assets/index-BiQY-19a.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-D8IJGiEn.js">
<script type="module" crossorigin src="./assets/index-1uwNuj_1.js"></script>
<link rel="modulepreload" crossorigin href="./assets/vendor-V4vnRsM-.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
<link rel="stylesheet" crossorigin href="./assets/index-ojCMu5lE.css">
<link rel="stylesheet" crossorigin href="./assets/index-C36SC0pJ.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -644,6 +644,15 @@ func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="indexdb/tagFiltersToMetricIDs"}`, idbm.TagFiltersToMetricIDsCacheMisses)
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="storage/regexps"}`, storage.RegexpCacheMisses())
metrics.WriteCounterUint64(w, `vm_cache_misses_total{type="storage/regexpPrefixes"}`, storage.RegexpPrefixesCacheMisses())
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="cache_size"}`, m.TSIDCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="miss_percentage"}`, m.TSIDCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/tsid", reason="expiration"}`, m.TSIDCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="cache_size"}`, m.MetricNameCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="miss_percentage"}`, m.MetricNameCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricName", reason="expiration"}`, m.MetricNameCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="cache_size"}`, m.MetricIDCacheSizeEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="miss_percentage"}`, m.MetricIDCacheMissEvictionBytes)
metrics.WriteCounterUint64(w, `vm_cache_eviction_bytes_total{type="storage/metricIDs", reason="expiration"}`, m.MetricIDCacheExpireEvictionBytes)
metrics.WriteCounterUint64(w, `vm_deleted_metrics_total{type="indexdb"}`, idbm.DeletedMetricsCount)

View File

@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -2,9 +2,9 @@
<html lang="en">
<head>
<meta charset="utf-8"/>
<link rel="icon" href="/favicon.svg" />
<link rel="apple-touch-icon" href="/favicon.svg" />
<link rel="mask-icon" href="/favicon.svg" color="#000000">
<link rel="icon" href="/favicon.victorialogs.svg" />
<link rel="apple-touch-icon" href="/favicon.victorialogs.svg" />
<link rel="mask-icon" href="/favicon.victorialogs.svg" color="#000000">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=5"/>
<meta name="theme-color" content="#000000"/>
@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -13,7 +13,7 @@
manifest.json provides metadata used when your web app is installed on a
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
-->
<link rel="manifest" href="/manifest.json"/>
<link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/>
<!--
Notice the use of in the tags above.
It will be replaced with the URL of the `public` folder during the build.

View File

@@ -0,0 +1,5 @@
<svg width="48" height="48" fill="#e94600" xmlns="http://www.w3.org/2000/svg">
<path d="M24.5475 0C10.3246.0265251 1.11379 3.06365 4.40623 6.10077c0 0 12.32997 11.23333 16.58217 14.84083.8131.6896 2.1728 1.1936 3.5191 1.2201h.1199c1.3463-.0265 2.706-.5305 3.5191-1.2201 4.2522-3.5942 16.5422-14.84083 16.5422-14.84083C48.0478 3.06365 38.8636.0265251 24.6674 0"/>
<path d="M28.1579 27.0159c-.8131.6896-2.1728 1.1936-3.5191 1.2201h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2201-2.9725-2.5067-13.35639-11.87-17.26201-15.3979v5.4112c0 .5968.22661 1.3793.6265 1.7506C7.00358 21.1936 17.2675 30.5437 20.9731 33.6737c.8132.6896 2.1728 1.1936 3.5191 1.2201h.12c1.3463-.0265 2.7059-.5305 3.519-1.2201 3.679-3.13 13.9429-12.4536 16.6089-14.8939.4132-.3713.6265-1.1538.6265-1.7506V11.618c-3.9323 3.5411-14.3162 12.931-17.2354 15.3979h.0267Z"/>
<path d="M28.1579 39.748c-.8131.6897-2.1728 1.1937-3.5191 1.2202h-.12c-1.3463-.0265-2.7059-.5305-3.519-1.2202-2.9725-2.4933-13.35639-11.8567-17.26201-15.3978v5.4111c0 .5969.22661 1.3793.6265 1.7507C7.00358 33.9258 17.2675 43.2759 20.9731 46.4058c.8132.6897 2.1728 1.1937 3.5191 1.2202h.12c1.3463-.0265 2.7059-.5305 3.519-1.2202 3.679-3.1299 13.9429-12.4535 16.6089-14.8938.4132-.3714.6265-1.1538.6265-1.7507v-5.4111c-3.9323 3.5411-14.3162 12.931-17.2354 15.3978h.0267Z"/>
</svg>

After

Width:  |  Height:  |  Size: 1.3 KiB

View File

@@ -746,7 +746,7 @@ See also [irate](#irate), [rollup_rate](#rollup_rate) and [rate_prometheus](#rat
#### rate_prometheus
`rate_prometheus(series_selector[d])` {{% available_from "#" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
`rate_prometheus(series_selector[d])` {{% available_from "v1.120.0" %}} is a [rollup function](#rollup-functions), which calculates the average per-second
increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering).
The resulting calculation is equivalent to `increase_prometheus(series_selector[d]) / d`.

View File

@@ -57,7 +57,7 @@ const QueryEditor: FC<QueryEditorProps> = ({
const [caretPositionInput, setCaretPositionInput] = useState<[number, number]>([0, 0]);
const autocompleteAnchorEl = useRef<HTMLInputElement>(null);
const [showAutocomplete, setShowAutocomplete] = useState(!!AutocompleteEl);
const [showAutocomplete, setShowAutocomplete] = useState(false);
const debouncedSetShowAutocomplete = useRef(debounce(setShowAutocomplete, 500)).current;
const warning = [
@@ -128,7 +128,7 @@ const QueryEditor: FC<QueryEditorProps> = ({
useEffect(() => {
setShowAutocomplete(false);
debouncedSetShowAutocomplete(true);
debouncedSetShowAutocomplete(caretPositionAutocomplete.every(Boolean));
}, [caretPositionAutocomplete]);
return (

View File

@@ -25,6 +25,11 @@ type PrometheusQuerier interface {
PrometheusAPIV1QueryRange(t *testing.T, query string, opts QueryOpts) *PrometheusAPIV1QueryResponse
PrometheusAPIV1Series(t *testing.T, matchQuery string, opts QueryOpts) *PrometheusAPIV1SeriesResponse
PrometheusAPIV1ExportNative(t *testing.T, query string, opts QueryOpts) []byte
// TODO(@rtm0): Prometheus does not provide this API. Either move it to a
// separate interface or rename this interface to allow for multiple querier
// types.
GraphiteMetricsIndex(t *testing.T, opts QueryOpts) GraphiteMetricsIndexResponse
}
// Writer contains methods for writing new data
@@ -395,6 +400,10 @@ type TSDBStatusResponse struct {
Data TSDBStatusResponseData
}
// GraphiteMetricsIndexResponse is an in-memory representation of the json response
// returned by the /graphite/metrics/index.json endpoint.
type GraphiteMetricsIndexResponse = []string
// AdminTenantsResponse is an in-memory representation of the json response
// returned by the /api/v1/admin/tenants endpoint.
type AdminTenantsResponse struct {

View File

@@ -433,3 +433,24 @@ func (tc *TestCase) MustStartVlsingle(instance string, flags []string) *Vlsingle
tc.addApp(instance, app)
return app
}
// MustStartDefaultVlagent is a test helper function that starts an instance of
// vlagent with defaults suitable for most tests.
func (tc *TestCase) MustStartDefaultVlagent(remoteWriteURLs []string) *Vlagent {
tc.t.Helper()
return tc.MustStartVlagent("vlagent", remoteWriteURLs, nil)
}
// MustStartVlagent is a test helper function that starts an instance of
// vlagent and fails the test if the app fails to start.
func (tc *TestCase) MustStartVlagent(instance string, remoteWriteURLs []string, flags []string) *Vlagent {
tc.t.Helper()
app, err := StartVlagent(instance, remoteWriteURLs, flags, tc.cli)
if err != nil {
tc.t.Fatalf("Could not start %s: %v", instance, err)
}
tc.addApp(instance, app)
return app
}

View File

@@ -0,0 +1,62 @@
package tests
import (
"testing"
"github.com/google/go-cmp/cmp"
at "github.com/VictoriaMetrics/VictoriaMetrics/apptest"
)
func testMetricsIndex(t *testing.T, sut at.PrometheusWriteQuerier) {
// verify index is empty at the start
expected := at.GraphiteMetricsIndexResponse{}
tenant := "1:2"
got := sut.GraphiteMetricsIndex(t, at.QueryOpts{Tenant: tenant})
if diff := cmp.Diff(expected, got); diff != "" {
t.Errorf("unexpected response (-want, +got):\n%s", diff)
}
// Mon Feb 5 09:57:36 CET 2024
const ingestTimestamp = ` 1707123456700`
dataSet := []string{
`metric_name_1{label="foo"} 10`,
`metric_name_1{label="bar"} 10`,
`metric_name_2{label="baz"} 20`,
`metric_name_1{label="baz"} 10`,
`metric_name_3{label="baz"} 30`,
}
for idx := range dataSet {
dataSet[idx] += ingestTimestamp
}
sut.PrometheusAPIV1ImportPrometheus(t, dataSet, at.QueryOpts{Tenant: tenant})
sut.ForceFlush(t)
// verify ingested metrics correctly returned in index response
expected = []string{"metric_name_1", "metric_name_2", "metric_name_3"}
got = sut.GraphiteMetricsIndex(t, at.QueryOpts{Tenant: tenant})
if diff := cmp.Diff(expected, got); diff != "" {
t.Errorf("unexpected response (-want, +got):\n%s", diff)
}
}
func TestSingleMetricsIndex(t *testing.T) {
tc := at.NewTestCase(t)
defer tc.Stop()
sut := tc.MustStartDefaultVmsingle()
testMetricsIndex(tc.T(), sut)
}
func TestClusterMetricsIndex(t *testing.T) {
tc := at.NewTestCase(t)
defer tc.Stop()
sut := tc.MustStartDefaultCluster()
testMetricsIndex(tc.T(), sut)
}

View File

@@ -41,6 +41,7 @@ func testSpecialQueryRegression(tc *at.TestCase, sut at.PrometheusWriteQuerier)
testDuplicateLabel(tc, sut)
testTooBigLookbehindWindow(tc, sut)
testMatchSeries(tc, sut)
testNegativeIncrease(tc, sut)
// graphite
testComparisonNotInfNotNan(tc, sut)
@@ -234,6 +235,53 @@ func testMatchSeries(tc *at.TestCase, sut at.PrometheusWriteQuerier) {
})
}
func testNegativeIncrease(tc *at.TestCase, sut at.PrometheusWriteQuerier) {
t := tc.T()
// negative increase when user overrides staleness interval
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-2978728661
sut.PrometheusAPIV1ImportPrometheus(t, []string{
`foo 108 1750109243514`, // 2025-06-16 21:27:23:514
`foo 108 1750109258514`, // 2025-06-16 21:27:38:514
// gap 75s
`foo 1 1750109333514`, // 2025-06-16 21:28:53:514
`foo 1 1750109348514`, // 2025-06-16 21:29:08:514
}, at.QueryOpts{})
sut.ForceFlush(t)
tc.Assert(&at.AssertOptions{
Msg: "regression for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-2978728661",
DoNotRetry: true,
Got: func() any {
return sut.PrometheusAPIV1QueryRange(t, `increase(foo[1m])`, at.QueryOpts{
Start: "2025-06-16T21:28:40.700Z",
End: "2025-06-16T21:29:30.700Z",
Step: "9s",
MaxLookback: "65s",
})
},
Want: &at.PrometheusAPIV1QueryResponse{
Status: "success",
Data: &at.QueryData{
ResultType: "matrix",
Result: []*at.QueryResult{
{
Metric: map[string]string{},
Samples: []*at.Sample{
at.NewSample(t, "2025-06-16T21:28:40.700Z", 0),
at.NewSample(t, "2025-06-16T21:28:49.700Z", 0),
at.NewSample(t, "2025-06-16T21:28:58.700Z", 1),
at.NewSample(t, "2025-06-16T21:29:07.700Z", 1),
at.NewSample(t, "2025-06-16T21:29:16.700Z", 0),
at.NewSample(t, "2025-06-16T21:29:25.700Z", 0),
},
},
},
},
},
})
}
func testComparisonNotInfNotNan(tc *at.TestCase, sut at.PrometheusWriteQuerier) {
t := tc.T()

View File

@@ -0,0 +1,154 @@
package tests
import (
"fmt"
"os"
"path"
"testing"
"time"
at "github.com/VictoriaMetrics/VictoriaMetrics/apptest"
)
// TestSingleVlagentRemoteWrite performs tests for remote write data ingestion
// by vlagent application
func TestSingleVlagentRemoteWrite(t *testing.T) {
os.RemoveAll(t.Name())
tc := at.NewTestCase(t)
defer tc.Stop()
// test data ingestion into
const instance = "vlsingle"
const r1Port = "50425"
sutFlags := []string{
"-httpListenAddr=127.0.0.1:" + r1Port,
"-storageDataPath=" + tc.Dir() + "/" + instance,
"-retentionPeriod=100y",
}
sut := tc.MustStartVlsingle(instance, sutFlags)
remoteWriteURL := fmt.Sprintf("http://%s/internal/insert", sut.HTTPAddr())
vlagent := tc.MustStartDefaultVlagent([]string{remoteWriteURL})
vlagent.JSONLineWrite(t, []string{
`{"_msg":"ingest jsonline","_time": "2025-06-05T14:30:19.088007Z", "foo":"bar"}`,
`{"_msg":"ingest jsonline","_time": "2025-06-05T14:30:19.088007Z", "bar":"foo"}`,
}, at.QueryOptsLogs{})
sut.ForceFlush(t)
got := sut.LogsQLQuery(t, "ingest jsonline", at.QueryOptsLogs{})
wantLogLines := []string{
`{"_msg":"ingest jsonline","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","bar":"foo"}`,
`{"_msg":"ingest jsonline","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","foo":"bar"}`,
}
assertLogsQLResponseEqual(t, got, &at.LogsQLQueryResponse{LogLines: wantLogLines})
// stop log storage and check data buffering works correctly
tc.StopApp(instance)
// ingest some data vlagent must hold it in memory
vlagent.JSONLineWrite(t, []string{
`{"_msg":"ingest jsonline2","_time": "2025-06-05T14:30:19.088007Z", "foo":"bar"}`,
`{"_msg":"ingest jsonline2","_time": "2025-06-05T14:30:19.088007Z", "bar":"foo"}`,
}, at.QueryOptsLogs{})
vlagent.WaitQueueEmptyAfter(t, func() {
// start storage and check if buffered data correctly ingested
sut = tc.MustStartVlsingle(instance, sutFlags)
})
sut.ForceFlush(t)
got = sut.LogsQLQuery(t, "ingest jsonline2", at.QueryOptsLogs{})
wantLogLines = []string{
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","bar":"foo"}`,
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","foo":"bar"}`,
}
assertLogsQLResponseEqual(t, got, &at.LogsQLQueryResponse{LogLines: wantLogLines})
}
func TestSingleVlagentRemoteWriteReplication(t *testing.T) {
os.RemoveAll(t.Name())
tc := at.NewTestCase(t)
defer tc.Stop()
const (
instanceReplica0 = "vlsingle-0"
vlsinglePortR0 = "53541"
instanceReplica1 = "vlsingle-1"
vlsinglePortR1 = "53124"
vlagentInstance = "vlagent"
)
sutFlagsR0 := []string{
"-httpListenAddr=127.0.0.1:" + vlsinglePortR0,
"-storageDataPath=" + path.Join(tc.Dir(), instanceReplica0),
"-retentionPeriod=100y",
}
sutFlagsR1 := []string{
"-httpListenAddr=127.0.0.1:" + vlsinglePortR1,
"-storageDataPath=" + path.Join(tc.Dir(), instanceReplica1),
"-retentionPeriod=100y",
}
sutR0 := tc.MustStartVlsingle(instanceReplica0, sutFlagsR0)
sutR1 := tc.MustStartVlsingle(instanceReplica1, sutFlagsR1)
vlagentRemoteWriteURLs := []string{
fmt.Sprintf("http://%s/internal/insert", sutR0.HTTPAddr()),
fmt.Sprintf("http://%s/internal/insert", sutR1.HTTPAddr()),
}
vlagentFlags := []string{
"-remoteWrite.tmpDataPath=" + fmt.Sprintf("%s/%s-%d", os.TempDir(), vlagentInstance, time.Now().UnixNano()),
}
vlagent := tc.MustStartVlagent(vlagentInstance, vlagentRemoteWriteURLs, vlagentFlags)
// ingest data and check if it properly replicated to the vlsingles
vlagent.JSONLineWrite(t, []string{
`{"_msg":"ingest jsonline","_time": "2025-06-05T14:30:19.088007Z", "foo":"bar"}`,
`{"_msg":"ingest jsonline","_time": "2025-06-05T14:30:19.088007Z", "bar":"foo"}`,
}, at.QueryOptsLogs{})
wantLogLines := []string{
`{"_msg":"ingest jsonline","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","bar":"foo"}`,
`{"_msg":"ingest jsonline","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","foo":"bar"}`,
}
sutR0.ForceFlush(t)
gotR0 := sutR0.LogsQLQuery(t, "ingest jsonline", at.QueryOptsLogs{})
assertLogsQLResponseEqual(t, gotR0, &at.LogsQLQueryResponse{LogLines: wantLogLines})
sutR1.ForceFlush(t)
gotR1 := sutR1.LogsQLQuery(t, "ingest jsonline", at.QueryOptsLogs{})
assertLogsQLResponseEqual(t, gotR1, &at.LogsQLQueryResponse{LogLines: wantLogLines})
// stop log storage and check data buffering works correctly at vlagent
tc.StopApp(instanceReplica0)
// ingest some data vlagent must hold it in memory
vlagent.JSONLineWrite(t, []string{
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","bar":"foo"}`,
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","foo":"bar"}`,
}, at.QueryOptsLogs{})
// check alive storage received data
wantLogLines = []string{
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","bar":"foo"}`,
`{"_msg":"ingest jsonline2","_stream":"{}","_time":"2025-06-05T14:30:19.088007Z","foo":"bar"}`,
}
sutR1.ForceFlush(t)
gotR1 = sutR1.LogsQLQuery(t, "ingest jsonline2", at.QueryOptsLogs{})
assertLogsQLResponseEqual(t, gotR1, &at.LogsQLQueryResponse{LogLines: wantLogLines})
// stop vmagent, it must buffer data on-disk
tc.StopApp(vlagentInstance)
vlagent = tc.MustStartVlagent(vlagentInstance, vlagentRemoteWriteURLs, vlagentFlags)
vlagent.WaitQueueEmptyAfter(t, func() {
// start storage and check if buffered data correctly ingested
sutR0 = tc.MustStartVlsingle(instanceReplica0, sutFlagsR0)
})
sutR0.ForceFlush(t)
gotR0 = sutR0.LogsQLQuery(t, "ingest jsonline2", at.QueryOptsLogs{})
assertLogsQLResponseEqual(t, gotR0, &at.LogsQLQueryResponse{LogLines: wantLogLines})
}

159
apptest/vlagent.go Normal file
View File

@@ -0,0 +1,159 @@
package apptest
import (
"fmt"
"net/http"
"os"
"regexp"
"strings"
"testing"
"time"
)
// Vlagent holds the state of a vlagent app and provides vlagent-specific functions
type Vlagent struct {
*app
*ServesMetrics
remoteStoragesCount int
httpListenAddr string
}
// StartVlagent starts an instance of vlagent with the given flags.
// It also sets the default flags and populates the app instance state with runtime
// values extracted from the application log (such as httpListenAddr)
func StartVlagent(instance string, remoteWriteURLs []string, flags []string, cli *Client) (*Vlagent, error) {
extractREs := []*regexp.Regexp{
httpListenAddrRE,
}
app, stderrExtracts, err := startApp(instance, "../../bin/vlagent", flags, &appOptions{
defaultFlags: map[string]string{
"-httpListenAddr": "127.0.0.1:0",
"-remoteWrite.url": strings.Join(remoteWriteURLs, ","),
"-remoteWrite.tmpDataPath": fmt.Sprintf("%s/%s-%d", os.TempDir(), instance, time.Now().UnixNano()),
"-remoteWrite.flushInterval": "10ms",
"-remoteWrite.showURL": "true",
},
extractREs: extractREs,
})
if err != nil {
return nil, err
}
return &Vlagent{
app: app,
remoteStoragesCount: len(remoteWriteURLs),
ServesMetrics: &ServesMetrics{
metricsURL: fmt.Sprintf("http://%s/metrics", stderrExtracts[0]),
cli: cli,
},
httpListenAddr: stderrExtracts[0],
}, nil
}
// JSONLineWrite is a test helper function that inserts a
// collection of records in json line format by sending a HTTP
// POST request to /insert/jsonline vlagent endpoint.
//
// See https://docs.victoriametrics.com/victorialogs/data-ingestion/#json-stream-api
func (app *Vlagent) JSONLineWrite(t *testing.T, records []string, opts QueryOptsLogs) {
t.Helper()
data := []byte(strings.Join(records, "\n"))
url := fmt.Sprintf("http://%s/insert/jsonline", app.httpListenAddr)
uv := opts.asURLValues()
uvs := uv.Encode()
if len(uvs) > 0 {
url += "?" + uvs
}
app.sendBlocking(t, len(records), func() {
_, statusCode := app.cli.Post(t, url, "text/plain", data)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
})
}
// WaitQueueEmptyAfter checks that persistent queue is empty
// after execution of provided callback
func (app *Vlagent) WaitQueueEmptyAfter(t *testing.T, cb func()) {
t.Helper()
const (
retries = 70
period = 100 * time.Millisecond
)
// vlagent_remotewrite_blocks_sent_total
// take in account data replication
blocksSent := app.remoteWriteBlocksSent(t)
cb()
for range retries {
if app.remoteWriteBlocksSent(t) > blocksSent && app.persistentQueueSize(t) == 0 {
return
}
time.Sleep(period)
}
t.Fatalf("timed out while waiting for inserted logs to be flushed to remote storage")
}
// sendBlocking sends the data to remote write url by executing `send` function and
// waits until the data is actually sent.
//
// vlagent does not send the data immediately. It first puts the data into a
// buffer. Then a background goroutine takes the data from the buffer sends it
// to the vmstorage. This happens every 1s by default.
//
// Waiting is implemented a retrieving the value of `vlagent_remotewrite_block_size_rows_sum`
// metric and checking whether it is equal or greater than the wanted value.
// If it is, then the data has been sent to remote storage.
//
// Unreliable if the records are inserted concurrently.
func (app *Vlagent) sendBlocking(t *testing.T, numRecordsToSend int, send func()) {
t.Helper()
send()
const (
retries = 50
period = 100 * time.Millisecond
)
// take in account data replication
wantRowsSentCount := app.remoteWriteRowsPushed(t) + numRecordsToSend*app.remoteStoragesCount
for range retries {
if app.remoteWriteRowsPushed(t) >= wantRowsSentCount {
return
}
time.Sleep(period)
}
t.Fatalf("timed out while waiting for inserted rows to be sent to remote storage")
}
func (app *Vlagent) remoteWriteBlocksSent(t *testing.T) int {
total := 0.0
for _, v := range app.GetMetricsByPrefix(t, "vlagent_remotewrite_blocks_sent_total") {
total += v
}
return int(total)
}
func (app *Vlagent) remoteWriteRowsPushed(t *testing.T) int {
total := 0.0
// vlagent_remotewrite_blocks_sent_total
for _, v := range app.GetMetricsByPrefix(t, "vlagent_remotewrite_block_size_rows_sum") {
total += v
}
return int(total)
}
func (app *Vlagent) persistentQueueSize(t *testing.T) int {
total := 0.0
for _, v := range app.GetMetricsByPrefix(t, "vlagent_remotewrite_pending_data_bytes") {
total += v
}
for _, v := range app.GetMetricsByPrefix(t, "vlagent_remotewrite_pending_inmemory_blocks") {
total += v
}
return int(total)
}

View File

@@ -227,6 +227,25 @@ func (app *Vmselect) APIV1StatusTSDB(t *testing.T, matchQuery string, date strin
return status
}
// GraphiteMetricsIndex sends a query to a /graphite/metrics/index.json
//
// See https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#metrics-api
func (app *Vmselect) GraphiteMetricsIndex(t *testing.T, opts QueryOpts) GraphiteMetricsIndexResponse {
t.Helper()
seriesURL := fmt.Sprintf("http://%s/select/%s/graphite/metrics/index.json", app.httpListenAddr, opts.getTenant())
res, statusCode := app.cli.Get(t, seriesURL)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
var index GraphiteMetricsIndexResponse
if err := json.Unmarshal([]byte(res), &index); err != nil {
t.Fatalf("could not unmarshal metrics index response data:\n%s\n err: %v", res, err)
}
return index
}
// APIV1AdminTenants sends a query to a /admin/tenants endpoint
func (app *Vmselect) APIV1AdminTenants(t *testing.T) *AdminTenantsResponse {
t.Helper()

View File

@@ -318,6 +318,25 @@ func (app *Vmsingle) PrometheusAPIV1Series(t *testing.T, matchQuery string, opts
return NewPrometheusAPIV1SeriesResponse(t, res)
}
// GraphiteMetricsIndex sends a query to a /metrics/index.json
//
// See https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#metrics-api
func (app *Vmsingle) GraphiteMetricsIndex(t *testing.T, _ QueryOpts) GraphiteMetricsIndexResponse {
t.Helper()
seriesURL := fmt.Sprintf("http://%s/metrics/index.json", app.httpListenAddr)
res, statusCode := app.cli.Get(t, seriesURL)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
var index GraphiteMetricsIndexResponse
if err := json.Unmarshal([]byte(res), &index); err != nil {
t.Fatalf("could not unmarshal metrics index response data:\n%s\n err: %v", res, err)
}
return index
}
// APIV1StatusMetricNamesStats sends a query to a /api/v1/status/metric_names_stats endpoint
// and returns the statistics response for given params.
//

View File

@@ -5826,7 +5826,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 12,
"y": 2387
@@ -6110,7 +6110,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 0,
"y": 2395
@@ -7538,7 +7538,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the max number of concurrent selects across instances.\n* `max` limit can be configured via `search.maxConcurrentRequests` flag\n* `current` shows the current number of goroutines busy with processing requests\n\nWhen `current` hits `max` constantly, it means one or more vmselect nodes are overloaded with number of requests. If you observe that CPU for vmselects is saturated, consider adding more vmselect replicas or increase CPU resources. If CPU panel shows a plenty of free resources - try increasing `search.maxConcurrentRequests`.",
"description": "Shows the max number of concurrent selects across instances.\n* `max` limit can be configured via `search.maxConcurrentRequests` flag\n* `current` shows the current number of goroutines busy with processing requests\n\nWhen `current` hits `max` constantly, it means one or more vmselect nodes are overloaded with number of requests. If you observe that CPU for vmselects is saturated, consider adding more vmselect replicas or increase CPU resources. If CPU and Memory panels show a plenty of free resources - try increasing `-search.maxConcurrentRequests`. Please note, the higher is `-search.maxConcurrentRequests`, the higher could be [peak memory usage](https://docs.victoriametrics.com/victoriametrics/troubleshooting/#out-of-memory-errors).",
"fieldConfig": {
"defaults": {
"color": {
@@ -10053,7 +10053,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 0,
"y": 5014
@@ -10159,7 +10159,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 12,
"y": 5014
@@ -10687,4 +10687,4 @@
"uid": "oS7Bi_0Wz",
"version": 1,
"weekStart": ""
}
}

5138
dashboards/vlagent.json Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -5827,7 +5827,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 12,
"y": 2387
@@ -6111,7 +6111,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 0,
"y": 2395
@@ -7539,7 +7539,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the max number of concurrent selects across instances.\n* `max` limit can be configured via `search.maxConcurrentRequests` flag\n* `current` shows the current number of goroutines busy with processing requests\n\nWhen `current` hits `max` constantly, it means one or more vmselect nodes are overloaded with number of requests. If you observe that CPU for vmselects is saturated, consider adding more vmselect replicas or increase CPU resources. If CPU panel shows a plenty of free resources - try increasing `search.maxConcurrentRequests`.",
"description": "Shows the max number of concurrent selects across instances.\n* `max` limit can be configured via `search.maxConcurrentRequests` flag\n* `current` shows the current number of goroutines busy with processing requests\n\nWhen `current` hits `max` constantly, it means one or more vmselect nodes are overloaded with number of requests. If you observe that CPU for vmselects is saturated, consider adding more vmselect replicas or increase CPU resources. If CPU and Memory panels show a plenty of free resources - try increasing `-search.maxConcurrentRequests`. Please note, the higher is `-search.maxConcurrentRequests`, the higher could be [peak memory usage](https://docs.victoriametrics.com/victoriametrics/troubleshooting/#out-of-memory-errors).",
"fieldConfig": {
"defaults": {
"color": {
@@ -10054,7 +10054,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 0,
"y": 5014
@@ -10160,7 +10160,7 @@
"overrides": []
},
"gridPos": {
"h": 7,
"h": 8,
"w": 12,
"x": 12,
"y": 5014
@@ -10688,4 +10688,4 @@
"uid": "oS7Bi_0Wz_vm",
"version": 1,
"weekStart": ""
}
}

View File

@@ -3409,7 +3409,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max(10, sum(sum_over_time(scrape_series_added[5m])) by (job)) > 0",
"expr": "topk(10, sum(sum_over_time(scrape_series_added[5m])) by (job)) > 0",
"interval": "",
"legendFormat": "{{ job }}",
"range": true,

View File

@@ -115,13 +115,14 @@
"mappings": [
{
"options": {
"0": {
"match": "null",
"result": {
"color": "green",
"index": 0,
"text": "Ok"
}
},
"type": "value"
"type": "special"
},
{
"options": {
@@ -140,8 +141,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -173,17 +173,19 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(vmalert_config_last_reload_successful{job=~\"$job\", instance=~\"$instance\"} < 1 ) or 0",
"expr": "count(vmalert_config_last_reload_successful{job=~\"$job\", instance=~\"$instance\"} < 1 )",
"interval": "",
"legendFormat": "",
"range": true,
"refId": "A"
}
],
@@ -204,8 +206,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -237,7 +238,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -268,8 +269,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -301,7 +301,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -332,8 +332,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -369,17 +368,19 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "(sum(increase(vmalert_alerting_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])) or vector(0)) + \n(sum(increase(vmalert_recording_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])) or vector(0))",
"expr": "(sum(increase(vmalert_alerting_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval]))) + \n(sum(increase(vmalert_recording_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])))",
"interval": "",
"legendFormat": "",
"range": true,
"refId": "A"
}
],
@@ -394,14 +395,24 @@
"description": "Shows number of Recording Rules which produce no data.\n\n Usually it means that such rules are misconfigured, since they give no output during the evaluation.\nPlease check if rule's expression is correct and it is working as expected.",
"fieldConfig": {
"defaults": {
"mappings": [],
"mappings": [
{
"options": {
"match": "null",
"result": {
"index": 1,
"text": "0"
}
},
"type": "special"
}
],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -437,7 +448,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -446,7 +457,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "count(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\"} < 1) or 0",
"expr": "count(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\"} < 1)",
"interval": "",
"legendFormat": "",
"range": true,
@@ -479,8 +490,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -535,7 +545,7 @@
},
"showHeader": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -605,8 +615,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -642,7 +651,7 @@
"sort": "asc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -724,8 +733,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -763,7 +771,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -787,7 +795,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Top $topk groups by evaluation duration. Shows groups that take the most of time during the evaluation across all instances.\n\nThe panel uses MetricsQL functions and may not work with VictoriaMetrics.",
"description": "Top $topk groups by evaluation duration. Shows groups that take the most of time during the evaluation across all instances.",
"fieldConfig": {
"defaults": {
"color": {
@@ -831,8 +839,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -870,7 +877,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -879,7 +886,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, max(sum(\n rate(vmalert_iteration_duration_seconds_sum{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n/\n rate(vmalert_iteration_duration_seconds_count{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n) by(job, instance, group, file)) \nby(job, group, file))",
"expr": "topk($topk, max(sum(\n rate(vmalert_iteration_duration_seconds_sum{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n/\n rate(vmalert_iteration_duration_seconds_count{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n) by(job, instance, group, file)) \nby(job, group, file))",
"interval": "",
"legendFormat": "({{job}}) {{group}}({{file}})",
"range": true,
@@ -938,8 +945,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -975,7 +981,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -1043,8 +1049,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1080,7 +1085,7 @@
"sort": "none"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -1160,8 +1165,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1276,8 +1280,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1394,8 +1397,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1503,8 +1505,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1637,8 +1638,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1754,8 +1754,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -1876,8 +1875,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -1997,8 +1995,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2106,8 +2103,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2216,8 +2212,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2325,8 +2320,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2434,8 +2428,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -2856,7 +2849,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows top $topk current active (firing) alerting rules.\n\nThe panel uses MetricsQL functions and may not work with VictoriaMetrics.",
"description": "Shows top $topk current active (firing) alerting rules.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2943,7 +2936,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, sum(vmalert_alerts_firing{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, group, file, alertname) > 0)",
"expr": "topk($topk, sum(vmalert_alerts_firing{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, group, file, alertname) > 0)",
"interval": "",
"legendFormat": "({{job}}) {{group}}.{{alertname}}({{file}})",
"range": true,
@@ -3376,7 +3369,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the top $topk recording rules which generate the most of [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples). Each generated sample is basically a time series which then ingested into configured remote storage. Rules with high numbers may cause the most pressure on the remote database and become a source of too high cardinality.\n\nThe panel uses MetricsQL functions and may not work with VictoriaMetrics.",
"description": "Shows the top $topk recording rules which generate the most of [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples). Each generated sample is basically a time series which then ingested into configured remote storage. Rules with high numbers may cause the most pressure on the remote database and become a source of too high cardinality.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3463,7 +3456,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, \n max(\n sum(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, instance, group, file, recording) > 0\n ) by(job, group, file, recording)\n)",
"expr": "topk($topk, \n max(\n sum(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, instance, group, file, recording) > 0\n ) by(job, group, file, recording)\n)",
"interval": "",
"legendFormat": "({{job}}) {{group}}.{{recording}}({{file}})",
"range": true,
@@ -4084,7 +4077,7 @@
],
"preload": false,
"refresh": "",
"schemaVersion": 40,
"schemaVersion": 41,
"tags": [
"victoriametrics"
],
@@ -4228,10 +4221,10 @@
"baseFilters": [],
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
"uid": "${ds}"
},
"filters": [],
"name": "adhoc",
"name": "filter",
"type": "adhoc"
}
]
@@ -4244,6 +4237,5 @@
"timezone": "",
"title": "VictoriaMetrics - vmalert (VM)",
"uid": "LzldHAVnz_vm",
"version": 1,
"weekStart": ""
"version": 1
}

View File

@@ -3408,7 +3408,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max(10, sum(sum_over_time(scrape_series_added[5m])) by (job)) > 0",
"expr": "topk(10, sum(sum_over_time(scrape_series_added[5m])) by (job)) > 0",
"interval": "",
"legendFormat": "{{ job }}",
"range": true,

View File

@@ -114,13 +114,14 @@
"mappings": [
{
"options": {
"0": {
"match": "null",
"result": {
"color": "green",
"index": 0,
"text": "Ok"
}
},
"type": "value"
"type": "special"
},
{
"options": {
@@ -139,8 +140,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -172,17 +172,19 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(vmalert_config_last_reload_successful{job=~\"$job\", instance=~\"$instance\"} < 1 ) or 0",
"expr": "count(vmalert_config_last_reload_successful{job=~\"$job\", instance=~\"$instance\"} < 1 )",
"interval": "",
"legendFormat": "",
"range": true,
"refId": "A"
}
],
@@ -203,8 +205,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -236,7 +237,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -267,8 +268,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
}
@@ -300,7 +300,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -331,8 +331,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -368,17 +367,19 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "(sum(increase(vmalert_alerting_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])) or vector(0)) + \n(sum(increase(vmalert_recording_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])) or vector(0))",
"expr": "(sum(increase(vmalert_alerting_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval]))) + \n(sum(increase(vmalert_recording_rules_errors_total{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])))",
"interval": "",
"legendFormat": "",
"range": true,
"refId": "A"
}
],
@@ -393,14 +394,24 @@
"description": "Shows number of Recording Rules which produce no data.\n\n Usually it means that such rules are misconfigured, since they give no output during the evaluation.\nPlease check if rule's expression is correct and it is working as expected.",
"fieldConfig": {
"defaults": {
"mappings": [],
"mappings": [
{
"options": {
"match": "null",
"result": {
"index": 1,
"text": "0"
}
},
"type": "special"
}
],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -436,7 +447,7 @@
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -445,7 +456,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "count(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\"} < 1) or 0",
"expr": "count(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\"} < 1)",
"interval": "",
"legendFormat": "",
"range": true,
@@ -478,8 +489,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -534,7 +544,7 @@
},
"showHeader": true
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -604,8 +614,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -641,7 +650,7 @@
"sort": "asc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -723,8 +732,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -762,7 +770,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -786,7 +794,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Top $topk groups by evaluation duration. Shows groups that take the most of time during the evaluation across all instances.\n\nThe panel uses MetricsQL functions and may not work with Prometheus.",
"description": "Top $topk groups by evaluation duration. Shows groups that take the most of time during the evaluation across all instances.",
"fieldConfig": {
"defaults": {
"color": {
@@ -830,8 +838,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -869,7 +876,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -878,7 +885,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, max(sum(\n rate(vmalert_iteration_duration_seconds_sum{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n/\n rate(vmalert_iteration_duration_seconds_count{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n) by(job, instance, group, file)) \nby(job, group, file))",
"expr": "topk($topk, max(sum(\n rate(vmalert_iteration_duration_seconds_sum{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n/\n rate(vmalert_iteration_duration_seconds_count{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}[$__rate_interval])\n) by(job, instance, group, file)) \nby(job, group, file))",
"interval": "",
"legendFormat": "({{job}}) {{group}}({{file}})",
"range": true,
@@ -937,8 +944,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -974,7 +980,7 @@
"sort": "desc"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -1042,8 +1048,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1079,7 +1084,7 @@
"sort": "none"
}
},
"pluginVersion": "11.5.0",
"pluginVersion": "12.0.2",
"targets": [
{
"datasource": {
@@ -1159,8 +1164,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1275,8 +1279,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1393,8 +1396,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1502,8 +1504,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1636,8 +1637,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -1753,8 +1753,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -1875,8 +1874,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -1996,8 +1994,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2105,8 +2102,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2215,8 +2211,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2324,8 +2319,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
},
{
"color": "red",
@@ -2433,8 +2427,7 @@
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
"color": "green"
}
]
},
@@ -2855,7 +2848,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows top $topk current active (firing) alerting rules.\n\nThe panel uses MetricsQL functions and may not work with Prometheus.",
"description": "Shows top $topk current active (firing) alerting rules.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2942,7 +2935,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, sum(vmalert_alerts_firing{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, group, file, alertname) > 0)",
"expr": "topk($topk, sum(vmalert_alerts_firing{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, group, file, alertname) > 0)",
"interval": "",
"legendFormat": "({{job}}) {{group}}.{{alertname}}({{file}})",
"range": true,
@@ -3375,7 +3368,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the top $topk recording rules which generate the most of [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples). Each generated sample is basically a time series which then ingested into configured remote storage. Rules with high numbers may cause the most pressure on the remote database and become a source of too high cardinality.\n\nThe panel uses MetricsQL functions and may not work with Prometheus.",
"description": "Shows the top $topk recording rules which generate the most of [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples). Each generated sample is basically a time series which then ingested into configured remote storage. Rules with high numbers may cause the most pressure on the remote database and become a source of too high cardinality.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3462,7 +3455,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "topk_max($topk, \n max(\n sum(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, instance, group, file, recording) > 0\n ) by(job, group, file, recording)\n)",
"expr": "topk($topk, \n max(\n sum(vmalert_recording_rules_last_evaluation_samples{job=~\"$job\", instance=~\"$instance\", group=~\"$group\", file=~\"$file\"}) by(job, instance, group, file, recording) > 0\n ) by(job, group, file, recording)\n)",
"interval": "",
"legendFormat": "({{job}}) {{group}}.{{recording}}({{file}})",
"range": true,
@@ -4083,7 +4076,7 @@
],
"preload": false,
"refresh": "",
"schemaVersion": 40,
"schemaVersion": 41,
"tags": [
"victoriametrics"
],
@@ -4227,10 +4220,10 @@
"baseFilters": [],
"datasource": {
"type": "prometheus",
"uid": "$ds"
"uid": "${ds}"
},
"filters": [],
"name": "adhoc",
"name": "filter",
"type": "adhoc"
}
]
@@ -4243,6 +4236,5 @@
"timezone": "",
"title": "VictoriaMetrics - vmalert",
"uid": "LzldHAVnz",
"version": 1,
"weekStart": ""
"version": 1
}

View File

@@ -95,7 +95,7 @@ publish-via-docker:
--label "org.opencontainers.image.version=$(PKG_TAG)" \
--label "org.opencontainers.image.created=$(shell date -u +"%Y-%m-%dT%H:%M:%SZ")" \
$(foreach registry,$(DOCKER_REGISTRIES),\
--tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(RACE)$(EXTRA_TAG_SUFFIX) \
--tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(RACE)$(EXTRA_DOCKER_TAG_SUFFIX) \
) \
-o type=image \
--provenance=false \
@@ -115,7 +115,7 @@ publish-via-docker:
--label "org.opencontainers.image.version=$(PKG_TAG)" \
--label "org.opencontainers.image.created=$(shell date -u +"%Y-%m-%dT%H:%M:%SZ")" \
$(foreach registry,$(DOCKER_REGISTRIES),\
--tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(RACE)$(EXTRA_TAG_SUFFIX)-scratch \
--tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(RACE)$(EXTRA_DOCKER_TAG_SUFFIX)-scratch \
) \
-o type=image \
--provenance=false \
@@ -129,6 +129,14 @@ publish-via-docker:
$(APP_NAME)-linux-ppc64le-prod \
$(APP_NAME)-linux-386-prod
publish-via-docker-from-rc:
$(foreach registry,$(DOCKER_REGISTRIES),\
docker buildx imagetools create --tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG) $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(EXTRA_DOCKER_TAG_SUFFIX); \
)
$(foreach registry,$(DOCKER_REGISTRIES),\
docker buildx imagetools create --tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)-scratch $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(EXTRA_DOCKER_TAG_SUFFIX)-scratch; \
)
publish-via-docker-latest:
$(foreach registry,$(DOCKER_REGISTRIES),\
docker buildx imagetools create --tag $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):latest $(registry)/$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG); \

View File

@@ -21,6 +21,7 @@ to the Internet.
* [alertmanager](#alertmanager)
* [Grafana](#grafana)
* [Alerts](#alerts)
* [Troubleshooting](#troubleshooting)
## VictoriaMetrics single server
@@ -56,6 +57,8 @@ To shutdown environment run:
make docker-vm-single-down
```
See [troubleshooting](#troubleshooting) in case if issues.
## VictoriaMetrics cluster
To spin-up environment with VictoriaMetrics cluster run the following command:
@@ -92,6 +95,8 @@ To shutdown environment execute the following command:
make docker-vm-cluster-down
```
See [troubleshooting](#troubleshooting) in case if issues.
## vmagent
vmagent is used for scraping and pushing time series to VictoriaMetrics instance.
@@ -135,6 +140,8 @@ To shutdown environment execute the following command:
make docker-vl-single-down
```
See [troubleshooting](#troubleshooting) in case if issues.
## VictoriaLogs cluster
To spin-up environment with VictoriaLogs cluster run the following command:
@@ -176,6 +183,8 @@ To shutdown environment execute the following command:
make docker-vl-cluster-down
```
See [troubleshooting](#troubleshooting) in case if issues.
Please see more examples on integration of VictoriaLogs with other log shippers below:
* [filebeat](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/filebeat)
* [fluentbit](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/fluentbit)
@@ -249,3 +258,34 @@ The list of alerting rules is the following:
Please, also see [how to monitor VictoriaMetrics installations](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
and [how to monitor VictoriaLogs installations](https://docs.victoriametrics.com/victorialogs/#monitoring).
## Troubleshooting
This environment has the following requirements:
* installed [docker compose](https://docs.docker.com/compose/);
* access to the Internet for downloading docker images;
* **All commands should be executed from the root directory of [the VictoriaMetrics repo](https://github.com/VictoriaMetrics/VictoriaMetrics).**
The expected output of running a command like `make docker-vm-single-up` is the following:
```sh
make docker-vm-single-up :(
docker compose -f deployment/docker/compose-vm-single.yml up -d
[+] Running 9/9
✔ Network docker_default Created 0.0s
✔ Volume "docker_vmagentdata" Created 0.0s
✔ Container docker-alertmanager-1 Started 0.3s
✔ Container docker-victoriametrics-1 Started 0.3s
...
```
Containers are started in [--detach mode](https://docs.docker.com/reference/cli/docker/compose/up/), meaning they run in the background.
As a result, you won't see their logs or exit status directly in the terminal.
If something isnt working as expected, try the following troubleshooting steps:
1. Run from the correct directory. Make sure you're running the command from the root of the [VictoriaMetrics repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
2. Check container status. Run `docker ps -a` to list all containers and their status. Healthy and running containers should have `STATUS` set to `Up`.
3. View container logs. To inspect logs for a specific container, get its container ID from step p2 and run: `docker logs -f <containerID>`.
4. Read the logs carefully and follow any suggested actions.
5. Check for port conflicts. Some containers (e.g., Grafana) expose HTTP ports. If a port (like `:3000`) is already in use, the container may fail to start. Stop the conflicting process or change the exposed port in the Docker Compose file.
6. Shut down the deployment. To tear down the environment, run: `make <environment>-down` (i.e. `make docker-vm-single-down`).
Note, this command also removes all attached volumes, so all the temporary created data will be removed too (i.e. Grafana dashboards or collected metrics).

View File

@@ -1,14 +1,17 @@
# balance load among vmselects
# see https://docs.victoriametrics.com/victoriametrics/vmauth/#load-balancing
unauthorized_user:
url_map:
- src_paths:
- "/select/.*"
url_prefix:
- http://vmselect-1:8481
- http://vmselect-2:8481
- src_paths:
- "/insert/.*"
url_prefix:
- http://vminsert-1:8480
- http://vminsert-2:8480
users:
- username: "foo"
password: "bar"
url_map:
- src_paths:
- "/select/.*"
- "/admin/.*"
url_prefix:
- http://vmselect-1:8481
- http://vmselect-2:8481
- src_paths:
- "/insert/.*"
url_prefix:
- http://vminsert-1:8480
- http://vminsert-2:8480

View File

@@ -1,7 +1,7 @@
services:
# Grafana instance configured with VictoriaLogs as datasource
grafana:
image: grafana/grafana:11.5.0
image: grafana/grafana:12.0.2
depends_on:
- "victoriametrics"
- "vmauth"
@@ -68,7 +68,7 @@ services:
# VictoriaMetrics instance, a single process responsible for
# scraping, storing metrics and serve read requests.
victoriametrics:
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
volumes:
- vmdata:/storage
- ./prometheus-vl-cluster.yml:/etc/prometheus/prometheus.yml
@@ -81,7 +81,7 @@ services:
# It proxies query requests from vmalert to either VictoriaMetrics or VictoriaLogs,
# depending on the requested path.
vmauth:
image: victoriametrics/vmauth:v1.119.0
image: victoriametrics/vmauth:v1.120.0
depends_on:
- "victoriametrics"
- "vlselect-1"
@@ -97,7 +97,7 @@ services:
# vmalert executes alerting and recording rules according to given rule type.
vmalert:
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "vmauth"
- "alertmanager"

View File

@@ -1,7 +1,7 @@
services:
# Grafana instance configured with VictoriaLogs as datasource
grafana:
image: grafana/grafana:11.5.0
image: grafana/grafana:12.0.2
depends_on:
- "victoriametrics"
- "victorialogs"
@@ -49,7 +49,7 @@ services:
# VictoriaMetrics instance, a single process responsible for
# scraping, storing metrics and serve read requests.
victoriametrics:
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
ports:
- "8428:8428"
volumes:
@@ -64,7 +64,7 @@ services:
# It proxies query requests from vmalert to either VictoriaMetrics or VictoriaLogs,
# depending on the requested path.
vmauth:
image: victoriametrics/vmauth:v1.119.0
image: victoriametrics/vmauth:v1.120.0
depends_on:
- "victoriametrics"
- "victorialogs"
@@ -78,7 +78,7 @@ services:
# vmalert executes alerting and recording rules according to the given rule type.
vmalert:
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "vmauth"
- "alertmanager"

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
depends_on:
- "vmauth"
ports:
@@ -17,7 +17,7 @@ services:
restart: always
grafana:
image: grafana/grafana:11.5.0
image: grafana/grafana:12.0.2
depends_on:
- "vmauth"
ports:
@@ -35,14 +35,14 @@ services:
# vmstorage shards. Each shard receives 1/N of all metrics sent to vminserts,
# where N is number of vmstorages (2 in this case).
vmstorage-1:
image: victoriametrics/vmstorage:v1.119.0-cluster
image: victoriametrics/vmstorage:v1.120.0-cluster
volumes:
- strgdata-1:/storage
command:
- "--storageDataPath=/storage"
restart: always
vmstorage-2:
image: victoriametrics/vmstorage:v1.119.0-cluster
image: victoriametrics/vmstorage:v1.120.0-cluster
volumes:
- strgdata-2:/storage
command:
@@ -52,7 +52,7 @@ services:
# vminsert is ingestion frontend. It receives metrics pushed by vmagent,
# pre-process them and distributes across configured vmstorage shards.
vminsert-1:
image: victoriametrics/vminsert:v1.119.0-cluster
image: victoriametrics/vminsert:v1.120.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -61,7 +61,7 @@ services:
- "--storageNode=vmstorage-2:8400"
restart: always
vminsert-2:
image: victoriametrics/vminsert:v1.119.0-cluster
image: victoriametrics/vminsert:v1.120.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -73,7 +73,7 @@ services:
# vmselect is a query fronted. It serves read queries in MetricsQL or PromQL.
# vmselect collects results from configured `--storageNode` shards.
vmselect-1:
image: victoriametrics/vmselect:v1.119.0-cluster
image: victoriametrics/vmselect:v1.120.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -83,7 +83,7 @@ services:
- "--vmalert.proxyURL=http://vmalert:8880"
restart: always
vmselect-2:
image: victoriametrics/vmselect:v1.119.0-cluster
image: victoriametrics/vmselect:v1.120.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -98,7 +98,7 @@ services:
# read requests from Grafana, vmui, vmalert among vmselects.
# It can be used as an authentication proxy.
vmauth:
image: victoriametrics/vmauth:v1.119.0
image: victoriametrics/vmauth:v1.120.0
depends_on:
- "vmselect-1"
- "vmselect-2"
@@ -112,7 +112,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "vmauth"
ports:

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
depends_on:
- "victoriametrics"
ports:
@@ -18,7 +18,7 @@ services:
# VictoriaMetrics instance, a single process responsible for
# storing metrics and serve read requests.
victoriametrics:
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
ports:
- 8428:8428
- 8089:8089
@@ -38,7 +38,7 @@ services:
restart: always
grafana:
image: grafana/grafana:11.5.0
image: grafana/grafana:12.0.2
depends_on:
- "victoriametrics"
ports:
@@ -54,7 +54,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "victoriametrics"
- "alertmanager"

View File

@@ -19,7 +19,7 @@ services:
retries: 10
dd-proxy:
image: docker.io/victoriametrics/vmauth:v1.119.0
image: docker.io/victoriametrics/vmauth:v1.120.0
restart: on-failure
volumes:
- ./:/etc/vmauth
@@ -44,6 +44,18 @@ services:
deploy:
replicas: 0
# vlagent is needed for HA setup and its replica count is set to 1 in compose-ha.yml file
vlagent:
image: victoriametrics/vlagent:v0.0.1
volumes:
- vlagent:/vlagent
command:
- '--remoteWrite.tmpDataPath=/vlagent'
- '--remoteWrite.url=http://victorialogs:9428/internal/insert'
- '--remoteWrite.url=http://victorialogs-2:9428/internal/insert'
deploy:
replicas: 0
victoriametrics:
image: victoriametrics/victoria-metrics:v1.112.0
ports:
@@ -63,3 +75,4 @@ volumes:
victorialogs:
victorialogs-2:
victoriametrics:
vlagent:

View File

@@ -2,3 +2,6 @@ services:
victorialogs-2:
deploy:
replicas: 1
vlagent:
deploy:
replicas: 1

View File

@@ -32,20 +32,8 @@
[OUTPUT]
Name http
Match *
host victorialogs
port 9428
compress gzip
uri /insert/jsonline?_stream_fields=stream,path&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
header AccountID 0
header ProjectID 0
[OUTPUT]
Name http
Match *
host victorialogs-2
port 9428
host vlagent
port 9429
compress gzip
uri /insert/jsonline?_stream_fields=stream,path&_msg_field=log&_time_field=date
format json_lines

View File

@@ -13,12 +13,7 @@ input {
output {
http {
url => "http://victorialogs:9428/insert/jsonline?_stream_fields=host.name,stream&_msg_field=log&_time_field=time"
format => "json"
http_method => "post"
}
http {
url => "http://victorialogs-2:9428/insert/jsonline?_stream_fields=host.name,stream&_msg_field=log&_time_field=time"
url => "http://vlagent:9429/insert/jsonline?_stream_fields=host.name,stream&_msg_field=log&_time_field=time"
format => "json"
http_method => "post"
}

View File

@@ -1,8 +1,7 @@
exporters:
elasticsearch:
endpoints:
- http://victorialogs:9428/insert/elasticsearch
- http://victorialogs-2:9428/insert/elasticsearch
- http://vlagent:9429/insert/elasticsearch
receivers:
filelog:
include: [/var/lib/docker/containers/**/*.log]

View File

@@ -18,26 +18,7 @@ sinks:
type: http
inputs:
- parser
uri: http://victorialogs:9428/insert/jsonline
encoding:
codec: json
framing:
method: newline_delimited
compression: gzip
healthcheck:
enabled: false
request:
headers:
AccountID: '0'
ProjectID: '0'
VL-Stream-Fields: source_type,host,container_name,label.com.docker.compose.service
VL-Msg-Field: message.msg
VL-Time-Field: timestamp
vlogs-2:
type: http
inputs:
- parser
uri: http://victorialogs-2:9428/insert/jsonline
uri: http://vlagent:9429/insert/jsonline
encoding:
codec: json
framing:

View File

@@ -1,6 +1,6 @@
services:
vmagent:
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
depends_on:
- "victoriametrics"
ports:
@@ -14,7 +14,7 @@ services:
restart: always
victoriametrics:
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
ports:
- 8428:8428
volumes:
@@ -27,7 +27,7 @@ services:
restart: always
grafana:
image: grafana/grafana-oss:10.2.1
image: grafana/grafana:12.0.2
depends_on:
- "victoriametrics"
ports:
@@ -40,7 +40,7 @@ services:
restart: always
vmalert:
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "victoriametrics"
ports:

View File

@@ -58,7 +58,7 @@ services:
- ./vmsingle/promscrape.yml:/promscrape.yml
grafana:
image: grafana/grafana:11.5.0
image: grafana/grafana:12.0.2
depends_on: [vmsingle]
ports:
- 3000:3000

View File

@@ -126,7 +126,7 @@ Metrics to save the output (in metric names or labels). Must have `__name__` key
</td>
<td>
`/api/v1/import`
`/api/v1/import`{{% deprecated_from "v1.19.2" anomaly %}}
</td>
<td>

View File

@@ -2,9 +2,9 @@
- To use *vmanomaly*, part of the enterprise package, a license key is required. Obtain your key [here](https://victoriametrics.com/products/enterprise/trial/) for this tutorial or for enterprise use.
- In the tutorial, we'll be using the following VictoriaMetrics components:
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.119.0)
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.119.0)
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.119.0)
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.120.0)
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.120.0)
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.120.0)
- [Grafana](https://grafana.com/) (v.10.2.1)
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/)
- [Node exporter](https://github.com/prometheus/node_exporter#node-exporter) (v1.7.0) and [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (v0.27.0)
@@ -315,7 +315,7 @@ Let's wrap it all up together into the `docker-compose.yml` file.
services:
vmagent:
container_name: vmagent
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
depends_on:
- "victoriametrics"
ports:
@@ -332,7 +332,7 @@ services:
victoriametrics:
container_name: victoriametrics
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
ports:
- 8428:8428
volumes:
@@ -365,7 +365,7 @@ services:
vmalert:
container_name: vmalert
image: victoriametrics/vmalert:v1.119.0
image: victoriametrics/vmalert:v1.120.0
depends_on:
- "victoriametrics"
ports:

View File

@@ -241,27 +241,27 @@ services:
- grafana_data:/var/lib/grafana/
vmsingle:
image: victoriametrics/victoria-metrics:v1.119.0
image: victoriametrics/victoria-metrics:v1.120.0
command:
- -httpListenAddr=0.0.0.0:8429
vmstorage:
image: victoriametrics/vmstorage:v1.119.0-cluster
image: victoriametrics/vmstorage:v1.120.0-cluster
vminsert:
image: victoriametrics/vminsert:v1.119.0-cluster
image: victoriametrics/vminsert:v1.120.0-cluster
command:
- -storageNode=vmstorage:8400
- -httpListenAddr=0.0.0.0:8480
vmselect:
image: victoriametrics/vmselect:v1.119.0-cluster
image: victoriametrics/vmselect:v1.120.0-cluster
command:
- -storageNode=vmstorage:8401
- -httpListenAddr=0.0.0.0:8481
vmagent:
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
command:
@@ -270,7 +270,7 @@ services:
- -remoteWrite.url=http://vmsingle:8429/api/v1/write
vmgateway-cluster:
image: victoriametrics/vmgateway:v1.119.0-enterprise
image: victoriametrics/vmgateway:v1.120.0-enterprise
ports:
- 8431:8431
volumes:
@@ -286,7 +286,7 @@ services:
- -auth.oidcDiscoveryEndpoints=http://keycloak:8080/realms/master/.well-known/openid-configuration
vmgateway-single:
image: victoriametrics/vmgateway:v1.119.0-enterprise
image: victoriametrics/vmgateway:v1.120.0-enterprise
ports:
- 8432:8431
volumes:
@@ -397,7 +397,7 @@ Once iDP configuration is done, vmagent configuration needs to be updated to use
```yaml
vmagent:
image: victoriametrics/vmagent:v1.119.0
image: victoriametrics/vmagent:v1.120.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
- ./vmagent-client-secret:/etc/vmagent/oauth2-client-secret

View File

@@ -209,8 +209,8 @@ Migrating data from other databases to VictoriaMetrics is as simple as importing
[supported ingestion formats](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#push-model).
But migration from InfluxDB might get easier with [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/). See more about
migrating [from InfluxDB v1.x versions](https://docs.victoriametrics.com/victoriametrics/vmctl/#migrating-data-from-influxdb-1x).
Migrating data from InfluxDB v2.x is not supported. But there is a useful [3rd party solution](https://docs.victoriametrics.com/victoriametrics/vmctl/#migrating-data-from-influxdb-2x)
migrating [from InfluxDB v1.x versions](https://docs.victoriametrics.com/victoriametrics/vmctl/influxdb/).
Migrating data from InfluxDB v2.x is not supported. But there is a useful [3rd party solution](https://docs.victoriametrics.com/victoriametrics/vmctl/influxdb/#influxdb-v2)
for this.
Please note, data migration is a backfilling process, so read about [backfilling tips](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#backfilling).

View File

@@ -75,4 +75,3 @@ Additional context
### What more can we do?
Setup vmagents in Ground Control regions. That allows it to accept data close to storage and add more reliability if storage is temporarily offline.
g

View File

@@ -18,6 +18,9 @@ according to [these docs](https://docs.victoriametrics.com/victorialogs/quicksta
## tip
* BUGFIX: [`rate_sum` stats function](https://docs.victoriametrics.com/victorialogs/logsql/#rate_sum-stats): fix inconsistent per-second rate calculation when time filters are specified via HTTP query parameters instead of LogsQL expression. This affects recording rule results. See [#9303](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9303).
* BUGFIX: [web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui): disabled opening of autocomplete popup on initial page load.
## [v1.24.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.24.0-victorialogs)
Released at 2025-06-20
@@ -59,6 +62,7 @@ Released at 2025-06-20
* BUGFIX: [web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui): fix issue with hits chart ignoring selected AccountID and ProjectID. See [#9157](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9157).
* BUGFIX: [web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui): fix missing field values in auto-complete. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8749)
* BUGFIX: [web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui): remove the compact mode of the table tab and add field sorting capabilities to the JSON tab. See [#7047](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7047).
* BUGFIX: [web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui): fix errors in console about loading of `manifest.json` when accessing UI through vmauth with Basic Auth enabled.
* BUGFIX: [Journald data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/): properly read log timestamp from `__REALTIME_TIMESTAMP` field according to [the docs](https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/#time-field). See [#9144](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9144). The bug has been introduced in [v1.22.0-victorialogs](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.22.0-victorialogs).
* BUGFIX: [data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/): support `-` as a timestamp value, as described in [RFC5424](https://datatracker.ietf.org/doc/html/rfc5424#section-6.2.3).
* BUGFIX: [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/): properly handle quotes inside quoted strings such as `"\""`. Previously this could lead to panics. See [#9219](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9219).
@@ -570,7 +574,7 @@ Released at 2024-09-30
Released at 2024-09-29
* FEATURE: [data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/): accept Unix timestamps in seconds in the ingested logs. This simplifies integration with systems, which prefer Unix timestamps over text-based representation of time.
* FEATURE: [`sort` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe): allow using `order` alias instead of `sort`. For example, `_time:5s | order by (_time)` query works the same as `_time:5s | sort by (_time)`. This simplifies the to [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/) transition from SQL-like query languages.
* FEATURE: [`sort` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe): allow using `order` alias instead of `sort`. For example, `_time:5s | order by (_time)` query works the same as `_time:5s | sort by (_time)`. This simplifies [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/) transition from SQL-like query languages.
* FEATURE: [`stats` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe): allow using multiple identical [stats functions](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe-functions) with distinct [filters](https://docs.victoriametrics.com/victorialogs/logsql/#stats-with-additional-filters) and automatically generated result names. For example, `_time:5m | count(), count() if (error)` query works as expected now, e.g. it returns two results over the last 5 minutes: the total number of logs and the number of logs with `error` [word](https://docs.victoriametrics.com/victorialogs/logsql/#word). Previously this query couldn't be executed because the `if (...)` condition wasn't included in the automatically generate result name, so both results had the same name - `count(*)`.
* BUGFIX: properly calculate [`uniq`](https://docs.victoriametrics.com/victorialogs/logsql/#uniq-pipe) and [`top`](https://docs.victoriametrics.com/victorialogs/logsql/#top-pipe) pipes. Previously they could return invalid results in some cases.
@@ -650,7 +654,7 @@ Released at 2024-07-05
Released at 2024-07-02
* FEATURE: add `-syslog.useLocalTimestamp.tcp` and `-syslog.useLocalTimestamp.udp` command-line flags, which could be used for using the local timestamp as [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) for the logs ingested via the corresponding `-syslog.listenAddr.tcp` / `-syslog.listenAddr.udp`. By default the timestamp from the syslog message is used as [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field). See [these docs](https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/).
* FEATURE: add `-syslog.useLocalTimestamp.tcp` and `-syslog.useLocalTimestamp.udp` command-line flags, which could be used for using the local timestamp as [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) for the logs ingested via the corresponding `-syslog.listenAddr.tcp` / `-syslog.listenAddr.udp`. By default, the timestamp from the syslog message is used as [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field). See [these docs](https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/).
* BUGFIX: make slowly ingested logs visible for search as soon as they are ingested into VictoriaLogs. Previously slowly ingested logs could remain invisible for search for long time.

View File

@@ -207,7 +207,7 @@ via `-insert.maxLineSizeBytes` command-line flag.
VictoriaLogs limits [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) name length to 128 bytes -
Log entries with longer field names are ignored during [data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/).
The maximum length of a field name is hardcoded and is unikely to increase, since this may increase RAM and CPU usage.
The maximum length of a field name is hardcoded and is unlikely to increase, since this may increase RAM and CPU usage.
## How many fields a single log entry may contain
@@ -243,7 +243,7 @@ is returned in the `total_bytes` field.
If you use [VictoriaLogs web UI](https://docs.victoriametrics.com/victorialogs/querying/#web-ui)
or [Grafana plugin for VictoriaLogs](https://docs.victoriametrics.com/victorialogs/victorialogs-datasource/),
then make sure the selected time range covers the last day. Otherwise the query above returns
then make sure the selected time range covers the last day. Otherwise, the query above returns
results on the intersection of the last day and the selected time range.
See [why the log field occupies a lot of disk space](#why-the-log-field-occupies-a-lot-of-disk-space).
@@ -334,7 +334,7 @@ The query works in the following way:
The needed storage space depends on the following factors:
- Data compressibility. VictoraLogs compresses the ingested logs before storing them to disk. The compression ratio depends on the "randomness" of the ingested logs.
- Data compressibility. VictoriaLogs compresses the ingested logs before storing them to disk. The compression ratio depends on the "randomness" of the ingested logs.
Less "random" logs with many repeated field values and small differences between log messages compress the best (up to 100x and more).
More "random" logs with many unique field values may have very low compression rate.

View File

@@ -1319,7 +1319,7 @@ This query matches the following log messages, since their length is in the requ
This query doesn't match the following log messages:
- `foo`, since it is too short
- `foo bar baz abc`, sinc it is too long
- `foo bar baz abc`, since it is too long
It is possible to use `inf` as the upper bound. For example, the following query matches [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the length bigger or equal to 5 chars:
@@ -3362,7 +3362,7 @@ It understands the following Syslog formats:
The following fields are unpacked:
- `level` - optained from `PRI`.
- `level` - obtained from `PRI`.
- `priority` - obtained from `PRI`.
- `facility` - calculated as `PRI / 8`.
- `facility_keyword` - string representation of the `facility` field according to [these docs](https://en.wikipedia.org/wiki/Syslog#Facility).
@@ -3497,7 +3497,7 @@ See also:
#### Conditional unroll
If the [`unroll` pipe](#unroll-pipe) must be applied only to some [log enties](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model),
If the [`unroll` pipe](#unroll-pipe) must be applied only to some [log entries](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model),
then add `if (<filters>)` after `unroll`.
The `<filters>` can contain arbitrary [filters](#filters). For example, the following query unrolls `value` field only if `value_type` field equals to `json_array`:
@@ -3534,7 +3534,7 @@ LogsQL supports the following functions for [`stats` pipe](#stats-pipe):
`avg(field1, ..., fieldN)` [stats pipe function](#stats-pipe-functions) calculates the average value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
Non-numeric values are ignored. If all the values are non-numeric, then `NaN` is returned.
For example, the following query returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
@@ -3972,6 +3972,7 @@ See also:
`sum(field1, ..., fieldN)` [stats pipe function](#stats-pipe-functions) calculates the sum of numeric values across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are skipped. If all the values across `field1`, ..., `fieldN` are non-numeric, then `NaN` is returned.
For example, the following query returns the sum of numeric values for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
@@ -4127,7 +4128,7 @@ LogsQL supports the following string literals:
- `"double quoted"`. Double quote and backslash inside such a string must be escaped with `\`: `"escape\"doublequote and \\ backslash"`.
Double-quoted strings may contain special sequences such as `\n`, `\t`, `\f`, `\x8c`, etc. They are decoded according to [these docs](https://go.dev/ref/spec#String_literals).
- `'single quoted'`. Single quote and backsliash inside such a string must be escaped with `\`: `'escape\'singlequote and \\ backslash'`.
- `'single quoted'`. Single quote and backslash inside such a string must be escaped with `\`: `'escape\'singlequote and \\ backslash'`.
- ``` `backtick quoted` ```. Strings with backslashes, double quotes and single quotes shouldn't be escaped inside backtick-quoted strings.
## Comments
@@ -4136,10 +4137,10 @@ LogsQL query may contain comments at any place. The comment starts with `#` and
Example query with comments:
```logsql
error # find logs with `error` word
| stats by (_stream) logs # then count the number of logs per `_stream` label
| sort by (logs) desc # then sort by the found logs in descending order
| limit 5 # and show top 5 streams with the biggest number of logs
error # find logs with `error` word
| stats by (_stream) count() logs # then count the number of logs per `_stream` label
| sort by (logs) desc # then sort by the found logs in descending order
| limit 5 # and show top 5 streams with the biggest number of logs
```
## Numeric values

View File

@@ -31,41 +31,126 @@ See [quick start guide](#quick-start) on how to start working with VictoriaLogs
## Architecture
VictoriaLogs in cluster mode consists of `vlinsert`, `vlselect` and `vlstorage` components:
VictoriaLogs in cluster mode is composed of three main components: `vlinsert`, `vlselect`, and `vlstorage`.
- `vlinsert` accepts the ingested logs via [all the supported data ingestion protocols](https://docs.victoriametrics.com/victorialogs/data-ingestion/)
and spreads them evenly among the `vlstorage` nodes listed via the `-storageNode` command-line flag.
```mermaid
sequenceDiagram
participant LS as Log Sources
participant VI as vlinsert
participant VS1 as vlstorage-1
participant VS2 as vlstorage-2
participant VL as vlselect
participant QC as Query Client
Note over LS,VS2: Log Ingestion Flow
LS->>VI: Send logs via supported protocols
VI->>VS1: POST /internal/insert (HTTP)
VI->>VS2: POST /internal/insert (HTTP)
Note right of VI: Distributes logs evenly<br/>across vlstorage nodes
Note over VS1,QC: Query Flow
QC->>VL: Query via HTTP endpoints
VL->>VS1: GET /internal/select/* (HTTP)
VL->>VS2: GET /internal/select/* (HTTP)
VS1-->>VL: Return local results
VS2-->>VL: Return local results
VL->>QC: Processed & aggregated results
```
- `vlselect` accepts incoming queries via [all the supported HTTP querying endpoints](https://docs.victoriametrics.com/victorialogs/querying/),
requests the needed data from `vlstorage` nodes listed via the `-storageNode` command-line flag, processes the queries and returns the corresponding responses.
- `vlinsert` handles log ingestion via [all supported protocols](https://docs.victoriametrics.com/victorialogs/data-ingestion/).
It distributes incoming logs evenly across `vlstorage` nodes, as specified by the `-storageNode` command-line flag.
- `vlstorage` is responsible for two tasks:
- `vlselect` receives queries through [all supported HTTP query endpoints](https://docs.victoriametrics.com/victorialogs/querying/).
It fetches the required data from the configured `vlstorage` nodes, processes the queries, and returns the results.
- It accepts logs from `vlinsert` nodes and stores them at the directory specified via `-storageDataPath` command-line flag.
See [these docs](https://docs.victoriametrics.com/victorialogs/#storage) for details about this flag.
- `vlstorage` performs two key roles:
- It stores logs received from `vlinsert` at the directory defined by the `-storageDataPath` flag.
See [storage configuration docs](https://docs.victoriametrics.com/victorialogs/#storage) for details.
- It handles queries from `vlselect` by retrieving and transforming the requested data locally before returning results.
- It processes requests from `vlselect` nodes. It selects the requested logs and performs all data transformations and calculations,
which can be executed locally, before sending the results to `vlselect`.
Each `vlstorage` node operates as a self-contained VictoriaLogs instance.
Refer to the [single-node and cluster mode duality](#single-node-and-cluster-mode-duality) documentation for more information.
This design allows you to reuse existing single-node VictoriaLogs instances by listing them in the `-storageNode` flag for `vlselect`, enabling unified querying across all nodes.
`vlstorage` is basically a single-node version of VictoriaLogs. See [these docs](#single-node-and-cluster-mode-duality) for details.
This means that the existing single-node VictoriaLogs instances can be added to the list of `vlstorage` nodes via `-storageNode` command-line flag at `vlselect`
in order to get global querying view over all the logs across all the single-node VictoriaLogs instances.
All VictoriaLogs components are horizontally scalable and can be deployed on hardware best suited to their respective workloads.
`vlinsert` and `vlselect` can be run on the same node, which allows the minimal cluster to consist of just one `vlstorage` node and one node acting as both `vlinsert` and `vlselect`.
However, for production environments, it is recommended to separate `vlinsert` and `vlselect` roles to avoid resource contention — for example, to prevent heavy queries from interfering with log ingestion.
Every component of the VictoriaLogs cluster can scale from a single node to arbitrary number of nodes and can run on the most suitable hardware for the given workload.
`vlinsert` nodes can be used as `vlselect` nodes, so the minimum VictoriaLogs cluster must contain a `vlstorage` node plus a node, which plays both `vlinsert` and `vlselect` roles.
It isn't recommended sharing `vlinsert` and `vlselect` responsibilities in a single node, since this increases chances that heavy queries can negatively affect data ingestion
and vice versa.
Communication between `vlinsert` / `vlselect` and `vlstorage` is done via HTTP over the port specified by the `-httpListenAddr` flag:
`vlselect` and `vlinsert` communicate with `vlstorage` via HTTP at the TCP port specified via `-httpListenAddr` command-line flag:
- `vlinsert` sends data to the `/internal/insert` endpoint on `vlstorage`.
- `vlselect` sends queries to endpoints under `/internal/select/` on `vlstorage`.
- `vlinsert` sends requests to `/internal/insert` HTTP endpoint at `vlstorage`.
- `vlselect` sends requests to HTTP endpoints at `vlstorage` starting with `/internal/select/`.
This HTTP-based communication model allows you to use reverse proxies for authorization, routing, and encryption between components.
Use of [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/) is recommended for managing access control.
This allows using various http proxies for authorization, routing and encryption of requests between these components.
It is recommended to use [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/).
For advanced setups, refer to the [multi-level cluster setup](#multi-level-cluster-setup) documentation.
See also [multi-level cluster setup](#multi-level-cluster-setup).
## High availability
In the cluster setup, the following rules apply:
- The `vlselect` component requires **all relevant vlstorage nodes to be available** in order to return complete and correct query results.
- If even one of the vlstorage nodes is temporarily unavailable, `vlselect` cannot safely return a full response, since some of the required data may reside on the missing node. Rather than risk delivering partial or misleading query results, which can cause confusion, trigger false alerts, or produce incorrect metrics, VictoriaLogs chooses to return an error instead.
- The `vlinsert` component continues to function normally when some vlstorage nodes are unavailable. It automatically routes new logs to the remaining available nodes to ensure that data ingestion remains uninterrupted and newly received logs are not lost.
> [!NOTE] Insight
> In most real-world cases, `vlstorage` nodes become unavailable during planned maintenance such as upgrades, config changes, or rolling restarts. These are typically infrequent (weekly or monthly) and brief (a few minutes).
> A short period of query downtime during such events is acceptable and fits well within most SLAs. For example, 60 minutes of downtime per month still provides around 99.86% availability, which often outperforms complex HA setups that rely on opaque auto-recovery and may fail unpredictably.
VictoriaLogs itself does not handle replication at the storage level. Instead, it relies on an external log shipper, such as [vector](https://docs.victoriametrics.com/victorialogs/data-ingestion/vector/) or [vlagent](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9034), to send the same log stream to multiple independent VictoriaLogs instances:
```mermaid
graph TD
subgraph "HA Solution"
subgraph "Ingestion Layer"
LS["Log Sources<br/>(Applications)"]
VECTOR["Log Collector<br/>• Buffering<br/>• Replication<br/>• Delivery Guarantees"]
end
subgraph "Storage Layer"
subgraph "Zone A"
VLA["VictoriaLogs Cluster A"]
end
subgraph "Zone B"
VLB["VictoriaLogs Cluster B"]
end
end
subgraph "Query Layer"
LB["Load Balancer<br/>(vmauth)<br/>• Health Checks<br/>• Failover<br/>• Query Distribution"]
QC["Query Clients<br/>(Grafana, API)"]
end
LS --> VECTOR
VECTOR -->|"Replicate logs to<br/>Zone A cluster"| VLA
VECTOR -->|"Replicate logs to<br/>Zone B cluster"| VLB
VLA -->|"Serve queries from<br/>Zone A cluster"| LB
VLB -->|"Serve queries from<br/>Zone B cluster"| LB
LB --> QC
style VECTOR fill:#e8f5e8
style VLA fill:#d5e8d4
style VLB fill:#d5e8d4
style LB fill:#e1f5fe
style QC fill:#fff2cc
style LS fill:#fff2cc
end
```
In this HA solution:
- A log shipper at the top receives logs and replicates them in parallel to two VictoriaLogs clusters.
- If one cluster fails completely (i.e., **all** of its storage nodes become unavailable), the log shipper continues to send logs to the remaining healthy cluster and buffers any logs that cannot be delivered. When the failed cluster becomes available again, the log shipper resumes sending both buffered and new logs to it.
- On the read path, a load balancer (e.g., vmauth) sits in front of the VictoriaLogs clusters and routes query requests to any healthy cluster.
- If one cluster fails (i.e., **at least one** of its storage nodes is unavailable), the load balancer detects this and automatically redirects all query traffic to the remaining healthy cluster.
There's no hidden coordination logic or consensus algorithm. You can scale it horizontally and operate it safely, even in bare-metal Kubernetes clusters using local PVs, as long as the log shipper handles reliable replication and buffering.
## Single-node and cluster mode duality
Every `vlstorage` node can be used as a single-node VictoriaLogs instance:
@@ -189,7 +274,7 @@ Start `vlselect` node, which [accepts incoming queries](https://docs.victoriamet
Note that all the VictoriaLogs cluster components - `vlstorage`, `vlinsert` and `vlselect` - share the same executable - `victoria-logs-prod`.
Their roles depend on whether the `-storageNode` command-line flag is set - if this flag is set, then the executable runs in `vlinsert` and `vlselect` modes.
Otherwise it runs in `vlstorage` mode, which is identical to a [single-node VictoriaLogs mode](https://docs.victoriametrics.com/victorialogs/).
Otherwise, it runs in `vlstorage` mode, which is identical to a [single-node VictoriaLogs mode](https://docs.victoriametrics.com/victorialogs/).
Let's ingest some logs (aka [wide events](https://jeremymorrell.dev/blog/a-practitioners-guide-to-wide-events/))
from [GitHub archive](https://www.gharchive.org/) into the VictoriaLogs cluster with the following command:

View File

@@ -28,23 +28,24 @@ See [the list of supported Journald fields](https://www.freedesktop.org/software
## Level field
VictoriaLogs atomatically sets the `level` log field according to the [`PRIORITY` field falue](https://wiki.archlinux.org/title/Systemd/Journal).
VictoriaLogs automatically sets the `level` log field according to the [`PRIORITY` field value](https://wiki.archlinux.org/title/Systemd/Journal).
## Stream fields
VictoriaLogs uses `(_MACHINE_ID, _HOSTNAME, _SYSTEMD_UNIT)` as [stream fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
for logs ingested via jorunald protocol. The list of log stream fields can be changed via `-journald.streamFields` command-line flag if needed,
by providing comma-separated list of journald fields form [this list](https://www.freedesktop.org/software/systemd/man/latest/systemd.journal-fields.html).
for logs ingested via journald protocol. The list of log stream fields can be changed via `-journald.streamFields` command-line flag if needed,
by providing comma-separated list of journald fields from [this list](https://www.freedesktop.org/software/systemd/man/latest/systemd.journal-fields.html).
Please make sure that the log stream fields passed to `-journlad.streamFields` do not contain fields with high number or unbound number of unique values,
Please make sure that the log stream fields passed to `-journald.streamFields` do not contain fields with high number or unbound number of unique values,
since this may lead to [high cardinality issues](https://docs.victoriametrics.com/victorialogs/keyconcepts/#high-cardinality).
This can happen with `_SYSTEMD_UNIT` if you have templated units with non-static instances
such as `systemd-coredump@.service` or if you have a `.socket` unit with `Accept=yes`.
The following Journald fields are also good candidates for stream fields:
- `_TRANSPORT`
- `_TRANSPORT` (to separate out kernel and audit logs which are not associated with a `_SYSTEMD_UNIT`)
- `_SYSTEMD_USER_UNIT`
## Dropping fields
VictoriaLogs can be configured for skipping the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)

View File

@@ -11,6 +11,7 @@ tags:
- logs
aliases:
- /victorialogs/keyConcepts.html
- /victorialogs/keyConcepts/
---
## Data model

View File

@@ -94,7 +94,7 @@ Loki allows applying filters to log labels with `{...} | label op value` syntax:
according to [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter).
* `{...} | label > value`, `{...} label >= value`, `{...} label < value` and `{...} label <= value`.
This is eqvalent to `{...} label:>value`, `{...} label:>=value`, `{...} label:<value` and `{...} label:<=value`
This is equivalent to `{...} label:>value`, `{...} label:>=value`, `{...} label:<value` and `{...} label:<=value`
in VictoriaLogs. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#range-comparison-filter).
* `{...} | label ~= value`. This is equivalent to `{...} label:~value` in VictoriaLogs.
@@ -192,7 +192,7 @@ Loki provides the ability to drop log labels with the `{...} | drop label1, ...,
according to [these docs](https://grafana.com/docs/loki/latest/query/log_queries/#drop-labels-expression).
The similar syntax is also supported by VictoriaLogs. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#delete-pipe).
Lokis supports conditional dropping of labels with the `{...} | drop label="value"` syntax.
Loki supports conditional dropping of labels with the `{...} | drop label="value"` syntax.
This can be replaced with [conditional format](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-format) at VictoriaLogs:
```logsql
@@ -266,7 +266,7 @@ can be substituted with the simple `{...} | count()` query at VictoriaLogs.
### Unwrapped range aggregations
Loki allows calculating metrics from label values by using the `func_name({...} | unwrap label_name)` syntax. There is no need in unrapping any labels in VictoriaLogs -
Loki allows calculating metrics from label values by using the `func_name({...} | unwrap label_name)` syntax. There is no need in unwrapping any labels in VictoriaLogs -
just pass the needed label names into the needed [`stats` pipe function](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe-functions).
VictoriaLogs aggregates all the selected logs by default, while Loki groups stats by log stream. Use `... | stats by (_stream) ...`

View File

@@ -128,7 +128,7 @@ See also:
## How to select logs with all the given words in log message?
Just enumerate the needed [words](https://docs.victoriametrics.com/victorialogs/logsql/#word) in the query, by deliming them with whitespace.
Just enumerate the needed [words](https://docs.victoriametrics.com/victorialogs/logsql/#word) in the query, by delimiting them with whitespace.
For example, the following query selects logs containing both `error` and `kubernetes` [words](https://docs.victoriametrics.com/victorialogs/logsql/#word)
in the [log message](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field):

View File

@@ -56,7 +56,7 @@ FROM <table>
```
The `<filters>` part selects the needed logs (rows) according to the provided [filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters).
Then the provided [pipes](https://docs.victoriametrics.com/victorialogs/logsql/#pipes) are executed sequentlially.
Then the provided [pipes](https://docs.victoriametrics.com/victorialogs/logsql/#pipes) are executed sequentially.
Every such pipe receives all the rows from the previous stage, performs some calculations and/or transformations,
and then pushes the resulting rows to the next stage. This simplifies reading and understanding the query - just read it from the beginning
to the end in order to understand what does it do at every stage.

Some files were not shown because too many files have changed in this diff Show More