`null` values can be actual `NaN` or `null` values exposed by the
exporter, or stale markers
https://docs.victoriametrics.com/victoriametrics/vmagent/#prometheus-staleness-markers
Previously, vmui Raw Query silently dropped non-numeric values.
Displaying such values on the chart could improve the debugging experience.
Screenshots:
<img width="1487" height="833" alt="image"
src="https://github.com/user-attachments/assets/2c80cd52-9d37-41f7-ad73-a48335b8aef2"
/>
Decisions:
1. Since a null value can't be mapped onto the Y axis, it is placed at the very
bottom as an `X` sign. It would also be fine to place it at the top or draw it as a dotted
line
2. It has a tooltip explaining what it is
3. Its value is not accounted for in the cumulative min/max/avg tooltip of the
whole time series on the graph
4. A null value can't be definitively identified as a stale marker, since such a value
can come from many places
Values to import as example dataset:
```
{"metric":{"__name__":"cpu_usage","job":"test","instance":"127.0.0.1:8080","cpu":"1","exported_instance":"foo","exported_job":"foo"},"values":[null,0.2,0.2,null,0.2,null,0.2,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,0.2,0.2,0.2,null,0.2,0.2,null,0.2,0.2,null,0.2,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,null,0.2,0.2],"timestamps":[1779362323611,1779362323617,1779362333617,1779362343610,1779362343617,1779362353610,1779362353617,1779362363617,1779362373610,1779362373617,1779362383610,1779362383617,1779362393610,1779362393617,1779362403610,1779362403617,1779362413610,1779362413617,1779362423610,1779362423617,1779362433610,1779362433617,1779362443610,1779362443617,1779362453617,1779362463617,1779362473617,1779362483610,1779362483617,1779362493617,1779362503610,1779362503617,1779362513617,1779362523610,1779362523617,1779362533617,1779362543609,1779362543617,1779362553609,1779362553617,1779362563609,1779362563617,1779362573609,1779362573617,1779362583609,1779362583617,1779362593609,1779362593617,1779362603617]}
```
------------
A feature like this would significantly help in debugging issues like
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10893
@Loori-R I am sorry, but this PR is rather a PoC - it was mostly
generated with AI help.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Rerouting from a slow storage node is triggered only when the cluster's p90 saturation is below 0.6 (meaning the cluster has spare capacity) and the slow node's saturation exceeds the p90 by more than 20%.
This fixes a rerouting storm that could occur under high cluster load: when
all vmstorage saturations are close to each other, different vminsert nodes could disagree on which vmstorage is "the slowest" (since each calculates saturation independently), causing rerouting from many nodes simultaneously and making degradation worse. In a known customer case, 30% of vmstorage nodes were being rerouted from at the same time.
Previously, the median was used as the reference point with a 0.80 cutoff, which allowed rerouting even when the cluster was significantly loaded.
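The rerouting condition described above can be sketched as follows. This is an illustration only, not the vminsert implementation: the names `p90` and `shouldReroute`, the nearest-rank percentile method, and the reading of "more than 20%" as relative to the p90 value are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	maxClusterP90  = 0.6 // reroute only while the cluster has spare capacity
	slowNodeFactor = 1.2 // assumption: "more than 20%" is relative to p90
)

// p90 returns the 90th percentile using the nearest-rank method (an assumption).
// The caller must pass a non-empty slice.
func p90(values []float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (9*len(sorted) + 9) / 10 // integer ceil(0.9 * n)
	return sorted[rank-1]
}

// shouldReroute reports whether the node at index idx qualifies for rerouting.
func shouldReroute(saturations []float64, idx int) bool {
	if len(saturations) == 0 {
		return false
	}
	ref := p90(saturations)
	return ref < maxClusterP90 && saturations[idx] > ref*slowNodeFactor
}

// Hypothetical saturation samples; the last node is the slow one in both cases.
var (
	idleCluster   = []float64{0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.9}
	loadedCluster = []float64{0.8, 0.82, 0.81, 0.8, 0.83, 0.8, 0.81, 0.82, 0.8, 0.95}
)

func main() {
	fmt.Println("idle cluster, slow node:", shouldReroute(idleCluster, 9))
	fmt.Println("loaded cluster, slow node:", shouldReroute(loadedCluster, 9))
}
```

With a loaded cluster the p90 check fails first, so no node is singled out even if it is somewhat slower than the rest.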
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10876
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10901
govulncheck run locally uses the local Go version, which may differ from
the one used to build production binaries.
For example, local go1.26.2 may report vulnerabilities already fixed in
go1.26.3 used by the builder.
In this case the command would report issues that have to be manually
triaged by a developer.
Add govulncheck-docker target that runs govulncheck inside a Docker
container using the same version used to build production binaries.
The command will be used in release scripts and could be used manually.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10981
The `--retentionPeriod` flag is missing in several quick start guide
examples. This may cause users to overlook the parameter and incorrectly
believe data will be stored permanently until manually deleted.
However, the quick start guide is not intended for production deployment, and
we already have a dedicated `#Productionization` section there. We should
mention the `--retentionPeriod` flag in this section.
This change could be helpful to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/249#issuecomment-4468637250
---------
Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Previously, it was not possible to configure mTLS between multi-level
vminserts, although vmselect already supported this feature. This was a configuration
discrepancy.
This commit adds the same flags to vminsert.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10958
Pin GitHub actions to their full-length commit SHAs.
Semver tags were updated to be more precise: e.g. `v7` to `v7.0.0`
---------
Signed-off-by: Rudransh Shrivastava <rudransh@victoriametrics.com>
When `-remoteWrite.shardByURL` is enabled and one of the remote write
targets with `-remoteWrite.disableOnDiskQueue` set becomes blocked, samples
could be rerouted to other shards (see `getEligibleRemoteWriteCtxs` impl), breaking the sharding guarantee. Fix this by always using `rwctxsGlobal` in sharding mode.
Add a startup check that requires `-remoteWrite.disableOnDiskQueue` to be
configured uniformly across all targets when `-remoteWrite.shardByURL` is enabled.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10507
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10947
Previously, errors in app/vmalert-tool and lib packages used the %w verb
in logger.Errorf calls, which is intended for wrapping errors via fmt.Errorf.
Using %w with the logger package does not wrap the error — instead, it prints
a malformed %!w(...) placeholder rather than the actual error message.
This commit replaces all affected occurrences of %w with %s to correctly
format and display errors.
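The difference between the two verbs can be reproduced with plain `fmt.Sprintf`, which is assumed here to behave like the formatting step of a printf-style logger. The `render` helper and the sample error are hypothetical; the verb is passed as a parameter only to make the comparison compact.

```go
package main

import (
	"errors"
	"fmt"
)

// errSample is a hypothetical error used for the comparison.
var errSample = errors.New("connection refused")

// render formats a message the way a printf-style logger would:
// nothing unwraps or wraps the error, the verb is simply applied to it.
func render(verb string, err error) string {
	return fmt.Sprintf("cannot send request: "+verb, err)
}

func main() {
	// %w is only understood by fmt.Errorf; here it yields a %!w(...) placeholder.
	fmt.Println(render("%w", errSample))
	// %s prints the error message itself.
	fmt.Println(render("%s", errSample))
}
```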
Related PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10962
This registers `remoteWrite.headers` in `InitSecretFlags()` so it is
masked by the existing secret-flag.
Without this, values passed via `-remoteWrite.headers` are exposed in
startup flag logs, /metrics, and /flags, because these paths only redact
flags recognized by `flagutil.IsSecretFlag()`.
The change keeps the existing `-remoteWrite.showURL` behavior for
`remoteWrite.url`, while always treating `-remoteWrite.headers` as
secret.
Release v1.130 introduced a regression in the enterprise version of vmstorage.
The server configuration for the vminsert listener was initialized without mTLS
configuration args. This made mTLS connections from vminsert to vmstorage
impossible.
This commit fixes the regression and adds integration tests to verify
it.
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10958
The outdated link to the slides for this talk has been dropped in the commit f0a147fdf7 .
The video recording for the talk is still available at YouTube ( https://www.youtube.com/watch?v=ZJQYW-cFOms ),
so put it to the articles page.
Enterprise version of VictoriaTraces isn't available yet, but it is better to mention it
at the https://docs.victoriametrics.com/victoriametrics/enterprise/ page for the sake of consistency.
While at it, consistently use absolute links, even if they point to the same document.
This simplifies moving the text between docs without breaking the links.
This change should clearly distinguish different multitenancy scenarios
for vmagent. It is expected to be easier for users to read and follow.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
The core `lib/promauth` already supports `usernameFile`
configs, but the CLI flags for vmagent remotewrite and vmalert
datasource/remotewrite/remoteread/notifier only expose
`basicAuth.username`.
This commit adds the corresponding `basicAuth.usernameFile` flags to match
the existing `basicAuth.passwordFile` pattern, closing the gap between
YAML and CLI configuration.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9436
This commit adds a warning message if `-memory.allowedBytes` is set to a value less than 1MB.
It should help to debug application start-up problems caused by a low memory limit.
For example, fastcache could panic at `-memory.allowedBytes=`
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10935
In most cases, vmalert is configured to write to VM components like
vminsert or vmagent, where using the VictoriaMetrics remote write protocol can
save network bandwidth.
The VictoriaMetrics remote write protocol is used by default, and the
protocol is downgraded from VictoriaMetrics to Prometheus remote write
if one request fails with protocol error.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10929
Replace the pattern of `git checkout <tag> && make <binary>` with `git
worktree add /tmp/vm-* <tag>` so that flag updates no longer switch the
working tree of the current repository. Each variant (opensource,
enterprise, cluster) gets its own worktree, removing the need to restore
the original branch between steps.
Also normalize dynamic default values in vmctl prometheus flags
(-prom-tmp-dir-path) to `os.TempDir()` to reduce noisy diffs caused by
machine-specific temp paths.
Add '=' separator between label name and value when computing the hash
to prevent false collisions, like {a="bc"} and {ab="c"} hashing to the
same value.
getLabelsHashForShard is added to avoid sharding disruptions in vmagent
(-remoteWrite.shardByURL=true mode). The function preserves previous
behavior, without '=' between name and value.
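A minimal sketch of the collision, assuming a simple byte-stream hash over label names and values. It uses `hash/fnv` from the standard library for illustration; the hash function and the `Label` and `hashLabels` names are assumptions, not the code from this commit.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Label is a hypothetical name/value pair.
type Label struct {
	Name, Value string
}

// hashLabels feeds label names and values into a single hash.
// When withSeparator is true, '=' is written between name and value.
func hashLabels(labels []Label, withSeparator bool) uint64 {
	h := fnv.New64a()
	for _, l := range labels {
		h.Write([]byte(l.Name))
		if withSeparator {
			h.Write([]byte{'='})
		}
		h.Write([]byte(l.Value))
	}
	return h.Sum64()
}

var (
	seriesA = []Label{{Name: "a", Value: "bc"}} // {a="bc"}
	seriesB = []Label{{Name: "ab", Value: "c"}} // {ab="c"}
)

func main() {
	// Without the separator both series hash the same byte stream "abc".
	fmt.Println(hashLabels(seriesA, false) == hashLabels(seriesB, false))
	// With the separator the streams are "a=bc" and "ab=c".
	fmt.Println(hashLabels(seriesA, true) == hashLabels(seriesB, true))
}
```

Keeping a separate function without the separator for sharding follows from the same observation: changing the hashed byte stream changes which shard a series maps to.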
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10937
### Describe Your Changes
Fix stale `quantiles(...)` stream aggregation output for series without
samples in the current aggregation interval.
Previously, `quantilesAggrConfig` reused the `quantiles` buffer across
aggregation values. If `quantilesAggrValue.flush` was called for a
series without samples after another series had already calculated
quantiles, the stale quantile
values could be emitted for the empty series.
This could produce unrealistic `*_quantiles` output values and make the
same aggregated value appear across unrelated labelsets.
The PR skips `quantiles(...)` output when there is no histogram for the
current interval and adds a regression test for this case.
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
synctest runs the inner closure in a new goroutine, which makes the `t.Helper` call
useless for `t.Fatalf` checks. So when a test fails, the log shows the line where `t.Fatalf`
was called instead of the line where `f()` was called.
Moving the checks out of the synctest closure makes `t.Helper` useful again.
--
In the synctest we were waiting for the aggregation interval before ingesting a new batch of samples.
Because of this, the new batch had a 50% chance of being ingested in either the previous or the current
aggregation interval, depending on whether the Go runtime had already initiated the flush() call.
This change waits an additional 1ms for the flush to happen. Locally, the test stopped being
flaky.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When a request contains both URL path query params and POST form values
for extra_label and extra_filters[], URL query params now take
precedence. This resolves the conflict between the two sources and
simplifies security enforcement for extra_label/extra_filters policies
via vmauth or any other http proxy.
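The precedence rule can be sketched with the standard `net/http` package. This is an assumption-laden illustration rather than the actual handler code: `argValues` and `resolve` are hypothetical helpers, and the precedence is applied per key.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// argValues returns the values for key, preferring the URL query string
// over the POST form body when both are present.
func argValues(r *http.Request, key string) []string {
	if vs := r.URL.Query()[key]; len(vs) > 0 {
		return vs
	}
	if err := r.ParseForm(); err != nil {
		return nil
	}
	return r.PostForm[key]
}

// resolve builds a POST request carrying extra_label in the query string
// and/or the form body, and returns what argValues picks.
func resolve(queryLabel, formLabel string) []string {
	target := "/api/v1/query"
	if queryLabel != "" {
		target += "?" + url.Values{"extra_label": {queryLabel}}.Encode()
	}
	form := url.Values{}
	if formLabel != "" {
		form.Set("extra_label", formLabel)
	}
	r := httptest.NewRequest(http.MethodPost, target, strings.NewReader(form.Encode()))
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return argValues(r, "extra_label")
}

func main() {
	fmt.Println(resolve("env=prod", "env=dev")) // the query string value is used
	fmt.Println(resolve("", "env=dev"))         // falls back to the form value
}
```

A proxy that appends `extra_label` to the URL can then rely on its value being used regardless of what the client puts in the request body.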
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10908
This commit introduces a new metric to expose fs type for the provided path.
For example:
```
vm_fs_info{path="/vmstorage-data", fs_type="xfs"}
```
Path must be registered with new method `fs.RegisterPathFsMetrics`.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10482
Add a new `Kafka (Enterprise)` row to both vmagent dashboards:
- `dashboards/vmagent.json`
- `dashboards/vm/vmagent.json`
The row is placed before `Drilldown` and contains three Kafka-specific
panels:
- `Kafka bytes`
- `Kafka messages in/out`
- `Kafka and consumer errors`
The goal is to provide a compact Kafka-focused view for enterprise
vmagent deployments without duplicating the existing generic remote
write panels such as connection saturation and persistent queue size.
The new row helps distinguish:
- producer vs consumer throughput at the Kafka topic level
- message-rate shifts that may indicate smaller Kafka payloads and
higher per-message overhead
- producer-side Kafka errors vs consumer-side Kafka errors
Descriptions include links to the relevant Kafka documentation sections.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10728
---------
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 4.35.1 to 4.35.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v4.35.2</h2>
<ul>
<li>The undocumented TRAP cache cleanup feature that could be enabled
using the <code>CODEQL_ACTION_CLEANUP_TRAP_CACHES</code> environment
variable is deprecated and will be removed in May 2026. If you are
affected by this, we recommend disabling TRAP caching by passing the
<code>trap-caching: false</code> input to the <code>init</code> Action.
<a
href="https://redirect.github.com/github/codeql-action/pull/3795">#3795</a></li>
<li>The Git version 2.36.0 requirement for improved incremental analysis
now only applies to repositories that contain submodules. <a
href="https://redirect.github.com/github/codeql-action/pull/3789">#3789</a></li>
<li>Python analysis on GHES no longer extracts the standard library,
relying instead on models of the standard library. This should result in
significantly faster extraction and analysis times, while the effect on
alerts should be minimal. <a
href="https://redirect.github.com/github/codeql-action/pull/3794">#3794</a></li>
<li>Fixed a bug in the validation of OIDC configurations for private
registries that was added in CodeQL Action 4.33.0 / 3.33.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3807">#3807</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.2">2.25.2</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3823">#3823</a></li>
</ul>
</blockquote>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This change introduces a helper `MustStartDefaultRWVmagent` that by
default sets `-remoteWrite.flushInterval=50ms`. This helper makes it
easier to set up RW tests, as all of them rely on frequent flushes. So
instead of setting the flag in every test, we can use a dedicated helper for that.
This helper was added after a newly added RW test became flaky because it
didn't have `-remoteWrite.flushInterval=50ms` set.
---------
Failing test
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/25446725004/job/74769752869#step:5:71
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit adds the possibility to omit tenantID in the URL path. In this case,
tenantID will be fetched from the HTTP headers `AccountID` and `ProjectID`.
If the headers are missing too, then the default `0:0` tenantID is used.
This functionality is enabled only if the `-enableMultitenantHandlers`
cmd-line flag is set on vminsert, vmselect or vmagent.
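The header fallback can be sketched as below. Only the header names and the `0:0` default come from the description above; `tenantFromHeaders`, `tenantString`, and the choice to reject malformed values with an error are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// tenantFromHeaders reads the AccountID and ProjectID headers.
// A missing header defaults to 0, so no headers at all yields tenant 0:0.
func tenantFromHeaders(h http.Header) (accountID, projectID uint32, err error) {
	parse := func(name string) (uint32, error) {
		v := h.Get(name)
		if v == "" {
			return 0, nil
		}
		n, err := strconv.ParseUint(v, 10, 32)
		if err != nil {
			return 0, fmt.Errorf("cannot parse %s header value %q: %w", name, v, err)
		}
		return uint32(n), nil
	}
	if accountID, err = parse("AccountID"); err != nil {
		return 0, 0, err
	}
	if projectID, err = parse("ProjectID"); err != nil {
		return 0, 0, err
	}
	return accountID, projectID, nil
}

// tenantString is a small helper for demonstration: it builds the headers
// from plain strings and renders the result as accountID:projectID.
func tenantString(account, project string) string {
	h := http.Header{}
	if account != "" {
		h.Set("AccountID", account)
	}
	if project != "" {
		h.Set("ProjectID", project)
	}
	a, p, err := tenantFromHeaders(h)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%d:%d", a, p)
}

func main() {
	fmt.Println(tenantString("42", "7"))
	fmt.Println(tenantString("", ""))
}
```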
Motivation: this change makes VM configuration for multitenancy
consistent with VL configuration - see
https://docs.victoriametrics.com/victorialogs/#multitenancy. It also keeps
backward compatibility.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4241
Before, some of the template examples were wrongly rendered by hugo.
For example:
```
http://vm-grafana.com/<dashboard-id>?viewPanel=<panel-id>&from={{($activeAt.Add (parseDurationTime \"-1h\")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime \"1h\")).UnixMilli}}
```
was rendered like:
```
http://vm-grafana.com/ ?viewPanel=&from={{($activeAt.Add (parseDurationTime "-1h")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime "1h")).UnixMilli}}
```
Wrapping examples in backticks helps to render them raw.
While there, also fixed some examples.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
On 2025-12-16, Hetzner Cloud deprecated the `datacenter` field in their
Servers API and introduced a top-level `location` field carrying the
same data. The `datacenter` field will be removed after 2026-07-01.
Without this change, `__meta_hetzner_hcloud_datacenter_location`, and
`__meta_hetzner_hcloud_datacenter_location_network_zone` would silently
become empty for the `hcloud` role after that date.
This mirrors the change made in Prometheus v3.11.0
([prometheus/prometheus#17850](https://github.com/prometheus/prometheus/pull/17850)).
## Changes
**`hcloud` role:**
- Add `HCloudLocation` struct and `Location` field on `HCloudServer`,
mapped to the new top-level `location` API field
- Emit two new canonical labels: `__meta_hetzner_hcloud_location` and
`__meta_hetzner_hcloud_location_network_zone`
- Keep the deprecated `__meta_hetzner_hcloud_datacenter_location` and
`__meta_hetzner_hcloud_datacenter_location_network_zone` labels, now
sourced from the new `location` field so they continue to work past
2026-07-01
- `__meta_hetzner_datacenter` (the datacenter name, e.g. `fsn1-dc14`) is
unaffected for this role — the datacenter name is a distinct concept
from location and is kept as-is (this will stop working starting
2026-07-01)
**`robot` role:**
- Add `__meta_hetzner_robot_datacenter` as the canonical replacement for
`__meta_hetzner_datacenter`; the old label is kept for backward
compatibility
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10909
v1.142.0 introduced a bug when changes from the OSS version were
back-ported into the Enterprise branch. It changed the order of storage
node discovery, which resulted in:
* overwriting of discovered storage nodes
* duplication of per-storage-node metrics
This bug only affects the enterprise version of vminsert.
Mention the `-rule.stripFilePath` command-line flag in security recommendations,
so users can be aware of it.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Haley Wang <haley@victoriametrics.com>
The change adds `AI observability` section to `AI tools` documentation.
It mentions excellent @Amper articles describing these integrations in
all details.
The doc change doesn't repeat the articles, but rather helps users to
discover them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The flag already exists in the ENT version. We decided to expose it in
OSS and strip the path from all public places, including all
APIs (including `/metrics`) and debug logs (it's minor info there).
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5625
Previously, if the `-tls` flag was provided, VictoriaMetrics components
logged the following error on health checks:
http: TLS handshake error from 10.244.0.1:46556: EOF
Such health checks are common for many orchestration systems, such as
Consul or Kubernetes. The default http server already suppresses such EOF
errors from health checks.
This commit adds the same suppression to the TLS server as well.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10538
# What Changed
- Updated the operator installation procedure
- Updated the commands to match the rest of the guides
- Updated screenshots
- Reordered steps to make more sense of the process
- Fixed issues in the YAML
- Tested on actual OpenShift trial instance running on AWS
- Added steps to confirm log ingestion using VMUI
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10864
This PR fixes several broken links and anchors in the victoriametrics
docs.
Note about the link changes in the FAQ.md file: the links inside the paragraph
break navigation in the right-side menu. To fix this, an explicit anchor
definition has been added. The anchor is the same as before; setting it
explicitly fixes the sidebar links.
See https://github.com/VictoriaMetrics/vmdocs/issues/221 for the
up-to-date list once this PR is merged.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10874
Previously, metricMetadata was not properly reset during parsing of
metrics. As a result, the `Unit` suffix from a previously
parsed metric could be added to the next metric without a `Unit` field.
For example, metric `http_request` with `Unit` `seconds` is
converted into `http_request_seconds`, and the `Unit` field holds `seconds`.
If the next parsed metric `cpu_usage_ratio` has no `Unit`, it gets the
previous `seconds` `Unit` -> `cpu_usage_ratio_seconds`.
This commit adds a metricMetadata reset call before parsing the next
metric.
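The effect of the missing reset can be reproduced with a reused struct. The sketch below is a simplified stand-in for the parser: the `rawMetric` type and the `convert` function are hypothetical, and only the metric names and the `Unit` behavior come from the example above.

```go
package main

import "fmt"

// metricMetadata is reused across metrics, as in the description above.
type metricMetadata struct {
	Name string
	Unit string
}

func (m *metricMetadata) reset() { *m = metricMetadata{} }

// rawMetric is a hypothetical parsed input; an empty Unit means "not set".
type rawMetric struct {
	Name string
	Unit string
}

// convert builds output names, appending the unit as a suffix.
// With resetBetween=false it reproduces the stale Unit leak.
func convert(metrics []rawMetric, resetBetween bool) []string {
	var md metricMetadata
	out := make([]string, 0, len(metrics))
	for _, rm := range metrics {
		if resetBetween {
			md.reset()
		}
		md.Name = rm.Name
		if rm.Unit != "" {
			md.Unit = rm.Unit
		}
		name := md.Name
		if md.Unit != "" {
			name += "_" + md.Unit
		}
		out = append(out, name)
	}
	return out
}

var sample = []rawMetric{
	{Name: "http_request", Unit: "seconds"},
	{Name: "cpu_usage_ratio"},
}

func main() {
	fmt.Println(convert(sample, false)) // second name carries the stale unit
	fmt.Println(convert(sample, true))  // second name is left as-is
}
```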
Bug was introduced at 293d80910c
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10889
Iximiuz labs prepared a set of playgrounds for VictoriaMetrics. These
are interactive playgrounds backed by real Linux machines running
VictoriaMetrics software, allowing experimenting and investigating right
in the browser tab.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Metadata is enabled by default since v1.137.0, and the metadata volume
can be a big contributor to resource usage and network traffic.
vmagent dashboard:
1. `Troubleshooting` section: rename `Datapoints rate` panel to `Rows
rate` to include metadata rate;
2. `Ingestion` section: add metadata rate to existing `Rows rate` panel.
(The difference between this panel and the one above is that this panel
only contains data from write requests, while the above panel also
includes the scraping part.)
vmcluster dashboard:
1. `vminsert` section: add `Rows rate` panel
Didn’t see a good place for it in the vmsingle dashboard, since it
doesn’t have a dedicated insert section, and I don’t want to add it to
`overview` yet.
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10868
* add visual mermaid diagram to demonstrate aggregation concept;
* update Recording-rules-alternative:
  * recommend using rate_sum instead of total for better reliability
  * demonstrate how to calculate a sliding window, typical for recording
    rules
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Add support for storing and retrieving samples with future
timestamps, as requested in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/827
What to expect:
- By default, the max future timestamp is still limited to `now+2d`. To
change it, set the `-futureRetention` flag in `vmstorage`. The max flag
value is currently limited to `100y`. It can be extended if we see a
demand for this, but it can't be more than `~ 290y` due to how the time
duration is implemented in Go. The flag value can't be less than `2d`.
- Downsampling and retention filters (available in enterprise edition)
are currently not supported for future timestamps.
- If `vmstorage` restarts with a smaller value of `-futureRetention`
flag, any future partitions that are outside the new future retention
will be automatically deleted.
- Data ingestion, data retrieval, backup/restore, timeseries (soft)
deletion, and other operations work with future timestamps the same way
as with the historical timestamps.
- In the cluster version, the affected binaries are `vmstorage` and
`vmselect`. This means that `vmselect` version must match `vmstorage`
version if you want to query future timestamps. `vminsert` was not
affected, so its version can be a lower one.
- If you downgrade the `vmstorage`, the data with future timestamps will
remain on disk and memory (per-partition caches) but won't be available
for querying.
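The `~ 290y` ceiling mentioned in the first item above follows from `time.Duration` being a signed 64-bit count of nanoseconds. The sketch below computes that ceiling and shows one possible range check for the `2d`..`100y` limits; `validateFutureRetention` and the 365-day year are assumptions, not the vmstorage flag parser.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	day                = 24 * time.Hour
	minFutureRetention = 2 * day
	maxFutureRetention = 100 * 365 * day // assumption: a year is counted as 365 days
)

// maxDurationYears returns the largest span time.Duration can represent, in years.
func maxDurationYears() float64 {
	const hoursPerYear = 24 * 365
	return time.Duration(math.MaxInt64).Hours() / hoursPerYear
}

// validateFutureRetention is a hypothetical range check for the flag value.
func validateFutureRetention(d time.Duration) error {
	if d < minFutureRetention {
		return fmt.Errorf("futureRetention %s is less than the minimum %s", d, minFutureRetention)
	}
	if d > maxFutureRetention {
		return fmt.Errorf("futureRetention %s exceeds the maximum %s", d, maxFutureRetention)
	}
	return nil
}

func main() {
	fmt.Printf("time.Duration covers about %.1f years\n", maxDurationYears())
	fmt.Println(validateFutureRetention(day))
	fmt.Println(validateFutureRetention(30 * day))
}
```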
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Update `timeutil.ParseTimeAt` to check the time limits for all date/time formats, not just year.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, starting backend URL health checks could produce a data race
and a race condition.
The following panic could be produced:
`panic: sync: WaitGroup is reused before previous Wait has returned`
It happened because a concurrent goroutine could process a request while
the configuration was being reloaded and the stopHealthChecks method was called.
This commit adds a dedicated structure for backend health checks,
which protects against the data race with a mutex and prevents the race
condition with a boolean flag.
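One way such a structure could look is sketched below. It is not the vmauth code: the `healthChecker` type, its fields, and the fixed check interval are assumptions that only illustrate the "mutex plus boolean flag" idea, with `Wait` called under the same lock as `Add` so the two cannot overlap.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// checkInterval is an arbitrary value chosen for the sketch.
const checkInterval = 10 * time.Millisecond

// healthChecker guards start/stop with a mutex and a started flag.
type healthChecker struct {
	mu      sync.Mutex
	started bool
	stopCh  chan struct{}
	wg      sync.WaitGroup
}

// start launches the check loop once; repeated calls are no-ops.
func (hc *healthChecker) start(check func()) bool {
	hc.mu.Lock()
	defer hc.mu.Unlock()
	if hc.started {
		return false
	}
	hc.started = true
	stopCh := make(chan struct{})
	hc.stopCh = stopCh
	hc.wg.Add(1)
	go func() {
		defer hc.wg.Done()
		ticker := time.NewTicker(checkInterval)
		defer ticker.Stop()
		for {
			select {
			case <-stopCh:
				return
			case <-ticker.C:
				check()
			}
		}
	}()
	return true
}

// stop terminates the loop and waits for it while still holding the lock,
// so a concurrent start cannot call wg.Add before Wait has returned.
func (hc *healthChecker) stop() bool {
	hc.mu.Lock()
	defer hc.mu.Unlock()
	if !hc.started {
		return false
	}
	hc.started = false
	close(hc.stopCh)
	hc.wg.Wait()
	return true
}

func main() {
	hc := &healthChecker{}
	fmt.Println(hc.start(func() {}), hc.start(func() {}))
	fmt.Println(hc.stop(), hc.stop())
}
```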
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10806
Visually outline that the guideline message should be removed from the
description before submitting the PR. This should prevent cases where the PR
template blended into the PR description and remained unnoticed.
The metricsql commit
d0bc93816e
introduced a bug that changed the order of binary op evaluation. This
commit updates metricsql to a version that fixes the bug by reverting to the
previous behavior.
The bug was introduced in v1.140.0, v1.136.4, and v1.122.19 releases.
It was reported in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10856
Previously, `GetData` in the OpenTSDB client returned an empty `Metric{}` with
a `nil` error for several conditions (multiple series returned, aggregate
tags present, `modifyData` failures), causing `vmctl opentsdb` to
silently drop series during migration.
This commit changes these silent return paths to return proper errors with
descriptive messages including the query string, so operators can detect
and diagnose partial migrations.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10797
Previously, if a rule label value was set to an empty string, vmalert ignored this label when merging rule labels with labels from the data source response. In contrast, Prometheus removes the data source label in this case, which makes it possible to delete labels.
This commit uses the same logic as Prometheus for resolving label conflicts and allows removing labels.
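The delete-on-empty behavior can be sketched as a plain map merge. The `mergeLabels` function is hypothetical, and conflict handling for non-empty values is simplified here to a plain overwrite, which may differ from what vmalert does.

```go
package main

import "fmt"

// mergeLabels merges rule labels into labels from the data source response.
// A rule label with an empty value removes the label from the result.
func mergeLabels(source, rule map[string]string) map[string]string {
	out := make(map[string]string, len(source)+len(rule))
	for k, v := range source {
		out[k] = v
	}
	for k, v := range rule {
		if v == "" {
			delete(out, k)
			continue
		}
		out[k] = v // simplification: the rule label overwrites the source label
	}
	return out
}

func main() {
	source := map[string]string{"env": "prod", "pod": "api-7f9c"}
	rule := map[string]string{"pod": "", "severity": "warning"}
	fmt.Println(mergeLabels(source, rule))
}
```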
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766
The proxy protocol parser kept a sub-slice reference to the pooled bytesBuffer in readProxyProto:
```
bb := bbPool.Get()
defer bbPool.Put(bb) // ← buffer returned to pool AFTER function returns
...
IP: bb.B[0:16], // ← BUG: sub-slice of pooled buffer!
...
```
This commit allocates a new slice for the IPv6 address and copies the buffer content into it.
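The aliasing problem and the copy-based fix can be shown without a pool: overwriting the buffer after the address has been extracted stands in for the buffer being reused by another connection. The helper names `aliasIP`, `copyIP`, and `reuseDemo` are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// aliasIP returns a sub-slice of buf, so the result shares memory with it.
func aliasIP(buf []byte) net.IP {
	return net.IP(buf[0:net.IPv6len])
}

// copyIP allocates a new slice and copies the address bytes into it.
func copyIP(buf []byte) net.IP {
	ip := make(net.IP, net.IPv6len)
	copy(ip, buf[:net.IPv6len])
	return ip
}

// reuseDemo extracts ::1 both ways, then overwrites the buffer the way
// a later user of a pooled buffer would.
func reuseDemo() (aliased, copied string) {
	buf := make([]byte, net.IPv6len)
	buf[15] = 1 // ::1
	a := aliasIP(buf)
	c := copyIP(buf)
	buf[15] = 2 // the buffer is reused for another address
	return a.String(), c.String()
}

func main() {
	aliased, copied := reuseDemo()
	fmt.Println("aliased:", aliased) // changed together with the buffer
	fmt.Println("copied: ", copied)  // still the original address
}
```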
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10839
`IndexDBRecordsDrop` and `TooManyTSIDMisses` were mistakenly placed in `alerts-health.yml`,
which is supposed to contain rules related to all VM components. But these two rules
are related to storage components only (vmstorage and vmsingle). Move them to the corresponding
files.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change should reduce confusion about which installation type `alerts.yml`
belongs to. Before, developers could mistakenly assume that
`alerts.yml` was related to both single and cluster installations.
As a result, the rule `MetadataCacheUtilizationIsTooHigh` was added only
to `alerts.yml` and not copied to `alerts-cluster.yml`.
The rename should bring more context into the file name
and reduce confusion in the future.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, this rule was only a part of single-node rule set.
But it is applicable for both: single and cluster installations.
Adding it to cluster as well.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new rule `MetricNameStatsCacheUtilizationIsTooHigh` signals
overutilization of the metric names usage stats tracker. See
https://docs.victoriametrics.com/victoriametrics/#track-ingested-metrics-usage
This rule can fire for deployments with a high churn rate of metric names.
In such cases it is better to disable metric name tracking
completely, as it provides no benefit.
It might also fire for deployments that have been tracking metric names for a very
long time, in which case the alert is a good sign to reset the cache.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The docs currently state incorrectly that vminsert applies a limit of `30`
labels per time series. The actual limit is `40`, which is stated
correctly in the vmcluster docs. This PR corrects the key
concepts docs.
```
-maxLabelsPerTimeseries int
The maximum number of labels per time series to be accepted. Series with superfluous labels are ignored. In this case the vm_rows_ignored_total{reason="too_many_labels"} metric at /metrics page is incremented (default 40)
```
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10826
After switching squash merges to use the PR title and description, the
PR template text started leaking into final commit messages and adding
noise.
This PR removes the template and documents what a PR title and PR
description should contain instead.
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10789
Previously, after RoundTrip returned successfully (err == nil, res != nil), the code checked whether the original client request's context was canceled. If it was, the code returned immediately without closing res.Body.
There is a race window where:
1) RoundTrip completes successfully (res is non-nil)
2) The client cancels the request context (closes connection)
3) The context check at line 484 sees the cancellation
4) The function returns without closing res.Body
The response body holds a reference to the underlying TCP connection. Without closing it, the connection is permanently leaked along with the transport goroutines (readLoop + writeLoop or dialConnFor).
The bug was introduced in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10233
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10833
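The race above can be reproduced and closed with a sketch like the following. It is not the vmauth code: `roundTrip`, `cancellingTransport`, `trackedBody` and `simulateCancelledClient` are hypothetical names, and the fake transport cancels the context right after producing a response to mimic the client disconnecting.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// trackedBody records whether Close was called.
type trackedBody struct {
	io.Reader
	closed bool
}

func (b *trackedBody) Close() error { b.closed = true; return nil }

// cancellingTransport returns a successful response and then cancels the client context.
type cancellingTransport struct {
	body   *trackedBody
	cancel context.CancelFunc
}

func (t *cancellingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	t.cancel()
	return &http.Response{StatusCode: http.StatusOK, Body: t.body}, nil
}

// roundTrip closes the response body before returning on a canceled context,
// so the underlying connection is not leaked.
func roundTrip(ctx context.Context, rt http.RoundTripper, req *http.Request) (*http.Response, error) {
	res, err := rt.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	if ctxErr := ctx.Err(); ctxErr != nil {
		_ = res.Body.Close()
		return nil, ctxErr
	}
	return res, nil
}

// simulateCancelledClient reports whether the body was closed and the error seen by the caller.
func simulateCancelledClient() (bool, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	body := &trackedBody{Reader: strings.NewReader("ok")}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://backend.invalid/", nil)
	if err != nil {
		return false, err
	}
	_, err = roundTrip(ctx, &cancellingTransport{body: body, cancel: cancel}, req)
	return body.closed, err
}

func main() {
	closed, err := simulateCancelledClient()
	fmt.Println(closed, err) // true context canceled
}
```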
**"Run query" link params**
Added correct params to "Run query" link on Alerting Rules page:
- `g0.step_input` - set to `group.interval` (in seconds)
- `g0.end_time` - set to `rule.lastEvaluation` / `alert.activeAt`
- `g0.relative_time=none` - to fix the time range
**Time display timezone**
Changed `t.format(...)` to `t.tz().format(...)` to display time in the
user-selected timezone.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10366
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10827
TCP healthchecks on the clusternative port of vmselect cause the following warning to be logged continuously:
VictoriaMetrics/lib/vmselectapi/server.go:204 cannot complete vmselect handshake due to network error with client "10.129.30.27:43829": cannot read hello message : cannot read message with size 11: EOF; read only 0 bytes. Check vmselect logs for errors
This is in contrast to vminsert, which appears to handle these healthchecks:
```
if errors.Is(err, io.EOF) {
	// This is likely a TCP healthcheck, which must be ignored in order to prevent logs pollution.
	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1762
	return errTCPHealthcheck
}
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10786
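A sketch of how a handshake reader can tell a healthcheck from a real network error, assuming that a healthcheck shows up as EOF with zero bytes read. `readHello` and `emptyConn` are hypothetical. The size 11 comes from the log line above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

var errTCPHealthcheck = errors.New("TCP healthcheck")

// emptyConn simulates a peer that connects and closes without sending any data.
type emptyConn struct{}

func (emptyConn) Read([]byte) (int, error) { return 0, io.EOF }

// readHello reads a fixed-size hello message.
// EOF with zero bytes read is reported as errTCPHealthcheck, so the caller can skip logging it.
func readHello(r io.Reader, size int) ([]byte, error) {
	buf := make([]byte, size)
	n, err := io.ReadFull(r, buf)
	if err != nil {
		if n == 0 && errors.Is(err, io.EOF) {
			return nil, errTCPHealthcheck
		}
		return nil, fmt.Errorf("cannot read message with size %d: %w; read only %d bytes", size, err, n)
	}
	return buf, nil
}

func main() {
	_, err := readHello(emptyConn{}, 11)
	fmt.Println(errors.Is(err, errTCPHealthcheck)) // true
}
```

`io.ReadFull` returns `io.EOF` only when nothing was read. A partially received message yields `io.ErrUnexpectedEOF` and is still reported as an error.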
Previously, on non-200 HTTP status codes, lib/promscrape performed an
unbounded body read, which could potentially result in OOM.
This commit adds a maxScrapeSize limit to error response body reads,
protecting against malicious or misbehaving metrics endpoints.
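One common way to bound such a read is `io.LimitReader`. The sketch below is an illustration only: `readErrorBody` and `endlessBody` are hypothetical, and returning the truncated body together with an error is an assumption about the desired behavior.

```go
package main

import (
	"fmt"
	"io"
)

// endlessBody simulates a misbehaving endpoint that never stops sending data.
type endlessBody struct{}

func (endlessBody) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 'x'
	}
	return len(p), nil
}

// readErrorBody reads at most maxSize bytes from r.
// It reads one extra byte in order to detect that the body exceeds the limit.
func readErrorBody(r io.Reader, maxSize int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, maxSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxSize {
		return data[:maxSize], fmt.Errorf("response body exceeds the limit of %d bytes", maxSize)
	}
	return data, nil
}

func main() {
	data, err := readErrorBody(endlessBody{}, 16)
	fmt.Println(len(data), err)
}
```

Without the limit, `io.ReadAll` on `endlessBody` would grow the buffer until the process runs out of memory.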
vmsingle shuts down vminsert before closing the ingestion rate limiter, even though the rate limiter API explicitly requires the opposite order to unblock callers. vminsert.Stop() waits for unmarshal workers, which can be blocked in ingestionRateLimiter.Register() when the limit is hit.
Workers in runParallelPerPathInternal check ctxLocal.Done() before processing each work item and exit early on cancellation — without sending a result to resultCh. However, the coordinator loop always waits for exactly len(perPath) results from resultCh. If cancellation occurs before all tasks report, the read blocks indefinitely.
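One way to avoid this kind of hang is to make every worker report exactly one result, including on cancellation. The sketch below is not the actual `runParallelPerPathInternal` code: `runParallel` and `runWithCancelledContext` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runParallel runs process for every task and waits for exactly len(tasks) results.
// Workers send a result even when the context is canceled, so the coordinator never blocks.
func runParallel(ctx context.Context, tasks []string, process func(string) error) error {
	resultCh := make(chan error, len(tasks)) // buffered: workers never block on send
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task string) {
			defer wg.Done()
			if err := ctx.Err(); err != nil {
				resultCh <- err // report cancellation instead of exiting silently
				return
			}
			resultCh <- process(task)
		}(task)
	}
	var firstErr error
	for range tasks {
		if err := <-resultCh; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	wg.Wait()
	return firstErr
}

// runWithCancelledContext runs three tasks with an already canceled context.
func runWithCancelledContext() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return runParallel(ctx, []string{"a", "b", "c"}, func(string) error { return nil })
}

func main() {
	fmt.Println(runWithCancelledContext()) // context canceled
}
```

An alternative is to have the coordinator select on both the result channel and `ctx.Done()`. Either approach removes the mismatch between the number of expected and delivered results.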
0aaa741b5b introduced a regression in lib/awsapi/config.go that causes empty credentials to be returned on the very first call to getFreshAPICredentials() when using EKS Pod Identity (or any container credential mechanism with no static access key). These empty credentials are then used for SigV4 signing -> 403 Forbidden on every remote write request.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10815
Just run a simple bash command without the heavyweight Docker image
While at it, rely on TAG environment variable instead of PKG_TAG env variable
for `make docs-update-version`, in order to be consistent with other Make commands.
The change is needed to group the splitting/sharding sections of the documentation,
so they follow one another. This should improve readability.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The previous description didn't mention that relabeling can be used
for filtering scrape targets. This commit adds that mention.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
These links were removed in 134501bf99
without adding a complete substitute for their content.
Restoring these links as they can be useful for readers to learn about relabeling.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The old links were removed in #10754
on the mistaken assumption that Google hadn't indexed them. However, it had, and users can get a 404
when searching Google for VM playgrounds.
Restoring the links via aliases. This means Hugo will serve the `/playgrounds` page when
a user requests `/playgrounds/victoriametrics/`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fix app tests:
1. Sync code between vmsingle and vmcluster: it must be the same because
apptest does not differentiate between branches, it just runs pre-built
binaries
2. Simplify range queries in backup/restore test so that it does not
depend on the interval between samples to work correctly.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Before creating the PR, make sure you have read and followed the [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
You can find out about our security policy and VictoriaMetrics version support on the [security page](https://docs.victoriametrics.com/victoriametrics/#security) in the documentation.
The following versions of VictoriaMetrics receive regular security fixes:
maxLabelsPerTimeseries = flag.Int("maxLabelsPerTimeseries", 0, "The maximum number of labels per time series to be accepted. Series with superfluous labels are ignored. In this case the vm_rows_ignored_total{reason=\"too_many_labels\"} metric at /metrics page is incremented")
maxLabelNameLen = flag.Int("maxLabelNameLen", 0, "The maximum length of label names in the accepted time series. Series with longer label name are ignored. In this case the vm_rows_ignored_total{reason=\"too_long_label_name\"} metric at /metrics page is incremented")
maxLabelValueLen = flag.Int("maxLabelValueLen", 0, "The maximum length of label values in the accepted time series. Series with longer label value are ignored. In this case the vm_rows_ignored_total{reason=\"too_long_label_value\"} metric at /metrics page is incremented")
enableMultitenancyViaHeaders = flag.Bool("enableMultitenancyViaHeaders", false, "Enables multitenancy via HTTP headers. "+
	"See https://docs.victoriametrics.com/victoriametrics/vmagent/#multitenancy")
	"Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'")
basicAuthUsername = flagutil.NewArrayString("remoteWrite.basicAuth.username", "Optional basic auth username to use for the corresponding -remoteWrite.url")
basicAuthUsernameFile = flagutil.NewArrayString("remoteWrite.basicAuth.usernameFile", "Optional path to basic auth username to use for the corresponding -remoteWrite.url. "+
	"The file is re-read every second")
basicAuthPassword = flagutil.NewArrayString("remoteWrite.basicAuth.password", "Optional basic auth password to use for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArrayString("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for the corresponding -remoteWrite.url. "+
			logger.Infof("received unsupported media type or bad request from remote storage at %q. Re-packing the block to Prometheus remote write and retrying."+
				"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", req.URL.Redacted())
			zstdBlockLen := len(data)
			data, err = repackBlockFromZstdToSnappy(data)
			if err == nil {
				logger.Infof("received unsupported media type or bad request from remote storage at %q. Downgrading protocol from VictoriaMetrics to Prometheus remote write for all future requests. "+
					"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", req.URL.Redacted())
				c.isVMRemoteWrite.Store(false)
				return c.send(ctx, data)
			}
			logger.Warnf("failed to repack zstd block (%d bytes) to snappy: %s; The block will be rejected. "+
				"Possible cause: ungraceful shutdown leading to persisted queue corruption.",
				zstdBlockLen, err)
		}
	}
	if resp.StatusCode != http.StatusTooManyRequests {
		// MUST NOT retry write requests on HTTP 4xx responses other than 429
		return &nonRetriableError{
@@ -394,3 +438,19 @@ type nonRetriableError struct {
func (e *nonRetriableError) Error() string {
	return e.err.Error()
}
var (
	writeRequestBufPool bytesutil.ByteBufferPool
	compressBufPool     bytesutil.ByteBufferPool
)
// repackBlockFromZstdToSnappy repacks the given zstd-compressed block to snappy-compressed block.
// On k conflicts in origin set, the original value is preferred and copied
// to processed with `exported_%k` key. The copy happens only if passed v isn't equal to origin[k] value.
func (ls *labelSet) add(k, v string) {
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
// do not add label with empty value to the result, as it has no meaning:
// if the label already exists in the original query result, remove it to preserve compatibility with relabeling, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766.
// otherwise, ignore the label, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984.
"invalid_label":`error evaluating template: template: :1:298: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
"For example, if lookback=1h then range from now() to now()-1h will be scanned.")
maxStartDelay = flag.Duration("group.maxStartDelay", 5*time.Minute, "Defines the max delay before starting the group evaluation. Group's start is artificially delayed for random duration on interval"+
	" [0..min(--group.maxStartDelay, group.interval)]. This helps smoothing out the load on the configured datasource, so evaluations aren't executed too close to each other.")
ruleStripFilePath = flag.Bool("rule.stripFilePath", false, "Whether to strip rule file paths in logs and all API responses, including /metrics. "+
	"For example, file path '/path/to/tenant_id/rules.yml' will be stripped to 'groupHashID/rules.yml'. "+
	"This flag may be useful for hiding sensitive information in file paths, such as S3 bucket details.")
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
// do not add label with empty value to the result, as it has no meaning:
// if the label already exists in the original query result, remove it to preserve compatibility with relabeling, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766.
// otherwise, ignore the label, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984.
"This allows reducing the consumption of backend resources when processing requests from clients connected via slow networks. "+
"Set to 0 to disable request buffering. See https://docs.victoriametrics.com/victoriametrics/vmauth/#request-body-buffering")
maxRequestBodySizeToRetry = flagutil.NewBytes("maxRequestBodySizeToRetry", 16*1024, "The maximum request body size to buffer in memory for potential retries at other backends. "+
	"Request bodies larger than this size cannot be retried if the backend fails. Zero or negative value disables request body buffering and retries. "+
	"Request bodies larger than this size cannot be retried if the backend fails. Zero or negative value disables retries. "+
	"See also -requestBufferSize")
maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process simultaneously. "+
retentionPeriod = flagutil.NewRetentionDuration("retentionPeriod", "1M", "Data with timestamps outside the retentionPeriod is automatically deleted. The minimum retentionPeriod is 24h or 1d. "+
	"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention. See also -retentionFilter")
futureRetention = flagutil.NewRetentionDuration("futureRetention", "2d", "Data with timestamps bigger than now+futureRetention is automatically deleted. "+
	"The minimum futureRetention is 2 days. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention")
snapshotAuthKey = flagutil.NewPassword("snapshotAuthKey", "authKey, which must be passed in query string to /snapshot* pages. It overrides -httpAuth.*")
forceMergeAuthKey = flagutil.NewPassword("forceMergeAuthKey", "authKey, which must be passed in query string to /internal/force_merge pages. It overrides -httpAuth.*")
forceFlushAuthKey = flagutil.NewPassword("forceFlushAuthKey", "authKey, which must be passed in query string to /internal/force_flush pages. It overrides -httpAuth.*")
@@ -54,8 +56,8 @@ var (
logNewSeries = flag.Bool("logNewSeries", false, "Whether to log new series. This option is for debug purposes only. It can lead to performance issues "+
	"when big number of new series are ingested into VictoriaMetrics")
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside the configured -retentionPeriod. "+
	"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside the configured -retentionPeriod and -futureRetention. "+
	"When set, then /api/v1/query_range will return an error for queries with 'from' value outside -retentionPeriod or 'to' value beyond -futureRetention. "+
	"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
maxHourlySeries = flag.Int64("storage.maxHourlySeries", 0, "The maximum number of unique series can be added to the storage during the last hour. "+
	"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-limiter . "+
@@ -101,21 +103,6 @@ var (
"If set to 0 or a negative value, defaults to 1% of allowed memory.")
)
// CheckTimeRange returns true if the given tr is denied for querying.
  tooltip: "The time range between start and end of the query request. 'instant' means the query was executed at a single point in time without a time range"
};
const countCol: TopQueryColumn = {
  key: "count",
  tooltip: `The number of times the query was executed over the last ${maxLifetime}`,
};
const topBySumDuration: TopQueryColumn[] = [
  queryCol,
  {
    key: "sumDurationSeconds",
    title: "duration",
    tooltip: `Cumulative time spent executing the query across all its invocations over the last ${maxLifetime}`,
    // Lift the marker by half its size so the entire icon sits inside the plot area
    // (yMin maps to the plot's bottom edge, so centering on it would clip the lower half).
    const cy = valToPosY(yMin, scaleY, yDim, yOff) - xHalf;
    const xPath = new Path2D();
    for (let i = 0; i < nullTs.length; i++) {
      const t = nullTs[i];
      if (t < xMin || t > xMax) continue;
      const cx = valToPosX(t, scaleX, xDim, xOff);
      xPath.moveTo(cx - xHalf, cy - xHalf);
      xPath.lineTo(cx + xHalf, cy + xHalf);
      xPath.moveTo(cx + xHalf, cy - xHalf);
      xPath.lineTo(cx - xHalf, cy + xHalf);
    }
    u.ctx.lineWidth = 1.6 * uPlot.pxRatio;
    u.ctx.strokeStyle = u.ctx.fillStyle;
    u.ctx.stroke(xPath);
  }
};
uPlot.orient(u, seriesIdx, orientCallback);