When a request contains both URL path query params and POST form values
for extra_label and extra_filters[], URL query params now take
precedence. This resolves the conflict between the two sources and
simplifies security enforcement for extra_label/extra_filters policies
via vmauth or any other http proxy.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10908
This commit introduces a new metric to expose fs type for the provided path.
For example:
```
vm_fs_info{path="/vmstorage-data", fs_type="xfs"}
```
Path must be registered with new method `fs.RegisterPathFsMetrics`.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10482
Add a new `Kafka (Enterprise)` row to both vmagent dashboards:
- `dashboards/vmagent.json`
- `dashboards/vm/vmagent.json`
The row is placed before `Drilldown` and contains three Kafka-specific
panels:
- `Kafka bytes`
- `Kafka messages in/out`
- `Kafka and consumer errors`
The goal is to provide a compact Kafka-focused view for enterprise
vmagent deployments without duplicating the existing generic remote
write panels such as connection saturation and persistent queue size.
The new row helps distinguish:
- producer vs consumer throughput at the Kafka topic level
- message-rate shifts that may indicate smaller Kafka payloads and
higher per-message overhead
- producer-side Kafka errors vs consumer-side Kafka errors
Descriptions include links to the relevant Kafka documentation sections.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10728
---------
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 4.35.1 to 4.35.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v4.35.2</h2>
<ul>
<li>The undocumented TRAP cache cleanup feature that could be enabled
using the <code>CODEQL_ACTION_CLEANUP_TRAP_CACHES</code> environment
variable is deprecated and will be removed in May 2026. If you are
affected by this, we recommend disabling TRAP caching by passing the
<code>trap-caching: false</code> input to the <code>init</code> Action.
<a
href="https://redirect.github.com/github/codeql-action/pull/3795">#3795</a></li>
<li>The Git version 2.36.0 requirement for improved incremental analysis
now only applies to repositories that contain submodules. <a
href="https://redirect.github.com/github/codeql-action/pull/3789">#3789</a></li>
<li>Python analysis on GHES no longer extracts the standard library,
relying instead on models of the standard library. This should result in
significantly faster extraction and analysis times, while the effect on
alerts should be minimal. <a
href="https://redirect.github.com/github/codeql-action/pull/3794">#3794</a></li>
<li>Fixed a bug in the validation of OIDC configurations for private
registries that was added in CodeQL Action 4.33.0 / 3.33.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3807">#3807</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.2">2.25.2</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3823">#3823</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>4.35.2 - 15 Apr 2026</h2>
<ul>
<li>The undocumented TRAP cache cleanup feature that could be enabled
using the <code>CODEQL_ACTION_CLEANUP_TRAP_CACHES</code> environment
variable is deprecated and will be removed in May 2026. If you are
affected by this, we recommend disabling TRAP caching by passing the
<code>trap-caching: false</code> input to the <code>init</code> Action.
<a
href="https://redirect.github.com/github/codeql-action/pull/3795">#3795</a></li>
<li>The Git version 2.36.0 requirement for improved incremental analysis
now only applies to repositories that contain submodules. <a
href="https://redirect.github.com/github/codeql-action/pull/3789">#3789</a></li>
<li>Python analysis on GHES no longer extracts the standard library,
relying instead on models of the standard library. This should result in
significantly faster extraction and analysis times, while the effect on
alerts should be minimal. <a
href="https://redirect.github.com/github/codeql-action/pull/3794">#3794</a></li>
<li>Fixed a bug in the validation of OIDC configurations for private
registries that was added in CodeQL Action 4.33.0 / 3.33.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3807">#3807</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.2">2.25.2</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3823">#3823</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="95e58e9a2c"><code>95e58e9</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3824">#3824</a>
from github/update-v4.35.2-d2e135a73</li>
<li><a
href="6f31bfe060"><code>6f31bfe</code></a>
Update changelog for v4.35.2</li>
<li><a
href="d2e135a73a"><code>d2e135a</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3823">#3823</a>
from github/update-bundle/codeql-bundle-v2.25.2</li>
<li><a
href="60abb65df0"><code>60abb65</code></a>
Add changelog note</li>
<li><a
href="5a0a562209"><code>5a0a562</code></a>
Update default bundle to codeql-bundle-v2.25.2</li>
<li><a
href="65216971a1"><code>6521697</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3820">#3820</a>
from github/dependabot/github_actions/dot-github/wor...</li>
<li><a
href="3c45af2dd2"><code>3c45af2</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3821">#3821</a>
from github/dependabot/npm_and_yarn/npm-minor-345b93...</li>
<li><a
href="f1c339364c"><code>f1c3393</code></a>
Rebuild</li>
<li><a
href="1024fc496c"><code>1024fc4</code></a>
Rebuild</li>
<li><a
href="9dd4cfed96"><code>9dd4cfe</code></a>
Bump the npm-minor group across 1 directory with 6 updates</li>
<li>Additional commits viewable in <a
href="https://github.com/github/codeql-action/compare/v4.35.1...v4.35.2">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This change introduces a helper `MustStartDefaultRWVmagent` that by
default sets `-remoteWrite.flushInterval=50ms`. This helper makes it
easier to setup RW tests as all of them rely on frequent flushes. So
instead of overloading the flag, we can use dedicated helper for that.
This helper was added after newly added RW test became flaky because it
didn't have `-remoteWrite.flushInterval=50ms` set.
---------
Failing test
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/25446725004/job/74769752869#step:5:71
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit adds possibility to omit tenantID in the URL path. In this case,
tenantID will be fetched from HTTP headers `AccountID` and `ProjectID`.
If headers are missing too, then default `0:0` tenantID is used.
This functionality can be enabled only if -enableMultitenantHandlers
cmd-line flag was set to vminsert, vmselect or vmagent.
Motivation: this change makes VM configuration for multienancy
consistent with VL configuration - see
https://docs.victoriametrics.com/victorialogs/#multitenancy. And keeps
backward compatibility in the same time.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4241
Before, some of the template examples were wrongly renderred by hugo.
For example:
```
http://vm-grafana.com/<dashboard-id>?viewPanel=<panel-id>&from={{($activeAt.Add (parseDurationTime \"-1h\")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime \"1h\")).UnixMilli}}
```
was renderred like:
```
http://vm-grafana.com/ ?viewPanel=&from={{($activeAt.Add (parseDurationTime "-1h")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime "1h")).UnixMilli}}
```
Wrapping examples in ` helps to render them raw.
While there, also fixed some examples.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
On 2025-12-16, Hetzner Cloud deprecated the `datacenter` field in their
Servers API and introduced a top-level `location` field carrying the
same data. The `datacenter` field will be removed after 2026-07-01.
Without this change, `__meta_hetzner_hcloud_datacenter_location`, and
`__meta_hetzner_hcloud_datacenter_location_network_zone` would silently
become empty for the `hcloud` role after that date.
This mirrors the change made in Prometheus v3.11.0
([prometheus/prometheus#17850](https://github.com/prometheus/prometheus/pull/17850)).
## Changes
**`hcloud` role:**
- Add `HCloudLocation` struct and `Location` field on `HCloudServer`,
mapped to the new top-level `location` API field
- Emit two new canonical labels: `__meta_hetzner_hcloud_location` and
`__meta_hetzner_hcloud_location_network_zone`
- Keep the deprecated `__meta_hetzner_hcloud_datacenter_location` and
`__meta_hetzner_hcloud_datacenter_location_network_zone` labels, now
sourced from the new `location` field so they continue to work past
2026-07-01
- `__meta_hetzner_datacenter` (the datacenter name, e.g. `fsn1-dc14`) is
unaffected for this role — the datacenter name is a distinct concept
from location and is kept as-is (this will stop working starting
2026-07-01)
**`robot` role:**
- Add `__meta_hetzner_robot_datacenter` as the canonical replacement for
`__meta_hetzner_datacenter`; the old label is kept for backward
compatibility
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10909
At v1.142.0 was introduced a bug, when changes from OSS version were
back-ported into Enterprise branch. It changed the order of storage
nodes discovery. And resulted into:
* overwrite of discovered storage nodes
* duplicate of per storage node metrics
This bug only affects enterprise vminsert version.
Mention -rule.stripFilePath cmd-limne flag in security recommendations,
so users can be aware of it.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Haley Wang <haley@victoriametrics.com>
The change adds `AI observability` section to `AI tools` documentation.
It mentions excellent @Amper articles describing these integrations in
all details.
The doc change doesn't repeat the articles, but rather helps users to
discover them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The flag already exists in the ENT version. We decided to expose it in
OSS and strip the path from all public places, including all
APIs(includes `/metrics`) and debug logs(it's minor info there).
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5625
Previously, if `-tls` flag was provided, victoria metrics components
produced the following log error entry at health checks:
http: TLS handshake error from 10.244.0.1:46556: EOF
Such health checks are common for many orchestration systems, such as
consul
or kubernetes. And default http server already suppresses such EOF
health checks.
This commit adds suppression to the tls server as well.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10538
# What Changed
- Updated the operator installation procedure
- Updated the commands to match the rest of the guides
- Updated screenshots
- Reordered steps to make more sense of the process
- Fixed issues in the YAML
- Tested on actual OpenShift trial instance running on AWS
- Added steps to confirm log ingestion using VMUI
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10864
This PR fixes several broken links and anchors in the victoriametrics
docs.
Note about links changes in FAQ.md file. The links inside the paragraph
break navigation in the right-side menu. To fix this, an explicit anchor
definition has been added. The anchor is the same as before, setting it
explsitly fixes the siebar links.
See https://github.com/VictoriaMetrics/vmdocs/issues/221 for the
up-to-date list once this PR is merged.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10874
Previously, metricMetadata was not properly reset during parsing of
metrics. It could result into `Unit` suffix to be added from previously
parsed metric into next metric without Unit field.
For example, metric `http_request` with `Unit` `seconds` will be
converted into `http_request_seconds` and `Unit` field hold `seconds`.
Next parsed metric `cpu_usage_ratio` has no `Unit` and it will get
previous `seconds` `Unit` -> `cpu_usage_ratio_seconds`.
This commit adds metricMetadata reset call before parsing of next
metric.
Bug was introduced at 293d80910c
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10889
Iximiuz labs prepared a set of playgrounds for VictoriaMetrics. These
are interactive playgrounds backed by real Linux machines running
VictoriaMetrics software, allowing experimenting and investigating right
in the browser tab.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Metadata is enabled by default since v1.137.0, and the metadata volume
can be a big contributor to resource usage and network traffic.
vmagent dahsboard:
1. `Troubleshooting` section: rename `Datapoints rate` panel to `Rows
rate` to include metadata rate;
2. `Ingestion` section: add metadata rate to existing `Rows rate` panel.
(The difference between this panel and the one above is that this panel
only contains data from write requests, while the above panel also
includes the scraping part.)
vmcluster dashboard:
1. `vminsert` section: add `Rows rate` panel
Didn’t see a good place for it in the vmsingle dashboard, since it
doesn’t have a dedicated insert section, and I don’t want to add it to
`overview` yet.
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10868
* add visual mermaid diagram to demonstrate aggregation concept;
* update Recording-rules-alternative:
* * recommend using rate_sum instead of total for better reliability
* * demonstrate how to calculate sliding window, typicall for recording
rules
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Add the support of storage and retrieval of samples with future
timestamps as requested in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/827
What to expect:
- By default, the max future timestamp is still limited to `now+2d`. To
change it, set the `-futureRetention` flag in `vmstorage`. The max flag
value is currently limited to `100y`. It can be extended if we see a
demand for this, but it can't be more than `~ 290y` due to how the time
duration is implemented in Go. The flag value can't be less than `2d`.
- downsampling and retention filters (available in enterprise edition)
are currently not supported for future timestamps
- If `vmstorage` restarts with a smaller value of `-futureRetention`
flag, any future partitions that are outside the new future retention
will be automatically deleted.
- Data ingestion, data retrieval, backup/restore, timeseries (soft)
deletion, and other operations work with future timestamps the same way
as with the historical timestamps.
- In the cluster version, the affected binaries are `vmstorage` and
`vmselect`. This means that `vmselect` version must match `vmstorage`
version if you want to query future timestamps. `vminsert` was not
affected, so its version can be a lower one.
- If you downgrade the `vmstorage`, the data with future timestamps will
remain on disk and memory (per-partition caches) but won't be available
for querying.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Update `timeutil.ParseTimeAt` to check the time limits for all date/time formats, not just year.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, backend url health check start could produce a data race
and a race condition.
The following panic could be produced:
`panic: sync: WaitGroup is reused before previous Wait has returned`
It happened because concurrent goroutine could process request, while
configuration was reloaded and stopHealthChecks method was called.
This commit adds a dedicated structure for backend health checks.
Which protects from data race with mutex guard. And prevents race
condition with a boolean flag.
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10806
Visually outline that guideline message should be removed from
description before submitting the PR. This should prevent cases when PR
template was blending into the PRs description remaining unnoticed.
The commit in metricsql
d0bc93816e
introduced a bug that changes an order of binary op evaluation. This
commit updates to metricsql version that fixes a bug by reverting to
previous behavior.
The bug was introduced in v1.140.0, v1.136.4, and v1.122.19 releases.
It was reported in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10856
Previously
- `GetData` in the OpenTSDB client was returning empty `Metric{}` with
`nil` error for several conditions (multiple series returned, aggregate
tags present, `modifyData` failures), causing `vmctl opentsdb` to
silently drop series during migration
This commit changes these silent return paths to return proper errors with
descriptive messages including the query string, so operators can detect
and diagnose partial migrations.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10797
Previously, if rule label value was set to empty string, vmalert ignored this label during labels merge with labels from data source response. In contrast, Prometheus removes data source label in this case as well. Which allows to perform label delete operation.
This commit uses the same logic as Prometheus for resolving labels conflicts and allows to remove labels.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766
Proxy protocol parser kept sub-slice reference for pooled bytesBuffer at readProxyProto
```
bb := bbPool.Get()
defer bbPool.Put(bb) // ← buffer returned to pool AFTER function returns
...
IP: bb.B[0:16], // ← BUG: sub-slice of pooled buffer!
...
```
This commit properly allocates new slice for ipv6 address and copies buffer content to it.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10839
`IndexDBRecordsDrop` and `TooManyTSIDMisses` were mistakenly placed to `alerts-health.yml`,
which was supposed to contain rules related to all VM components. But these two rules
are related to storage components only (vmstorage and vmsingle). Moving them to corresponding
files.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change should reduce confusion for users where `alerts.yml`
belongs to. Before, developers could mistakenly assume that
`alerts.yml` was related to both single and cluster installations.
In result, rule `MetadataCacheUtilizationIsTooHigh` was added only
to `alerts.yml` and not copied to `alerts-cluster.yml`.
The rename change should bring more context into the file name
and reduce confusion in the future.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, this rule was only a part of single-node rule set.
But it is applicable for both: single and cluster installations.
Adding it to cluster as well.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new rule `MetricNameStatsCacheUtilizationIsTooHigh` will signalize
about overutilization of Metric names usage stats tracker. See
https://docs.victoriametrics.com/victoriametrics/#track-ingested-metrics-usage
This rule can fire for deployments with high churn rate of metric names.
In cases like this, it is better to disable metric name tracking
completely, as it brings no use.
It might fire for deployments that were tracking metric names for very
long periods and this alert might be a good sign to reset the cache.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The docs currently wrongly states that vminsert applies a label limit
per timeseries of `30`. Currently, the limit is `40`, which is also
correctly stated in in vmcluster docs. This PR corrects this in the key
concepts docs.
```
-maxLabelsPerTimeseries int
The maximum number of labels per time series to be accepted. Series with superfluous labels are ignored. In this case the vm_rows_ignored_total{reason="too_many_labels"} metric at /metrics page is incremented (default 40)
```
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10826
After switching squash merges to use the PR title and description, the
PR template text started leaking into final commit messages and adding
noise.
This PR removes the template and documents what a PR title and PR
description should contain instead.
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10789
Previously After RoundTrip returns successfully (err == nil, res != nil), the code checks if the original client request's context was canceled. If canceled, it returns immediately without closing res.Body.
There is a race window where:
1) RoundTrip completes successfully (res is non-nil)
2) The client cancels the request context (closes connection)
3) The context check at line 484 sees the cancellation
4) The function returns without closing res.Body
The response body holds a reference to the underlying TCP connection. Without closing it, the connection is permanently leaked along with the transport goroutines (readLoop + writeLoop or dialConnFor).
bug was introduced at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10233
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10833
**"Run query" link params**
Added correct params to "Run query" link on Alerting Rules page:
- `g0.step_input` - set to `group.interval` (in seconds)
- `g0.end_time` - set to `rule.lastEvaluation` / `alert.activeAt`
- `g0.relative_time=none` - to fix the time range
**Time display timezone**
Changed `t.format(...)` to `t.tz().format(...)` to display time in the
user-selected timezone.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10366https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10827
TCP healthchecks on the clusternative port of vmselect logs the following warning continuously:
VictoriaMetrics/lib/vmselectapi/server.go:204 cannot complete vmselect handshake due to network error with client "10.129.30.27:43829": cannot read hello message : cannot read message with size 11: EOF; read only 0 bytes. Check vmselect logs for errors
This is in contrast to vminsert, where it seems like there's handling for these healthchecks:
```
if errors.Is(err, io.EOF) {
// This is likely a TCP healthcheck, which must be ignored in order to prevent logs pollution.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1762
return errTCPHealthcheck
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10786
Previously, on non-200 HTTP status codes, lib/promscrape performed an
unbounded body read, which could potentially result in OOM.
This commit adds a maxScrapeSize limit to error response body reads,
protecting against malicious or misbehaving metrics endpoints.
vmsingle shuts down vminsert before closing the ingestion rate limiter, even though the rate limiter API explicitly requires the opposite order to unblock callers. vminsert.Stop() waits for unmarshal workers, which can be blocked in ingestionRateLimiter.Register() when the limit is hit.
Workers in runParallelPerPathInternal check ctxLocal.Done() before processing each work item and exit early on cancellation — without sending a result to resultCh. However, the coordinator loop always waits for exactly len(perPath) results from resultCh. If cancellation occurs before all tasks report, the read blocks indefinitely.
0aaa741b5b introduced a regression in lib/awsapi/config.go that causes empty credentials to be returned on the very first call to getFreshAPICredentials() when using EKS Pod Identity (or any container credential mechanism with no static access key). These empty credentials are then used for SigV4 signing -> 403 Forbidden on every remote write request.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10815
Just run a simple bash command without the heavyweight Docker image
While at it, rely on TAG environment variable instead of PKG_TAG env variable
for `make docs-update-version`, in order to be consistent with other Make commands.
The change is needed to group splitting/sharding section of the documentation,
so they go one after another. This should improve readability.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The previous descrioption didn't mention that relabeling can be used
for filtering scrape targets. Adding this metion.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
These links were removed in 134501bf99
without adding complete substitution to their content.
Restoring these links as they can be useful for readers to learn about relabeling.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The old links were removed in #10754
mistakenly thinking that google didn't index it. However, it did. And users can get 404
when searching in google for VM plyagrounds.
Restoring the links via aliases. It means hugo will serve the `/playgrounds` page when
user requests `/playgrounds/victoriametrics/`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fix app tests:
1. Sync code between vmsingle and vmcluster: it must be the same because
apptest does not differentiate between branches, it just runs pre-built
binaries
2. Simplify range queries in backup/restore test so that it does not
depend on the interval between samples to work correctly.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously (*writeconcurrencylimiter.Reader).Read() could permanently leak concurrency tokens from the -maxConcurrentInserts semaphore.
Consider the following example:
* GetReader() acquires a token, then PutReader() unconditionally releases it.
* Read() calls DecConcurrency() before the underlying I/O and IncConcurrency() after it. If IncConcurrency() returns an error, Read() returns without holding a token.
* Each such failure permanently removes one slot from the concurrencyLimitCh semaphore. Slots leak one by one until the channel is fully drained, at which point DecConcurrency() blocks forever, deadlocking ingestion on vmstorage.
This commit adds tracking for obtained tokens to the reader. Which prevents possible tokens leakage.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10784
This reverts commit b3c03c023c.
Reason for revert: the original logic was correct from the user's perspective:
- The -maxRequestBodySizeToRetry command-line flag controls the size of the request body,
which could be retried on backend failure. The meaining of this flag wasn't changed after
the introduction of the -requestBufferSize flag in the commit e31abfc25c
(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309 )
- The -requestBufferSize flag controls the size of the buffer for reading request body
before sending sending it to the backend and before applying concurrency limits.
These flags are independent from user's perspective. The fact that these flags share the implementation,
sholdn't be known to the user - this is an implementation detail, which allows avoiding double buffering.
Both flags enable request buffering. If the user wants disabling of all the request buffering,
then both flags must be set to 0. That's why these flags are cross-mentioned in their -help descriptions.
Also the reverted commit had the following issues:
- It reduced the default value for the -requestBufferSize flag from 32KiB to 16KiB.
The 32KiB value has been calculated and justified at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309 .
It shouldn't increase vmagent memory usage too much for typical workloads.
For example, if vmagent handles 10K concurrent requests, then the memory overhead for the request buffering
will be 10K*32KiB=320MiB. This is a small price for being able to efficiently handling 10K concurrent requests.
- It added a dot to the end of the https://docs.victoriametrics.com/victoriametrics/vmauth/#request-body-buffering link
in the description for the description of the -requestBufferSize flag. This breaks clicking the link in some environments,
since the trailing dot is considered as a part of the url.
- It added a superflouous whitespace in front of the 'Disabling request buffering' text inside the description
for the -requstBufferSize flag.
- It introduced an unnecessary complexity to the user by mentioning that the zero value
at -maxBufferSize disables buffering for request reties (these things must be independent
from the user's perspective).
- It changed the bufferedBody logic in non-trivial ways, which aren't related to the original issue.
If these changes are needed, then they must be justified in a separate issue and must be prepared
in a separate pull request / commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10675
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10677
Previously, Storage.table was initialized after startFreeDiskSpaceWatcher was called.
This created a potential data race condition: if openTable took a long time to complete
and freed disk space during that window, the free disk space watcher could read an
uninitialized (or partially initialized) Storage.table, leading to an invalid memory
address or nil pointer dereference panic.
This commit properly initializes s.isReadOnly state during storage start and
starts FreeDiskSpaceWatcher after openTable.
Bug was introduced in github.com/VictoriaMetrics/VictoriaMetrics/commit/27b958ba8bc66578206ddac26ccf47b2cc3e8101
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10747
Align group evaluation time with the `eval_offset` option to allow users
to manage group execution more effectively by understanding the exact
time each group will be scheduled, particularly in cases of spreading
rule execution within a window, chaining groups, or debugging data delay
issue.
If the group evaluation takes less than the group interval, but the
initial evaluation combined with the additional restore operation
exceeds the group interval, the evaluation time will be gradually
corrected in subsequent evaluations, as the interval ticker schedule
remains unchanged.
For groups without `eval_offset`, this change also ensures that all
evaluations follow the interval. Previously, the gap between the first
and second evaluations was larger than the interval. And the
`eval_delay` continues to help prevent partial responses.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10772.
Follow-up commit for
211fb08028
Address @f41gh7 review comments:
- Move code from `lib/osinfo` to `lib/appmetrics`.
- Make the logic private.
- Use metrics.WriteGaugeUint64 func.
- Remove registration logic from `app/xxx/main.go`.
- Remove `lib/osinfo` package.
At 00:00 UTC the ingested samples start to have timestamps for the new
day (in the ingested samples are always recent). Even though there was a
next-day prefill of the per-day index during the last hour of the day,
some performance degradation is still possible.
For example, in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10698
it is manifested as `vminsert-to-vmstorage connection saturation` peaks
right after midnight.
Possible hypothesis why this is happening. At midnight,
currHourMetricIDs is empty and prevHourMetricIDs cannot be used because
it holds metricIDs for the previous day. So the ingestion logic hits
dateMetricIDsCache which may not have the metricID in its read-only
buffer and therefore should aquire lock to check its prev read-only
buffer or read-write buffer. Which creates lock contention and therefore
raises ingestion request latency.
A solution to this could be re-using the nextDayMetricIDs during the
first hour of the day. During this time, it is equivalent to
currHourMetricIDs.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
This change reverts part of the changes in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10686
Motivation: docs added https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10686 in most cases are too verbose, ai-generated and bringing low practical sense.
The improvement goal: remove bloat from the docs and keep them practical and useful.
What it does:
- Completely removes items from the sidebar
- Moves the content of the most important playground pages to the
`/playground/` stub (README.md). Use H2s for each playground.
- Updates and cleans the text.
- Removes the individual children pages in the playground category (keep
only the `/playgrounds/` page/stub and remove the children).
- Removes items as these don't really need much introduction or aren't
playgrounds:
- log to logsql: a conversion tool
- sql to logsql: same
- adds Grafana playground section
Links of child pages will become invalid. We don't preserve them as this is pretty new doc (1w on prod) and is unlikely to have already persisted links somewhere.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This commit adds new metrics `vmalert_remotewrite_queue_capacity` and `vmalert_remotewrite_queue_size`, which is updated with each push and it's
frequency depends on `-remoteWrite.concurrency`,
`remoteWrite.flushInterval`
It doesn't account for the pending data within each pushers request, it
should provide a general indication of the queue usage.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10765
Exctract repeated code from nextDayMetricIDs synctests into separate
funcs to make the code more readable.
The change was originally introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10704 and was
extracted into a separate PR to keep the original change simple.
Previously, vminsert did not account for the ingest concurrency limit in buffer size calculation.
This could lead to excessively large buffers and OOM errors when the concurrency limit was reached.
This commit fixes buffer size calculation by separating `insertCtx` and `storageNode` buffer size limits.
`storageNode` buffer size is set to a larger value, as it is allocated per configured `-storageNode`
and is independent of the concurrency limit.
`insertCtx` buffer size now accounts for the configured concurrency limit
and calculates the maximum buffer size accordingly.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10725
Previously, vmselect in cluster-native mode could return partial responses to upstream vmselect.
Since upstream vmselect expects full responses (mimicking vmstorage behavior),
partial responses must be disabled in cluster-native mode.
This prevents incomplete responses from being cached at the upstream vmselect level.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10678
Automatically set daily and hourly series limits to `MaxInt32` when `remoteWrite.maxHourlySeries` or `remoteWrite.maxDailySeries` is set to `-1`.
This change addresses a usability issue with the cardinality limiter. Users may want to enable the limiter to observe its metrics before deciding on an appropriate limit. However, the underlying bloom filter only supports `int32`, so setting large values can lead to overflow.
With this PR:
* Setting either flag to `-1` is treated as “no practical limit” and internally mapped to `math.MaxInt32`
* Values exceeding `int32` are safely clamped to `MaxInt32` to prevent overflow
This allows users to enable the limiter for estimation purposes without risking invalid configurations or runtime issues.
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9614
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Previously, last scrape result was unconditionally update, despite possible scrape error.
The commit updates last scrape result only at successful scrape. It properly accounts `scrape_series_added` metric and aligns it with the same metric in Prometheus.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10653
Previously introduced flag `requestBufferSize` raised default value for
in-memory buffer from 16KB to 32KB. It could increase memory usage for
vmauth. Also it made unclean how to actually disable requests buffering.
This commit aligns flags value to the 16KB. And disables requests
buffering if any of flags value are 0 as mentioned at flags description.
If any of flags have non-default value, those value are used as max size
for request buffer. If both flags are modified - bigger value wins.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10675
I expect the change to help in two ways:
1. Spreading remote write flushes over the flush interval to avoid
congestion at the remote write destination;
2. Enhance queue data consumption. Currently, all flushers may always
flush data simultaneously, resulting in periods where no flushers are
consuming data from the queue, which increases the risk of reaching the
queue limit `remoteWrite.maxQueueSize` even when a increased
`remoteWrite.concurrency`. By making the flushers more dispersed, it is
more likely that some flushers are consistently consuming data from the
queue, which should make queue management easier.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10729/
Changes:
- Added the number of `pending alerts` and `firing alerts`
- Improvement `transormations` for panel - FIRING over time by group and rules
- Added sort for panel - FIRING over time by rule
Signed-off-by: sias32 <sias.32@yandex.ru>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Replace 1.2 multiplier with 1.25 in disk space estimation formula.
1.2 only provides ~16.7% free space, while the docs recommend keeping
20%. Using 1.25 correctly accounts for 20% free space.
Inspired by
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10394
Add per-URL `-remoteWrite.disableMetadata` flag to control metadata
sending for each remote storage independently.
After v1.137.0 enabled `-enableMetadata` by default, metadata is sent to
ALL remote write targets, even those with relabeling filters that drop
most metrics. This causes unnecessary growth in
`vmagent_remotewrite_requests_total`. and significant increase in
network load for heavy filtered remote write destinations.
When the network between client and s3 server is unstable, the client may encounter temporary io.EOF errors when reading the response from s3 server.
Currently, the s3 sdk in vmbackup uses the default retry policy. However, this default retry policy won't retry when s3 sdk meet unexpected EOF. This means that the temporary unexpected EOF error will cause the backup task to fail.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10699
Add link to blogpost with detailed information about zstd+rw protocol.
This PR is based on question in community channel about implementation
details.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Implemented dedicated thanos migration mode for vmctl to migrate data from Thanos installations to VictoriaMetrics.
Key features:
1. Raw and downsampled blocks support: Reads both raw blocks
(resolution=0) and downsampled blocks (5m/1h resolution) directly from
Thanos snapshots
2. All aggregate types: Imports count, sum, min, max, and counter
aggregates from downsampled blocks as separate metrics with resolution
and type suffixes (e.g., metric_name:5m:count)
3. Dedicated flags: Uses `--thanos-*` prefixed flags (--thanos-snapshot,
--thanos-concurrency, --thanos-filter-time-start,
--thanos-filter-time-end, --thanos-filter-label,
--thanos-filter-label-value, --thanos-aggr-types)
4. Selective aggregate import: Use `--thanos-aggr-types` to import only
specific aggregates
Usage:
```
vmctl thanos --thanos-snapshot /path/to/thanos-data --vm-addr http://victoria-metrics:8428
```
Closes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9262
Signed-off-by: Dmytro Kozlov <d.kozlov@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
## Summary
This PR implements split phase metrics for filestream operations as
requested in #10432.
### Changes
- Added `vm_filestream_fsync_duration_seconds_total` metric to track
fsync syscall duration separately
- Added `vm_filestream_fsync_calls_total` metric to count fsync calls
- Added `vm_filestream_write_syscall_duration_seconds_total` metric to
track write syscall duration (previously mixed with flush time)
- Refactored `MustClose()` and `MustFlush()` to use new `flush()` and
`sync()` helper methods
- Kept `vm_filestream_write_duration_seconds_total` for backward
compatibility
### Problem Solved
Previously, `vm_filestream_write_duration_seconds_total` was being
incremented in two places:
1. `statWriter.Write()` - triggered by `bw.Flush()` and `bw.Write()`
2. `Writer.MustFlush()` - which included the above process, leading to
double-counting
This made it impossible to distinguish between write syscall time and
fsync time, which is critical for diagnosing storage latency issues.
### Solution
The new metrics allow users to:
- Distinguish "flush got slower" vs "fsync got slower" using metrics
only
- No file path labels (bounded cardinality)
- No double-counting between metrics
### Testing
- Code compiles successfully
- All existing metrics are preserved for backward compatibility
Closes#10432
---------
Signed-off-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Signed-off-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
These types hide public types from lib/storage/metricnamestats package.
These types do not resolve any practical issues. Instead, they add a level of indirection,
which complicates reading and understanding the code.
These types were introduced in the commit 795d3fe722
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6145
### Describe Your Changes
Some users may not know that VictoriaMetrics Cloud provides relevant
features to manage workloads. This change add notes in relevant places
in which users may find that a managed solution is what they need.
The intention is not to push users to Cloud, but giving the information.
That's why it's always phrased like: "If you don't want to do X, Cloud
can do it for you", instead of "Start for free, etc". This is an Open
Source first project, and shall remain as such.
After this gets proper review, VictoriaLogs and other repos may follow.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Jose Gómez-Sellés <14234281+jgomezselles@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Commit 83da33d8cf
removed NFS directory delete retries. It was made on assumption, that
only directory rename could cause such issues. However, both rename and
unlink uses the same "silly rename" logic
https://linux-nfs.org/wiki/index.php/Server-side_silly_rename
and linux kernel - `fs/nfs/dir.c` `nfs_unlink` and `nfs_rename`.
And NFS client may treat file still open, even if it
was properly closed by application. Most probably it could be triggered, because VictoriaMetrics may
open the same file multiple times ( data read and background merges).
There is no issue with VictoriaMetrics itself, it properly closes files. But NFS-client may have delays
or cache metadata information for the files. So it could trigger silly rename behavior.
This commit restores original behavior with deletion retries and brings
back metrics for unsuccessful delete operations.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9842
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10680
We noticed that backup restores in our environment were much slower than
the hardware/bandwidth constraints would suggest and we traced this down
to a couple of bottlenecks. This PR attempts to address all of them.
#### Lack of pre-allocation of files,
This was causing writes far into files to be quite slow as new blocks
needed to be continually allocated. This was particularly bad on ext4
for us, but will likely be applicable to most disks and filesystems,
you'll see the impl here is linux specific but this is mostly because I
don't have a test env for any other platform and didn't want to blindly
make changes without a validation env.
This comes with the downside of no longer being to to resume a restore
mid file, and requiring the re-downloading of parts already in the file
size the file will appear at full size from the very start. This is I
think _generally_ a good tradeoff for the restore speed gains, it is
definitely a tradeoff so I've included a flag to disable the
pre-allocation behavior and fall back to the existing part diffing
logic.
#### Fsync after each part
With many small parts in relatively few files, or in high concurrency
setups the the writerCloser fsync on each part(actually double fsync
since both `filestream.Writer.mustFlush` and
`filestream.Writer.mustClose` both fsync). Was causing slowdowns since
we would be continually queuing fsyncs.
With the pre-allocation pattern the file is only "ready" once re-named
so I moved to a per file fsync after rename.
#### Concurrent read/write
The previous download pattern was to do a read from the remoteFs, with
whatever latency that entailed, then sequentially do a write, again with
whatever latency that entailed. This meant that throughput was limited
to `readLatency + writeLatency * blockSize`.
Similar to how `crossTypeCopy` is implemented in the backup process we
can instead use `io.pipe` to allow two goroutines to work in parallel
with a small buffer between them.
#### Pagecache avoidance
`filestream.Writer` does quite a lot to avoid polluting the page cache,
but this is not relevent in a restore context and with large sequential
block writes its much more effecient to let the OS flush the pagecache
whenever it wants rather than doing a bunch of small buffer syscalls to
flush blocks.
Therefore this switches over to a much simplier directWriterCloser that
does direct file IO and lets the OS handle flushes while mid write.
### Performance
Before the changes we were seeing writes speeds of only 100MBps, this
was a restore from EBS volumes, ext with 1GB/s throughput with
<img width="1613" height="586" alt="Screenshot 2026-03-16 at 1 29 46 PM"
src="https://github.com/user-attachments/assets/5d54dcb7-cb59-43e0-9247-fda8c70feb2f"
/>
After these changes in the same restore env we're seeing 600MBs flat
rates.
<img width="1611" height="471" alt="Screenshot 2026-03-16 at 1 31 33 PM"
src="https://github.com/user-attachments/assets/ea8e2eb7-533a-48fa-99e0-0b38286e5572"
/>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Remove shards as they only complicate things when the number of requests
per second is in the range of thousands.
Related to #10532.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This commit allows to perform JWT claim matching over 1 dimension arrays. It could
be useful from practical standpoint. Because permissions are usually assigned as a list of values.
For example, the following config allows admin access over list of assigned roles for user:
```yaml
match_claims:
access.roles: "admin"
```
JWT token:
```json
{
"access": {
"roles": [
"read",
"write",
"admin"
]
}
}
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10647
RFC-7617 allows empty password/username. Moreover, from RFC standpoint both empty values are valid as well. It should be just encoded as `:`. So this commit relaxes non-empty username restriction.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6956
There are cases then the key sizeBytes is much greater than the value
sizeBytes. Therefore it is important to include the key sizeBytes into
the total.
Also fix some code comments.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to
3.4.2.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3bf09091c3"><code>3bf0909</code></a>
3.4.2</li>
<li><a
href="885ddcc33c"><code>885ddcc</code></a>
fix CWE-1321</li>
<li><a
href="0bdba705d1"><code>0bdba70</code></a>
added flatted-view to the benchmark</li>
<li><a
href="2a02dce7c6"><code>2a02dce</code></a>
3.4.1</li>
<li><a
href="fba4e8f2e1"><code>fba4e8f</code></a>
Merge pull request <a
href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a>
from WebReflection/python-fix</li>
<li><a
href="5fe86485e6"><code>5fe8648</code></a>
added "when in Rome" also a test for PHP</li>
<li><a
href="53517adbef"><code>53517ad</code></a>
some minor improvement</li>
<li><a
href="b3e2a0c387"><code>b3e2a0c</code></a>
Fixing recursion issue in Python too</li>
<li><a
href="c4b46dbcbf"><code>c4b46db</code></a>
Add SECURITY.md for security policy and reporting</li>
<li><a
href="f86d071e0f"><code>f86d071</code></a>
Create dependabot.yml for version updates</li>
<li>Additional commits viewable in <a
href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Due to a conflict with VL FAQ page identifier,
VM FAQ page stopped rendering.
This change adds unique identifier to VM FAQ page and fixes the issue.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, by mistake, datasource was referenced by input name instead
of variable name. For an unknown reason, it worked well in local setup
and on playground.
This fix is confirmed by users and continues working at local setup
and playground.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Updated the [HA monitoring setup in Kubernetes via VictoriaMetrics
Cluster](https://docs.victoriametrics.com/guides/k8s-ha-monitoring-via-vm-cluster/)
guide.
Changes:
- Added an introduction explaining how HA works in this guide
- Updated and verified commands used in the guide
- Replaced using Grafana UI usage in favor of using VMUI instead (it was
used to run queries, it's easier to just use the built-in VMUI instead
of installing Grafana just to use the Explore tab)
- Removed Grafana screenshots and replaced them with VMUI
- Tested on a modern version of GKE
- Added explanations for `replicationFactor`, de-duplication, and
`isPartial`
- Added next steps
- Added VMUI screenshots
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This commit adds a rpc retry by dialing a new connection instead of
getting an old one from the connection pool when the previous rpc error
is `io.EOF`.
It helps prevent broken connections from remaining for too long and
causing failed requests and partial responses during `vmstorage` rolling
restart period
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10314
Previously inmemoryPart refCount was not properly decremented.
Previous behavior:
* createInmemoryPart called newPartWrapperFromInmemoryPart and returns a partWrapper with refCount=1
* multiple parts are merged in mustMergeInmemoryPartsFinal, which creates a new merged part
* the source partWrappers are never decRef'd
* Since refCount never reaches 0, putInmemoryPart and (*part).MustClose are never called
This commit properly decrements refCount at mustMergeInmemoryPartsFinal.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10086
This commit adds a new `folder_ids` field in
`yandexcloud_sd_configs` that allows users to specify Yandex Cloud
folder IDs directly, bypassing the organization->cloud->folder hierarchy
traversal.
Previously, the Yandex Cloud service discovery required traversing the
entire resource hierarchy (organizations -> clouds -> folders ->
instances) to discover instances. This works when the Service Account
has permissions at all levels. However, some Service Accounts may only
have permissions at the folder level, causing discovery to fail when it
cannot access organization or cloud resources.
With this change, users can now configure folder IDs directly:
```yaml
yandexcloud_sd_configs:
- service: compute
folder_ids:
- folder-id-1
- folder-id-2
```
When `folder_ids` is specified, the discovery skips the hierarchy
traversal and directly queries instances from the specified folders.
This is a backward-compatible change - when `folder_ids` is not
specified, the existing behavior is preserved.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10587
Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Before creating the PR, make sure you have read and followed the [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
You can find out about our security policy and VictoriaMetrics version support on the [security page](https://docs.victoriametrics.com/victoriametrics/#security) in the documentation.
The following versions of VictoriaMetrics receive regular security fixes:
maxLabelsPerTimeseries=flag.Int("maxLabelsPerTimeseries",0,"The maximum number of labels per time series to be accepted. Series with superfluous labels are ignored. In this case the vm_rows_ignored_total{reason=\"too_many_labels\"} metric at /metrics page is incremented")
maxLabelNameLen=flag.Int("maxLabelNameLen",0,"The maximum length of label names in the accepted time series. Series with longer label name are ignored. In this case the vm_rows_ignored_total{reason=\"too_long_label_name\"} metric at /metrics page is incremented")
maxLabelValueLen=flag.Int("maxLabelValueLen",0,"The maximum length of label values in the accepted time series. Series with longer label value are ignored. In this case the vm_rows_ignored_total{reason=\"too_long_label_value\"} metric at /metrics page is incremented")
enableMultitenancyViaHeaders=flag.Bool("enableMultitenancyViaHeaders",false,"Enables multitenancy via HTTP headers. "+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#multitenancy")
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries=flag.Int("remoteWrite.maxHourlySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxDailySeries=flag.Int("remoteWrite.maxDailySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxHourlySeries=flag.Int64("remoteWrite.maxHourlySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxDailySeries=flag.Int64("remoteWrite.maxDailySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxIngestionRate=flag.Int("maxIngestionRate",0,"The maximum number of samples vmagent can receive per second. Data ingestion is paused when the limit is exceeded. "+
"By default there are no limits on samples ingestion rate. See also -remoteWrite.rateLimit")
@@ -92,6 +100,8 @@ var (
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#disabling-on-disk-persistence . See also -remoteWrite.dropSamplesOnOverload")
dropSamplesOnOverload=flag.Bool("remoteWrite.dropSamplesOnOverload",false,"Whether to drop samples when -remoteWrite.disableOnDiskQueue is set and if the samples "+
"cannot be pushed into the configured -remoteWrite.url systems in a timely manner. See https://docs.victoriametrics.com/victoriametrics/vmagent/#disabling-on-disk-persistence")
disableMetadataPerURL=flagutil.NewArrayBool("remoteWrite.disableMetadata","Whether to disable sending metadata to the corresponding -remoteWrite.url. "+
"By default, metadata sending is controlled by the global -enableMetadata flag")
)
var(
@@ -157,8 +167,8 @@ func Init() {
iflen(*remoteWriteURLs)==0{
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
// sentRows and sentBytes are historical counters that can now be replaced by flushedRows and flushedBytes histograms. They may be deprecated in the future after the new histograms have been adopted for some time.
// On k conflicts in origin set, the original value is preferred and copied
// to processed with `exported_%k` key. The copy happens only if passed v isn't equal to origin[k] value.
func(ls*labelSet)add(k,vstring){
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
// do not add label with empty value to the result, as it has no meaning:
// if the label already exists in the original query result, remove it to preserve compatibility with relabeling, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766.
// otherwise, ignore the label, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984.
"invalid_label":`error evaluating template: template: :1:298: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
"For example, if lookback=1h then range from now() to now()-1h will be scanned.")
maxStartDelay=flag.Duration("group.maxStartDelay",5*time.Minute,"Defines the max delay before starting the group evaluation. Group's start is artificially delayed for random duration on interval"+
" [0..min(--group.maxStartDelay, group.interval)]. This helps smoothing out the load on the configured datasource, so evaluations aren't executed too close to each other.")
ruleStripFilePath=flag.Bool("rule.stripFilePath",false,"Whether to strip rule file paths in logs and all API responses, including /metrics. "+
"For example, file path '/path/to/tenant_id/rules.yml' will be stripped to 'groupHashID/rules.yml'. "+
"This flag may be useful for hiding sensitive information in file paths, such as S3 bucket details.")
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
// do not add label with empty value to the result, as it has no meaning:
// if the label already exists in the original query result, remove it to preserve compatibility with relabeling, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10766.
// otherwise, ignore the label, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984.
{% if g.Unhealthy > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d g.Unhealthy %}</span> {% endif %}
{% if g.NoMatch > 0 %}<span class="badge bg-warning" title="Number of rules with status NoMatch">{%d g.NoMatch %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules with status Ok">{%d g.Healthy %}</span>
{% if g.States["unhealthy"] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d g.States["unhealthy"] %}</span> {% endif %}
{% if g.States["nomatch"] > 0 %}<span class="badge bg-warning" title="Number of rules with status NoMatch">{%d g.States["nomatch"] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules with status Ok">{%d g.States["ok"] %}</span>
"This allows reducing the consumption of backend resources when processing requests from clients connected via slow networks. "+
"Set to 0 to disable request buffering. See https://docs.victoriametrics.com/victoriametrics/vmauth/#request-body-buffering")
maxRequestBodySizeToRetry=flagutil.NewBytes("maxRequestBodySizeToRetry",16*1024,"The maximum request body size to buffer in memory for potential retries at other backends. "+
"Request bodies larger than this size cannot be retried if the backend fails. Zero or negative value disables request body buffering and retries. "+
"Request bodies larger than this size cannot be retried if the backend fails. Zero or negative value disables retries. "+
"See also -requestBufferSize")
maxConcurrentRequests=flag.Int("maxConcurrentRequests",1000,"The maximum number of concurrent requests vmauth can process simultaneously. "+
@@ -357,6 +357,7 @@ func bufferRequestBody(ctx context.Context, r io.ReadCloser, userName string) (i
concurrency=flag.Int("concurrency",10,"The number of concurrent workers. Higher concurrency may reduce restore duration")
maxBytesPerSecond=flagutil.NewBytes("maxBytesPerSecond",0,"The maximum download speed. There is no limit if it is set to 0")
skipBackupCompleteCheck=flag.Bool("skipBackupCompleteCheck",false,"Whether to skip checking for 'backup complete' file in -src. This may be useful for restoring from old backups, which were created without 'backup complete' file")
SkipPreallocation=flag.Bool("skipFilePreallocation",false,"Whether to skip pre-allocated files. This will likely be slower in most cases, but allows restores to resume mid file on failure")
.uplot,.uplot*,.uplot*:before,.uplot*:after{box-sizing:border-box}.uplot{font-family:system-ui,-apple-system,SegoeUI,Roboto,HelveticaNeue,Arial,NotoSans,sans-serif,"Apple Color Emoji","Segoe UI Emoji",SegoeUISymbol,"Noto Color Emoji";line-height:1.5;width:min-content}.u-title{text-align:center;font-size:18px;font-weight:700}.u-wrap{position:relative;-webkit-user-select:none;user-select:none}.u-over,.u-under{position:absolute}.u-under{overflow:hidden}.uplotcanvas{display:block;position:relative;width:100%;height:100%}.u-axis{position:absolute}.u-legend{font-size:14px;margin:auto;text-align:center}.u-inline{display:block}.u-inline*{display:inline-block}.u-inlinetr{margin-right:16px}.u-legendth{font-weight:600}.u-legendth>*{vertical-align:middle;display:inline-block}.u-legend.u-marker{width:1em;height:1em;margin-right:4px;background-clip:padding-box!important}.u-inline.u-liveth:after{content:":";vertical-align:middle}.u-inline:not(.u-live).u-value{display:none}.u-series>*{padding:4px}.u-seriesth{cursor:pointer}.u-legend.u-off>*{opacity:.3}.u-select{background:#00000012;position:absolute;pointer-events:none}.u-cursor-x,.u-cursor-y{position:absolute;left:0;top:0;pointer-events:none;will-change:transform}.u-hz.u-cursor-x,.u-vt.u-cursor-y{height:100%;border-right:1pxdashed#607D8B}.u-hz.u-cursor-y,.u-vt.u-cursor-x{width:100%;border-bottom:1pxdashed#607D8B}.u-cursor-pt{position:absolute;top:0;left:0;border-radius:50%;border:0solid;pointer-events:none;will-change:transform;background-clip:padding-box!important}.u-axis.u-off,.u-select.u-off,.u-cursor-x.u-off,.u-cursor-y.u-off,.u-cursor-pt.u-off{display:none}
retentionPeriod=flagutil.NewRetentionDuration("retentionPeriod","1M","Data with timestamps outside the retentionPeriod is automatically deleted. The minimum retentionPeriod is 24h or 1d. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention. See also -retentionFilter")
futureRetention=flagutil.NewRetentionDuration("futureRetention","2d","Data with timestamps bigger than now+futureRetention is automatically deleted. "+
"The minimum futureRetention is 2 days. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention")
snapshotAuthKey=flagutil.NewPassword("snapshotAuthKey","authKey, which must be passed in query string to /snapshot* pages. It overrides -httpAuth.*")
forceMergeAuthKey=flagutil.NewPassword("forceMergeAuthKey","authKey, which must be passed in query string to /internal/force_merge pages. It overrides -httpAuth.*")
forceFlushAuthKey=flagutil.NewPassword("forceFlushAuthKey","authKey, which must be passed in query string to /internal/force_flush pages. It overrides -httpAuth.*")
@@ -55,11 +59,13 @@ var (
denyQueriesOutsideRetention=flag.Bool("denyQueriesOutsideRetention",false,"Whether to deny queries outside the configured -retentionPeriod. "+
"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
maxHourlySeries=flag.Int("storage.maxHourlySeries",0,"The maximum number of unique series can be added to the storage during the last hour. "+
maxHourlySeries=flag.Int64("storage.maxHourlySeries",0,"The maximum number of unique series can be added to the storage during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-limiter . "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See also -storage.maxDailySeries")
maxDailySeries=flag.Int("storage.maxDailySeries",0,"The maximum number of unique series can be added to the storage during the last 24 hours. "+
maxDailySeries=flag.Int64("storage.maxDailySeries",0,"The maximum number of unique series can be added to the storage during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-limiter . "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See also -storage.maxHourlySeries")
minFreeDiskSpaceBytes=flagutil.NewBytes("storage.minFreeDiskSpaceBytes",100e6,"The minimum free disk space at -storageDataPath after which the storage stops accepting new data")
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.