Just run a simple bash command without the heavyweight Docker image
While at it, rely on TAG environment variable instead of PKG_TAG env variable
for `make docs-update-version`, in order to be consistent with other Make commands.
The change is needed to group splitting/sharding section of the documentation,
so they go one after another. This should improve readability.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The previous descrioption didn't mention that relabeling can be used
for filtering scrape targets. Adding this metion.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
These links were removed in 134501bf99
without adding complete substitution to their content.
Restoring these links as they can be useful for readers to learn about relabeling.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The old links were removed in #10754
mistakenly thinking that google didn't index it. However, it did. And users can get 404
when searching in google for VM plyagrounds.
Restoring the links via aliases. It means hugo will serve the `/playgrounds` page when
user requests `/playgrounds/victoriametrics/`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fix app tests:
1. Sync code between vmsingle and vmcluster: it must be the same because
apptest does not differentiate between branches, it just runs pre-built
binaries
2. Simplify range queries in backup/restore test so that it does not
depend on the interval between samples to work correctly.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously (*writeconcurrencylimiter.Reader).Read() could permanently leak concurrency tokens from the -maxConcurrentInserts semaphore.
Consider the following example:
* GetReader() acquires a token, then PutReader() unconditionally releases it.
* Read() calls DecConcurrency() before the underlying I/O and IncConcurrency() after it. If IncConcurrency() returns an error, Read() returns without holding a token.
* Each such failure permanently removes one slot from the concurrencyLimitCh semaphore. Slots leak one by one until the channel is fully drained, at which point DecConcurrency() blocks forever, deadlocking ingestion on vmstorage.
This commit adds tracking for obtained tokens to the reader. Which prevents possible tokens leakage.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10784
This reverts commit b3c03c023c.
Reason for revert: the original logic was correct from the user's perspective:
- The -maxRequestBodySizeToRetry command-line flag controls the size of the request body,
which could be retried on backend failure. The meaining of this flag wasn't changed after
the introduction of the -requestBufferSize flag in the commit e31abfc25c
(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309 )
- The -requestBufferSize flag controls the size of the buffer for reading request body
before sending sending it to the backend and before applying concurrency limits.
These flags are independent from user's perspective. The fact that these flags share the implementation,
sholdn't be known to the user - this is an implementation detail, which allows avoiding double buffering.
Both flags enable request buffering. If the user wants disabling of all the request buffering,
then both flags must be set to 0. That's why these flags are cross-mentioned in their -help descriptions.
Also the reverted commit had the following issues:
- It reduced the default value for the -requestBufferSize flag from 32KiB to 16KiB.
The 32KiB value has been calculated and justified at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309 .
It shouldn't increase vmagent memory usage too much for typical workloads.
For example, if vmagent handles 10K concurrent requests, then the memory overhead for the request buffering
will be 10K*32KiB=320MiB. This is a small price for being able to efficiently handling 10K concurrent requests.
- It added a dot to the end of the https://docs.victoriametrics.com/victoriametrics/vmauth/#request-body-buffering link
in the description for the description of the -requestBufferSize flag. This breaks clicking the link in some environments,
since the trailing dot is considered as a part of the url.
- It added a superflouous whitespace in front of the 'Disabling request buffering' text inside the description
for the -requstBufferSize flag.
- It introduced an unnecessary complexity to the user by mentioning that the zero value
at -maxBufferSize disables buffering for request reties (these things must be independent
from the user's perspective).
- It changed the bufferedBody logic in non-trivial ways, which aren't related to the original issue.
If these changes are needed, then they must be justified in a separate issue and must be prepared
in a separate pull request / commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10675
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10677
Previously, Storage.table was initialized after startFreeDiskSpaceWatcher was called.
This created a potential data race condition: if openTable took a long time to complete
and freed disk space during that window, the free disk space watcher could read an
uninitialized (or partially initialized) Storage.table, leading to an invalid memory
address or nil pointer dereference panic.
This commit properly initializes s.isReadOnly state during storage start and
starts FreeDiskSpaceWatcher after openTable.
Bug was introduced in github.com/VictoriaMetrics/VictoriaMetrics/commit/27b958ba8bc66578206ddac26ccf47b2cc3e8101
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10747
Align group evaluation time with the `eval_offset` option to allow users
to manage group execution more effectively by understanding the exact
time each group will be scheduled, particularly in cases of spreading
rule execution within a window, chaining groups, or debugging data delay
issue.
If the group evaluation takes less than the group interval, but the
initial evaluation combined with the additional restore operation
exceeds the group interval, the evaluation time will be gradually
corrected in subsequent evaluations, as the interval ticker schedule
remains unchanged.
For groups without `eval_offset`, this change also ensures that all
evaluations follow the interval. Previously, the gap between the first
and second evaluations was larger than the interval. And the
`eval_delay` continues to help prevent partial responses.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10772.
Follow-up commit for
211fb08028
Address @f41gh7 review comments:
- Move code from `lib/osinfo` to `lib/appmetrics`.
- Make the logic private.
- Use metrics.WriteGaugeUint64 func.
- Remove registration logic from `app/xxx/main.go`.
- Remove `lib/osinfo` package.
At 00:00 UTC the ingested samples start to have timestamps for the new
day (in the ingested samples are always recent). Even though there was a
next-day prefill of the per-day index during the last hour of the day,
some performance degradation is still possible.
For example, in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10698
it is manifested as `vminsert-to-vmstorage connection saturation` peaks
right after midnight.
Possible hypothesis why this is happening. At midnight,
currHourMetricIDs is empty and prevHourMetricIDs cannot be used because
it holds metricIDs for the previous day. So the ingestion logic hits
dateMetricIDsCache which may not have the metricID in its read-only
buffer and therefore should aquire lock to check its prev read-only
buffer or read-write buffer. Which creates lock contention and therefore
raises ingestion request latency.
A solution to this could be re-using the nextDayMetricIDs during the
first hour of the day. During this time, it is equivalent to
currHourMetricIDs.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
This change reverts part of the changes in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10686
Motivation: docs added https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10686 in most cases are too verbose, ai-generated and bringing low practical sense.
The improvement goal: remove bloat from the docs and keep them practical and useful.
What it does:
- Completely removes items from the sidebar
- Moves the content of the most important playground pages to the
`/playground/` stub (README.md). Use H2s for each playground.
- Updates and cleans the text.
- Removes the individual children pages in the playground category (keep
only the `/playgrounds/` page/stub and remove the children).
- Removes items as these don't really need much introduction or aren't
playgrounds:
- log to logsql: a conversion tool
- sql to logsql: same
- adds Grafana playground section
Links of child pages will become invalid. We don't preserve them as this is pretty new doc (1w on prod) and is unlikely to have already persisted links somewhere.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This commit adds new metrics `vmalert_remotewrite_queue_capacity` and `vmalert_remotewrite_queue_size`, which is updated with each push and it's
frequency depends on `-remoteWrite.concurrency`,
`remoteWrite.flushInterval`
It doesn't account for the pending data within each pushers request, it
should provide a general indication of the queue usage.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10765
Exctract repeated code from nextDayMetricIDs synctests into separate
funcs to make the code more readable.
The change was originally introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10704 and was
extracted into a separate PR to keep the original change simple.
Previously, vminsert did not account for the ingest concurrency limit in buffer size calculation.
This could lead to excessively large buffers and OOM errors when the concurrency limit was reached.
This commit fixes buffer size calculation by separating `insertCtx` and `storageNode` buffer size limits.
`storageNode` buffer size is set to a larger value, as it is allocated per configured `-storageNode`
and is independent of the concurrency limit.
`insertCtx` buffer size now accounts for the configured concurrency limit
and calculates the maximum buffer size accordingly.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10725
Previously, vmselect in cluster-native mode could return partial responses to upstream vmselect.
Since upstream vmselect expects full responses (mimicking vmstorage behavior),
partial responses must be disabled in cluster-native mode.
This prevents incomplete responses from being cached at the upstream vmselect level.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10678
Automatically set daily and hourly series limits to `MaxInt32` when `remoteWrite.maxHourlySeries` or `remoteWrite.maxDailySeries` is set to `-1`.
This change addresses a usability issue with the cardinality limiter. Users may want to enable the limiter to observe its metrics before deciding on an appropriate limit. However, the underlying bloom filter only supports `int32`, so setting large values can lead to overflow.
With this PR:
* Setting either flag to `-1` is treated as “no practical limit” and internally mapped to `math.MaxInt32`
* Values exceeding `int32` are safely clamped to `MaxInt32` to prevent overflow
This allows users to enable the limiter for estimation purposes without risking invalid configurations or runtime issues.
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9614
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Previously, last scrape result was unconditionally update, despite possible scrape error.
The commit updates last scrape result only at successful scrape. It properly accounts `scrape_series_added` metric and aligns it with the same metric in Prometheus.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10653
Previously introduced flag `requestBufferSize` raised default value for
in-memory buffer from 16KB to 32KB. It could increase memory usage for
vmauth. Also it made unclean how to actually disable requests buffering.
This commit aligns flags value to the 16KB. And disables requests
buffering if any of flags value are 0 as mentioned at flags description.
If any of flags have non-default value, those value are used as max size
for request buffer. If both flags are modified - bigger value wins.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10675
I expect the change to help in two ways:
1. Spreading remote write flushes over the flush interval to avoid
congestion at the remote write destination;
2. Enhance queue data consumption. Currently, all flushers may always
flush data simultaneously, resulting in periods where no flushers are
consuming data from the queue, which increases the risk of reaching the
queue limit `remoteWrite.maxQueueSize` even when a increased
`remoteWrite.concurrency`. By making the flushers more dispersed, it is
more likely that some flushers are consistently consuming data from the
queue, which should make queue management easier.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10729/
Changes:
- Added the number of `pending alerts` and `firing alerts`
- Improvement `transormations` for panel - FIRING over time by group and rules
- Added sort for panel - FIRING over time by rule
Signed-off-by: sias32 <sias.32@yandex.ru>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Replace 1.2 multiplier with 1.25 in disk space estimation formula.
1.2 only provides ~16.7% free space, while the docs recommend keeping
20%. Using 1.25 correctly accounts for 20% free space.
Inspired by
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10394
Add per-URL `-remoteWrite.disableMetadata` flag to control metadata
sending for each remote storage independently.
After v1.137.0 enabled `-enableMetadata` by default, metadata is sent to
ALL remote write targets, even those with relabeling filters that drop
most metrics. This causes unnecessary growth in
`vmagent_remotewrite_requests_total`. and significant increase in
network load for heavy filtered remote write destinations.
When the network between client and s3 server is unstable, the client may encounter temporary io.EOF errors when reading the response from s3 server.
Currently, the s3 sdk in vmbackup uses the default retry policy. However, this default retry policy won't retry when s3 sdk meet unexpected EOF. This means that the temporary unexpected EOF error will cause the backup task to fail.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10699
Add link to blogpost with detailed information about zstd+rw protocol.
This PR is based on question in community channel about implementation
details.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Implemented dedicated thanos migration mode for vmctl to migrate data from Thanos installations to VictoriaMetrics.
Key features:
1. Raw and downsampled blocks support: Reads both raw blocks
(resolution=0) and downsampled blocks (5m/1h resolution) directly from
Thanos snapshots
2. All aggregate types: Imports count, sum, min, max, and counter
aggregates from downsampled blocks as separate metrics with resolution
and type suffixes (e.g., metric_name:5m:count)
3. Dedicated flags: Uses `--thanos-*` prefixed flags (--thanos-snapshot,
--thanos-concurrency, --thanos-filter-time-start,
--thanos-filter-time-end, --thanos-filter-label,
--thanos-filter-label-value, --thanos-aggr-types)
4. Selective aggregate import: Use `--thanos-aggr-types` to import only
specific aggregates
Usage:
```
vmctl thanos --thanos-snapshot /path/to/thanos-data --vm-addr http://victoria-metrics:8428
```
Closes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9262
Signed-off-by: Dmytro Kozlov <d.kozlov@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
## Summary
This PR implements split phase metrics for filestream operations as
requested in #10432.
### Changes
- Added `vm_filestream_fsync_duration_seconds_total` metric to track
fsync syscall duration separately
- Added `vm_filestream_fsync_calls_total` metric to count fsync calls
- Added `vm_filestream_write_syscall_duration_seconds_total` metric to
track write syscall duration (previously mixed with flush time)
- Refactored `MustClose()` and `MustFlush()` to use new `flush()` and
`sync()` helper methods
- Kept `vm_filestream_write_duration_seconds_total` for backward
compatibility
### Problem Solved
Previously, `vm_filestream_write_duration_seconds_total` was being
incremented in two places:
1. `statWriter.Write()` - triggered by `bw.Flush()` and `bw.Write()`
2. `Writer.MustFlush()` - which included the above process, leading to
double-counting
This made it impossible to distinguish between write syscall time and
fsync time, which is critical for diagnosing storage latency issues.
### Solution
The new metrics allow users to:
- Distinguish "flush got slower" vs "fsync got slower" using metrics
only
- No file path labels (bounded cardinality)
- No double-counting between metrics
### Testing
- Code compiles successfully
- All existing metrics are preserved for backward compatibility
Closes#10432
---------
Signed-off-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Signed-off-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
These types hide public types from lib/storage/metricnamestats package.
These types do not resolve any practical issues. Instead, they add a level of indirection,
which complicates reading and understanding the code.
These types were introduced in the commit 795d3fe722
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6145
### Describe Your Changes
Some users may not know that VictoriaMetrics Cloud provides relevant
features to manage workloads. This change add notes in relevant places
in which users may find that a managed solution is what they need.
The intention is not to push users to Cloud, but giving the information.
That's why it's always phrased like: "If you don't want to do X, Cloud
can do it for you", instead of "Start for free, etc". This is an Open
Source first project, and shall remain as such.
After this gets proper review, VictoriaLogs and other repos may follow.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Jose Gómez-Sellés <14234281+jgomezselles@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Commit 83da33d8cf
removed NFS directory delete retries. It was made on assumption, that
only directory rename could cause such issues. However, both rename and
unlink uses the same "silly rename" logic
https://linux-nfs.org/wiki/index.php/Server-side_silly_rename
and linux kernel - `fs/nfs/dir.c` `nfs_unlink` and `nfs_rename`.
And NFS client may treat file still open, even if it
was properly closed by application. Most probably it could be triggered, because VictoriaMetrics may
open the same file multiple times ( data read and background merges).
There is no issue with VictoriaMetrics itself, it properly closes files. But NFS-client may have delays
or cache metadata information for the files. So it could trigger silly rename behavior.
This commit restores original behavior with deletion retries and brings
back metrics for unsuccessful delete operations.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9842
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10680
We noticed that backup restores in our environment were much slower than
the hardware/bandwidth constraints would suggest and we traced this down
to a couple of bottlenecks. This PR attempts to address all of them.
#### Lack of pre-allocation of files,
This was causing writes far into files to be quite slow as new blocks
needed to be continually allocated. This was particularly bad on ext4
for us, but will likely be applicable to most disks and filesystems,
you'll see the impl here is linux specific but this is mostly because I
don't have a test env for any other platform and didn't want to blindly
make changes without a validation env.
This comes with the downside of no longer being to to resume a restore
mid file, and requiring the re-downloading of parts already in the file
size the file will appear at full size from the very start. This is I
think _generally_ a good tradeoff for the restore speed gains, it is
definitely a tradeoff so I've included a flag to disable the
pre-allocation behavior and fall back to the existing part diffing
logic.
#### Fsync after each part
With many small parts in relatively few files, or in high concurrency
setups the the writerCloser fsync on each part(actually double fsync
since both `filestream.Writer.mustFlush` and
`filestream.Writer.mustClose` both fsync). Was causing slowdowns since
we would be continually queuing fsyncs.
With the pre-allocation pattern the file is only "ready" once re-named
so I moved to a per file fsync after rename.
#### Concurrent read/write
The previous download pattern was to do a read from the remoteFs, with
whatever latency that entailed, then sequentially do a write, again with
whatever latency that entailed. This meant that throughput was limited
to `readLatency + writeLatency * blockSize`.
Similar to how `crossTypeCopy` is implemented in the backup process we
can instead use `io.pipe` to allow two goroutines to work in parallel
with a small buffer between them.
#### Pagecache avoidance
`filestream.Writer` does quite a lot to avoid polluting the page cache,
but this is not relevent in a restore context and with large sequential
block writes its much more effecient to let the OS flush the pagecache
whenever it wants rather than doing a bunch of small buffer syscalls to
flush blocks.
Therefore this switches over to a much simplier directWriterCloser that
does direct file IO and lets the OS handle flushes while mid write.
### Performance
Before the changes we were seeing writes speeds of only 100MBps, this
was a restore from EBS volumes, ext with 1GB/s throughput with
<img width="1613" height="586" alt="Screenshot 2026-03-16 at 1 29 46 PM"
src="https://github.com/user-attachments/assets/5d54dcb7-cb59-43e0-9247-fda8c70feb2f"
/>
After these changes in the same restore env we're seeing 600MBs flat
rates.
<img width="1611" height="471" alt="Screenshot 2026-03-16 at 1 31 33 PM"
src="https://github.com/user-attachments/assets/ea8e2eb7-533a-48fa-99e0-0b38286e5572"
/>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Remove shards as they only complicate things when the number of requests
per second is in the range of thousands.
Related to #10532.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This commit allows to perform JWT claim matching over 1 dimension arrays. It could
be useful from practical standpoint. Because permissions are usually assigned as a list of values.
For example, the following config allows admin access over list of assigned roles for user:
```yaml
match_claims:
access.roles: "admin"
```
JWT token:
```json
{
"access": {
"roles": [
"read",
"write",
"admin"
]
}
}
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10647
RFC-7617 allows empty password/username. Moreover, from RFC standpoint both empty values are valid as well. It should be just encoded as `:`. So this commit relaxes non-empty username restriction.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6956
There are cases then the key sizeBytes is much greater than the value
sizeBytes. Therefore it is important to include the key sizeBytes into
the total.
Also fix some code comments.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to
3.4.2.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3bf09091c3"><code>3bf0909</code></a>
3.4.2</li>
<li><a
href="885ddcc33c"><code>885ddcc</code></a>
fix CWE-1321</li>
<li><a
href="0bdba705d1"><code>0bdba70</code></a>
added flatted-view to the benchmark</li>
<li><a
href="2a02dce7c6"><code>2a02dce</code></a>
3.4.1</li>
<li><a
href="fba4e8f2e1"><code>fba4e8f</code></a>
Merge pull request <a
href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a>
from WebReflection/python-fix</li>
<li><a
href="5fe86485e6"><code>5fe8648</code></a>
added "when in Rome" also a test for PHP</li>
<li><a
href="53517adbef"><code>53517ad</code></a>
some minor improvement</li>
<li><a
href="b3e2a0c387"><code>b3e2a0c</code></a>
Fixing recursion issue in Python too</li>
<li><a
href="c4b46dbcbf"><code>c4b46db</code></a>
Add SECURITY.md for security policy and reporting</li>
<li><a
href="f86d071e0f"><code>f86d071</code></a>
Create dependabot.yml for version updates</li>
<li>Additional commits viewable in <a
href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Due to a conflict with VL FAQ page identifier,
VM FAQ page stopped rendering.
This change adds unique identifier to VM FAQ page and fixes the issue.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, by mistake, datasource was referenced by input name instead
of variable name. For an unknown reason, it worked well in local setup
and on playground.
This fix is confirmed by users and continues working at local setup
and playground.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Updated the [HA monitoring setup in Kubernetes via VictoriaMetrics
Cluster](https://docs.victoriametrics.com/guides/k8s-ha-monitoring-via-vm-cluster/)
guide.
Changes:
- Added an introduction explaining how HA works in this guide
- Updated and verified commands used in the guide
- Replaced using Grafana UI usage in favor of using VMUI instead (it was
used to run queries, it's easier to just use the built-in VMUI instead
of installing Grafana just to use the Explore tab)
- Removed Grafana screenshots and replaced them with VMUI
- Tested on a modern version of GKE
- Added explanations for `replicationFactor`, de-duplication, and
`isPartial`
- Added next steps
- Added VMUI screenshots
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This commit adds a rpc retry by dialing a new connection instead of
getting an old one from the connection pool when the previous rpc error
is `io.EOF`.
It helps prevent broken connections from remaining for too long and
causing failed requests and partial responses during `vmstorage` rolling
restart period
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10314
Previously inmemoryPart refCount was not properly decremented.
Previous behavior:
* createInmemoryPart called newPartWrapperFromInmemoryPart and returns a partWrapper with refCount=1
* multiple parts are merged in mustMergeInmemoryPartsFinal, which creates a new merged part
* the source partWrappers are never decRef'd
* Since refCount never reaches 0, putInmemoryPart and (*part).MustClose are never called
This commit properly decrements refCount at mustMergeInmemoryPartsFinal.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10086
This commit adds a new `folder_ids` field in
`yandexcloud_sd_configs` that allows users to specify Yandex Cloud
folder IDs directly, bypassing the organization->cloud->folder hierarchy
traversal.
Previously, the Yandex Cloud service discovery required traversing the
entire resource hierarchy (organizations -> clouds -> folders ->
instances) to discover instances. This works when the Service Account
has permissions at all levels. However, some Service Accounts may only
have permissions at the folder level, causing discovery to fail when it
cannot access organization or cloud resources.
With this change, users can now configure folder IDs directly:
```yaml
yandexcloud_sd_configs:
- service: compute
folder_ids:
- folder-id-1
- folder-id-2
```
When `folder_ids` is specified, the discovery skips the hierarchy
traversal and directly queries instances from the specified folders.
This is a backward-compatible change - when `folder_ids` is not
specified, the existing behavior is preserved.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10587
The new section is placed in root directory and is supposed to promote
information about the following tools:
* MCP servers for Logs, Traces and Metrics
* List of available agentic skills
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
The change suppose to make it more clear for understanding and stress
attention on important things.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
The new Grafana dashboard uses the following APIs:
- /api/v1/status/tsdb
- /api/v1/status/metric_names_stats
It shows the list of metric names, the request count and the last time
they were "used". Clicking on metric name allows exploring its
cardinality.
Based on https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9832
-----------
The PR contains a few unrelated changes:
* rename of folder for prometheus datasource to remove the duplicated
word
* fix for vmalert's access to the datasource, as before it wasn't able
to write/read properly
-------------
The dashboard screen cast:
https://github.com/user-attachments/assets/01dda5d9-14e5-4f5a-b795-a838abec4f5e
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Haley Wang <haley@victoriametrics.com>
### Describe Your Changes
When a label is set as focus label in the Cardinality Explorer, the
"Metric names with the highest number of series" table was hidden. This
change makes it visible alongside the focus label values table.
### How to reproduce
1. Go to Explore → Cardinality Explorer
2. Enter a selector like `{namespace!=""}` and set Focus label to
`namespace`
3. Click Execute Query
**Before:** Only "Values for 'namespace' label..." table is shown
**After:** "Metric names with the highest number of series" table is
also shown
<img width="1512" height="723"
alt="b2a8395a1577b31f58ae00f87e29eb87ca98eabfd0b3c0d9185be8f3a9789b5f"
src="https://github.com/user-attachments/assets/50c7f67a-1cfc-40d0-8e99-7750a933ee45"
/>
Fixes#10630
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Roshan1299 <banisettirosh@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Bumps and [minimatch](https://github.com/isaacs/minimatch). These
dependencies needed to be updated together.
Updates `minimatch` from 3.1.2 to 3.1.5
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7bba97888a"><code>7bba978</code></a>
3.1.5</li>
<li><a
href="bd259425b2"><code>bd25942</code></a>
docs: add warning about ReDoS</li>
<li><a
href="1a9c27c757"><code>1a9c27c</code></a>
fix partial matching of globstar patterns</li>
<li><a
href="1a2e084af5"><code>1a2e084</code></a>
3.1.4</li>
<li><a
href="ae24656237"><code>ae24656</code></a>
update lockfile</li>
<li><a
href="b100374922"><code>b100374</code></a>
limit recursion for **, improve perf considerably</li>
<li><a
href="26ffeaa091"><code>26ffeaa</code></a>
lockfile update</li>
<li><a
href="9eca892a4e"><code>9eca892</code></a>
lock node version to 14</li>
<li><a
href="00c323b188"><code>00c323b</code></a>
3.1.3</li>
<li><a
href="30486b2048"><code>30486b2</code></a>
update CI matrix and actions</li>
<li>Additional commits viewable in <a
href="https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5">compare
view</a></li>
</ul>
</details>
<br />
Updates `minimatch` from 9.0.5 to 9.0.9
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7bba97888a"><code>7bba978</code></a>
3.1.5</li>
<li><a
href="bd259425b2"><code>bd25942</code></a>
docs: add warning about ReDoS</li>
<li><a
href="1a9c27c757"><code>1a9c27c</code></a>
fix partial matching of globstar patterns</li>
<li><a
href="1a2e084af5"><code>1a2e084</code></a>
3.1.4</li>
<li><a
href="ae24656237"><code>ae24656</code></a>
update lockfile</li>
<li><a
href="b100374922"><code>b100374</code></a>
limit recursion for **, improve perf considerably</li>
<li><a
href="26ffeaa091"><code>26ffeaa</code></a>
lockfile update</li>
<li><a
href="9eca892a4e"><code>9eca892</code></a>
lock node version to 14</li>
<li><a
href="00c323b188"><code>00c323b</code></a>
3.1.3</li>
<li><a
href="30486b2048"><code>30486b2</code></a>
update CI matrix and actions</li>
<li>Additional commits viewable in <a
href="https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Poison varint: MaxUint64 encoded as varint (0xFFFFFFFFFFFFFFFF).
The bounds check uint64(nSize)+n overflows to 9, bypassing the guard.
Then int(MaxUint64)=-1 makes src[10:9] which panics.
Previously there was a data-race, when targetURL was concurrently
updated in case of default url route.
This commit fixes data-race and adds concurrency to the routing tests.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10626
OAuth2 token source lib doesn't allow to define request headers explicitly.
This commit adds a custom transport to mitigate it. New transport modifies http.Request by making a shallow copy of it and setting additional headers.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8939
Previously vm_filestream_write_duration_seconds_total will be increased in two places:
* statWriter.Write()
* Writer.MustFlush(). It will eventually call statWriter.Write(), hence double counting vm_filestream_write_duration_seconds_total
For reference, vm_filestream_read_duration_seconds_total will be increased only in statReader.Read to track read syscall.
This commit removes latency tracking from MustFlush method.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10564
Replace ambiguous button labels such as "Submit" and "Apply" with
clearer wording to indicate that these actions only preview results and
do not modify the deployment configuration.
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10453
Add support for [OpenID Connect
Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html#IANA)
as an alternative way to obtain verification keys and rotate them
automatically.
`jwt` configuration should allow **exactly one** of the following
verification modes: `public_keys`, `oidc`, `skip_verify`. These options
must be mutually exclusive.
Example: OIDC configuration
```yaml
users:
- jwt:
oidc:
issuer: http://identity-provider.com
```
When `oidc` is enabled:
1. On startup, `vmauth` fetches:
```
{issuer}/.well-known/openid-configuration
```
2. Extracts `jwks_uri`.
3. Fetches [JWK
keys](https://openid.net/specs/draft-jones-json-web-key-03.html#ExampleJWK)
from `jwks_uri`.
4. Uses discovered keys to verify JWT tokens.
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10585
Failure handling:
* If discovery fails at startup:
* No keys are available.
* The user is skipped.
* Discovery runs periodically in background (e.g., every 1 minute).
* If keys become available later, authentication should start working
automatically.
* If keys were previously fetched and the identity provider becomes
unavailable:
* Cached keys must be preserved.
* Authentication continues using cached keys.
#### JWT Requirements in OIDC Mode
When `oidc` is enabled:
* `iss` claim becomes
[mandatory](https://openid.net/specs/openid-connect-core-1_0.html#IDToken).
* `iss` [must
match](https://openid.net/specs/openid-connect-core-1_0.html#RotateEncKeys):
* `oidc.issuer` from config.
* `issuer` returned in the OpenID configuration document.
* JWT header must contain `kid`.
* `kid` must be used to select the appropriate key from JWKS.
* Tokens without `kid` must be rejected.
* Tokens without `iss` must be rejected.
Rationale
* Enables automatic key rotation.
* Eliminates manual public key configuration.
* Maintains compatibility with standard OIDC providers.
---------
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
- VMUI Explore Metrics uses `rate` for histogram bucket queries, which
skips the first observation
in each bucket because `rate` requires two data points to calculate a
per-second rate.
- Replace `rate` with `increase_pure`, which assumes counters start from
0 and correctly shows
the first observation when a new bucket appears.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10365
The test should not fail now on systems with 1 cpu because partition
indexDBs are not rotated. See #8948.
Also removed two TODOs from the test to keep it simple.
Disabling is done by making the the handlers for `/tags/tagSeries` and
`/tags/tagMultiSeries` to return `501 (Not Implemented)` status code
along with the error message saying that the API has been disabled and
will be removed in future.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10544.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Now that indexDB is per-partition, the indexDB-related docs need to be
updated. Specifically the how the indexDB is cleaned up when it becomes
outside the `-retentionPeriod`.
Follow-up for #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
There are following main use cases for `eval_offset`:
1. To ensure rules are evaluated at an exact offset, so the results have
the exact timestamp the user wants.
2. The source data for a certain rule is delivered at a specific time
point, so rules need to be executed after that time point to get correct
results. For example, [chaining
groups](https://docs.victoriametrics.com/victoriametrics/vmalert/#chaining-groups).
3. A group contains some heavy rules that can take a few minutes to
finish. To guarantee a single evaluation can complete in time and not
delay the next run, the user may want to schedule the group to be
executed within [intervalStart, intervalEnd-avgTotalEvaluationDuration].
Negative value can be convenient for case3, as users only need to set
group `eval_offset: -avgTotalEvaluationDuration(a bigger value than the
real duration to leave some buffer would be better)`.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10424
Due to bug introduced at initial datadog-sketches API implementation, `host` label was incorrectly obtained from `Tags` structure. While actually it's present directly at root of protobuf message.
This commit properly attaches `host` label in such case.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10557
If the data flush to the remote write destination takes longer than the
periodic flush interval (default 2s), the ticker channel will contain a
stale tick, causing the ticker case to be selected too early with an
empty or small amount of data inside `wr`, resulting in a wasted remote
write request with one or two time series(if `ts, ok := <-c.input` was
also randomly selected beforehand).
We could also consider resetting the ticker after drain the stale tick
to ensure `wr` always accumulates data for the full flush interval, but
that seems more trivial to me.
Add new option per-user to print access logs. Such logs
contain limited amount of information to prevent exposing
sensitive data.
Access logs can be enabled/disabled via hot-reload and could
help locating clients that incorrectly use or abuse vmauth.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5936
Cluster apptests failed from time to time with the following error:
```
timed out while waiting for inserted rows to be sent to vmstorage
cluster
```
due to incorrect calculation of inserted row count before and after
insertion. This PR fixes it by putting the "before" count calculation
before the send() operation.
Previously, the client certificate was only refreshed during the TLS
handshake, which occurs when establishing a new connection. This meant
the remote HTTP server had to close the existing connection for the
client to pick up an updated (e.g. expired) certificate. As a
workaround, connection keep-alive could be disabled, but that
significantly increased request latency.
This commit adds a certificate check during HTTP RoundTrip. If the
client certificate has changed, the RoundTripper recreates the transport
and its connection pool. This behavior is already implemented for CA
certificate changes.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10393
Per @valyala's request, rename storage cache methods to adhere the
following format:
```
get[Value]By[Key]FromCache
put[Value]By[Key]ToCache
```
Also move `s.metricIDCache` methods from `indexDB` to `Storage` because
this cache exists at the `Storage` level.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Enable BuildKit-native SPDX SBOM and provenance attestations by setting
`--sbom=true --provenance=true` in `docker buildx build` within
`publish-via-docker`.
- Set `--provenance=true --sbom=true` in `publish-via-docker` for both
Alpine and scratch variants
- Add SBOM section to SECURITY.md with inspection and Trivy scan
instructions
- Update Release-Guide.md
- Add changelog entry
Verified end-to-end: pushed test image to GHCR, confirmed SBOM
attestation via `docker buildx imagetools inspect`, and Trivy scan via
`trivy image --sbom-sources oci` succeeded (with 0 vulnerabilities :-)).
Fixes#10473
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: John Allberg <john@ayoy.se>
Signed-off-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Samples in Mimir (or Prometheus) are stored in chunks, which are
compressed efficiently using algorithms rather than being stored as
independent samples, see details in [this
article](https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/)
and [this talk](https://www.youtube.com/watch?v=b_pEevMAC3I).
When using a small `--remote-read-step-interval`, particularly `minute`,
a single chunk may contain samples that exceed the requested time
window, and all the returned chunks contain overlapping samples.
Consequently, vmctl will read and migrate many duplicate samples into
VictoriaMetrics.
In tests, `--remote-read-step-interval=minute
--remote-read-use-stream=true` with raw sample `scrape_interval: 10s`
and remote read time range of 24h can write ~20x duplication.
But I assume the minute interval is rarely used with a large time range
and duplicates are fine in VictoriaMetrics due to deduplication, so we
don't need to disallow using it.
```
## --remote-read-step-interval=minute --remote-read-use-stream=false
## total samples: **15696611(the real number)**
2026/02/26 22:10:25 VictoriaMetrics importer stats:
idle duration: 50.080851955s;
time spent while importing: 32.108903417s;
total samples: 15696611;
samples/s: 488855.41;
total bytes: 735.8 MB;
bytes/s: 22.9 MB;
import requests: 79;
import requests retries: 0;
2026/02/26 22:10:25 Total time: 32.112912208s
## --remote-read-step-interval=day --remote-read-use-stream=true
## total samples: 15878869
2026/02/26 22:20:37 VictoriaMetrics importer stats:
idle duration: 960.698874ms;
time spent while importing: 6.338309625s;
total samples: 15878869;
samples/s: 2505221.41;
total bytes: 278.6 MB;
bytes/s: 44.0 MB;
import requests: 80;
import requests retries: 0;
2026/02/26 22:20:37 Total time: 6.340023167s
## --remote-read-step-interval=hour --remote-read-use-stream=true
## total samples: 21824000
2026/02/26 22:13:14 VictoriaMetrics importer stats:
idle duration: 5.238827666s;
time spent while importing: 7.274528s;
total samples: 21824000;
samples/s: 3000057.19;
total bytes: 394.4 MB;
bytes/s: 54.2 MB;
import requests: 110;
import requests retries: 0;
2026/02/26 22:13:14 Total time: 7.278895084s
## --remote-read-step-interval=minute --remote-read-use-stream=true
## total samples: **353800724(353800724/15696611~22.5)**
2026/02/26 22:18:41 VictoriaMetrics importer stats:
idle duration: 1m45.09105431s;
time spent while importing: 1m51.716730125s;
total samples: 353800724;
samples/s: 3166944.86;
total bytes: 6.8 GB;
bytes/s: 61.3 MB;
import requests: 1769;
import requests retries: 0;
2026/02/26 22:18:41 Total time: 1m51.721834958s
```
### Describe Your Changes
Move OpenTelemetry-related documentation under docs/integrations and
docs/data-ingestion to establish a clear, scalable structure.
As OpenTelemetry support expands, we need a dedicated place to document
protocol details, implementation specifics, and known limitations, such
as:
- Delta temporality not working with downsampling. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10014#issuecomment-3697509266.
- Negative histogram buckets being discarded by VictoriaMetrics. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9896.
The new structure separates concerns:
- `docs/integrations/` — protocol overview, implementation details, and
limitations.
- `docs/data-ingestion/` — OpenTelemetry Collector configuration and
ingestion setup.
This aligns OpenTelemetry documentation with the existing structure used
across other integrations and ingestion methods.
New pages and links preserve backward compatiblity
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
- Updated GKE version to a more current 1.34+
- Updated guide to more modern Helm and Kubectl versions
- Tested updated instructions on GKE 1.34.1-gke.3971001 (and a local k3s
instance) successfully
- Removed revision from Grafana values for helm chart (confirmed it
pulls the latest revision)
- Split the helm chart values (`guide-vmcluster-vmagent-values.yaml`)
into more readable chunks and added explanations next to each chunk
- Added and updated expected outputs. Some were missing and others were
outdated
- Updated Grafana dashboards screenshots since they changed from the
last revision
- Updated Grafana repo to use community org (old grafana chart was
deprecated
on Jan 30th -
[source](https://community.grafana.com/t/helm-repository-migration-grafana-community-charts/160983))
- Minor corrections and typo fixes. Improved flow
- Added a section at the end pointing readers where they can go next.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Vadim Rutkovsky <vadim@vrutkovs.eu>
## Summary
Fix an invalid MetricsQL numeric literal in the vmctl monitoring
documentation.
## Problem
The PromQL/MetricsQL example query for monitoring vm-native migration
data transfer speed used `1Mb` as a divisor:
```promql
rate(vmctl_vm_native_migration_bytes_transferred_total[5m]) / 1Mb
```
However, `Mb` is **not** a valid MetricsQL numeric suffix. According to
the [MetricsQL
documentation](https://docs.victoriametrics.com/victoriametrics/metricsql/#numeric-values):
> Numeric values can have `K`, `Ki`, `M`, `Mi`, `G`, `Gi`, `T` and `Ti`
suffixes.
The suffix `Mb` does not exist — only `M` (mega, 10^6) and `Mi` (mebi,
2^20 = 1,048,576) are valid.
## Fix
Replace `1Mb` with `1Mi` (1 mebibyte = 1,048,576 bytes), which is the
standard binary unit for memory/storage transfer measurements in
computing, and update the comment to reflect `MiB/s` instead of `MB/s`.
## Files Changed
- `docs/victoriametrics/vmctl/vmctl.md`: fixed the invalid literal `1Mb`
→ `1Mi` and updated the comment
---------
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: Vadim Alekseev <vadimaleksv@gmail.com>
Co-authored-by: Yury Moladau <yurymolodov@gmail.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* add meaningful description, it is required for publishin on grafana.com
* remove dependency on `victoriametrics-metrics-datasource` as it is not used
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit improves compatibility with promql by introducing a missing function `histogram_fraction`.
histogram_fraction is a shortcut for `histogram_share(upperLe, buckets) - histogram_share(lowerLe, buckets)`
histogram_count, histogram_sum or histogram_avg will not be added to metricsQL, as they only operate on Prometheus native histogram, which doesn't have _count and _sum series like the classic histogram or Victoriametrics histogram. For classic histogram, _count and _sum series can be used directly.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5346.
This commit optimizes the storage of originalLabels. Previously, they
were stored as a clone of the discovered labels, which required many
small allocations and added high pressure on the garbage collector.
Now originalLabels are stored as zstd-compressed JSON ([]byte). Since
they are rarely requested, the overhead of zstd decompression and
json.Unmarshal is negligible.
This optimization reduces memory usage for storing originalLabels by 3x
and CPU usage by 2x.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9952
After golang 1.23 it's safe to ignore timer.Reset True value.
According to the spec:
For a chan-based timer created with NewTimer, as of Go 1.23,
any receive from t.C after Reset has returned is guaranteed not
to receive a time value corresponding to the previous timer
settings;
If the program has not received from t.C already and the timer is
running, Reset is guaranteed to return true.
Before Go 1.23, the only safe way to use Reset was to call [Timer.Stop]
and explicitly drain the timer first.
Golang 1.23 changed timer implementation from sync and async. And it
made possible that chan send and timer.Stop could happen in the same
time.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9721
# Investigation & Root Cause --- InfluxDB Line Protocol Parsing with Raw
Newline (`\n`)
This document describes the investigation process and root cause
analysis for Influx Line Protocol parsing errors in VictoriaMetrics when
a **raw newline (`\n`) byte appears inside a quoted field value**.
------------------------------------------------------------------------
## Background
According to the Influx Line Protocol specification:
- Each point must be represented as a single line.
- The newline character (`\n`) separates points.
- Literal newline bytes are not allowed inside quoted field values.
Therefore, any raw newline byte (`0x0A`) inside a quoted string makes
the line invalid.
------------------------------------------------------------------------
## Related Issue
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067
------------------------------------------------------------------------
## Expected Behavior
VictoriaMetrics should reject Influx Line Protocol lines that contain a
raw newline inside a quoted field value, since this violates the
protocol specification.
The parsing failure itself is correct.
------------------------------------------------------------------------
## Actual Behavior
VictoriaMetrics rejects the line with the following error:
cannot parse field value for "...": missing closing quote for quoted
field value
While technically correct, the error message does not clearly indicate
that the root cause is a raw newline inside the quoted field value.
------------------------------------------------------------------------
## Minimal Reproducer
The issue can be reproduced without Telegraf or Jolokia:
``` bash
printf 'test value="hello
world"\n' | curl -X POST http://localhost:8428/write --data-binary @-
```
This produces:
cannot parse field value for "value": missing closing quote for quoted
field value
The failure occurs because the value contains an actual newline byte
(0x0A), not the escaped sequence `\n`.
------------------------------------------------------------------------
## Environment Setup
The issue was reproduced using the following stack:
- VictoriaMetrics v1.127.0
- InfluxDB 1.8
- Spring Boot + Jolokia
- Telegraf 1.36.2
Telegraf collects JVM `SystemProperties`, including:
``` json
"line.separator": "\n"
```
After JSON unmarshalling, this becomes a real newline byte in memory.
Detailed reproduction steps can be found here:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067#issuecomment-3896175100
------------------------------------------------------------------------
## Observed Serialized Line
Using breakpoint debugging in:
lib/bytesutil/bytebuffer.go:58
The `ReadFrom` function reads and assembles an Influx line containing:
SystemProperties.line.separator="
",
The quoted field contains an actual newline byte before the closing
quote.
This breaks the single-line assumption of Influx Line Protocol.
VictoriaMetrics splits on `\n`, resulting in:
- A truncated first line
- A missing closing quote
- Parsing failure
------------------------------------------------------------------------
## Important Clarification
This issue is **not** caused by the escaped sequence `"\\n"`.
The failure occurs only when the serialized Influx line contains an
actual newline byte (`0x0A`) inside the quoted value.
Escaped `\n` (two characters: `\` and `n`) is valid.
------------------------------------------------------------------------
## Root Cause
- Telegraf serializes a field containing a real newline byte.
- Influx Line Protocol forbids literal newline characters inside
quoted fields.
- VictoriaMetrics correctly treats `\n` as a line separator.
- The parser then encounters an incomplete quoted field and reports
"missing closing quote".
The parsing behavior is correct per specification.
------------------------------------------------------------------------
## Proposed Improvement
The parsing logic should remain unchanged.
However, the error message can be improved to better indicate the root
cause.
Suggested error message:
invalid Influx line protocol: missing closing quote for quoted field
value;
this may be caused by a raw newline (`\n`) inside the quoted field value
This makes the failure immediately actionable and easier to diagnose.
------------------------------------------------------------------------
## Summary
- The failure is caused by a raw newline byte inside a quoted field
value.
- This violates the Influx Line Protocol specification.
- VictoriaMetrics correctly rejects the line.
- The error message should explicitly mention the possibility of a raw
newline (`\n`) inside the quoted field.
Signed-off-by: hklhai <hkhai@outlook.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Previosly, extra filters were ignored for
`/api/v1/label/vm_account_id/values` or
`/api/v1/label/vm_project_id/values` calls. In result, even if user's
visibility was limited by applying
`?extra_filters[]={vm_account_id="1"}` param they could get the list of
all available tenants in the system.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d2a033453e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* remove ToC in the beginning, as it duplicates right-bar functionality
and is easier to make a mistake with. For example, it didn't have the
ZFS section in it
* simplify wording where it was possible
* reference new tools VM got in recent releases
* re-prioritize tips order based on personal experience
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Components like vmselect and vminsert rarely touch disk, so most of the
time their values are 0. Filtering out 0 values makes the panel cleaner.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
A refactoring that moves the uint64set.Set marshaling and unmarshaling from lib/storage/storage.go to lib/uint64set. Also added function docs and tests.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
It should prvent apptest timeouts due to runners saturation. When
apptests are run with other tests and linters they do not have enough
CPU to complete in time and often times out.
If one re-runs the apptests shortly after they are likely to pass
because the same runner has enough resources available (other job
finished).
Remove GOGC=10 as the runner has enough memory (16Gb) to run apptests.
I did some tests and obeserve drop in overal test duration from 4.5m to
3.30-3m.
The new section is supposed to contain otel related information for all
products, like VT, VM, VL.
It also supposed to be visible for readers right away, without need to
dig for info in each product.
It contains basic information and is supposed to act as a router to more
detailed info in each product.
While there, also updated VM-related otel info.
---------
Depends on
https://github.com/VictoriaMetrics/victoriametrics-datasource/pull/458
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
- Add an introduction with a brief explanation of the operator and its
benefits as an intro
- Make some steps more explicit, instead of just linking to the VM
cluster guide
- Separate config/chart values files from kubectl apply (instead of
using heredoc and in-line yaml)
- Update screenshots and add figcaptions where needed
- Update Kubernetes and tools versions to newer releases
- Remove revision numbers from the Grafana config to install the latest
revision
- Added a section to configure scraping of Kubernetes resources (nodes,
pods, etc.)
- Tested updated instructions on GKE 1.33 and 1.34 (and a local k3s
instance) successfully
- Added and updated expected outputs. Some were missing and others were
outdated
- Updated Grafana dashboards screenshots since they changed from the
last revision
- Minor corrections and typo fixes. Improved flow
- Added a section at the end pointing readers to where they can go next.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
- Updated introduction
- Added proper steps
- Tested intructions on headlamp desktop version and the in-cluster web
ui
- Added images to guide user
- Mentioned that the test connection button does not work (it probes a
`-healthy` endpoint that is not supported by VM). The plugin still
works, it's just the test button that fails
- Added links to the single and cluster installation guides
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
### Describe Your Changes
- Rewrote the introduction
- Added list of endpoints for single node, cluster, and cloud
- Added tips for working with VictoriaMetrics running on Kubernetes
- Flushed out explanations for each step
- Added reference links for all required endpoints
- Tested every command
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Commit 610b328e5a introduced a bug in the
date range search logic. If the first searched date for a given tenant
did not match, the search could proceed incorrectly.
This commit fixes the SearchTenants API by correctly advancing the date
passed to table.Seek.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422
* add example of the produced log, so users could understand the impact;
* stress once again about sensetive data exposure when
dump_request_on_errors is enabled.
This change adds some context to the error when all backend failed. From
support cases it seems like without the context users might not know
what to do with this error message. Clarification advises them to check
the prev error messages.
Previously regex simplify function made an attempt to parse string representation of simplified regex.
And it could produce runtime panic due to std lib specification:
```
// Simplify returns a regexp equivalent to re but without counted repetitions
// and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/.
// The resulting regexp will execute correctly but its string representation
// will not produce the same parse tree, because capturing parentheses
// may have been duplicated or removed.
```
This commit ignores simplified regex parsing error and returns back original regex.
It results into possible missing simplification of some niche regex patterns.
But it's extremely rare cases rarely seen in production. So the tradeoff is acceptable.
Fixes victoriaMetrics/victoriaLogs/issues/1112
While server side copies when using the same backup origin and
destination are always most efficient there are times when moving
between backup locations is required.
Right now vmbackup throws an error in these cases.
While its true that a user could always do a fresh backup from a
snapshot rather than copy an old backup, this requires access to storage
data locations and a running vmstorage instance, something that is not
_generally_ required for otherwise moving backups around in remote
locations using vmbackup.
This is a small change that makes the moving of backups from one
location to another transparent to users, without having to consider if
those locations are the same or different. This both simplifies backup
migrations and unlocks using vmbackup for more complex operations.
Specifically this came up in my use case because we want to orchestrate
the down-scaling of EBS volumes backing our vmstorage cluster, which
requires some complex backup operations, one of which being taking a
backup from s3 to a local filesystem.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10401
Use the same sharded implementation as in metricIDCache. The change is
basically a copy-paste. The only difference is that the rotation period
remains `1h` instead `1m` in order not to break the fix for #10064.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This should reduce memory reallocations and fragmentation when reading large request bodies from slow clients.
This also should reduce memory usage a bit because of the reduced memory fragmentation.
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/1042
### Describe Your Changes
The job ensure that:
- the draft release with given `$(TAG)` exists
- the release has excpected `$(GITHUB_ASSETS_COUNT)` number of uploaded
assets
- All the assets were uploaded succesfully.
It also adds helper job `github-get-release` which finds a draft release
by `$(TAG)` and stores into file `/tmp/vm-github-release-$(TAG)` file.
The `github-delete-release1 job is decoupled from the file produced by
`github-create-release job`. So it could be run at any time from any
machine.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
Adds JWT authentication support to vmauth with signature verification
and tenant-based access control. For now, public_keys have to set
explisitly in the config, OIDC discovery will be added in upcoming PRs.
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10445
Key Features
- JWT Configuration: Added `jwt_token` field to user config supporting
RSA/ECDSA public keys or skip_verify mode (for testing purposes).
- Token Validation: Verifies JWT signatures, checks expiration, and
extracts vm_access claims
- Compatible with vmgateway: jwt tokens issued for vmgateway should work
with vmauth too.
Examples
```yaml
users:
- jwt_token:
public_keys:
- |
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
-----END PUBLIC KEY-----
url_prefix: "http://victoria-metrics:8428/"
```
```yaml
users:
- jwt_token:
skip_verify: true
url_prefix: "http://victoria-metrics:8428/"
```
Constraints
- JWT tokens cannot be mixed with other auth methods (bearer_token,
username, password)
- Requires at least one public key OR skip_verify=true
- Limited to single JWT user (multiple JWT users will be supported in
the future)
Next steps
- Multiple `jwt_token` support.
- Claim matching
- Claim based routing
- OIDC\JWKS support
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Exploit uint64set data structure peculiarities (adjacent elements are
stored in
64KiB buckets) to optimize metricIDCache memory footprint.
As the result the cache utilizes 87% less memory and is up to 90%
faster. See
[benchstat.txt](https://github.com/user-attachments/files/25294076/benchstat.txt).
Follow-up for #10388 and #10346.
Thanks to @valyala for the optimization idea.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, on the last day of a month, storage could report empty
metrics for the last partition. This could happen if a new empty
partition was created in updateNextDayMetricIDs or if time series with
future timestamps were ingested.
This commit adds a check to ensure the last partition belongs to the
current month. Since this is typically the most actively used partition,
it should be treated as the last one.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10387
Ensure proper expansion and reset of `buf` size for OpenTelemetry
ingestion. This pull request does:
1. Flush data in `wctx` when `buf` is over 4MiB.
2. Do not return `wctx` with `buf` larger than 4MiB while the actual
in-use length is less than 1MiB to the pool.
Previously, when a small number of requests carried a large volume of
time series or labels, `buf` was over-expanded and recycled to the pool,
resulting in an excessive memory usage issue.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10378
This is part of the effort to upgrate and validate the [Guides in the
docs](https://docs.victoriametrics.com/guides/).
Doc page:
https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/
Functionally, nothing should change. Aside from the fix that prevented
one of the example applications to run, the rest of the commands in the
guide should be equivalent to the original.
Header anchor links do not change with this update. I added a few
headers but the existing headers anchors should remain unchanged to
prevent breaking existing links.
- Tested on a more modern version of GKE to validate it still works OK
(1.34.1-gke.3971001)
- Changed wording of some sections to improve flow and readability
- Added some missing steps/troubleshooting
- Add tips annotations for cardinality explorer and setup references to
make them stand apart form the main content
- Use `kubectl port-forward svc/...` instead of `kubecl port-forward
pod` (service selectors vs pod names) in some test commands to make
instructions simpler
- Updated OpenTelemetry version to fix error that prevented
`app.go-collector.example` sample code from running
- Replaced the "Visit these links" part in the second program (with the
fast/slow endpoints) with curl commands
- Updated the first VMUI test link to show table instead of graph while
testing OpenTelemetry ingestion (default graph view can be confusing as
there metric value for `k8s_container_ready` doesn't really show any
values)
- Minor typos, grammar check, and consistency (Kubernetes vs kubernetes,
Helm vs Helm, Collector vs collector, etc)
This change only affects query trace. It correctly uses the branched
query trace in callback function, so in trace it is placed in the right
actions branch.
Bug was introduced in
c705da74f6
- Return back the check that the size of the scraped response doesn't exceed the maxScrapeSize
at the client.ReadData(). Without this check the scraped response may be truncated to maxScrapeSize+1
bytes, which can result in decompression error. The decompression error in this case
hides the original errror about too big response side. This complicates troubleshooting by users.
- Stop decompressing the scraped response as soon as the decompressed response size exceeds maxScrapeSize.
This protects from excess memory usage needed for holding the decompressed response with sizes exceeding
the maxScrapeSize.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10320
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9481
lrucache causes huge cpu usage in some caches. See #10297.
There was a hypothesis that this was due to too short ttl in lrucache.
Setting it to 1h (the default workingsetcache eviction period) but it did not
completely eliminate the problem. The CPU utilization was not huge but still high.
See #10416.
Thus reverting back fix such deployments. This solution is temporary
because the cache consumes at least 32MB. There is one instance per
indexDB which means that if the retention is 3y then the total memory
utilized by this cache will be over 1GB and most of it will be unused.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Go1.26 has been recently released and was picked up by CI actions.
The tests and linter actions start to fail with:
GOEXPERIMENT=synctest go vet ./lib/...
go: unknown GOEXPERIMENT synctest
This happens because Go 1.26 remove synctest experiment.
Changelog:
This package was first available in Go 1.24 under GOEXPERIMENT=synctest,
with a slightly different API. The experiment has now graduated to
general availability. The old API is still present if
GOEXPERIMENT=synctest is set, but will be removed in Go 1.26.
https://go.dev/doc/go1.25#library
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
* Add context-aware label autocomplete by applying existing label
matchers (e.g. namespace/job) when fetching labels and label values.
* Update `package.json` dependencies.
* Update `vite.config.ts` to ensure correct API requests in playground
mode (`start:playground`).
Related issue: #9269
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
The expected size of the data ingestion request body accepted by VictoriaMetrics / VictoriaLogs / VictoriaTraces
exceeds 64KiB, and is close to 1MiB. That's why it is better to re-use byte buffers with capacities up to 1MiB,
even if less than 25% of their capacity was used the last time.
This should reduce the number of GC cycles at high data ingestion rate when the request body sizes
are distributed at both sided of the 16KiB ... 64KiB range.
This is a follow-up for 09d2ce36e8
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/1042
Previously `vm_deduplicated_samples_total{type="select"}` didn't take in account identical samples.
This commit takes it in account in the same way as `vm_deduplicated_samples_total{type="merge"}` metric.
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10384.
Before, by mistake, -remoteWrite.label flag was referenced in one part
of the doc as per-remoteWrite-url flag. In fact, -remoteWrite.label is
global and applies labels to all remoteWrite URLs unconditionally.
This commit tries to clarify it in docs:
* update the life-of-a-sample diagram to change the labels applying
logic
* add hint how to add a label via `extra_label`
* removes duplicated description for -remoteWrite.label flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10373
This is the first PR on a proposed series of updates to the guides.
I started with this one because:
It's on the top ten guides according to Google Analytics
It's a good starting point for me to get familiar with VM on Kubernetes
I plan to work through the rest of the guides in the following days
(coordinating the effort with JJ).
Changelog for this guide:
- Updated GKE version to a more current 1.34+
- Updated guide to more modern Helm and Kubectl versions
- Tested updated instructions on GKE 1.34.1-gke.3971001 (and a local k3s
instance) successfully
- Removed revision from Grafana values for helm chart (confirmed it
pulls the latest revision)
- Split the helm chart values into more readable chunks and added
explanations next to each chunk
- Added and updated expected outputs. Some were missing and others were
outdated
- Updated Grafana dashboards screenshots since they changed from the
last revision
- Updated Grafana repo to use community org (old grafana chart was
deprecated
on Jan 30th -
[source](https://community.grafana.com/t/helm-repository-migration-grafana-community-charts/160983))
- Minor corrections and typo fixes
- Added a section at the end pointing readers where they can go next.
### Describe Your Changes
Add Source Code data link (link to bar or line in graph to see) that
points directly to a source code file on Github. `VictoriaMetrics -
cluster`, `VictoriaMetrics - single-node`, and `VictoriaMetrics -
vmagent` dashboards were updated. I did not add it to other panels since
they do not have Drilldown section at all.
Also, fixed a misplaced Drilldown link in `VictoriaMetrics -
single-node` dashboard.
Proxy service code is here
https://github.com/VictoriaMetrics/location2source/
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
The new title better aligns with the code of
[writeconcurrencylimiter](d9dabea303/lib/writeconcurrencylimiter/concurrencylimiter.go (L140)),
the panel description and the metric used in the query.
Previously, the panel title suggested that it reflected only disk write
performance. During an incident investigation, this led to a wrong
assumption that the panel was unrelated to client-side performance.
In reality, the metric [includes the full write
path](98e320842c/lib/vminsertapi/server.go (L263)):
time spent reading data from the TCP connection, processing it, and
acknowledging the block. The updated title reflects this behavior more
accurately and reduces the risk of misinterpretation during incident
analysis.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Previosly the error returned on timeout suggested a memory leak, which
could confuse a user. In reality timeout could happen if vmselect is
overloaded or the query takes a lot of time to process. The commit
aligns rss. RunParallel with query deadline set either via flag
`-search.maxQueryDuration` or the `timeout` query argument. The logged
warn message is adjusted to suggest resource increase or timeout
increase.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8484
Signed-off-by: JAYICE <jayice.zhou@qq.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Previously, the Kafka consumer in vmagent committed offsets per message
(manual commit). At high message rates, this could overload the commit
path (coordinator, __consumer_offsets topic, and network).
This commit introduces time-based manual commits with a controlled window:
* enable.auto.commit remains false by default.
* After a successful TryPush (data accepted into the buffer before the
vmagent queue/backend), vmagent adds the message to pending offsets.
* Offsets are committed periodically (every second), as well as during
shutdown and partition rebalance.
This keeps the commit point tied to TryPush (stronger guarantees than
auto-commit) while significantly reducing commit QPS.
Auto-commit is also time-based, but it advances offsets based on poll()
delivery rather than application-level processing. This means offsets
may be committed before data is actually accepted by the vmagent
pipeline, slightly increasing the risk of data loss on crash or restart.
This change does not make the Kafka consumer fully transactional
end-to-end. Buffers in vmagent/vminsert/vmstorage still imply possible
data loss on hard stops. However, it provides stronger guarantees than
auto-commit, since commits are based on TryPush rather than poll().
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10395
Keeping such byte slices in the pool may increase memory usage when processing a small share of requests
with much bigger sizes than the average processed request.
This should help reducing memory usage at https://github.com/VictoriaMetrics/VictoriaLogs/issues/1042
Respect the default value of http.DefaultTransport.Proxy. Previously,
it could be unintentionally overridden with a nil value.
This commit aligns Proxy configuration across all created transports.
Previously, default `Proxy` was unconditionally replaced with config value, which could be nil.
It made impossible to use default http client proxy env variables.
This commit adds check in oauth http client builder that only overwrites the
transport proxy if a custom proxy url function is defined.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10385
Bug was introduced at 5a587f2006, while porting change from cluster branch.
This commit properlyslice `mms
[]metricsmetadata.Row` slice . Previously, every WriteMetadata call triggered a
slice allocation.
This shouldn't significantly impact overall performance, so I haven't
included benchmarks.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10392
This should reduce cpu utilization while still removing the storage
connection saturation.
Follow-up for 6bc809813b (#10346)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This is needed in order to allow using lib/pushmetrics for vmctl as it does not use go native flags.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
app/vmctl: add metrics for the migrations
- add flags to allow setting up metrics push
- add metrics to track progress of the migration for all modes
- add metrics for generic backoff and limiter packages
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The current implementation has a bottleneck – a single mutex to access
`prev`/`next` metric sets. Each rotation results in storage utilization
spikes since lock-free `curr` is almost empty, and cache needs to
promote metrics from `prev` to `next`.
This is an attempt to reduce contention by spliting cache into separate
shards.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10367
This commit adds a new flag `disableScheduledBackups` for `vmbackupmanager. Which disables any scheduled backups. It could be useful to keep vmbackupmanager running and serving API calls only.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10364
The new `ALERTS_FOR_STATE` may be retrieved during restore when:
1. a group contains multiple heavy rules, alerting rule A may have
already been executed and its state metrics successfully uploaded to the
datasource by the time all rules within the group have finished
executing;
2. the datasource makes data queryable very quickly, for instance, when
users configure a small value for `-search.latencyOffset`.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10335
Previously, running a vmalert with an empty notifier.url does not produce an error and leads to vmalert which will never send a notification successfully.
This commit properly validates notifier.url empty value.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10355
After changing the scrape interval from a smaller value (e.g., 30s) to a larger value (e.g., 60s), the changes() function starts to yield non-zero values even when the underlying values have not changed.
This commit keeps unchanged series values when a large gap occurs between samples or when the scrape interval decreases.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280
Commit 83da33d8cf introduced a check to
detect directories partially removed via IsPartiallyRemovedDir.
However, the check was performed using the full path, while de.Name()
returns only the current entry name (without the path). As a result, the
check always succeeded and the function did not behave as intended.
This flag allows disabling the mincore() syscall introduced in
50fc48ac47. On older ZFS filesystems,
mincore() may trigger a bug related to ZFSÕs own in-memory cache. Mixing
reads from mmap()ed files and direct disk reads can corrupt the ZFS ARC
cache and lead to data read corruption.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10327
Commit f62893c151 added an attempt to fix
stats for `tagFiltersCache`, `metricIDCache`, and `dateMetricIDCache`.
Instead of aggregated stats, it returned the largest cache stats by
cache size.
This resulted in possible counter decreases for counter metric types. It
made aggregated metrics less usable.
This commit changes cache stats aggregation by metric type:
* size-related gauge metrics are returned based on max cache size usage
* metric counters are reported as a sum of all counters
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10275
### Describe Your Changes
In the Cardinality Explorer, when filtering, a "Percentage from total"
stat appears. This stat is documented as "the share of these series in
the total number of time series".
This works for pages for individual metrics. However, if using a filter
that returns *multiple* metrics, the value of "Percentage from total"
will only account for the size of the *first* metric. One can have a
filter that returns, say, 10k time series (out of, say, 100k in the VM
cluster), and if the first metric returned has 1k time series, then
"Percentage from total" will show 1%, not 10%.
This PR fixes that calculation.
Credits to @PleasingFungus for the original fix (PR #10288).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: PleasingFungus <PleasingFungus@users.noreply.github.com>
Commit e35a9a366c changed the order of wg.Add calls in the Graphite transform package. Previously, all wg.Add calls were made upfront, but after that change it became possible for wg.Wait to exit earlier than expected.
This commit fixes the issue by spawning all background goroutines first and starting the goroutine that calls wg.Wait afterward.
Previously vmagent had remoteWrite.queues as a global setting that was be applied to every persistentqueue. However, it could be useful to specify remotewrite.queues per remotewrite.url.
Considering each rw might have different workload(latency, throughput, and availability), so it will be more flexible for tuning if we can set remoteWrite.queues separately for specific rw.
This commit, makes `-remoteWrite-queues` configurable per remoteWrite.url.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10270
Initial implementation of searchTenantsOnDate used a index scan for the given prefix (index prefix + tenant + date).
It did not check whether the date prefix was actually outside the current date.
This commit adds the missing date check and makes the tenant search results accurate.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10295
These metrics must be incremented when the request couldn't be processed because of the configured per-user concurrency limit.
The commit 76176ac1d3 moved the counter increase to the place when the current request
is put in the wait queue because of the concurrency limit is reached. This is incorrect, since such requests
can still be successfully processed during -maxQueueDuration . This also contradicts the docs at https://docs.victoriametrics.com/victoriametrics/vmauth/#concurrency-limiting
There is a small practical sense in counting the number of times the concurrency limit is reached,
while the request is successfully processed during the -maxQueueDuration after that.
Add missing alerting rule for rejected unauthorized requests because of the concurrency limit.
Add missing grouping by instance for per-user counter of rejected queries because of the concurrency limit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
This simplifies troubleshooting by investigating the vm_log_messages_total metric
when logs are unavailable. The logs may be unavailable when the -loggerLevel command-line
flag is set to value other than INFO. The logs may be unavailable when clients
use Monitoring of Monitoring service ( https://victoriametrics.com/products/mom/ ),
which provides metrics, but doesn't provide logs from VictoriaMetrics components
running at the client side.
Add `is_printed` label to the `vm_log_messages_total` metric in order to detect whether
the given log has been suppressed or printed.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10304
While at it, make more readable the description for the TooManyLogs alert,
which is based on the vm_log_messages_total metric.
Also return back the `level!="info"` instead of `level="error"` filter
in the query for this alerting rule, in order to be consistent with queries
at the official dashboards for VictoriaMetrics components.
TODO: investigate too high warnings rate at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2760
and fix it at the source of these warnings instead of modifying the query
for the TooManyLogs alert.
Add fallback to legacy indexDB for stats search
After introducing the new partition index
(f97f627f79), storage stopped returning
stats for date ranges outside the partition index. This made the
migration backward incompatible, as there was no way to retrieve stats
for dates prior to the migration.
This change adds a fallback to the legacy indexDB search when the status
search on the current partition index returns zero series.
This is an imperfect solution: due to tag filters, the TSDB status
search may legitimately return empty results. However, the additional
overhead is small and acceptable.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10315
The vmanomaly has been moved to a separate repository. This means that the functionality related to vmanomaly is no longer needed in the app/vmui located in the VictoriaMetrics repository.
This commit removes all the functionality and unnecessary abstractions related to vmanomaly from the app/vmui repository. This should help improving long-term maintenance of the code.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9755
Wait for the first byte from the reader passed to ReadUncompressedData()
before obtaining concurrency token from -maxConcurrentInserts and before allocating
buffers needed for reading the request body in memory.
This should limit the amounts of memory needed for processing a big number of concurrent
HTTP requests via Prometheus remote_write protocol and via other HTTP-based data ingestion
protocols where every request contains a single block of data to process.
Now the maximum memory usage is limited by -maxConcurrentInserts, while the server
can process much more than -maxConcurrentInserts concurrent HTTP requests by pausing the excess requests.
Previously the memory usage wasn't limited by -maxConcurrentInserts, since buffers for reading the data
from concurrent connections were allocated before obtaining the concurrency token from -maxConcurrentInserts.
While at it, use protoparserutil.ReadUncompressedData() in lib/protoparser/promremotewrite/stream.Parse()
for the sake of consistency across parsers for protocols, which send the full block of data per every incoming HTTP request.
This is a follow-up for the commit d107dee9c7
This is follow-up for c5713a09d3
Originally, dateMetricID cache was fully rotate every 20 minutes. It
made daily-index pre-creation less efficient and caused CPU usage spikes
for index records lookup at midnight.
storage pre-fills index records for the next day in 1 hour before night.
But this rotation made only last 20 minutes before midnight visible in
the cache.
This commit changes rotation period from 20 minutes to 2 hours ( 1 hour
tick interval). While
it could slighlty increase cache memory usage ( in practice it shouldn't
be noticeable). It prevents from CPU usage spikes.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
Issue was introduced at d6ef8a807b commit.
Due to variable shadowing, if filter matched more than 100_000 metricIDs, it's fallback to the indexDB scan.
But because of type, `filter` value was not properly updated. And it triggered incorrect results.
This commit fixes this typo and adds test to verify this case.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10294
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10196
Prefer the non StaleNaN value when both StaleNaN and non-StaleNaN samples share the timestamp during deduplication(downsampling). The scenario can occur when:
1. Multiple vmagent instances scrape the same target(without -promscrape.cluster.name flag), one instance fails to scrape due to issues such as network, while others succeed.
2. Multiple vmalert instances evaluate the same recording rule, with one instance receiving a partial response while others receive a complete response.
In both cases, since the samples share the same timestamp and represent the metric state at that moment,
the non-StaleNaN value is entirely valid, whereas the StaleNaN could be caused by other unknown issues.
Therefore, it is reasonable to prioritize the non-StaleNaN value.
Previously, the first 20 raw samples were used for calculation.
But compare to the first 20 samples, the last 20 samples represent the latest state of the metrics,
so the lookbehind window calculated from them should be more accurate when applied to the most recent samples,
resulting in better query results for recent time ranges.
For example,if the scrape interval changes at day4, and the query range is set to last 7 days.
Applying the window derived from the first 20 samples(the old scrape interval) to new samples could result in consistently incorrect results from day4 through day11.
Conversely, applying the window derived from the last 20 samples (the new scrape interval) could lead to incorrect results for [day0-day4),
which are old states and generally less important.
This pull request does not address any specific bug, but change the general behavior, so there is no changelog.
Inspired by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280, but not the fix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280.
This commit reduces default value for
`storage.vminsertConnsShutdownDuration` flag from `25s` to `10s`
seconds.
This change should help to reduce probability of ungraceful storage
shutdown at Kubernetes based environments, which has 30 seconds default
graceful termination period value (terminationGracePeriodSeconds).
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
* Fixed a heatmap crash that happened when all visible cells had the
same value (division by zero produced invalid color indices).
* Improved how histogram buckets are chosen for display when the data is
very sparse, so the heatmap doesn’t look empty or drop the only
meaningful bucket.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10240
It is better to delete the snapshot, since the client is no longer interested in it.
This should prevent from creating many unused snapshots when clients cancel creating snapshots
because of timeouts. This is the real production case from one of VictoriaMetrics users:
the disk IO subsystem became very slow, so creating a snapshot took a lot of time, so vmbackup
was canceling creating the snapshot because of the timeout. But vmstorage was still continue
creating the snapshot. This resulted in the increasing number of created but unused snapshots.
Without measuring this, we have a blind spot. Exposing it as a metric
improves visibility and should save time during future debugging
sessions.
Inspired by review commit
c9596a0364 (r173621968)
indexDB mergeset has an edge for single item inmemoryBlock. It stores
such items blocks in-memory at blockheader firstItem. So there is no
need to perform on-disk read operations and storing copy of it at cache.
It also may result in incorrect search results, inmemoryBlock with a
single item has always zero index block offset. Which causes collisions
if it's cached with the next index block at part.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10239
Probably fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
**Problem**
* VMUI had two tenant ID sources:
* URL path: `/select/<accountID>/vmui/`
* Query param: `tenantID`
* These could differ, causing confusion and inconsistent behavior.
**Solution**
* Removed the legacy `tenantID` query parameter.
* Use the URL path as the single source of truth for tenant ID.
* Changing the tenant in the UI now updates the URL path.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10232
### Describe Your Changes
The Backups errors panel uses a hard coded rate, when looking over a
large period of time this number would likely stay low do to the hard
coded rate when in reality the amount of errors is much larger.
This change addresses this by using the __rate variable in Grafana so
the rate will align with the date/time range in Grafana.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Call decConcurrency() inside Reader.Read() before calling the Read() at the underlying reader.
This reduces chances of improper use of the writeconcurrencylimiter.Reader by callers.
While at it, move the creation of writeconcurrencylimiter.GetReader() to the top of stream parser functions
at lib/protoparser/* packages, and call incConcurrency() inside GetReader() call.
This reduces the frequency of decConcurrency() / incConcurrency() calls
for typical buffered reads when parsing the incoming data. This, in turn,
reduces the contention on the concurrencyLimitCh.
### Describe Your Changes
Previously, we had to manually update flags in documentation whenever we
made flag-related changes in source code. Someone did it by hand, others
compiled enterprise binaries, executed them with `-help` flag, and
copied and pasted output to the documentation.
In https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9632 a
command `make docs-update-flags` was introduced. It automates the whole
process. It compiles binaries, runs `-help,` and syncs output to changes
automatically.
Now, we can **omit updating doc flags in the PR** and do it once before
releasing a new version.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
In alerting rules, annotations are only attached to alert messages that
are sent to the notifier (such as Alertmanager). These annotations
typically contain human-readable information, such as instructions for
resolving the alert.
In [replay
mode](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules-backfilling),
vmalert does not send alert messages to the notifier at all(no notifier
is configured), as these alerts are outdated. Therefore, it does not
need to template the annotations in this mode.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10262
which tracks the number of requests to the query rollup cache
As described in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117, when
retrieving cached data from the rollup result cache, there can be mixed
`get()` and `getBig()` calls to the underlaying fastcache. And it's
unpredictable how many times `getBig()` will call `get()`, so the
metrics from fastcache cannot be used to indicate query cache miss
ratio.
Exposing a new counter `vm_rollup_result_cache_requests_total` to track
the number of requests to the query rollup cache, together with the
existing `vm_rollup_result_cache_miss_total`, allows for monitoring the
rollup cache miss rate per query (or subquery), which is more
user-facing.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117
related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5056
Previously VictoriaMetrics: Single-node version used
`http://<victoriametrics-addr>:8428/zabbixconnector/api/v1/history`
resulted in a missing path error. The issue was introduced during changes back-porting from vmagent.
Additionally, the http response was fixed. Zabbix expects a 200 status
code during normal operation.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10214
Since the cache may be reset too often, using the sizeBytes as an
indicator that this is the first met indexDB to collect tfssCache stats
is unreliable because it often can be zero all indexDB instances. Use
Requests metric instead because it is never reset.
Follow-up for #10204.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Since the first connection is not closed, the vmstorage will never
terminate gracefully which will cause the reset of all caches on the
start-up.
Follow-up for 244769a00d (#10136)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The `err` may contain information about request cancelation performed by the server code.
In such cases the error must be logged. The error must be ignored only if the client canceled the request.
This is a follow-up for the commit c9596a0364
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
This is to debug cases when metric name tracker resets the tsid cache
after restart. It could be due vmstorage not having enough time to stop
gracefully. Logs should provide this info.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This fixes the following corner case: if all instances of a cache have
zero size, the stats won't be set at all. This results in some weird
graphs if the cache is reset very often (such as tfssCache): the cache
sizeMaxBytes alternates between the actual value and zero.
Follow-up for f62893c151
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Commit 5a587f2006 was not properly ported
to the single node branch. Since single node is able to perform both
promscrape and self-scrape, it's required to add metadata add methods to
those paths.
This commit fixes missing metadata add to the storage.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10175
Previously a short spike in the number of concurrent requests immediately led to `429 Too Many Requests` errors
when the number of concurrent requests exceeds -maxConcurrentRequests or -maxConcurrentPerUserRequests.
This commit allows processing short spikes in the number of concurrent requests during the -maxQueueDuration timeout.
The requests are rejected only if they couldn't be served accroding to the concurrency limits during the -maxQueueDuration.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10112
- Introduce backendURLs struct, which holds all the backend urls and allows stopping
all the health checkers across all the backend urls with a single call to backendURLs.stopHealthChecks().
- Immediately cancel the pending Dial call to the backend when backendURLs.stopHealthChecks() is called.
Use lib/netutil.Dialer.DialContext() for this.
- Replace a fragile closing of stopHealthCheckCh channel via stopHealthCheckOnce.Do()
with easier to maintain call of cancel() func for the corresponding healthChecksContext.
- Wait until health checker goroutines are finished before return from UserInfo.stopHealthChecks().
Previously the health checker goroutines could run for some time trying to dial the backend
after the return from UserInfo.stopHealthChecks().
- Try dialing the broken backend for https urls. It is better if the broken backend logs the error
instead of routing client requests to the broken backend.
- Log dial errors to the broken backend, so users could troubleshoot the backend connectivity issue with more details.
- Refer the correct issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997 -
in the comments explaining why periodic dialing of the broken backend is needed.
Previously the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890 was incorrectly referred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10147
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10177
Add `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` which track
the number of user request errors, and aligned with
`vmauth_user_requests_total`.
The existing `vmauth_http_request_errors_total` currently only counts
requests with `invalid_auth_token`. Once authorization has passed, any
subsequent request errors are tracked under
`xxx_user_request_backend_requests_total`.
This commit introduces the global `sampleLimit` setting to restrict the number
of samples accepted per scrape target, mirroring the behavior of
Prometheus.
Motivation:
1) The existing `-promscrape.seriesLimitPerTarget` flag currently takes
precedence over any `sample_limit` setting defined directly on the
scrape target. The new `sampleLimit` implementation ensures that the
target configuration is able to override the global setting, allowing
users to define specific limits per target.
2) The existing series limit flag uses memory-intensive Bloom filters,
resulting in high RAM consumption under high-cardinality scraping
scenarios. The `sampleLimit` provides a much simpler, low-overhead
alternative.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10145
The encoding.DecompressZSTD* consistently updates the vm_zstd_block_decompress_calls_total metric.
Also make the follwing improvements after the commit 10f7cd2ffc:
- Add encoding.DecompressZSTDLimited() function and use it instead of zstd.DecompressLimited,
so it properly updates vm_zstd_block_decompress_calls_total metric.
- Clarify description for the encoding.DecompressZSTD* and zstd.Decompress* functions.
Currently, `dateMetricIDCache` is reset when it is full and it is never
reset is not full but the data it stores is no longer needed. This leads
to the following problems:
- During regular data ingestion the cache sizeBytes may exceed max
allowed size and the cache gets reset which may potentially slow down
data ingestion (see #10064)
- The cache is per-indexDB. This means that in partition index (#8134)
there will be as many instances of this cache as the number of
partitions. If someone performs a backfill across all partitions, this
will fill all caches and they will never get reset even if no more
historical data is ingested.
So the solution is to periodically rotate the cache. After first
rotation the data is not deleted but moved to `prev` storage. After
second rotation `prev` gets deleted. This gives the cache an opportunity
to restore the `prev` data if it is still in use. Based on #10167.
This PR also removes the introduced recently introduced
`-storage.cacheSizeIndexDBDateMetricID` flag (see #10135). This should
be safe since it is new and its use case is very niche, i.e. no one
would really use it.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
We have `vmauth_user_requests_total` and
`vmauth_unauthorized_user_requests_total` to track requests from the
user side. However, in scenarios such as request timeouts or when the
response code matches `retry_status_code`, a single request may be
retried across multiple backends.
Exposing counters `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` that track the
number of requests sent to backends provides insight into the routing
logic and can help identify if requests are being consistently retried,
which may contribute to increased request duration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171
Currently, backendErrors may be counted twice if a request to the
backend fails due to context.DeadlineExceeded.
9bc7a17d80/app/vmauth/main.go (L328)9bc7a17d80/app/vmauth/main.go (L294)
And we increment this counter in a way that is somewhat inconsistent.
Given that the counter's name is `xx_request_backend_errors_total`, it
should only increase when a backend request returns an error. This value
can exceed the user request error count if multiple backend requests
fail for a single user request.
The `xxx_request_backend_errors_total` counter should be used in
conjunction with the `xxx_request_backend_requests_total` introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171.
There is no reason to send a request to the first backend if all
backends are marked as broken.
Also,
>// getFirstAvailableBackendURL returns the first available backendURL,
which isn't broken.
The fix only skips a redundant request when all backends are
unavailable, it doesn't introduce any changes from user's perspective,
so I skipped changelog.
When the time series deletion is performed some of the storage caches
need to be reset but some not. This PR reviews all storage caches and
documents why there are reset or not and also places all the resetting
logic (and comments) in one place.
### Describe Your Changes
Previously, a backend was considered healthy as soon as its
'bu.brokenDeadline' deadline expired, even if it was still unavailable.
This caused avoidable request failures and retries.
Now vmauth performs a TCP dial (1s timeout) before restoring the backend
to the healthy
pool. This avoids routing traffic to backends that are still down.
The dial check also covers cases where a route to the backend cannot be
resolved. Without this check, user requests would hang until the
connection timeout, leading to long waits
or errors. The new check fails fast and doesn't impact real user
requests.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Return "too big response size" error only for responses bigger than c.maxScrapeSize
(this option can be set either via max_scrape_size option inside scrape config
or via -promscrape.maxScrapeSize command-line flag).
Previously responses with sizes equal to c.maxScrapeSize were incorrectly rejected.
This should prevent from the excess memory usage because of dangling source byte slices
referred by decoderContext.ls.Labels.
This is a change similar to 63a68edb05
The tagFilters loop cache is per-indexDB which means that currently
there are two instances, one for idbCurr and one for idbPrev. When the
partition index (#8134) is released, there will be as many instances of
this cache as there will be partitions.
The cache is implemented using workingsetcache. Which occupies at least
30MB even when unused. Given that only the latest indexDB is used most
of the time, a lot of memory can be wasted.
Therefore the cache implementation is changed to lrucache because it
does not consume memory when it is unused and also has timeout-based
eviction.
This is a follow-up for 4cd727a511
(https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10072).
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one have to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) one
should account for but there is no even a metric to expose it.
The commit adds Memory usage breakdown panel into Drilldown section. It
provides insight into Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache, File cache memory distribution
It should help spot trends changes in memory by type or invistigate
issues such as #10069 and #10028 easier.
Panel info:
This panel shows memory usage by category.
How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.
Interpretation
- A steadily rising Go Heap usually indicates a memory leak. Collect
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.
<img width="1508" height="628" alt="Screenshot 2025-12-08 at 13 18 44"
src="https://github.com/user-attachments/assets/0e794324-e86d-468e-b926-8bb11f5a2043"
/>
<img width="1503" height="674" alt="Screenshot 2025-12-08 at 13 19 34"
src="https://github.com/user-attachments/assets/62fc3fff-33b3-4dfe-ad3f-ad0526a8a606"
/>
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10139
Adjust slowness-based rerouting logic.
Rerouting now occurs only from the slowest node, and only if the cluster
as a whole has enough available capacity to handle the additional load.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890
The following dmc metrics were given standard names, i.e.:
- vm_date_metric_id_cache_resets_total became
vm_cache_resets_total{type="indexdb/date_metricID"}
- vm_date_metric_id_cache_syncs_total became
vm_cache_syncs_total{type="indexdb/date_metricID"}
This change should be safe since these metrics are currently not used in
VictoriaMetrics Gragana dashboards.
Additionally, other cache metrics were organized within the code so that
each metric has the same order.
This commit optimizes the performance of the storage by improving the utilization of persisted hourMetricIDs cache to avoid redundant indexDB lookups after vmstorage restart. The change refactors the hour-based cache checking logic using a switch statement to handle multiple hour scenarios more efficiently.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10114
- Remove mentions of `Caches` section in Grafana dashobards since this section does not exist anymore.
- Rewrite a bit the description of cache panels in Troubleshooting section.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
- Add more metrics to the protobuf to parse.
- Measure scan speed of the original protobuf at bytes/sec. Previously the number of ParseStream() calls per second was measured.
`tagFiltersCache` and `dateMeticIDCache` are now per-indexDB. Currently
we have 2 instance of indexDBs (prev and curr) and therefore 2 instances
of each cache.
When the storage stats is collected, the stats of individual caches is
added together. For example, is the `sizeMaxBytes` of each
tagFiltersCache is `100MB` and the `sizeBytes` of each instance is
`10MB` and `99MB`, then the resulting stats will be `sizeMaxBytes ==
200MB, sizeBytes == 109MB`.
While this is accurate, this stats hides a potential problem. It says
that the cache utilization is slightly above `50%` (109/200) and
everything seems to be okay. But in reality one of the caches is
utilized by 99% and soon will start evicting existing records to make
room for new ones, potentially slowing down the data retrieval. Ops
won't see it and will not take necessary action.
The solution is to report stats only for one instance of cache whose
utilization is the highest.
Alternatives considered:
- #10123. Might work, but breaks the encapsulation and can potentially
be slower
- Do not aggregate the stats and report is per-indexDB. This increases
the number of metrics and makes it dependent on the number of indexDB
instances (which can be many once #8134 is released).
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8134
This reverts commit dbe71700b5.
tsidCache is persistent and must be reset before deletedMetricID records
are added to the index. THis is needed to handle ungraceful shutdowns
properly.
A number of changes to `dateMetricIDCache` stats and configuration:
1. Export `SizeMaxBytes` metric and make the size configurable via a
flag
2. Fix `EntriesCount` and `SizeBytes` stats. Previously the cache
reported this stats for its immutable part only. Whereas there are cases
when the number of entries in its mutable part is comparable with the
number in immutable part. The stats from the mutable part remains
invisible until it is sync'ed to the immutable part. It is also possible
that the cache gets reset after the sync because the cache size exceeds
the max allowed size. Reporting the stats for both mutable and immutable
parts should provide a clear picture of the cache utilization.
Together, SizeBytes and SizeMaxBytes should enable tracking the cache
utilization properly. And take appropriate actions if necessary (such as
adjusting the memory resources and/or cache size limit via a flag).
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
After closing last connection to vminsert, vmstorage will still wait for
an interval, causing actual shutdown time will be always longger than
configurations.
This commit just skip the last sleep
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10136
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one have to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) one
should account for but there is no even a metric to expose it.
The commit adds Memory usage breakdown panel into Drilldown section. It
provides insight into Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache, File cache memory distribution
It should help spot trends changes in memory by type or invistigate
issues such as #10069 and #10028 easier.
Panel info:
This panel shows memory usage by category.
How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.
Interpretation
- A steadily rising Go Heap usually indicates a memory leak. Collect
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.
indexDB has 3 block caches. These caches export metrics. Storage
collects these
metrics for each indexDB it has (currently prev and curr only).
There is a potential problem:
- These caches are shared by all indexDBs
- Each indexDB reports the block cache metrics.
- Storage collects the metrics of all indexDBs by adding them together.
I.e. it is possible to count block cache metrics several times.
It is not the case in current implementation because the addition of the
metrics
is not performed intentionally.
The added unit test 1) demonstrates that the resulting counts are
reported
correctly and 2) protects from future unintentional changes in this
behavior.
Additionally a code comment is added to explain why block cache metrics
are not summed up.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, influx parser allocated a new slice byte for
unescape of Row fields. It adds extra pressure at GC and increases CPU
usage.
This commit changes escape to in-place updates for provided []byte.
Since request for parsing is actually a []byte converted into the
string, it's safe to update it in-place. To be able to interact with
[]byte directly, this commit changes parser API and accepts []byte
instead of string.
Benchstat:
```
│ before │ after │
│ sec/op │ sec/op vs base │
RowsUnmarshalUnescape-10 74.68n ± 4% 54.23n ± 5% -27.38% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 40.41n ± 2% 42.59n ± 1% +5.39% (p=0.000 n=10)
geomean 54.93n 48.06n -12.51%
│ before │ after │
│ B/s │ B/s vs base │
RowsUnmarshalUnescape-10 1.035Gi ± 4% 1.425Gi ± 5% +37.72% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 1.613Gi ± 2% 1.531Gi ± 1% -5.11% (p=0.000 n=10)
geomean 1.292Gi 1.477Gi +14.32%
│ before │ after │
│ B/op │ B/op vs base │
RowsUnmarshalUnescape-10 149.00 ± 0% 96.00 ± 0% -35.57% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 80.00 ± 0% 80.00 ± 0% ~ (p=1.000 n=10) ¹
geomean 109.2 87.64 -19.73%
¹ all samples are equal
│ before │ after │
│ allocs/op │ allocs/op vs base │
RowsUnmarshalUnescape-10 5.000 ± 0% 1.000 ± 0% -80.00% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
geomean 2.236 1.000 -55.28%
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10053
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, if user defined value for `memory.allowedBytes` flag
exceeded system memory limit, remaining memory could take negative
value. It results into incorrect memory auto-detect calculations for
various components. Such as vmstorage unique timeseries limit and parts
size.
This commit adds negative value check. And also logs system memory
limit at start-up of vm components.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10083
Previously, in order to cache results for `rate`, we consider
`rate(m[d])` as `(increase(m[d]) / d)` and cache the `increase` result.
However, in MetricsQL, `rate(d) = (lastValue - firstValue) /
(lastTimestamp - firstTimestamp)`, so it does not equal to
`increase(d)/d` if `d != (lastTimestamp - firstTimestamp)`.
Although the issue primarily arises when the time series samples are not
continuous, but the discrepancy is hard to debug and can be confusing to
users. Because the range query doesn't use this optimization, causing
recording rule results to
differ from raw query results in VMUI.
Therefore, it is better to disable the usage and only enable it when we
can cache it correctly.
fixes https://github.com/VictoriaMetrics/victoriaMetrics/issues/10098
As indexDBs became independent from each other, the tsidCache is now
reset more than once when the DeleteSeries() operation is performed. But
it needs to be performed only once. Thus, move the deletion from indexDB
to Storage.
Follow-up for 16d75ab0bd.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10119
There’s no need to call `c.Reset()` for rollup result cache if it’s not
persisted(`-cacheDataPath` not specified) or has already been cleared by
`-search.resetRollupResultCacheOnStartup`, as it is already newly
created.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10095
Previously vmgateway didn't handle http.Abort error.
It could lead to the unexpected panic at webserver.
This commit adds panic recover and prevent app from crash.
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md">js-yaml's
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.1] - 2025-11-12</h2>
<h3>Security</h3>
<ul>
<li>Fix prototype pollution issue in yaml merge (<<)
operator.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="cc482e7759"><code>cc482e7</code></a>
4.1.1 released</li>
<li><a
href="50968b862e"><code>50968b8</code></a>
dist rebuild</li>
<li><a
href="d092d86603"><code>d092d86</code></a>
lint fix</li>
<li><a
href="383665ff42"><code>383665f</code></a>
fix prototype pollution in merge (<<)</li>
<li><a
href="0d3ca7a27b"><code>0d3ca7a</code></a>
README.md: HTTP => HTTPS (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/678">#678</a>)</li>
<li><a
href="49baadd52a"><code>49baadd</code></a>
doc: 'empty' style option for !!null</li>
<li><a
href="ba3460eb9d"><code>ba3460e</code></a>
Fix demo link (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/618">#618</a>)</li>
<li>See full diff in <a
href="https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Previously, context cancellation was ignored when reading user response
for the prompt. That leads to ignoring of "Ctrl+C" and other termination
signals to vmctl until user finishes the input.
Fix that by properly propagating the context and respecting the
cancellation of the context.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Add more S3 configurations.
- SSES3KeyID allows to push to a bucket that is another account as the
KMS key it uses to encrypt data server side.
- ACL allows configure which permissions are given to the object
uploaded on the bucket (usefull when bucket policy expect a given
permission such as `bucket-owner-full-control`).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Some influx clients ( such as nimon monitoring client) adds excess white spaces in the influx line and does not set a
timestamp. Since Influx protocol requires whitespace before timestamp only when it set, it could present without timestamp. Whitespace before omitted timestamp confuses parser.
This commit adds check for the skipped timestamp and test case for it.
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10049
Previously, proxy vmselect (aka 1st level vmselect) performed parsing
of MetricBlock received from vmstorage before forwarding it into top vmselect. It required an additional CPU and Memory, which greatly slowed down query requests.
This commit changes lib/vmselectapi iterator API, instead of MetricBlock, it returns encoded MetricBlock as a byte slice.
It allows to save CPU and memory at proxy vmselect by eliminating need of decoding MetricBlock received from storage.
In addition, it adds the following optimizations for proxy vmselect:
* reduces memory allocations by using iterator pool
* add per storageNode workerItem for iterator
Also, it adds optimization for vmstorage, it no longer performs extra memory copy of MetricName for MetricBlock.
vmselect and vmstorage metrics vm_vmselect_metric_rows_read_total and vm_metric_rows_read_total were removed, it's not used at any dashboards and rules. New Iterator API doesn't support it.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9899
The purpose of this PR is the same as #10000, except `lrucache` is used
for implementing tfss cache.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
revert change, that was introduced in
483e00ffb9
since rendering of all nested children significantly impacts alerting
tab performance in case of multiple items
@Loori-R @arturminchukov , what do you think about using react-virtuoso
additionally for alerting tab to decrease dom size?
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
A throttled logger will continue to log messages occasionally with a
suffix indicating how many similar logs were throttled. Using the same
logger for multiple log messages can result in certain logs being
entirely suppressed and invisible in the logs. This updates most of the
loggers used in `appendFromScopeMetrics` to be their own logger so that
"unsupported delta temporality/metric type" logs will be visible for all
metric types. Additionally, `skippedSampleLogger` is only used by
`appendSamplesFromHistogram` so this was moved closer to that function.
Related to #9447
Related to #9498
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Go runtime executes all the goroutines on GOMAXPROCS operating system threads.
Go runtime cannot switch the OS thread to another goroutine if the current goroutine
is stuck in the major pagefault while reading the data from memory-mapped file,
because Go runtime doesn't distiguinsh between reading from regular memory and reading
from memory-mapped file. So the OS thread becomes stuck while waiting until the OS
reads the data from file at the requested memory address and returns back control to Go application.
In the worst case it is possible that all the GOMAXPROCS threads are stuck in major pagefaults,
so Go runtime pauses executing all the goroutines. This state is possible in environments
with small GOMAXPROCS and high-latency disks such as NFS or small HDD-based disks at AWS.
See https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d for more details.
This commit protects from such stalls by verifying whether the given memory location from memory-mapped file
is already loaded in the OS page cache before reading from that memory.
If the location isn't in the OS page cache, then it falls back to pread() syscall for reading the data from file.
Go runtime allocates extra OS threads for long-running syscalls, so it can continue executing goroutines
across all the GOMAXPROCS threads while reading the data from slow storage via pread() syscall.
This commit uses mincore() syscall for detecting whether the given memory page is available in the OS page cache.
It also caches mincore() results for up to a minute in order to reduce the overhead for the mincore() syscall.
This commit reduces the increase rate for the process_major_pagefaults_total metric by multiple orders of magnitude
on systems with high-latency disks.
Currently, `lrucache.Cache` `SizeBytes()` and `SizeMaxBytes()` return
type is `int`. The cache `Entry.SizeBytes()` also returns `int` value.
Changing the type to `uint64` will allow using `uint64set.Set` as the
cache entry type (see #10072).
Please note that using `uint64` regardless the cpu architecture is set
is not entirely correct, because in 32-bit systems the size won't ever
get bigger than `2^32`, so the `uint64` will too much. However current
type (`int`) is not correct either since it is signed and will only
allow to store values up to `2^31`. Alternatively, all `SizeBytes()`
methods should return `uint`.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
….maxConcurrentRequests error
If `vmstorage` is currently overloaded it could return
maxConcurrentRequests error. Now `vmselect` immediately fails the whole
request even if `replicationFactor` is set up and other replicas could
respond without errors.
This PR treats them as regular errors, not fatal ones.
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.43.0 to 0.45.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e0068c009"><code>4e0068c</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="e79546e28b"><code>e79546e</code></a>
ssh: curb GSSAPI DoS risk by limiting number of specified OIDs</li>
<li><a
href="f91f7a7c31"><code>f91f7a7</code></a>
ssh/agent: prevent panic on malformed constraint</li>
<li><a
href="2df4153a03"><code>2df4153</code></a>
acme/autocert: let automatic renewal work with short lifetime certs</li>
<li><a
href="bcf6a849ef"><code>bcf6a84</code></a>
acme: pass context to request</li>
<li><a
href="b4f2b62076"><code>b4f2b62</code></a>
ssh: fix error message on unsupported cipher</li>
<li><a
href="79ec3a51fc"><code>79ec3a5</code></a>
ssh: allow to bind to a hostname in remote forwarding</li>
<li><a
href="122a78f140"><code>122a78f</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="c0531f9c34"><code>c0531f9</code></a>
all: eliminate vet diagnostics</li>
<li><a
href="0997000b45"><code>0997000</code></a>
all: fix some comments</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/crypto/compare/v0.43.0...v0.45.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Describe Your Changes
fix#9987
Avoid blocking when a connection to `-opentsdbListenAddr` doesn't send
any data. This issue blocked other connections from being handled.
> This bug can be tested with:
> 1. Start VictoriaMetrics Single-node with `-opentsdbListenAddr=:4242`.
> 2. Run: `telnet 127.0.0.1 4242` without typing any data after
connection established.
> 3. Run (in another terminal, after step 2): `curl -H 'Content-Type:
application/json' -d
'{"metric":"x.y.z","value":2222222.34,"tags":{"t1":"v1","t2":"v2"}}'
http://localhost:4242/api/put`
>
> Before the change:
> - Step 3 was blocked infinitely.
>
> Expect result after the change:
> - Step 3 was executed.
> - Connection established by step 2 will be closed after 5 seconds.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Clarified the index size note in
docs/guides/understand-your-setup-size/README.md to steer readers toward
the FAQ when indexdb feels oversized, noting typical ratios and
troubleshooting guidance.
- Fix comment
- Re-use dst instead introducing a new variable.
This change has been requested to be in a separated PR during the
pt-index (#8134) code review.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Currently, when a partition is created its corresponding parts.json file
is not created right away (see createNewParition()). Its creation is
delayed until the first part files are created on disk (see
swapSrcWithDstParts()). However, the parts.json file is created for a
possibly empty partition when an existing partition is opened (see
mustOpenPartition()) and when a partition snapshot is create (see
MustCreateSnapshotAt()).
I.e. `parts.json` is an important part of a partition, since it is an
artifact that describes the partition contents. And it should be created
on pt creation even if its contents is empty.
To be honest, this change is mostly a no-op for the current storage
implementation. It only makes the code consistent, i.e. the parts.json
is created along with the partition.
However having it created when a partition is created becomes in
pt-index (#7599, #8134), because it allows having partitions with no
data and therefore without parts.json file. Still not a big deal but the
unit tests start failing.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
dateMetricIDCache does not belong to storage anymore since it has been
moved to indexDB. Instead moving the case to index_db.go, move it to a
separate file in order to navigate the code more easily.
No changes have been done to the code or tests.
Follow up for: #9983
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Alexander Frolov <9749087+fxrlv@users.noreply.github.com>
The data structure used for holding the nextDayMetricIDs is too complex
and can be simplified (flattened).
Follow up for: #9983
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The change was introduced in pt-index PR (#8134) and is extracted into a
separate PR.
Currently used in partition_search and partition. If you see more places
like this, please let me know.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Looks like the `dateMetricIDCache` must be per indexDB:
- the use of this cache and `is.hasDateMetricID()` often go in pairs. So
it makes
sense to use this cache in that method.
- The same is true for `createPerDayIndexes()`: everytime the index
entry is
created, a corresponding entry is added to the cache.
- As a result the generation field is also removed from the cache.
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This is very frequent question from new users of VcitoriaMetrcs who migrate from other solutions
with automatic data rebalancing among storage nodes, so it is a good idea to cover it in the docs.
Metrics metadata is loaded from a per-tenant storage map
(perTenantStorage map[uint64]map[string]*Row), so result rows order is
non-deterministic. The existing sortRows implementation only sorts by
metric name and ingestion time, which means rows that differ only by
tenant/account ID still sorted undeterministically.
This change updates `sortRows` to include account\project identifiers in
the comparison, ensuring stable and deterministic ordering for metadata
entries that share the same metric name and timestamp.
First discovered as flaky test:
--- FAIL: TestStorageRead (0.00s)
storage_test.go:337: unexpected rows get result (-want, +got):
[]*metricsmetadata.Row{
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 1,
+ AccountID: 0,
- ProjectID: 1,
+ ProjectID: 0,
Type: 1,
},
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 0,
+ AccountID: 1,
- ProjectID: 0,
+ ProjectID: 1,
Type: 1,
},
}
FAIL
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/actions/runs/19361594138/job/55394642029#step:4:133
This commits adds storage part and cluster RPC methods for metrics metadata.
Key concepts:
* vmstorage persists metadata in-memory only.
* vmstorage evicts metadata records older than 1 hour.
* vmstorage stores only the last value of metadata for time series
metric name.
* vminsert opens an additional TCP connection to the vmstorage for
metadata write requests.
* vmselect doesn't support `limit_per_metric_name`.
This feature is available optional and must be enabled via flag - `-enableMetadata` provided to vminsert/vmsingle.
Fixes github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
vmstorage nodes work perfectly with one CPU core and even with 10% of a single CPU core
if the allocated CPU resources matches their workload.
It is better to recommend allocating the an interger number of CPU cores to vmstorage
in order to achieve an optimal performance, since vmstorage allocates internal resources
according to the available CPU cores. If there is a fractional number of CPU cores,
then the allocation of internal resources may be not so optimal.
Fractional number of CPU cores may also lead to increased latencies and stalls
because some P threads at Go runtime won't be able to run goroutines from their ready queues
in a timely manner becasue of the lack of CPU time. See https://victoriametrics.com/blog/kubernetes-cpu-go-gomaxprocs/
Too big values for the -maxConcurrentRequests command-line flag increase memory usage
and increase CPU overhead for processing incoming requests in most cases.
The only valid reason for increasing the value for -maxConcurrentRequests command-line flag
is when many clients send data to vmagent over very slow network.
Previously, zstd Decoder didn't take in account Request Size limits
applied by VictoriaMetrics components. And in case of incorrectly formed zstd block, VictoriaMetrics
component may allocate extra memory. Which may lead to the OOM errors.
This commit makes ingest endpoints check frame content size and window size headers based on MaxRequest Limits.
For users, if an alerting rule has a misconfigured annotation, it's more
important to deliver the alert when the rule triggers rather than skip
it with templating error logs.
Then users can see the faulty annotation in alert message and fix it.
Note: the previous behavior is retained in replay mode because errors
there should be noticed immediately; hiding them could waste time,
resources and require a re-replay after fixes.
Also the rule's status in the vmalert UI remains unhealthy if templating
failed.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9853
In prometheus ecosystem, a label with an empty value equals no label,
since a query like `test{something=""}` matches all the series without
label `something`.
So for vmalert, preserving empty-value labels in generated alerts or
time series is unnecessary and can cause alert hash mismatches during
[restore](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerts-state-on-restarts).
The empty-value label shouldn't come from datasource response since they
follow the same rule(omit empty-value labels), it may come from
`-external.label` or rule labels, but the empty value could be caused by
occasionally templating failures, which is hard to check there.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
This is a follow-up for the commit 1130adebad .
The EntriesCount, BytesSize and MaxBytesSize metrics must take into account the data
stored in both prev and curr caches, since this data occupies memory and it is expected
that the exposed metrics - vm_cache_entries, vm_cache_size_bytes and vm_cache_size_max_bytes -
take into account all the memory occupied by the corresponding caches.
The GetCalls, SetCalls, Collisions and Corruptions metrics must take into account stats
from the curr cache only, since the corresponding stats for the prev cache is already taken
during the rotation (when moving curr to prev and resetting the previous prev).
The Misses metric must take into account only misses in the prev cache, since these misses
mean that the given entry is missing the both the curr and the prev cache.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
While at it, make sure that the cache mode and cache stats is always read and updated under c.mu lock.
This may help resolving races similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9921
This reverts commit 89fd27c922.
Reason for revert: this commit adds scalability bottleneck in the fast path - Cache.Get() -
in the form of c.getCalls.Add(). This call doesn't scale on systems with big number of CPU cores,
since it needs to update atomically a shared memory from big number of CPU cores.
The Cache.Get() is called per every ingested sample when obtaining TSID by MetricName from the cache
at lib/storage.Storage.get(), so this can be a major bottleneck on systems with many CPU cores.
The solution for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
is to properly track cache requests and misses: cache requests must be taken into account
only at the curr cache, while cache misses must be taken into account only at the prev cache.
This will be implemented in the follow-up commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
This reverts commit 994dadb4d5.
Reason for revert: the introduced metrics have zero practical applicability.
The lib/workingsetcache doesn't need manual tuning in most cases - its' size
is automatically adjusted to the given working set, if the working set is smaller
than the cache size limit set at the cache creation time. The limit just prevents
unbounded cache growth for large working sets.
If the working set exceeds the given limit, then the cache may become inefficient
because of the increased cache miss rate. The introduced metrics do not help determining
the needed cache size.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
If the cache cannot be saved to the given file, this is a fatal error.
It is better to log this fatal error inside Cache.MustSave() and then exit
instead of returning it to the caller. This makes the code more clear at the caller side.
It is expected that the number of TSIDs misses over the last 5 minutes is zero in steady state.
If it is non-zero, then something wrong happens. That's why it is better to use increase() instead of rate() function
for this alert.
This alert is expected after unclean shutdown (OOM, power off, kill -9) of VictoriaMetrics.
It should go away in a few minutes after the restart while VictoriaMetrics deletes metricIDs
for the missing MetricID->TSID entries which were created for the newly registered time series
just before unclean shutdown. It is OK to delete such metricIDs, since the corresponding time series
will be re-registered again. See the commit 20812008a7 .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502
When one goroutine attemps to update the min timestamp under the lock it
could have been updated already by another goroutine with a smaller
timestamp. As a result the goroutine will update the timestamp with a
bigger value.
A simple unit test (included in this commit) demonstrates that.
Additionally, use a simple Mutex instead of RWMutex. RWMutexes only
introduce an unnecessary overhead for operations as simple as retrieving
a value from a map and regular Mutex should be preferred.
Thanks to @valyala for spotting a bug and the advice on RWMutexes.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This commit improves overall performance and stability of chart rendering,
refines time series generation, and fixes incorrect median calculation
in metric series.
JavaScript execution time improved by up to ×6 on large datasets.
**Changes:**
* Reworked `getTimeSeries` - one point per pixel.
* Added legend auto-collapse when >20 items.
* Switched median algorithm to Quickselect (Floyd–Rivest).
* Unified array stats functions (`min`, `max`, `avg`, `median`) into a
single pass.
* Removed unused `last` value from series.
* Renamed `roundToMilliseconds` to `roundToThousandths` and moved to
`utils/math`.
* Replaced `isSupportedDuration` with `parseSupportedDuration`, added
fractional duration support.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9699
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9926
When multiple service discovery configs of the same type exist (e.g.,
`hetzner_sd_config`), vmagent currently behaves as follows:
1. Attempts to request each config.
2. Exits immediately if any config returns an error.
3. Skips the rest configs and falls back to the previous service
discovery result.
The correct behavior—more compatible with Prometheus—should be:
1. Attempt to request each config.
2. Collect all valid results.
3. Use the valid results if there's at least one. otherwise (all
failed), fall back to the previous SD result.
Scrape example:
```yaml
scrape_configs:
- job_name: hetzner-default
hetzner_sd_configs:
- role: "hcloud"
authorization:
credentials: "some_valid_value"
- role: "hcloud"
authorization:
credentials: "some_wrong_value"
```
Expected outcome:
- At least targets from `credentials: "some_valid_value"` should appear
in the service discovery result.
current outcome:
- the error from `credentials: "some_wrong_value"` leads to an **empty**
result.
This issue should affect service discovery which using
`getScrapeWorkGeneric` function:
- `azure_sd_config`
- `consul_sd_config`
- `consulagent_sd_config`
- `digitalocean_sd_config`
- `dns_sd_config`
- `docker_sd_config`
- `dockerswarm_sd_config`
- `ec2_sd_config`
- `eureka_sd_config`
- `gce_sd_config`
- `hetzner_sd_config`
- `http_sd_config`
- `kuma_sd_config`
- `marathon_sd_config`
- `nomad_sd_config`
- `openstack_sd_config`
- `ovhcloud_sd_config`
- `puppetdb_sd_config`
- `vultr_sd_config`
- `yandexcloud_sd_config`
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9375
This is needed in order to detect and prevent cases of improper usage of partitions
while they are closed.
This is a follow-up for the commit 9725ee50ec .
### Describe Your Changes
Follow up on PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839, which
addresses review comment
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839#discussion_r2477729886
Alex:
```
this design decision isn't good, since it will lead to potential security issues over time when we'll forget adding ApplySecretFlags() call after the flag.Parse() call or add it at the wrong place. BTW, we do not call flag.Parse() explicitly - instead envflag.Parse() is called. So it is natural to call ApplySecretFlags() inside this call. Are there restrictions which prevent from doing this? If there are no restrictions, then there is no need in making this function public - it will be called explicitly inside envflag.Parse().
```
There is no changelog entry as there is no change in user-visible
behavior.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This should catch possible errors related to improper release of Table parts.
Fix such an error at TestTableCreateSnapshotAt by properly closing all the initialized
TableSearch instances.
Thanks to @rtm0 for pointing to this issue.
Previously, snappy Decoder didn't take in account Request Size limits
applied by VictoriaMetrics components. And in case of incorrectly formed snappy block, VictoriaMetrics
component may allocate extra memory. Which may lead to the OOM errors.
This commit makes ingest endpoints check block size header based on MaxRequest Limits.
- Standardize all sections to use 'Recommended for:' instead of mixed
'For whom:' and 'Target audience:'
- Fix wording: 'Query evaluation is always local'
Addresses comments in #9919
vmalert tries to spread the moment group starts its evaluation
on `[0..group.interval]` duration. This approach allows to avoid
thundering herd problem when on vmalert start all groups execute their
rules simultaneously. It was introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/724
While for most configs it works great, for groups with big evaluation
intervals (30min, 60min) the first evaluation can be delayed
significantly.
This change introduces a start delay limit via new flag
`--group.maxStartDelay` (5m default).
It limits the `[0..group.interval]` start delay to
`[0..math.min(--group.maxStartDelay, group.interval)]`.
So all groups will start in first 5m or earlier.
The --group.maxStartDelay is ignored if user set `eval_offset`.
The 5m default limitation was picked high to not affect users with
relatively low evaluation intervals.
-----------
Based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9929
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmbackupmanager: enforce newline at the end of CLI result
Previously, vmbackupmanager only printed a response from API which did not include a newline character. That leads to issues with the rendering of the next command when using a shell.
Always append a newline character to avoid breaking shell formatting when using CLI mode.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/victoriametrics/changelog/CHANGELOG.md
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
The load distribution could be uneven when short queries arrive to vmauth while a part of backends are busy
with long-running queries. In this case the major load goes to the backend after a row of busy backend.
Suppose we have four backends - b1, b2, b3 and b4. The first two backends are busy with bigger number
of long-running queries than b3 and b4. Then 75% of short queries will go to b3, while only 25%
of short queries will go to b4.
The new algorithm makes the distribution more even in these cases by storing the next backend
after the chosen backend as candidate for the next query (its' index is stored in the atomicCounter).
Avoid races when updating atomicCounter from concurrently executed queries by using CompareAndSwap() -
if the concurrent query updated it first then the current query won't overwrite it with the outdated value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9712
Data patterns considered:
- Same series, same date
- Same series, different dates
- Different series, same date
- Different series, different dates
To make sure that the pattern condition holds, a new storage instance is
started every benchmark iteration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9912
Previously, vmctl only accepted one label for filtering. Extend this to
allow providing multiple-filters at once. This is useful when migrating
large volumes of data as it allows narrowing down migration scope of
migration for one run so that the source side is not overwhelmed with
migration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9917/
Before, rules that didn't get evaluated yet were showing weird values in
vmalert's UI. It was happening because of
`time.Since(r.LastEvaluation).Seconds()` expression when
`r.LastEvaluation` had 0 value.
With this change, rules that weren't evaluated yet would show `Never` in
Updated column instead.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9924
This commit request revert the commit
d6bbfaf164 for the following reasons:
1. HTTP/2 carries security risks.
2. Most components in the VictoriaMetrics stack do not require HTTP/2
support.
3. While HTTP/2 support was available only as an option in previous
commit, there remains a potential risk of misusing this option and
enabling HTTP/2 inadvertently.
For components (e.g., VictoriaTraces) that require HTTP/2 support, they
should currently build an HTTP server manually with built-in packages,
instead of using `lib/httpserver` in VictoriaMetrics. If the mentioned
issue is resolved in the future and more components need HTTP/2, this
support can be reintroduced into `lib/httpserver`.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9927
It is a cosmetic change: it simplifies function signature by making it a
method of the Group struct.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Rationale: Having query stats logging enabled by default can greatly
help in investigating incidents.
Currently, it is disabled by default, so many users don’t enable it, and
when issues occur there are no stats available.
After discussion with the team, a 5s threshold was agreed upon as a
reasonable default to capture meaningful slow query data without
excessive logging.
This commit adds new RPC protocol for vminsert-vmstorage communication,
it acts in the same way as vmselect-vmstorage RPC.
It's implemented with new handshake hello methods in a backward
compatible way. Server attempts to parse RPC only if client send new
Hello message, while client fallbacks to the old Hello message if server
closes connection.
This change is need for the new metrics metadata forwarded from vminsert
into vmstorage.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
Changes extracted from PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9487
Previously, if a storage started with curr indexDB different from one
stored in nextDayMetricIDs cache file, the cache would still be loaded
into memory possibly affecting the next day prefill.
This is an unlikely case but it is still possible when:
- A programmer makes a mistake in the code and uses something else
instead of idbCurr.generation.
- Downgrading from pt-index to previous version
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This change validates that QueryRange() method for prometheus datasource
receives response with `matrix` data type. It would throw an error
otherwise.
The change is needed to avoid confusions like in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9779.
The fix is not elegant, but it should be simple from code support
perspective. So each API has its own parsing function. Even if some
processing code is repeated.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Currently, the `httpserver` disabled HTTP/2 support by design, because:
```
// Disable http/2, since it doesn't give any advantages for VictoriaMetrics services.
```
As VictoriaLogs and VictoriaTraces rely on `httpserver`, in order to
support gRPC over HTTP/2, an option to support HTTP/2 is required.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9881
Previously all timeseries pushed into aggregators were added
sequentially. It could cause delays on data ingestion and it was not
possible to use all available.
This commit adds concurrency based on available CPU cores.
Also, it adds new generic Buffer and BufferPool into slicesutil.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9878
Previously a misleading random error could be logged for canceled and/or timed out requests to vmauth.
Consistently log the request timeout error for timed out requests.
While at it, do not log errors for requests canceled by the remote client, since such logs aren't actionable
and just pollute error logs generated by vmauth.
Introduce a new flag, which converts only metric names into Prometheus
compatible format. And keeps label names in original form.
It's needed to keep labels in original form, which
is useful for correlation with other telemetry sources, such as logs or
traces.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
AWS SDK does not modify custom http client configuration if it was provided. This leads to
additional configuration such as environment variables being ignored.
Use AWS http client builder instead of custom implementation and
override DialContext to preserve metrics exposed by custom transport.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9858
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Tracking original labels requires storing a copy of labels obtained
from service discovery. It adds extra Garbage Collection pressure and as
a result increased CPU usage.
While dropOriginalLabels has almost no impact at test and small
installations.
Impact grows with a scale. And especially is impactful at Kubernetes
based installations.
In addition, this flag is disabled by default for `k8s-stack` helm
chart, which is our main Kubernetes monitoring solution.
An also, we recommend at vmagent optimisation guide to disable original
labels storing.
This commit changes default value to true and disables tracking of
dropped targets by default. In case of debugging, it could be easily
enabled back by providing `false` value to the flag:
`promscrape.dropOriginalLabels`. It should improve resource usage out of
box by reducing user-experience for minority of users.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9665
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9772
TextField component has ability to show error message and depending on
it's presence text field height changes, which may cause visibility
issues if this field is vertically aligned with some neighbour
components. This PR makes textfield height constant and its input box
horizontally symmetrical
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9693
Currently, it is hard to make sense of progress based on logging as it
requires manual calculation of progress and ETA.
Solve this by:
- making data units humanly readable
- adding an estimation of completion for the operation
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v3.30.7</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.7 - 06 Oct 2025</h2>
<p>No user facing changes.</p>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.7/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.6</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.6 - 02 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.2. <a
href="https://redirect.github.com/github/codeql-action/pull/3168">#3168</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.6/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.5</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.5 - 26 Sep 2025</h2>
<ul>
<li>We fixed a bug that was introduced in <code>3.30.4</code> with
<code>upload-sarif</code> which resulted in files without a
<code>.sarif</code> extension not getting uploaded. <a
href="https://redirect.github.com/github/codeql-action/pull/3160">#3160</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.5/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.4</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.4 - 25 Sep 2025</h2>
<ul>
<li>We have improved the CodeQL Action's ability to validate that the
workflow it is used in does not use different versions of the CodeQL
Action for different workflow steps. Mixing different versions of the
CodeQL Action in the same workflow is unsupported and can lead to
unpredictable results. A warning will now be emitted from the
<code>codeql-action/init</code> step if different versions of the CodeQL
Action are detected in the workflow file. Additionally, an error will
now be thrown by the other CodeQL Action steps if they load a
configuration file that was generated by a different version of the
<code>codeql-action/init</code> step. <a
href="https://redirect.github.com/github/codeql-action/pull/3099">#3099</a>
and <a
href="https://redirect.github.com/github/codeql-action/pull/3100">#3100</a></li>
<li>We added support for reducing the size of dependency caches for Java
analyses, which will reduce cache usage and speed up workflows. This
will be enabled automatically at a later time. <a
href="https://redirect.github.com/github/codeql-action/pull/3107">#3107</a></li>
<li>You can now run the latest CodeQL nightly bundle by passing
<code>tools: nightly</code> to the <code>init</code> action. In general,
the nightly bundle is unstable and we only recommend running it when
directed by GitHub staff. <a
href="https://redirect.github.com/github/codeql-action/pull/3130">#3130</a></li>
<li>Update default CodeQL bundle version to 2.23.1. <a
href="https://redirect.github.com/github/codeql-action/pull/3118">#3118</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.4/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.3</h2>
<h1>CodeQL Action Changelog</h1>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>3.29.4 - 23 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.3 - 21 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.2 - 30 Jun 2025</h2>
<ul>
<li>Experimental: When the <code>quality-queries</code> input for the
<code>init</code> action is provided with an argument, separate
<code>.quality.sarif</code> files are produced and uploaded for each
language with the results of the specified queries. Do not use this in
production as it is part of an internal experiment and subject to change
at any time. <a
href="https://redirect.github.com/github/codeql-action/pull/2935">#2935</a></li>
</ul>
<h2>3.29.1 - 27 Jun 2025</h2>
<ul>
<li>Fix bug in PR analysis where user-provided <code>include</code>
query filter fails to exclude non-included queries. <a
href="https://redirect.github.com/github/codeql-action/pull/2938">#2938</a></li>
<li>Update default CodeQL bundle version to 2.22.1. <a
href="https://redirect.github.com/github/codeql-action/pull/2950">#2950</a></li>
</ul>
<h2>3.29.0 - 11 Jun 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.22.0. <a
href="https://redirect.github.com/github/codeql-action/pull/2925">#2925</a></li>
<li>Bump minimum CodeQL bundle version to 2.16.6. <a
href="https://redirect.github.com/github/codeql-action/pull/2912">#2912</a></li>
</ul>
<h2>3.28.21 - 28 July 2025</h2>
<p>No user facing changes.</p>
<h2>3.28.20 - 21 July 2025</h2>
<ul>
<li>Remove support for combining SARIF files from a single upload for
GHES 3.18, see <a
href="https://github.blog/changelog/2024-05-06-code-scanning-will-stop-combining-runs-from-a-single-upload/">the
changelog post</a>. <a
href="https://redirect.github.com/github/codeql-action/pull/2959">#2959</a></li>
</ul>
<h2>3.28.19 - 03 Jun 2025</h2>
<ul>
<li>The CodeQL Action no longer includes its own copy of the extractor
for the <code>actions</code> language, which is currently in public
preview.
The <code>actions</code> extractor has been included in the CodeQL CLI
since v2.20.6. If your workflow has enabled the <code>actions</code>
language <em>and</em> you have pinned
your <code>tools:</code> property to a specific version of the CodeQL
CLI earlier than v2.20.6, you will need to update to at least CodeQL
v2.20.6 or disable
<code>actions</code> analysis.</li>
<li>Update default CodeQL bundle version to 2.21.4. <a
href="https://redirect.github.com/github/codeql-action/pull/2910">#2910</a></li>
</ul>
<h2>3.28.18 - 16 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.3. <a
href="https://redirect.github.com/github/codeql-action/pull/2893">#2893</a></li>
<li>Skip validating SARIF produced by CodeQL for improved performance.
<a
href="https://redirect.github.com/github/codeql-action/pull/2894">#2894</a></li>
<li>The number of threads and amount of RAM used by CodeQL can now be
set via the <code>CODEQL_THREADS</code> and <code>CODEQL_RAM</code>
runner environment variables. If set, these environment variables
override the <code>threads</code> and <code>ram</code> inputs
respectively. <a
href="https://redirect.github.com/github/codeql-action/pull/2891">#2891</a></li>
</ul>
<h2>3.28.17 - 02 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.2. <a
href="https://redirect.github.com/github/codeql-action/pull/2872">#2872</a></li>
</ul>
<h2>3.28.16 - 23 Apr 2025</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="aac66ec793"><code>aac66ec</code></a>
Remove <code>update-proxy-release</code> workflow</li>
<li><a
href="91a63dc72c"><code>91a63dc</code></a>
Remove <code>undefined</code> values from results of
<code>unsafeEntriesInvariant</code></li>
<li><a
href="d25fa60a90"><code>d25fa60</code></a>
ESLint: Disable <code>no-unused-vars</code> for parameters starting with
<code>_</code></li>
<li><a
href="3adb1ff7b8"><code>3adb1ff</code></a>
Reorder supported tags in descending order</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This reverts commit 772ac8803e.
The reason for revert is that this recommendation should not be strict.
Installations with <= 1 vCPU will continue working efficiently. The load
from reads, writes and background merges will be evenly spread by Go runtime.
cc @Sleuth56 @tiny-pangolin
Before, `req.URL.Redacted` info was present in some error messages and
empty in others. This change uniformly adds it to the errors context.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- fixed ignored `search` query argument in `Notifiers` and `Rules` tabs
- added dropdown state reset, if other filters were updated and selected
state is not a subset of available items
- proxy requests to config.json for a local setup
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
While there, remove excessive relabeling info and point users to Routing section.
The Routing section should explain how to build flexible processing pipleines.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Reverts 17ca1ba8c4
Reason for reverts are following:
1. The fix relies on release candidates of specific libraries
2. The real fix would be to update Alpine version, which is not released yet
3. It makes the fix partially done, as it would require follow-up in future to
switch from release candidates to stable versions, or to update Alpine version.
4. The fix is not effective, as it doesn't update the base image cached by Docker.
The real fix will be to host&update the base image separately like in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9811.
5. VM binaries aren't vulnerable to mentioned vulnerabilites.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reverts commit ccf97a4143.
reason for revert: this change may break tests, which expect that ServesMetrics.GetMetric() fails
when the given metric doesn't exist in the output.
It is better to add 'TryGetMetric() (float64, bool)' function, which would return '(0, false)'
when the given metric doesn't exist, so the caller could decide what to do next.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Add ad-hoc filters to query stats and operator dashboards.
These filters are useful for exploring non-uniform metrics sets
without distinct job/instance filters.
The previous text didn't contain links to vmagent's capabilities.
Instead, it contained misleading multitenancy-mode link that doesn't
seem to be related to the subject.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, `GetMetric` do `t.Fatalf` immediately when the target metric
not exist in `/metrics` page.
However, some metrics may start to appear after the process has been
running for a while. `t.Fatalf` invalidates the retry mechanism of
assertions, if the metric is not found the first time, the test case
will terminate.
This commit request changes `t.Fatalf` to `t.Logf` (instead of `t.Errorf`,
because error output may be considered a test case failure in some
scenarios).
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Follow-up for cea9505bab
fastcache.Cache allocates off-heap memory, which must be explicitly
returned back to the pool with Reset method call.
After changed made at commit above, during cache transit from whole to
split mode, it's possible that current cache is referenced by Cache.Get
or Cache.Call atomic pointers. It leads to potential memory leaks, since
we don't have any memory synchronization for atomic.Pointer.Store calls.
This commit adds `Finalizer` to the `fastcache.Cache` instances.
It properly releases memory, when cache is no reachable.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
Previously, cache state transition from split into whole could left
cache into broken state, if Reset cache method was called in switching
mode.
Also, cache Reset didn't start background workers and didn't change
cache size.
This commit properly check mode during cache transition. In addition,
it no longer stops background workers after whole mode transition and
always start workers during start-up.
Access to the prev, curr and mode Cache fields are properly locked
in order to mitigate possible race conditions.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
It seems like go compilator skipped computations and allocations for samples
as they weren't used afterwards. Sinking results into global variable removes
this optimizations and benchmark starts showing allocations within `pushSamples` fn.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Make SearchTSIDs look similar to SearchMetricNames, i.e. search for metricIDs within the method
- Make the corresponding corrupted index test look similar to one for metric names search
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Consistently use the `v0.0.0-YYYYMMDDHHMMSS-commit_hash` reference for
the internal deps such as `github.com/VictoriaMetrics/VictoriaMetrics`
dependency, since it allows referring any commit without waiting for the
release tag.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
add ability to limit available in datePicker dates using `minDate` and
`maxDate` parameters. all dates before `minDate` and after `maxDate`
cannot be picked. lower and upper bounds can be set independently.
This `minDate` and `maxDate` parameters aren't set by default in vmui.
The datepicker component with these params is re-used elsewhere.
The change also explciitly mentions `out-of-order` phrase, as it is commonly
used in Prometheus ecosystem.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Searching metricName by metricID happens many times during a single API
call. This requires getting the current set of idbs before those calls
happen. Which is fine but requires propagating idbs across the code
base. This is also fine in case of OSS version as it is used in Search
only.
Propagating idbs across the code base becomes a problem in Enterprise
version as it is used in at least 3 places. As a result it becomes very
difficult to merge things from OSS to Ent.
Localizing the all the dependencies in one searchMetricName type and
reusing this type everywhere should make things simpler.
Related enterprise changes:
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/compare/search-metric-name-ent?expand=1
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9756
A small refactoring that reduces Search dependency on Storage:
- Move searchTSIDs() from Search to Storage because this method does not
depend on anything Search-specific but does depend on Storage.
- Use metricsTracker instead of storage.metricTracker.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9754
Benchmarking storage search api requires taking into account many
parameters, such as:
- data configuration: how many series, deleted series, search time range
- where the index data recides: prev and or indexDB
- which search operation to measure
While adding a new benchmark use case involves a lot boilerplate code.
This pr implements a framework for testing storage search ops that can
be relatively easily extended. This come in expecially handy when adding
new cases for parition index.
The current set of params will result of a lot of benchmarks to be run
which most probably does not make sense because:
- it will take a lot of time and
- the output data is hard to compare manually.
However, these benchmarks are very useful when only small set of params
is of interest. For example, if I want to compare the search of 100k
metric names when the index data resides in prevOnly, currOnly or
prevAndCurr indexDBs. This would translate in the following cmd:
```shell
go test ./lib/storage --loggerLevel=ERROR -run=^$ -bench=^BenchmarkSearch/MetricNames/.*/VariableSeries/100000$
```
Why this change:
- I often need to run benchmarks with configs that I did not have
before, requires either modifying the existing one or writing a new one.
It is easy to get lost and make benchmark non-comparable
- I need some way to make legacy and pt index benchmarks comparable
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
- state that it is unsafe to use lifecycle rules and describe the reason
- update formatting according latest changes in docs
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
- Rename copyStream to copyStreamToClient in order to make it more clear
that the stream must be copied from backend to client.
- Make sure that the client implements net/http.Flusher interface.
It is a programming error (BUG) if the client passed to copyStreamToClient
doesn't implement net/http.Flusher interface.
- Do not write zero-length data to the backend.
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/667
1beb629b removed logic which was used in order to keep full backup
location path in the restore mark file. Because of this, backups created
with a shortname (e.g. `vmbackupmanager restore create
daily/2025-09-12`) will fail as backup location is not prepended.
Fix that by properly constructing full backup name from parsed canonical
values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, vmagent always set enable.auto.commit to false and manually
commited messages. It adds additional pressure to the kafka brokers and could slow down
data consumption.
This commit allows vmagent to skip manual commit and use auto-commit
based on provided configuration. Which may improve message read throughput.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/931
### Describe Your Changes
This PR introduces a `make docs-update-flags` command that updates flags
in the documentation using the actual binaries compiled from the latest
`enterprise-single-node` and `enterprise-cluster` branches (hardcoded
for now). The command also normalizes the output format.
It can be run from any branch. All work happens inside temporary
directories under /tmp. The script checks out the required branch,
builds the binaries, and updates the documentation. The current Git
repository is not touched.
The command adjusts default values to more meaningful ones, such as
changing `-maxConcurrentInserts` (default 20) to (default
2*cgroup.AvailableCPUs()).
Currently the logic is implemented only for vminsert, vmstorage,
vmselect, vmagent, vmalert, and victoria-metrics (aka single).
The goal is to make it easy to keep documentation synchronized with real
binaries
_**Note:** Please ignore xxx_flags.md files for now. Review flags in
`README.md` and `Cluster-VictoriaMetrics.md`, and `vmagent.md`,
`vmalert.md` only. Once we agree on the changes in those files, I'll
replace the flags with the `{{% content "xxx_flags.md" %}}`._
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
* stress on requirement to have empty destination folder for copying;
* remove extra verbosity from docs;
* remove list vmctl migration options as they became unsynced. Instead of syncing,
refer to the vmctl docs;
* fix typos.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The application version can be then displayed in the vmui. Showing the
application version in vmui should make it easier to determine currently
used VM version (at least vmselect version).
------------
@Loori-R it would be could to add the app version in vmui in a follow-up
PR or by pushing a commit to this branch.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
`Table` component:
- add `format` property for table column, which allows to apply custom
formatting depending on column type
- add `rowClasses` table property, that allows to pass function that
allows to customize row css class depending on row value
- add `rowAction` table property, that allows to execute action while
clicking on table row
`Popper` component:
- add `classes` to specify additional CSS classes for popper to
differentiate from other poppers, since it's mounted to a DOM root
`Switch` component:
- use gap instead of left-margin
`DateTimeInput` component:
- add `dateOnly` property to allow accepting only date in the input
additional fixes:
- fix TopQuery header fields alignment
<img width="1279" height="125" alt="image"
src="https://github.com/user-attachments/assets/08ad4dbc-19e5-47f5-9ccd-a9fb222335a4"
/>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
Set rateEnabled to false for probe_success in VMUI
Fix issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9655
Problem:
probe_success is incorrectly initialized with rateEnabled = true because
the regex detecting counters (/_sum?|_total?|_count?/) matches partial
strings like _su. This causes probe_success (a gauge) to be treated as a
counter, producing slightly misleading graphs. For example, when
rateEnabled is set to true, probe_success often shows as 0 in VMUI when
the probe is actually succeding.
It is not intuative for users to have to disable rateEnabled manually
just to get the correct value for probe_success in VMUI.
Solution:
Update the regex to strictly match suffixes:
`/_sum$|_total$|_count$/`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: William Wren <william.wren@ericsson.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This is needed for VictoriaLogs, which allows limiting query results with the given set of extra filters
specified via extra_filters query arg. The request url can contain multiple extra_filters query args -
they are all applied with AND logic to the query. See https://docs.victoriametrics.com/victorialogs/querying/#extra-filters
The merge_query_args option at vmauth allows merging the extra_filters provided by the client
(such as Grafana plugin for VictoriaLogs or built-in web UI) with the extra_filters specified in the backend
url at vmauth config.
This is needed for https://github.com/VictoriaMetrics/VictoriaLogs/issues/106
`workingsetcache` is built on top of two
[fastcache](https://github.com/VictoriaMetrics/fastcache) instances
(curr and prev) that are rotated periodically (configurable via
`-cacheExpireDuration` flag). During the rotation curr becomes prev and
prev is discarded, new curr is an empty. If an entry is not found in
curr then the prev cache is checked, and if the entry is found there it
is copied to curr.
`workingsetcache` also exports metrics, such as `EntriesCount`,
`GetCalls`, `SetCalls`, and `Misses` counts. These metrics are currently
implemented as the sum of the same metrics in prev and curr `fastcache`
instances. Given to rotation logic, these counts can be incorrect:
1. `EntriesCount`. It is the sum of prev and curr entry counts. If an
entry is not found in curr and found in prev (and therefore is copied
from prev to curr) the resulting entry count will be incorrect, i.e. it
will count copied entries two times.
2. `GetCalls`. It is the sum of prev and curr get calls. If an entry is
not found in curr the logic will attempt to retrieve it from prev, which
will result in double counting. While it is actually one get call to
`workingsetcache`.
3. `SetCalls`. It is the sum of prev and curr get calls. If an entry is
not found in curr but found in prev it will be copied to curr resulting
in a set call to curr. While from the `workingsetcache` perspective
there hasn't been any set operation at all.
4. `Misses`. It is the sum of prev and curr misses. If an etry is not
found in curr, it is recorded as a miss. If it is then found in prev,
the entry is returned to the caller, but that cache miss remains. If it
is not found in prev, then there will be 2 misses for 1
`worksingsetcache` get call.
This PR introduces `GetCalls`, `SetCalls`, and `Misses` counts at the
`workingsetcache` level in order to count the calls correctly. It also
excludes duplicates from `EntriesCount`.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
### Describe Your Changes
Implemented the script that generates graphs using `gnuplot`.
Those graphs show the write speed to the db.
How to use it:
1. From the root run `make tsbs`;
2. The file will be generated automatically
`/tmp/tsbs-load-100000-2025-07-22T00:00:00Z-2025-07-23T00:00:00Z-80s.csv`
4. From the root run `make tsbs-plot-load` and observe the result
5. If you have two files with the `tsbs_load_victoriametrics` output,
just define the second in the
`TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/tmp/tsbs-load-10
0000-2025-07-22T01:00:00Z-2025-07-23T01:00:00Z-80s.csv
`
To plot the measurements from some other benchmark, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file.csv`
To plot the measurements from two benchmarks, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file1.csv
TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/path/to/file2.csv`
This command should generate a graph like described in the picture
<img width="638" height="578" alt="Screenshot 2025-07-25 at 15 35 42"
src="https://github.com/user-attachments/assets/900b05ab-0b98-4f7f-8f2c-18d28ad2eab6"
/>
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
This helps to improve readability of changes, so users
can see more important changes first, and see changes related
to the same component one after another.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Previously mock storage `net.Listen("tcp", …)` could succeed even if
another process was bound to the same port, due to dual-stack behavior
(`[::]:port` vs `0.0.0.0:port`). That lead to strange test results that
hard to bound to port misuse. Tests queried not mock server but whatever
was running on that port.
Switched to `"tcp4"` to ensure conflicts are detected correctly.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
As there are quite a few files, and each file might have multiple
changes and to make it easily to review, i limited the PR to 5 files at
a time.
I suggest you take a look at markdownlint and add it as part of your CI,
similar to
https://github.com/MicrosoftDocs/PowerShell-Docs/blob/main/.markdownlint.yaml
And while at it, take a look at cspell and how its used in thier repo
and replace the python one you have in your current implementation -
might open a PR with it after all the fixes PRs).
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This allows performing a single MustFsyncPath() for the parent directory after multiple calls to these functions.
This clarifies code paths, which call these functions, and makes them more maintainable.
This also removes a redundant fsync() call for the parent directory when creating a file-based part.
Previously the first fsync() was indirectly called when the directory was created via MustMkdirFailIfExist()
and the second fsync() was called via MustSyncPathAndParentDir() after all the data is written to the part.
The fs.MustWriteSync() already fsyncs the created file, so there is no need in additional fsync() call.
While at it, add missing fsync for the parent directory after creating a directory for persistent queue.
The source file contents should be already fsynced to disk before creating a hard link,
so there is no sense in calling fsync() on the created hard link.
This commit ensures that the -search.maxQueryLen flag applies to Graphite
queries, matching the behavior already present for Prometheus queries.
Previously, Graphite queries could bypass this limit, creating an
inconsistency and a potential vector for resource exhaustion.
Key changes:
Added getMaxQueryLen() to access the global query length limit.
Enforced query length validation in execExpr() for Graphite queries.
Added comprehensive tests for the new validation logic and edge cases.
Error messages are consistent with Prometheus query validation.
The default limit is 16KB (configurable via -search.maxQueryLen).
Setting the limit to 0 disables validation.
This change closes the gap where Graphite queries could exceed
configured length limits, providing consistent protection against
excessively long queries across both query APIs.
Follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9534
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9600
The vm_deleted_metrics_total metric value represents the number of
metricIDs stored in deletedMetricIDs cache. This cache lives at the
storage level and stores the deleted metrics from both prev and curr
idbs. However, the metric is populated at the idb level. Since there are
always 2 idbs (prev and curr), the value is populated twice. Hence the
doubled value of the metric.
The fix is to populate the metric value at the storage level.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
- load and parse static`/vmui/config.json`, modify it according to
runtime values and use it as a replacement for static config.json
- remove using `/flags` endpoint for checking features, that should be
enabled on VMUI
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9635
`router.home` represents `/` path, which is the same for all UI apps,
but content and title for root path differs depending on application
type. added `getDefaultOptions` function, which returns proper home
route configuration depending on application type, which allows to
remove renamings in respective layouts
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9641
The commit
25cd5637bc
introduced the `-enableMetadata` flag and the
`promscrape.IsMetadataEnabled()` function, which is now used in multiple
places, including the `app/vminsert/prometheusimport` [request
handler](b24b76ff08/app/vminsert/prometheusimport/request_handler.go (L36)).
Because of the use of `promscrape` package vminsert registered all
`-promscrape.*` service discovery flags, which were not relevant for
`vminsert`.
This change moves the metadata flag logic into a dedicated package,
preventing vminsert from unintentionally loading unrelated promscrape
flags.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9631
This is an attempt to adjust image styles to GitHub themes, because
existing images with transparent backround become unreadable on dark theme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This is an attempt to adjust image styles to GitHub themes, because
existing images with transparent backround become unreadable on dark theme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When having a `match` of `__name__` key alone for labels api, it's going
to hit max series limit in case of high cardinality metric name.
Instead, we can skip looking by `metricIDs` and fallback to inverted
index scan with a `composite key` since we only have some `__name__` and
a label name.
Common requests for optimisations are:
1) /api/v1/labels?match=up or /api/v1/labels?extra_filters=up
2) /api/v1/label/job/values?match=up or /api/v1/labels?extra_filters =up
It's widely used by grafana variables.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9489
a.Subtract(b) perfomance degrades as b becomes bigger than a. For
example if len(b2) == 10xlen(b1) then time(a.Subtract(b2)) == 10x
time(a.Subtract(b1)).
A quick fix is to iterate over a elements in len(b) > len(a). Iterating
over a's elements and at the same time deleting should be safe since no
elements are actually deleted (i.e. memory freed, etc). Deletion here
means setting a corresponding bit from 1 to 0.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
### Describe Your Changes
Add the support of all standard TSDB query types that can be executed
against VictoriaMetrics. `double-groupby-all` is commented out as it
attempts to retrieve all 1B samples and fails. While this can be fixed
by setting the `-search.maxSamplesPerQuery` this query is left disabled
anyway because it will consume way too much memory and cpu time.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
New benchmarks for storage search (data and index):
- Use the same dataset that accounts for prev and curr indexDBs and
deleted series
- The code is more structured
- Account for various numbers of series in response including higher
numbers (>10k) as this appears to be a quite common use case.
These bechmarks were used for investigating #9602 performance issue and
helped discover that prefetching metric names needed to be restored
#9619.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-08-27 11:19:29 +02:00
4203 changed files with 394386 additions and 245547 deletions
VictoriaMetrics is a fast, cost-saving, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.
## Folder Structure
-`/app`: Contains the compilable binaries.
-`/lib`: Contains the golang reusable libraries
-`/docs/victoriametrics`: Contains documentation for the project.
-`/apptest/tests`: Contains integration tests.
## Libraries and Frameworks
- Backend: Golang, no framework. Use third-party libraries sparingly.
- Frontend: React.
## Code review guidelines
Ensure the feature or bugfix includes a changelog entry in /docs/victoriametrics/changelog/CHANGELOG.md.
Verify the entry is under the ## tip section and matches the structure and style of existing entries.
Chore-only changes may be omitted from the changelog.
VictoriaMetrics is a fast, cost-saving, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.
VictoriaMetrics is a fast, cost-effective, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.
Here are some resources and information about VictoriaMetrics:
-Deployment types: [Single-node version](https://docs.victoriametrics.com/), [Cluster version](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/), and [Enterprise version](https://docs.victoriametrics.com/victoriametrics/enterprise/)
- Changelog: [CHANGELOG](https://docs.victoriametrics.com/victoriametrics/changelog/), and [How to upgrade](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-upgrade-victoriametrics)
-**Community**: [Slack](https://slack.victoriametrics.com/) (join via [Slack Inviter](https://slack.victoriametrics.com/)), [X (Twitter)](https://x.com/VictoriaMetrics), [YouTube](https://www.youtube.com/@VictoriaMetrics). See full list [here](https://docs.victoriametrics.com/victoriametrics/#community-and-contributions).
- **Changelog**: Project evolves fast - check the [CHANGELOG](https://docs.victoriametrics.com/victoriametrics/changelog/), and [How to upgrade](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-upgrade-victoriametrics).
- **Enterprise support:** [Contact us](mailto:info@victoriametrics.com) for commercial support with additional [enterprise features](https://docs.victoriametrics.com/victoriametrics/enterprise/).
- **Enterprise releases:** Enterprise and [long-term support releases (LTS)](https://docs.victoriametrics.com/victoriametrics/lts-releases/) are publicly available and can be evaluated for free
using a [free trial license](https://victoriametrics.com/products/enterprise/trial/).
- **Security:** we achieved [security certifications](https://victoriametrics.com/security/) for Database Software Development and Software-Based Monitoring Services.
Yes, we open-source both the single-node VictoriaMetrics and the cluster version.
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol=flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey=flagutil.NewPassword("configAuthKey","Authorization key for accessing /config page. It must be passed via authKey query arg. It overrides -httpAuth.*")
configAuthKey=flagutil.NewPassword("configAuthKey","Authorization key for accessing /config and /remotewrite-.*-config pages. It must be passed via authKey query arg. It overrides -httpAuth.*")
reloadAuthKey=flagutil.NewPassword("reloadAuthKey","Auth key for /-/reload http endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*")
dryRun=flag.Bool("dryRun",false,"Whether to check config files without running vmagent. The following files are checked: "+
fmt.Fprintf(w,"See docs at <a href='https://docs.victoriametrics.com/victoriametrics/vmagent/'>https://docs.victoriametrics.com/victoriametrics/vmagent/</a></br>")
unparsedLabelsGlobal=flagutil.NewArrayString("remoteWrite.label","Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url."+
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
unparsedLabelsGlobal=flagutil.NewArrayString("remoteWrite.label","Optional label in the form 'name=value' to add to all the metrics before sending them to all -remoteWrite.url.")
relabelConfigPathGlobal=flag.String("remoteWrite.relabelConfig","","Optional path to file with relabeling configs, which are applied "+
"to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. "+
"The path can point either to local file or to http url. "+
@@ -32,9 +34,12 @@ var (
"See https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels")
"See also -remoteWrite.maxDiskUsagePerURL and -remoteWrite.disableOnDiskQueue")
keepDanglingQueues=flag.Bool("remoteWrite.keepDanglingQueues",false,"Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
"Useful when -remoteWrite.url is changed temporarily and persistent queue files will be needed later on.")
queues=flag.Int("remoteWrite.queues",cgroup.AvailableCPUs()*2,"The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
queues=flagutil.NewArrayInt("remoteWrite.queues",cgroup.AvailableCPUs()*2,"The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. "+
"Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage")
showRemoteWriteURL=flag.Bool("remoteWrite.showURL",false,"Whether to show -remoteWrite.url in the exported metrics. "+
@@ -79,10 +84,14 @@ var (
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries=flag.Int("remoteWrite.maxHourlySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxDailySeries=flag.Int("remoteWrite.maxDailySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxHourlySeries=flag.Int64("remoteWrite.maxHourlySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxDailySeries=flag.Int64("remoteWrite.maxDailySeries",0,"The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. "+
fmt.Sprintf("Setting this flag to '-1' sets limit to maximum possible value (%d) which is useful in order to enable series tracking without enforcing limits. ",math.MaxInt32)+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#cardinality-limiter")
maxIngestionRate=flag.Int("maxIngestionRate",0,"The maximum number of samples vmagent can receive per second. Data ingestion is paused when the limit is exceeded. "+
"By default there are no limits on samples ingestion rate. See also -remoteWrite.rateLimit")
@@ -91,6 +100,8 @@ var (
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#disabling-on-disk-persistence . See also -remoteWrite.dropSamplesOnOverload")
dropSamplesOnOverload=flag.Bool("remoteWrite.dropSamplesOnOverload",false,"Whether to drop samples when -remoteWrite.disableOnDiskQueue is set and if the samples "+
"cannot be pushed into the configured -remoteWrite.url systems in a timely manner. See https://docs.victoriametrics.com/victoriametrics/vmagent/#disabling-on-disk-persistence")
disableMetadataPerURL=flagutil.NewArrayBool("remoteWrite.disableMetadata","Whether to disable sending metadata to the corresponding -remoteWrite.url. "+
"By default, metadata sending is controlled by the global -enableMetadata flag")
)
var(
@@ -156,8 +167,8 @@ func Init() {
iflen(*remoteWriteURLs)==0{
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
streamAggrGlobalConfig=flag.String("streamAggr.config","","Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -streamAggr.keepInput, -streamAggr.dropInput and -streamAggr.dedupInterval")
streamAggrGlobalKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep all the input samples after the aggregation "+
"with -streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep input samples that match any rule in "+
"-streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop input samples that not matching any rule in "+
"-streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDedupInterval=flag.Duration("streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval on "+
"aggregator before optional aggregation with -streamAggr.config . "+
"See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
@@ -43,11 +43,11 @@ var (
streamAggrConfig=flagutil.NewArrayString("remoteWrite.streamAggr.config","Optional path to file with stream aggregation config for the corresponding -remoteWrite.url. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -remoteWrite.streamAggr.keepInput, -remoteWrite.streamAggr.dropInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrDropInput=flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput","Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrDropInput=flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput","Whether to drop input samples that not matching any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrKeepInput=flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput","Whether to keep all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrKeepInput=flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput","Whether to keep input samples that match any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDedupInterval=flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval before optional aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
returnfmt.Errorf("interval shouldn't be lower than 0")
}
ifg.EvalOffset.Duration()<0{
returnfmt.Errorf("eval_offset shouldn't be lower than 0")
}
// if `eval_offset` is set, interval won't use global evaluationInterval flag and must bigger than offset.
ifg.EvalOffset.Duration()>g.Interval.Duration(){
returnfmt.Errorf("eval_offset should be smaller than interval; now eval_offset: %v, interval: %v",g.EvalOffset.Duration(),g.Interval.Duration())
// if `eval_offset` is set, the group interval must be specified explicitly(instead of inherited from global evaluationInterval flag) and must bigger than offset.
returnfmt.Errorf("the abs value of eval_offset should be smaller than interval; now eval_offset: %v, interval: %v",g.EvalOffset.Duration(),g.Interval.Duration())
}
ifg.EvalOffset!=nil&&g.EvalDelay!=nil{
returnfmt.Errorf("eval_offset cannot be used with eval_delay")
}
ifg.Limit<0{
returnfmt.Errorf("invalid limit %d, shouldn't be less than 0",g.Limit)
ifg.Limit!=nil&&*g.Limit<0{
returnfmt.Errorf("invalid limit %d, shouldn't be less than 0",*g.Limit)
}
ifg.Concurrency<0{
returnfmt.Errorf("invalid concurrency %d, shouldn't be less than 0",g.Concurrency)
@@ -57,7 +56,7 @@ absolute path to all .tpl files in root.
-rule.templates="dir/**/*.tpl". Includes all the .tpl files in "dir" subfolders recursively.
`)
configCheckInterval=flag.Duration("configCheckInterval",0,"Interval for checking for changes in '-rule' or '-notifier.config' files. "+
configCheckInterval=flag.Duration("configCheckInterval",0,"Interval for checking for changes in '-rule', '-rule.templates' and '-notifier.config' files. "+
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","Address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
@@ -77,15 +76,12 @@ absolute path to all .tpl files in root.
`Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
externalLabels=flagutil.NewArrayString("external.label","Optional label in the form 'Name=value' to add to all generated recording rules and alerts. "+
"In case of conflicts, original labels are kept with prefix `exported_`.")
"In case of conflicts, original labels are kept with prefix 'exported_'.")
dryRun=flag.Bool("dryRun",false,"Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.")
)
var(
alertURLGeneratorFnnotifier.AlertURLGenerator
extURL*url.URL
)
varextURL*url.URL
funcmain(){
// Write flags and help message to stdout, since it is easier to grep or pipe.
// targets with same address but different alert_relabel_configs are still considered duplicates since it's mostly due to misconfiguration and could cause duplicated notifications.
if_,ok:=duplicates[u];ok{
if!*suppressDuplicateTargetErrors{
logger.Errorf("skipping duplicate target with identical address %q; "+
"make sure service discovery and relabeling is set up properly; "+
// sentRows and sentBytes are historical counters that can now be replaced by flushedRows and flushedBytes histograms. They may be deprecated in the future after the new histograms have been adopted for some time.
"summary":`{{$labels.__name__}}: Too high connection number for "{{$labels.instance}}"`,
"summary":`{{$labels.__name__}}: Too high connection number for "{{$labels.instance}}".{{if$isPartial}} WARNING: Partial response detected - this alert may be incomplete. Please verify the results manually.{{end}}`,
"description":`{{$labels.alertname}}: It is {{$value}} connections for "{{$labels.instance}}"`,
"summary":`Alert "{{$labels.alertname}}({{$labels.alertgroup}})" for instance {{$labels.instance}}`,
"summary":`Alert "{{$labels.alertname}}({{$labels.alertgroup}})" for instance {{$labels.instance}}.{{if$isPartial}} WARNING: Partial response detected - this alert may be incomplete. Please verify the results manually.{{end}}`,
"summary":`Alert "originAlertname(originGroupname)" for instance foo`,
"summary":`Alert "originAlertname(originGroupname)" for instance foo. WARNING: Partial response detected - this alert may be incomplete. Please verify the results manually.`,
"invalid_label":`error evaluating template: template: :1:298: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
"invalid_label":`error evaluating template: template: :1:298: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
ruleResultsLimit=flag.Int("rule.resultsLimit",0,"Limits the number of alerts or recording results a single rule can produce. "+
"Can be overridden by the limit option under group if specified. "+
"If exceeded, the rule will be marked with an error and all its results will be discarded. "+
"0 means no limit.")
ruleUpdateEntriesLimit=flag.Int("rule.updateEntriesLimit",20,"Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
resendDelay=flag.Duration("rule.resendDelay",0,"MiniMum amount of time to wait before resending an alert to notifier.")
maxResolveDuration=flag.Duration("rule.maxResolveDuration",0,"Limits the maxiMum duration for automatic alert expiration, "+
resendDelay=flag.Duration("rule.resendDelay",0,"Minimum amount of time to wait before resending an alert to notifier.")
maxResolveDuration=flag.Duration("rule.maxResolveDuration",0,"Limits the maximum duration for automatic alert expiration, "+
"which by default is 4 times evaluationInterval of the parent group")
evalDelay=flag.Duration("rule.evalDelay",30*time.Second,"Adjustment of the 'time' parameter for rule evaluation requests to compensate intentional data delay from the datasource. "+
"Normally, should be equal to '-search.latencyOffset' (cmd-line flag configured for VictoriaMetrics single-node or vmselect). "+
@@ -36,6 +40,8 @@ var (
disableAlertGroupLabel=flag.Bool("disableAlertgroupLabel",false,"Whether to disable adding group's Name as label to generated alerts and time series.")
remoteReadLookBack=flag.Duration("remoteRead.lookback",time.Hour,"Lookback defines how far to look into past for alerts timeseries. "+
"For example, if lookback=1h then range from now() to now()-1h will be scanned.")
maxStartDelay=flag.Duration("group.maxStartDelay",5*time.Minute,"Defines the max delay before starting the group evaluation. Group's start is artificially delayed for random duration on interval"+
" [0..min(--group.maxStartDelay, group.interval)]. This helps smoothing out the load on the configured datasource, so evaluations aren't executed too close to each other.")
{% if g.Unhealthy > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d g.Unhealthy %}</span> {% endif %}
{% if g.NoMatch > 0 %}<span class="badge bg-warning" title="Number of rules with status NoMatch">{%d g.NoMatch %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules with status Ok">{%d g.Healthy %}</span>
{% if g.States["unhealthy"] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d g.States["unhealthy"] %}</span> {% endif %}
{% if g.States["nomatch"] > 0 %}<span class="badge bg-warning" title="Number of rules with status NoMatch">{%d g.States["nomatch"] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules with status Ok">{%d g.States["ok"] %}</span>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" title="The time when the rule was executed">Updated at</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="The time used in execution query request">Execution timestamp</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>
</thead>
@@ -649,8 +661,8 @@
<span class="badge bg-warning text-dark" title="This firing state is kept because of `keep_firing_for`">stabilizing</span>
{% endfunc %}
{% func seriesFetchedWarn(prefix string, r apiRule) %}
{% if isNoMatch(r) %}
{% func seriesFetchedWarn(prefix string, r *rule.ApiRule) %}
{% if r.IsNoMatch() %}
<svg
data-bs-toggle="tooltip"
title="No match! This rule's last evaluation hasn't selected any time series from the datasource.
"See https://docs.victoriametrics.com/victoriametrics/vmauth/#load-balancing for details")
defaultLoadBalancingPolicy=flag.String("loadBalancingPolicy","least_loaded","The default load balancing policy to use for backend urls specified inside url_prefix section. "+
"Supported policies: least_loaded, first_available. See https://docs.victoriametrics.com/victoriametrics/vmauth/#load-balancing")
defaultMergeQueryArgs=flagutil.NewArrayString("mergeQueryArgs","An optional list of client query arg names, which must be merged with args at backend urls. "+
"The rest of client query args are replaced by the corresponding query args from backend urls for security reasons; "+
"see https://docs.victoriametrics.com/victoriametrics/vmauth/#query-args-handling")
discoverBackendIPsGlobal=flag.Bool("discoverBackendIPs",false,"Whether to discover backend IPs via periodic DNS queries to hostnames specified in url_prefix. "+
"This may be useful when url_prefix points to a hostname with dynamically scaled instances behind it. See https://docs.victoriametrics.com/victoriametrics/vmauth/#discovering-backend-ips")
discoverBackendIPsInterval=flag.Duration("discoverBackendIPsInterval",10*time.Second,"The interval for re-discovering backend IPs if -discoverBackendIPs command-line flag is set. "+
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.