Some Influx clients (such as the nimon monitoring client) add excess whitespace to the Influx line and do not set a
timestamp. The Influx line protocol requires whitespace before the timestamp only when the timestamp is set, so trailing whitespace without a timestamp confuses the parser.
This commit adds a check for the omitted timestamp and a test case for it.
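A minimal sketch of the idea, assuming a simplified line format without escaping or quoted field values (the real parser must handle both). The `splitTimestamp` helper is hypothetical and is not the function changed by this commit; it only shows that trailing whitespace must not be treated as a separator before a timestamp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitTimestamp is a hypothetical helper. It separates the optional trailing
// timestamp from the rest of an Influx line. strings.Fields drops leading,
// trailing and repeated whitespace, so "a b   " yields two fields, not three.
func splitTimestamp(line string) (rest string, ts int64, hasTS bool, err error) {
	fields := strings.Fields(line)
	switch len(fields) {
	case 2:
		// Measurement+tags and fields only: the timestamp is omitted.
		return fields[0] + " " + fields[1], 0, false, nil
	case 3:
		ts, err = strconv.ParseInt(fields[2], 10, 64)
		if err != nil {
			return "", 0, false, fmt.Errorf("cannot parse timestamp %q: %w", fields[2], err)
		}
		return fields[0] + " " + fields[1], ts, true, nil
	default:
		return "", 0, false, fmt.Errorf("unexpected number of fields in %q", line)
	}
}

func main() {
	rest, _, hasTS, err := splitTimestamp("cpu,host=a usage=1   ")
	fmt.Println(rest, hasTS, err)
}
```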
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10049
Previously, proxy vmselect (aka 1st level vmselect) parsed every
MetricBlock received from vmstorage before forwarding it to the top-level vmselect. This required additional CPU and memory, which greatly slowed down queries.
This commit changes the lib/vmselectapi iterator API: instead of a MetricBlock, it returns the encoded MetricBlock as a byte slice.
This saves CPU and memory at proxy vmselect by eliminating the need to decode MetricBlock received from storage.
In addition, it adds the following optimizations for proxy vmselect:
* reduces memory allocations by using an iterator pool
* adds a per-storageNode workerItem for the iterator
It also adds an optimization for vmstorage: it no longer performs an extra memory copy of MetricName for MetricBlock.
The vmselect and vmstorage metrics vm_vmselect_metric_rows_read_total and vm_metric_rows_read_total were removed, since they are not used in any dashboards or rules and the new iterator API doesn't support them.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9899
The purpose of this PR is the same as #10000, except `lrucache` is used
for implementing tfss cache.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Revert the change introduced in
483e00ffb9,
since rendering all nested children significantly impacts alerting
tab performance when there are many items.
@Loori-R @arturminchukov, what do you think about additionally using react-virtuoso
for the alerting tab to decrease DOM size?
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
A throttled logger will continue to log messages occasionally with a
suffix indicating how many similar logs were throttled. Using the same
logger for multiple log messages can result in certain logs being
entirely suppressed and invisible in the logs. This updates most of the
loggers used in `appendFromScopeMetrics` to be their own logger so that
"unsupported delta temporality/metric type" logs will be visible for all
metric types. Additionally, `skippedSampleLogger` is only used by
`appendSamplesFromHistogram` so this was moved closer to that function.
Related to #9447
Related to #9498
---------
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Go runtime executes all the goroutines on GOMAXPROCS operating system threads.
Go runtime cannot switch the OS thread to another goroutine if the current goroutine
is stuck in the major pagefault while reading the data from memory-mapped file,
because Go runtime doesn't distinguish between reading from regular memory and reading
from memory-mapped file. So the OS thread becomes stuck while waiting until the OS
reads the data from file at the requested memory address and returns back control to Go application.
In the worst case it is possible that all the GOMAXPROCS threads are stuck in major pagefaults,
so Go runtime pauses executing all the goroutines. This state is possible in environments
with small GOMAXPROCS and high-latency disks such as NFS or small HDD-based disks at AWS.
See https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d for more details.
This commit protects from such stalls by verifying whether the given memory location from memory-mapped file
is already loaded in the OS page cache before reading from that memory.
If the location isn't in the OS page cache, then it falls back to pread() syscall for reading the data from file.
Go runtime allocates extra OS threads for long-running syscalls, so it can continue executing goroutines
across all the GOMAXPROCS threads while reading the data from slow storage via pread() syscall.
This commit uses mincore() syscall for detecting whether the given memory page is available in the OS page cache.
It also caches mincore() results for up to a minute in order to reduce the overhead for the mincore() syscall.
This commit reduces the increase rate for the process_major_pagefaults_total metric by multiple orders of magnitude
on systems with high-latency disks.
Currently, `lrucache.Cache` `SizeBytes()` and `SizeMaxBytes()` return
type is `int`. The cache `Entry.SizeBytes()` also returns `int` value.
Changing the type to `uint64` will allow using `uint64set.Set` as the
cache entry type (see #10072).
Please note that using `uint64` regardless of the CPU architecture
is not entirely correct, because on 32-bit systems the size will never
exceed `2^32`, so `uint64` is more than needed. However, the current
type (`int`) is not correct either, since it is signed and can only
store values up to `2^31` on such systems. Alternatively, all `SizeBytes()`
methods should return `uint`.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
….maxConcurrentRequests error
If `vmstorage` is currently overloaded, it can return a
maxConcurrentRequests error. Currently, `vmselect` immediately fails the whole
request, even if `replicationFactor` is set and other replicas could
respond without errors.
This PR treats such errors as regular errors, not fatal ones.
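A rough sketch of the difference between a fatal and a regular per-node error. It assumes the usual replication rule that a query can still be answered when fewer than `replicationFactor` storage nodes fail. The names `errOverloaded` and `canServe` are hypothetical and do not come from the vmselect code.

```go
package main

import (
	"errors"
	"fmt"
)

// errOverloaded stands in for the maxConcurrentRequests error (hypothetical name).
var errOverloaded = errors.New("too many concurrent requests")

// canServe reports whether a query can still be answered from the remaining
// replicas. Assumption: data is replicated to replicationFactor nodes, so the
// response stays complete while fewer than replicationFactor nodes fail.
func canServe(nodeErrs []error, replicationFactor int) bool {
	failed := 0
	for _, err := range nodeErrs {
		if err != nil {
			// Overload is counted like any other per-node failure
			// instead of aborting the whole request immediately.
			failed++
		}
	}
	return failed < replicationFactor
}

func main() {
	errs := []error{errOverloaded, nil, nil}
	fmt.Println(canServe(errs, 2))
}
```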
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.43.0 to 0.45.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e0068c009"><code>4e0068c</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="e79546e28b"><code>e79546e</code></a>
ssh: curb GSSAPI DoS risk by limiting number of specified OIDs</li>
<li><a
href="f91f7a7c31"><code>f91f7a7</code></a>
ssh/agent: prevent panic on malformed constraint</li>
<li><a
href="2df4153a03"><code>2df4153</code></a>
acme/autocert: let automatic renewal work with short lifetime certs</li>
<li><a
href="bcf6a849ef"><code>bcf6a84</code></a>
acme: pass context to request</li>
<li><a
href="b4f2b62076"><code>b4f2b62</code></a>
ssh: fix error message on unsupported cipher</li>
<li><a
href="79ec3a51fc"><code>79ec3a5</code></a>
ssh: allow to bind to a hostname in remote forwarding</li>
<li><a
href="122a78f140"><code>122a78f</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="c0531f9c34"><code>c0531f9</code></a>
all: eliminate vet diagnostics</li>
<li><a
href="0997000b45"><code>0997000</code></a>
all: fix some comments</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/crypto/compare/v0.43.0...v0.45.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Describe Your Changes
Fixes #9987
Avoid blocking when a connection to `-opentsdbListenAddr` doesn't send
any data. This issue blocked other connections from being handled.
> This bug can be tested with:
> 1. Start VictoriaMetrics single-node with `-opentsdbListenAddr=:4242`.
> 2. Run `telnet 127.0.0.1 4242` without typing any data after the
> connection is established.
> 3. Run (in another terminal, after step 2): `curl -H 'Content-Type:
> application/json' -d
> '{"metric":"x.y.z","value":2222222.34,"tags":{"t1":"v1","t2":"v2"}}'
> http://localhost:4242/api/put`
>
> Before the change:
> - Step 3 was blocked indefinitely.
>
> Expected result after the change:
> - Step 3 is executed.
> - The connection established in step 2 is closed after 5 seconds.
---------
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Clarified the index size note in
docs/guides/understand-your-setup-size/README.md to steer readers toward
the FAQ when indexdb feels oversized, noting typical ratios and
troubleshooting guidance.
- Fix comment
- Re-use dst instead of introducing a new variable.
This change was requested to be moved into a separate PR during the
pt-index (#8134) code review.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Currently, when a partition is created its corresponding parts.json file
is not created right away (see createNewParition()). Its creation is
delayed until the first part files are created on disk (see
swapSrcWithDstParts()). However, the parts.json file is created for a
possibly empty partition when an existing partition is opened (see
mustOpenPartition()) and when a partition snapshot is created (see
MustCreateSnapshotAt()).
I.e. `parts.json` is an important part of a partition, since it is an
artifact that describes the partition contents. It should be created
on partition creation even if its contents are empty.
To be honest, this change is mostly a no-op for the current storage
implementation. It only makes the code consistent, i.e. the parts.json
is created along with the partition.
However, having it created when a partition is created becomes important in
pt-index (#7599, #8134), because pt-index allows having partitions with no
data and therefore without a parts.json file. This is still not a big deal, but the
unit tests start failing.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
dateMetricIDCache does not belong to storage anymore since it has been
moved to indexDB. Instead of moving the cache to index_db.go, move it to a
separate file in order to navigate the code more easily.
No changes have been done to the code or tests.
Follow up for: #9983
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Alexander Frolov <9749087+fxrlv@users.noreply.github.com>
The data structure used for holding the nextDayMetricIDs is too complex
and can be simplified (flattened).
Follow up for: #9983
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The change was introduced in pt-index PR (#8134) and is extracted into a
separate PR.
Currently used in partition_search and partition. If you see more places
like this, please let me know.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Looks like the `dateMetricIDCache` must be per indexDB:
- The use of this cache and `is.hasDateMetricID()` often go in pairs, so
it makes sense to use this cache in that method.
- The same is true for `createPerDayIndexes()`: every time the index
entry is created, a corresponding entry is added to the cache.
- As a result, the generation field is also removed from the cache.
- As a result the generation field is also removed from the cache.
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This is a very frequent question from new users of VictoriaMetrics who migrate from other solutions
with automatic data rebalancing among storage nodes, so it is a good idea to cover it in the docs.
Metrics metadata is loaded from a per-tenant storage map
(perTenantStorage map[uint64]map[string]*Row), so result rows order is
non-deterministic. The existing sortRows implementation only sorts by
metric name and ingestion time, which means rows that differ only by
tenant/account ID are still sorted non-deterministically.
This change updates `sortRows` to include account/project identifiers in
the comparison, ensuring stable and deterministic ordering for metadata
entries that share the same metric name and timestamp.
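A minimal sketch of such a comparison. The `Row` type here is a simplified, hypothetical stand-in for `metricsmetadata.Row`, and the field names are assumptions. The point is only that every field that can differ between rows takes part in the ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a simplified, hypothetical stand-in for metricsmetadata.Row.
type Row struct {
	MetricName string
	Timestamp  int64
	AccountID  uint32
	ProjectID  uint32
}

// sortRows orders rows by metric name, then timestamp, then tenant
// identifiers, so rows that differ only by tenant get a fixed order.
func sortRows(rows []Row) {
	sort.Slice(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		if a.MetricName != b.MetricName {
			return a.MetricName < b.MetricName
		}
		if a.Timestamp != b.Timestamp {
			return a.Timestamp < b.Timestamp
		}
		if a.AccountID != b.AccountID {
			return a.AccountID < b.AccountID
		}
		return a.ProjectID < b.ProjectID
	})
}

func main() {
	rows := []Row{
		{MetricName: "m", Timestamp: 1, AccountID: 1, ProjectID: 1},
		{MetricName: "m", Timestamp: 1, AccountID: 0, ProjectID: 0},
	}
	sortRows(rows)
	fmt.Println(rows)
}
```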
First discovered as flaky test:
--- FAIL: TestStorageRead (0.00s)
storage_test.go:337: unexpected rows get result (-want, +got):
[]*metricsmetadata.Row{
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 1,
+ AccountID: 0,
- ProjectID: 1,
+ ProjectID: 0,
Type: 1,
},
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 0,
+ AccountID: 1,
- ProjectID: 0,
+ ProjectID: 1,
Type: 1,
},
}
FAIL
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/actions/runs/19361594138/job/55394642029#step:4:133
This commit adds the storage part and cluster RPC methods for metrics metadata.
Key concepts:
* vmstorage persists metadata in-memory only.
* vmstorage evicts metadata records older than 1 hour.
* vmstorage stores only the last value of metadata for time series
metric name.
* vminsert opens an additional TCP connection to the vmstorage for
metadata write requests.
* vmselect doesn't support `limit_per_metric_name`.
This feature is optional and must be enabled via the `-enableMetadata` flag provided to vminsert/vmsingle.
Fixes github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
vmstorage nodes work perfectly with one CPU core and even with 10% of a single CPU core
if the allocated CPU resources match their workload.
It is better to recommend allocating an integer number of CPU cores to vmstorage
in order to achieve optimal performance, since vmstorage allocates internal resources
according to the available CPU cores. If there is a fractional number of CPU cores,
then the allocation of internal resources may be suboptimal.
A fractional number of CPU cores may also lead to increased latencies and stalls,
because some P threads in the Go runtime won't be able to run goroutines from their ready queues
in a timely manner because of the lack of CPU time. See https://victoriametrics.com/blog/kubernetes-cpu-go-gomaxprocs/
Too big values for the -maxConcurrentRequests command-line flag increase memory usage
and increase CPU overhead for processing incoming requests in most cases.
The only valid reason for increasing the value for -maxConcurrentRequests command-line flag
is when many clients send data to vmagent over very slow network.
Previously, the zstd decoder didn't take into account the request size limits
applied by VictoriaMetrics components. In case of an incorrectly formed zstd block, a VictoriaMetrics
component could allocate extra memory, which could lead to OOM errors.
This commit makes ingest endpoints check the frame content size and window size headers against the MaxRequest limits.
For users, if an alerting rule has a misconfigured annotation, it's more
important to deliver the alert when the rule triggers rather than skip
it with templating error logs.
Then users can see the faulty annotation in alert message and fix it.
Note: the previous behavior is retained in replay mode because errors
there should be noticed immediately; hiding them could waste time,
resources and require a re-replay after fixes.
Also the rule's status in the vmalert UI remains unhealthy if templating
failed.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9853
In the Prometheus ecosystem, a label with an empty value equals no label,
since a query like `test{something=""}` matches all the series without
label `something`.
So for vmalert, preserving empty-value labels in generated alerts or
time series is unnecessary and can cause alert hash mismatches during
[restore](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerts-state-on-restarts).
Empty-value labels shouldn't come from the datasource response, since datasources
follow the same rule (omit empty-value labels). They may come from
`-external.label` or rule labels, where the empty value could be caused by
occasional templating failures, which are hard to check there.
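A small sketch of dropping empty-value labels before an alert or time series is generated. The `dropEmptyLabels` helper and the plain `map[string]string` label representation are assumptions made for illustration, not vmalert's actual types.

```go
package main

import "fmt"

// dropEmptyLabels is a hypothetical helper. It returns a copy of labels
// without entries whose value is empty, since an empty value equals no label.
func dropEmptyLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if v == "" {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	labels := map[string]string{"alertname": "HighCPU", "team": ""}
	fmt.Println(dropEmptyLabels(labels))
}
```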
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
This is a follow-up for the commit 1130adebad .
The EntriesCount, BytesSize and MaxBytesSize metrics must take into account the data
stored in both prev and curr caches, since this data occupies memory and it is expected
that the exposed metrics - vm_cache_entries, vm_cache_size_bytes and vm_cache_size_max_bytes -
take into account all the memory occupied by the corresponding caches.
The GetCalls, SetCalls, Collisions and Corruptions metrics must take into account stats
from the curr cache only, since the corresponding stats for the prev cache is already taken
during the rotation (when moving curr to prev and resetting the previous prev).
The Misses metric must take into account only misses in the prev cache, since these misses
mean that the given entry is missing in both the curr and the prev cache.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
While at it, make sure that the cache mode and cache stats are always read and updated under the c.mu lock.
This may help resolving races similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9921
This reverts commit 89fd27c922.
Reason for revert: this commit adds scalability bottleneck in the fast path - Cache.Get() -
in the form of c.getCalls.Add(). This call doesn't scale on systems with big number of CPU cores,
since it needs to update atomically a shared memory from big number of CPU cores.
The Cache.Get() is called per every ingested sample when obtaining TSID by MetricName from the cache
at lib/storage.Storage.get(), so this can be a major bottleneck on systems with many CPU cores.
The solution for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
is to properly track cache requests and misses: cache requests must be taken into account
only at the curr cache, while cache misses must be taken into account only at the prev cache.
This will be implemented in the follow-up commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
This reverts commit 994dadb4d5.
Reason for revert: the introduced metrics have zero practical applicability.
The lib/workingsetcache doesn't need manual tuning in most cases - its size
is automatically adjusted to the given working set, if the working set is smaller
than the cache size limit set at the cache creation time. The limit just prevents
unbounded cache growth for large working sets.
If the working set exceeds the given limit, then the cache may become inefficient
because of the increased cache miss rate. The introduced metrics do not help determining
the needed cache size.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
If the cache cannot be saved to the given file, this is a fatal error.
It is better to log this fatal error inside Cache.MustSave() and then exit
instead of returning it to the caller. This makes the code more clear at the caller side.
It is expected that the number of TSID misses over the last 5 minutes is zero in steady state.
If it is non-zero, then something is wrong. That's why it is better to use the increase() function instead of rate()
for this alert.
This alert is expected after unclean shutdown (OOM, power off, kill -9) of VictoriaMetrics.
It should go away in a few minutes after the restart while VictoriaMetrics deletes metricIDs
for the missing MetricID->TSID entries which were created for the newly registered time series
just before unclean shutdown. It is OK to delete such metricIDs, since the corresponding time series
will be re-registered again. See the commit 20812008a7 .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502
When one goroutine attempts to update the min timestamp under the lock, it
could have been updated already by another goroutine with a smaller
timestamp. As a result, the goroutine overwrites the timestamp with a
bigger value.
A simple unit test (included in this commit) demonstrates that.
Additionally, use a simple Mutex instead of RWMutex. RWMutexes only
introduce an unnecessary overhead for operations as simple as retrieving
a value from a map and regular Mutex should be preferred.
Thanks to @valyala for spotting a bug and the advice on RWMutexes.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This commit improves overall performance and stability of chart rendering,
refines time series generation, and fixes incorrect median calculation
in metric series.
JavaScript execution time improved by up to ×6 on large datasets.
**Changes:**
* Reworked `getTimeSeries` - one point per pixel.
* Added legend auto-collapse when >20 items.
* Switched median algorithm to Quickselect (Floyd–Rivest).
* Unified array stats functions (`min`, `max`, `avg`, `median`) into a
single pass.
* Removed unused `last` value from series.
* Renamed `roundToMilliseconds` to `roundToThousandths` and moved to
`utils/math`.
* Replaced `isSupportedDuration` with `parseSupportedDuration`, added
fractional duration support.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9699
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9926
When multiple service discovery configs of the same type exist (e.g.,
`hetzner_sd_config`), vmagent currently behaves as follows:
1. Attempts to request each config.
2. Exits immediately if any config returns an error.
3. Skips the remaining configs and falls back to the previous service
discovery result.
The correct behavior—more compatible with Prometheus—should be:
1. Attempt to request each config.
2. Collect all valid results.
3. Use the valid results if there's at least one. Otherwise (all
failed), fall back to the previous SD result.
Scrape example:
```yaml
scrape_configs:
  - job_name: hetzner-default
    hetzner_sd_configs:
      - role: "hcloud"
        authorization:
          credentials: "some_valid_value"
      - role: "hcloud"
        authorization:
          credentials: "some_wrong_value"
```
Expected outcome:
- At least targets from `credentials: "some_valid_value"` should appear
in the service discovery result.
Current outcome:
- The error from `credentials: "some_wrong_value"` leads to an **empty**
result.
This issue should affect every service discovery type that uses the
`getScrapeWorkGeneric` function:
- `azure_sd_config`
- `consul_sd_config`
- `consulagent_sd_config`
- `digitalocean_sd_config`
- `dns_sd_config`
- `docker_sd_config`
- `dockerswarm_sd_config`
- `ec2_sd_config`
- `eureka_sd_config`
- `gce_sd_config`
- `hetzner_sd_config`
- `http_sd_config`
- `kuma_sd_config`
- `marathon_sd_config`
- `nomad_sd_config`
- `openstack_sd_config`
- `ovhcloud_sd_config`
- `puppetdb_sd_config`
- `vultr_sd_config`
- `yandexcloud_sd_config`
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9375
This is needed in order to detect and prevent cases of improper usage of partitions
while they are closed.
This is a follow-up for the commit 9725ee50ec .
### Describe Your Changes
Follow up on PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839, which
addresses review comment
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839#discussion_r2477729886
Alex:
```
this design decision isn't good, since it will lead to potential security issues over time when we'll forget adding ApplySecretFlags() call after the flag.Parse() call or add it at the wrong place. BTW, we do not call flag.Parse() explicitly - instead envflag.Parse() is called. So it is natural to call ApplySecretFlags() inside this call. Are there restrictions which prevent from doing this? If there are no restrictions, then there is no need in making this function public - it will be called explicitly inside envflag.Parse().
```
There is no changelog entry as there is no change in user-visible
behavior.
This should catch possible errors related to improper release of Table parts.
Fix such an error at TestTableCreateSnapshotAt by properly closing all the initialized
TableSearch instances.
Thanks to @rtm0 for pointing to this issue.
Previously, the snappy decoder didn't take into account the request size limits
applied by VictoriaMetrics components. In case of an incorrectly formed snappy block, a VictoriaMetrics
component could allocate extra memory, which could lead to OOM errors.
This commit makes ingest endpoints check the block size header against the MaxRequest limits.
- Standardize all sections to use 'Recommended for:' instead of mixed
'For whom:' and 'Target audience:'
- Fix wording: 'Query evaluation is always local'
Addresses comments in #9919
vmalert tries to spread the moments when groups start their evaluation
over the `[0..group.interval]` duration. This approach avoids the
thundering herd problem, when all groups execute their
rules simultaneously on vmalert start. It was introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/724
While this works great for most configs, for groups with big evaluation
intervals (30min, 60min) the first evaluation can be delayed
significantly.
This change introduces a start delay limit via the new flag
`--group.maxStartDelay` (5m default).
It limits the `[0..group.interval]` start delay to
`[0..math.min(--group.maxStartDelay, group.interval)]`.
So all groups will start within the first 5m or earlier.
The `--group.maxStartDelay` flag is ignored if the user sets `eval_offset`.
The 5m default limit was picked high enough to not affect users with
relatively low evaluation intervals.
-----------
Based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9929
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmbackupmanager: enforce newline at the end of CLI result
Previously, vmbackupmanager only printed a response from API which did not include a newline character. That leads to issues with the rendering of the next command when using a shell.
Always append a newline character to avoid breaking shell formatting when using CLI mode.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/victoriametrics/changelog/CHANGELOG.md
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
The load distribution could be uneven when short queries arrive at vmauth while some of the backends are busy
with long-running queries. In this case the major load goes to the backend after a run of busy backends.
Suppose we have four backends - b1, b2, b3 and b4. The first two backends are busy with a bigger number
of long-running queries than b3 and b4. Then 75% of short queries will go to b3, while only 25%
of short queries will go to b4.
The new algorithm makes the distribution more even in these cases by storing the next backend
after the chosen backend as the candidate for the next query (its index is stored in the atomicCounter).
Avoid races when updating atomicCounter from concurrently executed queries by using CompareAndSwap() -
if a concurrent query updated it first, then the current query won't overwrite it with the outdated value.
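A simplified sketch of the described idea, not the vmauth implementation. It assumes that the backend with the fewest in-flight requests is chosen, scanning from the stored candidate index. The types `backend` and `balancer` and their methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type backend struct {
	inflight atomic.Int32 // number of requests currently served
}

type balancer struct {
	backends []*backend
	next     atomic.Uint32 // index of the candidate for the next query
}

func newBalancer(n int) *balancer {
	lb := &balancer{}
	for i := 0; i < n; i++ {
		lb.backends = append(lb.backends, &backend{})
	}
	return lb
}

// pick chooses the least loaded backend, scanning from the stored candidate,
// and stores the backend after the chosen one as the next candidate.
func (lb *balancer) pick() int {
	n := uint32(len(lb.backends))
	start := lb.next.Load()
	best := start % n
	least := lb.backends[best].inflight.Load()
	for i := uint32(1); i < n; i++ {
		idx := (start + i) % n
		if v := lb.backends[idx].inflight.Load(); v < least {
			best, least = idx, v
		}
	}
	// If a concurrent query already changed next, keep its value instead of
	// overwriting it with one computed from the outdated start.
	lb.next.CompareAndSwap(start, (best+1)%n)
	lb.backends[best].inflight.Add(1)
	return int(best)
}

// release marks a request to backend i as finished.
func (lb *balancer) release(i int) { lb.backends[i].inflight.Add(-1) }

func main() {
	lb := newBalancer(4)
	lb.backends[0].inflight.Store(5) // b1 and b2 are busy with long queries
	lb.backends[1].inflight.Store(5)
	hits := make([]int, 4)
	for i := 0; i < 100; i++ {
		b := lb.pick()
		hits[b]++
		lb.release(b) // a short query finishes immediately
	}
	fmt.Println(hits)
}
```

In this sequential run the short queries alternate between the two idle backends instead of piling up on the first one after the busy run.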
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9712
Data patterns considered:
- Same series, same date
- Same series, different dates
- Different series, same date
- Different series, different dates
To make sure that the pattern condition holds, a new storage instance is
started every benchmark iteration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9912
Previously, vmctl only accepted one label for filtering. Extend this to
allow providing multiple filters at once. This is useful when migrating
large volumes of data, as it allows narrowing down the scope of
migration for one run so that the source side is not overwhelmed by the
migration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9917/
Before, rules that didn't get evaluated yet were showing weird values in
vmalert's UI. It was happening because of
`time.Since(r.LastEvaluation).Seconds()` expression when
`r.LastEvaluation` had 0 value.
With this change, rules that weren't evaluated yet would show `Never` in
Updated column instead.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9924
This commit reverts commit
d6bbfaf164 for the following reasons:
1. HTTP/2 carries security risks.
2. Most components in the VictoriaMetrics stack do not require HTTP/2
support.
3. While HTTP/2 support was available only as an option in the previous
commit, there remains a potential risk of misusing this option and
enabling HTTP/2 inadvertently.
Components that require HTTP/2 support (e.g., VictoriaTraces)
should currently build an HTTP server manually with built-in packages,
instead of using `lib/httpserver` in VictoriaMetrics. If the mentioned
issue is resolved in the future and more components need HTTP/2, this
support can be reintroduced into `lib/httpserver`.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9927
It is a cosmetic change: it simplifies function signature by making it a
method of the Group struct.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Rationale: Having query stats logging enabled by default can greatly
help in investigating incidents.
Currently, it is disabled by default, so many users don’t enable it, and
when issues occur there are no stats available.
After discussion with the team, a 5s threshold was agreed upon as a
reasonable default to capture meaningful slow query data without
excessive logging.
This commit adds a new RPC protocol for vminsert-vmstorage communication.
It works in the same way as the vmselect-vmstorage RPC.
It's implemented with new handshake hello methods in a backward-compatible
way. The server attempts to parse RPC only if the client sends the new
Hello message, while the client falls back to the old Hello message if the
server closes the connection.
This change is needed for the new metrics metadata forwarded from vminsert
into vmstorage.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
Changes extracted from PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9487
Previously, if storage started with a curr indexDB different from the one
stored in the nextDayMetricIDs cache file, the cache would still be loaded
into memory, possibly affecting the next day prefill.
This is an unlikely case, but it is still possible when:
- A programmer makes a mistake in the code and uses something else
instead of idbCurr.generation.
- A user downgrades from pt-index to a previous version.
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This change validates that the QueryRange() method for the prometheus datasource
receives a response with the `matrix` data type. It returns an error
otherwise.
The change is needed to avoid confusion like in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9779.
The fix is not elegant, but it should be simple from a code support
perspective: each API has its own parsing function, even if some
processing code is repeated.
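The validation can be sketched roughly as follows, assuming the standard Prometheus JSON envelope with `data.resultType`. The names `promResponse` and `parseRangeResponse` are made up for this example and are not vmalert's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse models only the fields needed for the check.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string          `json:"resultType"`
		Result     json.RawMessage `json:"result"`
	} `json:"data"`
}

// parseRangeResponse is a hypothetical range-query parser: it accepts only
// the `matrix` result type and returns an error for anything else.
func parseRangeResponse(body []byte) (json.RawMessage, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("cannot parse response: %w", err)
	}
	if r.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unexpected result type %q; want %q", r.Data.ResultType, "matrix")
	}
	return r.Data.Result, nil
}

func main() {
	_, err := parseRangeResponse([]byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`))
	fmt.Println(err)
}
```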
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Currently, the `httpserver` disables HTTP/2 support by design, because:
```
// Disable http/2, since it doesn't give any advantages for VictoriaMetrics services.
```
As VictoriaLogs and VictoriaTraces rely on `httpserver`, in order to
support gRPC over HTTP/2, an option to support HTTP/2 is required.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9881
Previously, all time series pushed into aggregators were added
sequentially. This could delay data ingestion, and it was not
possible to use all available CPU cores.
This commit adds concurrency based on available CPU cores.
Also, it adds new generic Buffer and BufferPool into slicesutil.
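A rough illustration of CPU-bound fan-out, not the actual aggregator code. It assumes `runtime.GOMAXPROCS(0)` as the measure of available cores, and `pushConcurrently` and `sumConcurrently` are names invented for this sketch:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// pushConcurrently splits series into chunks and hands every chunk to push
// from its own goroutine. The number of goroutines is capped by GOMAXPROCS,
// which stands in here for "available CPU cores".
func pushConcurrently(series []int, push func(chunk []int)) {
	if len(series) == 0 {
		return
	}
	workers := runtime.GOMAXPROCS(0)
	if workers > len(series) {
		workers = len(series)
	}
	chunkSize := (len(series) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(series); start += chunkSize {
		end := start + chunkSize
		if end > len(series) {
			end = len(series)
		}
		wg.Add(1)
		go func(chunk []int) {
			defer wg.Done()
			push(chunk)
		}(series[start:end])
	}
	wg.Wait()
}

// sumConcurrently is a toy consumer that adds up all values concurrently.
func sumConcurrently(series []int) int64 {
	var total atomic.Int64
	pushConcurrently(series, func(chunk []int) {
		for _, v := range chunk {
			total.Add(int64(v))
		}
	})
	return total.Load()
}

func main() {
	fmt.Println(sumConcurrently([]int{1, 2, 3, 4, 5}))
}
```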
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9878
Previously a misleading random error could be logged for canceled and/or timed out requests to vmauth.
Consistently log the request timeout error for timed out requests.
While at it, do not log errors for requests canceled by the remote client, since such logs aren't actionable
and just pollute error logs generated by vmauth.
Introduce a new flag, which converts only metric names into a
Prometheus-compatible format and keeps label names in their original form.
Keeping labels in their original form is
useful for correlation with other telemetry sources, such as logs or
traces.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
AWS SDK does not modify custom http client configuration if it was provided. This leads to
additional configuration such as environment variables being ignored.
Use AWS http client builder instead of custom implementation and
override DialContext to preserve metrics exposed by custom transport.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9858
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Tracking original labels requires storing a copy of the labels obtained
from service discovery. It adds extra garbage collection pressure and, as
a result, increases CPU usage.
While dropOriginalLabels has almost no impact on test and small
installations, the impact grows with scale and is especially noticeable in
Kubernetes-based installations.
In addition, original labels tracking is already disabled by default in the
`k8s-stack` helm chart, which is our main Kubernetes monitoring solution.
Also, the vmagent optimisation guide recommends disabling the storing of
original labels.
This commit changes the default value to true and disables tracking of
dropped targets by default. For debugging, it can be easily
re-enabled by passing `false` to the
`promscrape.dropOriginalLabels` flag. It should improve resource usage out of
the box at the cost of a worse user experience for a minority of users.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9665
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9772
The TextField component can show an error message, and the text field
height changes depending on its presence, which may cause visibility
issues if the field is vertically aligned with neighbouring
components. This PR makes the text field height constant and its input box
horizontally symmetrical.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9693
Currently, it is hard to make sense of progress based on logging as it
requires manual calculation of progress and ETA.
Solve this by:
- making data units human-readable
- adding an estimation of completion for the operation
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v3.30.7</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.7 - 06 Oct 2025</h2>
<p>No user facing changes.</p>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.7/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.6</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.6 - 02 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.2. <a
href="https://redirect.github.com/github/codeql-action/pull/3168">#3168</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.6/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.5</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.5 - 26 Sep 2025</h2>
<ul>
<li>We fixed a bug that was introduced in <code>3.30.4</code> with
<code>upload-sarif</code> which resulted in files without a
<code>.sarif</code> extension not getting uploaded. <a
href="https://redirect.github.com/github/codeql-action/pull/3160">#3160</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.5/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.4</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.4 - 25 Sep 2025</h2>
<ul>
<li>We have improved the CodeQL Action's ability to validate that the
workflow it is used in does not use different versions of the CodeQL
Action for different workflow steps. Mixing different versions of the
CodeQL Action in the same workflow is unsupported and can lead to
unpredictable results. A warning will now be emitted from the
<code>codeql-action/init</code> step if different versions of the CodeQL
Action are detected in the workflow file. Additionally, an error will
now be thrown by the other CodeQL Action steps if they load a
configuration file that was generated by a different version of the
<code>codeql-action/init</code> step. <a
href="https://redirect.github.com/github/codeql-action/pull/3099">#3099</a>
and <a
href="https://redirect.github.com/github/codeql-action/pull/3100">#3100</a></li>
<li>We added support for reducing the size of dependency caches for Java
analyses, which will reduce cache usage and speed up workflows. This
will be enabled automatically at a later time. <a
href="https://redirect.github.com/github/codeql-action/pull/3107">#3107</a></li>
<li>You can now run the latest CodeQL nightly bundle by passing
<code>tools: nightly</code> to the <code>init</code> action. In general,
the nightly bundle is unstable and we only recommend running it when
directed by GitHub staff. <a
href="https://redirect.github.com/github/codeql-action/pull/3130">#3130</a></li>
<li>Update default CodeQL bundle version to 2.23.1. <a
href="https://redirect.github.com/github/codeql-action/pull/3118">#3118</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.4/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.3</h2>
<h1>CodeQL Action Changelog</h1>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>3.29.4 - 23 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.3 - 21 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.2 - 30 Jun 2025</h2>
<ul>
<li>Experimental: When the <code>quality-queries</code> input for the
<code>init</code> action is provided with an argument, separate
<code>.quality.sarif</code> files are produced and uploaded for each
language with the results of the specified queries. Do not use this in
production as it is part of an internal experiment and subject to change
at any time. <a
href="https://redirect.github.com/github/codeql-action/pull/2935">#2935</a></li>
</ul>
<h2>3.29.1 - 27 Jun 2025</h2>
<ul>
<li>Fix bug in PR analysis where user-provided <code>include</code>
query filter fails to exclude non-included queries. <a
href="https://redirect.github.com/github/codeql-action/pull/2938">#2938</a></li>
<li>Update default CodeQL bundle version to 2.22.1. <a
href="https://redirect.github.com/github/codeql-action/pull/2950">#2950</a></li>
</ul>
<h2>3.29.0 - 11 Jun 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.22.0. <a
href="https://redirect.github.com/github/codeql-action/pull/2925">#2925</a></li>
<li>Bump minimum CodeQL bundle version to 2.16.6. <a
href="https://redirect.github.com/github/codeql-action/pull/2912">#2912</a></li>
</ul>
<h2>3.28.21 - 28 July 2025</h2>
<p>No user facing changes.</p>
<h2>3.28.20 - 21 July 2025</h2>
<ul>
<li>Remove support for combining SARIF files from a single upload for
GHES 3.18, see <a
href="https://github.blog/changelog/2024-05-06-code-scanning-will-stop-combining-runs-from-a-single-upload/">the
changelog post</a>. <a
href="https://redirect.github.com/github/codeql-action/pull/2959">#2959</a></li>
</ul>
<h2>3.28.19 - 03 Jun 2025</h2>
<ul>
<li>The CodeQL Action no longer includes its own copy of the extractor
for the <code>actions</code> language, which is currently in public
preview.
The <code>actions</code> extractor has been included in the CodeQL CLI
since v2.20.6. If your workflow has enabled the <code>actions</code>
language <em>and</em> you have pinned
your <code>tools:</code> property to a specific version of the CodeQL
CLI earlier than v2.20.6, you will need to update to at least CodeQL
v2.20.6 or disable
<code>actions</code> analysis.</li>
<li>Update default CodeQL bundle version to 2.21.4. <a
href="https://redirect.github.com/github/codeql-action/pull/2910">#2910</a></li>
</ul>
<h2>3.28.18 - 16 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.3. <a
href="https://redirect.github.com/github/codeql-action/pull/2893">#2893</a></li>
<li>Skip validating SARIF produced by CodeQL for improved performance.
<a
href="https://redirect.github.com/github/codeql-action/pull/2894">#2894</a></li>
<li>The number of threads and amount of RAM used by CodeQL can now be
set via the <code>CODEQL_THREADS</code> and <code>CODEQL_RAM</code>
runner environment variables. If set, these environment variables
override the <code>threads</code> and <code>ram</code> inputs
respectively. <a
href="https://redirect.github.com/github/codeql-action/pull/2891">#2891</a></li>
</ul>
<h2>3.28.17 - 02 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.2. <a
href="https://redirect.github.com/github/codeql-action/pull/2872">#2872</a></li>
</ul>
<h2>3.28.16 - 23 Apr 2025</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="aac66ec793"><code>aac66ec</code></a>
Remove <code>update-proxy-release</code> workflow</li>
<li><a
href="91a63dc72c"><code>91a63dc</code></a>
Remove <code>undefined</code> values from results of
<code>unsafeEntriesInvariant</code></li>
<li><a
href="d25fa60a90"><code>d25fa60</code></a>
ESLint: Disable <code>no-unused-vars</code> for parameters starting with
<code>_</code></li>
<li><a
href="3adb1ff7b8"><code>3adb1ff</code></a>
Reorder supported tags in descending order</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This reverts commit 772ac8803e.
The reason for revert is that this recommendation should not be strict.
Installations with <= 1 vCPU will continue working efficiently. The load
from reads, writes and background merges will be evenly spread by Go runtime.
cc @Sleuth56 @tiny-pangolin
Before, `req.URL.Redacted` info was present in some error messages and
empty in others. This change uniformly adds it to the errors context.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- fixed ignored `search` query argument in `Notifiers` and `Rules` tabs
- added dropdown state reset, if other filters were updated and selected
state is not a subset of available items
- proxy requests to config.json for a local setup
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
While there, remove excessive relabeling info and point users to the Routing section.
The Routing section should explain how to build flexible processing pipelines.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Reverts 17ca1ba8c4
Reasons for the revert are the following:
1. The fix relies on release candidates of specific libraries.
2. The real fix would be to update the Alpine version, which is not released yet.
3. It leaves the fix partially done, as it would require a follow-up in the future to
switch from release candidates to stable versions, or to update the Alpine version.
4. The fix is not effective, as it doesn't update the base image cached by Docker.
The real fix will be to host and update the base image separately like in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9811.
5. VM binaries aren't vulnerable to the mentioned vulnerabilities.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reverts commit ccf97a4143.
reason for revert: this change may break tests, which expect that ServesMetrics.GetMetric() fails
when the given metric doesn't exist in the output.
It is better to add 'TryGetMetric() (float64, bool)' function, which would return '(0, false)'
when the given metric doesn't exist, so the caller could decide what to do next.
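A sketch of the proposed `(float64, bool)` shape. The free function `tryGetMetric` and its naive parsing (exact name match, no support for spaces inside label values) are simplifications assumed for this example:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tryGetMetric looks up name on a /metrics page given as text.
// It returns (0, false) when the metric is missing, so the caller
// can decide whether to retry or fail the test.
func tryGetMetric(metricsPage, name string) (float64, bool) {
	for _, line := range strings.Split(metricsPage, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	page := "# TYPE vm_rows counter\nvm_rows 42\n"
	fmt.Println(tryGetMetric(page, "vm_rows"))
	fmt.Println(tryGetMetric(page, "vm_missing"))
}
```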
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Add ad-hoc filters to query stats and operator dashboards.
These filters are useful for exploring non-uniform metrics sets
without distinct job/instance filters.
The previous text didn't contain links to vmagent's capabilities.
Instead, it contained misleading multitenancy-mode link that doesn't
seem to be related to the subject.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, `GetMetric` called `t.Fatalf` immediately when the target metric
did not exist on the `/metrics` page.
However, some metrics may start to appear only after the process has been
running for a while. `t.Fatalf` defeats the retry mechanism of
assertions: if the metric is not found the first time, the test case
terminates.
This commit changes `t.Fatalf` to `t.Logf` (instead of `t.Errorf`,
because error output may be considered a test case failure in some
scenarios).
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Follow-up for cea9505bab
fastcache.Cache allocates off-heap memory, which must be explicitly
returned back to the pool with Reset method call.
After the changes made in the commit above, during the cache transition from whole to
split mode, it's possible that the current cache is still referenced by the Cache.Get
or Cache.Call atomic pointers. This leads to potential memory leaks, since
we don't have any memory synchronization for atomic.Pointer.Store calls.
This commit adds a `Finalizer` to the `fastcache.Cache` instances.
It properly releases memory when the cache is no longer reachable.
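The finalizer idea can be illustrated with `runtime.SetFinalizer` from the standard library. The `offHeapCache` type below is a stand-in invented for this sketch and does not mirror `fastcache.Cache` internals:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// offHeapCache imitates a cache that owns memory the GC doesn't track.
type offHeapCache struct {
	chunks    int
	onRelease func(chunks int)
}

// newOffHeapCache registers Reset as a finalizer, so the chunks are
// released even if nobody calls Reset explicitly.
func newOffHeapCache(chunks int, onRelease func(chunks int)) *offHeapCache {
	c := &offHeapCache{chunks: chunks, onRelease: onRelease}
	runtime.SetFinalizer(c, (*offHeapCache).Reset)
	return c
}

// Reset returns the chunks to the pool. It is idempotent, so a later
// finalizer run after an explicit Reset is a no-op.
func (c *offHeapCache) Reset() {
	if c.chunks == 0 {
		return
	}
	c.onRelease(c.chunks)
	c.chunks = 0
}

func main() {
	released := make(chan int, 1)
	newOffHeapCache(4, func(n int) { released <- n }) // unreachable right away
	runtime.GC()
	select {
	case n := <-released:
		fmt.Println("finalizer released chunks:", n)
	case <-time.After(time.Second):
		fmt.Println("finalizer has not run yet; the runtime gives no timing guarantee")
	}
}
```

Finalizers run at some point after the object becomes unreachable, so this is a safety net rather than a replacement for explicit `Reset` calls.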
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
Previously, the cache state transition from split into whole could leave
the cache in a broken state if the cache Reset method was called in switching
mode.
Also, cache Reset didn't start background workers and didn't change
the cache size.
This commit properly checks the mode during cache transition. In addition,
it no longer stops background workers after the whole mode transition and
always starts workers during start-up.
Access to the prev, curr and mode Cache fields is properly locked
in order to mitigate possible race conditions.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
It seems that the Go compiler skipped computations and allocations for samples
as they weren't used afterwards. Sinking results into a global variable prevents
this optimization, and the benchmark starts showing allocations within the `pushSamples` fn.
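The sink pattern looks roughly like this. `buildSamples` is a placeholder for the benchmarked work, not the real `pushSamples`, and `testing.Benchmark` is used only to keep the sketch runnable as a plain program:

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Assigning results to it prevents
// the compiler from discarding the benchmarked work as unused.
var sink []float64

// buildSamples stands in for the code under benchmark.
func buildSamples(n int) []float64 {
	s := make([]float64, n)
	for i := range s {
		s[i] = float64(i)
	}
	return s
}

func benchmarkBuildSamples(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = buildSamples(1024)
	}
}

func main() {
	res := testing.Benchmark(benchmarkBuildSamples)
	fmt.Println("allocs/op:", res.AllocsPerOp())
}
```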
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Make SearchTSIDs look similar to SearchMetricNames, i.e. search for metricIDs within the method
- Make the corresponding corrupted index test look similar to one for metric names search
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Consistently use the `v0.0.0-YYYYMMDDHHMMSS-commit_hash` reference for
internal deps such as the `github.com/VictoriaMetrics/VictoriaMetrics`
dependency, since it allows referring to any commit without waiting for the
release tag.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (don't mix asterisks and dashes in the same file; choose one
and be consistent)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones, but I picked the basic ones that caught my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Add the ability to limit the dates available in datePicker using the `minDate` and
`maxDate` parameters. Dates before `minDate` and after `maxDate`
cannot be picked. Lower and upper bounds can be set independently.
The `minDate` and `maxDate` parameters aren't set by default in vmui.
The datepicker component with these params is re-used elsewhere.
The change also explicitly mentions the `out-of-order` phrase, as it is commonly
used in the Prometheus ecosystem.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Searching metricName by metricID happens many times during a single API
call. This requires getting the current set of idbs before those calls
happen, which is fine but requires propagating idbs across the code
base. This is also fine in the OSS version, as it is used in Search
only.
Propagating idbs across the code base becomes a problem in the Enterprise
version, as it is used in at least 3 places. As a result, it becomes very
difficult to merge things from OSS to Ent.
Localizing all the dependencies in one searchMetricName type and
reusing this type everywhere should make things simpler.
Related enterprise changes:
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/compare/search-metric-name-ent?expand=1
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9756
A small refactoring that reduces Search dependency on Storage:
- Move searchTSIDs() from Search to Storage because this method does not
depend on anything Search-specific but does depend on Storage.
- Use metricsTracker instead of storage.metricTracker.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9754
Benchmarking the storage search API requires taking into account many
parameters, such as:
- data configuration: how many series, deleted series, search time range
- where the index data resides: prev and/or curr indexDB
- which search operation to measure
Adding a new benchmark use case also involves a lot of boilerplate code.
This PR implements a framework for testing storage search ops that can
be extended relatively easily. This comes in especially handy when adding
new cases for the partition index.
The current set of params will result in a lot of benchmarks being run,
which most probably does not make sense because:
- it will take a lot of time and
- the output data is hard to compare manually.
However, these benchmarks are very useful when only a small set of params
is of interest. For example, comparing the search of 100k
metric names when the index data resides in prevOnly, currOnly or
prevAndCurr indexDBs translates into the following cmd:
```shell
go test ./lib/storage --loggerLevel=ERROR -run=^$ -bench=^BenchmarkSearch/MetricNames/.*/VariableSeries/100000$
```
Why this change:
- I often need to run benchmarks with configs that I did not have
before, which requires either modifying an existing one or writing a new one.
It is easy to get lost and make benchmarks non-comparable.
- I need some way to make legacy and pt index benchmarks comparable.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
- state that it is unsafe to use lifecycle rules and describe the reason
- update formatting according to the latest changes in docs
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (don't mix asterisks and dashes in the same file; choose one
and be consistent)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones, but I picked the basic ones that caught my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
- Rename copyStream to copyStreamToClient in order to make it more clear
that the stream must be copied from backend to client.
- Make sure that the client implements net/http.Flusher interface.
It is a programming error (BUG) if the client passed to copyStreamToClient
doesn't implement net/http.Flusher interface.
- Do not write zero-length data to the backend.
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/667
1beb629b removed logic which was used in order to keep full backup
location path in the restore mark file. Because of this, backups created
with a shortname (e.g. `vmbackupmanager restore create
daily/2025-09-12`) will fail as backup location is not prepended.
Fix that by properly constructing full backup name from parsed canonical
values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, vmagent always set enable.auto.commit to false and manually
committed messages. This adds additional pressure on the Kafka brokers and could slow down
data consumption.
This commit allows vmagent to skip the manual commit and use auto-commit
based on the provided configuration, which may improve message read throughput.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/931
### Describe Your Changes
This PR introduces a `make docs-update-flags` command that updates flags
in the documentation using the actual binaries compiled from the latest
`enterprise-single-node` and `enterprise-cluster` branches (hardcoded
for now). The command also normalizes the output format.
It can be run from any branch. All work happens inside temporary
directories under /tmp. The script checks out the required branch,
builds the binaries, and updates the documentation. The current Git
repository is not touched.
The command adjusts default values to more meaningful ones, such as
changing `-maxConcurrentInserts` (default 20) to (default
2*cgroup.AvailableCPUs()).
Currently the logic is implemented only for vminsert, vmstorage,
vmselect, vmagent, vmalert, and victoria-metrics (aka single).
The goal is to make it easy to keep documentation synchronized with real
binaries.
_**Note:** Please ignore xxx_flags.md files for now. Review flags in
`README.md` and `Cluster-VictoriaMetrics.md`, and `vmagent.md`,
`vmalert.md` only. Once we agree on the changes in those files, I'll
replace the flags with the `{{% content "xxx_flags.md" %}}`._
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
* stress the requirement to have an empty destination folder for copying;
* remove extra verbosity from docs;
* remove the list of vmctl migration options as it became out of sync. Instead of syncing,
refer to the vmctl docs;
* fix typos.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The application version can be then displayed in the vmui. Showing the
application version in vmui should make it easier to determine currently
used VM version (at least vmselect version).
------------
@Loori-R it would be good to add the app version in vmui in a follow-up
PR or by pushing a commit to this branch.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
`Table` component:
- add `format` property for table columns, which allows applying custom
formatting depending on column type
- add `rowClasses` table property, which accepts a function that
customizes the row CSS class depending on the row value
- add `rowAction` table property, which allows executing an action when
a table row is clicked
`Popper` component:
- add `classes` to specify additional CSS classes for popper to
differentiate from other poppers, since it's mounted to a DOM root
`Switch` component:
- use gap instead of left-margin
`DateTimeInput` component:
- add `dateOnly` property to allow accepting only date in the input
additional fixes:
- fix TopQuery header fields alignment
<img width="1279" height="125" alt="image"
src="https://github.com/user-attachments/assets/08ad4dbc-19e5-47f5-9ccd-a9fb222335a4"
/>
### Describe Your Changes
Set rateEnabled to false for probe_success in VMUI
Fix issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9655
Problem:
probe_success is incorrectly initialized with rateEnabled = true because
the regex detecting counters (/_sum?|_total?|_count?/) matches partial
strings like _su. This causes probe_success (a gauge) to be treated as a
counter, producing slightly misleading graphs. For example, when
rateEnabled is set to true, probe_success often shows as 0 in VMUI when
the probe is actually succeeding.
It is not intuitive for users to have to disable rateEnabled manually
just to get the correct value for probe_success in VMUI.
Solution:
Update the regex to strictly match suffixes:
`/_sum$|_total$|_count$/`
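The difference between the two patterns can be checked with a minimal sketch. It is written in Go rather than the TypeScript used by VMUI, and the variable names `looseCounterRe` and `strictCounterRe` are hypothetical; only the two patterns come from the description above.

```go
package main

import (
	"fmt"
	"regexp"
)

// looseCounterRe mirrors the old pattern: the trailing "?" makes the last
// letter optional, so "_su", "_tota" and "_coun" match anywhere in the name.
var looseCounterRe = regexp.MustCompile(`_sum?|_total?|_count?`)

// strictCounterRe mirrors the fixed pattern: only whole suffixes match.
var strictCounterRe = regexp.MustCompile(`_sum$|_total$|_count$`)

func main() {
	names := []string{"probe_success", "http_requests_total", "request_duration_sum"}
	for _, name := range names {
		fmt.Printf("%-22s loose=%v strict=%v\n", name,
			looseCounterRe.MatchString(name), strictCounterRe.MatchString(name))
	}
}
```

With the strict pattern, `probe_success` is no longer classified as a counter, while names ending in `_total`, `_sum` or `_count` still are.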
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: William Wren <william.wren@ericsson.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list markers (don't mix asterisks and dashes in the same file;
choose one and be consistent)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones, but I picked the basic ones that caught my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This is needed for VictoriaLogs, which allows limiting query results with the given set of extra filters
specified via extra_filters query arg. The request url can contain multiple extra_filters query args -
they are all applied with AND logic to the query. See https://docs.victoriametrics.com/victorialogs/querying/#extra-filters
The merge_query_args option at vmauth allows merging the extra_filters provided by the client
(such as Grafana plugin for VictoriaLogs or built-in web UI) with the extra_filters specified in the backend
url at vmauth config.
This is needed for https://github.com/VictoriaMetrics/VictoriaLogs/issues/106
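A minimal sketch of the merging rule, assuming a hypothetical helper `mergeQueryArgs` (this is not the vmauth implementation): query args listed for merging keep both the client and the backend values, while any other arg present in the backend url replaces the client value.

```go
package main

import (
	"fmt"
	"net/url"
)

// mergeQueryArgs is a hypothetical helper. Args named in mergeNames keep both
// the client and the backend values; every other arg set in the backend url
// overrides the value sent by the client.
func mergeQueryArgs(client, backend url.Values, mergeNames []string) url.Values {
	merge := make(map[string]bool, len(mergeNames))
	for _, name := range mergeNames {
		merge[name] = true
	}
	out := url.Values{}
	for k, vs := range client {
		out[k] = append([]string(nil), vs...)
	}
	for k, vs := range backend {
		if merge[k] {
			out[k] = append(out[k], vs...)
		} else {
			out[k] = append([]string(nil), vs...)
		}
	}
	return out
}

func main() {
	client := url.Values{"extra_filters": {`{env="prod"}`}, "query": {"error"}}
	backend := url.Values{"extra_filters": {`{team="a"}`}}
	merged := mergeQueryArgs(client, backend, []string{"extra_filters"})
	// Both extra_filters values are forwarded to the backend.
	fmt.Println(merged.Encode())
}
```

The order of the merged values (client first, backend second) is an assumption of this sketch.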
`workingsetcache` is built on top of two
[fastcache](https://github.com/VictoriaMetrics/fastcache) instances
(curr and prev) that are rotated periodically (configurable via
`-cacheExpireDuration` flag). During rotation, curr becomes prev, the old
prev is discarded, and a new empty curr is created. If an entry is not found in
curr then the prev cache is checked, and if the entry is found there it
is copied to curr.
`workingsetcache` also exports metrics, such as `EntriesCount`,
`GetCalls`, `SetCalls`, and `Misses` counts. These metrics are currently
implemented as the sum of the same metrics in prev and curr `fastcache`
instances. Due to the rotation logic, these counts can be incorrect:
1. `EntriesCount`. It is the sum of prev and curr entry counts. If an
entry is not found in curr and found in prev (and therefore is copied
from prev to curr) the resulting entry count will be incorrect, i.e. it
will count copied entries two times.
2. `GetCalls`. It is the sum of prev and curr get calls. If an entry is
not found in curr, the logic will attempt to retrieve it from prev, which
results in double counting, although it is actually a single get call to
`workingsetcache`.
3. `SetCalls`. It is the sum of prev and curr set calls. If an entry is
not found in curr but found in prev, it will be copied to curr, resulting
in a set call to curr, although from the `workingsetcache` perspective
there hasn't been any set operation at all.
4. `Misses`. It is the sum of prev and curr misses. If an entry is not
found in curr, it is recorded as a miss. If it is then found in prev,
the entry is returned to the caller, but that cache miss remains. If it
is not found in prev, then there will be 2 misses for 1
`workingsetcache` get call.
This PR introduces `GetCalls`, `SetCalls`, and `Misses` counts at the
`workingsetcache` level in order to count the calls correctly. It also
excludes duplicates from `EntriesCount`.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
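The counting scheme described above can be illustrated with a simplified sketch. It uses plain maps instead of `fastcache`, is not safe for concurrent use, and all names (`twoGenCache`, `Rotate`, etc.) are hypothetical; how the actual change excludes duplicates from `EntriesCount` may differ.

```go
package main

import "fmt"

// twoGenCache is a hypothetical two-generation cache that keeps its own
// counters instead of summing the counters of curr and prev.
type twoGenCache struct {
	curr, prev                 map[string]string
	getCalls, setCalls, misses uint64
}

func newTwoGenCache() *twoGenCache {
	return &twoGenCache{curr: map[string]string{}, prev: map[string]string{}}
}

// Get counts exactly one get call and at most one miss per lookup.
func (c *twoGenCache) Get(k string) (string, bool) {
	c.getCalls++
	if v, ok := c.curr[k]; ok {
		return v, true
	}
	if v, ok := c.prev[k]; ok {
		c.curr[k] = v // internal copy, not counted as a set call
		return v, true
	}
	c.misses++
	return "", false
}

// Set counts only set calls issued by the user of the cache.
func (c *twoGenCache) Set(k, v string) {
	c.setCalls++
	c.curr[k] = v
}

// Rotate makes curr the new prev and starts with an empty curr.
func (c *twoGenCache) Rotate() {
	c.prev = c.curr
	c.curr = map[string]string{}
}

// EntriesCount counts entries present in both generations only once.
func (c *twoGenCache) EntriesCount() int {
	n := len(c.curr)
	for k := range c.prev {
		if _, dup := c.curr[k]; !dup {
			n++
		}
	}
	return n
}

func main() {
	c := newTwoGenCache()
	c.Set("a", "1")
	c.Rotate()
	c.Get("a") // found in prev and copied to curr
	c.Get("b") // a real miss
	fmt.Println(c.getCalls, c.setCalls, c.misses, c.EntriesCount())
}
```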
### Describe Your Changes
Implemented the script that generates graphs using `gnuplot`.
Those graphs show the write speed to the db.
How to use it:
1. From the root run `make tsbs`;
2. The file will be generated automatically
`/tmp/tsbs-load-100000-2025-07-22T00:00:00Z-2025-07-23T00:00:00Z-80s.csv`
3. From the root run `make tsbs-plot-load` and observe the result
4. If you have two files with the `tsbs_load_victoriametrics` output,
just define the second one via
`TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/tmp/tsbs-load-100000-2025-07-22T01:00:00Z-2025-07-23T01:00:00Z-80s.csv`
To plot the measurements from some other benchmark, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file.csv`
To plot the measurements from two benchmarks, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file1.csv
TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/path/to/file2.csv`
This command should generate a graph like the one shown in the picture
<img width="638" height="578" alt="Screenshot 2025-07-25 at 15 35 42"
src="https://github.com/user-attachments/assets/900b05ab-0b98-4f7f-8f2c-18d28ad2eab6"
/>
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
This helps to improve readability of changes, so users
can see more important changes first, and see changes related
to the same component one after another.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Previously, the mock storage's `net.Listen("tcp", …)` could succeed even if
another process was bound to the same port, due to dual-stack behavior
(`[::]:port` vs `0.0.0.0:port`). That led to strange test results that were
hard to trace back to port misuse: tests queried not the mock server but
whatever was running on that port.
Switched to `"tcp4"` to ensure conflicts are detected correctly.
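A minimal sketch of the IPv4-only listener, assuming a hypothetical helper `listenTCP4` (the actual mock storage code is not shown here). With `"tcp4"`, a second listener on the same IPv4 address and port is expected to fail with an "address already in use" error instead of silently binding next to the first one.

```go
package main

import (
	"fmt"
	"net"
)

// listenTCP4 is a hypothetical helper that binds an IPv4-only listener,
// so a port conflict with another IPv4 listener surfaces as an error.
func listenTCP4(addr string) (net.Listener, error) {
	return net.Listen("tcp4", addr)
}

func main() {
	first, err := listenTCP4("127.0.0.1:0") // port 0 lets the OS pick a free port
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer first.Close()

	// Binding the same address again must be rejected.
	second, err := listenTCP4(first.Addr().String())
	if err == nil {
		second.Close()
	}
	fmt.Println("second listen rejected:", err != nil)
}
```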
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
As there are quite a few files, and each file might have multiple
changes, I limited the PR to 5 files at a time to make it easier to
review.
I suggest you take a look at markdownlint and add it as part of your CI,
similar to
https://github.com/MicrosoftDocs/PowerShell-Docs/blob/main/.markdownlint.yaml
While at it, take a look at cspell and how it's used in their repo,
and replace the Python one in your current implementation with it
(I might open a PR for that after all the fix PRs).
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list markers (don't mix asterisks and dashes in the same file;
choose one and be consistent)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones, but I picked the basic ones that caught my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This allows performing a single MustFsyncPath() for the parent directory after multiple calls to these functions.
This clarifies code paths, which call these functions, and makes them more maintainable.
This also removes a redundant fsync() call for the parent directory when creating a file-based part.
Previously the first fsync() was indirectly called when the directory was created via MustMkdirFailIfExist()
and the second fsync() was called via MustSyncPathAndParentDir() after all the data is written to the part.
The fs.MustWriteSync() already fsyncs the created file, so there is no need for an additional fsync() call.
While at it, add missing fsync for the parent directory after creating a directory for persistent queue.
The source file contents should be already fsynced to disk before creating a hard link,
so there is no sense in calling fsync() on the created hard link.
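The idea of fsyncing each file once and the parent directory once at the end can be sketched as follows. This is an illustration using only the standard library on a POSIX system; `writeFileSync`, `syncDir`, `writePart` and `demo` are hypothetical names and not the `lib/fs` functions mentioned above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileSync creates the file, writes data and fsyncs the file itself.
func writeFileSync(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// syncDir fsyncs the directory so that newly created entries are durable.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	if err := d.Sync(); err != nil {
		d.Close()
		return err
	}
	return d.Close()
}

// writePart writes all files and fsyncs the parent directory only once.
func writePart(dir string, files map[string][]byte) error {
	for name, data := range files {
		if err := writeFileSync(filepath.Join(dir, name), data); err != nil {
			return err
		}
	}
	return syncDir(dir)
}

// demo runs writePart against a temporary directory.
func demo() error {
	dir, err := os.MkdirTemp("", "part-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	return writePart(dir, map[string][]byte{
		"index.bin":  []byte("index"),
		"values.bin": []byte("values"),
	})
}

func main() {
	fmt.Println("error:", demo())
}
```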
This commit ensures that the -search.maxQueryLen flag applies to Graphite
queries, matching the behavior already present for Prometheus queries.
Previously, Graphite queries could bypass this limit, creating an
inconsistency and a potential vector for resource exhaustion.
Key changes:
- Added getMaxQueryLen() to access the global query length limit.
- Enforced query length validation in execExpr() for Graphite queries.
- Added comprehensive tests for the new validation logic and edge cases.
- Error messages are consistent with Prometheus query validation.
- The default limit is 16KB (configurable via -search.maxQueryLen).
- Setting the limit to 0 disables validation.
This change closes the gap where Graphite queries could exceed
configured length limits, providing consistent protection against
excessively long queries across both query APIs.
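The validation rule can be sketched as below. `checkQueryLen` is a hypothetical helper and the error text is illustrative; only the semantics (reject queries longer than the limit, treat 0 as "no limit") come from the description above.

```go
package main

import "fmt"

// checkQueryLen is a hypothetical helper: it rejects queries longer than
// maxLen bytes. A maxLen of 0 disables the check.
func checkQueryLen(query string, maxLen int) error {
	if maxLen > 0 && len(query) > maxLen {
		return fmt.Errorf("too long query: got %d bytes, the limit is %d bytes", len(query), maxLen)
	}
	return nil
}

func main() {
	const defaultMaxQueryLen = 16 * 1024 // the 16KB default mentioned above
	fmt.Println(checkQueryLen("sumSeries(foo.*.bar)", defaultMaxQueryLen))
	fmt.Println(checkQueryLen("sumSeries(foo.*.bar)", 5))
	fmt.Println(checkQueryLen("sumSeries(foo.*.bar)", 0))
}
```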
Follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9534
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9600
The vm_deleted_metrics_total metric value represents the number of
metricIDs stored in deletedMetricIDs cache. This cache lives at the
storage level and stores the deleted metrics from both prev and curr
idbs. However, the metric is populated at the idb level. Since there are
always 2 idbs (prev and curr), the value is populated twice. Hence the
doubled value of the metric.
The fix is to populate the metric value at the storage level.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
- load and parse static `/vmui/config.json`, modify it according to
runtime values and use it as a replacement for the static config.json
- stop using the `/flags` endpoint for checking which features should be
enabled in VMUI
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9635
`router.home` represents the `/` path, which is the same for all UI apps,
but the content and title for the root path differ depending on the
application type. Added the `getDefaultOptions` function, which returns the
proper home route configuration depending on the application type. This
allows removing the renamings in the respective layouts.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9641
The commit
25cd5637bc
introduced the `-enableMetadata` flag and the
`promscrape.IsMetadataEnabled()` function, which is now used in multiple
places, including the `app/vminsert/prometheusimport` [request
handler](b24b76ff08/app/vminsert/prometheusimport/request_handler.go (L36)).
Because of the use of the `promscrape` package, vminsert registered all
`-promscrape.*` service discovery flags, which are not relevant for
`vminsert`.
This change moves the metadata flag logic into a dedicated package,
preventing vminsert from unintentionally loading unrelated promscrape
flags.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9631
This is an attempt to adjust image styles to GitHub themes, because
existing images with a transparent background become unreadable on the dark theme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When a labels API request has a `match` on the `__name__` key alone, it's going
to hit the max series limit in case of a high-cardinality metric name.
Instead, we can skip the lookup by `metricIDs` and fall back to an inverted
index scan with a `composite key`, since we only have a `__name__` and
a label name.
Common requests that benefit from this optimisation are:
1) /api/v1/labels?match=up or /api/v1/labels?extra_filters=up
2) /api/v1/label/job/values?match=up or /api/v1/labels?extra_filters=up
It's widely used by Grafana variables.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9489
a.Subtract(b) performance degrades as b becomes bigger than a. For
example, if len(b2) == 10 * len(b1), then time(a.Subtract(b2)) == 10 *
time(a.Subtract(b1)).
A quick fix is to iterate over a's elements if len(b) > len(a). Iterating
over a's elements and deleting them at the same time should be safe since no
elements are actually deleted (i.e. no memory is freed, etc). Deletion here
means setting the corresponding bit from 1 to 0.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
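The "iterate over the smaller side" idea can be sketched with a plain Go map standing in for the bitmap-based set. The `set` type and `subtract` method are hypothetical and only illustrate the technique; the cost becomes proportional to min(len(a), len(b)).

```go
package main

import "fmt"

// set is a hypothetical map-based stand-in for the bitmap-based set.
type set map[uint64]struct{}

// subtract removes from a every element that is also in b, iterating over
// whichever of the two sets is smaller.
func (a set) subtract(b set) {
	if len(b) > len(a) {
		// Deleting map entries while ranging over the map is safe in Go.
		for x := range a {
			if _, ok := b[x]; ok {
				delete(a, x)
			}
		}
		return
	}
	for x := range b {
		delete(a, x)
	}
}

func main() {
	a := set{1: {}, 2: {}, 3: {}}
	b := set{2: {}, 3: {}, 4: {}, 5: {}, 6: {}}
	a.subtract(b) // b is bigger, so the loop runs over a's 3 elements
	fmt.Println(len(a))
}
```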
### Describe Your Changes
Add the support of all standard TSDB query types that can be executed
against VictoriaMetrics. `double-groupby-all` is commented out as it
attempts to retrieve all 1B samples and fails. While this can be fixed
by setting the `-search.maxSamplesPerQuery` flag, this query is left disabled
anyway because it would consume way too much memory and CPU time.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
New benchmarks for storage search (data and index):
- Use the same dataset that accounts for prev and curr indexDBs and
deleted series
- The code is more structured
- Account for various numbers of series in response including higher
numbers (>10k) as this appears to be a quite common use case.
These benchmarks were used for investigating the #9602 performance issue and
helped discover that prefetching metric names needed to be restored (#9619).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Some messages were written to `stdout` using `fmt.Printf` and
`fmt.Println`, while the other messages like import statistics were
written to `stderr` through the `log` package.
This led to ordering problems where the `Import finished!` +
`VictoriaMetrics importer stats` messages, which are expected to be the last
messages, appeared before the `Continue import process with filter`
messages, creating confusing output for users.
```
2025/08/20 13:07:26 Import finished!
2025/08/20 13:07:26 VictoriaMetrics importer stats:
time spent while importing: 20h49m10.8497184s;
total bytes: 277.1 GB;
bytes/s: 3.7 MB;
requests: 7978614;
requests retries: 0;
2025/08/20 13:07:26 Total time: 20h49m10.851006088s
Continue import process with filter
filter: match[]={__name__!=""}
start: 2025-08-08T00:00:00Z
end: 2025-08-15T00:00:00Z:
Continue import process with filter
filter: match[]={__name__!=""}
start: 2025-08-15T00:00:00Z
end: 2025-08-19T16:18:15Z:
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
It seems db39f045e1 accidentally reverted
#9419 changes.
```patch
--- a/app/vmagent/remotewrite/client.go
+++ b/app/vmagent/remotewrite/client.go
@@ -448,7 +448,8 @@ again:
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
- if statusCode == 409 {
+ switch statusCode {
+ case 409:
logBlockRejected(block, c.sanitizedURL, resp)
// Just drop block on 409 status code like Prometheus does.
@@ -461,7 +462,13 @@ again:
// - Remote Write v2 specification explicitly specifies a `415 Unsupported Media Type` for unsupported encodings.
// - Real-world implementations of v1 use both 400 and 415 status codes.
// See more in research: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054
- } else if statusCode == 415 || statusCode == 400 {
+ case 415, 400:
+ if c.canDowngradeVMProto.Swap(false) {
+ logger.Infof("received unsupported media type or bad request from remote storage at %q. Downgrading protocol from VictoriaMetrics to Prometheus remote write for all future requests. "+
+ "See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
+ c.useVMProto.Store(false)
+ }
+
if encoding.IsZstd(block) {
logger.Infof("received unsupported media type or bad request from remote storage at %q. Re-packing the block to Prometheus remote write and retrying."+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
```
cc @makasim
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Previously, if the pushBlockPubSub function returned an error, vmagent stopped
the remote write worker thread assigned to it. The expected behavior in this
scenario is to retry on error inside the pushBlockPubSub function. It must
return only on vmagent shutdown.
This commit properly handles this error and prevents ingestion from
stopping.
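A minimal sketch of the retry-until-shutdown behavior, under the assumption that shutdown is signalled via a `context.Context`. `pushWithRetry` and `demo` are hypothetical names; this is not the vmagent code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pushWithRetry is a hypothetical helper: it keeps calling push until it
// succeeds or ctx is cancelled (i.e. on shutdown), instead of returning the
// first error to the worker. It returns the number of attempts made.
func pushWithRetry(ctx context.Context, push func() error, delay time.Duration) int {
	attempts := 0
	for {
		attempts++
		if err := push(); err == nil {
			return attempts
		}
		select {
		case <-ctx.Done():
			return attempts
		case <-time.After(delay):
		}
	}
}

// demo simulates a push that fails twice before succeeding.
func demo() int {
	calls := 0
	push := func() error {
		calls++
		if calls < 3 {
			return errors.New("temporary failure")
		}
		return nil
	}
	return pushWithRetry(context.Background(), push, time.Millisecond)
}

func main() {
	fmt.Println("attempts:", demo())
}
```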
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol=flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey=flagutil.NewPassword("configAuthKey","Authorization key for accessing /config page. It must be passed via authKey query arg. It overrides -httpAuth.*")
configAuthKey=flagutil.NewPassword("configAuthKey","Authorization key for accessing /config and /remotewrite-.*-config pages. It must be passed via authKey query arg. It overrides -httpAuth.*")
reloadAuthKey=flagutil.NewPassword("reloadAuthKey","Auth key for /-/reload http endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*")
dryRun=flag.Bool("dryRun",false,"Whether to check config files without running vmagent. The following files are checked: "+
// - Real-world implementations of v1 use both 400 and 415 status codes.
// See more in research: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054
case 415, 400:
	if c.canDowngradeVMProto.Swap(false) {
		logger.Infof("received unsupported media type or bad request from remote storage at %q. Downgrading protocol from VictoriaMetrics to Prometheus remote write for all future requests. "+
			"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
		c.useVMProto.Store(false)
	}
	if encoding.IsZstd(block) {
		logger.Infof("received unsupported media type or bad request from remote storage at %q. Re-packing the block to Prometheus remote write and retrying."+
			"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
streamAggrGlobalConfig=flag.String("streamAggr.config","","Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -streamAggr.keepInput, -streamAggr.dropInput and -streamAggr.dedupInterval")
streamAggrGlobalKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep all the input samples after the aggregation "+
"with -streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep input samples that match any rule in "+
"-streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop input samples that not matching any rule in "+
"-streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDedupInterval=flag.Duration("streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval on "+
"aggregator before optional aggregation with -streamAggr.config . "+
"See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
@@ -43,11 +43,11 @@ var (
streamAggrConfig=flagutil.NewArrayString("remoteWrite.streamAggr.config","Optional path to file with stream aggregation config for the corresponding -remoteWrite.url. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -remoteWrite.streamAggr.keepInput, -remoteWrite.streamAggr.dropInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrDropInput=flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput","Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrDropInput=flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput","Whether to drop input samples that not matching any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrKeepInput=flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput","Whether to keep all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrKeepInput=flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput","Whether to keep input samples that match any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDedupInterval=flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval before optional aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
@@ -77,14 +76,13 @@ absolute path to all .tpl files in root.
`Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
externalLabels=flagutil.NewArrayString("external.label","Optional label in the form 'Name=value' to add to all generated recording rules and alerts. "+
"In case of conflicts, original labels are kept with prefix `exported_`.")
"In case of conflicts, original labels are kept with prefix 'exported_'.")
dryRun=flag.Bool("dryRun",false,"Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.")
// targets with same address but different alert_relabel_configs are still considered duplicates since it's mostly due to misconfiguration and could cause duplicated notifications.
if _, ok := duplicates[u]; ok {
	if !*suppressDuplicateTargetErrors {
		logger.Errorf("skipping duplicate target with identical address %q; "+
			"make sure service discovery and relabeling is set up properly; "+
"invalid_label":`error evaluating template: template: :1:268: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
"invalid_label":`error evaluating template: template: :1:268: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
ruleResultsLimit=flag.Int("rule.resultsLimit",0,"Limits the number of alerts or recording results a single rule can produce. "+
"Can be overridden by the limit option under group if specified. "+
"If exceeded, the rule will be marked with an error and all its results will be discarded. "+
"0 means no limit.")
ruleUpdateEntriesLimit=flag.Int("rule.updateEntriesLimit",20,"Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
resendDelay=flag.Duration("rule.resendDelay",0,"MiniMum amount of time to wait before resending an alert to notifier.")
@@ -36,6 +38,8 @@ var (
disableAlertGroupLabel=flag.Bool("disableAlertgroupLabel",false,"Whether to disable adding group's Name as label to generated alerts and time series.")
remoteReadLookBack=flag.Duration("remoteRead.lookback",time.Hour,"Lookback defines how far to look into past for alerts timeseries. "+
"For example, if lookback=1h then range from now() to now()-1h will be scanned.")
maxStartDelay=flag.Duration("group.maxStartDelay",5*time.Minute,"Defines the max delay before starting the group evaluation. Group's start is artificially delayed for random duration on interval"+
" [0..min(--group.maxStartDelay, group.interval)]. This helps smoothing out the load on the configured datasource, so evaluations aren't executed too close to each other.")
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" title="The time when the rule was executed">Updated at</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="The time used in execution query request">Execution timestamp</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>
</thead>
@@ -649,7 +658,7 @@
<span class="badge bg-warning text-dark" title="This firing state is kept because of `keep_firing_for`">stabilizing</span>
{% endfunc %}
{% func seriesFetchedWarn(prefix string, r apiRule) %}
{% func seriesFetchedWarn(prefix string, r rule.ApiRule) %}
"See https://docs.victoriametrics.com/victoriametrics/vmauth/#load-balancing for details")
defaultLoadBalancingPolicy=flag.String("loadBalancingPolicy","least_loaded","The default load balancing policy to use for backend urls specified inside url_prefix section. "+
"Supported policies: least_loaded, first_available. See https://docs.victoriametrics.com/victoriametrics/vmauth/#load-balancing")
defaultMergeQueryArgs=flagutil.NewArrayString("mergeQueryArgs","An optional list of client query arg names, which must be merged with args at backend urls. "+
"The rest of client query args are replaced by the corresponding query args from backend urls for security reasons; "+
"see https://docs.victoriametrics.com/victoriametrics/vmauth/#query-args-handling")
discoverBackendIPsGlobal=flag.Bool("discoverBackendIPs",false,"Whether to discover backend IPs via periodic DNS queries to hostnames specified in url_prefix. "+
"This may be useful when url_prefix points to a hostname with dynamically scaled instances behind it. See https://docs.victoriametrics.com/victoriametrics/vmauth/#discovering-backend-ips")
discoverBackendIPsInterval=flag.Duration("discoverBackendIPsInterval",10*time.Second,"The interval for re-discovering backend IPs if -discoverBackendIPs command-line flag is set. "+
streamAggrConfig=flag.String("streamAggr.config","","Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -streamAggr.keepInput, -streamAggr.dropInput and -streamAggr.dedupInterval")
streamAggrKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep all the input samples after the aggregation with -streamAggr.config. "+
"By default, only aggregated samples are dropped, while the remaining samples are stored in the database. "+
streamAggrKeepInput=flag.Bool("streamAggr.keepInput",false,"Whether to keep input samples that match any rule in -streamAggr.config. "+
"By default, matched raw samples are aggregated and dropped, while unmatched samples are written to the remote storage. "+
"See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop all the input samples after the aggregation with -streamAggr.config. "+
"By default, only aggregated samples are dropped, while the remaining samples are stored in the database."+
streamAggrDropInput=flag.Bool("streamAggr.dropInput",false,"Whether to drop input samples that not matching any rule in -streamAggr.config. "+
"By default, only matched raw samples are dropped, while unmatched samples are written to the remote storage."+
"See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDedupInterval=flag.Duration("streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval before optional aggregation with -streamAggr.config . "+
"See also -streamAggr.dropInputLabels and -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
logger.Errorf("resource leak when processing the %s (full query: %s); please report this error to VictoriaMetrics developers",
	expr.AppendString(nil), ec.originalQuery)
}
timerpool.Put(t)
return nil
})
close(seriesCh)