This is follow-up for c5713a09d3
Originally, dateMetricID cache was fully rotate every 20 minutes. It
made daily-index pre-creation less efficient and caused CPU usage spikes
for index records lookup at midnight.
storage pre-fills index records for the next day in 1 hour before night.
But this rotation made only last 20 minutes before midnight visible in
the cache.
This commit changes rotation period from 20 minutes to 2 hours ( 1 hour
tick interval). While
it could slighlty increase cache memory usage ( in practice it shouldn't
be noticeable). It prevents from CPU usage spikes.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
Issue was introduced at d6ef8a807b commit.
Due to variable shadowing, if filter matched more than 100_000 metricIDs, it's fallback to the indexDB scan.
But because of type, `filter` value was not properly updated. And it triggered incorrect results.
This commit fixes this typo and adds test to verify this case.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10294
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10196
Prefer the non StaleNaN value when both StaleNaN and non-StaleNaN samples share the timestamp during deduplication(downsampling). The scenario can occur when:
1. Multiple vmagent instances scrape the same target(without -promscrape.cluster.name flag), one instance fails to scrape due to issues such as network, while others succeed.
2. Multiple vmalert instances evaluate the same recording rule, with one instance receiving a partial response while others receive a complete response.
In both cases, since the samples share the same timestamp and represent the metric state at that moment,
the non-StaleNaN value is entirely valid, whereas the StaleNaN could be caused by other unknown issues.
Therefore, it is reasonable to prioritize the non-StaleNaN value.
Previously, the first 20 raw samples were used for calculation.
But compare to the first 20 samples, the last 20 samples represent the latest state of the metrics,
so the lookbehind window calculated from them should be more accurate when applied to the most recent samples,
resulting in better query results for recent time ranges.
For example,if the scrape interval changes at day4, and the query range is set to last 7 days.
Applying the window derived from the first 20 samples(the old scrape interval) to new samples could result in consistently incorrect results from day4 through day11.
Conversely, applying the window derived from the last 20 samples (the new scrape interval) could lead to incorrect results for [day0-day4),
which are old states and generally less important.
This pull request does not address any specific bug, but change the general behavior, so there is no changelog.
Inspired by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280, but not the fix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280.
This commit reduces default value for
`storage.vminsertConnsShutdownDuration` flag from `25s` to `10s`
seconds.
This change should help to reduce probability of ungraceful storage
shutdown at Kubernetes based environments, which has 30 seconds default
graceful termination period value (terminationGracePeriodSeconds).
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
* Fixed a heatmap crash that happened when all visible cells had the
same value (division by zero produced invalid color indices).
* Improved how histogram buckets are chosen for display when the data is
very sparse, so the heatmap doesn’t look empty or drop the only
meaningful bucket.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10240
Without measuring this, we have a blind spot. Exposing it as a metric
improves visibility and should save time during future debugging
sessions.
Inspired by review commit
c9596a0364 (r173621968)
**Problem**
* VMUI had two tenant ID sources:
* URL path: `/select/<accountID>/vmui/`
* Query param: `tenantID`
* These could differ, causing confusion and inconsistent behavior.
**Solution**
* Removed the legacy `tenantID` query parameter.
* Use the URL path as the single source of truth for tenant ID.
* Changing the tenant in the UI now updates the URL path.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10232
### Describe Your Changes
Previously, we had to manually update flags in documentation whenever we
made flag-related changes in source code. Someone did it by hand, others
compiled enterprise binaries, executed them with `-help` flag, and
copied and pasted output to the documentation.
In https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9632 a
command `make docs-update-flags` was introduced. It automates the whole
process. It compiles binaries, runs `-help,` and syncs output to changes
automatically.
Now, we can **omit updating doc flags in the PR** and do it once before
releasing a new version.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
In alerting rules, annotations are only attached to alert messages that
are sent to the notifier (such as Alertmanager). These annotations
typically contain human-readable information, such as instructions for
resolving the alert.
In [replay
mode](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules-backfilling),
vmalert does not send alert messages to the notifier at all(no notifier
is configured), as these alerts are outdated. Therefore, it does not
need to template the annotations in this mode.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10262
which tracks the number of requests to the query rollup cache
As described in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117, when
retrieving cached data from the rollup result cache, there can be mixed
`get()` and `getBig()` calls to the underlaying fastcache. And it's
unpredictable how many times `getBig()` will call `get()`, so the
metrics from fastcache cannot be used to indicate query cache miss
ratio.
Exposing a new counter `vm_rollup_result_cache_requests_total` to track
the number of requests to the query rollup cache, together with the
existing `vm_rollup_result_cache_miss_total`, allows for monitoring the
rollup cache miss rate per query (or subquery), which is more
user-facing.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117
related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5056
Previously VictoriaMetrics: Single-node version used
`http://<victoriametrics-addr>:8428/zabbixconnector/api/v1/history`
resulted in a missing path error. The issue was introduced during changes back-porting from vmagent.
Additionally, the http response was fixed. Zabbix expects a 200 status
code during normal operation.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10214
Since the cache may be reset too often, using the sizeBytes as an
indicator that this is the first met indexDB to collect tfssCache stats
is unreliable because it often can be zero all indexDB instances. Use
Requests metric instead because it is never reset.
Follow-up for #10204.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This fixes the following corner case: if all instances of a cache have
zero size, the stats won't be set at all. This results in some weird
graphs if the cache is reset very often (such as tfssCache): the cache
sizeMaxBytes alternates between the actual value and zero.
Follow-up for f62893c151
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Commit 5a587f2006 was not properly ported
to the single node branch. Since single node is able to perform both
promscrape and self-scrape, it's required to add metadata add methods to
those paths.
This commit fixes missing metadata add to the storage.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10175