This change introduces a helper `MustStartDefaultRWVmagent` that by
default sets `-remoteWrite.flushInterval=50ms`. This helper makes it
easier to setup RW tests as all of them rely on frequent flushes. So
instead of overloading the flag, we can use dedicated helper for that.
This helper was added after newly added RW test became flaky because it
didn't have `-remoteWrite.flushInterval=50ms` set.
---------
Failing test
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/25446725004/job/74769752869#step:5:71
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit adds possibility to omit tenantID in the URL path. In this case,
tenantID will be fetched from HTTP headers `AccountID` and `ProjectID`.
If headers are missing too, then default `0:0` tenantID is used.
This functionality can be enabled only if -enableMultitenantHandlers
cmd-line flag was set to vminsert, vmselect or vmagent.
Motivation: this change makes VM configuration for multienancy
consistent with VL configuration - see
https://docs.victoriametrics.com/victorialogs/#multitenancy. And keeps
backward compatibility in the same time.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4241
Add the support of storage and retrieval of samples with future
timestamps as requested in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/827
What to expect:
- By default, the max future timestamp is still limited to `now+2d`. To
change it, set the `-futureRetention` flag in `vmstorage`. The max flag
value is currently limited to `100y`. It can be extended if we see a
demand for this, but it can't be more than `~ 290y` due to how the time
duration is implemented in Go. The flag value can't be less than `2d`.
- downsampling and retention filters (available in enterprise edition)
are currently not supported for future timestamps
- If `vmstorage` restarts with a smaller value of `-futureRetention`
flag, any future partitions that are outside the new future retention
will be automatically deleted.
- Data ingestion, data retrieval, backup/restore, timeseries (soft)
deletion, and other operations work with future timestamps the same way
as with the historical timestamps.
- In the cluster version, the affected binaries are `vmstorage` and
`vmselect`. This means that `vmselect` version must match `vmstorage`
version if you want to query future timestamps. `vminsert` was not
affected, so its version can be a lower one.
- If you downgrade the `vmstorage`, the data with future timestamps will
remain on disk and memory (per-partition caches) but won't be available
for querying.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Previously
- `GetData` in the OpenTSDB client was returning empty `Metric{}` with
`nil` error for several conditions (multiple series returned, aggregate
tags present, `modifyData` failures), causing `vmctl opentsdb` to
silently drop series during migration
This commit changes these silent return paths to return proper errors with
descriptive messages including the query string, so operators can detect
and diagnose partial migrations.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10797
Fix app tests:
1. Sync code between vmsingle and vmcluster: it must be the same because
apptest does not differentiate between branches, it just runs pre-built
binaries
2. Simplify range queries in backup/restore test so that it does not
depend on the interval between samples to work correctly.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, vmselect in cluster-native mode could return partial responses to upstream vmselect.
Since upstream vmselect expects full responses (mimicking vmstorage behavior),
partial responses must be disabled in cluster-native mode.
This prevents incomplete responses from being cached at the upstream vmselect level.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10678
Automatically set daily and hourly series limits to `MaxInt32` when `remoteWrite.maxHourlySeries` or `remoteWrite.maxDailySeries` is set to `-1`.
This change addresses a usability issue with the cardinality limiter. Users may want to enable the limiter to observe its metrics before deciding on an appropriate limit. However, the underlying bloom filter only supports `int32`, so setting large values can lead to overflow.
With this PR:
* Setting either flag to `-1` is treated as “no practical limit” and internally mapped to `math.MaxInt32`
* Values exceeding `int32` are safely clamped to `MaxInt32` to prevent overflow
This allows users to enable the limiter for estimation purposes without risking invalid configurations or runtime issues.
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9614
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Implemented dedicated thanos migration mode for vmctl to migrate data from Thanos installations to VictoriaMetrics.
Key features:
1. Raw and downsampled blocks support: Reads both raw blocks
(resolution=0) and downsampled blocks (5m/1h resolution) directly from
Thanos snapshots
2. All aggregate types: Imports count, sum, min, max, and counter
aggregates from downsampled blocks as separate metrics with resolution
and type suffixes (e.g., metric_name:5m:count)
3. Dedicated flags: Uses `--thanos-*` prefixed flags (--thanos-snapshot,
--thanos-concurrency, --thanos-filter-time-start,
--thanos-filter-time-end, --thanos-filter-label,
--thanos-filter-label-value, --thanos-aggr-types)
4. Selective aggregate import: Use `--thanos-aggr-types` to import only
specific aggregates
Usage:
```
vmctl thanos --thanos-snapshot /path/to/thanos-data --vm-addr http://victoria-metrics:8428
```
Closes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9262
Signed-off-by: Dmytro Kozlov <d.kozlov@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Disabling is done by making the the handlers for `/tags/tagSeries` and
`/tags/tagMultiSeries` to return `501 (Not Implemented)` status code
along with the error message saying that the API has been disabled and
will be removed in future.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10544.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Cluster apptests failed from time to time with the following error:
```
timed out while waiting for inserted rows to be sent to vmstorage
cluster
```
due to incorrect calculation of inserted row count before and after
insertion. This PR fixes it by putting the "before" count calculation
before the send() operation.
Previosly, extra filters were ignored for
`/api/v1/label/vm_account_id/values` or
`/api/v1/label/vm_project_id/values` calls. In result, even if user's
visibility was limited by applying
`?extra_filters[]={vm_account_id="1"}` param they could get the list of
all available tenants in the system.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d2a033453e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10196
Prefer the non StaleNaN value when both StaleNaN and non-StaleNaN samples share the timestamp during deduplication(downsampling). The scenario can occur when:
1. Multiple vmagent instances scrape the same target(without -promscrape.cluster.name flag), one instance fails to scrape due to issues such as network, while others succeed.
2. Multiple vmalert instances evaluate the same recording rule, with one instance receiving a partial response while others receive a complete response.
In both cases, since the samples share the same timestamp and represent the metric state at that moment,
the non-StaleNaN value is entirely valid, whereas the StaleNaN could be caused by other unknown issues.
Therefore, it is reasonable to prioritize the non-StaleNaN value.
This commits adds storage part and cluster RPC methods for metrics metadata.
Key concepts:
* vmstorage persists metadata in-memory only.
* vmstorage evicts metadata records older than 1 hour.
* vmstorage stores only the last value of metadata for time series
metric name.
* vminsert opens an additional TCP connection to the vmstorage for
metadata write requests.
* vmselect doesn't support `limit_per_metric_name`.
This feature is available optional and must be enabled via flag - `-enableMetadata` provided to vminsert/vmsingle.
Fixes github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
This reverts commit ccf97a4143.
reason for revert: this change may break tests, which expect that ServesMetrics.GetMetric() fails
when the given metric doesn't exist in the output.
It is better to add 'TryGetMetric() (float64, bool)' function, which would return '(0, false)'
when the given metric doesn't exist, so the caller could decide what to do next.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Previously, `GetMetric` do `t.Fatalf` immediately when the target metric
not exist in `/metrics` page.
However, some metrics may start to appear after the process has been
running for a while. `t.Fatalf` invalidates the retry mechanism of
assertions, if the metric is not found the first time, the test case
will terminate.
This commit request changes `t.Fatalf` to `t.Logf` (instead of `t.Errorf`,
because error output may be considered a test case failure in some
scenarios).
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Another batch of documentation improvements
Fix Spelling in:
- Comments in code
- Displayed strings
One change was in a json file used for the anomaly dashboard in docker,
else no other code was changed.
Some Markdown changes, related to standards:
- URLs
- List numbering
- Empty spaces at the end of a line
Previously, vmagent treated differently the following configuration:
1) ./bin/vmagent --remoteWrite.url=url-0 --remoteWrite.url=url-1 --remoteWrite.disableOndiskQueue
and
2)./bin/vmagent --remoteWrite.url=url-0 --remoteWrite.url=url-1 --remoteWrite.disableOndiskQueue=true,true
In first case, it could produce duplicates and blocks ingestion requests if one of remote write targets were not accessible.
In second case, it implicitly added --remoteWrite.dropSamplesOnOverload as true and silently dropped samples for inaccessible target.
This commit treat this configuration as the same and silently drop samples on both cases to mitigate possible duplicates.
It's expected, that vmagent provides delivery guarantees, only if it has a single remote write target, when flag remoteWrite.disableOndiskQueue=true is set.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9565
### Describe Your Changes
Do not check vmauth_config_last_reload_success_timestamp_seconds since
it may contain the timestamp < time.Now() due to how lib/fasttime works.
Instead, compare the number of config reloads.
follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9369 and
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9572
Also, split the config update and reload into two separate functions.
master:
```
$gotest -race ./apptest/tests/ -run=TestSingleVMAuthRouterWithInternalAddr -count=40
ok github.com/VictoriaMetrics/VictoriaMetrics/apptest/tests 90.176s
```
pr:
```
$gotest -race ./apptest/tests/ -run=TestSingleVMAuthRouterWithInternalAddr -count=40
ok github.com/VictoriaMetrics/VictoriaMetrics/apptest/tests 46.130s
```
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Fix flaky integration test `TestSingleVMAuthRouterWithAuth`.
The flakiness is caused by the
`vmauth_config_last_reload_success_timestamp_seconds` metric, which
reports time with second-level precision.
Update the test to account for this when verifying that the config
reloads correctly.
The prompb and prompbmarshal share exactly the same models and provide
marshal and unmarshale capabilities for them. This creates duplication
(changes in one model has to be made in another, case with metadata) and
confusion where for example you compare same looking models but golang
says they are not the same (because of the type).
This commit merge prompbmarshal logic into prompb so the rest of the
code is aligned on prompb models.
Moves samplesPool and labelsPool to WriteRequestUnmarshaller.
Make WriteRequest struct clean from unmarshal logic.
The benchmark shows no significant changes:
$benchstat prompbmarshal.bench prompb2.bench
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb
cpu: Apple M1 Pro
│ prompbmarshal.bench │ prompb2.bench │
│ sec/op │ sec/op vs base │
WriteRequestUnmarshalProtobuf-10 189.2µ ± 5% 190.8µ ± 8% ~ (p=0.579 n=10)
WriteRequestMarshalProtobuf-10 145.3µ ± 7% 143.6µ ± 2% ~ (p=0.143 n=10)
geomean 165.8µ 165.5µ -0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/s │ B/s vs base │
WriteRequestUnmarshalProtobuf-10 50.42Mi ± 5% 49.99Mi ± 8% ~ (p=0.593 n=10)
WriteRequestMarshalProtobuf-10 65.64Mi ± 7% 66.39Mi ± 2% ~ (p=0.143 n=10)
geomean 57.53Mi 57.61Mi +0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/op │ B/op vs base │
WriteRequestUnmarshalProtobuf-10 27.70Ki ± 4% 26.90Ki ± 7% ~ (p=0.190 n=10)
WriteRequestMarshalProtobuf-10 3.267Ki ± 12% 3.273Ki ± 12% ~ (p=0.971 n=10)
geomean 9.514Ki 9.383Ki -1.38%
│ prompbmarshal.bench │ prompb2.bench │
│ allocs/op │ allocs/op vs base │
WriteRequestUnmarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
WriteRequestMarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
The import aliases may complicate maintenance of the code in the long term
if they aren't used consistently, e.g. if one file imports the apptest under the default name
while the other file imports the apptest under the "at" name.
The aliases also complicate grepping the code by apptest.* or prompbmarshal.* .
- Drop the code needed for asynchronous removal of the directory on NFS shares.
This code was needed when VictoriaMetrics could keep open files after their deletion
or renaming. This is no longer the case after the commit 43b24164ef .
Now files are deleted only after all the readers close them.
This updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
- Unify MustRemoveAll() and MustRemoveDirAtomic() into MustRemoveDir() and MustRemovePath()
functions:
- The MustRemoveDir() deletes the given directory with all its contents, in an "atomic" way:
it creates a special `.delete-this-dir` file in the directory, then removes all its contents
except of this file, and later removes the `.delete-this-dir` file together with the directory
itself. This makes possible easily determining whether the given directory needs to be deleted
after unclean shutdown - if it contains the `.delete-this-dir` file or if it is empty, it must be deleted.
Add IsPartiallyRemovedDir() function, which can be used for detecting whether the given directory must be removed
at starup.
Previously the MustRemoveDirAtomic() was using a "trick" for atomic directory removal: it was "atomically" renaming
the directory to a temporary directory with '.must-remove.' marker in the directory name, and after that it
was removing the renamed directory. On startup all the directories with the `.must-remove.` marker were deleted
if they are left after unclean shutdown. This "trick" doesn't work for NFS and object storage such as S3,
since these storage systems do not support atomic renaming of directories with multiple entries inside.
The new MustRemoveDir() function doesn't use this "trick", so it can be safely used in NFS and S3-like storage systems.
This is based on the pull request from @func25 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9486/files .
- The MustRemovePath() deletes the given file or an empty directory.
- Delete the existing parts and partitions at startup if they were partially deleted.
- Consistently use fs.MustRemoveDir() and fs.MustRemovePath() instead of os.RemoveAll() across the codebase.
This reduces the amounts of bolierplate code related to error handling.
- Consistently use fs.MustWriteSync() instead of os.WriteFile() across the codebase.