Another batch of documentation improvements
Fix Spelling in:
- Comments in code
- Displayed strings
One change was in a json file used for the anomaly dashboard in docker,
else no other code was changed.
Some Markdown changes, related to standards:
- URLs
- List numbering
- Empty spaces at the end of a line
While working on #9431 there has been introduced 2 bugs related to
indexDB.searchMetricName():
1. During the search the index records are unconditionally placed in
sparse index
2. If search touches index records in both prev and curr indexDBs, there
will be possible cases that metricIDs can be unintentionally removed
using `wasMetricIDMissingBefore()` logic
Additionally, the PR moves the searchMetricName from indexDB and Search
to Storage which simplifies the code and makes it spossible to reuse the
function as-is in enterprise code.
Follow up for #9431.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Removing extDB from indexDB makes prev, curr, and next indexDBs independent.
I.e. the search is performed independently in prev and curr, the results are
then merged.
Additionally, since no search is now performed in extDB:
- all indexDB search methods now return the original maps used for populating
the result, without invermediate conversion to slices.
- `NoExtDB` suffix has been removed from method names
This has been extracted from #8134.
Signed-off-by: Andrei Baidarov <baidarov@nebius.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, if limit was reached for cardinality limiter, vmstorage
started to perform index lookups for any series exceed limit. Since
storage must skip index creation for such series, it's not possible to
cache it. It resulted into opposite effect of cardinality limiter -
instead of reducing resource usage, it increased it instead.
This commit changes cardinality limit calculation from metricID to the
hash from raw metricName. It could slightly increase CPU usage if
cardinality limiter is configured, since hash must be calculated for
each metricName row. But it mitigates excessive CPU and memory usage on
limit hit
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9554
### Describe Your Changes
The commit changes CI behavior:
- Run build in parallel for different os\arch
- Run unit\integration\lint in parallel
- Remove the custom Go cache step in favor of the logic provided in
`actions/setup-go`. The custom cache was used to build key based on
go.sum and makefiles. This logic is preserved.
- Introduce cache for golangci-lint.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Previously, tcp listener perform synchronous proxy protocol header
read during connection accept. It could significantly reduce vmauth
performance and lead to timeout at serving http requests.
This commit changes this logic and performs proxy protocol header
parsing during first Read request from connection or RemoteAddr method
call. It significantly improves performance and reduce possible
bottleneck at connections accept method.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9546/
- Rename WriteRequestUnmarshaller to WriteRequestUnmarshaler
- Add a description to WriteRequestUnmarshaler struct
Review comments
b98e592752 (r163365472)
Follow up on
b98e592752
### Describe Your Changes
By default `zstd.Reader` creates multiple goroutines to process a single
connection:
- It doesn't match cgo behavior, which works synchronously, and creates
a lot more concurrent goroutines (0.5k -> 5k on my workload)
- It results in non-zero `vm_tcpdialer_errors_total{type="read"}` errors
on vmselect because an underlying connection is closed while a goroutine
is still reading from it. The goroutine created by
`zstd.NewReader`/`zstd.Reset`
abb348e4db/lib/handshake/buffered_conn.go (L113-L120)
- vmselect (and vmagent) doesn't benefit from async mode since it has
multiple readers in-use at the same time, which usually exceeds the
number of cpu cores
Partly related to #9218
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This commit introduces new storage API: `/internal/log_new_series`.
It helps to dynamically debug newly created series. Changing `-logNewSeries` value requires storage restart,
which may introduce downtime and is not recommended for production deployments.
In addition, this commit adds flags: `-logNewSeriesAuthKey`, which protects newly added API.
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8879
This change will expose any IPv6 addresses assigned to an instance under
the meta labels:
* `__meta_gce_public_ipv6` -native IPv6 address, globally routed
* `__meta_gce_internal_ipv6` - unique local address (ULA).
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9370
The prompb and prompbmarshal share exactly the same models and provide
marshal and unmarshale capabilities for them. This creates duplication
(changes in one model has to be made in another, case with metadata) and
confusion where for example you compare same looking models but golang
says they are not the same (because of the type).
This commit merge prompbmarshal logic into prompb so the rest of the
code is aligned on prompb models.
Moves samplesPool and labelsPool to WriteRequestUnmarshaller.
Make WriteRequest struct clean from unmarshal logic.
The benchmark shows no significant changes:
$benchstat prompbmarshal.bench prompb2.bench
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb
cpu: Apple M1 Pro
│ prompbmarshal.bench │ prompb2.bench │
│ sec/op │ sec/op vs base │
WriteRequestUnmarshalProtobuf-10 189.2µ ± 5% 190.8µ ± 8% ~ (p=0.579 n=10)
WriteRequestMarshalProtobuf-10 145.3µ ± 7% 143.6µ ± 2% ~ (p=0.143 n=10)
geomean 165.8µ 165.5µ -0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/s │ B/s vs base │
WriteRequestUnmarshalProtobuf-10 50.42Mi ± 5% 49.99Mi ± 8% ~ (p=0.593 n=10)
WriteRequestMarshalProtobuf-10 65.64Mi ± 7% 66.39Mi ± 2% ~ (p=0.143 n=10)
geomean 57.53Mi 57.61Mi +0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/op │ B/op vs base │
WriteRequestUnmarshalProtobuf-10 27.70Ki ± 4% 26.90Ki ± 7% ~ (p=0.190 n=10)
WriteRequestMarshalProtobuf-10 3.267Ki ± 12% 3.273Ki ± 12% ~ (p=0.971 n=10)
geomean 9.514Ki 9.383Ki -1.38%
│ prompbmarshal.bench │ prompb2.bench │
│ allocs/op │ allocs/op vs base │
WriteRequestUnmarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
WriteRequestMarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
This commit introduces new flag: `-storage.idbPrefillStart` with
default value of `1h`. It allows to adjust start time of the prefill
process for the data written into the next indexDB.
By default, VictoriaMetrics starts prefill indexDB at 3 A.M UTC, while
indexDB rotates at 4 A.M UTC. It could be useful to change start time
from 3 A.M. to 1 A.M or 00:00 A.M. It should smooth overall resource
usage.
However, changing value to the number bigger than 4 hours for the
default
installations doesn't make much sense ( like 11 P.M. for the day before
rotation). Since, VictoriaMetrics maintenances daily-indexes and it have
to repopulate it twice.
But it could be useful in conjunction with `-retentionTimezoneOffset`,
which could delay index rotation for the current day and give more time
for the prefill process.
As an example, `-retentionTimezoneOffset=4h` adds an additional 4 hours
to the rotation time and `-storage.idbPrefillStart` could be changed
accordingly.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9393
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
The `groupWatchersCleaner` iterates through all watchers, attempting to
lock them sequentially while holding the global `groupWatchersLock`.
Therefore, a large number of group watchers can cause the
`groupWatchersLock.Lock()` to block for a noticeable period.
Proposing to use `TryLock` as an optimistic case because if the watcher
lock is held, it's more likely that the watcher is still in-use so no
further cleanup is required.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9354#issuecomment-3089108889
The lib/fs.MustCloseParallel() accepts a slice of MustWriter items, which must implement only
a single method - MustWrite(). The previous lib/filestream.MustCloseWritersParallel() was
accepting CloseWriter items, which must implement Write() and Path() methods additionally
to MustClose() method. This was adding artificial restrictions on the applicability
of the MustCloseWritersParallel() method. Remove these restrictions.
This should reduce the time needed for the deletion of VictoriaLogs parts
with big number of files, which are created when wide events are ingested into VictoriaLogs
(e.g. logs with big number of log fields).
This may help improving scalability of VictoriaLogs at systems with big number of CPU cores,
which store data into high-latency storage such as Ceph or NFS.
See https://github.com/VictoriaMetrics/VictoriaLogs/issues/517
- Drop the code needed for asynchronous removal of the directory on NFS shares.
This code was needed when VictoriaMetrics could keep open files after their deletion
or renaming. This is no longer the case after the commit 43b24164ef .
Now files are deleted only after all the readers close them.
This updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
- Unify MustRemoveAll() and MustRemoveDirAtomic() into MustRemoveDir() and MustRemovePath()
functions:
- The MustRemoveDir() deletes the given directory with all its contents, in an "atomic" way:
it creates a special `.delete-this-dir` file in the directory, then removes all its contents
except of this file, and later removes the `.delete-this-dir` file together with the directory
itself. This makes possible easily determining whether the given directory needs to be deleted
after unclean shutdown - if it contains the `.delete-this-dir` file or if it is empty, it must be deleted.
Add IsPartiallyRemovedDir() function, which can be used for detecting whether the given directory must be removed
at starup.
Previously the MustRemoveDirAtomic() was using a "trick" for atomic directory removal: it was "atomically" renaming
the directory to a temporary directory with '.must-remove.' marker in the directory name, and after that it
was removing the renamed directory. On startup all the directories with the `.must-remove.` marker were deleted
if they are left after unclean shutdown. This "trick" doesn't work for NFS and object storage such as S3,
since these storage systems do not support atomic renaming of directories with multiple entries inside.
The new MustRemoveDir() function doesn't use this "trick", so it can be safely used in NFS and S3-like storage systems.
This is based on the pull request from @func25 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9486/files .
- The MustRemovePath() deletes the given file or an empty directory.
- Delete the existing parts and partitions at startup if they were partially deleted.
- Consistently use fs.MustRemoveDir() and fs.MustRemovePath() instead of os.RemoveAll() across the codebase.
This reduces the amounts of bolierplate code related to error handling.
- Consistently use fs.MustWriteSync() instead of os.WriteFile() across the codebase.
- Consistently use fs.IsPathExist() for checking whether the given path exists on the filesystem.
- Consistently use fs.MustWriteAtomic() for atomic store of the serialized state into file.
- Consistently panic with 'BUG:' prefix on unexpected errors.
- Read and write the state file contents in one go. This simplifies the code for loading and storing the state.
This shouldn't increase memory usage too much, since the parsed state is already stored in RAM.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9074
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9102
The following renamings were requested in #8134 to be done in a separate
PR:
- Rename putTSIDToCache(s) to storeTSIDToCache(s). This is because
get/put prefixes are normally reserved for pools and ref counters. This
unclear though whether name `getTSIDFromCache` is okay in this context.
- Rename `addPartitionNolock` to `addPartitionLocked`, because the
`Locked` suffix is used everywhere else
- Rename deleteMetricIDs to saveDeletedMetricIDs, because no deletion is
actually happening. The deleted metric ids are still present in the
index. One needs to filter the deleted metric ids out after the
retrieval.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, it was not possible to compile netBSD binary due to missing OS constrains at lib/fs and lib/filestream packages.
This commit fixes it by:
* apply proper constrains at lib/filestream
* Introduce statfs_t and statfs() to abstract unix internals: NetBSD
needs to use unix.Statvfs_t and unix.Statvfs() unlike other Unix-es.
* apply proper constrain for vmctl terminal package
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9473
The newly created data part could become missing after unclean shutdown (such as hardware power off),
since the contents of the parent directory wasn't synced to disk before storing the newly created data part in the parts.json file.
Fix this by syncing the parent directory contents before storing the newly created part in the parts.json file.
This commit is based on https://github.com/VictoriaMetrics/VictoriaLogs/pull/507
This commit adds tmp inmemory and data blocks buffers for
index search requests. It allows to reduce memory allocations on block
cache misses. Since block cache puts block into cache only on after
configured number of cache misses.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9324
This commit respects online CPU count for process_cpu_cores_available metric. Since CPUQuota may exceed it.
Detailed description:
There might be a case when `getCPUQuota()` returns a value bigger than
available logic CPU cores. In this case the lower value should be used.
These changes also reflect upcoming go1.25 changes
https://tip.golang.org/doc/go1.25#container-aware-gomaxprocs
> If the CPU bandwidth limit is lower than the number of logical CPUs
available, GOMAXPROCS will default to the lower limit
In practice that happens with [CPU
Manager](https://kubernetes.io/blog/2022/12/27/cpumanager-ga/) enabled.
It requires setting `/sys/devices/system/cpu/online` which shows the
total number of cores on a node. The container doesn't have any cgroup
limits, but it's pinned to a limited subset of cores.
```
root@vmselect:/# cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
-1
root@vmselect:/# cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000
root@vmselect:/# cat /sys/devices/system/cpu/online
0-255
```
go1.25 and go.uber.org/automaxprocs don't look into
`/sys/devices/system/cpu/online`, VictoriaMetrics does
The chunkedbuffer is released twice, leading to concurrent use of the
buffer after it's acquired from a pool by two different goroutines.
Issue was introduced at v1.115.0 at the following commit 5b87aff830
Add unit tests that verify filesystem hierarchy of the indexdb after
opening the storage and after rotation. The test have been originally
added in #8134 but are still applicable to the current version and will
also reduce the diff.
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
- Move `prefetchMetricNames`, `SearchMetricNames`, and
`SearchLabelValues` from `Storage` to `indexDB`
- Rename `searchLabelValuesWithFiltersOnTimeRange` to
`searchLabelValuesOnTimeRange`
- Rename `searchLabelValuesWithFiltersOnDate` to
`searchLabelValuesOnDate`
Extracted from #8134.
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>