Previously, the `lrucache` ttl was hardcoded to 3 minutes and the code comment
said it was enough for most queries that are sent to vmstorage repeatedly. But
it appears that it is not always the case.
`indexDB` has this `tagFilters loop cache` that is used in index search
optimizations (see `getMetricIDsForDateAndFilters()`). Until recently, this
cache was implemented with `workingsetcache`. In
[10154](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10154), this
implementation was changed to `lrucache`. After this change, some users reported
huge CPU utilization increase in vmstorage (see
[10297](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10297)).
`workingsetcache` evicts an entry after two rotations that happen every 30
minutes. I.e. the entry ttl is 1h since the last time it was retrieved from the
cache. Hence the assumption that `lrucache` causes the CPU to raise because of
its too aggressive eviction policy.
Reverting back to `workingsetcache` helped but `lrucache` is still preferred
because it occupies less memory. Since `lrucache` is preferred, its ttl needs to
be increased. Instead changing the hardcoded value, the ttl was made
configurable.
The `tagFilters loops cache` was configured to have ttl of 1h to match the
default behavior of `workingsetcache`.
Although no one complained about this yet, the ttl was also increased for
`tfssCache` because previously it was also implemented with `workingsetcache`.
Finally, the `regexpCache` and `prefixesCache` were also configured to have 1h
ttl because these caches are also during the data retrieval.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, the Kafka consumer in vmagent committed offsets per message
(manual commit). At high message rates, this could overload the commit
path (coordinator, __consumer_offsets topic, and network).
This commit introduces time-based manual commits with a controlled window:
* enable.auto.commit remains false by default.
* After a successful TryPush (data accepted into the buffer before the
vmagent queue/backend), vmagent adds the message to pending offsets.
* Offsets are committed periodically (every second), as well as during
shutdown and partition rebalance.
This keeps the commit point tied to TryPush (stronger guarantees than
auto-commit) while significantly reducing commit QPS.
Auto-commit is also time-based, but it advances offsets based on poll()
delivery rather than application-level processing. This means offsets
may be committed before data is actually accepted by the vmagent
pipeline, slightly increasing the risk of data loss on crash or restart.
This change does not make the Kafka consumer fully transactional
end-to-end. Buffers in vmagent/vminsert/vmstorage still imply possible
data loss on hard stops. However, it provides stronger guarantees than
auto-commit, since commits are based on TryPush rather than poll().
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10395
Keeping such byte slices in the pool may increase memory usage when processing a small share of requests
with much bigger sizes than the average processed request.
This should help reducing memory usage at https://github.com/VictoriaMetrics/VictoriaLogs/issues/1042
Respect the default value of http.DefaultTransport.Proxy. Previously,
it could be unintentionally overridden with a nil value.
This commit aligns Proxy configuration across all created transports.
Previously, default `Proxy` was unconditionally replaced with config value, which could be nil.
It made impossible to use default http client proxy env variables.
This commit adds check in oauth http client builder that only overwrites the
transport proxy if a custom proxy url function is defined.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10385
Bug was introduced at 5a587f2006, while porting change from cluster branch.
This commit properlyslice `mms
[]metricsmetadata.Row` slice . Previously, every WriteMetadata call triggered a
slice allocation.
This shouldn't significantly impact overall performance, so I haven't
included benchmarks.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10392
This should reduce cpu utilization while still removing the storage
connection saturation.
Follow-up for 6bc809813b (#10346)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This is needed in order to allow using lib/pushmetrics for vmctl as it does not use go native flags.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
app/vmctl: add metrics for the migrations
- add flags to allow setting up metrics push
- add metrics to track progress of the migration for all modes
- add metrics for generic backoff and limiter packages
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The current implementation has a bottleneck – a single mutex to access
`prev`/`next` metric sets. Each rotation results in storage utilization
spikes since lock-free `curr` is almost empty, and cache needs to
promote metrics from `prev` to `next`.
This is an attempt to reduce contention by spliting cache into separate
shards.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10367
This commit adds a new flag `disableScheduledBackups` for `vmbackupmanager. Which disables any scheduled backups. It could be useful to keep vmbackupmanager running and serving API calls only.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10364
The new `ALERTS_FOR_STATE` may be retrieved during restore when:
1. a group contains multiple heavy rules, alerting rule A may have
already been executed and its state metrics successfully uploaded to the
datasource by the time all rules within the group have finished
executing;
2. the datasource makes data queryable very quickly, for instance, when
users configure a small value for `-search.latencyOffset`.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10335
Previously, running a vmalert with an empty notifier.url does not produce an error and leads to vmalert which will never send a notification successfully.
This commit properly validates notifier.url empty value.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10355
After changing the scrape interval from a smaller value (e.g., 30s) to a larger value (e.g., 60s), the changes() function starts to yield non-zero values even when the underlying values have not changed.
This commit keeps unchanged series values when a large gap occurs between samples or when the scrape interval decreases.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280
Commit 83da33d8cf introduced a check to
detect directories partially removed via IsPartiallyRemovedDir.
However, the check was performed using the full path, while de.Name()
returns only the current entry name (without the path). As a result, the
check always succeeded and the function did not behave as intended.
This flag allows disabling the mincore() syscall introduced in
50fc48ac47. On older ZFS filesystems,
mincore() may trigger a bug related to ZFSÕs own in-memory cache. Mixing
reads from mmap()ed files and direct disk reads can corrupt the ZFS ARC
cache and lead to data read corruption.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10327
Commit f62893c151 added an attempt to fix
stats for `tagFiltersCache`, `metricIDCache`, and `dateMetricIDCache`.
Instead of aggregated stats, it returned the largest cache stats by
cache size.
This resulted in possible counter decreases for counter metric types. It
made aggregated metrics less usable.
This commit changes cache stats aggregation by metric type:
* size-related gauge metrics are returned based on max cache size usage
* metric counters are reported as a sum of all counters
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10275
### Describe Your Changes
In the Cardinality Explorer, when filtering, a "Percentage from total"
stat appears. This stat is documented as "the share of these series in
the total number of time series".
This works for pages for individual metrics. However, if using a filter
that returns *multiple* metrics, the value of "Percentage from total"
will only account for the size of the *first* metric. One can have a
filter that returns, say, 10k time series (out of, say, 100k in the VM
cluster), and if the first metric returned has 1k time series, then
"Percentage from total" will show 1%, not 10%.
This PR fixes that calculation.
Credits to @PleasingFungus for the original fix (PR #10288).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Co-authored-by: PleasingFungus <PleasingFungus@users.noreply.github.com>
Commit e35a9a366c changed the order of wg.Add calls in the Graphite transform package. Previously, all wg.Add calls were made upfront, but after that change it became possible for wg.Wait to exit earlier than expected.
This commit fixes the issue by spawning all background goroutines first and starting the goroutine that calls wg.Wait afterward.
Previously vmagent had remoteWrite.queues as a global setting that was be applied to every persistentqueue. However, it could be useful to specify remotewrite.queues per remotewrite.url.
Considering each rw might have different workload(latency, throughput, and availability), so it will be more flexible for tuning if we can set remoteWrite.queues separately for specific rw.
This commit, makes `-remoteWrite-queues` configurable per remoteWrite.url.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10270
Initial implementation of searchTenantsOnDate used a index scan for the given prefix (index prefix + tenant + date).
It did not check whether the date prefix was actually outside the current date.
This commit adds the missing date check and makes the tenant search results accurate.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10295
These metrics must be incremented when the request couldn't be processed because of the configured per-user concurrency limit.
The commit 76176ac1d3 moved the counter increase to the place when the current request
is put in the wait queue because of the concurrency limit is reached. This is incorrect, since such requests
can still be successfully processed during -maxQueueDuration . This also contradicts the docs at https://docs.victoriametrics.com/victoriametrics/vmauth/#concurrency-limiting
There is a small practical sense in counting the number of times the concurrency limit is reached,
while the request is successfully processed during the -maxQueueDuration after that.
Add missing alerting rule for rejected unauthorized requests because of the concurrency limit.
Add missing grouping by instance for per-user counter of rejected queries because of the concurrency limit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
This simplifies troubleshooting by investigating the vm_log_messages_total metric
when logs are unavailable. The logs may be unavailable when the -loggerLevel command-line
flag is set to value other than INFO. The logs may be unavailable when clients
use Monitoring of Monitoring service ( https://victoriametrics.com/products/mom/ ),
which provides metrics, but doesn't provide logs from VictoriaMetrics components
running at the client side.
Add `is_printed` label to the `vm_log_messages_total` metric in order to detect whether
the given log has been suppressed or printed.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10304
While at it, make more readable the description for the TooManyLogs alert,
which is based on the vm_log_messages_total metric.
Also return back the `level!="info"` instead of `level="error"` filter
in the query for this alerting rule, in order to be consistent with queries
at the official dashboards for VictoriaMetrics components.
TODO: investigate too high warnings rate at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2760
and fix it at the source of these warnings instead of modifying the query
for the TooManyLogs alert.
Add fallback to legacy indexDB for stats search
After introducing the new partition index
(f97f627f79), storage stopped returning
stats for date ranges outside the partition index. This made the
migration backward incompatible, as there was no way to retrieve stats
for dates prior to the migration.
This change adds a fallback to the legacy indexDB search when the status
search on the current partition index returns zero series.
This is an imperfect solution: due to tag filters, the TSDB status
search may legitimately return empty results. However, the additional
overhead is small and acceptable.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10315
The vmanomaly has been moved to a separate repository. This means that the functionality related to vmanomaly is no longer needed in the app/vmui located in the VictoriaMetrics repository.
This commit removes all the functionality and unnecessary abstractions related to vmanomaly from the app/vmui repository. This should help improving long-term maintenance of the code.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9755
Wait for the first byte from the reader passed to ReadUncompressedData()
before obtaining concurrency token from -maxConcurrentInserts and before allocating
buffers needed for reading the request body in memory.
This should limit the amounts of memory needed for processing a big number of concurrent
HTTP requests via Prometheus remote_write protocol and via other HTTP-based data ingestion
protocols where every request contains a single block of data to process.
Now the maximum memory usage is limited by -maxConcurrentInserts, while the server
can process much more than -maxConcurrentInserts concurrent HTTP requests by pausing the excess requests.
Previously the memory usage wasn't limited by -maxConcurrentInserts, since buffers for reading the data
from concurrent connections were allocated before obtaining the concurrency token from -maxConcurrentInserts.
While at it, use protoparserutil.ReadUncompressedData() in lib/protoparser/promremotewrite/stream.Parse()
for the sake of consistency across parsers for protocols, which send the full block of data per every incoming HTTP request.
This is a follow-up for the commit d107dee9c7
This is follow-up for c5713a09d3
Originally, dateMetricID cache was fully rotate every 20 minutes. It
made daily-index pre-creation less efficient and caused CPU usage spikes
for index records lookup at midnight.
storage pre-fills index records for the next day in 1 hour before night.
But this rotation made only last 20 minutes before midnight visible in
the cache.
This commit changes rotation period from 20 minutes to 2 hours ( 1 hour
tick interval). While
it could slighlty increase cache memory usage ( in practice it shouldn't
be noticeable). It prevents from CPU usage spikes.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
Issue was introduced at d6ef8a807b commit.
Due to variable shadowing, if filter matched more than 100_000 metricIDs, it's fallback to the indexDB scan.
But because of type, `filter` value was not properly updated. And it triggered incorrect results.
This commit fixes this typo and adds test to verify this case.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10294
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10196
Prefer the non StaleNaN value when both StaleNaN and non-StaleNaN samples share the timestamp during deduplication(downsampling). The scenario can occur when:
1. Multiple vmagent instances scrape the same target(without -promscrape.cluster.name flag), one instance fails to scrape due to issues such as network, while others succeed.
2. Multiple vmalert instances evaluate the same recording rule, with one instance receiving a partial response while others receive a complete response.
In both cases, since the samples share the same timestamp and represent the metric state at that moment,
the non-StaleNaN value is entirely valid, whereas the StaleNaN could be caused by other unknown issues.
Therefore, it is reasonable to prioritize the non-StaleNaN value.
Previously, the first 20 raw samples were used for calculation.
But compare to the first 20 samples, the last 20 samples represent the latest state of the metrics,
so the lookbehind window calculated from them should be more accurate when applied to the most recent samples,
resulting in better query results for recent time ranges.
For example,if the scrape interval changes at day4, and the query range is set to last 7 days.
Applying the window derived from the first 20 samples(the old scrape interval) to new samples could result in consistently incorrect results from day4 through day11.
Conversely, applying the window derived from the last 20 samples (the new scrape interval) could lead to incorrect results for [day0-day4),
which are old states and generally less important.
This pull request does not address any specific bug, but change the general behavior, so there is no changelog.
Inspired by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280, but not the fix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10280.
This commit reduces default value for
`storage.vminsertConnsShutdownDuration` flag from `25s` to `10s`
seconds.
This change should help to reduce probability of ungraceful storage
shutdown at Kubernetes based environments, which has 30 seconds default
graceful termination period value (terminationGracePeriodSeconds).
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
* Fixed a heatmap crash that happened when all visible cells had the
same value (division by zero produced invalid color indices).
* Improved how histogram buckets are chosen for display when the data is
very sparse, so the heatmap doesn’t look empty or drop the only
meaningful bucket.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10240
It is better to delete the snapshot, since the client is no longer interested in it.
This should prevent from creating many unused snapshots when clients cancel creating snapshots
because of timeouts. This is the real production case from one of VictoriaMetrics users:
the disk IO subsystem became very slow, so creating a snapshot took a lot of time, so vmbackup
was canceling creating the snapshot because of the timeout. But vmstorage was still continue
creating the snapshot. This resulted in the increasing number of created but unused snapshots.
Without measuring this, we have a blind spot. Exposing it as a metric
improves visibility and should save time during future debugging
sessions.
Inspired by review commit
c9596a0364 (r173621968)
indexDB mergeset has an edge for single item inmemoryBlock. It stores
such items blocks in-memory at blockheader firstItem. So there is no
need to perform on-disk read operations and storing copy of it at cache.
It also may result in incorrect search results, inmemoryBlock with a
single item has always zero index block offset. Which causes collisions
if it's cached with the next index block at part.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10239
Probably fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
**Problem**
* VMUI had two tenant ID sources:
* URL path: `/select/<accountID>/vmui/`
* Query param: `tenantID`
* These could differ, causing confusion and inconsistent behavior.
**Solution**
* Removed the legacy `tenantID` query parameter.
* Use the URL path as the single source of truth for tenant ID.
* Changing the tenant in the UI now updates the URL path.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10232
### Describe Your Changes
The Backups errors panel uses a hard coded rate, when looking over a
large period of time this number would likely stay low do to the hard
coded rate when in reality the amount of errors is much larger.
This change addresses this by using the __rate variable in Grafana so
the rate will align with the date/time range in Grafana.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Call decConcurrency() inside Reader.Read() before calling the Read() at the underlying reader.
This reduces chances of improper use of the writeconcurrencylimiter.Reader by callers.
While at it, move the creation of writeconcurrencylimiter.GetReader() to the top of stream parser functions
at lib/protoparser/* packages, and call incConcurrency() inside GetReader() call.
This reduces the frequency of decConcurrency() / incConcurrency() calls
for typical buffered reads when parsing the incoming data. This, in turn,
reduces the contention on the concurrencyLimitCh.
### Describe Your Changes
Previously, we had to manually update flags in documentation whenever we
made flag-related changes in source code. Someone did it by hand, others
compiled enterprise binaries, executed them with `-help` flag, and
copied and pasted output to the documentation.
In https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9632 a
command `make docs-update-flags` was introduced. It automates the whole
process. It compiles binaries, runs `-help,` and syncs output to changes
automatically.
Now, we can **omit updating doc flags in the PR** and do it once before
releasing a new version.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
In alerting rules, annotations are only attached to alert messages that
are sent to the notifier (such as Alertmanager). These annotations
typically contain human-readable information, such as instructions for
resolving the alert.
In [replay
mode](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules-backfilling),
vmalert does not send alert messages to the notifier at all(no notifier
is configured), as these alerts are outdated. Therefore, it does not
need to template the annotations in this mode.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10262
which tracks the number of requests to the query rollup cache
As described in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117, when
retrieving cached data from the rollup result cache, there can be mixed
`get()` and `getBig()` calls to the underlaying fastcache. And it's
unpredictable how many times `getBig()` will call `get()`, so the
metrics from fastcache cannot be used to indicate query cache miss
ratio.
Exposing a new counter `vm_rollup_result_cache_requests_total` to track
the number of requests to the query rollup cache, together with the
existing `vm_rollup_result_cache_miss_total`, allows for monitoring the
rollup cache miss rate per query (or subquery), which is more
user-facing.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10117
related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5056
Previously VictoriaMetrics: Single-node version used
`http://<victoriametrics-addr>:8428/zabbixconnector/api/v1/history`
resulted in a missing path error. The issue was introduced during changes back-porting from vmagent.
Additionally, the http response was fixed. Zabbix expects a 200 status
code during normal operation.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10214
Since the cache may be reset too often, using the sizeBytes as an
indicator that this is the first met indexDB to collect tfssCache stats
is unreliable because it often can be zero all indexDB instances. Use
Requests metric instead because it is never reset.
Follow-up for #10204.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Since the first connection is not closed, the vmstorage will never
terminate gracefully which will cause the reset of all caches on the
start-up.
Follow-up for 244769a00d (#10136)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The `err` may contain information about request cancelation performed by the server code.
In such cases the error must be logged. The error must be ignored only if the client canceled the request.
This is a follow-up for the commit c9596a0364
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
This is to debug cases when metric name tracker resets the tsid cache
after restart. It could be due vmstorage not having enough time to stop
gracefully. Logs should provide this info.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This fixes the following corner case: if all instances of a cache have
zero size, the stats won't be set at all. This results in some weird
graphs if the cache is reset very often (such as tfssCache): the cache
sizeMaxBytes alternates between the actual value and zero.
Follow-up for f62893c151
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Commit 5a587f2006 was not properly ported
to the single node branch. Since single node is able to perform both
promscrape and self-scrape, it's required to add metadata add methods to
those paths.
This commit fixes missing metadata add to the storage.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10175
Previously a short spike in the number of concurrent requests immediately led to `429 Too Many Requests` errors
when the number of concurrent requests exceeds -maxConcurrentRequests or -maxConcurrentPerUserRequests.
This commit allows processing short spikes in the number of concurrent requests during the -maxQueueDuration timeout.
The requests are rejected only if they couldn't be served accroding to the concurrency limits during the -maxQueueDuration.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10112
- Introduce backendURLs struct, which holds all the backend urls and allows stopping
all the health checkers across all the backend urls with a single call to backendURLs.stopHealthChecks().
- Immediately cancel the pending Dial call to the backend when backendURLs.stopHealthChecks() is called.
Use lib/netutil.Dialer.DialContext() for this.
- Replace a fragile closing of stopHealthCheckCh channel via stopHealthCheckOnce.Do()
with easier to maintain call of cancel() func for the corresponding healthChecksContext.
- Wait until health checker goroutines are finished before return from UserInfo.stopHealthChecks().
Previously the health checker goroutines could run for some time trying to dial the backend
after the return from UserInfo.stopHealthChecks().
- Try dialing the broken backend for https urls. It is better if the broken backend logs the error
instead of routing client requests to the broken backend.
- Log dial errors to the broken backend, so users could troubleshoot the backend connectivity issue with more details.
- Refer the correct issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997 -
in the comments explaining why periodic dialing of the broken backend is needed.
Previously the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890 was incorrectly referred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10147
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10177
Add `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` which track
the number of user request errors, and aligned with
`vmauth_user_requests_total`.
The existing `vmauth_http_request_errors_total` currently only counts
requests with `invalid_auth_token`. Once authorization has passed, any
subsequent request errors are tracked under
`xxx_user_request_backend_requests_total`.
This commit introduces the global `sampleLimit` setting to restrict the number
of samples accepted per scrape target, mirroring the behavior of
Prometheus.
Motivation:
1) The existing `-promscrape.seriesLimitPerTarget` flag currently takes
precedence over any `sample_limit` setting defined directly on the
scrape target. The new `sampleLimit` implementation ensures that the
target configuration is able to override the global setting, allowing
users to define specific limits per target.
2) The existing series limit flag uses memory-intensive Bloom filters,
resulting in high RAM consumption under high-cardinality scraping
scenarios. The `sampleLimit` provides a much simpler, low-overhead
alternative.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10145
The encoding.DecompressZSTD* consistently updates the vm_zstd_block_decompress_calls_total metric.
Also make the follwing improvements after the commit 10f7cd2ffc:
- Add encoding.DecompressZSTDLimited() function and use it instead of zstd.DecompressLimited,
so it properly updates vm_zstd_block_decompress_calls_total metric.
- Clarify description for the encoding.DecompressZSTD* and zstd.Decompress* functions.
Currently, `dateMetricIDCache` is reset when it is full and it is never
reset is not full but the data it stores is no longer needed. This leads
to the following problems:
- During regular data ingestion the cache sizeBytes may exceed max
allowed size and the cache gets reset which may potentially slow down
data ingestion (see #10064)
- The cache is per-indexDB. This means that in partition index (#8134)
there will be as many instances of this cache as the number of
partitions. If someone performs a backfill across all partitions, this
will fill all caches and they will never get reset even if no more
historical data is ingested.
So the solution is to periodically rotate the cache. After first
rotation the data is not deleted but moved to `prev` storage. After
second rotation `prev` gets deleted. This gives the cache an opportunity
to restore the `prev` data if it is still in use. Based on #10167.
This PR also removes the introduced recently introduced
`-storage.cacheSizeIndexDBDateMetricID` flag (see #10135). This should
be safe since it is new and its use case is very niche, i.e. no one
would really use it.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
We have `vmauth_user_requests_total` and
`vmauth_unauthorized_user_requests_total` to track requests from the
user side. However, in scenarios such as request timeouts or when the
response code matches `retry_status_code`, a single request may be
retried across multiple backends.
Exposing counters `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` that track the
number of requests sent to backends provides insight into the routing
logic and can help identify if requests are being consistently retried,
which may contribute to increased request duration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171
Currently, backendErrors may be counted twice if a request to the
backend fails due to context.DeadlineExceeded.
9bc7a17d80/app/vmauth/main.go (L328)9bc7a17d80/app/vmauth/main.go (L294)
And we increment this counter in a way that is somewhat inconsistent.
Given that the counter's name is `xx_request_backend_errors_total`, it
should only increase when a backend request returns an error. This value
can exceed the user request error count if multiple backend requests
fail for a single user request.
The `xxx_request_backend_errors_total` counter should be used in
conjunction with the `xxx_request_backend_requests_total` introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171.
There is no reason to send a request to the first backend if all
backends are marked as broken.
Also,
>// getFirstAvailableBackendURL returns the first available backendURL,
which isn't broken.
The fix only skips a redundant request when all backends are
unavailable, it doesn't introduce any changes from user's perspective,
so I skipped changelog.
When the time series deletion is performed some of the storage caches
need to be reset but some not. This PR reviews all storage caches and
documents why there are reset or not and also places all the resetting
logic (and comments) in one place.
### Describe Your Changes
Previously, a backend was considered healthy as soon as its
'bu.brokenDeadline' deadline expired, even if it was still unavailable.
This caused avoidable request failures and retries.
Now vmauth performs a TCP dial (1s timeout) before restoring the backend
to the healthy
pool. This avoids routing traffic to backends that are still down.
The dial check also covers cases where a route to the backend cannot be
resolved. Without this check, user requests would hang until the
connection timeout, leading to long waits
or errors. The new check fails fast and doesn't impact real user
requests.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Return "too big response size" error only for responses bigger than c.maxScrapeSize
(this option can be set either via max_scrape_size option inside scrape config
or via -promscrape.maxScrapeSize command-line flag).
Previously responses with sizes equal to c.maxScrapeSize were incorrectly rejected.
This should prevent from the excess memory usage because of dangling source byte slices
referred by decoderContext.ls.Labels.
This is a change similar to 63a68edb05
The tagFilters loop cache is per-indexDB which means that currently
there are two instances, one for idbCurr and one for idbPrev. When the
partition index (#8134) is released, there will be as many instances of
this cache as there will be partitions.
The cache is implemented using workingsetcache. Which occupies at least
30MB even when unused. Given that only the latest indexDB is used most
of the time, a lot of memory can be wasted.
Therefore the cache implementation is changed to lrucache because it
does not consume memory when it is unused and also has timeout-based
eviction.
This is a follow-up for 4cd727a511
(https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10072).
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one have to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) one
should account for but there is no even a metric to expose it.
The commit adds Memory usage breakdown panel into Drilldown section. It
provides insight into Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache, File cache memory distribution
It should help spot trends changes in memory by type or invistigate
issues such as #10069 and #10028 easier.
Panel info:
This panel shows memory usage by category.
How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.
Interpretation
- A steadily rising Go Heap usually indicates a memory leak. Collect
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.
<img width="1508" height="628" alt="Screenshot 2025-12-08 at 13 18 44"
src="https://github.com/user-attachments/assets/0e794324-e86d-468e-b926-8bb11f5a2043"
/>
<img width="1503" height="674" alt="Screenshot 2025-12-08 at 13 19 34"
src="https://github.com/user-attachments/assets/62fc3fff-33b3-4dfe-ad3f-ad0526a8a606"
/>
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10139
Adjust slowness-based rerouting logic.
Rerouting now occurs only from the slowest node, and only if the cluster
as a whole has enough available capacity to handle the additional load.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890
The following dmc metrics were given standard names, i.e.:
- vm_date_metric_id_cache_resets_total became
vm_cache_resets_total{type="indexdb/date_metricID"}
- vm_date_metric_id_cache_syncs_total became
vm_cache_syncs_total{type="indexdb/date_metricID"}
This change should be safe since these metrics are currently not used in
VictoriaMetrics Gragana dashboards.
Additionally, other cache metrics were organized within the code so that
each metric has the same order.
This commit optimizes the performance of the storage by improving the utilization of persisted hourMetricIDs cache to avoid redundant indexDB lookups after vmstorage restart. The change refactors the hour-based cache checking logic using a switch statement to handle multiple hour scenarios more efficiently.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10114
- Remove mentions of `Caches` section in Grafana dashobards since this section does not exist anymore.
- Rewrite a bit the description of cache panels in Troubleshooting section.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
- Add more metrics to the protobuf to parse.
- Measure scan speed of the original protobuf at bytes/sec. Previously the number of ParseStream() calls per second was measured.
`tagFiltersCache` and `dateMeticIDCache` are now per-indexDB. Currently
we have 2 instance of indexDBs (prev and curr) and therefore 2 instances
of each cache.
When the storage stats is collected, the stats of individual caches is
added together. For example, is the `sizeMaxBytes` of each
tagFiltersCache is `100MB` and the `sizeBytes` of each instance is
`10MB` and `99MB`, then the resulting stats will be `sizeMaxBytes ==
200MB, sizeBytes == 109MB`.
While this is accurate, this stats hides a potential problem. It says
that the cache utilization is slightly above `50%` (109/200) and
everything seems to be okay. But in reality one of the caches is
utilized by 99% and soon will start evicting existing records to make
room for new ones, potentially slowing down the data retrieval. Ops
won't see it and will not take necessary action.
The solution is to report stats only for one instance of cache whose
utilization is the highest.
Alternatives considered:
- #10123. Might work, but breaks the encapsulation and can potentially
be slower
- Do not aggregate the stats and report is per-indexDB. This increases
the number of metrics and makes it dependent on the number of indexDB
instances (which can be many once #8134 is released).
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8134
This reverts commit dbe71700b5.
tsidCache is persistent and must be reset before deletedMetricID records
are added to the index. THis is needed to handle ungraceful shutdowns
properly.
A number of changes to `dateMetricIDCache` stats and configuration:
1. Export `SizeMaxBytes` metric and make the size configurable via a
flag
2. Fix `EntriesCount` and `SizeBytes` stats. Previously the cache
reported this stats for its immutable part only. Whereas there are cases
when the number of entries in its mutable part is comparable with the
number in immutable part. The stats from the mutable part remains
invisible until it is sync'ed to the immutable part. It is also possible
that the cache gets reset after the sync because the cache size exceeds
the max allowed size. Reporting the stats for both mutable and immutable
parts should provide a clear picture of the cache utilization.
Together, SizeBytes and SizeMaxBytes should enable tracking the cache
utilization properly. And take appropriate actions if necessary (such as
adjusting the memory resources and/or cache size limit via a flag).
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
After closing last connection to vminsert, vmstorage will still wait for
an interval, causing actual shutdown time will be always longger than
configurations.
This commit just skip the last sleep
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10136
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one have to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) one
should account for but there is no even a metric to expose it.
The commit adds Memory usage breakdown panel into Drilldown section. It
provides insight into Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache, File cache memory distribution
It should help spot trends changes in memory by type or invistigate
issues such as #10069 and #10028 easier.
Panel info:
This panel shows memory usage by category.
How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.
Interpretation
- A steadily rising Go Heap usually indicates a memory leak. Collect
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.
indexDB has 3 block caches. These caches export metrics. Storage
collects these
metrics for each indexDB it has (currently prev and curr only).
There is a potential problem:
- These caches are shared by all indexDBs
- Each indexDB reports the block cache metrics.
- Storage collects the metrics of all indexDBs by adding them together.
I.e. it is possible to count block cache metrics several times.
It is not the case in current implementation because the addition of the
metrics
is not performed intentionally.
The added unit test 1) demonstrates that the resulting counts are
reported
correctly and 2) protects from future unintentional changes in this
behavior.
Additionally a code comment is added to explain why block cache metrics
are not summed up.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, influx parser allocated a new slice byte for
unescape of Row fields. It adds extra pressure at GC and increases CPU
usage.
This commit changes escape to in-place updates for provided []byte.
Since request for parsing is actually a []byte converted into the
string, it's safe to update it in-place. To be able to interact with
[]byte directly, this commit changes parser API and accepts []byte
instead of string.
Benchstat:
```
│ before │ after │
│ sec/op │ sec/op vs base │
RowsUnmarshalUnescape-10 74.68n ± 4% 54.23n ± 5% -27.38% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 40.41n ± 2% 42.59n ± 1% +5.39% (p=0.000 n=10)
geomean 54.93n 48.06n -12.51%
│ before │ after │
│ B/s │ B/s vs base │
RowsUnmarshalUnescape-10 1.035Gi ± 4% 1.425Gi ± 5% +37.72% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 1.613Gi ± 2% 1.531Gi ± 1% -5.11% (p=0.000 n=10)
geomean 1.292Gi 1.477Gi +14.32%
│ before │ after │
│ B/op │ B/op vs base │
RowsUnmarshalUnescape-10 149.00 ± 0% 96.00 ± 0% -35.57% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 80.00 ± 0% 80.00 ± 0% ~ (p=1.000 n=10) ¹
geomean 109.2 87.64 -19.73%
¹ all samples are equal
│ before │ after │
│ allocs/op │ allocs/op vs base │
RowsUnmarshalUnescape-10 5.000 ± 0% 1.000 ± 0% -80.00% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
geomean 2.236 1.000 -55.28%
```
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10053
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, if user defined value for `memory.allowedBytes` flag
exceeded system memory limit, remaining memory could take negative
value. It results into incorrect memory auto-detect calculations for
various components. Such as vmstorage unique timeseries limit and parts
size.
This commit adds negative value check. And also logs system memory
limit at start-up of vm components.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10083
Previously, in order to cache results for `rate`, we consider
`rate(m[d])` as `(increase(m[d]) / d)` and cache the `increase` result.
However, in MetricsQL, `rate(d) = (lastValue - firstValue) /
(lastTimestamp - firstTimestamp)`, so it does not equal to
`increase(d)/d` if `d != (lastTimestamp - firstTimestamp)`.
Although the issue primarily arises when the time series samples are not
continuous, but the discrepancy is hard to debug and can be confusing to
users. Because the range query doesn't use this optimization, causing
recording rule results to
differ from raw query results in VMUI.
Therefore, it is better to disable the usage and only enable it when we
can cache it correctly.
fixes https://github.com/VictoriaMetrics/victoriaMetrics/issues/10098
As indexDBs became independent from each other, the tsidCache is now
reset more than once when the DeleteSeries() operation is performed. But
it needs to be performed only once. Thus, move the deletion from indexDB
to Storage.
Follow-up for 16d75ab0bd.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10119
There’s no need to call `c.Reset()` for rollup result cache if it’s not
persisted(`-cacheDataPath` not specified) or has already been cleared by
`-search.resetRollupResultCacheOnStartup`, as it is already newly
created.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10095
Previously vmgateway didn't handle http.Abort error.
It could lead to the unexpected panic at webserver.
This commit adds panic recover and prevent app from crash.
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md">js-yaml's
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.1] - 2025-11-12</h2>
<h3>Security</h3>
<ul>
<li>Fix prototype pollution issue in yaml merge (<<)
operator.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="cc482e7759"><code>cc482e7</code></a>
4.1.1 released</li>
<li><a
href="50968b862e"><code>50968b8</code></a>
dist rebuild</li>
<li><a
href="d092d86603"><code>d092d86</code></a>
lint fix</li>
<li><a
href="383665ff42"><code>383665f</code></a>
fix prototype pollution in merge (<<)</li>
<li><a
href="0d3ca7a27b"><code>0d3ca7a</code></a>
README.md: HTTP => HTTPS (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/678">#678</a>)</li>
<li><a
href="49baadd52a"><code>49baadd</code></a>
doc: 'empty' style option for !!null</li>
<li><a
href="ba3460eb9d"><code>ba3460e</code></a>
Fix demo link (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/618">#618</a>)</li>
<li>See full diff in <a
href="https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Previously, context cancellation was ignored when reading user response
for the prompt. That leads to ignoring of "Ctrl+C" and other termination
signals to vmctl until user finishes the input.
Fix that by properly propagating the context and respecting the
cancellation of the context.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Add more S3 configurations.
- SSES3KeyID allows to push to a bucket that is another account as the
KMS key it uses to encrypt data server side.
- ACL allows configure which permissions are given to the object
uploaded on the bucket (usefull when bucket policy expect a given
permission such as `bucket-owner-full-control`).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Some influx clients ( such as nimon monitoring client) adds excess white spaces in the influx line and does not set a
timestamp. Since Influx protocol requires whitespace before timestamp only when it set, it could present without timestamp. Whitespace before omitted timestamp confuses parser.
This commit adds check for the skipped timestamp and test case for it.
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10049
Previously, proxy vmselect (aka 1st level vmselect) performed parsing
of MetricBlock received from vmstorage before forwarding it into top vmselect. It required an additional CPU and Memory, which greatly slowed down query requests.
This commit changes lib/vmselectapi iterator API, instead of MetricBlock, it returns encoded MetricBlock as a byte slice.
It allows to save CPU and memory at proxy vmselect by eliminating need of decoding MetricBlock received from storage.
In addition, it adds the following optimizations for proxy vmselect:
* reduces memory allocations by using iterator pool
* add per storageNode workerItem for iterator
Also, it adds optimization for vmstorage, it no longer performs extra memory copy of MetricName for MetricBlock.
vmselect and vmstorage metrics vm_vmselect_metric_rows_read_total and vm_metric_rows_read_total were removed, it's not used at any dashboards and rules. New Iterator API doesn't support it.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9899
The purpose of this PR is the same as #10000, except `lrucache` is used
for implementing tfss cache.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
revert change, that was introduced in
483e00ffb9
since rendering of all nested children significantly impacts alerting
tab performance in case of multiple items
@Loori-R @arturminchukov , what do you think about using react-virtuoso
additionally for alerting tab to decrease dom size?
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
A throttled logger will continue to log messages occasionally with a
suffix indicating how many similar logs were throttled. Using the same
logger for multiple log messages can result in certain logs being
entirely suppressed and invisible in the logs. This updates most of the
loggers used in `appendFromScopeMetrics` to be their own logger so that
"unsupported delta temporality/metric type" logs will be visible for all
metric types. Additionally, `skippedSampleLogger` is only used by
`appendSamplesFromHistogram` so this was moved closer to that function.
Related to #9447
Related to #9498
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
Go runtime executes all the goroutines on GOMAXPROCS operating system threads.
Go runtime cannot switch the OS thread to another goroutine if the current goroutine
is stuck in the major pagefault while reading the data from memory-mapped file,
because Go runtime doesn't distiguinsh between reading from regular memory and reading
from memory-mapped file. So the OS thread becomes stuck while waiting until the OS
reads the data from file at the requested memory address and returns back control to Go application.
In the worst case it is possible that all the GOMAXPROCS threads are stuck in major pagefaults,
so Go runtime pauses executing all the goroutines. This state is possible in environments
with small GOMAXPROCS and high-latency disks such as NFS or small HDD-based disks at AWS.
See https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d for more details.
This commit protects from such stalls by verifying whether the given memory location from memory-mapped file
is already loaded in the OS page cache before reading from that memory.
If the location isn't in the OS page cache, then it falls back to pread() syscall for reading the data from file.
Go runtime allocates extra OS threads for long-running syscalls, so it can continue executing goroutines
across all the GOMAXPROCS threads while reading the data from slow storage via pread() syscall.
This commit uses mincore() syscall for detecting whether the given memory page is available in the OS page cache.
It also caches mincore() results for up to a minute in order to reduce the overhead for the mincore() syscall.
This commit reduces the increase rate for the process_major_pagefaults_total metric by multiple orders of magnitude
on systems with high-latency disks.
Currently, `lrucache.Cache` `SizeBytes()` and `SizeMaxBytes()` return
type is `int`. The cache `Entry.SizeBytes()` also returns `int` value.
Changing the type to `uint64` will allow using `uint64set.Set` as the
cache entry type (see #10072).
Please note that using `uint64` regardless the cpu architecture is set
is not entirely correct, because in 32-bit systems the size won't ever
get bigger than `2^32`, so the `uint64` will too much. However current
type (`int`) is not correct either since it is signed and will only
allow to store values up to `2^31`. Alternatively, all `SizeBytes()`
methods should return `uint`.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
….maxConcurrentRequests error
If `vmstorage` is currently overloaded it could return
maxConcurrentRequests error. Now `vmselect` immediately fails the whole
request even if `replicationFactor` is set up and other replicas could
respond without errors.
This PR treats them as regular errors, not fatal ones.
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.43.0 to 0.45.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e0068c009"><code>4e0068c</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="e79546e28b"><code>e79546e</code></a>
ssh: curb GSSAPI DoS risk by limiting number of specified OIDs</li>
<li><a
href="f91f7a7c31"><code>f91f7a7</code></a>
ssh/agent: prevent panic on malformed constraint</li>
<li><a
href="2df4153a03"><code>2df4153</code></a>
acme/autocert: let automatic renewal work with short lifetime certs</li>
<li><a
href="bcf6a849ef"><code>bcf6a84</code></a>
acme: pass context to request</li>
<li><a
href="b4f2b62076"><code>b4f2b62</code></a>
ssh: fix error message on unsupported cipher</li>
<li><a
href="79ec3a51fc"><code>79ec3a5</code></a>
ssh: allow to bind to a hostname in remote forwarding</li>
<li><a
href="122a78f140"><code>122a78f</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="c0531f9c34"><code>c0531f9</code></a>
all: eliminate vet diagnostics</li>
<li><a
href="0997000b45"><code>0997000</code></a>
all: fix some comments</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/crypto/compare/v0.43.0...v0.45.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Describe Your Changes
fix#9987
Avoid blocking when a connection to `-opentsdbListenAddr` doesn't send
any data. This issue blocked other connections from being handled.
> This bug can be tested with:
> 1. Start VictoriaMetrics Single-node with `-opentsdbListenAddr=:4242`.
> 2. Run: `telnet 127.0.0.1 4242` without typing any data after
connection established.
> 3. Run (in another terminal, after step 2): `curl -H 'Content-Type:
application/json' -d
'{"metric":"x.y.z","value":2222222.34,"tags":{"t1":"v1","t2":"v2"}}'
http://localhost:4242/api/put`
>
> Before the change:
> - Step 3 was blocked infinitely.
>
> Expect result after the change:
> - Step 3 was executed.
> - Connection established by step 2 will be closed after 5 seconds.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
Clarified the index size note in
docs/guides/understand-your-setup-size/README.md to steer readers toward
the FAQ when indexdb feels oversized, noting typical ratios and
troubleshooting guidance.
- Fix comment
- Re-use dst instead introducing a new variable.
This change has been requested to be in a separated PR during the
pt-index (#8134) code review.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Currently, when a partition is created its corresponding parts.json file
is not created right away (see createNewParition()). Its creation is
delayed until the first part files are created on disk (see
swapSrcWithDstParts()). However, the parts.json file is created for a
possibly empty partition when an existing partition is opened (see
mustOpenPartition()) and when a partition snapshot is create (see
MustCreateSnapshotAt()).
I.e. `parts.json` is an important part of a partition, since it is an
artifact that describes the partition contents. And it should be created
on pt creation even if its contents is empty.
To be honest, this change is mostly a no-op for the current storage
implementation. It only makes the code consistent, i.e. the parts.json
is created along with the partition.
However having it created when a partition is created becomes in
pt-index (#7599, #8134), because it allows having partitions with no
data and therefore without parts.json file. Still not a big deal but the
unit tests start failing.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
dateMetricIDCache does not belong to storage anymore since it has been
moved to indexDB. Instead moving the case to index_db.go, move it to a
separate file in order to navigate the code more easily.
No changes have been done to the code or tests.
Follow up for: #9983
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Alexander Frolov <9749087+fxrlv@users.noreply.github.com>
The data structure used for holding the nextDayMetricIDs is too complex
and can be simplified (flattened).
Follow up for: #9983
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The change was introduced in pt-index PR (#8134) and is extracted into a
separate PR.
Currently used in partition_search and partition. If you see more places
like this, please let me know.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Looks like the `dateMetricIDCache` must be per indexDB:
- the use of this cache and `is.hasDateMetricID()` often go in pairs. So
it makes
sense to use this cache in that method.
- The same is true for `createPerDayIndexes()`: everytime the index
entry is
created, a corresponding entry is added to the cache.
- As a result the generation field is also removed from the cache.
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This is very frequent question from new users of VcitoriaMetrcs who migrate from other solutions
with automatic data rebalancing among storage nodes, so it is a good idea to cover it in the docs.
Metrics metadata is loaded from a per-tenant storage map
(perTenantStorage map[uint64]map[string]*Row), so result rows order is
non-deterministic. The existing sortRows implementation only sorts by
metric name and ingestion time, which means rows that differ only by
tenant/account ID still sorted undeterministically.
This change updates `sortRows` to include account\project identifiers in
the comparison, ensuring stable and deterministic ordering for metadata
entries that share the same metric name and timestamp.
First discovered as flaky test:
--- FAIL: TestStorageRead (0.00s)
storage_test.go:337: unexpected rows get result (-want, +got):
[]*metricsmetadata.Row{
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 1,
+ AccountID: 0,
- ProjectID: 1,
+ ProjectID: 0,
Type: 1,
},
&{
... // 2 ignored and 1 identical fields
Help: "uselesshelp1",
Unit: "seconds1",
- AccountID: 0,
+ AccountID: 1,
- ProjectID: 0,
+ ProjectID: 1,
Type: 1,
},
}
FAIL
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/actions/runs/19361594138/job/55394642029#step:4:133
This commits adds storage part and cluster RPC methods for metrics metadata.
Key concepts:
* vmstorage persists metadata in-memory only.
* vmstorage evicts metadata records older than 1 hour.
* vmstorage stores only the last value of metadata for time series
metric name.
* vminsert opens an additional TCP connection to the vmstorage for
metadata write requests.
* vmselect doesn't support `limit_per_metric_name`.
This feature is available optional and must be enabled via flag - `-enableMetadata` provided to vminsert/vmsingle.
Fixes github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
vmstorage nodes work perfectly with one CPU core and even with 10% of a single CPU core
if the allocated CPU resources matches their workload.
It is better to recommend allocating the an interger number of CPU cores to vmstorage
in order to achieve an optimal performance, since vmstorage allocates internal resources
according to the available CPU cores. If there is a fractional number of CPU cores,
then the allocation of internal resources may be not so optimal.
Fractional number of CPU cores may also lead to increased latencies and stalls
because some P threads at Go runtime won't be able to run goroutines from their ready queues
in a timely manner becasue of the lack of CPU time. See https://victoriametrics.com/blog/kubernetes-cpu-go-gomaxprocs/
Too big values for the -maxConcurrentRequests command-line flag increase memory usage
and increase CPU overhead for processing incoming requests in most cases.
The only valid reason for increasing the value for -maxConcurrentRequests command-line flag
is when many clients send data to vmagent over very slow network.
Previously, zstd Decoder didn't take in account Request Size limits
applied by VictoriaMetrics components. And in case of incorrectly formed zstd block, VictoriaMetrics
component may allocate extra memory. Which may lead to the OOM errors.
This commit makes ingest endpoints check frame content size and window size headers based on MaxRequest Limits.
For users, if an alerting rule has a misconfigured annotation, it's more
important to deliver the alert when the rule triggers rather than skip
it with templating error logs.
Then users can see the faulty annotation in alert message and fix it.
Note: the previous behavior is retained in replay mode because errors
there should be noticed immediately; hiding them could waste time,
resources and require a re-replay after fixes.
Also the rule's status in the vmalert UI remains unhealthy if templating
failed.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9853
In prometheus ecosystem, a label with an empty value equals no label,
since a query like `test{something=""}` matches all the series without
label `something`.
So for vmalert, preserving empty-value labels in generated alerts or
time series is unnecessary and can cause alert hash mismatches during
[restore](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerts-state-on-restarts).
The empty-value label shouldn't come from datasource response since they
follow the same rule(omit empty-value labels), it may come from
`-external.label` or rule labels, but the empty value could be caused by
occasionally templating failures, which is hard to check there.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
This is a follow-up for the commit 1130adebad .
The EntriesCount, BytesSize and MaxBytesSize metrics must take into account the data
stored in both prev and curr caches, since this data occupies memory and it is expected
that the exposed metrics - vm_cache_entries, vm_cache_size_bytes and vm_cache_size_max_bytes -
take into account all the memory occupied by the corresponding caches.
The GetCalls, SetCalls, Collisions and Corruptions metrics must take into account stats
from the curr cache only, since the corresponding stats for the prev cache is already taken
during the rotation (when moving curr to prev and resetting the previous prev).
The Misses metric must take into account only misses in the prev cache, since these misses
mean that the given entry is missing the both the curr and the prev cache.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
While at it, make sure that the cache mode and cache stats is always read and updated under c.mu lock.
This may help resolving races similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9921
This reverts commit 89fd27c922.
Reason for revert: this commit adds scalability bottleneck in the fast path - Cache.Get() -
in the form of c.getCalls.Add(). This call doesn't scale on systems with big number of CPU cores,
since it needs to update atomically a shared memory from big number of CPU cores.
The Cache.Get() is called per every ingested sample when obtaining TSID by MetricName from the cache
at lib/storage.Storage.get(), so this can be a major bottleneck on systems with many CPU cores.
The solution for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
is to properly track cache requests and misses: cache requests must be taken into account
only at the curr cache, while cache misses must be taken into account only at the prev cache.
This will be implemented in the follow-up commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
This reverts commit 994dadb4d5.
Reason for revert: the introduced metrics have zero practical applicability.
The lib/workingsetcache doesn't need manual tuning in most cases - its' size
is automatically adjusted to the given working set, if the working set is smaller
than the cache size limit set at the cache creation time. The limit just prevents
unbounded cache growth for large working sets.
If the working set exceeds the given limit, then the cache may become inefficient
because of the increased cache miss rate. The introduced metrics do not help determining
the needed cache size.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
If the cache cannot be saved to the given file, this is a fatal error.
It is better to log this fatal error inside Cache.MustSave() and then exit
instead of returning it to the caller. This makes the code more clear at the caller side.
It is expected that the number of TSIDs misses over the last 5 minutes is zero in steady state.
If it is non-zero, then something wrong happens. That's why it is better to use increase() instead of rate() function
for this alert.
This alert is expected after unclean shutdown (OOM, power off, kill -9) of VictoriaMetrics.
It should go away in a few minutes after the restart while VictoriaMetrics deletes metricIDs
for the missing MetricID->TSID entries which were created for the newly registered time series
just before unclean shutdown. It is OK to delete such metricIDs, since the corresponding time series
will be re-registered again. See the commit 20812008a7 .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502
When one goroutine attemps to update the min timestamp under the lock it
could have been updated already by another goroutine with a smaller
timestamp. As a result the goroutine will update the timestamp with a
bigger value.
A simple unit test (included in this commit) demonstrates that.
Additionally, use a simple Mutex instead of RWMutex. RWMutexes only
introduce an unnecessary overhead for operations as simple as retrieving
a value from a map and regular Mutex should be preferred.
Thanks to @valyala for spotting a bug and the advice on RWMutexes.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This commit improves overall performance and stability of chart rendering,
refines time series generation, and fixes incorrect median calculation
in metric series.
JavaScript execution time improved by up to ×6 on large datasets.
**Changes:**
* Reworked `getTimeSeries` - one point per pixel.
* Added legend auto-collapse when >20 items.
* Switched median algorithm to Quickselect (Floyd–Rivest).
* Unified array stats functions (`min`, `max`, `avg`, `median`) into a
single pass.
* Removed unused `last` value from series.
* Renamed `roundToMilliseconds` to `roundToThousandths` and moved to
`utils/math`.
* Replaced `isSupportedDuration` with `parseSupportedDuration`, added
fractional duration support.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9699
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9926
When multiple service discovery configs of the same type exist (e.g.,
`hetzner_sd_config`), vmagent currently behaves as follows:
1. Attempts to request each config.
2. Exits immediately if any config returns an error.
3. Skips the rest configs and falls back to the previous service
discovery result.
The correct behavior—more compatible with Prometheus—should be:
1. Attempt to request each config.
2. Collect all valid results.
3. Use the valid results if there's at least one. otherwise (all
failed), fall back to the previous SD result.
Scrape example:
```yaml
scrape_configs:
- job_name: hetzner-default
hetzner_sd_configs:
- role: "hcloud"
authorization:
credentials: "some_valid_value"
- role: "hcloud"
authorization:
credentials: "some_wrong_value"
```
Expected outcome:
- At least targets from `credentials: "some_valid_value"` should appear
in the service discovery result.
current outcome:
- the error from `credentials: "some_wrong_value"` leads to an **empty**
result.
This issue should affect service discovery which using
`getScrapeWorkGeneric` function:
- `azure_sd_config`
- `consul_sd_config`
- `consulagent_sd_config`
- `digitalocean_sd_config`
- `dns_sd_config`
- `docker_sd_config`
- `dockerswarm_sd_config`
- `ec2_sd_config`
- `eureka_sd_config`
- `gce_sd_config`
- `hetzner_sd_config`
- `http_sd_config`
- `kuma_sd_config`
- `marathon_sd_config`
- `nomad_sd_config`
- `openstack_sd_config`
- `ovhcloud_sd_config`
- `puppetdb_sd_config`
- `vultr_sd_config`
- `yandexcloud_sd_config`
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9375
This is needed in order to detect and prevent cases of improper usage of partitions
while they are closed.
This is a follow-up for the commit 9725ee50ec .
### Describe Your Changes
Follow up on PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839, which
addresses review comment
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839#discussion_r2477729886
Alex:
```
this design decision isn't good, since it will lead to potential security issues over time when we'll forget adding ApplySecretFlags() call after the flag.Parse() call or add it at the wrong place. BTW, we do not call flag.Parse() explicitly - instead envflag.Parse() is called. So it is natural to call ApplySecretFlags() inside this call. Are there restrictions which prevent from doing this? If there are no restrictions, then there is no need in making this function public - it will be called explicitly inside envflag.Parse().
```
There is no changelog entry as there is no change in user-visible
behavior.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This should catch possible errors related to improper release of Table parts.
Fix such an error at TestTableCreateSnapshotAt by properly closing all the initialized
TableSearch instances.
Thanks to @rtm0 for pointing to this issue.
Previously, snappy Decoder didn't take in account Request Size limits
applied by VictoriaMetrics components. And in case of incorrectly formed snappy block, VictoriaMetrics
component may allocate extra memory. Which may lead to the OOM errors.
This commit makes ingest endpoints check block size header based on MaxRequest Limits.
- Standardize all sections to use 'Recommended for:' instead of mixed
'For whom:' and 'Target audience:'
- Fix wording: 'Query evaluation is always local'
Addresses comments in #9919
vmalert tries to spread the moment group starts its evaluation
on `[0..group.interval]` duration. This approach allows to avoid
thundering herd problem when on vmalert start all groups execute their
rules simultaneously. It was introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/724
While for most configs it works great, for groups with big evaluation
intervals (30min, 60min) the first evaluation can be delayed
significantly.
This change introduces a start delay limit via new flag
`--group.maxStartDelay` (5m default).
It limits the `[0..group.interval]` start delay to
`[0..math.min(--group.maxStartDelay, group.interval)]`.
So all groups will start in first 5m or earlier.
The --group.maxStartDelay is ignored if user set `eval_offset`.
The 5m default limitation was picked high to not affect users with
relatively low evaluation intervals.
-----------
Based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9929
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmbackupmanager: enforce newline at the end of CLI result
Previously, vmbackupmanager only printed a response from API which did not include a newline character. That leads to issues with the rendering of the next command when using a shell.
Always append a newline character to avoid breaking shell formatting when using CLI mode.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/victoriametrics/changelog/CHANGELOG.md
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
The load distribution could be uneven when short queries arrive to vmauth while a part of backends are busy
with long-running queries. In this case the major load goes to the backend after a row of busy backend.
Suppose we have four backends - b1, b2, b3 and b4. The first two backends are busy with bigger number
of long-running queries than b3 and b4. Then 75% of short queries will go to b3, while only 25%
of short queries will go to b4.
The new algorithm makes the distribution more even in these cases by storing the next backend
after the chosen backend as candidate for the next query (its' index is stored in the atomicCounter).
Avoid races when updating atomicCounter from concurrently executed queries by using CompareAndSwap() -
if the concurrent query updated it first then the current query won't overwrite it with the outdated value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9712
Data patterns considered:
- Same series, same date
- Same series, different dates
- Different series, same date
- Different series, different dates
To make sure that the pattern condition holds, a new storage instance is
started every benchmark iteration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9912
Previously, vmctl only accepted one label for filtering. Extend this to
allow providing multiple-filters at once. This is useful when migrating
large volumes of data as it allows narrowing down migration scope of
migration for one run so that the source side is not overwhelmed with
migration.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9917/
Before, rules that didn't get evaluated yet were showing weird values in
vmalert's UI. It was happening because of
`time.Since(r.LastEvaluation).Seconds()` expression when
`r.LastEvaluation` had 0 value.
With this change, rules that weren't evaluated yet would show `Never` in
Updated column instead.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9924
This commit request revert the commit
d6bbfaf164 for the following reasons:
1. HTTP/2 carries security risks.
2. Most components in the VictoriaMetrics stack do not require HTTP/2
support.
3. While HTTP/2 support was available only as an option in previous
commit, there remains a potential risk of misusing this option and
enabling HTTP/2 inadvertently.
For components (e.g., VictoriaTraces) that require HTTP/2 support, they
should currently build an HTTP server manually with built-in packages,
instead of using `lib/httpserver` in VictoriaMetrics. If the mentioned
issue is resolved in the future and more components need HTTP/2, this
support can be reintroduced into `lib/httpserver`.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9927
It is a cosmetic change: it simplifies function signature by making it a
method of the Group struct.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Rationale: Having query stats logging enabled by default can greatly
help in investigating incidents.
Currently, it is disabled by default, so many users don’t enable it, and
when issues occur there are no stats available.
After discussion with the team, a 5s threshold was agreed upon as a
reasonable default to capture meaningful slow query data without
excessive logging.
This commit adds new RPC protocol for vminsert-vmstorage communication,
it acts in the same way as vmselect-vmstorage RPC.
It's implemented with new handshake hello methods in a backward
compatible way. Server attempts to parse RPC only if client send new
Hello message, while client fallbacks to the old Hello message if server
closes connection.
This change is need for the new metrics metadata forwarded from vminsert
into vmstorage.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
Changes extracted from PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9487
Previously, if a storage started with curr indexDB different from one
stored in nextDayMetricIDs cache file, the cache would still be loaded
into memory possibly affecting the next day prefill.
This is an unlikely case but it is still possible when:
- A programmer makes a mistake in the code and uses something else
instead of idbCurr.generation.
- Downgrading from pt-index to previous version
Related to #7599 and #8134.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This change validates that QueryRange() method for prometheus datasource
receives response with `matrix` data type. It would throw an error
otherwise.
The change is needed to avoid confusions like in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9779.
The fix is not elegant, but it should be simple from code support
perspective. So each API has its own parsing function. Even if some
processing code is repeated.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Currently, the `httpserver` disabled HTTP/2 support by design, because:
```
// Disable http/2, since it doesn't give any advantages for VictoriaMetrics services.
```
As VictoriaLogs and VictoriaTraces rely on `httpserver`, in order to
support gRPC over HTTP/2, an option to support HTTP/2 is required.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9881
Previously all timeseries pushed into aggregators were added
sequentially. It could cause delays on data ingestion and it was not
possible to use all available.
This commit adds concurrency based on available CPU cores.
Also, it adds new generic Buffer and BufferPool into slicesutil.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9878
Previously a misleading random error could be logged for canceled and/or timed out requests to vmauth.
Consistently log the request timeout error for timed out requests.
While at it, do not log errors for requests canceled by the remote client, since such logs aren't actionable
and just pollute error logs generated by vmauth.
Introduce a new flag, which converts only metric names into Prometheus
compatible format. And keeps label names in original form.
It's needed to keep labels in original form, which
is useful for correlation with other telemetry sources, such as logs or
traces.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
AWS SDK does not modify custom http client configuration if it was provided. This leads to
additional configuration such as environment variables being ignored.
Use AWS http client builder instead of custom implementation and
override DialContext to preserve metrics exposed by custom transport.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9858
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Tracking original labels requires storing a copy of labels obtained
from service discovery. It adds extra Garbage Collection pressure and as
a result increased CPU usage.
While dropOriginalLabels has almost no impact at test and small
installations.
Impact grows with a scale. And especially is impactful at Kubernetes
based installations.
In addition, this flag is disabled by default for `k8s-stack` helm
chart, which is our main Kubernetes monitoring solution.
An also, we recommend at vmagent optimisation guide to disable original
labels storing.
This commit changes default value to true and disables tracking of
dropped targets by default. In case of debugging, it could be easily
enabled back by providing `false` value to the flag:
`promscrape.dropOriginalLabels`. It should improve resource usage out of
box by reducing user-experience for minority of users.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9665
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9772
TextField component has ability to show error message and depending on
it's presence text field height changes, which may cause visibility
issues if this field is vertically aligned with some neighbour
components. This PR makes textfield height constant and its input box
horizontally symmetrical
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9693
Currently, it is hard to make sense of progress based on logging as it
requires manual calculation of progress and ETA.
Solve this by:
- making data units humanly readable
- adding an estimation of completion for the operation
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v3.30.7</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.7 - 06 Oct 2025</h2>
<p>No user facing changes.</p>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.7/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.6</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.6 - 02 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.2. <a
href="https://redirect.github.com/github/codeql-action/pull/3168">#3168</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.6/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.5</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.5 - 26 Sep 2025</h2>
<ul>
<li>We fixed a bug that was introduced in <code>3.30.4</code> with
<code>upload-sarif</code> which resulted in files without a
<code>.sarif</code> extension not getting uploaded. <a
href="https://redirect.github.com/github/codeql-action/pull/3160">#3160</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.5/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.4</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.30.4 - 25 Sep 2025</h2>
<ul>
<li>We have improved the CodeQL Action's ability to validate that the
workflow it is used in does not use different versions of the CodeQL
Action for different workflow steps. Mixing different versions of the
CodeQL Action in the same workflow is unsupported and can lead to
unpredictable results. A warning will now be emitted from the
<code>codeql-action/init</code> step if different versions of the CodeQL
Action are detected in the workflow file. Additionally, an error will
now be thrown by the other CodeQL Action steps if they load a
configuration file that was generated by a different version of the
<code>codeql-action/init</code> step. <a
href="https://redirect.github.com/github/codeql-action/pull/3099">#3099</a>
and <a
href="https://redirect.github.com/github/codeql-action/pull/3100">#3100</a></li>
<li>We added support for reducing the size of dependency caches for Java
analyses, which will reduce cache usage and speed up workflows. This
will be enabled automatically at a later time. <a
href="https://redirect.github.com/github/codeql-action/pull/3107">#3107</a></li>
<li>You can now run the latest CodeQL nightly bundle by passing
<code>tools: nightly</code> to the <code>init</code> action. In general,
the nightly bundle is unstable and we only recommend running it when
directed by GitHub staff. <a
href="https://redirect.github.com/github/codeql-action/pull/3130">#3130</a></li>
<li>Update default CodeQL bundle version to 2.23.1. <a
href="https://redirect.github.com/github/codeql-action/pull/3118">#3118</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.30.4/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.30.3</h2>
<h1>CodeQL Action Changelog</h1>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>3.29.4 - 23 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.3 - 21 Jul 2025</h2>
<p>No user facing changes.</p>
<h2>3.29.2 - 30 Jun 2025</h2>
<ul>
<li>Experimental: When the <code>quality-queries</code> input for the
<code>init</code> action is provided with an argument, separate
<code>.quality.sarif</code> files are produced and uploaded for each
language with the results of the specified queries. Do not use this in
production as it is part of an internal experiment and subject to change
at any time. <a
href="https://redirect.github.com/github/codeql-action/pull/2935">#2935</a></li>
</ul>
<h2>3.29.1 - 27 Jun 2025</h2>
<ul>
<li>Fix bug in PR analysis where user-provided <code>include</code>
query filter fails to exclude non-included queries. <a
href="https://redirect.github.com/github/codeql-action/pull/2938">#2938</a></li>
<li>Update default CodeQL bundle version to 2.22.1. <a
href="https://redirect.github.com/github/codeql-action/pull/2950">#2950</a></li>
</ul>
<h2>3.29.0 - 11 Jun 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.22.0. <a
href="https://redirect.github.com/github/codeql-action/pull/2925">#2925</a></li>
<li>Bump minimum CodeQL bundle version to 2.16.6. <a
href="https://redirect.github.com/github/codeql-action/pull/2912">#2912</a></li>
</ul>
<h2>3.28.21 - 28 July 2025</h2>
<p>No user facing changes.</p>
<h2>3.28.20 - 21 July 2025</h2>
<ul>
<li>Remove support for combining SARIF files from a single upload for
GHES 3.18, see <a
href="https://github.blog/changelog/2024-05-06-code-scanning-will-stop-combining-runs-from-a-single-upload/">the
changelog post</a>. <a
href="https://redirect.github.com/github/codeql-action/pull/2959">#2959</a></li>
</ul>
<h2>3.28.19 - 03 Jun 2025</h2>
<ul>
<li>The CodeQL Action no longer includes its own copy of the extractor
for the <code>actions</code> language, which is currently in public
preview.
The <code>actions</code> extractor has been included in the CodeQL CLI
since v2.20.6. If your workflow has enabled the <code>actions</code>
language <em>and</em> you have pinned
your <code>tools:</code> property to a specific version of the CodeQL
CLI earlier than v2.20.6, you will need to update to at least CodeQL
v2.20.6 or disable
<code>actions</code> analysis.</li>
<li>Update default CodeQL bundle version to 2.21.4. <a
href="https://redirect.github.com/github/codeql-action/pull/2910">#2910</a></li>
</ul>
<h2>3.28.18 - 16 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.3. <a
href="https://redirect.github.com/github/codeql-action/pull/2893">#2893</a></li>
<li>Skip validating SARIF produced by CodeQL for improved performance.
<a
href="https://redirect.github.com/github/codeql-action/pull/2894">#2894</a></li>
<li>The number of threads and amount of RAM used by CodeQL can now be
set via the <code>CODEQL_THREADS</code> and <code>CODEQL_RAM</code>
runner environment variables. If set, these environment variables
override the <code>threads</code> and <code>ram</code> inputs
respectively. <a
href="https://redirect.github.com/github/codeql-action/pull/2891">#2891</a></li>
</ul>
<h2>3.28.17 - 02 May 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.21.2. <a
href="https://redirect.github.com/github/codeql-action/pull/2872">#2872</a></li>
</ul>
<h2>3.28.16 - 23 Apr 2025</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="aac66ec793"><code>aac66ec</code></a>
Remove <code>update-proxy-release</code> workflow</li>
<li><a
href="91a63dc72c"><code>91a63dc</code></a>
Remove <code>undefined</code> values from results of
<code>unsafeEntriesInvariant</code></li>
<li><a
href="d25fa60a90"><code>d25fa60</code></a>
ESLint: Disable <code>no-unused-vars</code> for parameters starting with
<code>_</code></li>
<li><a
href="3adb1ff7b8"><code>3adb1ff</code></a>
Reorder supported tags in descending order</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This reverts commit 772ac8803e.
The reason for revert is that this recommendation should not be strict.
Installations with <= 1 vCPU will continue working efficiently. The load
from reads, writes and background merges will be evenly spread by Go runtime.
cc @Sleuth56 @tiny-pangolin
Before, `req.URL.Redacted` info was present in some error messages and
empty in others. This change uniformly adds it to the errors context.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- fixed ignored `search` query argument in `Notifiers` and `Rules` tabs
- added dropdown state reset, if other filters were updated and selected
state is not a subset of available items
- proxy requests to config.json for a local setup
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
While there, remove excessive relabeling info and point users to Routing section.
The Routing section should explain how to build flexible processing pipleines.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Reverts 17ca1ba8c4
Reason for reverts are following:
1. The fix relies on release candidates of specific libraries
2. The real fix would be to update Alpine version, which is not released yet
3. It makes the fix partially done, as it would require follow-up in future to
switch from release candidates to stable versions, or to update Alpine version.
4. The fix is not effective, as it doesn't update the base image cached by Docker.
The real fix will be to host&update the base image separately like in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9811.
5. VM binaries aren't vulnerable to mentioned vulnerabilites.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reverts commit ccf97a4143.
reason for revert: this change may break tests, which expect that ServesMetrics.GetMetric() fails
when the given metric doesn't exist in the output.
It is better to add 'TryGetMetric() (float64, bool)' function, which would return '(0, false)'
when the given metric doesn't exist, so the caller could decide what to do next.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Add ad-hoc filters to query stats and operator dashboards.
These filters are useful for exploring non-uniform metrics sets
without distinct job/instance filters.
The previous text didn't contain links to vmagent's capabilities.
Instead, it contained misleading multitenancy-mode link that doesn't
seem to be related to the subject.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, `GetMetric` do `t.Fatalf` immediately when the target metric
not exist in `/metrics` page.
However, some metrics may start to appear after the process has been
running for a while. `t.Fatalf` invalidates the retry mechanism of
assertions, if the metric is not found the first time, the test case
will terminate.
This commit request changes `t.Fatalf` to `t.Logf` (instead of `t.Errorf`,
because error output may be considered a test case failure in some
scenarios).
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9773
Follow-up for cea9505bab
fastcache.Cache allocates off-heap memory, which must be explicitly
returned back to the pool with Reset method call.
After changed made at commit above, during cache transit from whole to
split mode, it's possible that current cache is referenced by Cache.Get
or Cache.Call atomic pointers. It leads to potential memory leaks, since
we don't have any memory synchronization for atomic.Pointer.Store calls.
This commit adds `Finalizer` to the `fastcache.Cache` instances.
It properly releases memory, when cache is no reachable.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
Previously, cache state transition from split into whole could left
cache into broken state, if Reset cache method was called in switching
mode.
Also, cache Reset didn't start background workers and didn't change
cache size.
This commit properly check mode during cache transition. In addition,
it no longer stops background workers after whole mode transition and
always start workers during start-up.
Access to the prev, curr and mode Cache fields are properly locked
in order to mitigate possible race conditions.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9769
It seems like go compilator skipped computations and allocations for samples
as they weren't used afterwards. Sinking results into global variable removes
this optimizations and benchmark starts showing allocations within `pushSamples` fn.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Make SearchTSIDs look similar to SearchMetricNames, i.e. search for metricIDs within the method
- Make the corresponding corrupted index test look similar to one for metric names search
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Consistently use the `v0.0.0-YYYYMMDDHHMMSS-commit_hash` reference for
the internal deps such as `github.com/VictoriaMetrics/VictoriaMetrics`
dependency, since it allows referring any commit without waiting for the
release tag.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
add ability to limit available in datePicker dates using `minDate` and
`maxDate` parameters. all dates before `minDate` and after `maxDate`
cannot be picked. lower and upper bounds can be set independently.
This `minDate` and `maxDate` parameters aren't set by default in vmui.
The datepicker component with these params is re-used elsewhere.
The change also explciitly mentions `out-of-order` phrase, as it is commonly
used in Prometheus ecosystem.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Searching metricName by metricID happens many times during a single API
call. This requires getting the current set of idbs before those calls
happen. Which is fine but requires propagating idbs across the code
base. This is also fine in case of OSS version as it is used in Search
only.
Propagating idbs across the code base becomes a problem in Enterprise
version as it is used in at least 3 places. As a result it becomes very
difficult to merge things from OSS to Ent.
Localizing the all the dependencies in one searchMetricName type and
reusing this type everywhere should make things simpler.
Related enterprise changes:
https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/compare/search-metric-name-ent?expand=1
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9756
A small refactoring that reduces Search dependency on Storage:
- Move searchTSIDs() from Search to Storage because this method does not
depend on anything Search-specific but does depend on Storage.
- Use metricsTracker instead of storage.metricTracker.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9754
Benchmarking storage search api requires taking into account many
parameters, such as:
- data configuration: how many series, deleted series, search time range
- where the index data recides: prev and or indexDB
- which search operation to measure
While adding a new benchmark use case involves a lot boilerplate code.
This pr implements a framework for testing storage search ops that can
be relatively easily extended. This come in expecially handy when adding
new cases for parition index.
The current set of params will result of a lot of benchmarks to be run
which most probably does not make sense because:
- it will take a lot of time and
- the output data is hard to compare manually.
However, these benchmarks are very useful when only small set of params
is of interest. For example, if I want to compare the search of 100k
metric names when the index data resides in prevOnly, currOnly or
prevAndCurr indexDBs. This would translate in the following cmd:
```shell
go test ./lib/storage --loggerLevel=ERROR -run=^$ -bench=^BenchmarkSearch/MetricNames/.*/VariableSeries/100000$
```
Why this change:
- I often need to run benchmarks with configs that I did not have
before, requires either modifying the existing one or writing a new one.
It is easy to get lost and make benchmark non-comparable
- I need some way to make legacy and pt index benchmarks comparable
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
- state that it is unsafe to use lifecycle rules and describe the reason
- update formatting according latest changes in docs
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
- Rename copyStream to copyStreamToClient in order to make it more clear
that the stream must be copied from backend to client.
- Make sure that the client implements net/http.Flusher interface.
It is a programming error (BUG) if the client passed to copyStreamToClient
doesn't implement net/http.Flusher interface.
- Do not write zero-length data to the backend.
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/667
1beb629b removed logic which was used in order to keep full backup
location path in the restore mark file. Because of this, backups created
with a shortname (e.g. `vmbackupmanager restore create
daily/2025-09-12`) will fail as backup location is not prepended.
Fix that by properly constructing full backup name from parsed canonical
values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, vmagent always set enable.auto.commit to false and manually
commited messages. It adds additional pressure to the kafka brokers and could slow down
data consumption.
This commit allows vmagent to skip manual commit and use auto-commit
based on provided configuration. Which may improve message read throughput.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/931
### Describe Your Changes
This PR introduces a `make docs-update-flags` command that updates flags
in the documentation using the actual binaries compiled from the latest
`enterprise-single-node` and `enterprise-cluster` branches (hardcoded
for now). The command also normalizes the output format.
It can be run from any branch. All work happens inside temporary
directories under /tmp. The script checks out the required branch,
builds the binaries, and updates the documentation. The current Git
repository is not touched.
The command adjusts default values to more meaningful ones, such as
changing `-maxConcurrentInserts` (default 20) to (default
2*cgroup.AvailableCPUs()).
Currently the logic is implemented only for vminsert, vmstorage,
vmselect, vmagent, vmalert, and victoria-metrics (aka single).
The goal is to make it easy to keep documentation synchronized with real
binaries
_**Note:** Please ignore xxx_flags.md files for now. Review flags in
`README.md` and `Cluster-VictoriaMetrics.md`, and `vmagent.md`,
`vmalert.md` only. Once we agree on the changes in those files, I'll
replace the flags with the `{{% content "xxx_flags.md" %}}`._
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
* stress on requirement to have empty destination folder for copying;
* remove extra verbosity from docs;
* remove list vmctl migration options as they became unsynced. Instead of syncing,
refer to the vmctl docs;
* fix typos.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The application version can be then displayed in the vmui. Showing the
application version in vmui should make it easier to determine currently
used VM version (at least vmselect version).
------------
@Loori-R it would be could to add the app version in vmui in a follow-up
PR or by pushing a commit to this branch.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
`Table` component:
- add `format` property for table column, which allows to apply custom
formatting depending on column type
- add `rowClasses` table property, that allows to pass function that
allows to customize row css class depending on row value
- add `rowAction` table property, that allows to execute action while
clicking on table row
`Popper` component:
- add `classes` to specify additional CSS classes for popper to
differentiate from other poppers, since it's mounted to a DOM root
`Switch` component:
- use gap instead of left-margin
`DateTimeInput` component:
- add `dateOnly` property to allow accepting only date in the input
additional fixes:
- fix TopQuery header fields alignment
<img width="1279" height="125" alt="image"
src="https://github.com/user-attachments/assets/08ad4dbc-19e5-47f5-9ccd-a9fb222335a4"
/>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
Set rateEnabled to false for probe_success in VMUI
Fix issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9655
Problem:
probe_success is incorrectly initialized with rateEnabled = true because
the regex detecting counters (/_sum?|_total?|_count?/) matches partial
strings like _su. This causes probe_success (a gauge) to be treated as a
counter, producing slightly misleading graphs. For example, when
rateEnabled is set to true, probe_success often shows as 0 in VMUI when
the probe is actually succeding.
It is not intuative for users to have to disable rateEnabled manually
just to get the correct value for probe_success in VMUI.
Solution:
Update the regex to strictly match suffixes:
`/_sum$|_total$|_count$/`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: William Wren <william.wren@ericsson.com>
### Describe Your Changes
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This is needed for VictoriaLogs, which allows limiting query results with the given set of extra filters
specified via extra_filters query arg. The request url can contain multiple extra_filters query args -
they are all applied with AND logic to the query. See https://docs.victoriametrics.com/victorialogs/querying/#extra-filters
The merge_query_args option at vmauth allows merging the extra_filters provided by the client
(such as Grafana plugin for VictoriaLogs or built-in web UI) with the extra_filters specified in the backend
url at vmauth config.
This is needed for https://github.com/VictoriaMetrics/VictoriaLogs/issues/106
`workingsetcache` is built on top of two
[fastcache](https://github.com/VictoriaMetrics/fastcache) instances
(curr and prev) that are rotated periodically (configurable via
`-cacheExpireDuration` flag). During the rotation curr becomes prev and
prev is discarded, new curr is an empty. If an entry is not found in
curr then the prev cache is checked, and if the entry is found there it
is copied to curr.
`workingsetcache` also exports metrics, such as `EntriesCount`,
`GetCalls`, `SetCalls`, and `Misses` counts. These metrics are currently
implemented as the sum of the same metrics in prev and curr `fastcache`
instances. Given to rotation logic, these counts can be incorrect:
1. `EntriesCount`. It is the sum of prev and curr entry counts. If an
entry is not found in curr and found in prev (and therefore is copied
from prev to curr) the resulting entry count will be incorrect, i.e. it
will count copied entries two times.
2. `GetCalls`. It is the sum of prev and curr get calls. If an entry is
not found in curr the logic will attempt to retrieve it from prev, which
will result in double counting. While it is actually one get call to
`workingsetcache`.
3. `SetCalls`. It is the sum of prev and curr get calls. If an entry is
not found in curr but found in prev it will be copied to curr resulting
in a set call to curr. While from the `workingsetcache` perspective
there hasn't been any set operation at all.
4. `Misses`. It is the sum of prev and curr misses. If an etry is not
found in curr, it is recorded as a miss. If it is then found in prev,
the entry is returned to the caller, but that cache miss remains. If it
is not found in prev, then there will be 2 misses for 1
`worksingsetcache` get call.
This PR introduces `GetCalls`, `SetCalls`, and `Misses` counts at the
`workingsetcache` level in order to count the calls correctly. It also
excludes duplicates from `EntriesCount`.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
### Describe Your Changes
Implemented the script that generates graphs using `gnuplot`.
Those graphs show the write speed to the db.
How to use it:
1. From the root run `make tsbs`;
2. The file will be generated automatically
`/tmp/tsbs-load-100000-2025-07-22T00:00:00Z-2025-07-23T00:00:00Z-80s.csv`
4. From the root run `make tsbs-plot-load` and observe the result
5. If you have two files with the `tsbs_load_victoriametrics` output,
just define the second in the
`TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/tmp/tsbs-load-10
0000-2025-07-22T01:00:00Z-2025-07-23T01:00:00Z-80s.csv
`
To plot the measurements from some other benchmark, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file.csv`
To plot the measurements from two benchmarks, run
`make tsbs-plot-load TSBS_LOAD_RESULT_CSV_FILE=/path/to/file1.csv
TSBS_LOAD_RESULT_CSV_FILE_COMPARE=/path/to/file2.csv`
This command should generate a graph like described in the picture
<img width="638" height="578" alt="Screenshot 2025-07-25 at 15 35 42"
src="https://github.com/user-attachments/assets/900b05ab-0b98-4f7f-8f2c-18d28ad2eab6"
/>
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
This helps to improve readability of changes, so users
can see more important changes first, and see changes related
to the same component one after another.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Previously mock storage `net.Listen("tcp", …)` could succeed even if
another process was bound to the same port, due to dual-stack behavior
(`[::]:port` vs `0.0.0.0:port`). That lead to strange test results that
hard to bound to port misuse. Tests queried not mock server but whatever
was running on that port.
Switched to `"tcp4"` to ensure conflicts are detected correctly.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
As there are quite a few files, and each file might have multiple
changes and to make it easily to review, i limited the PR to 5 files at
a time.
I suggest you take a look at markdownlint and add it as part of your CI,
similar to
https://github.com/MicrosoftDocs/PowerShell-Docs/blob/main/.markdownlint.yaml
And while at it, take a look at cspell and how its used in thier repo
and replace the python one you have in your current implementation -
might open a PR with it after all the fixes PRs).
This pull request consists of the following:
1. Markdown fixes
following https://www.markdownguide.org/basic-syntax/
and https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Add empty lines after headers or lists
- Remove extra lines between paragraphs
- Remove extra spaces at the end of a line
- Add language to code quote
- Consistent list (dont mix astrixes and dashes on same file, choose one
and be consistent in the same file)
- Proper URL links
- Use meaningful context to URLs instead of "here".
2. Concise language
3. Grammar fixes
- removing extra spaces between words
- there are multiple ones but i picked the basic ones that triggered my
eye :)
4. Spelling fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This allows performing a single MustFsyncPath() for the parent directory after multiple calls to these functions.
This clarifies code paths, which call these functions, and makes them more maintainable.
This also removes a redundant fsync() call for the parent directory when creating a file-based part.
Previously the first fsync() was indirectly called when the directory was created via MustMkdirFailIfExist()
and the second fsync() was called via MustSyncPathAndParentDir() after all the data is written to the part.
The fs.MustWriteSync() already fsyncs the created file, so there is no need in additional fsync() call.
While at it, add missing fsync for the parent directory after creating a directory for persistent queue.
The source file contents should be already fsynced to disk before creating a hard link,
so there is no sense in calling fsync() on the created hard link.
This commit ensures that the -search.maxQueryLen flag applies to Graphite
queries, matching the behavior already present for Prometheus queries.
Previously, Graphite queries could bypass this limit, creating an
inconsistency and a potential vector for resource exhaustion.
Key changes:
Added getMaxQueryLen() to access the global query length limit.
Enforced query length validation in execExpr() for Graphite queries.
Added comprehensive tests for the new validation logic and edge cases.
Error messages are consistent with Prometheus query validation.
The default limit is 16KB (configurable via -search.maxQueryLen).
Setting the limit to 0 disables validation.
This change closes the gap where Graphite queries could exceed
configured length limits, providing consistent protection against
excessively long queries across both query APIs.
Follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9534
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9600
The vm_deleted_metrics_total metric value represents the number of
metricIDs stored in deletedMetricIDs cache. This cache lives at the
storage level and stores the deleted metrics from both prev and curr
idbs. However, the metric is populated at the idb level. Since there are
always 2 idbs (prev and curr), the value is populated twice. Hence the
doubled value of the metric.
The fix is to populate the metric value at the storage level.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
- load and parse static`/vmui/config.json`, modify it according to
runtime values and use it as a replacement for static config.json
- remove using `/flags` endpoint for checking features, that should be
enabled on VMUI
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9635
`router.home` represents `/` path, which is the same for all UI apps,
but content and title for root path differs depending on application
type. added `getDefaultOptions` function, which returns proper home
route configuration depending on application type, which allows to
remove renamings in respective layouts
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9641
The commit
25cd5637bc
introduced the `-enableMetadata` flag and the
`promscrape.IsMetadataEnabled()` function, which is now used in multiple
places, including the `app/vminsert/prometheusimport` [request
handler](b24b76ff08/app/vminsert/prometheusimport/request_handler.go (L36)).
Because of the use of `promscrape` package vminsert registered all
`-promscrape.*` service discovery flags, which were not relevant for
`vminsert`.
This change moves the metadata flag logic into a dedicated package,
preventing vminsert from unintentionally loading unrelated promscrape
flags.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9631
This is an attempt to adjust image styles to GitHub themes, because
existing images with transparent backround become unreadable on dark theme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This is an attempt to adjust image styles to GitHub themes, because
existing images with transparent backround become unreadable on dark theme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When having a `match` of `__name__` key alone for labels api, it's going
to hit max series limit in case of high cardinality metric name.
Instead, we can skip looking by `metricIDs` and fallback to inverted
index scan with a `composite key` since we only have some `__name__` and
a label name.
Common requests for optimisations are:
1) /api/v1/labels?match=up or /api/v1/labels?extra_filters=up
2) /api/v1/label/job/values?match=up or /api/v1/labels?extra_filters =up
It's widely used by grafana variables.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9489
a.Subtract(b) perfomance degrades as b becomes bigger than a. For
example if len(b2) == 10xlen(b1) then time(a.Subtract(b2)) == 10x
time(a.Subtract(b1)).
A quick fix is to iterate over a elements in len(b) > len(a). Iterating
over a's elements and at the same time deleting should be safe since no
elements are actually deleted (i.e. memory freed, etc). Deletion here
means setting a corresponding bit from 1 to 0.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9602
### Describe Your Changes
Add the support of all standard TSDB query types that can be executed
against VictoriaMetrics. `double-groupby-all` is commented out as it
attempts to retrieve all 1B samples and fails. While this can be fixed
by setting the `-search.maxSamplesPerQuery` this query is left disabled
anyway because it will consume way too much memory and cpu time.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
New benchmarks for storage search (data and index):
- Use the same dataset that accounts for prev and curr indexDBs and
deleted series
- The code is more structured
- Account for various numbers of series in response including higher
numbers (>10k) as this appears to be a quite common use case.
These bechmarks were used for investigating #9602 performance issue and
helped discover that prefetching metric names needed to be restored
#9619.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Some messages were written to `stdout` using `fmt.Printf` and
`fmt.Println`, while the other messages like import statistics were
written to `stderr` through the `log` package.
This led to ordering problems where the `Import finished!` +
`VictoriaMetrics importer stats` messages, which expected to be the last
messages, appeared before `Continue import process with filter`
messages, creating confusing output for users.
```
2025/08/20 13:07:26 Import finished!
2025/08/20 13:07:26 VictoriaMetrics importer stats:
time spent while importing: 20h49m10.8497184s;
total bytes: 277.1 GB;
bytes/s: 3.7 MB;
requests: 7978614;
requests retries: 0;
2025/08/20 13:07:26 Total time: 20h49m10.851006088s
Continue import process with filter
filter: match[]={__name__!=""}
start: 2025-08-08T00:00:00Z
end: 2025-08-15T00:00:00Z:
Continue import process with filter
filter: match[]={__name__!=""}
start: 2025-08-15T00:00:00Z
end: 2025-08-19T16:18:15Z:
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
It seems db39f045e1 accidentally reverted
#9419 changes.
```patch
--- a/app/vmagent/remotewrite/client.go
+++ b/app/vmagent/remotewrite/client.go
@@ -448,7 +448,8 @@ again:
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
- if statusCode == 409 {
+ switch statusCode {
+ case 409:
logBlockRejected(block, c.sanitizedURL, resp)
// Just drop block on 409 status code like Prometheus does.
@@ -461,7 +462,13 @@ again:
// - Remote Write v2 specification explicitly specifies a `415 Unsupported Media Type` for unsupported encodings.
// - Real-world implementations of v1 use both 400 and 415 status codes.
// See more in research: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054
- } else if statusCode == 415 || statusCode == 400 {
+ case 415, 400:
+ if c.canDowngradeVMProto.Swap(false) {
+ logger.Infof("received unsupported media type or bad request from remote storage at %q. Downgrading protocol from VictoriaMetrics to Prometheus remote write for all future requests. "+
+ "See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
+ c.useVMProto.Store(false)
+ }
+
if encoding.IsZstd(block) {
logger.Infof("received unsupported media type or bad request from remote storage at %q. Re-packing the block to Prometheus remote write and retrying."+
"See https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol", c.sanitizedURL)
```
cc @makasim
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Previously, if pushBlockPubSub function returned error, vmagent stopped
remote write worker thread assigned for it. Expected behavior for this
scenario is to retry error inside pushBlockPubSub function. It must
return only on vmagent shutdown.
This commit properly handles this error and prevents from ingestion
stop.
Add build tag `disable_grpc_modules` for vmbackup, vmrestore and
vmbackupmanager. Binary size increases only for 3MB with it. It's
acceptable trade-off for security and feature updates.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8008
* check if built binary is present for `make tsbs-build`. Before, if
build fails, the command stopped working.
* make ENV variables configurable from command line, so `TSBS_STEP=15s
make tsbs-generate-data` would respect the configured step.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Another batch of documentation improvements
Fix Spelling in:
- Comments in code
- Displayed strings
One change was in a json file used for the anomaly dashboard in docker,
else no other code was changed.
Some Markdown changes, related to standards:
- URLs
- List numbering
- Empty spaces at the end of a line
Previously, vmagent treated differently the following configuration:
1) ./bin/vmagent --remoteWrite.url=url-0 --remoteWrite.url=url-1 --remoteWrite.disableOndiskQueue
and
2)./bin/vmagent --remoteWrite.url=url-0 --remoteWrite.url=url-1 --remoteWrite.disableOndiskQueue=true,true
In first case, it could produce duplicates and blocks ingestion requests if one of remote write targets were not accessible.
In second case, it implicitly added --remoteWrite.dropSamplesOnOverload as true and silently dropped samples for inaccessible target.
This commit treat this configuration as the same and silently drop samples on both cases to mitigate possible duplicates.
It's expected, that vmagent provides delivery guarantees, only if it has a single remote write target, when flag remoteWrite.disableOndiskQueue=true is set.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9565
While working on #9431 there has been introduced 2 bugs related to
indexDB.searchMetricName():
1. During the search the index records are unconditionally placed in
sparse index
2. If search touches index records in both prev and curr indexDBs, there
will be possible cases that metricIDs can be unintentionally removed
using `wasMetricIDMissingBefore()` logic
Additionally, the PR moves the searchMetricName from indexDB and Search
to Storage which simplifies the code and makes it spossible to reuse the
function as-is in enterprise code.
Follow up for #9431.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
`-eula` was deprecated and made no-op in v1.123.0, so examples with
`-eula` will no longer work.
Replace those with proper license configuration.
While at it, remove license flags from vmbackupmanager CLI commands as
it is not required when using CLI.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Do not check vmauth_config_last_reload_success_timestamp_seconds since
it may contain the timestamp < time.Now() due to how lib/fasttime works.
Instead, compare the number of config reloads.
follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9369 and
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9572
Also, split the config update and reload into two separate functions.
master:
```
$gotest -race ./apptest/tests/ -run=TestSingleVMAuthRouterWithInternalAddr -count=40
ok github.com/VictoriaMetrics/VictoriaMetrics/apptest/tests 90.176s
```
pr:
```
$gotest -race ./apptest/tests/ -run=TestSingleVMAuthRouterWithInternalAddr -count=40
ok github.com/VictoriaMetrics/VictoriaMetrics/apptest/tests 46.130s
```
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Removing extDB from indexDB makes prev, curr, and next indexDBs independent.
I.e. the search is performed independently in prev and curr, the results are
then merged.
Additionally, since no search is now performed in extDB:
- all indexDB search methods now return the original maps used for populating
the result, without invermediate conversion to slices.
- `NoExtDB` suffix has been removed from method names
This has been extracted from #8134.
Signed-off-by: Andrei Baidarov <baidarov@nebius.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
Before, we showed summarized 99th percentile for query complexity across
all available instance. This doesn't make much sense, as it doesn't
answer on the following questions:
1. What complexity limits to set per vmselect
2. What are the most expensive queries
The change is to use `max` instead of `sum`, to show only outliers, the
heaviest served queries. The update should help answering on questions
above.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
fixes case, when `histogram_quantile` result contains gaps, that occur
in same time range, where NaNs are present in a first bucket of a
histogram
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Fix flaky integration test `TestSingleVMAuthRouterWithAuth`.
The flakiness is caused by the
`vmauth_config_last_reload_success_timestamp_seconds` metric, which
reports time with second-level precision.
Update the test to account for this when verifying that the config
reloads correctly.
Previously, if limit was reached for cardinality limiter, vmstorage
started to perform index lookups for any series exceed limit. Since
storage must skip index creation for such series, it's not possible to
cache it. It resulted into opposite effect of cardinality limiter -
instead of reducing resource usage, it increased it instead.
This commit changes cardinality limit calculation from metricID to the
hash from raw metricName. It could slightly increase CPU usage if
cardinality limiter is configured, since hash must be calculated for
each metricName row. But it mitigates excessive CPU and memory usage on
limit hit
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9554
The current one-second timeout for individual read or write operations
during the handshake phase has proven to be insufficient in some
scenarios
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9345. For
example, short-lived CPU spikes lasting a few seconds can cause
handshake failures due to the low timeout threshold.
While a small timeout may work well in environments with fast and
reliable networking, such as within a single datacenter, it becomes
problematic in more complex setups—particularly in a [multi-level
cluster
setup](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multi-level-cluster-setup)
where the top-level vmselect may reside in a different availability zone
and work on a less reliable network.
Another issue with the per-operation timeout approach is that it allows
the total time for a handshake to accumulate significantly in the
worst-case scenario. If each operation experiences a delay just under
the timeout threshold, the entire handshake process could take up to 6s.
Which accounts for 60% of `-search.maxQueueDuration` and leaves only 4s
for the actual query.
Introducing a single timeout for the entire handshake process would
provide more predictable behavior and improve usability from a
configuration standpoint. The timeout for the whole handshake op is also
easier to understand from the operator's point of view. Increasing the
timeout value and providing a configuration option for it would make the
system more resilient to transient conditions like CPU contention and
better suited for use cases involving cross-AZ communication.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9345
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
### Describe Your Changes
The commit changes CI behavior:
- Run build in parallel for different os\arch
- Run unit\integration\lint in parallel
- Remove the custom Go cache step in favor of the logic provided in
`actions/setup-go`. The custom cache was used to build key based on
go.sum and makefiles. This logic is preserved.
- Introduce cache for golangci-lint.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
vmselect is experiencing memory exhaustion and OOM kills
when processing complex Graphite queries with nested functions and large
numbers of label selectors (30k+ values).
The root cause was unbounded growth of the pathExpression field.
This commit adds configurable truncation for Graphite pathExpression fields to
prevent memory exhaustion while preserving query functionality:
New flag: -search.maxGraphitePathExpressionLen=1024 (default 1024
characters)
Safe truncation: Long expressions are truncated with "..." suffix
Zero disables: Set to 0 to disable truncation entirely
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9534/
Previously, tcp listener perform synchronous proxy protocol header
read during connection accept. It could significantly reduce vmauth
performance and lead to timeout at serving http requests.
This commit changes this logic and performs proxy protocol header
parsing during first Read request from connection or RemoteAddr method
call. It significantly improves performance and reduce possible
bottleneck at connections accept method.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9546/
- Rename WriteRequestUnmarshaller to WriteRequestUnmarshaler
- Add a description to WriteRequestUnmarshaler struct
Review comments
b98e592752 (r163365472)
Follow up on
b98e592752
* remove duplicated content between single and cluster versions
* mention recommendation to group component types by jobs in scrape
config
* link the example of scrape configs
* update wording
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This metric can be used for building alerts and graphs for free disk space usage percentage by using the following MetricsQL query:
100 * (vm_free_disk_space_bytes / vm_total_disk_space_bytes)
### Describe Your Changes
By default `zstd.Reader` creates multiple goroutines to process a single
connection:
- It doesn't match cgo behavior, which works synchronously, and creates
a lot more concurrent goroutines (0.5k -> 5k on my workload)
- It results in non-zero `vm_tcpdialer_errors_total{type="read"}` errors
on vmselect because an underlying connection is closed while a goroutine
is still reading from it. The goroutine created by
`zstd.NewReader`/`zstd.Reset`
abb348e4db/lib/handshake/buffered_conn.go (L113-L120)
- vmselect (and vmagent) doesn't benefit from async mode since it has
multiple readers in-use at the same time, which usually exceeds the
number of cpu cores
Partly related to #9218
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This commit introduces new storage API: `/internal/log_new_series`.
It helps to dynamically debug newly created series. Changing `-logNewSeries` value requires storage restart,
which may introduce downtime and is not recommended for production deployments.
In addition, this commit adds flags: `-logNewSeriesAuthKey`, which protects newly added API.
Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8879
This change will expose any IPv6 addresses assigned to an instance under
the meta labels:
* `__meta_gce_public_ipv6` -native IPv6 address, globally routed
* `__meta_gce_internal_ipv6` - unique local address (ULA).
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9370
Previously, the tenant selector was hidden when only one tenant
was returned, making it impossible to run queries in multi-tenant mode.
Now, the selector is always shown as long as at least one tenant exists.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9396
The prompb and prompbmarshal share exactly the same models and provide
marshal and unmarshale capabilities for them. This creates duplication
(changes in one model has to be made in another, case with metadata) and
confusion where for example you compare same looking models but golang
says they are not the same (because of the type).
This commit merge prompbmarshal logic into prompb so the rest of the
code is aligned on prompb models.
Moves samplesPool and labelsPool to WriteRequestUnmarshaller.
Make WriteRequest struct clean from unmarshal logic.
The benchmark shows no significant changes:
$benchstat prompbmarshal.bench prompb2.bench
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb
cpu: Apple M1 Pro
│ prompbmarshal.bench │ prompb2.bench │
│ sec/op │ sec/op vs base │
WriteRequestUnmarshalProtobuf-10 189.2µ ± 5% 190.8µ ± 8% ~ (p=0.579 n=10)
WriteRequestMarshalProtobuf-10 145.3µ ± 7% 143.6µ ± 2% ~ (p=0.143 n=10)
geomean 165.8µ 165.5µ -0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/s │ B/s vs base │
WriteRequestUnmarshalProtobuf-10 50.42Mi ± 5% 49.99Mi ± 8% ~ (p=0.593 n=10)
WriteRequestMarshalProtobuf-10 65.64Mi ± 7% 66.39Mi ± 2% ~ (p=0.143 n=10)
geomean 57.53Mi 57.61Mi +0.14%
│ prompbmarshal.bench │ prompb2.bench │
│ B/op │ B/op vs base │
WriteRequestUnmarshalProtobuf-10 27.70Ki ± 4% 26.90Ki ± 7% ~ (p=0.190 n=10)
WriteRequestMarshalProtobuf-10 3.267Ki ± 12% 3.273Ki ± 12% ~ (p=0.971 n=10)
geomean 9.514Ki 9.383Ki -1.38%
│ prompbmarshal.bench │ prompb2.bench │
│ allocs/op │ allocs/op vs base │
WriteRequestUnmarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
WriteRequestMarshalProtobuf-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
This commit introduces new flag: `-storage.idbPrefillStart` with
default value of `1h`. It allows to adjust start time of the prefill
process for the data written into the next indexDB.
By default, VictoriaMetrics starts prefill indexDB at 3 A.M UTC, while
indexDB rotates at 4 A.M UTC. It could be useful to change start time
from 3 A.M. to 1 A.M or 00:00 A.M. It should smooth overall resource
usage.
However, changing value to the number bigger than 4 hours for the
default
installations doesn't make much sense ( like 11 P.M. for the day before
rotation). Since, VictoriaMetrics maintenances daily-indexes and it have
to repopulate it twice.
But it could be useful in conjunction with `-retentionTimezoneOffset`,
which could delay index rotation for the current day and give more time
for the prefill process.
As an example, `-retentionTimezoneOffset=4h` adds an additional 4 hours
to the rotation time and `-storage.idbPrefillStart` could be changed
accordingly.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9393
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Previously, vmctl tried to create a tmp directory in a directory with
Prometheus snapshot. This is not always possible as snapshot can be
mounted in read-only mode.
Use a system default temporary location and allow user to customize the
tmp path in order to avoid issues with custom tmp locations.
Closes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9505
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The `groupWatchersCleaner` iterates through all watchers, attempting to
lock them sequentially while holding the global `groupWatchersLock`.
Therefore, a large number of group watchers can cause the
`groupWatchersLock.Lock()` to block for a noticeable period.
Proposing to use `TryLock` as an optimistic case because if the watcher
lock is held, it's more likely that the watcher is still in-use so no
further cleanup is required.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9354#issuecomment-3089108889
The lib/fs.MustCloseParallel() accepts a slice of MustWriter items, which must implement only
a single method - MustWrite(). The previous lib/filestream.MustCloseWritersParallel() was
accepting CloseWriter items, which must implement Write() and Path() methods additionally
to MustClose() method. This was adding artificial restrictions on the applicability
of the MustCloseWritersParallel() method. Remove these restrictions.
This should reduce the time needed for the deletion of VictoriaLogs parts
with big number of files, which are created when wide events are ingested into VictoriaLogs
(e.g. logs with big number of log fields).
This may help improving scalability of VictoriaLogs at systems with big number of CPU cores,
which store data into high-latency storage such as Ceph or NFS.
See https://github.com/VictoriaMetrics/VictoriaLogs/issues/517
The import aliases may complicate maintenance of the code in the long term
if they aren't used consistently, e.g. if one file imports the apptest under the default name
while the other file imports the apptest under the "at" name.
The aliases also complicate grepping the code by apptest.* or prompbmarshal.* .
- Drop the code needed for asynchronous removal of the directory on NFS shares.
This code was needed when VictoriaMetrics could keep open files after their deletion
or renaming. This is no longer the case after the commit 43b24164ef .
Now files are deleted only after all the readers close them.
This updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
- Unify MustRemoveAll() and MustRemoveDirAtomic() into MustRemoveDir() and MustRemovePath()
functions:
- The MustRemoveDir() deletes the given directory with all its contents, in an "atomic" way:
it creates a special `.delete-this-dir` file in the directory, then removes all its contents
except of this file, and later removes the `.delete-this-dir` file together with the directory
itself. This makes possible easily determining whether the given directory needs to be deleted
after unclean shutdown - if it contains the `.delete-this-dir` file or if it is empty, it must be deleted.
Add IsPartiallyRemovedDir() function, which can be used for detecting whether the given directory must be removed
at starup.
Previously the MustRemoveDirAtomic() was using a "trick" for atomic directory removal: it was "atomically" renaming
the directory to a temporary directory with '.must-remove.' marker in the directory name, and after that it
was removing the renamed directory. On startup all the directories with the `.must-remove.` marker were deleted
if they are left after unclean shutdown. This "trick" doesn't work for NFS and object storage such as S3,
since these storage systems do not support atomic renaming of directories with multiple entries inside.
The new MustRemoveDir() function doesn't use this "trick", so it can be safely used in NFS and S3-like storage systems.
This is based on the pull request from @func25 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9486/files .
- The MustRemovePath() deletes the given file or an empty directory.
- Delete the existing parts and partitions at startup if they were partially deleted.
- Consistently use fs.MustRemoveDir() and fs.MustRemovePath() instead of os.RemoveAll() across the codebase.
This reduces the amounts of bolierplate code related to error handling.
- Consistently use fs.MustWriteSync() instead of os.WriteFile() across the codebase.
- Consistently use fs.IsPathExist() for checking whether the given path exists on the filesystem.
- Consistently use fs.MustWriteAtomic() for atomic store of the serialized state into file.
- Consistently panic with 'BUG:' prefix on unexpected errors.
- Read and write the state file contents in one go. This simplifies the code for loading and storing the state.
This shouldn't increase memory usage too much, since the parsed state is already stored in RAM.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9074
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9102
The following renamings were requested in #8134 to be done in a separate
PR:
- Rename putTSIDToCache(s) to storeTSIDToCache(s). This is because
get/put prefixes are normally reserved for pools and ref counters. This
unclear though whether name `getTSIDFromCache` is okay in this context.
- Rename `addPartitionNolock` to `addPartitionLocked`, because the
`Locked` suffix is used everywhere else
- Rename deleteMetricIDs to saveDeletedMetricIDs, because no deletion is
actually happening. The deleted metric ids are still present in the
index. One needs to filter the deleted metric ids out after the
retrieval.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, it was not possible to compile netBSD binary due to missing OS constrains at lib/fs and lib/filestream packages.
This commit fixes it by:
* apply proper constrains at lib/filestream
* Introduce statfs_t and statfs() to abstract unix internals: NetBSD
needs to use unix.Statvfs_t and unix.Statvfs() unlike other Unix-es.
* apply proper constrain for vmctl terminal package
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9473
The newly created data part could become missing after unclean shutdown (such as hardware power off),
since the contents of the parent directory wasn't synced to disk before storing the newly created data part in the parts.json file.
Fix this by syncing the parent directory contents before storing the newly created part in the parts.json file.
This commit is based on https://github.com/VictoriaMetrics/VictoriaLogs/pull/507
This commit adds tmp inmemory and data blocks buffers for
index search requests. It allows to reduce memory allocations on block
cache misses. Since block cache puts block into cache only on after
configured number of cache misses.
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9324
This commit respects online CPU count for process_cpu_cores_available metric. Since CPUQuota may exceed it.
Detailed description:
There might be a case when `getCPUQuota()` returns a value bigger than
available logic CPU cores. In this case the lower value should be used.
These changes also reflect upcoming go1.25 changes
https://tip.golang.org/doc/go1.25#container-aware-gomaxprocs
> If the CPU bandwidth limit is lower than the number of logical CPUs
available, GOMAXPROCS will default to the lower limit
In practice that happens with [CPU
Manager](https://kubernetes.io/blog/2022/12/27/cpumanager-ga/) enabled.
It requires setting `/sys/devices/system/cpu/online` which shows the
total number of cores on a node. The container doesn't have any cgroup
limits, but it's pinned to a limited subset of cores.
```
root@vmselect:/# cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
-1
root@vmselect:/# cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000
root@vmselect:/# cat /sys/devices/system/cpu/online
0-255
```
go1.25 and go.uber.org/automaxprocs don't look into
`/sys/devices/system/cpu/online`, VictoriaMetrics does
The chunkedbuffer is released twice, leading to concurrent use of the
buffer after it's acquired from a pool by two different goroutines.
Issue was introduced at v1.115.0 at the following commit 5b87aff830
### Describe Your Changes
Previously, if I run `make publish-vmstorage` it composed the tag name
like:
```
docker.io/victoriametrics/vmstorage:heads-tcpdialer-increase-idle-timeout-0-g1b20c2dbd3-dirty-84b03c0eEXTRA_DOCKER_TAG_SUFFIX
```
It would work okay if I explisitly provide empty
`EXTRA_DOCKER_TAG_SUFFIX=` env var.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
This reverts commit 63e1bf5d97. The reason
for revert is that the change makes increase/rate behavior less
predictable for users. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9420.
Another negative impact of this change is that it negatively affects
recording rules that calculate increases or rates over time series on
big intervals. For example,
`increase(http_errors_total{instance="foo"}[30d])` would stop producing
results if `http_errors_total{instance="foo"}` was marked as stale. But
this is logically incorrect, as `http_errors_total{instance="foo"}` over
last 30d should still return results.
-------------
This change doesn't revert
63e1bf5d97
completely. It keeps changes made to `removeCounterResets` in order to
preserve original `staleNaN` values. Before, `staleNaN` were compared
with float values and producing `NaN` instead. Which could confuse us in
future. So keeping the fix and tests. It shouldn't have any negative
effect.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, snapshots were never deleted automatically. This lead to
disk space waste in case snapshots were left behind and never deleted,
which happened in case of backups failure. This required manual
investigation and snapshots cleanup.
This change enables removal of snapshots older than 3 days by default.
This should give enough time to upload backup in time and also make sure
old snapshots are not wasting disk space.
Closes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9344
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Related issue: #9388
- removed all the code related to VictoriaLogs
- updated dependencies to the latest versions
- removed unnecessary `React` import from components
- removed deprecated dependencies such as `lodash.get`,
`@babel/plugin-proposal-nullish-coalescing-operator`,
`@babel/plugin-proposal-private-property-in-object`
- removed unused packages
- fixed proxy for local playground development (after refresh the page
app cannot properly parse the `/#/` in the path)
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
Add unit tests that verify filesystem hierarchy of the indexdb after
opening the storage and after rotation. The test have been originally
added in #8134 but are still applicable to the current version and will
also reduce the diff.
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
- Move `prefetchMetricNames`, `SearchMetricNames`, and
`SearchLabelValues` from `Storage` to `indexDB`
- Rename `searchLabelValuesWithFiltersOnTimeRange` to
`searchLabelValuesOnTimeRange`
- Rename `searchLabelValuesWithFiltersOnDate` to
`searchLabelValuesOnDate`
Extracted from #8134.
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
- Split the release process into two steps.
- Add some links
- Described in more details what should be done to check branches in
sync.
- Added some small checks, like running tests, and testing final release
on sandbox.
- Changed the order of some actions (like build final image before
publish release).]
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
When you try to run `make docker-cluster-up` and try to check metrics in
the Explore page of the Grafana, you will see the basic auth request. If
you enter the correct login and password from the `auth-vm-cluster.yml`
it will continue to ask about the username and password. It happens
because the datasource is configured to ask for data vmauth, but in the
configuration, no auth information is provided.
Added the basich auth info to the datasource provision file
Adding note for -dst config. Adding additional reference for snapshot troubleshooting for better accessibility
Making documentation easier to use following customer issues
TestClusterMultiTenantSelect passes in cluster branch but fails in master.
Integration tests must pass in any branch since they do not distinguish
between master and cluster.
The test fails because the multitenant_test.go is different in master and
cluster. This commit makes the file identical in both branches.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
The example for using the `activeAt` template in `vmalert.md` was wrong,
with the `from` field being before the `to` field. This switches them
round to be correct.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
The remoteAddr and requestURI aren't sent to the client in the error response after the commit 524f58c78c .
They are logged locally instead. This increases security by not exposing potentially sensitive information (remoteAddr and requestURI) to the client.
This doesn't reduce debuggability, since remoteAddr and requestURI are logged locally together with the error message.
### Describe Your Changes
This PR adds support for parsing Unix timestamps (both integer and
float) in the format pipe using the `time:` prefix.
The timestamp precision (seconds, milliseconds, microseconds, or
nanoseconds) is automatically determined based on the value.
It might be worth creating a new prefix for this rather than reusing
`time:`, but I haven't found any compelling reason to extend the syntax.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
This should allow detecting the client, which led to the error logged via SendPrometheusError() function,
e.g. this improves troubleshooting of the logged errors.
This is a follow-up for c9bb4ddeed
This commit renames flag `remoteWrite.retryMaxTime` into `remoteWrite.retryMaxInterval`. New name aligns with corresponding `MinInterval` flag. Previous flag name still could be used, but vmagent will log warning message with suggested migration.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9169
While working on #8134 it has been discovered that
`/graphite/index.json` always returns an empty result. This PR adds a
test and a fix. This fix is a no-op for the master but adding it here in
order to reduce the diff in #8134.
Co-authored-by: Artem Fetishev <149964189+rtm0@users.noreply.github.com>
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
Add metrics to track the behavior of the message processor:
- `vl_insert_processors_count` tracks the current number of active log
message processors.
- `vl_insert_flush_duration_seconds{type="protocol"}` measures the time
taken to flush log data from recently parsed logs to the underlying
storage system (in-memory buffering, remote storage nodes, or local
storage).
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9320
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Re-use existing `lib/snapshot` for snapshot operations in order to reduce code duplicates and unify supported options and error handling.
Previously, vmbackupmanager did not support `snapshot.tls*` options and did not handle errors in a same was as vmbackup does.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9340
When the page loads, the query field automatically receives focus, which
is expected. However, this also immediately opens the autocomplete
popup, triggering an unnecessary request and showing suggestions before
the user starts typing.
This PR commit the autocomplete popup from opening on initial page
load.
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9342
This commits changes release process by introducing new intermediate
docker image tag with `-rcY` suffix. It's needed to properly test
releases for possible regressions without need to erase release docker
images. Which could be cached by some proxy and actually used in
production.
Release candidate image must re-tagged into final release via make
command.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9136
This commit adds the following new metric `vm_cache_eviction_bytes_total` for tsid, metricName and metricIDs caches backed by workingset cache.
The problem this is attempting to solve is doing fine grained tuning for
caches on large high churn, high query load VictoriaMetrics deployments.
Cache performance in those scenarios is critical for cluster
performance, however for stability reasons its imperative not to provide
_too much_ cache space compared to available memory or there becomes a
risk of OOM which can easily destabilize a cluster.
Given the cost of memory, especially for large deployments, there is
therefore a need to optimize cache usage to be near the minimum required
to provide acceptable performance under churn, ingestion and query
loads.
VM provides an excellent set of flags for configuring this cache
performance(size and expiry and removal percentage).
However specifically when tuning `prevCacheRemovalPercent` and
`cacheExpireDuration` it can be hard to known exactly what impact
they're having. Sometimes its very clear when theres cyclical behavior
that cacheExpireDuration is involved, but in other scenarios it can be
very unclear why cache miss rate goes up.
Specifically when attempting to debug spikes in cache miss rate there
are questions that are hard to answer:
* Did a new workload come in that was poorly cached?
* Were caches rotated due to expiry and the `prev` cache contained many
active entries due to limited cache size?
* Where the caches rotated due to miss percentage and our tuning is too
aggressive?
In many of these debugging scenarios it would be greatly helpful to have
explicit metrics on how many bytes of key caches were evicted and for
what reasons, it then becomes trivial to associate spikes in eviction
for a specific reason with a miss rate spike that causes poor
performance. The appropriate tuning them becomes much more obvious.
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
removeCounterResets was operating only using stalenessInterval (when set
by user) and didn't account for user-specified [window] in rollup
functions. Since in rollup functions MetricsQL could look behind for up
to [window] interval - it can capture those datapoints that weren't
accounted in removeCounterResets. With this change, we remove resets on
stalenessInterval+window interval to avoid this corner case.
Fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935#issuecomment-2978728661
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit adds a new option `concurrency` for kafka producer, which adds additional workers
for publishing messages into the kafka cluster.
This setting could be useful at networks with high latency. And it increases write throughput.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9249
Headers sent together with queries could contain important information
about the source of the request. This could help to find sources that send specific queries.
Logging all headers could expose sensitive info in logs.
So introducing `-search.logSlowQueryStatsHeaders` to whitelist headers that need
to be logged. See test case for example.
Previously, regex `.+|^$` was optimised into `.+`. But in fact, it must
optimised into `.*`. Since this regex matches any non empty symbols and
empty value. Which combines into any input.
This commit adds a special case of isDotStar check, which covers this
expression.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9290
This commit introduces new component - VictoriaLogs Agent (vlagent).
It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
It appears the `lib/mergeset.Item` violates the 3rd rule
https://pkg.go.dev/unsafe#Pointer
> (3) Conversion of a Pointer to a uintptr and back, with arithmetic.
>
> Unlike in C, it is not valid to advance a pointer just beyond the end
of its original allocation:
> ```
> // INVALID: end points outside allocated space.
> b := make([]byte, n)`
> end = unsafe.Pointer(uintptr(unsafe.Pointer(&b[0])) + uintptr(n))
> ```
`CGO_ENABLED=0 go test -trimpath -gcflags=all=-d=checkptr -run
TestTableAddItemsSerial`
```
fatal error: checkptr: pointer arithmetic result points to invalid allocation
This commit adds a special path to item encoding, that prevents invalid memory references.
See this PR for details:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9264
* reduce time wait on notifiers update
* deterministically sort notifiers() call. This improves and tests and output for users
a9b641531b
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Besides `/select/`-prefixed requests vmauth also does `/admin/` requests
to fetch tenant IDs. Without whitelisting, such requests will fail to route.
I am adding this change only to generic vmauth configs where /admin/ access
is expected to be by default.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This is ready, since https://github.com/VictoriaMetrics/cloud/pull/3229
is merged .
This PR deletes the cloud docs folder after moving it into the private
cloud repo. The main reason is to keep things tidy and handle reviews in
a better way, as the helm charts repo is doing.
Future updates should be protected since the github actions file with
rsync command should not be messing with this folder when updating
everything.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
The example in the VictoriaLogs docs (Comments section) is currently
missing an aggregate function in the `stats` pipe:
```logsql
error # find logs with `error` word
| stats by (_stream) logs # then count the number of logs per `_stream` label
| sort by (logs) desc # then sort by the found logs in descending order
| limit 5 # and show top 5 streams with the biggest number of logs
```
However, `stats by (_stream) logs` is invalid syntax - `logs` is
interpreted as a function name, which causes a parsing error.
This fix replaces it with a valid version using `count()`:
```logsql
| stats by (_stream) count() as logs
```
Without `count()`, the query fails with:
```
cannot parse 'stats' pipe: unknown stats func "logs"
```
Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
* split migration mods in separate sub-section. This removes conflicting
#-anchors and makes it easier to read&modify in future
* remove duplicating wording
* simplify texts
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8964
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, if annotation was update on object, it could result into
duplicate targets register for dropped targets service-discovery page.
Mostly it affects endpoint annotations update. Endpoint holds
annotation with last update time. If any pod that belongs to the given
annotation changed, it causes duplication targets for all pod backed by
the endpoint. It makes service-discovery debug page hard to use.
This commit excludes `__metadata_kubernetes_*_annotation_` from key
generation for dropped targets map. Instead it updates target with new
labels value.
It may lead to some targets collision, but
since hash function already could produce collisions, it should not be a
problem.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8626
The files with synctest usage are built only if the goexperiment.synctest build tag is set.
This allows running `go test ./lib/...` and `go vet ./lib/...` without the need to pass GOEXPERIMENT=synctest to them.
This is a follow-up for d33d7e20be
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8848
New variable should help to change docker image tag name during release prepare.
Instead of publishing an exact version, like v1.1.0, it should use suffixed version v1.1.0-rc1.
Which should be re-tagged later, when release will be promoted to stable.
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9136
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Revival and modification of original PR
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8762 after
discussion on
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7387.
`group.concurrency` is now respected if and only if
`-replay.rulesDelay=0` rather than always. This allows rules to be run
concurrently without ambiguity about rule chaining. If
`-replay.rulesDelay` is set greater than zero concurrency is still
ignored. This will be the default behavior since it defaults to 1s.
Implementation considerations:
I chose to add split some simple logic into a helper function in
preparation for adding `replay.singleRuleEvaluationConcurrency` in a
follow up PR as thats we're we can share that logic.
cc @Haleygo
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Co-authored-by: Hui Wang <haley@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
During the data retrieval, VictoriaMetrics switches to global index
search if the time range is > 40 days. Prior
ba0d7dc2fc, this switch was handled in
indexDB code. That commit moved that logic to the storage level. As the
result, indexDB will search per-day index for whatever time range that
is passed to it.
This broke the BenchmarkHeadPostingForMatchers. It now shows very slow
indexDB performance compared to v1.111.0 (the last release before that
commit). This is because now indexDB tries to search 55 years of data by
spawning a separate goroutine for each day. Even though the data was
inserted only for the current day, running that many goroutines
significantly slows down the search.
The fix is to use global index time range explicitly.
Fixes#9203
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Similar to other storage caches `storage/metricName` can be very
important to performance, however it is not tunable independently like
other caches.
In high cardinality setups where a large amount of that cardinality is
actively queried we can see a high `metricName` miss rate.
The only way to correct this is to increase available memory either by
provisioning more or by increasing `memory.allowedPercent`, which is
often expensive or undesirable for stability reasons.
Its possible to work around this by increasing `memory.allowedPercent`
and then adjusting `storage.cacheSizeIndexDBDataBlocks` and
`storage.cacheSizeStorageTSID` down as they are the largest caches, but
this is a less ideal solution than being able to directly control this
cache size.
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8843
### Describe Your Changes
The `DataBlock` contains structs with string fields, and while the
original comment mentioned not holding references to `br`, it wasn't
immediately clear that this also applies to fields like strings within
the data.
This change clarifies that the `writeBlockResultFunc` must not retain
references to any part of `br`, including its fields. This makes it
explicit that even seemingly safe types like strings must be copied if
needed.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
The 'select' HTTP request duration metric is currently implemented as a
summary, while the 'insert' HTTP request duration uses a histogram.
All time series under the same metric name must use the same metric
type. Using the summary type for this metric is preferable, as it
significantly reduces the number of unique time series generated. While
summaries have the limitation of not supporting accurate aggregation
across multiple instances or paths, this trade-off is more manageable
than dealing with a high-cardinality explosion caused by histograms.
### Describe Your Changes
- removed unneeded loop for binary field value size extraction, since it
should not be delimited by multiple `\n` symbols
- changed loop condition
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
Related issue: #8749
Added query to requests `/field_names` and `/field_values` to get more
precision results:
- If `filterName` is `_msg` or `_stream_id`, the query cannot be
generated specifically,
so a wildcard query (`"*"`) is returned.
- If `filterName` is `_stream`, the query is generated using regexp
(`{type=~"value.*"}`).
- If `filterName` is `_time`, a simplified query is created by trimming
the value up
to the first occurrence of a delimiter such as `-` or `:`.
- For all other values of `filterName`, a prefix query is returned using
the `query` value with a `*` appended (e.g., `"value*"`).
Related issue: #8806
Enhanced autocomplete with parsed field suggestions from unpack pipe.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
This PR adds a call to optimize pipes after the `limit` pipe has been
appended.
Related: #9200
While this approach is not ideal, since it forces us to re-optimize all
pipes, but it is simpler.
An alternative would be to reapply only the relevant optimizations
specifically for this case, something like:
```go
func (q *Query) AddPipeLimit(n uint64) {
if len(q.pipes) > 0 {
ps, ok := q.pipes[len(q.pipes)-1].(*pipeSort)
if ok {
if ps.limit == 0 || n < ps.limit {
ps.limit = n
}
return
}
pu, ok := q.pipes[len(q.pipes)-1].(*pipeUniq)
if ok {
if pu.limit == 0 || n < pu.limit {
pu.limit = n
}
return
}
}
q.pipes = append(q.pipes, &pipeLimit{
limit: n,
})
}
```
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
vlinsert package could be used as an imported dependency. Mostly it's
needed for vlagent, but it could be a case for other applications.
This commit introduces new interface for vlinsertutil package, which
represents actual storage for log rows ingestion.
It includes CanWriteData, which also could be used to introduce
back-pressure mechanism for vlinsert clients.
Related PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9034
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, ForEachRow always reset last row fields after iteration.
It makes impossible concurrent iteration with forEachRow, since
ForEachRow performed hidden mutation of LogRows.
This commit resolves this issue by removal of fields reference.
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076
This allows parsing unlimited number of logs in a single HTTP request,
without the need to buffer the logs in memory.
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9070
Thanks to @AndrewChubatiuk for the initial pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9153
This commit is based on the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9153 .
It contains the following changes comparing to the original pull request:
- Remove ugly function LineReader.NextLineWithLineFn(). Instead, uglify the Journald parser a bit
with hacky calls to LineReader.NextLine() in order to parse binary-encoded field values.
This should preserve the maintainability of the LineReader, which is shared among multiple protocol parsers,
under control, while keeping the complexity of Journald parsing inside the app/vlinsert/journald package.
- Fix a typo bug inside isNameValid() - `(r < '0' && r > '9')` must be written as `(r < '0' || r > '9')`.
Rewrite isNameValid() into easier to understand code and rename it to isValidJournaldFieldName() for better readability.
Add tests for this function.
- Remove mentioning of the -journald.maxRequestSize command-line flag from VictoriaLogs docs.
- Add the description of the fix to VictoriaLogs changelog.
- Properly increment errorsTotal metric on every journald parse error.
- Add missing protoparserutil.PutUncompressedReader(reader) call, so the reader could be re-used between client requests.
- Remove improperly working code, which tries continuing parsing the request stream after parse errors.
It is impossible to recover reliably from journald parse errors related to reading the data from the request stream,
since the journald protocol format is completely braindead. So it is better to immediately return the error
to the client instead of trying to recover. The only errors, which could be recovered, are related to invalid field names / values.
Such errors are logged with the WARN level and the corresponding fields are skipped.
- Fix incorrect storage of the re-used name and value strings into fb.fields. The contents of the name and value strings
must be copied per every loop, which reads these strings from the request stream. Otherwise the contents of the previously
added Name and Value fields into fb.fields will be overwritten on the next loop.
- Ensure that LineReader.Line is set to nil after LineReader.NextLine() returns false. This should prevent from subtle bugs
when the LineReader.Line is read after LineReader.NextLine() returns false.
### Describe Your Changes
> ⚠️ still draft, don't merge even if already approved
Docs update for vmanomaly v1.24.0 release with stateful service option
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
Such filters can be optimized to `*`. This avoid executing other OR filters.
For example, `foo or * or bar` is optimized to `*`, while `foo` and `bar` filters aren't executed.
Such filters are frequently generated by Grafana, so this should improve query performance there.
Use getCompoundToken() instead of getCompoundFuncArg() for obtaining regexp filter value,
since getCompoundFuncArg() skips trailing '*' chars.
This allows detecting invalid queries in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8582 .
`Samples` could be confusing for users, especially for alerting rules:
* we say in docs that each returned "series" will create a new alert
* we say that there are "series fetched", so both columns should be
either "series" or "samples".
Let's rename it to `series` for consistency.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
…s `query` template, but only logging a warning
The `query` template is not supported in replay mode, because we perform
range queries on the rule’s expression, but not on the `query` template.
Previously, if user see error `query template isn't supported in replay
mode`, they need remove the `query` template from the rule for replay
mode.
Also, templating is only used for alerting rules. Replaying alerting
rules don't send notifications(rule annotations are included here) and
users mainly focus on the generated `ALERTS`, the `query` result is
trivial. This pull request shouldn't break things but simplifies the
usage of replay mode for the case.
related https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8746
### Describe Your Changes
Users try to migrate from systems like Thanos or Prometheus using
`vmctl` and `promtheus` migration protocol, with questions about the
problems when they try to specify the snapshot, but `vmctl` shows `0`
series to be found for migration.
This issue happens because users specify the block folder instead of the
snapshot folder.
Added clarification about snapshot structure and its appearance with
multiple blocks inside.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
* packaging/make: fix archiving release binaries for cluster
Cluster binaries for non-windows platforms did not include fips binaries, fix that by properly including binaries.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
See #9144
VictoriaLogs was not correctly parsing timestamps from journald data
when using the default time field configuration. This caused
VictoriaLogs to look for a `_time` field instead of
`__REALTIME_TIMESTAMP` in journald data by default, resulting in
timestamps falling back to ingestion time rather than the actual log
timestamps.
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The CommonParams.TimeFields is initialized to []{"_time"} by default. This prevent from the proper usage of the -journald.timeField
as the default field for reading log timestamps.
This bug has been introduced in the commit a1a731eb61
While at it, make sure that every parsed log entry has its own timestamp if the timestamp couldn't be read from the log entry.
This provides reliable sort order of the log fields.
### Describe Your Changes
fixes#8535
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Storage node with large number of partitions (e.g. 150+ partitions with 12Tb of data) will take more than 30 seconds to start. This was causing vmbackupmanager restarts when running vmstorage colocated with vmbackupmanager in a single pod.
Use exponential backoff for retries and increase overall timeout for storage node healthcheck to 3 minutes to avoid vmbackupmanager restarts during storage node startup.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Using SAS token is not required when copying data within a single
storage account. Using a plain object URL is sufficient for this case. A
single storage account is normally used to store backups so it is safe
to remove SAS tokens usage.
This fixes support of server-side copy when using managed identity as
authentication source as SAS token can only be generated when using
"shared key" type of credentials.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9131
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
removed trailing comma in ignoreFields, which led to _msg field removal,
since it is converted to empty string before filtering
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
This pull request addresses a bug in the Less method of the tagFilter
struct. The original implementation incorrectly assigned the value of
isCompositeB by calling tf.isComposite() instead of other.isComposite().
This caused both isCompositeA and isCompositeB to always have the same
value, leading to incorrect comparisons when determining the order of
tagFilter instances.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
### Describe Your Changes
Currently, some PRs have a failed CI due to low code coverage reported
by Codecov. However, the team typically ignore this and relies on other
quality indicators such as thorough code reviews.
This change configures Codecov to continue posting coverage reports
without marking the build as failed.
It also helps reduce confusion for external contributors, who might
otherwise feel pressured to add unnecessary tests just to satisfy
Codecov requirements (for example
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9002#discussion_r2111651046).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
This commit adds a basic backup/restore test for vmsingle and vmcluster. A
more sophisticated was originally added to the partition index PR
(#8134) and was aimed to test backup/restore when switching back and
forth between legacy and partition index. During the code review it was
decided that it would be good to have a separate test as well since
legacy code will be removed in future and so will the test.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This speeds up building the Go builder image significantly (from hours to a few minutes),
since the build speed was limited by the download speed from https://musl.cc , and this speed
was extremely slow (e.g. 10kb/s and slower).
This also improves build security, since the local mirror of musl.cc is under our control.
This PR addresses two issues:
When tenant labels (e.g. vm_account_id, vm_project_id) are passed via
extra_filters, they were not included in the rollupCache key. This could
cause cache entries to be reused across different tenants, resulting in
incorrect query results.
If a tenant is specified only via extra_filters, and that tenant does
not exist in TenantsCached, it gets silently filtered out by
GetTenantTokensFromFilters, causing the query to fall back to a global
(non-tenant) query — which is likely unexpected and potentially unsafe.
This fix ensures correct tenant scoping and avoids unintended data
exposure or cache pollution.
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9001
---------
Co-authored-by: Zakhar Bessarab <me@zekker.dev>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
### Describe Your Changes
Typo? It's called "fields" pipe, not "field".
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Remove the community-provided dashboard as it remains without updates
for a few years already. Recommending it may hurt user's experience.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Dual stack mode is disabled by default in order to avoid accidentally
exposing components via IPv6 networks.
vmctl does not expose any endpoints and does not allow using default Go
flags as it is using `urfave/cli` lib.
This commit enables IPv6 support by default since there is no security
risks related to network configuration and this make vmctl easier to use
with default configuration.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9116
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
This change adds logging of the number of skipped bytes when a log line
exceeds the configured `insert.maxLineSizeBytes`.
it helps diagnose and tune systems dealing with oversized log records by
showing how much to increase the parameter for the log to fit in
storage.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
---------
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
Added a log limit if the 200 logs per second limit is reached and a
notification for the user asking them to add a filter to the query
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
This PR adds the explore section to the docs. It emphasizes on
explaining and linking assets for VMUI and
MetricsQL
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
Properly check precense of `/`, previously it was
ignoring a case where "/" would be at the beginning of the string.
This is a follow-up for 00712b18
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
## Problem
In vlcluster evel setups, components like vlselect can still accept and
forward /insert requests. The lack of strict endpoint control increases
the risk of human error and undermines deployment security boundaries.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9061
## Fix
Add flags to disable the vlinsert and vlselect endpoints. The
`-insert.disable` flag also disables the internalinsert endpoint.
Similarly for vlselect.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, if metric_usage_tracker file was corrupted. It prevented
VictoriaMetrics from start and required manual action. Corruption may
happen in various reasons, such as unclean shutdown of the process.
This commit changes panic into error message, in the same way as other
caches do.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9074
Previously, any case when cache returned items was skipping lookup of
tenants at vmstorage nodes. This leaded to inconsistent results for
cases when cache contained items to cover only some part of requested
time range.
Fix this by forcing a cache item to cover full requested time range.
This forces cache hits to always be "full hits".
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9042
Target branch for this PR is another PR related to the same issue -
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9048, this is in
order to avoid additional rebasing/merge as this PR will conflict with
cluster branch after initial PR merge. GH will change target for this pr
to cluster brance once #9048 will be merged.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This commits adds additional vmselect routes.
Such as `/static`, `/api/v1/status/metric_names_stats` and others.
In addition it properly redirects `/vmui` and `/vmalert` access endpoints requests. Such endpoints require to preserve trailing `/`. Previously it was omitted and redirect requests failed.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9003
### What this PR does
log error returned by `fastcache.LoadFromFile` before falling back to
creating a new cache instance. this improves observability and helps
detect problems like file corruption or permission issues early.
this replaces `fastcache.LoadFromFileOrNew` with a custom function
`loadFromFileOrNewWithLog` that explicitly logs errors encountered
during cache restoration.
---
### Related Issue
Closes#8934
---
### Test Plan
- manually tested by simulating a missing file scenario
- ensured expected log output on cache load failure
- verified normal cache creation fallback path
---
### Changelog
log error when cache fails to restore from file during workingsetcache
initialization (#8934)
---
### Checklist
- [x] Signed commits
- [x] Follows coding and commit message conventions
- [x] Tested manually
- [x] Scope limited to relevant change
- [x] Changelog entry added
Co-authored-by: Robin Hayer <rshayer95@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Add the step for running this command after publishing Docker images during the release process.
See docs/victoriametrics/Release-Guide.md
This commit resolves the issue with the missing `latest` and `stable` tags after the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7336
This also resolve issues with accidental publishing of incorrect Docker images under the `latest` and `stable` tags.
It is very easy to fix incorrectly published `latest` and `stable` tags by re-running the `TAG=v1.x.y make publish-latest` command,
which updates the `latest` and `stable` tags, so they point to the given TAG=v1.x.y.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Fixed grammatical and phrasing issues in first half of FAQ docs.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
This should simplify working with big number of log fields in LogsQL queries.
Examples:
- `... | keep foo*` leaves only fields starting with `foo` prefix
- `... | rm foo*` removes all the fields starting with `foo` prefix
- `... | mv foo* bar*` replaces `foo` prefix with `bar` prefix in log fields
- `... | sum(foo*)` sums all the log fields starting with `foo` prefix
The DataBlock.GetTimestamps() was returning a slice of strings, which belong to the DataBlock.
These strings are changed whenever the DataBlock is re-used for the next block.
So these strings couldn't be assigned to logRow.timestamp and to tailProcessor.lastTimestamps,
which outlive the DataBlock. The commit aa8c18fc9f5d44091d7ca92be6935eeaf3b85d7f broke this assumption,
which triggered the following bugs:
1. The bug, which could return incorrectly sorted results from /select/logsql/query when the 'limit' query arg is passed to it.
The endpoint must return the last 'limit' log entries on the selected time range in this case, and these log entries
must be sorted by _time.
2. The bug, which could return incorrect results from /select/logsql/tail (e.g. it could incorrectly skip some matching logs,
it could return the same logs multiple times and it could return out-of-order logs without proper sorting by _time).
The solution is to return parsed timestamps from the DataBlock.GetTimestamps() function, so they could be safely
used by the caller without worries that they could be changed while in use.
Fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9016.
Data will carry `vm_account_id` and `vm_project_id` labels when
exporting with native export API in cluster.
These labels could be treated as normal labels and be imported to
victoriametrics cluster, hence inconsistent with the source metrics
data.
e.g.:
1. source data: `{__name__="metrics_test"}`.
2. exported data: `{__name__="metrics_test", vm_account_id="0",
vm_project_id="0"}`.
3. re-imported data: `{__name__="metrics_test", vm_account_id="0",
vm_project_id="0", vm_account_id="0", vm_project_id="0"}`.
4. query result for MetricsQL `metrics_test{}`:
`{__name__="metrics_test", vm_account_id="0", vm_project_id="0"}`.
5. expect query result: `{__name__="metrics_test"}`
In VictoriaMetrics cluster, `vm_account_id` and `vm_project_id` label
are only useful when doing multi-tenant export/import. So they should be
remove if the export URL is not for multi-tenant.
This pull request:
- properly remove tenant info when exporting data in native format.
Note:
- Commit 67514c37ef23c22b91638e80e30504be23fa8dc1 is for apptest and
need to be cherry pick to master branch cc @rtm0 .
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
`tests-full` (plural) target doesn't exist, but test (singular) does
discovered while working through unrelated PR
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Add an option to configure metadata of objects when uploading backups.
For AWS S3 also support using object tagging.
Using metadata of objects is useful in order to get extended reports
about bucket content and billing details. It is also useful when
performing queries to bucket content based on metadata.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8010
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Previously, address was always parsed as "host:port" and added port if
it was missing. This leaded to hard to understand errors in case address
was provided in "http://host:port" format.
Improve error validation in order to provide more precise error message
in case of invalid address format.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9029
Previous error message: `cannot dial storageNode
"http://localhost:8488": dial tcp4: address http://localhost:8488: too
many colons in address` and vminsert continue running.
Current error message: `cannot normalize
-storageNode="http://localhost:8480": invalid address
"http://localhost:8480"; expected format: host:port` and vminsert exists
with error status code.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, if the value of rentetionFilter was changed within the same
retention, storage didn't start background merge for historical data.
This commits changes this behaviour by writing applied
filters into metadata.json. For backward-compatibility it reads content
of appliedRetention.txt file. It should prevent from triggering
background merge on storage update. If needed, manually remove appliedRetention.txt file from
storage/data/PART folder and remove storage.
Also, it properly applies retentionFilter for data back-filling.
Previously, it was ignored and data outside of retention could be
ingested.
In addition, it changes scheduling of historical merges.
Instead 2 separate background processes, storage launches a single
thread. It reduces CPU resource and disk IO resources usage.
Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8885https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4592
### Describe Your Changes
Previously, invalid label name or value could cause a panic of vmselect
or vmsingle as it was using MustNewLabelsFromString which was added for
usage in tests only.
Fix this by properly handling and propagating error to user interface if
there is any.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8661
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, restore mark could be create to a backup which does not exist or incomplete. This would lead to a crash when attempting to perform restore later on.
This commit adds verification of backup availability and completion to prevent such issues from happening. It also adds a verification bypass mechanism for cases when user wants to create a restore mark which is not currently available.
related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5361https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8771
- dynamically adjusts the concurrent dial limit between 8 and 64 based
on the `-search.maxConcurrentRequests`.
- goroutines now have the chance to access available connections while
awaiting the dial limit token.
Related PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8922
Skipping downsampling rules with filters based on the timestamp and offset leads to unexpected behaviour in case both rules with and without filters are present.
For example, with the following configuration: `-downsampling.period='{__name__="foo"}:60d:2m,7d:4m'`
The user would expect `foo` metrics to be downsampled only after 60d to 2m intervals. But actually pre-filter would skip scoped rule and use global rule after 7d with 4m interval.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8969
### Describe Your Changes
Fix an issue where queries were not triggered when relative time was
selected and the chart was hidden.
Related issue: #8983
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
* UI now respects the `sort by` pipe in queries — if it's present, the
order returned by the server is preserved. Related issue: #8660.
* If no `sort by` pipe is used, logs are reversed on the client to show
the newest entries first (since VictoriaLogs returns them in ascending
time order — [see this in the
code](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vlselect/logsql/logsql.go#L1047)).
* Removed redundant client-side time-based sorting logic.
Additionally:
* Log record fields are now sorted alphabetically in UI selectors such
as **Group by field**, **Display fields**, and **Customize columns**.
Related issue: #8438.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Move change 53a6bbfdf8 to the actual
release. Before, it was mistakenly merged to prev release.
Re-classify change from BUGFIX to FEATURE due to following reasons:
* the risk of facing this issue is low, as it reveals itself only for short staleness intervals
* it slightly changes increase_pure logic in a good way. But it is still a change, not bugfix.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This change has effect only if one of the flags below are set:
`-search.maxLookback`, `-search.setLookbackToStep` or
`-search.maxStalenessInterval`
These flags instruct query engine to ignore data points outside of the
look-behind window if these data points are beyond the staleness
interval.
This logic is used for `removeCounterResets` function, and in functions
`increase`, `increase_pure` or `delta`. The bug described in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8935 hit the
corner case when `removeCounterResets` detected the stale series and
`increase` did not.
The reason why staleness detection failed for `increase` is that
`removeCounterResets` calculates interval between real data points. And
`realPrevValue` (that is used by those functions) calculates the
difference between look-behind window start and previous data point.
Which, at smaller gaps or smaller staleness intervals, could affect
staleness detection and make it different to `removeCounterResets`.
This change makes `realPrevValue` to acocunt for staleness between first
data point in captured look-behind window and previous data point.
-------
While there, also updated `increase_pure` logic. It was changed in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1381 without
good explanation. Turns out, that `increase_pure` always compared last
value on the interval with value before the interval. While other
increase or delta functions did compare it with first data point on
interval, and only if it is missing - with the realPrevValue.
This change makes `increase_pure` logic consistent with other similar
function. The reason why it is not a separate PR is because tests
started to fail once `realPrevValue` callculation logic changed and
there were no good solution to isolate this change.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
When downloading archives for benchmarks, an error appears saying that
the archive was placed in a new path.
The error could have been prevented by providing the `-L (--location)`
flag that would tell curl to follow the redirect, so in addition to
updating the paths, this flag was added.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Fix extra newline escape characters in panel descriptions.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
1. Add `!lex.isEnd()` to prevent an infinite loop. Although the current
code doesn't trigger this bug, it's a latent issue that could occur if
someone modifies the callers or adds new code paths without proper stop
tokens.
This PR improves integrations docs in 2 areas:
- Background set to white, avoiding issues when changing to dark mode.
- Height set to avoid blank spaces
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
When multiple users run tests on the same instance, the first user
creating a folder will own the testStoragePath, which can lead to issues
accessing this folder for other users. This change will allow us to
create unique folders per user.
```
% ls -ld /usr/tmp/vmalert-unittest/
drwxr-xr-x 2 some_user users 4096 May 12 17:22 /usr/tmp/vmalert-unittest/
...
2025-05-20T13:56:16.488Z panic lib/fs/fs.go:132 FATAL: cannot create directory: mkdir /usr/tmp/vmalert-unittest/1747749376488491648: permission denied
panic: FATAL: cannot create directory: mkdir /usr/tmp/vmalert-unittest/1747749376488491648: permission denied
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Co-authored-by: Hui Wang <haley@victoriametrics.com>
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8985
When using `AddTimeFilter`, it creates a string representation with the
exact same timestamps but doesn't transform the internal end value. This
is different from the `parseFilterTime` function, which makes the
behavior of these two paths different.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
compose.yml does not exist, only compose-base.yml
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
This commit re-fines the relabeling cookbok and moves all
relabeling related docs to the same page.
It also removes duplicated information from vmagent readme.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The link to enterprise release guide now points to the doc in
enterprise-single-node branch instead of enterprise master. This is
because we don't use enterprise master.
Additionally the `Public Announcement` section has been removed because
we don't make public announcements for releases.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
As planned, we are adding a more structured way of explaining
deployments and tiers.
In this PR the following changes are added:
- The previous tiering section is moved under the deployments
placeholder
- Explanations for users to pick single or cluster
- Explanations for different parameters
- Re-styling of the docs to look more appealing
- Reorder some hanging docs so the sections are more clearly presented
to the users
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Co-authored-by: Alexander Marshalov <_@marshalov.org>
Apparently, when the doc was created for vmctl the anchor conflicts
weren't accounted for or weren't a thing yet. Now, various migration modes
have conflicting anchors. This should be addressed in follow-up commits.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Improve wording and instructions. It has been a while since the last
time we updated it (more than 4 years!).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
added command to run vmui locally backed by vm playground
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Related issue: #8925
Properly escaped special characters in field values shown in
autocomplete suggestions.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
The following test produces duplicate per-day index records on a system
with 1 CPU even when data inserted sequentially:
```
GOEXPERIMENT=synctest taskset -c 0 go test ./lib/storage -run=TestStorageAddRowsForVariousDataPatternsConcurrently/perDayIndexes/serial/sameBatchMetrics/sameRowMetrics/sameBatchDates/diffRowDates
```
See: #8654
Make this test pass by relaxing got and want data equality requirement
if the number of CPUs is 1. This is temporary until one insertion corner
case is fixed:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8948
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Related issue: #7046
Changes:
- add JSX automatic runtime import of React, you don't need to import
React anymore in JSX files.
- add unused imports eslint rule
- add headers to ProcessLiveTailRequest to enable client-side connection
setup
- refactor ExploreLogs: divided the component into several separate
components for better readability and code maintenance
- add live tailing tab to VictoriaLogs
short demo:
https://github.com/user-attachments/assets/3e5f57ee-8e72-4835-9fc6-35c6f38bc9ef
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Capitalize "Enterprise" in VictoriaMetrics Enterprise phrase
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Improve relabel bug fix changelog message
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Move anchors that were kept only for backward-compatibility reasons
to the bottom of the document. So they don't take extra space in main doc.
Demote anchor level to `h6`, so docs engine will stop rendering in the
right navigation column.
In this way, we still keep the backward compatibility: old links will
continue working. And free space occupied by these link in the main doc.
Thanks to @makasim for the idea.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Commit 3b84f45e0a introduce a typo at `relabelConfigs.IsSet` function. It incorrectly returned value if relabeling configuration is set or not.
As a result, vmagent was not able to properly perform relabel configuration reload.
And incorrectly exposed metrics for reload configuration.
Related issue:
https://github.com/VictoriaMetrics/helm-charts/issues/2119
Previously, headers hash calculation had a typo, instead of `hash.Write` - `hash.Sum` method was used.
It discards any previous writes and as a result digest always had the same value. It prevented from proper config reload.
This commit fixes this typo and properly calculates headers digest.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8931
Previously, OTL attributes fields with KeyValue type were ingested as a single json formated field. It complicates requests and requires extra effort at query time.
This commit adds support for handling nested fields to match the behavior of
other handlers, such as `/jsonline`. KeyValue attribute will be converted into separate field.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8862
### Describe Your Changes
Added npm command `npm run start:logs:playground` to run VictoriaLogs UI
with playground env.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Version with the latest changes from prometheus/common have been
released as v0.303.1 so builds are no longer failing.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
It solves the following build error:
```
duplicate menu entry with identifier "Release process guidance" in menu "docs"
```
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Update /anomaly-detection/ docs according to release `v1.22.1`
P.s. The links in guides and quickstarts were not intentionally set to
v1.22.1 until we fully restore parallelization in forthcoming releases
and mark it as a go-to release everywhere
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
There are three possible cases for blockResultColumn.getValuesEncoded():
- The blockResultColumn.valuesEncoded is already set. This is the case for manually constructed blockResultColumn,
or if it is cloned via blockResult.clone().
- The blockResultColumn.chSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.bs,
by applying br.bm filter.
- The blockResultColumn.cSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.brSrc,
by applying br.bm filter.
It is better from maintainability and debuggability PoV to write this logic in a single getValuesEncoded() function
instead of indirecting it via valuesEncodedCreator.
VictoriaLogs stores min and max column values per every data block.
These values were incorrectly used by min() and max() stats functions
inside updateStatsForAllRows() function. It was assumed that this function
could use min / max values stored in the block, since all the rows in the blockResult
must be processed. But the blockResult contains _filtered_ rows,
e.g. it may have less rows than the number of rows in the original block.
In this case it is unsafe assuming that the min / max values from the original block
exist in the filtered rows inside blockResult.
Add blockResult.isFull() function, which returns true if the blockResult contains all rows
from the original block (e.g. they aren't filtered). Use this function in fast path,
while fall back to slow path, which triggers reading the column values and iterating over them.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
* update sorting to reflect more important changes first
* use issue numbers instead of `this issue` wording
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Add a link to UI on the welcome page.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
* actualize information
* explain alert state in more details
* re-order content by its importance
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fix logs not updating after query change
Related issue: #8912
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
Fixes UI freeze in logs timeline after repeated zooming and panning.
The issue was caused by excessive query requests triggered during fast
timeline interactions. Added debounce to limit the frequency of
requests, similar to the approach used in vmui for metrics.
Related issue: #8655
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* tidy up wording
* mention available integrations in quickstart
* cross-reference integrations and data ingestion
* move these docs closer to each other
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Limit number of points per series and total response size (30 MiB) on
the Raw Query page to improve UI stability.
Related issue: #7895
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
* move example alerts out of /rules folder, since this folder should
contain only useful rules
* expose ports for vlogs and vmetrics for local debug
* add some comments explamining vmalert config
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new templating option could contain following options: `prometheus`,
`vlogs` or `graphite`.
It represents the datasource type this rule is evaluated against. The
new templating option
can be used in rule's annotations or for routing to a specific
destination (e.g. Grafana datasource).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
This functionality was broken in
0e313e5355
Was caught by integration tests:
```
--- FAIL: TestSingleVMAuthRouterWithInternalAddr (5.00s)
vmauth_routing_test.go:148: Could not start vmauth: could not extract some or all regexps from stderr: ["pprof handlers are exposed at http://(.*:\\d{1,5})/debug/pprof/"]
```
Signed-off-by: hagen1778 <roman@victoriametrics.com>
VictoriaLogs happily accepts logs without _msg field, and there are legitimate cases where the _msg field isn't needed.
Remove the warning in order to reduce the level of confusion for VictoriaLogs users.
This feature is useful when ingesting logs, which may contain timestamps across different fields.
Then the first non-empty field from the _time_field list is used as a timestamp field.
This is needed in order to be able to find these playgrounds by `demo` word.
While at it, replace misleading `playground` word with less confusing wording across different places of the docs.
Previously `*_last_reload_successful` metrics are set to 1 even when respective
configuration is not defined, besides stream aggregation and scraping.
This commit initializes config reload related metrics only when respective
configuration is set.
Related issue:
https://github.com/VictoriaMetrics/helm-charts/issues/2119
This commit adds the following changes to the enterprise version:
- add make target for testing in FIPS mode
- disallow using OVH in FIPS mode. OVH is using SHA1 for authentication via headers and SHA1 is not allowed to be used in FIPS mode. There is no option to switch to another hashing algorithm in OVH API, so disabling it completely.
- build fips binaries together with regular ones. This will allow to make sure that FIPS builds are always up to date and compatible with regular ones.
- disable CGO in FIPS builds for vmagent, since vmagent imports Kafka library which uses CGO imports. This might lead to using OpenSSL version which is not certified for FIPS mode. Using pure Go implementation allows to avoid this and keep all validations on Go build process side.
- vm_retention_filtering_partitions_scheduled is a guage metric that shows now
many partitions the retention filtering is currently being applied to.
- vm_retention_filtering_partitions_scheduled_size_bytes is a guage metric that
shows the total size (in bytes) of partitions the retention filtering is
currently being applied to.
These metrics are similar to vm_downsampling_partitions_scheduled and
vm_downsampling_partitions_scheduled_size_bytes and can be used for observing
how much time it took for a given vmstorage instance to perform retention
filtering across this many partitions whose total size was this many bytes.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
fixed typos in docs and code
fixed collision in cloud docs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
* {lib/backup,app/}: gracefully cancel currently running operation during graceful shutdown
Make backup/restore process interruptable by passing global context from the operation caller.
This is needed in order to reduce shutdown delays in case backup/restore cancellation is requested.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8554
This PR fixes two related bugs in the `replace_regexp` pipe:
1. **Infinite loop on empty matches when `limit` is not set**
[#8625](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625)
When a regex pattern like `\d*`, `()`, or `\b` was used, the
implementation could repeatedly match the same zero-width position
without advancing the string, causing unbounded memory usage and
eventual OOM. This is now fixed by collecting all matches up front,
respecting the `limit`, and applying replacements in a single pass.
2. **Incorrect handling of anchors (`^` and `$`)**
The previous implementation applied regex matching to progressively
sliced substrings (`s = s[end:]`), which unintentionally caused anchor
patterns like `^` (start-of-string) to match at every new substring's
start. As a result, patterns that should have matched only once (e.g.,
`^|$`) ended up matching multiple times.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625
Hopefully this should reduce the amounts of questions whether vlinsert replicates or shards the incoming logs.
The answer - it spreads the incoming log _evenly_ among the configured vlstorage nodes (e.g. it shards logs).
### Changes
Updated `lib/httpserver/httpserver.go` to include a flag that can toggle
CORS (defaults to true to keep the current behavior).
This PR relates to
[this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8680#issue-2983786438)
feature request
### Checklist
The following checks are **mandatory**:
- [x] My change does not break backwards compatibility (i.e., preserves
CORS being enabled unless specified otherwise via the
`-http.cors.disabled=true` flag & value)
---------
Co-authored-by: Jai Mehra <jai.mehra@nav-timing.safrangroup.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
#8842
As a quick fix, it seems like it works. checked on my failing tests
(they are not emitting this error anymore)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
These were added while working on paritition index (#8134). Submitting
the separately in order to:
1. Make sure partition index will not break anything and
2. Be able to compare performance before and after swiching to parition
index.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
fixed not successful rebase in PR, moved svg icons to a sprite and added
VM logo to UI
<img width="1290" alt="image"
src="https://github.com/user-attachments/assets/34cfdc61-6349-4320-b133-38965cb6c30f"
/>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
### Describe Your Changes
use the built-in max/min to simplify the code
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: pkucode <cssjtu@163.com>
The new rule should notify user when there is a job with 0 configured or
discovered targets, which is usually a sign of misconfiguration.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8466
This pull request:
1. Add `GraphiteWrite`, `CSVImport`, `OpenTSDBHTTPImport` data import
methods for vmsingle and vminsert.
2. Add TCP `Write` method to `apptest.Client` to make it capable of
writing data in TCP-based protocol.
3. Add test cases:
1. for new import methods.
2. for corner cases placed under `app/victoria-metrics/main_test`.
4. Removed all test cases under `app/victoria-metrics/main_test`.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Updated "> note" sections in /anomaly-detection/ doc path for new
formatting and improved clarity
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
This is needed because Go interprets backticked string as a measurement unit for the given command-line flag:
The listed type, here int, can be changed by placing a back-quoted name in the flag's usage string;
the first such item in the message is taken to be a parameter name to show in the message and the back quotes
are stripped from the message when displayed
See https://pkg.go.dev/flag#PrintDefaults
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
1. Fixed log entry sorting in the "Group" view: newest entries are now
displayed at the top again. In v1.18.0, log entries were incorrectly
sorted (oldest first) due to changes related to nanosecond precision
support.
Related issue: #8726
2. Added alphabetical sorting of fields by key in the "Group" view for
easier navigation.
Related issue: #8438
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Related issue: #8697
Renamed `retentionFilters` to `retentionFilter`.
🛑 before merge this pull request, need to merge API changes in
enterprise version.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* set Y-min to 0. This helps to see real spikes in resource usage;
* add text panel explaining how to use query hash;
* support filtering by tenant;
* use `\tvm_slow_query_stats` prefix filter to filter out logs
that mention failed queries with `vm_slow_query_stats` word;
* add panel to show the minimum queried timestamp - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8828
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Based on the code, an active time series is one that has received at
least one sample during the last hour. It does not includes querying.
The first time the docs were updated with this was in
d06aae9454. But even at that time, the
code shows that querying a metric did not make it active.
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
* put important changes on top;
* replace `this issue` with corresponding issue number to make it more sense;
* consistently format components names.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Jose Gómez-Sellés <14234281+jgomezselles@users.noreply.github.com>
* display max values instead of cumulative values
to outline anomalies;
* remove unnecessary overrides on column width;
* add corresponding formatting units to displayed values.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
datadb.rb contains logRows shards, which weren't freed up after the data ingestion
for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs
for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards
after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order
to reduce the pressure on GC.
This commit moved all the docs/*.md files to docs/victoriametrics/ folder. This broke direct links to these files
such as https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/vmagent.md .
Add the missing `/victoriametrics/` part here, so the link becomes https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmagent.md .
The move of all the docs/*.md files into docs/victoriametrics/ folder is dubious because of the following reasons:
- It breaks direct links to *.md files like mentioned above. It is impossible to fix such links all over the Internet,
so they will remain broken :(
The best thing we can do is to fix them on the resources we control such as VictoriaMetrics repository.
- It breaks Google indexing of VictoriaMetrics docs. Google index contains old links such as https://docs.victoriametrics.com/vmagent/#features .
Now such links are automatically redirected to https://docs.victoriametrics.com/victoriametrics/vmagent/#features by a javascript on the https://docs.victoriametrics.com/vmagent/ page.
Google doesn't like redirects at javascript, since they are frequently used by black hat SEO purposes. This leads to pessimization of VictoriaMetrics docs
in Google search result :( We cannot update the old links all over the Internet in order to avoid the redirect by javascript :(
The best thing we can do is to add <meta rel="canonical"> header with the new location of the page at the old url,
and hope Google won't remove VictoriaMetrics docs from its search results. @AndrewChubatiuk , please do this.
- It breaks backwards navigation. When you click the https://docs.victoriametrics.com/vmagent/ link on some page and then press `back` button
in the web browser, it won't return back you to the original page because of the intermediate redirect :( The broken navigation cannot be fixed
for old links located all over the Internet.
- It increases chances of breaking old links left on the Internet in the future, which will lead to 404 Not Found pages
and angry users :(
The sad thing is that we hit the same wall with harmful redirects again :( In the beginning the VictoriaMetrics docs links had .html suffix,
and they were case-sensitive. For example, http://docs.victoriametrics.com/MetricsQL.html#rate . It has been decided for some unknown reason
that it is a good idea to remove the .html suffix and to make all the links lowercase. So now such links are automatically redirected by javascript
to https://docs.victoriametrics.com/metricsql/#rate and then redirected again by another javasript to https://docs.victoriametrics.com/victoriametrics/metricsql/#rate :(
There are many old links all over the Internet (for example, at Reddit, StackOverflow, some internal Wiki pages, etc.). We cannot fix all of them,
so we need to pray these links won't break in the future.
@hagen1778, @tenmozes, @makasim , please make sure we won't hit the same wall in a third time.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595
Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts.
Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time.
This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows own-by-one
from concurrently running goroutines, by 2x.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Here a description of important terms related to Organizations is
provided. The intention is to leverage on a good UI design rather than
explaining all steps needed to perfomr certain actions.
However, an effort on explaining the meaning of internal states and
terms has been done.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This commit modifies the logging behavior for client network errors
(e.g., EOFs, timeouts) during the handshake process. They are now logged
as warnings instead of errors, as they are not actionable from the
server’s perspective. Here's some examples of such errors.
Timeouts during the initial read phase:
2025-04-09T07:08:59.323Z error
VictoriaMetrics/lib/vmselectapi/server.go:204 cannot perform
vmselect handshake with client "<REDACTED>": cannot read hello: cannot
read message with size 11: read tcp4 <REDACTED>-><REDACTED>: i/o
timeout; read only 0 bytes
EOFs occurring later in the handshake process:
2025-04-08T18:01:30.783Z error
VictoriaMetrics/lib/vmselectapi/server.go:204 cannot perform
vmselect handshake with client "<REDACTED>": cannot read isCompressed
flag: cannot read message with size 1: EOF; read only 0 bytes
By logging these as warnings, we reduce noise in error logs while
preserving valuble information for debug.
Previously, if `cpu.max` file has only `max` resource defined without
`period`, it was parsed incorrectly and silently drop error. While this
syntax is valid and actually used by some container runtimes. If period
is not defined, default value for it 100_000 must be used.
This commit fixes parsing function by using default value for period.
In addition, it adds zero value check, which fixes possible panic if
period has 0 value.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8808
This reverts commit fa6a32a39d.
Reason for revert: the broken tests were fixed on GOARCH=386 by skipping the check for the state size
after improting the state of stats function, since the state size depends on the hardware architecture.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710
### Describe Your Changes
Two new columns were added to the **Cardinality Explorer** table:
`Requests count`
- Displays how many times a metric was queried since stats collection
began
- Highlighted **red** when the value is `0` or missing
`Last request`
- Shows when the metric was last queried
- Highlighted:
- **Red** if older than **30 days**
- **Yellow** if between **7 and 30 days**
- Shows exact timestamp on hover in `YYYY-MM-DD HH:mm:ss` format
Related issue: #6145

### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
* downsampling: allow using `-downsampling.period=filter:0s:0s` to skip downsampling for time series that match the specified `filter`
* doc minor fix
* lib/storage/downsampling: fix a typo in error message
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
### Describe Your Changes
Fixes incorrect /flags request in vmui when using URL prefixes.
Related issue: #8641
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Hide identical series in the legend on the Raw Query page when
deduplication is disabled.
Related issue: #8688
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* update wording;
* add more details about returned response;
* cover details on counter increments;
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, this doc section was related to vmui, by mistake.
Moved it closer to tsdb stats, where it makes more sense to be.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Badges is more a decorative element rather than source of useful information.
Some badges break from time to time, giving misleading impression to users.
It is better to remove them to reduce visual load and confusion.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
HTTP/2 support is used by some S3-compatible storage providers, so
disabling it default leads to unexpected errors when trying to connect
to S3 endpoint.
For example, using MinIO as S3 storage backend: `net/http: HTTP/1.x
transport connection broken: malformed HTTP response`.
HTTP/2 was enabled by default previously, but while fixing inconsistency
e5f4826 commit disabled this by default.
cc: @valyala
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Fixes which are required in order to build FIPS-compliant binaries.
These changes were originally added for enterprise version and synced to
opensource for consistency and easier maintenance.
- consistently use `hash/fnv` at `app/vmalert` when calculating
checksums. Usage of md5 is not allowed in FIPS mode.
- increase encryption keys size used in testing in order to allow tests
to successfully run in FIPS mode
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The suffix after the ID may change in dashboard settings.
If it changes, the link becomes broken (dubious decision at grafana.com/grafana/dashboards/ ).
That's why it is better to drop all the suffixes and use links for Grafana dashbooards ending with IDs.
In this case they are automatically redirected to the url with the proper suffix.
This is a follow-up for 3e4c38c56c
See also the previous commits 9c0863babc and 0a5ffb3bc1
This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts
when small number of log entries are passed to Storage.MustAddRows().
The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1,
up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100.
This should reduce CPU usage during data ingestion when every request contains small number of rows.
…te panel
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
On shutdown, rw client printed two log messages about shutting down and
how many series it flushed on shut down.
This commit removes these log messages for the following reasons:
1. These messages were printed for each `client.cc`, which is set to x2
of available CPUs. This behavior generated a lot of useless logs on
shutdown.
2. Number of flushed series on shutdown doesn't matter to the user.
Instead, it now prints only two log messages: when shutdown starts and
when it ends:
```
^C2025-04-22T07:37:11.932Z info VictoriaMetrics/app/vmalert/main.go:189 service received signal interrupt
2025-04-22T07:37:11.932Z info VictoriaMetrics/app/vmalert/remotewrite/client.go:152 shutting down remote write client: flushing remained series
2025-04-22T07:37:11.933Z info VictoriaMetrics/app/vmalert/remotewrite/client.go:155 shutting down remote write client: finished in 210.917µs
```
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, metric names stats API had a false assumption, that max
size of metric name is 256 byte. But this is configurable parameter with
4096 bytes max size. It triggered errors during API requests.
This commit replaces hard-coded 256 byte limit with common constant:
maxLabelValueSize. It has 16 MB limit.
In addition, this commit adds check for metric name stats tracker,
if metric name size exceeds default buffer limit, it will be allocated
directly on heap. It must be rare case, since most metric names has
16-64 byte size.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8759
Throttle remote write retry warning logs to one message per 5 seconds.
This reduces log noise during connectivity issues.
Users can monitor `vmagent_remotewrite_retries_count_total` and
`vmagent_remotewrite_errors_total` metrics for detailed retry rates per
destination if needed. This change prioritizes reducing log volume over
immediate destination-specific error logging in the retry warnings.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8723
VictoriaMetrics executables need the latest available stable release of Go (1.24 currently),
since they use its features. See, for example, GOEXPERIMENT=synctest from the commit 06c26315df .
There is no need in specifying the minimum supported Go version for building VictoriaMetrics products,
since Go automatically downloads and uses the version specified at `go ...` directive inside go.mod
starting from Go1.21. See https://go.dev/doc/toolchain for details.
So the actual minimum needed Go version is Go1.21, which has been released 1.5 years ago. It should automatically
install newer Go versions specified at `go ...` directive inside go.mod.
Do not reset wc.labels in order to properly keep track of the number of used labels for the scrape,
and properly re-use the same number of wc.labels on subsequent scrapes.
See 12f26668a6 (r155481168)
### Describe Your Changes
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8694
additionally removed container_name, docker network, renamed all
compose, config files for consistency
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Before the change, the vmagent integration tests created their directory
and files inside apptest/tests.
After the change, vmagent is instructed to store all files in a real
temporary directory, which is automatically deleted after the tests
complete.
Using this package lets to manipulate time. In this particular case, it
lets to advance the time 61 second forward instantly.
A few side changes were necessary:
- Do not use fasttime in unit tests. The fasttime package starts a
goroutine outside the test bubble which causes the clock to be real, not
fake.
- Stop the time.Ticker explicitly and also stop idbNext. These two
create goroutines with infinite loops which causes the unit tests that
use synctest to hang forever. All goroutines created inside the bubble
must exit in order for the syntest to finish.
- synctest is an experimental package and requires an environment
variable to be set. The Makefile was changed to set it.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
The write concurrency limiter in ReadUncompressedData was previously
removed in
22d1b916bf
to avoid suboptimal behavior in certain scenarios. However, follow-up
reports—including issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8674 and
production feedback from VictoriaMetrics Cloud—indicated a noticeable
degradation in performance after its removal.
To mitigate these regressions, this commit reintroduces the concurrency
limiter. A long-term, more optimal solution will be explored separately
in issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8728.
TODO:
* [x] Changelog
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The doc was incorrectly ported into wrong directory after
a113516588
This change moves it to the victoriametrics dir.
While there, updated the order of some pages to couple them by meaning.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fixed table sorting logic and added unit tests for descendingComparator.
Values are now correctly sorted by type: number, date, or string.
Related issue: #8606
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This commit adds new fields - `requestsCount` and `lastRequestTimestamp`
to series count be metric names stats.
It allows to display an additional stats at explore cardinality page.
Stats will only be added if `storage.trackMetricNameStats` flag is set.
This change requires an update to RPC protocol in order to properly
marshal data.
In addition, this commit adds integration tests to TSDB stats API.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6145
This commit adds `search.logSlowQueryStats=<duration>` cmd-line flag on vmselect.
It reads stats from eval function, and doesn't slow down the query execution.
Log line has the following structure:
vm_slow_query_stats type=%s query=%q query_hash=%d start_ms=%d end_ms=%d step_ms=%d range_ms=%d tenant=%q execution_duration_ms=%v series_fetched=%d samples_fetched=%d bytes=%d memory_estimated_bytes=%d
This feature is only available for enterprise version.
### Describe Your Changes
Changelog note about experimental v1.22.0 release that solves deadlock
issue on multicore systems by complete parallelization turned off until
proper refactor is made to return it back w/o reintroducing the risk of
the deadlock in child processes.
P.s. other references and guides were not updated to experimental tag,
as long as it has downsides of dropped speed. The links will be updated
once we have parallelization properly turned on.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Follow-up to
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462
Addressed review comments:
- Log panic with FATAL prefix to indicate possible on-disk data
corruption
- Moved version bump line to the tip block (v1.114.0 has already been
released) in changelog
- Removed duplicate vmagent entry from targets list from Makefile
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Under heavy load, vmagent's wirte concurrency limiter
(2ab53acce4/lib/writeconcurrencylimiter/concurrencylimiter.go (L111))
queues incoming requests. If a client's timeout is shorter than the wait
time in the
queue, the client may close the connection before vmagent starts
processing it. When vmagent then tries to read the request body, it
encounters an ambiguous `unexpected EOF` error
(https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8675).
This commit adds more context to such errors to help users diagnose and
resolve
the issue when it's related to vmagent's own load and queuing behavior.
Possible user actions include:
- Lowering `-insert.maxQueueDuration` below the client's timeout.
- Increasing the client-side timeout, if applicable.
- Scaling up vmagent (e.g., adding more CPU resources).
- Increasing `-maxConcurrentInserts` if CPU capacity allows.
Steps to reproduce:
https://gist.github.com/makasim/6984e20f57bfd944411f56a7ebe5b6bf
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
This commit improves how vmagent selects the remote write protocol.
Previously, vmagent [performed a handshake
probe](0ff1a3b154/lib/protoparser/protoparserutil/vmproto_handshake.go (L11))
at
[startup](0ff1a3b154/app/vmagent/remotewrite/client.go (L173)):
- If the probe succeeded, it used the VictoriaMetrics (VM) protocol.
- If the probe failed, it downgraded to the Prometheus protocol.
- No protocol changes occurred after the initial probe at runtime.
However, this approach had limitations:
- If vmstorage was unavailable during vmagent startup, vmagent would
immediately downgrade to the Prometheus protocol, leading to higher
network usage unitl vmagent restarted. This case has been reported in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615.
- If the remote write server was updated or downgraded (e.g., during a
fallback or migration), vmagent would not detect the protocol change. It
would continue retrying failed requests and eventually drop them.
Require a restart of vmagent to pick up the new protocol.
This commit introduces a more adaptive mechanism.
vmagent always starts with the VM protocol and downgrades to the
Prometheus protocol only if an unsupported media type or bad request
response is received.
When this happens, the protocol is downgraded for all future requests.
In-flight requests are re-packed from Zstd to Snappy and retried
immediately.
Snappy-encoded requests are dropped if an unsupported media type or bad
request is received (no retrying).
Additionally, the in-memory and persisted queues could mix snappy and
zstd encoded blocks. The proper encoding is decided before sending by
encoding.IsZstd function.
TODO:
* [x] Add tests
* [x] Update documentation
* [x] Changelog
* [x] Research on
[content-type](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054),
[accept-encoding](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786923382)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615#top
issue.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously, if vmagent was built with CGO_ENABLED=0, vmagent cannot start and reported runtime error:
`Kafka client is not supported at systems without CGO`
This error was trigger even if `-kafka.consumer.topic` was not
provided. CGO_ENABLED=0 is default build option for linux/arm and some other archs.
This commit properly inits kafka consumer by checking if `-kafka.consumer.topic` is set.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6019
'authKey' is well-known url and form param for VictoriaMetrics
components authorization. Previously, it could be printed into stdout
via httpserver error logger. It makes this authKey insecure and hard to
use.
This commit prevents from logging authKey defined at PostForm or as part
of url.Query.
It's recommneded to transfer authKey via PostForm and it should be
implemented at separate PRs.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Previously, if ProfileName is set to empty value (as default). AWS s3
lib ignored any profile config defined with `-configProfilePath`.
This commit correctly configure client options and set profile name only
if it's set to non-empty value.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8668
Currently, requests failing due to network timeout would receive "200
OK" while producing a warning log message about the timeout. This
behaviour is confusing and might produce unexpected issues as it is not
possible to retry errors properly.
Change this to return "502 Bad Gateway" response so that error can be
handled by the client.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8621
Config for testing:
```
unauthorized_user:
url_prefix: "http://example.com:9800"
```
Before the change:
```
* Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
/* NOTE: 30 seconds timeout passes */
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 08:54:05 GMT
Date: Tue, 01 Apr 2025 08:54:05 GMT
<
* Connection #0 to host 127.0.0.1 left intact
```
After:
```
* Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 502 Bad Gateway
HTTP/1.1 502 Bad Gateway
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 09:13:57 GMT
Date: Tue, 01 Apr 2025 09:13:57 GMT
< Content-Length: 109
Content-Length: 109
<
* Connection #0 to host 127.0.0.1 left intact
```
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Log when the data response from vmselect is partial during
rule(recording, alertingrule) evaluations.
vmselect returns `isPartial: true` in case data is not fully fetched
from scattered vmstorages. At the time of rule evals, it may be drifting
apart from real values due to missing points. This is an important event
that should be logged to inform users to see how often that happens as
it may lead to false positive alerts.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: emreya <emre.yazici@adyen.com>
Signed-off-by: emreya <e.yazici1990@gmail.com>
Signed-off-by: Emre Yazici <e.yazici1990@gmail.com>
(cherry picked from commit 56f60e8be9)
When creating and listing snapshots, panic instead of returning an error
since errors are not recoverable anyway.
Also do not cleanup the filesystem on panic. Leave as is for further
manual inspection.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
This reduces CPU usage by up to 30% in exchange of the increased RAM usage by 10%
when scraping thousands of targets, which expose millions of metrics in summary.
This looks like a good tradeoff after the commit edac875179 ,
which reduced RAM usage by more than 10%, so the final RAM usage for vmagent
is still lower than the RAM usage at v1.114.0 by ~15%, while CPU usage drops by 30%.
This reduces memory usage when reading large response bodies because the underlying buffer
doesn't need to be re-allocated during the read of large response body in the buffer.
Also decompress response body under the processScrapedDataConcurrencyLimitCh .
This reduces CPU usage and RAM usage a bit when scraping thousands of targets.
This reduces memory usage for vmagent when scraping big number of targets at the cost of slightly higher CPU usage.
The increased CPU usage can be decreased by disabling tracking of stale markers either via -promscrape.noStaleMarkers
command-line flag or via `no_stale_markers: true` option at the scrape config pointed by -promscrape.config command-line flag.
See https://docs.victoriametrics.com/vmagent/#prometheus-staleness-markers
The pool is used mostly for obtaining byte buffers for responses from scrape targets.
There are no responses smaller than 256 bytes in practice, so there is no sense in maintaining
pools for byte slices up to 64 and 128 bytes.
Previously the maxLabelsLen could be updated with smaller value after it is updated to bigger value by concurrently running goroutines.
Prevent this by loading the latest maxLabelsLen value and updating it only if it is smaller than the current len(wc.labels)
before the exit from callback passed to stream.Parse.
While at it, return early from the callback on the sample_limit exceeding error,
since the rest of the code in the callback becomes no-op after wc.reset().
This simplifies following the logic in the code a bit.
Also remove outdated misleading comment in front of sw.pushData() call inside callbacks passed to stream.Parse.
This comment has no sense after every callback start working with its own goroutine-local wc.
Start with writeRequestCtx containing up to 256 labels instead of 8 labels,
since a typical response from scrape target contains much more than 8 labels across all the exposed metrics.
Do not pre-allocate labels at writeRequestCtx, since they are pre-allocated inside writeRequestCtx.addRows(),
together with the pre-allocation of samples and writeRequest.Timeseries.
The applySeriesLimit applies the limit to samples stored at writeRequestCtx,
while the scrapeWork is used as read-only configuration source.
That's why it is better from maintainability PoV to attach the applySeriesLimit
method to writeRequestCtx.
While at it, clarify docs for the applySeriesLimit function.
The rows are added to writeRequestCtx, while the scrapeWork is used only as a read-only configuration source.
So it is better from maintainability PoV to attach addRows function to writeRequestCtx instead of scrapeWork.
Also attach addAutoMetrics to writeRequestCtx instead of scrapeWork due to the same reason:
addAutoMetrics adds metrics to the writeRequestCtx, while using scrapeWork as a read-only configuration source.
While at it, remove tmpRow from scrapeWork struct in order to reduce the complexity of this struct.
areIdenticalSeries doesn't access scrapeWork members except of sw.Config of *ScrapeWork type.
It is better from maintainability PoV to attach this methos to ScrapeWork then.
While at it, replace sw.Config with cfg shortcut at scrapeWork.processDataOneShot()
and scrapeWork.processDataInStreamMode().
Having `victoriametrics` or `victorialogs` tags should be enough for
filtering dashboards related to VictoriaMetrics components.
Related ticket
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8618
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Support of the latest prometheus/common is not released yet so pin to previous version.
Related commit at prometheus/prometheus: 95f49dd84b
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This allows verifying how the benchmark performance scales with the number of available CPU cores
and makes the results of the benchmark consistent with other BenchmarkScrapeWorkScrapeInternal* benchmarks.
Also reduce the amounts of memory allocations inside generateScrape() function in order to reduce
measurement noise during the BenchmarkScrapeWorkScrapeInternalStreamBigData run.
This is a follow-up after c05ffa906d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8515
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
It is always better to run benchmarks in parallel on all the available CPU cores
in order to see how their performance scales with the number of CPU cores (GOMAXPROCS).
The commit also performs the following modifications:
- Removes the dependency of on the scrapeWork from getLabelsHash() function.
- Makes sure that the benchmark cannot be optimized out by the compiler, by introducing a dependency
on a global Sink variable. Previously the getLabelsHash() function call could be optimized out
by the compiler, since this call has no side effects, and the returned result is ignored.
- Reduces the amounts of memory allocations inside the BenchmarkScrapeWorkGetLabelsHash
when preparing the labels for the benchmark. This should reduce measurements' noise during the benchmark.
This is a follow-up for c05ffa906d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8515
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
* vmgateway: properly set the `Host` header when routing requests to `-write.url` and `-read.url`
* Update docs/victoriametrics/changelog/CHANGELOG.md
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: properly attach tenant labels `vm_account_id` and `vm_project_id` to alerting rules when enabling `-clusterMode`
Previously, these labels were lost in alert messages to Alertmanager. Bug was introduced in [v1.112.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.112.0).
Previously the buffer was increased by 30% after it became 50% full.
For example, if more than 5MB of data is read into 10MB buffer, then its' size
was increased to 13MB, leading to 13MB-5MB = 8MB of waste.
This translates to 8MB/5MB = 160% waste in the worst case.
The updated algorithm increases the buffer by 30% after it becomes ~94% full.
This means that if more than 9.4MB of data is read into 10MB buffer,
then its' size is increased to 13MB, leading to 13MB-9.4MB = 3.6MB of waste.
This translates to 3.6MB / 9.4MB = ~38% waste in the worst case.
This should reduce memory usage when vmagent reads big responses from scrape targets.
While at it, properly append the data to buffer if it already has more than 4KiB of data.
Previously the data over 4KiB in the buffer was lost after ReadFrom call.
This is a follow-up for f28f496a9d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6761
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759
It is faster to read the whole data and then decompress it in one go for zstd and snappy encodings.
This reduces the number of potential read() syscalls and decompress CGO calls needed
for reading and decompressing the data.
### Describe Your Changes
replace /VictoriaLogs aliases to /victorialogs as generated directories
are anyway renamed to victorialogs before deployment
need to merge this PR immediately after
https://github.com/VictoriaMetrics/vmdocs/pull/116
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The panel was producing wrong predictions as it is almost impossible,
without making too expensive queries, to make a precise predictions.
More details on reasoning why it is better to remove it than fix it
is here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8492.
This change also removes ETA panels from alerting rules annotations.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
remove unused files from docs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
`its'` -> `its`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Update dependencies to latest versions
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- add smaller search weights for changelog content
- remove replace `<details>` tag with collapse shortcode
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
related PR https://github.com/VictoriaMetrics/vmdocs/pull/115
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- Move lib/httputil.Transport to lib/promauth.NewTLSTransport. Remove the first arg to this function (URL),
since it has zero relation to the created transport.
- Move lib/httputil.TLSConfig to lib/promauth.NewTLSConfig. Re-use the existing functionality
from lib/promauth.Config for creating TLS config. This enables the following features:
- Ability to load key, cert and CA files from http urls.
- Ability to change the key, cert and CA files without the need to restart the service.
It automatically re-loads the new files after they change.
- Avoid a data race when multiple goroutines access and update roundTripper.trBase inside roundTripper.getTransport().
The way to go is to make sure the roundTripper.trBase is updated only during roundTripper creation,
and then can be only read without updating.
- Use the http.DefaultTransport for http2 client connections at Kubernetes service discovery.
Previously golang.org/x/net/http2.Transport was used there. This had the following issues:
- An additional dependency on golang.org/x/net/http2.
- Missing initialization of Transport.DialContext with netutil.Dialer.DialContext for http2 client.
- Missing initialization of Transport.TLSHandshakeTimeout for http2 client.
- Introduction of the lib/promauth.Config.NewRoundTripperFromGetter() method, which is hard to use properly.
- Unnecessary complications of the lib/promauth.roundTripper, which led to the data race described above.
- Avoid a data race when multiple goroutines access and update tls config shared between multiple
net/http.Transport instances at the TLSClientConfig field. The way to go is to always make a copy of the tls config
before assigning it to the net/http.Transport.TLSClientConfig field.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7114
It is much better to panic instead of returning an error on programming error (aka BUG),
since this significantly increases chances that the bug will be noticed, reported and fixed ASAP.
The returned error can be ignored, even if it is logged, while panic is much harder to ignore.
The code must always panic instead of returning errors when any programming error (aka unexpected state) is detected.
This is a follow-up for the commit 9feee15493
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6783
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771
- update references to the latest version
- port LTS changelog
- revert changes from 47201ace96 and overwrite by an actual latest enterprise release version instead of latest LTS release
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
PR updates docs to release v1.21.0, in particular, adjust docs and its
structure to High Availability (HA) and horizontal scalability (HS)
capabilities.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously, performance of stream.Parse could be limited by mutex.Lock on callback function. It used shared writeContext. With complicated relabeling rules and any slowness at pushData function, it could significantly decrease parsed rows processing performance.
This commit removes locks and makes parsed rows processing lock-free in the same manner as `stream.Parse` processing implemented at push ingestion processing.
Implementation details:
- Removing global lock around stream.Parse callback.
- Using atomic operations for counters
- Creating write contexts per callback instead of sharing
- Improving series limit checking with sync.Once
- Optimizing labels hash calculation with buffer pooling
- Adding comprehensive tests for concurrency correctness
Benchmark performance:
```
# before
BenchmarkScrapeWorkScrapeInternalStreamBigData-10 13 81973945 ns/op 37.68 MB/s 18947868 B/op 197 allocs/op
# after
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape
cpu: Apple M1 Pro
BenchmarkScrapeWorkScrapeInternalStreamBigData-10 74 15761331 ns/op 195.98 MB/s 15487399 B/op 148 allocs/op
PASS
ok github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape 1.806s
```
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
---------
Signed-off-by: Maksim Kotlyar <kotlyar.maksim@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Previously, `getBackupsList` was appending `latest` backup in all cases without checking if it actually exists.
This lead to `vm_backup_last_run_failed` metric being set to `1` since folder did not contain successful completion marker.
This commit adds a check to handle a case when remote storage does not contain any backups yet.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8490
---
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
These errors could be caused by intermittent network issues, especially
in case of using proxies when accessing S3 storage. Previously, such
error would abort backup/restore process and require manual intervention
to ensure backups consistency.
This commit adds automatic retries to handle this to improve backups
reliability and resilience to network issues.
1. remove "vmalert" word from vmgateway doc and exposed metrics;
2. remove unrelated flags like -datasource.roundDigits, remoteRead.disablePathAppend, -datasource.disableStepParam
vmbackupmanager uses `runC` channel for inter-goroutine communication between `scheduler` and `execute` goroutines.
Previously, `runC` wasn't closed during graceful shutdown. And vmbackupmanager process couldn't gracefully stop. It could only be killed with-in configured timeout.
This commit properly closes `runC` by `scheduler` when stopping vmbackupmanager in order to avoid shutdown delay.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8554
Previously, backup was first scheduled at 00:00:00 and `getSleepDuration` was immediately executed to get the sleep duration for the next backup. Since it was returning `1 * time.Second` the next backup was attempted and failed to be scheduled.
Update logic to wait for full backup interval in this case so that there will be no attempt to schedule an unneeded backup.
It also adds the following changes:
* fix error log entry reference to type of policy
* add a message about retention completion similar to existing message for backups to make it more consistent
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8499
This is a follow-up for fb6d2e92e3a1cf412d1f7dee64a4852941a8aa1b
### Describe Your Changes
during initial flush with deduplication and windows enabled lower
timestamps threshold is set to an upper bound of the next deduplication
interval, which leads to ignoring all samples on subsequent intervals
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This PR adds nanosecond precision to log sorting, ensuring accurate
ordering of entries with sub-millisecond differences.
Related issue: #8346
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Commit 9ca74d1fff introduced an issue with metrics registration. Due to metrics.Summary type always registered at the global state of metrics package, vmalert had increased memory and CPU usage after multiple configuration reloads.
This commit addresses this issue and properly registers metrics.Summary metric. Now metrics for group and rules must be explicitly registered before group.Start with group.Init method. It simplifies metrics usage an ensures that all needed metrics were registered and group is ready to start.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8532
1. fix possible data race on group checksum when reload is called
concurrently. Before, it didn't affect much but might update the group
one more time.
2. remove the unnecessary g.mu.RLock() and compute group.id at newGroup creation. Changes to group.ID()
indicate that type and interval have changed, and the group is new.
Related PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8540
Previous commit 9ca74d1fff added a regression for notifier's metrics exposed by vmalert. vmalert returned new notifier instances for the blackhole notifier type. And it registered new metrics each get notifiers function was called. It registered duplicate metrics and lead to OOM crash.
This commit properly init blachole notifier instances and add metrics for it only once, during application start.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8532
Previously, vmagent may incorrectly store partial scrape response
in case of scrapping error. It may happen if `sw.ReadData` call fetched
some chunked response and store it at buffer. And later context deadline
exceed error happened.
As a result, at the next scrape iteration this partial response could
be forwarded to the `sw.sendStaleSeries(lastScrape...)` function call
and lead to `Prometheus line` parsing error.
This commit properly set response body to the empty value in case of
scrapping error. It prevents storing partial scrape response body. And
it no longer sends partial staleness markers to the remote storage.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8528
This panel should have help us to identify
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8501 during
release checks. While we already track the GC pressure, its value is
relative and change wasn't noticeable for the workloads that we
observed.
The absolute values of allocations rate could have helped to see the
anomaly.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
`prasedRelabelConfig` -> `parsedRelabelConfig`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix endless group expansion loop.
Related issue: #8347
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix many spelling errors and some grammar, including misspellings in
filenames.
The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards.
So it seems to have low impact on users.
The change also deprecates `cspell` as it is much heavier and less usable.
---------
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
### Describe Your Changes
This PR adds the dedicated FAQ for VictoriaMetrics Cloud
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This reduces the size of LogRows.streamTagCanonicals by 1/3 because of the eliminated `cap` field
in the slice header (reflect.SliceHeader) compared to the string header (reflect.StringHeader).
### Describe Your Changes
Doc updates to a patch release v1.20.1, fixing a bug in
`PeriodicScheduler` that may affect some of the customers' deployments
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This commit adds lib/chunkedbuffer.Buffer - an in-memory chunked buffer
optimized for random access via MustReadAt() function.
It is better than bytesutil.ByteBuffer for storing large volumes of data,
since it stores the data in chunks of a fixed size (4KiB at the moment)
instead of using a contiguous memory region. This has the following benefits over bytesutil.ByteBuffer:
- reduced memory fragmentation
- reduced memory re-allocations when new data is written to the buffer
- reduced memory usage, since the allocated chunks can be re-used
by other Buffer instances after Buffer.Reset() call
Performance tests show up to 2x memory reduction for VictoriaLogs
when ingesting logs with big number of fields (aka wide events) under high speed.
Pre-allocate the needed slice of strings and then assign items to it by index
instead of appending them. This reduces the number of memory allocations
and improves performance a bit.
…an 64KB at Reset
This commit reverts
b58e2ab214
as it has negative impacts when ByteBuffer is used for workloads that
always exceed 64KiB size. This significantly slows down affected
components because:
* buffers aren't beign reused;
* growing new buffers to >64KiB is very slow.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8501
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Properly decode protobuf-encoded Loki request if it has no Content-Encoding header.
Protobuf Loki message is snappy-encoded by default, so snappy decoding must be used
when Content-Encoding header is missing.
- Return back the previous signatures of parseJSONRequest and parseProtobufRequest functions.
This eliminates the churn in tests for these functions. This also fixes broken
benchmarks BenchmarkParseJSONRequest and BenchmarkParseProtobufRequest, which consume
the whole request body on the first iteration and do nothing on subsequent iterations.
- Put the CHANGELOG entries into correct places, since they were incorrectly put into already released
versions of VictoriaMetrics and VictoriaLogs.
- Add support for reading zstd-compressed data ingestion requests into the remaining protocols
at VictoriaLogs and VictoriaMetrics.
- Remove the `encoding` arg from PutUncompressedReader() - it has enough information about
the passed reader arg in order to properly deal with it.
- Add ReadUncompressedData to lib/protoparser/common for reading uncompressed data from the reader until EOF.
This allows removing repeated code across request-based protocol parsers without streaming mode.
- Consistently limit data ingestion request sizes, which can be read by ReadUncompressedData function.
Previously this wasn't the case for all the supported protocols.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300
The `ignore_fields` HTTTP query args can contain prefixes ending with '*'.
For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.`
during data ingestion.
The optimization touches 2 things:
1. Reduces amount of allocations when comparing canonical metric names
between left and right parts of expressions.
2. Adds fast path for cases when right part of expression returns
scalar: `series_selector or on() vector(1)`, which is a typical
expression.
```
benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark old ns/op new ns/op delta
BenchmarkBinaryOpOr/tss:1_or_tss:1-14 291 272 -6.56%
BenchmarkBinaryOpOr/tss:1_or_tss:1000-14 44590 28592 -35.88%
BenchmarkBinaryOpOr/tss:1000_or_tss:1-14 103124 39563 -61.64%
BenchmarkBinaryOpOr/tss:1000_or_tss:1000-14 20386150 1859335 -90.88%
BenchmarkBinaryOpOr/tss:1000_or_on()_vector(0)-14 91382 36805 -59.72%
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8382
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* tie relevant functionality together
* change hierarchy of related options to visually group it
No breaking changes to links.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* move related sections clother to each other
* group related sections within the same section
The intention of the change is to tie related documentation together.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Long constant fields cannot be stored in columnsHeader as a const column,
because their size exceeds maxConstColumnValueSize, so they are stored as regular values.
This commit optimizes storing such fields by storing only a single value
across the field values in a block instead of storing multiple values.
This should improve data ingestion performance a bit. This also should improve query
performance when the query accesses such fields because of better cache locality.
Also improve persisting of constant string lengths by storing them only once.
Previously, vmselect didn't stop multitenant query execution if it
receives error from vmstorage. Such as limit error or any other. It
continued to execute queries until it did it for all tenants. It leads
to the potential waste of resources.
In addition, callback error was incorrectly reference and can be updated by
subsequent callback call.
This commit returns error earlier, cancels sub-sequent requests for
tenants and properly return storageNode request error.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8461
Previously, if the command-line flag value `-remoteWrite.showURL` changed, vmagent dropped content of persistent queues. It's not expected behavior and may lead to data-loss at queue.
Further more if command-line flag value `-remoteWrite.showURL` is set to `true`, any changes to url query arguments will lead to persistent queue drop. The most common uses is kafka and gcp pub-sub integration. It uses url query arguments for client configuration.
Also, it complicates copy content of persistent queue between vmagents. Since it requires to properly change name inside metainfo.json.
This commit removes persistent queue name equality check from `lib/persistentqueue`. This check was added as an additional protection from on-disk data corruption.
It's safe to skip this check for vmagent, because vmagent encodes remoteWrite.url as part of path to the queue. It guarantees that there will be no collision.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8477.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Previously, TLS config was only created for URLs with `https` scheme.
This could lead to unexpected errors when original URL was redirecting
to `https` one as TLS config is not applied.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8494
Almost all storage API operations, both ingestion and retrieval, involve
writing and/or reading the indexdb. However, during these operations,
the indexdb refcount is not incremented. This may lead to panics if
indexdb is rotated more than once during these operations.
This commit increments the refcount before using indexdb and decrements it
after use.
Note that rotating indexdb more than once during some operation is an
impossible case under normal circumstances as the min retention period
is 1 day (i.e. the indexdb will be rotated once per day). However, we
want the storage to behave correctly in all cases.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Loki doesn't support well high-cardinality log fields (e.g. fields with big number of unique values).
That's why Promtail, Grafana Agent and Grafana Alloy encode such fields into a JSON and push them
as a plaintext log message to the remote storage. This isn't an efficient way to store high-cardinality
log fields in VictoriaLogs, since it is optimized for storing and querying such fields when they are stored
distinctly as a regular log fields according to VictoriaLogs data model ( https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model ).
This commit enables automatic parsing of JSON-encoded log fields at plaintext log message received over Loki protocol
and storing them as a separate log fields. This should improve data compression ratio and reduce disk space usage.
This should also improve query performance when the parsed log fields are used in queries for filtering and aggregation.
The old behaviour can be restored by passing -loki.disableMessageParsing command-line flag to VictoriaLogs.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8486
Fix metric that shows number of active time series when per-day index is disabled. Previously, once per-day index was disabled, the active time series metric would stop being populated and the `Active time series` chart would show 0.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8411.
- Properly handle negative timestamps (e.g. timestamps before 1970-01-01)
- Optimize parsing floating-point timestamps by eliminating the memory allocation
needed for returning an error from strconv.ParseInt. Instead, check whether the string contains a dot,
and then parse it as a floating-point number.
- Add tests for ParseUnixTimestamp function.
- Make the code easier to understand and maintain by removing unneeded generic function toNano().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8470
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8472
### Describe Your Changes
Add `vmalert_alerts_send_latency_seconds` metric for
alertmanager.notifier.
To measure the time for alertmanager calls to send alerts per notifier.
This is needed to see the latency for each notifier from vmalert calls.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: emreya <e.yazici1990@gmail.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
* add version since feature is available
* add cluster endpoint paths
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Fix scrapePool name
If in the scrape file, I do some magic and manipulate the job name then
Prometheus will show scrapePool as the original job name in the targets
API, but vmagent will set it to the final value which is wrong.
example
```
job: consul-targets
...
- source_labels: [ __meta_consul_service ]
regex: (\w+)[_-]exporter
target_label: job
replacement: $1
```
curl to prom API will show
`"scrapePool": "consul-targets",`
vmagent:
`""scrapePool": "node",`
before changes:
```
curl -s 'http://localhost:8429/api/v1/targets' | jq -r '.data.activeTargets[].scrapePool'| sort|uniq
blackbox
pgbackrest
postgres
```
after changes
```
curl -s 'http://localhost:8429/api/v1/targets' | jq -r '.data.activeTargets[].scrapePool'| sort|uniq
blackbox
consul-targets
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Ordering changes by release versions enhances the searchability
of the documentation. For example, tracking which release got
the bugfix becomes easier if releases are already sorted.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This PR updates the documentation by removing old assets and adding the
user management chapter, divided in different sections.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixed the typo in the documentation. Updated `Ir provides` to `It
provides`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
fixes#8469
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This feature allows to track query requests by metric names. Tracker
state is stored in-memory, capped by 1/100 of allocated memory to the
storage. If cap exceeds, tracker rejects any new items add and instead
registers query requests for already observed metric names.
This feature is disable by default and new flag:
`-storage.trackMetricNamesStats` enables it.
New API added to the select component:
* /api/v1/status/metric_names_stats - which returns a JSON
object
with usage statistics.
* /admin/api/v1/status/metric_names_stats/reset - which resets internal
state of the tracker and reset tsid/cache.
New metrics were added for this feature:
* vm_cache_size_bytes{type="storage/metricNamesUsageTracker"}
* vm_cache_size{type="storage/metricNamesUsageTracker"}
* vm_cache_size_max_bytes{type="storage/metricNamesUsageTracker"}
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4458
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Previously, "selector @ another_selector" assumed that
"another_selector" metric is supposed to exist since "start" used in the
query.
If the query was evaluated in the following case (timestamps):
- start - 2, end - 10
- "another_selector" 5,6,7,8,9,10
- "selector" The resulting "at" timestamp would be taken from NaN (as
`int64(NaN * 1000)`), causing a panic or invalid behavior later.
Note that type cast of `NaN` to int64 is also platform-dependent, so
value of `int64(math.NaN() * 1000)` can produce `0` or max int64 on
different platforms and versions of Go.
This commit changes this and checks for the first non-NaN value. This
makes it easier to use for users as series are not always aligned and
returning an error in this case would disallow using this for some time
ranges.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8444
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Previously, the trailing slash was removed and caused an incorrect redirect path when visiting VMUI.
This commit leaves it as is. Also it applies minor refactoring to url formatting.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8439
Previously, opentelemetry attribute parsed added extra field names according to
golang JSON parser spec for structs:
```
struct AnyValue{
StringValue string
}
```
Was serialized into:
```
{"StringValue": "some-string"}
```
While opentelemetry-collector serializes it as
```
"some-string"
```
This commit changes this behaviour it makes parses compatible with opentelemetry-collector format. See test cases for examples.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384
### Describe Your Changes
Fix available from version hint. The feature was introduced in
[v1.91.0](https://docs.victoriametrics.com/changelog/changelog_2023/#v1910).
I noticed that the sentence uses both `{{% available_from "v1.91.0" %}}`
and a manual reference like `starting from
[v1.91](https://docs.victoriametrics.com/changelog/#v1910)`. Does {{%
available_from %}} fully supersede the manual changelog reference, and
should the later be removed? Or should\could both be used together?
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Removed VictoriaMetrics prefix from anomaly detection menu items
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
This PR adds the remaining subsections for the get started part. Some
content is taken from the product page.
Ideally, the get-started page would have some cards instead of raw
links. We should explore that.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Commit cd39df1 introduced regression, which caused any write path related limits to be ignored.
This commit fixes match typo and adds check to prevent such kind of regression in future.
### Describe Your Changes
> ⚠️ Even if approved, please don't merge it on my behalf, I
still may apply a couple of re-phrasings before merging it on Monday
03.03.2025
- Aligned `vmanomaly` docs with release v1.20.0
- Re-structured root page of anomaly detection docs for clarity
- Added several sections to FAQ, e.g. on how to incorporate domain
knowledge into anomaly detection configs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
This PR fixes an issue where the Downsampling filters debug page would
get stuck in an infinite loading state when labels had no matches. Now,
the case is properly handled.
Related issue: #8339
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
I guess a "downsampling" term is more appropriate than "deduplication"
in the downsampling paragraph.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
VictoriaLogs inserts `_time` field as a label in result when query with
[time buckets stats
pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-by-time-buckets),
making the result meaningless and may lead to cardinality issues.
>curl --location --request POST
'https://play-vmlogs.victoriametrics.com/select/logsql/stats_query?query=_time%3A1m%20%7C%20stats%20by%20(_time%3A10s)%20count%20()%20as%20total'
>{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"total","_time":"2025-01-24T12:31:30Z"},"value":[1737721904.4476516,"12"]},{"metric":{"__name__":"total","_time":"2025-01-24T12:31:10Z"},"value":[1737721904.4476516,"10"]},{"metric":{"__name__":"total","_time":"2025-01-24T12:31:00Z"},"value":[1737721904.4476516,"10"]},{"metric":{"__name__":"total","_time":"2025-01-24T12:31:20Z"},"value":[1737721904.4476516,"12"]},{"metric":{"__name__":"total","_time":"2025-01-24T12:30:50Z"},"value":[1737721904.4476516,"10"]},{"metric":{"__name__":"total","_time":"2025-01-24T12:30:40Z"},"value":[1737721904.4476516,"9"]}]}}%
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* use unified numeric list;
* apply line width limits;
* remove time filter from quantile examples, as we suggest to
not use time filters.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Just a small typo in the docs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Since funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for this reason makes them to export irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.
This change removes `vm_rows_invalid_total{type="prometheus"}` metric
from /metrics page for these components.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Previous link was incorrectly created by adding repository name to
organization link.
All other links are using proper schema.
This is a follow-up for 6ff61a1c.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
#8342
fix negative rate result when the lookbehind window is longer than
`-search.maxLookback` or `-search.maxStalenessInterval` and data
contains gap.
This issue was introduced since
[v1.110.0](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
`MustParsePromMetrics` imports `lib/protoparser/prometheus`, and this
package exposes the following metrics:
```
vm_protoparser_rows_read_total{type="promscrape"}
vm_rows_invalid_total{type="prometheus"}
```
It means every package that uses `lib/prompbmarshal` will start exposing
these metrics. For example, vlogs imports `lib/protoparser/common` which
uses `lib/prompbmarshal.Label`. And only because of this vlogs starts
exposing unrelated prometheus metrics on /metrics page.
Moving `MustParsePromMetrics` to `lib/protoparser/prometheus` seems like
the leas intrusive change.
-----------
Depends on another change
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Updated `publish-via-docker` task to push images to multiple registries
as defined by `DOCKER_REGISTRIES`. By default, publish pushes images to
both docker.io and quay.io. It is possible to choose a single repo by
overriding `DOCKER_REGISTRIES`, for example: `DOCKER_REGISTRIES=quay.io
make publish-victoria-logs`
Note that `package-via-docker` task is not using multiple registries as
there is usually little sense in building same image with different tags
for local use.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4116
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously, if indexDB search failed for some reason during search at previous indexDB (aka extDB), VictoriaMetrics stored empty search result at cache. It could cause incorrect search results at subsequent requests.
This commit checks search error and stores request results only on success.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8345
For example, `field:~".+"`, `field:~".*"` or `field:""`
Replace such filters to faster ones. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
These filters can be used for selecting logs where one field value is less than another field value.
These filter complement `<=` and `<` filters for constant literals.
TooHighQueryLoad should trigger when vmsingle or vmselect can't start
processing read queries for last 15min.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The version tooltips were incorrectly set to `#tip` instead of `#`,
so ` make docs-update-version` didn't catch them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.
The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mereset.Table instance. Previously mergeset.Table instances were using 5 seconds
flush interval, which didn't depend on the Storage.flushInterval.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
The new version of github.com/VictoriaMetrics/metricsql handles $__interval and $__rate_interval
inside rollup functions in more correct way - it drops square brakets, so VictoriaMetrics
could automatically detect the needed lookbehind window depending on the time distance between real samples.
For example, rate(m[$__rate_interval]) is parsed and processed as rate(m) now.
Make sure that the maximum log field name, which can be generated by JSONParser.ParseLogMessage,
doesn't exceed the hardcoded limit maxFieldNameSize. Stop flattening of nested JSON objects
when the resulting field name becomes longer than maxFieldNameSize, and return the nested JSON object
as a string instead.
This should prevent from parse errors when ingesting deeply nested JSON logs with long field names.
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
The provided list can be generated by a subquery.
- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
The provided list can be generated by a subquery.
### Describe Your Changes
- Fixed 404 links to anomaly score dashboard `.json` file on Github.
- Fixed broken markdown on QuickStart page.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
> (currently in DRAFT) This dashboard and the docs may look differently
after recent review and the feedback received.
As requested by our customers, this PR introduces `anomaly score`-based
dashboard for `vmanomaly` to improve the visual experience and ease the
drill down debugging process for respective anomaly detection setups.
Accompanying guide (under `default` preset mode) is provided as well.
- [For additional
benefits](https://docs.victoriametrics.com/victoriametrics-datasource/#motivation),
the dashboard is based on [VictoriaMetrics
datasource](https://docs.victoriametrics.com/victoriametrics-datasource/)
rather than on `Prometheus` datasource
- Tested locally and on our https://play.victoriametrics.com/
`anomaly_score` data (tenant ID 0)
- To check the guide, build the docs locally (`vmdocs`, `make run
local`) and follow to the
http://localhost:1313/anomaly-detection/presets/#default section
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously if rule group parameters were changed, alerting rules related metrics could be deleted due to bug at `utils/metrics` package.
This commit introduces `metrics.Set` per rule group. It holds group and alerting rules metrics. It properly unregister alerting rules metrics and addresses issue.
In addition:
- expose group metrics only once group is started - this helps to avoid
exposing metrics for groups which are created during YAML unmarshaling
and only used to update existing group.
- properly close rules which are discarded after updating existing rules
so that metrics are also correctly closed.
- detect file renames and properly recreate groups "moved" between files.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8229
Examples:
- `top 5 x, y` is equivalent to `top 5 by (x, y)`
- `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
- `unroll foo, bar` is equivalent to `unroll (foo, bar)`
_time:<=max_time filter must include logs with timestamps matching max_time.
For example, _time:<=2025-02-24Z must include logs with timestamps until the end of February 24, 2025.
Examples:
_time:>=2025-02-24Z selects logs with timestamps bigger or equal to 2025-02-24 UTC
_time:>1d selects logs with timestamps older than one day comparing to the current time
This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
It looks like the version deployed works differently than in local, and
some paths are not taken correctly. This commit fixes that by adding
absolute paths to the 2 images included in this section.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
It looks like when using html image insertion, the path is changed. This
commit fixes that by redirecting to the previous dir in path.
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This guide was outdated. the intention of these changes is to:
* Ease maintainability
* Ease user experience
* Encourage the user to quickly set up the product
Some images are removed because they were obsolete or to improve
readability
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
apply proper syntax for code blocks
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
follow-up for
855dfb324d
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Legend settings have been added to **Graph Settings**.
#### **New Features:**
1. **Table View** – Toggle to display the legend in a table format.
2. **Hide Common Values** – Option to hide fields with identical values
across all series.
3. **Hide Min/Medium/Max** – Ability to hide min, median, and max values
from the legend.
4. **Custom Label Format** – Set a custom format for series labels
(applies only to the legend).
5. **Group by Label** – Group legend entries based on a selected label.
Related Issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8031
Currently, vmalert uses a fixed 10-second client timeout for notifiers,
which can prevent large sets of alerts from being sent successfully.
This introduces `-notifier.sendTimeout` flag to vmalert to control the
client timeout duration for the notifiers.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8287
Inside PARAM-VALUE, the characters '"' (ABNF %d34), '\' (ABNF %d92),
and ']' (ABNF %d93) MUST be escaped. This is necessary to avoid
parsing errors. Escaping ']' would not strictly be necessary but is
REQUIRED by this specification to avoid syslog application
implementation errors. Each of these three characters MUST be
escaped as '\"', '\\', and '\]' respectively. The backslash is used
for control character escaping for consistency with its use for
escaping in other parts of the syslog message as well as in traditional syslog.
Related RFC:
https://datatracker.ietf.org/doc/html/rfc5424#section-6.3.3
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8282
This commit properly validate retention flag value input for the enterprise debug UI.
API properly checks:
- presence of global retention configuration
- validate that retention is at least 1 day, same as vmstorage enforcement
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8343
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Use `rate_sum` instead of `total` output for the following reasons:
* `rate_sum` is far less sensitive for data delays than `total`, since
it represents the speed of change instead of absolute values.
* `rate_sum` remove need in using `rate` function for calculating final
results on query time.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* make QuickStart first, similarly to vlogs docs
* push release guide to bottom, as it is not for users
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
By default, stream aggregation and deduplication stores a single state
per each aggregation output result.
The data for each aggregator is flushed independently once per
aggregation interval. But there's no guarantee that
incoming samples with timestamps close to the aggregation interval's end
will get into it. For example, when aggregating
with `interval: 1m` a data sample with timestamp 1739473078 (18:57:59)
can fall into aggregation round `18:58:00` or `18:59:00`.
It depends on network lag, load, clock synchronization, etc. In most
scenarios it doesn't impact aggregation or
deduplication results, which are consistent within margin of error. But
for metrics represented as a collection of series,
like
[histograms](https://docs.victoriametrics.com/keyconcepts/#histogram),
such inaccuracy leads to invalid aggregation results.
For this case, streaming aggregation and deduplication support mode with
aggregation windows for current and previous state. With this mode,
flush doesn't happen immediately but is shifted by a calculated samples
lag that improves correctness for delayed data.
Enabling of this mode has increased resource usage: memory usage is
expected to double as aggregation will store two states
instead of one. However, this significantly improves accuracy of
calculations. Aggregation windows can be enabled via
the following settings:
- `-streamAggr.enableWindows` at [single-node
VictoriaMetrics](https://docs.victoriametrics.com/single-server-victoriametrics/)
and [vmagent](https://docs.victoriametrics.com/vmagent/). At
[vmagent](https://docs.victoriametrics.com/vmagent/)
`-remoteWrite.streamAggr.enableWindows` flag can be specified
individually per each `-remoteWrite.url`.
If one of these flags is set, then all aggregators will be using fixed
windows. In conjunction with `-remoteWrite.streamAggr.dedupInterval` or
`-streamAggr.dedupInterval` fixed aggregation windows are enabled on
deduplicator as well.
- `enable_windows` option in [aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
It allows enabling aggregation windows for a specific aggregator.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
A follow-up after 9bb5ba5d2f
that impacted compression ratio for data compressed with native GO zstd lib (`make test-pure`).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This PR introduces several enhancements and optimizations for the Group
view:
1. **Disable hover effect:**
Add the option to disable the hover effect. This can help reduce CPU
load when viewing a large number of logs.
2. **Limit entries per Group:**
Add the ability to limit the number of records displayed per group. When
a limit is set, groups are rendered sequentially – the next group starts
only after the current group has finished. By default, there is no
limit.
3. **Display group info:**
Include the group number in the title along with the total count of
groups to improve clarity and navigation.
4. **Performance improvement:**
In addition to the features above, separate optimizations reduce CPU
load during hover interactions by approximately 5-10%.
Related issue: #8135
<details>
<summary>Demo UI</summary>
<img
src="https://github.com/user-attachments/assets/9c89066e-28af-4df2-b368-2380412b3c3f"/>
<img
src="https://github.com/user-attachments/assets/a2338c8d-558c-437c-969e-f825043eb35b"/>
</details>
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
It has been appeared these optimizatios do not give measurable performance improvements,
while they complicate the code too much and may result in slowdown when the ingested logs have
different sets of fields.
This is a follow-up for 630601488e
It has been appeared that 256 files increase RAM usage too much comparing to 128 files
when ingesting logs with hundreds of fields (aka wide events). So let's return back 128 files
limit for now.
This is a follow-up for 9bb5ba5d2f
There is little sense in keeping too big buffers - they just waste RAM and do not reduce
the load on GC too much. So it is better dropping such buffers at Reset instead of keeping them around.
The number of filestream readers is proportional to the number of parts to be merged,
while the number of filestream writers is proportional to the number of concurrent merges.
Usually around 4-16 parts are merged at once, so the number of active filestream readers is ~8x
bigger than the number of active filestream writers.
So it is a good idea to use smaller size of read buffers comparing to the size of write buffers.
Limit read buffer size by 64Kb, while write buffer size is limited by 128Kb.
This should reduce the overall memory usage when merging parts with big number of files.
This is the case for VictoriaLogs, which works with logs containing hundreds of fields (aka wide events).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This should improve query performance for logs with hundreds of fields (aka wide events).
Previously there was a high chance that the data for multiple log fields is stored in the same file.
This could result in query performance slowdown and/or increased disk read IO,
since the operating system could read unnecessary data for the fields, which aren't used in the query.
Now log fields are guaranteed to be stored in separate files until the number of fields exceeds 256.
After that multiple log fields start sharing files.
This reduces memory usage when many filestreams are processed simultaneously.
This is the case for VictoriaLogs when it processes logs with hundreds of fields.
- Re-use column names and values from the previously added rows if possible.
This increases locality of reference for field names and values, while improving
access speed for the field names and values.
- Postpone sorting fields in the added rows until creating inmemory part from them.
This allows optimizing the sorting for log fields with the same set of fields.
This is usually the case for logs, which belong to the same logs stream.
In this first step, content is moved from cloud docs root to a new
folder: get started. This one will allocate: overview, key features and
benefits, quickstart and guides and best practices.
### Changes Description
- In order to reuse content, the overview and README sections are
merged.
- A dummy Get started section is created, but needs further work.
- Next steps will also include reworking the quick start section and
creating the 2 new docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Add missing sudo's to binary install snippets
### Describe Your Changes
Some of the commands the need to be done as root are missing invocation
via sudo. This commit adds missing sudo invocations.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
trigger docs pipeline on docs workflow change
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
exclude operator and helm charts docs sync as these repos will be synced
with vmdocs directly
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Commit 68791f9ccc21548d27f1cf04d0b3270be4146b82 introduced regression.
It performed basicAuth check before built-in routes. It made impossible
to bypass basic authorization with `authKey` param.
This commit fixeds that issue and removes unneeded check. It also adds
integration tests for this case.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7345
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This commit changes journald ingestion validation regex:
from `^[A-Z_][A-Z0-9_]+` to `^[A-Z_][A-Z0-9_]*`.
It's needed to properly support entities with single-character
names.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8314
Some of the test cases were failing because of the changes in data sharding
caused by changes in vmstorage ports caused by cluster restarts required by the
test.
This commit configures the cluster under test to use static ports for vmstorage
replicas. This required to make vmcluster type to become visible to tests.
We normally prefer ports assigned dynamically to avoid any conflicts with other
apps running locally. Should the conflict happen, please re-run the tests.
In CI/CD environment, no conflicts should happen because the tests are run
within a container and the components of the vmcluster will be the only servers
listening for incoming connections.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Just an extremely nit-picky grammatical fix.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This eliminates a class of potential bugs with incorrect stats calculations when an additional filter
is applied to the blockResult before passing it to the stats function, and this filter removes
all the rows from blockResult.
### Describe Your Changes
Allow disabling the per-day index using the `-disablePerDayIndex` flag.
This should significantly improve the ingestion rate and decrease the
disk space usage for the use cases that assume small or no churn rate.
See the docs added to `docs/README.md` for details.
Both improvements are due to no data written to the per-day index.
Benchmark results:
```shell
rm -Rf ./lib/storage/Benchmark*; go test ./lib/storage -run=NONE -bench=BenchmarkStorageInsertWithAndWithoutPerDayIndex --loggerLevel=ERROR
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/perDayIndexes-12 1 3850268120 ns/op 39.56 data-MiB 28.20 indexdb-MiB 259722 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/noPerDayIndexes-12 1 2916865725 ns/op 39.57 data-MiB 25.73 indexdb-MiB 342834 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/perDayIndexes-12 1 2218073474 ns/op 9.772 data-MiB 13.73 indexdb-MiB 450842 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/noPerDayIndexes-12 1 1295140898 ns/op 9.771 data-MiB 0.3566 indexdb-MiB 772119 rows/s
PASS
ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 11.421s
```
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
There are many subtle issues while ingesting logs from popular log shippers into VictoriaLogs via Loki JSON protocol.
For example, the common issue is that structured logs are ingested as a JSON string at _msg field.
This is not what most users expect - they expect that fields in structured logs are ingested as separate log fields.
It is better removing examples with configs for Loki JSON protocol, since other supported protocols work much better
without any issues. This should reduce the confusion level for new users, who try Loki protocol and hit its issues.
- Elasticsearch ( https://docs.victoriametrics.com/victorialogs/data-ingestion/#elasticsearch-bulk-api )
- JSON stream ( https://docs.victoriametrics.com/victorialogs/data-ingestion/#json-stream-api )
MustOpenStorage function may accept variable number of optional
arguments. This commit combines optional arguments into dedicated OpenOptions
struct. It reduces complexity of adding new optional arguments.
Related PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8118
1. Use distinct code paths for blockResult.getValues() and blockResult.getValuesBucketed().
This should simplify debugging and maintenance of the resulting code.
2. Do not load column values if all the values in the block fit the same bucket.
Use blockResultColumn.minValue and blockResultColumn.maxValue for determining whether
column values must be loaded via blockResultColumn.getValuesEncoded().
This signiciantly improves performance for big buckets, which cover all the column
values in a block.
3. Properly calculate buckets for negative values.
4. Properly adjust weekly buckets by Monday.
### Describe Your Changes
Fix autocomplete not passing `AccountID` and `ProjectID` headers when
fetching suggestions in VictoriaLogs UI.
Related: #8042
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
fix typo for description of flag -pprofAuthKey
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
See https://github.com/codecov/codecov-action?tab=readme-ov-file#usage
```
- uses: codecov/codecov-action@v5
with:
files: ./coverage1.xml,./coverage2.xml # optional
```
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Adding Accept-Encoding response header to support content negotiation,
which was introduced in [this
PR](https://github.com/systemd/systemd/pull/34822) and enables
compression on journald
Previously the `callbackErr` is silently ignored in clusternative parser, which is used at vminsert for parsing clusterNative requests and at vmstorage for parsing vminsert requests.
This commit fixes that by properly return callbackError after reading all block metrics. This aligns
with other parsers in `lib/protoparser`.
### Describe Your Changes
Updated default `step` calculation for `Table` and `JSON` views.
- When switching to the `Table` or `JSON` view, the default step is now
automatically set to `end - start`, where `end` and `start` correspond
to the selected time range.
- If the user has manually set a step value, it will remain unchanged
when switching views or changing the time range.
- The UI now explicitly indicates when the step is automatically
calculated.
<details> <summary>Demo</summary>
https://github.com/user-attachments/assets/2540de24-36ed-4764-a047-1c6b48a80ed4
</details>
**Note:** These views use the
[`/api/v1/query`](https://docs.victoriametrics.com/keyconcepts/#instant-query)
endpoint for instant queries.
Related issue: #8240
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This improves performance for queries, which use `sort by (...) limit N` without mentioning _time field.
For example, the following query must work faster now
_time:1d | rm _time | sort by (request_duration desc) limit 10
- Add a fast path for timestamps ending with 'Z'
- Use strings.LastIndexAny instead of strings.IndexAny for searching
for timezone offset at the end of the string. This works faster
for timestamps with sub-second precision.
The 2025 changelog was in the parent directory - a default page
that opens for /changelog.
But it seems like it was confusing for users, so add 2025 that mirrors
/changelog page.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
If test is using binaries from `master` branch, then test name should be prefixed
with `TestVmsingle` word.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Update the pipe state only once per each series of repeated strings, uint8 values and tuples.
This improves performance a bit for the following `top` pipes:
- top (string_field)
- top (uint8_field)
- top (field1, ..., fieldN)
Do not apply the optimization for uint16, uint32, uint64 and int64 fields, since they
usually contain big number of unique values, which do not repeat most of the time.
### Describe Your Changes
There is an issue described in #8040 this should fix it
- The alerts slice is shared across multiple goroutines (since send() is
called concurrently).
- `alerts[:0]` creates a new slice header, but it still references the
same underlying array.
- Appending (append(alertsToSend, a)) modifies the underlying array,
which may also be used by another goroutine.
Solution: Use a separate slice copy for each goroutine.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Evgeny Kuzin <evgeny@hudson-trading.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
### Describe Your Changes
- Update references to the latest release - v1.110.1
- Update links to the latest LTS releases
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously RFC3339 timestamps with sub-second precision could be incorrectly compared by lessString().
For example, 2025-01-20T10:20:30.1Z was incorrectly treated as smaller than 2025-01-20T10:20:30.09Z,
because the first timestamp has smaller decimal number after the last dot than the second timestamp.
Previosly the parsing of the input stream was stopped after the first parse error.
This isn't what most users expect when ingesting JSON lines in a stream where some JSON lines may be invalid.
### Describe Your Changes
- Migrated build process from Webpack (CRA) to Vite
Reason for migration: `create-react-app` has been
[deprecated](b532a58792)
and contains outdated dependencies that haven’t been updated for over
two years, leading to security vulnerabilities. Additionally, build
speed improved by more than 2x.
- Updated dependencies and fixed TypeScript typings in accordance with
the updates
b532a58792
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* add troubleshooting link
* add panels for CPU and mem usage
* rm unnecessary links from panels
* update to grafana v11
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Split unique values (groups) into shards according to the configured concurrency
during processing of the matching rows if the number of unique values exceeds the hardcoded threshold.
Previously this splitting was performed unconditionally at the merge stage when merging independently
calculated per-CPU states into a single state. It is faster to perform the split during rows processing
if the number of unique values is big.
This gives up to 30% perfromance improvements when these pipes are applied to big number of unique values (groups).
The number of worker shards per each pipe processor is created during query initialization.
This number equals to the `options(concurrency=N)` if this option is set or to the number of available CPU cores.
This means that all the pipes must adhere the given concurrency when passing data blocks
to the next pipe.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201
The bug has been introduced in 0214aa328e
vmauth uses 'lib/httpserver' for serving HTTP requests. This server
unconditionally defines built-in routes (such as '/metrics',
'/health', etc). It makes impossible to proxy `HTTP` requests to backends with the same routes.
Since vmauth's httpserver matches built-in route and return local
response.
This commit adds new flag `httpInternalListenAddr` with
default empty value. Which removes internal API routes from public
router and exposes it at separate http server.
For example given configuration disables private routes at `0.0.0.0:8427` address and serves it at `0.0.0.0:8426`:
`./bin/vmauth --auth.config=config.yaml --httpListenAddr=:8427 --httpInternalListenAddr=127.0.0.1:8426`
Related issues:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6468
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7345
This commit improves integration with third-party solutions who rely on non-root
endpoints (i.e. MinIO) when the vmselect path has been specified in the
configured Prometheus URL like:
`http://vmselect.:8481/select/0/prometheus`
Comparable change has been done before
(b885a3b6e9), however only takes care of
the root path. This means endpoints `-/healthy` and `-/ready` are still
not available on full vmselect Prometheus paths, resulting in
unsupported path requests.
This change makes these endpoints available on the full paths like:
`/select/0/prometheus/-/healthy` and `/select/0/prometheus/-/ready`,
thus achieving full Prometheus compatibility for external dependencies.
Related issues:
- https://github.com/minio/console/issues/2829
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833
---
Signed-off-by: Joost Buskermolen <j.buskermolen@cloudmeesters.com>
### Describe Your Changes
The cardinality limiter in this case does not receive the actual
metricID but some other value found in r.TSID.MetricID and is not
initialized. Depending on the system and/or go runtime implementation,
this value can be 0 or some garbage value (which shouldn't have too wide
a range). Thus, there basically no limit for inserted metricIDs.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
- fixed recently found 404 under `/anomaly-detection/` docs section
- get rid of remaining .html / .index.html endings in links under
`/anomaly-detection/` docs section
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Explicitly mention that `--vm-rate-limit` is used for each individual
worker defined by `--vm-concurrency`.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Fix metric name labels set not being "closed" by `}`.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
At v1.107.0 release with commit 21a7f06fc4beeb6ad32b9f7fd88704ed33674905 was introduced regression.
It prevents kafka producer client from graceful shutdown. This change also removed kafka message headers.
This commit properly closes client.
It also addresses the following issues:
* properly add headers to kafka message
* add url_id to the exported metrics, which prevents possible metrics clash if multiple remote write targets have
the same url and topic name.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Currently the alert descrption considers only one end of the connection
(vmagent). While saturation can also be caused by slowness of the
receiving components (vminsert, vmstorage). Update the alert description
with a brief suggestion to also check the dashboards of these
components.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
metricsql: bump to v0.83.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7703
The update also returns an error if metric name is specified twice in
metrics selector.
For example, `foo{__name__="bar"}` is not allowed anymore. It would
successfully parse before
this change, but it won't satisfy the search filter any way. So it had
no sense in supporting this. This is why some test cases were removed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fixed misspelling of the word "components" from "componetns"
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
fixed some cloud docs links
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* use signed datasource
* bump Grafana because on v10 pre-installing+provisioning didn't work
* consistently rename provisioning folder
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fixed an issue where dropdown menus were not visible in the Group View
settings.
Ref issue: #8153
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
At this time `bufferedwriter` [silently ignores connection close
errors](78eaa056c0/lib/bufferedwriter/bufferedwriter.go (L67)).
It may be very convenient in some situations (to not log such
unimportant errors), but it's too implicit and unsafe for the others.
For example, if you close [export
API](https://docs.victoriametrics.com/#how-to-export-time-series) client
connection in the middle of communication, VictoriaMetrics won't notice
it and will start to hog CPU by exporting all the data into nowhere
until it process all of them. If you'll make a few retries, it will be
effectively a DoS on the server.
This commit replaces this implicit error suppressing with explicit error
handling which fixes the issue with export API.
Issue was introduced at e78f3ac8ac
### Describe Your Changes
docs/vmanomaly: release v1.19.2 (patch that addresses some of the bugs
found in 1.19.0)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>
### Describe Your Changes
- add example a generic example to vmauth docs
- add an multi-tenancy usage example to VictoriaLogs docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This should reduce the time needed for opening the storage with retentions exceeding a few months.
While at at, limit the concurrency of opening partitions in parallel to the number of available CPU cores,
since higher concurrency may increase RAM usage and CPU usage without performance improvements
if opening a single partition is CPU-bound task.
This is a follow-up for 17988942ab
Use the testing.Testing() function in order to determine whether the code runs in test.
This allows running tests and fast speed without the need to specify DISABLE_FSYNC_FOR_TESTING
environment variable.
This is a follow-up for the commit 334cd92a6c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6871
### Describe Your Changes
- Added the `fields_limit` parameter for the `hits` query to limit the
number of returned fields. This reduces the response size and decreases
memory usage.
#### Legend Menu:
A context menu has been added for legend items, which includes:
1. Metric name
2. **Copy _stream name** - copies the full stream name in the format
`{field1="value1", ..., fieldN="valueN"}`.
3. **Add _stream to filter** - adds the full stream value to the current
filter:
`_stream: {field1="value1", ..., fieldN="valueN"} AND (old_expr)`.
4. **Exclude _stream from filter** - excludes the stream from the
current filter:
`(NOT _stream: {field1="value1", ..., fieldN="valueN"}) AND (old_expr)`.
5. List of fields with options:
- Copy as `field: "value"`.
- Add to filter: `field: "value" AND (old_expr)`.
- Exclude from filter: `-field: "value" AND (old_expr)`.
6. Total number of hits for the stream.
Related issue: #7552
<details>
<summary>UI Demo - Legend Menu</summary>
<img width="400"
src="https://github.com/user-attachments/assets/ee1954b2-fdce-44b4-a2dc-aa73096a5414"/>
<img width="400"
src="https://github.com/user-attachments/assets/19d71f04-c207-4143-a176-c5f221592e3d"/>
</details>
---
#### Legend:
1. Displays the total number of hits for the stream.
2. Added hints below the legend for total hits and graph interactions.
3. Click behavior is now the same as in other vmui charts:
- `click` - shows only the selected series.
- `click + ctrl/cmd` - hides the selected series.
<details>
<summary>UI Demo - Legend</summary>
before:
<img
src="https://github.com/user-attachments/assets/18270842-0c39-4f63-bcda-da62e15c3c73"/>
after:
<img
src="https://github.com/user-attachments/assets/351cad3a-f763-4b1d-b3be-b569b5472a7c"/>
</details>
---
#### Tooltip:
1. The `other` label is moved to the end, and others are sorted by
value.
2. Values are aligned to the right.
3. Labels are truncated and always shown in a single line for better
readability; the full name is available in the legend.
<details>
<summary>UI Demo - Tooltip</summary>
| before | after |
|----------|----------|
| <img
src="https://github.com/user-attachments/assets/adccff38-e2e6-46e4-a69e-21381982489c"/>
| <img
src="https://github.com/user-attachments/assets/81008897-d816-4aed-92cb-749ea7e0ff1e"/>
|
</details>
---
#### Group View (tab Group):
Groups are now sorted by the number of records in descending order.
<details>
<summary>UI Demo - Group View</summary>
before:
<img width="800"
src="https://github.com/user-attachments/assets/15b4ca72-7e5d-421f-913b-c5ff22c340cb"/>
after:
<img width="800"
src="https://github.com/user-attachments/assets/32ff627b-6f30-4195-bfe7-8c9b4aa11f6b"/>
</details>
The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters )
is to limit the subset of logs, which can be queried. For example, it is expected that all the queries
with `extra_filters={tenant=123}` can access only logs, which contain `123` value for the `tenant` field.
Previously this wasn't the case, since the provided extra filters weren't applied to subqueries.
For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg:
* | union({tenant!=123})
This commit fixes this by propagating extra filters to all the subqueries.
While at it, this commit also properly propagates [start, end] time range filter from HTTP querying APIs
into all the subqueries, since this is what most users expect. This behaviour can be overriden on per-subquery
basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also properly apply apply optimizations across all the subqueries. Previously the optimizations at Query.optimize()
function were applied only to the top-level query.
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
Previously, NewChild elements of querytracer could be referenced by concurrent
storageNode goroutines. After earlier return ( if search.skipSlowReplicas is set), it is
possible, that tracer objects could be still in-use by concurrent workers.
It may cause panics and data races. Most probable case is when parent tracer is finished, but children
still could write data to itself via Donef() method. It triggers read-write data race at trace
formatting.
This commit adds a new methods to the querytracer package, that allows to
create children not referenced by parent and add it to the parent later.
Orphaned child must be registered at the parent, when goroutine returns. It's done synchronously by the single caller via finishQueryTracer call.
If child didn't finished work and reference for it is used by concurrent goroutine, new child must be created instead with
context message.
It prevents panics and possible data races.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8114
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Follow-up for
4574958e2e
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit fixes incorrect behaviour when pressing `Enter` did not execute the query, and
`Shift+Enter` did not insert a new line.
- The issue occurred when autocomplete was disabled.
- This problem affected the query editor in both the VictoriaMetrics UI
and VictoriaLogs UI.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8058
When using Kafka, it's important to estimate the message rate and size
before applying for resources.
This commit explains why and how to use remote write metrics to
do the evaluation.
As reported by a user, the value in [VictoriaMetrics Single helm
chart](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-single/values.yaml#L420)
and actual example, should be "enabled", not "enable". This commit fixes
it.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This is done via 'options(concurrency=N)' prefix for the query.
For example, the following query is executed on at most 4 CPU cores:
options(concurrency=4) _time:1d | count_uniq(user_id)
This allows reducing RAM and CPU usage at the cost of longer query execution times,
since by default every query is executed in parallel on all the available CPU cores.
See https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also always initialize Query.timestamp with the timestamp from the lexer.
This should avoid potential problems with relative timestamps inside inner queries.
For example, the `_time:1h` filter in the following query is correctly executed
relative to the current timestamp:
foo:in(_time:1h | keep foo)
### Describe Your Changes
- fix formatting in Presets
- change integration guide config according to new format + optimisation
of prophet
- minor fixes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- provide prod ready config examples for vmanomaly
- minor formating fix
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Sync.Pool for readTrackingBody was added in order to reduce potential
load on garbage collector. But golang net/http standard library does not
allow to reuse request body, since it closes body asynchronously after
return. Related issue: https://github.com/golang/go/issues/51907
This commit removes sync.Pool in order to fix potential panic and data
race at requests processing.
Affected releases:
- all releases after v1.97.7
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8051
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Validation expects license key to be present without any addition whitespaces, so having a trailing whitespaces would lead to verification failure.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
docs/vmanomaly: release v1.19.1
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Updated plugins to the latest versions
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Despite requirement in OpenTelemetry spec that histograms should contain
sum, [OpenTelemetry collector promremotewrite
translator](37c8044abf/pkg/translator/prometheusremotewrite/helper.go (L222))
and [Prometheus OpenTelemetry
parsing](d52e689a20/storage/remote/otlptranslator/prometheusremotewrite/helper.go (L264))
skip only sum if it's absent. Our current implementation drops buckets
if sum is absent, which causes issues for users, that are expecting a
similar to Prometheus behaviour
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Updated `vmanomaly` docs to release v1.19.0. Also, slight improvements
on other pages, like FAQ
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This introduces the ability to copy-n-paste queries with $__interval and $__rate_interval
placeholders into VictoriaMetrics - these placeholders are automatically replaced with 1i,
which equals to the `step` query arg value passed to /api/v1/query_range and /api/v1/query.
### Describe Your Changes
optimize for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6226
for user who set `*AuthKey` flag, they will receive new response in
body:
```go
// query arg not set
The provided authKey '' doesn't match -search.resetCacheAuthKey
// incorrect query arg
The provided authKey '5dxd71hsz==' doesn't match -search.resetCacheAuthKey
```
previously, they receive:
```
The provided authKey doesn't match -search.resetCacheAuthKey
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Release procedure was updated on helm charts side, update documented
procedure with up-to-date steps for the release.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Using latest leads to inconsistent results between runs and might also
lead to data corruption since "latest" can be updated by any development
build.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Fixing column widht in tables for components`sections: reader, writer,
monitoring and scheduler
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Enable components to fail faster in case all verification attempts have failed. Currently, there will be a final sleep before returning an error.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Fixes#8019 and adds retention filters to victorialogs roadmap as an
enterprise feature
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
`doInternal` has adaptive staleness detection mechanism. It is
calculated using timestamp distance between samples in selected list of
samples. It is dynamic because VM can store signals from many sources
with different samples resolution. And while it works for most of cases,
there are edge cases for rollup functions that are comparing values
between windows: increase, increase_pure, delta.
The edge case 1.
There was a gap between series because of the missed scrape or two. In
this case staleness will trigger and increase-like functions will assume
the value they need to compare with is 0. In result, this could produce
spikes for a flappy data - see
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/894 This
problem was solved by introducing a `realPrevValue` field -
1f19c167a4.
It stores the closest real sample value on selected interval and is used
for comparison when samples start after the gap.
The edge case 2.
`realPrevValue` doesn't respect staleness interval. In result, even if
gap between samples is huge (hours), the increase-like functions will
not consider it as a new series that started from 0. See
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002.
Covering both edge cases is tricky, because `realPrevValue` has to
respect and not respect the staleness interval in the same time. In
other words, it should be able to ignore periodic missing gaps, but
reset if the gap is too big. While "too big gap" can't be figured out
empirically, I suggest using `-search.maxStalenessInterval` for this
purpose. If `-search.maxStalenessInterval` is set to 0 (default), then
`realPrevValue` ignores staleness interval. If
`-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it
as a staleness interval.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Updated email images with new support email
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Cutting new changelod doc reduces the size of the current's year
changelog and improves navigation for users.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Commit eef6943084 added new test
functions. Which checks various cases for metricName registration at
data ingestion.
Initial dataset size had 4 batches with 100 rows each. It works fine at
machines with 5GB+ memory.
But i386 architecture supports only 4GB of memory per process.
Due to given limitations, batch size should be reduced to 3 batches and
30 rows. It keeps the same
test funtionality, but reduces overall memory usage to ~3GB.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Samples parsing is a hot path. Bad client could easily overwhelm
receiver with bad or unsupported data. So it is better to throttle such
messages.
Follow-up after
b26a68641c
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Process values in batches instead of passing every value in the callback.
This improves performance of reading the encoded values from storage by up to 50%.
### Describe Your Changes
- fix table rendering on writer and scheduler pages
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
added search page required for docs site
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
New version has additional checks and reduced resource consumption, so
it doesn't timeout for our internal repos.
To make linter happy, I addressed "redefinition of the built-in
function" lint error.
----
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- url encoding / decoding with <urlencode:field> and <urldecode:field>
- base64 encoding / decoding with <base64encode:field> and <base64decode:field>
- hex encoding / decoding with <hexencode:field> and <hexdecode:field>
- hex encoding for integers with <hexnumencode:field> and <hexnumdecode:field>
This allows reducing the state of every statsProcessor by removing pointer to the corresponding statsFunc.
For example, this reduces statsCountProcessor size by 2x.
Previoysly finalizeStats() for some functions such as count_uniq() could run for long periods
of time after the query is canceled, since stopCh wan't propagated to finalizeStats().
Prevsiously integer values were converted to strings before being passed to `updateState()` function at `count_uniq`
and `count_uniq_hash`. Later such values are converted back to integers in order to track them via integer map of unique values.
This commit avoids the int -> string -> int conversion. Instead, it passes integers directly to the integer map of unique values.
This improves performance of `count_uniq` and `count_uniq_hash` functions even further.
This filter can be used when debugging and exploring logs in order to understand better
which value types are used for storing the particular log fields.
The `value_type` filter complements `block_stats` pipe.
Add exclude google.golang.org/grpc/stats/opentelemetry v0.0.0-20240907200651-3ffb98b2c93a to go.mod
according to https://github.com/googleapis/google-cloud-go/issues/11283#issuecomment-2558515586 .
This fixes the following strange issue on `make vendor-update`:
cloud.google.com/go/storage imports
google.golang.org/grpc/stats/opentelemetry: ambiguous import: found package google.golang.org/grpc/stats/opentelemetry in multiple modules:
google.golang.org/grpc v1.69.0 (/go/pkg/mod/google.golang.org/grpc@v1.69.0/stats/opentelemetry)
google.golang.org/grpc/stats/opentelemetry v0.0.0-20240907200651-3ffb98b2c93a (/go/pkg/mod/google.golang.org/grpc/stats/opentelemetry@v0.0.0-20240907200651-3ffb98b2c93a)
- Pass the calculated results to the next pipe in float64 columns.
Previously the results were converted to string columns. This could slow down further calculations.
- Use custom optimized logic for processing numeric columns, which are passed to math pipe.
Previously all the input columns were converted to string and then converted to float64
before math pipe calculations.
- Initialize the newly added columns at blockResult as soon as they are added.
This improves performance when big number of columns are calculated by math pipe.
Previously integer values were tracked in string maps. Now every input value is parsed as integer.
On success the parsed integer is tracked via specialized maps, which hold only integers.
This reduces CPU usage and memory usage in general case.
Use the column name attached to the corresponding part. The lifetime of this column name exceed the blockSearch lifetime,
so it is safe using it here.
This is a follow-up for 8d968acd0a
Previously columns with negative int64 values were stored either as float64 or string
depending on whether the negative int64 values are bigger or smaller than -2^53.
If the integer values are smaller than -2^53, then they are stored as string, since float64 cannot
hold such values without precision loss. Now such values are stored as int64.
This should improve compression ratio and query performance over columns with negative int64 values.
Previously field values could be automatically converted to float64 with precision loss.
This could lead to unexpected results when querying such field values.
For example, "10007199254740992" was incorrectly represented as 10007199254740993.
This commit prevents from such lossy conversions when storing field values.
While at it, prevent from int64 overflow at tryParseBytes and tryParseDuration functions,
which are used for parsing constants in queries for byte sizes and durations.
Now these functions return 1<<63-1 (the maximum int64 value) for constants exceeding
this value. Previously they could return arbitrary garbage for such constants.
1. Do not copy every line from LineReader.buf to LineReader.Line - just refer the line at LineReader.buf.
2. Do not copy the next found line to the beginning of LineReader.buf - just track the next line start index with LineReader.bufOffset.
This reduces memory copying when many lines are read into LineReader.buf by a single read() syscall.
When vmselect process a rollup function it fetches all the raw samples
on requested `start-end` interval of the query. It then loops through
the raw samples, picks the range of the samples based on provided `step`
interval and invokes a rollup function for each of the picked ranges of
samples.
During this processing, vmselect always populates the `realPrevValue`
field with the closest previous raw sample value before the picked range
of samples. This `realPrevValue` is used by rollup functions like
increase_pure or delta to decide whether the counter change happened or
not. For example, we get the counter value == 1. If we've seen this
counter before and its value was also 1 - then no change happened. If we
didn't see it before, then this counter should have started with value=0
and we need to account for `1-0=1` change. All this is required to deal
with situations when scrapes are missing or `step` is too small.
However, vmselect doesn't check how "old" is the `realPrevValue`. In
other words, it doesn't respect the staleness interval when picking it.
In result, depending on the `start` and `end` params, vmselect can use
`realPrevValue` which is a couple of hours old and is unlikely to be a
temporary scrape fail. In result, some increases can be incorrectly
ingnored by vmselect.
This change makes sure that vmselect doesn't populate `realPrevValue`
with samples that are older than staleness interval.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
-------------------
To reproduce, create a dataset with one metric `foo` which has samples
with value=1 on interval of couple of hours and resolution 15s, and a
gap for an hour in the middle:
<img width="769" alt="image"
src="https://github.com/user-attachments/assets/a39b2740-b741-45f8-ad18-093b7c57c3b3"
/>
Then run `increase(foo[1m])` expression on this time range (disable
cache):
<img width="1472" alt="image"
src="https://github.com/user-attachments/assets/463cece1-f359-4c75-a96c-60092a31cab2"
/>
In result, there will be one increase on the beginning of the series.
And no increase after the gap. Then change the time range so it starts
in the middle of the gap:
<img width="1505" alt="image"
src="https://github.com/user-attachments/assets/f4a460c3-9fd1-4ec7-ab47-15e716ec1019"
/>
Now, there is an increase>0 because the `realPrevValue` wasn't
populated. This is wrong, because it hides the increase of the series.
With the fix, the original increase query on full time range should show
2 increases:
<img width="1492" alt="image"
src="https://github.com/user-attachments/assets/aa9d8a6b-7b22-41f6-9eb9-83b3113a6982"
/>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This chapter is needed for referring from Github issues when CPU or memory profiles are needed to be collected
in order to investigate issues with high CPU and/or RAM usage at VictoriaLogs.
Since
44b071296d
`evalNumber` function no longer updating MetricName tenancy information.
This leads to mismatch in metric names between the query result and
evaluated number for all tenants other than 0:0.
For example, query `count(up) or 0` will return different results for
tenants 0:0 and 1:1 (assuming up is present for both tenants):
- tenant 0:0 - will only contain result of `count(up)`
- tenant 1:1 - will return both `count(up)` and `0` since metric names
will not be matched
This restores setting of tenancy information for metric name for
single-tenant queries.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7987
---
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Hint allows to choose type of cache to be used for index search:
- in-memory parts are storing recently ingested samples and should use
main cache. This improves ingestion speed and cache hit ration for
queries accessing recently ingested samples.
- merges of file parts is performed in background, using a separate
cache allows avoiding pollution of the main cache with irrelevant
entries.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This commit makes configurable interval for checking if final dedup
process for the historical data should be started. It allows to spread
resource utilisation for multiple vmstorage/vmsingle instances in time.
Since final dedup may add additional preasure on disk, backup systems
and make cluster less stable. Storage unconditionally adds 25% jitter to
the provided value, it should simplify configuration management at
Kubernetes ecosystem. Because Kubernetes application pods must have the
same configuration.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7880
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
It reduces memory usage during tests execution. It makes tests execution more reliable.
Since it sometimes crashes with OOM at small github runners.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
recently new datadog extension was released, where custom endpoint
configuration was added
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
commitL c7fc0d0d2f enabled skipping alerts
in case there is no labels present for an alert. This made clause which
was adding a comma for the JSON list incorrect as it is not possible to
determine if the next alert will be skipped or not.
This fix renders all alert labels in advance allowing properly format
JSON payload for Alertmanager notification.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7985
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
fix function name in comment
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: cuiweiyuan <cuiweiyuan@aliyun.com>
### Describe Your Changes
Fixes error in `vmauth` when discovering ipv6 addresses.
`vmauth` attempts to [slice till
`:`](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/auth_config.go#L397)
in the discovered addresses without accounting for ipv6. This causes it
to fail in ipv6 only environments.
```sh
$ nslookup vmselect.ns.svc.cluster.local
...
Name: vmselect.ns.svc.cluster.local
Address: 2600:dead:beef:dead:beef::8
```
```sh
$ kubectl logs -f vmauth
...
error: dial tcp: lookup 2600: no such host
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
This pull request adds support for autocomplete in LogsQL queries. The
new feature provides suggestions for field names, field values, and pipe
names as you type.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
added links to badges and made them clickable at
docs.victoriametrics.com
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The point of the new section is to highlight publicly available
playgrounds for users. All of them were mentioned in other parts of the
documentation, but we didn't have all of them in one place before.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
clarify extra resource is needed when downsampling with filter(s) or
retention filter(s) is applied
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Binary operations like `exprFirst op exprSecond` in VictoriaMetrics are
performed in the following way:
1. Execute exprFirst.
2. Extract **common label filters** from the result of step 1.
3. Apply these common label filters to `exprSecond` and execute it, in
order to retrieve less time series from vmstorage nodes.
In step 2, only labels with less than `100` (hard-coded) value could be
used as **common label filter** (e.g. `{common_lb=~"v1|v2|...|v100"}`.
In our scenarios, a label, take `instance` label as an example, could
has thousands of candidate values. Regarding bring more pressure to
vmstorage node, it's still beneficial if labels with more than 100
values could be used as filter in `exprSecond`, with enough vmstorage
resources. After adjusting the value from `100` to `10000`, our query
round-trip time drops significantly from 5s to 2s.
This pull request change the hard-coded value into a configurable flag.
storageNode sorting should be BUGFIX, since previously vminsert performed sort and this behaviour was changed.
Also this change only affects OSS version
### Describe Your Changes
Parse cache is a pretty simple implementation of cache. It's just a
standard map with mutex.
Map with mutex overall has poor performance, plus when the cache
overflow occurs, the whole cache locks until 1k elements have been
deleted (now it's 10% of 10000 max elements in the cache). To avoid this
bottleneck and improve performance of cache on systems with many CPU
cores but keep it rather simple, we can implement cache with per bucket
locks like it's done in fastcache. The logic and API remain the same. So
now each bucket will have a map with approximately 78 elements (with 128
buckets), and overflow will occur now for each bucket, and only 7
elements need to be deleted.
Because exec_test.go has about 10k lines of code, it's better to move
the cache into a separate file to add tests and benchmarks for it,
because now it does not have them.
```
goos: windows
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql
cpu: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
Current cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8 1932 618372 ns/op 253 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-8 6547 211527 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-8 1873 621718 ns/op 261 B/op 0 allocs/op
BenchmarkCachePutOverflow-8 2262 464328 ns/op 32 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-8 1764 655866 ns/op 38 B/op 0 allocs/op
New cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8 10408 111412 ns/op 0 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-8 22407 52809 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-8 6583 168088 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutOverflow-8 9822 117212 ns/op 2 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-8 6481 175952 ns/op 3 B/op 0 allocs/op
Current cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16 2331 475307 ns/op 218 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-16 6069 196905 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-16 1870 644236 ns/op 262 B/op 0 allocs/op
BenchmarkCachePutOverflow-16 2296 509279 ns/op 34 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-16 1726 671510 ns/op 45 B/op 0 allocs/op
New cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16 13549 82413 ns/op 0 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-16 30274 38997 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-16 8512 126239 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutOverflow-16 13884 88124 ns/op 1 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-16 7903 131299 ns/op 3 B/op 0 allocs/op
```
From the benchmarks above, we can see that the new implementation is ~5
times faster than the old one.
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
In order for third-party tooling to identify the source repository of
VictoriaMetrics, add the org.opencontainers.image label to the
Dockerfiles. This enables a whole suite of tools that scan container
images to further correlate data with the source code.
The lack of these annotations can be identified using docker:
```shell
docker pull victoriametrics/victoria-metrics
docker inspect victoriametrics/victoria-metrics
```
```jsonc
// ...
"Labels": null
// ...
```
If we try an image that has the annotations, we'll see more output.
```shell
docker pull traefik
docker image inspect traefik
```
```jsonc
// ...
"Labels": {
"org.opencontainers.image.description": "A modern reverse-proxy",
"org.opencontainers.image.documentation": "https://docs.traefik.io",
"org.opencontainers.image.source": "https://github.com/traefik/traefik",
"org.opencontainers.image.title": "Traefik",
"org.opencontainers.image.url": "https://traefik.io",
"org.opencontainers.image.vendor": "Traefik Labs",
"org.opencontainers.image.version": "v3.2.3"
}
// ...
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
consistently use `vmagent_remotewrite_pending_data_bytes` on vmagent dashboard to represent persistent queue size.
`vmagent_remotewrite_pending_data_bytes =
vm_persistentqueue_bytes_pending + pendingInmemoryBytes`
According to panel description, `vmagent_remotewrite_pending_data_bytes`
is more accurate.
>Persistent queue size shows size of pending samples in bytes which
hasn't been flushed to remote storage yet.
And we already use `vmagent_remotewrite_pending_data_bytes` in other two
panels.
44d2205136/dashboards/vmagent.json (L7132)
- removed absolute paths to run without docker
- set cspell to default entrypoint value
- set cspell config path instead of cspell.json copying and removal
Previously, since labels slice is reused for both `ALERTS` and
`ALERTS_FOR_STATE`, metrics might have incorrect labels and affect the
restore process. Tested the fix under `TestAlertingRule_Exec:
"for-pending=>empty"`.
The bug is introduced in
282f13cf11.
Affected versions: v1.106.1, v1.107...v1.108.x
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7796
At Enterprise version of the vmalert, `group` supports `tenant` field.
`tenant` field value must be added to the `datasource` as a part of the URL path prefix.
But VictoriaLogs can obtain tenant information only from `headers` and defined `tenant` breaks requests to the `VictoriaLogs` datasource.
This commit properly checks `datasourceType` and skips adding path prefix if `datasourceType` is `vlogs`.
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
previously vmstorage ignored limit values from vmselect component.
This behavior is prohibited starting from v1.105.0, with
85f60237e2.
This breaks the original intent of the -search.maxUniqueTimeseries command-line flag, which has been added at vmselect nodes in the commit b843f0e : to be able to override the default limit at vmstorage on the number of unique time series, at different subsets of vmselect nodes.
The behavior should be the following:
* If -search.maxUniqueTimeseries command-line flag isn't set at both vmselect and vmstorage nodes, then the limit on the number of unique time series must be automatically detected at vmstorage nodes according to
* vmstorage: automatically adjust -search.maxUniqueTimeseries max value . This simplifies configuration of VictoriaMetrics cluster for the typical case.
* If -search.maxUniqueTimeseries command-line flag is explicitly set at vmstorage node, then it must be used as the limit on the number of unique time series, without automatic detection of the limit. Explicitly set limit at vmstorage node cannot be exceeded by the limit from vmselect nodes.
* If the -search.maxUniqueTimeseries command-line flag is explicitly set at vmselect node, then it must override the automatically detected limit at vmstorage node. For example, if vmselect node provides the limit, which exceeds the automatically detected limit at vmstorage node, then the limit from the vmselect node must be applied during query execution at vmstorage node. This will allow properly executing queries from the subset of vmselect nodes for reporting queries described above.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7852
### Describe Your Changes
Previously, vmctl expect that tag must exist for each measurement, but
it's actually not necessary.
f16a58f14c/app/vmctl/influx/influx.go (L183-L186)
This pull request fix it by removing the check. For influx series
`measurement1_value1{}`, it will be represented as:
```go
Series{
Measurement: "measurement1",
Field: "value1",
LabelPairs: []LabelPair{},
EmptyTags: []string{},
}
```
and searched by the following query:
```sql
select "value1" from "measurement1"
```
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7921
Commit 71bb9fc0d0 introduced a regression.
If labels are empty and relabeling is not configured, influx ingestion hanlder
performed an earlier exit due to TryPrepareLabels call.
Due micro-optimisations for this procotol, this check was not valid.
Since it didn't take in account metircName, which added later and skip metrics line.
This commit removes `TryPrepareLabel` function call from this path and inline it instead.
It properly track empty labels path.
Adds initial tests implementation for data ingestion protocols.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7933
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Fixed typo in contributing.md (enterpriZe -> enterpriSe in the label
name)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Currently if multiple msgFields are present in a log row it's not
obvious which field is selected as a _msg field. With this PR and order
of msgfield values defined either via headers or query arg params
defines a priority of these values
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7761
### Describe Your Changes
- datadog /api/v2/logs api supports message field in json format, which
is not documented and is used by serverless extension. This PR allows
message field to be both string and object type. Also added support of
not documented timestamp field
- added `-datadog.streamFields` and `-datadog.ignoreFields` flags to
configure default stream fields for datadog logs, where there's no
alternative option to pass extra headers and query args
- added ingest `max` and `min` values of data, which are ingested using
`datadogsketches` API, which is also actively used by serverless
extensions
- use default `.` separator instead of `_` for sketches metric names
until metrics are not sanitized
This should prevent from excess usage of CPU, RAM and other resources when too many logs
are passed to 'stream_context' pipe.
It is expected that 'stream_context' pipe results are investigated by humans, who cannot inspect
surrounding logs for millions of initial logs. That's why it is OK to limit the number of logs
and/or log streams, which can be passed to 'stream_context' pipe.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7766
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7903
While at it, reduce memory allocations at Storage.getFieldValuesNoHits and make it more scalable on multi-CPU systems.
This improves performance of in(<query>) filter when the <query> returns big number of values.
Use chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64Kb size.
This improves performance for `stats` and `top` pipes by up to 2x when they are applied to big number of `by (...)` groups.
Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions,
so they are executed faster on hosts with many CPU cores when applied to fields with big number
of unique values.
The commit 4599429f51 improperly set br.cs to nil,
while it should set br.bs to nil instead. This resulted in excess memory allocations
at br.csInit() and br.csInitFast().
### Describe Your Changes
added deprecated form and available from popups in vmanomaly docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- Adds Headers to FAQ questions in vmalert for Victorialogs
- Adds FAQ for multitenant recording rules described in #7656
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Haley Wang <haley@victoriametrics.com>
Historically some of VictoriaMetrics components were optimized for the low rate of memory allocations.
These are: vmagent, single-node VictoriaMetrics and vmstorage. These components benefit from the low
GOGC value, since this allow reducing their memory usage in steady state on typical workloads.
Other VictoriaMetrics components aren't optimized for the reduced rate of memory allocations.
This results in the increased CPU usage spent on garbage collection (GC) in these components,
since it must be triggered at higher rate. See https://tip.golang.org/doc/gc-guide#GOGC for details.
These components do not use too much memory, so it is OK increasing the GOGC for these components
from 30 to 100 - this won't affect the most users.
Keep GOGC to 30 only for vmagent, single-node VictoriaMetrics and vmstorage components.
See 077193d87c and 54b9e1d3cb .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7902
Numeric fields can be stored as const values in the block of logs. In this case the `sort` pipe
was incorrectly comparing such values as strings instead of numbers. This results in incorrect
sort results. For example, 123 was smaller than 2. Fix this by removing the incorrect case
for comparing const fields.
While at it, replace lessString() with strings.LessNatural() in the sortBlockLess.
This improves sorting performance a bit, since the sortBlockLess function already tried
comparing numeric values, and it doesn't need to spend CPU time on such a comparison again inside lessString() call.
The commit 42c9183281 wasn't correct by replacing strings.LessNatural() with lessString()
inside the sortBlockLess() function.
* simplify wording
* update styles
* remove extra info about go application details. The details are likely
not needed and we didn't have details for rolling-dice app anyway. So
keep it simple for consstency and brevity.
* update navigation for simplicity sake
* fix typos
follow-up after
40b47601d1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
vmauth started to use request.Host after commit
f4776fec1b for`src_hosts` routing rules.
This commit adds http.Request.Host to the debugInfo output in order to
be consistent with routing logic.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
- added VictoriaLogs to OpenTelemetry guide
- updated deprecated dependencies
- added deltatocumulative processor to example and deltatemporality
selector to one of examples to use for counters by default
- added exponential histograms to example
---
Signed-off-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Added `available_from` popup into documentation of vmanomaly
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
While at at, allow passing an array of string values per each JSON entry at extra_filters and extra_stream_filters.
For example, `extra_filters={"foo":["bar","baz"]}` is converted into `foo:in("bar", "baz")` extra filter,
while `extra_stream_fitlers={"foo":["bar","baz"]}` is converted into `{foo=~"bar|baz"}` extra filter.
This should simplify creating faceted search when multiple values per a single log field must be selected.
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7365#issuecomment-2447964259
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5542
### Describe Your Changes
-Deprecate Overview page in Anomaly Detection docs.
- Adding service description to `README.md`
- Moving Licensing information to Quickstart page
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Such log fields do not give any useful information during logs' exploration.
They just clutter the output of the `facets` pipe. So it is better to drop such fields by default.
If these fields are needed, then `keep_const_fields` option can be added to `facets` pipe.
### Describe Your Changes
fixes#7804 by adding alert for missing uptime metric in vmanomaly
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Fix a typo and simplify the statement
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Update docs to reflect the changes introduced in #7767 to fix#5796
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Updated docker.compose.yml, set remotewrite.url to port 8429 so it would
correspond to documentation
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- Remove Loki sink, since it brings more troubles when users try using it in Vector.
For example, it encodes all the log fields as a JSON string and puts it into "message" field.
This results in storing the "message" field with the JSON string containing all the log fields
in VictoriaLogs. This is not what expected - every log field must be stored as a separate field
according to https://docs.victoriametrics.com/victorialogs/keyconcepts/
- Remove 'mode: bulk' option from Elasticsearch sink configuration, since this option is set by default to this value,
so there is no need in explicit setting.
- Add 'compression: gzip' to all the config examples, since the compression reduces the used network bandwidth by 4-5 times,
while it doesn't increase CPU usage too much at both Vector and VictoriaLogs sides. So it is better to enable the compression in config examples.
- Mention about HTTP parameters accepted by VictoriaLogs data ingestion APIs in both examples for Elasticsearch and JSON line protocols.
### Describe Your Changes
This PR fixes#5796. See the points 6 and 7 in `Steps to reproduce`:
> Now let's set time to only 5ms past the timestamp of the first point,
since even 199ms worked for the second point. Surprise, the point isn't
returned 💥:
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456705' -d
'step=1ms' | grep 10 # nothing!```
>
> But, 4ms works: 🤨🤔
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456704' -d
'step=1ms' | grep 10 # found```
This happens so because the actual step becomes 5ms due to jitter being
applied. THe fix is to do not apply jitter if scrape interval was not
detected (the case when vmstorage returns only one result). In this case
the scrape interval is set to `5m+step`.
An integration test has been added to check the steps to reproduce and
then to confirm that fix works. Note that the cluster tests are
currently disabled because the fix is not in cluster branch yet.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously cluster with the following vmselect configuration:
./bin/vmselect
-storageNode=gr1/:8211,gr1/:8212
-storageNode=gr2/:8213,gr2/:8214
-search.skipSlowReplicas=true
-globalReplicationFactor=2
Here we have two vmstorage groups and -globalReplicationFactor=2, which effectively means that "every ingested sample is replicated across multiple vmstorage groups". Hence, gr1 and gr2 contain identical data set. And when we set -search.skipSlowReplicas=true it is expected vmselect should return result as soon as at least one storage group returned the full result.
In current state, -search.skipSlowReplicas is ignored on the storage group level. It is only respected within the group (with -replicationFactor flag).
This commit fixes global replication for skipSlowReplicas.
To ensure that the fix works and does not break
anything replication tests have been added. For checking the fix for
skipping slow replicas see `testGroupSkipSlowReplicas()`.
To emulate storage groups, the integration test creates a cluster with
multilevel vminsert. The L1 inserts are group-level inserts, each writes
to its own group of vmstorages. The L2 vminsert is a global vminsert
that writes replicated to the L1 vminserts.
To enable multilevel inserts changes in apptest framework and
`lib/ingestserver/clusternative/server.go` were necessary.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6924
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously service labels won't be attached when `role: tasks` is set.
Because the `addServicesLabels` function is shared by `role: tasks` and
`role: services`, and it will return nothing when `vip.Addr` is invalid
or empty.
In Prometheus, even if `vip.Addr` is empty, it attach common service
labels with [a standalone
function](f10c3454e9/discovery/moby/services.go (L129)),
which offers:
- `__meta_dockerswarm_service_id`: the id of the service.
- `__meta_dockerswarm_service_name`: the name of the service.
- `__meta_dockerswarm_service_mode`: the mode of the service.
- `__meta_dockerswarm_service_label_<labelname>`: each label of the
service, with any unsupported characters converted to an underscore.
This PR add a `addServicesLabelsForTask`, to replace the usage of
`addServicesLabels` when `role: tasks` is set. This function offers
common service labels listed above.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7800
Previously after configuration reload call `externalURL` templaing function defined at external templates could be lost. Since it was added only at initial `Load` call and never copied during template reload process.
External templates for vmalert could be defined via `-rule.templates` flag.
This commit properly reload external templates. It's no longer copies mutated templates and instead fully reloads it each time if there is any changes.
Previously, during de-duplication staleness markers could be removed due to incorrect logic at
values equality check.
During the evaluation of read query vmselect deduplicates samples using dedupInterval option. It picks the highest value across all points with the same timestamp next to the border of dedupInterval. The issue is any comparison with NaN via <, > returns false. This means that the position of NaN in srcValues could affect the result.
This commit changes this logic with additional step, that explicitly checks for staleness marker for the following cases:
1. Deduplication on vmselect
2. Deduplication in vmstorage during merges
3. Deduplication in stream aggregation
check performed only for stale markers, because other NaNs are rejected on ingestion
by vmstorage or by stream aggregation.
Checking for stale markers in general slows down dedup speed by 3%:
```
benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: Apple M4 Pro
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
DeduplicateSamples/minScrapeInterval=1s-14 462.8n ± ∞ ¹ 425.2n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamples/minScrapeInterval=2s-14 905.6n ± ∞ ¹ 903.3n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamples/minScrapeInterval=5s-14 710.0n ± ∞ ¹ 698.9n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamples/minScrapeInterval=10s-14 632.7n ± ∞ ¹ 638.5n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamplesDuringMerge/minScrapeInterval=1s-14 439.7n ± ∞ ¹ 409.9n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamplesDuringMerge/minScrapeInterval=2s-14 908.9n ± ∞ ¹ 882.2n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamplesDuringMerge/minScrapeInterval=5s-14 721.2n ± ∞ ¹ 684.7n ± ∞ ¹ ~ (p=1.000 n=1) ²
DeduplicateSamplesDuringMerge/minScrapeInterval=10s-14 659.1n ± ∞ ¹ 630.6n ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 659.5n 636.0n -3.56%
```
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7674
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.29.0 to 0.31.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4f1988a35"><code>b4f1988</code></a>
ssh: make the public key cache a 1-entry FIFO cache</li>
<li><a
href="7042ebcbe0"><code>7042ebc</code></a>
openpgp/clearsign: just use rand.Reader in tests</li>
<li><a
href="3e90321ac7"><code>3e90321</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="8c4e668694"><code>8c4e668</code></a>
x509roots/fallback: update bundle</li>
<li>See full diff in <a
href="https://github.com/golang/crypto/compare/v0.29.0...v0.31.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Changed enabled limit condition to `or` instead of `and`. Since labels must checked if at least one of the limits is defined.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Previously, time series with labels exceeding the configured limits were truncated and written to storage, potentially causing data inconsistency. This could lead to collisions between time series and make it difficult to identify the source due to truncated labels.
This commit changes the behavior:
* Such time series are now rejected outright.
* Rejected time series are logged to stdout, and corresponding counters are incremented.
* removes `vm_too_long_label_values_total`, `vm_too_long_label_names_total`, `vm_metrics_with_dropped_labels_total` metrics.
* adds new values `[too_many_labels,too_long_label_name,too_long_label_value]` to `reason` label of the `vm_rows_ignored_total` metric name
related issues:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6928
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7661
This commit aligns behaviour of docker service discovery with Prometheus implementation.
It adds the following changes:
* introduce new config param `match_first_network` with default value of `true`. It uses the first network if the container has multiple networks
defined. It should help to avoid collecting duplicate targets error with multi network setups.
* add `networks` for the containers with linked network to the other containers with `network_mode: container:id` setting. It resolve an issue with attached containers aka `pods` in Kubernetes.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7398
Previous commit b09272ccac added regression, which could lead to the template
global state overwrites.
The issue related to the mechanism how `vmalert` inherits templates. It has global templates, that could be changed via `rule.templates` flag. And local templates defined per labels/annotations for rules and groups.
During labels/annotations templating state could be changed via `define` syntax.
This commit restores previous behavior with `Clone` call for templates before templating labels/annotations.
Affected releases:
- 1.106.1
- v1.102.7
- v1.97.12
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6894
This commit adds ability to launch vmauth without configuration file.
Which is possible use case for operator based installations.
Operator provides global resource `VMAuth` and allows to create
`VMUser` objects for it. Eventually operator creates configuration for
`VMAuth` based on user defined selectors for `VMUser`.
Since there is no direct relations between
those objects. And any object could be created in on-demand by
Kubernetes users. It's required to be able to start `vmauth` with empty
auth config file.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6467
* mention that `debug` messages require -loggerLevel=INFO
* rm version requirement, as mentioned version is pretty old
and it is liklely everyone is using a newer version
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, if only `-dedup.minScrapeInterval` was set without
`downsampling.Period, function
getDownsamplingFilters returned empty result for
downsamplingPeriodFilters. Because it didn't take in
account globalDedup variable.
This commit adds fast path for this case and returns a single
downsampling filter with global interval value.
In addition, it adds the following changes:
* Removes global state modification at ParseDownsamplingPeriods
function. Which could lead to data races at vmselect
* simplifies logic of isDedupNeeded function. Since
donwsamplingPeriodsWithout filters is subset of
dowsamplingPeriodByFilters. There is no need for len check
* Improves tests by proper reset global state of downsampling
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7764
This function calculates the number of unique value hashes. This number is a good approximation
for the number of unique values. The `count_uniq_hash` function uses less memory and works faster
than `count_uniq` when applied to fields with big number of unique values.
* vmbackupmanager: increase min sleep time between scheduling cycles from 0 to 1s to avoid spammed logs.
* Update docs/changelog/CHANGELOG.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The panic may occur when the surrounding logs for some original log entry are empty.
This is possible when these logs were included into surrounding logs for the previous original log entry.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7762
The following patterns are detected:
- `<N>-<N>-<N>-<N>-<N>` is replaced with `<UUID>`.
- `<N>.<N>.<N>.<N>` is replaced with `<IP4>`.
- `<N>:<N>:<N>` is replaced with `<TIME>`. Optional fractional seconds after the time are treated as a part of `<TIME>`.
- `<N>-<N>-<N>` and `<N>/<N>/<N>` is replaced with `<DATE>`.
- `<N>-<N>-<N>T<N>:<N>:<N>` and `<N>-<N>-<N> <N>:<N>:<N>` is replaced with `<DATETIME>`. Optional timezone after the datetime is treated as a part of `<DATETIME>`.
This is useful for detecting patterns across log messages, which differ by various numeric fields,
with the following query:
_time:1h | collapse_nums | top 10 by (_msg)
The max_value_len query arg allows controlling the maximum length of values
per every log field. If the length is exceeded, then the log field is dropped
from the results, since it contains incomplete (misleading) set of most frequently seen field values.
It is impossible to count all the empty value per every seen field,
since they aren't counted for data blocks, which do not contain the given field.
So it is better ignoring empty values in order to reduce the level of confusion
when users see incorrect hits for empty per-field values.
It is very confusing to see incomplete set of values for fields, which contain a subset of short values,
while the rest of values are too long. It is better to ignore all the values in such fields.
It is also very confusing if the list of most frequently values has no an empty value.
So it is better counting hits for an empty value.
### Describe Your Changes
Many users are running k8s-stack in multiple kubernetes clusters and to
configure a proper routing in alertmanager it's required to support
`cluster` label in alerting rules. It's now implemented in helm-chart
hack scripts, but it's tricky part to define if cluster label should be
added or not, when functions has no `by` expression. Updated existing
alerts to provide later an ability to inject cluster label later
Also take into an account `storage.minFreeDiskSpaceBytes` in
`DiskRunsOutOfSpace` alerts
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
`stream_context` is implemented in the way, which needs scanning all the logs for the selected log streams.
The scan performance is usually fast, since the majority of blocks are skipped, since they do not contain
rows with the needed timestamps. But there was a pathological case with `stream_context before N`:
VictoriaLogs usually scans blocks in chronological order. That means that the `before` context logs are constantly
updated with the new logs. This requires reading the actual data for the requested log fields from disk.
The workaround is to split the process of obtaining stream context logs into two phases:
1. Select only timestamps for the stream context logs, whithout selecting other log fields.
This operation is usually much faster than reading the requested log fields.
2. Select stream context logs for the selected timestamps. This operation is usually fast,
since the requested number of context logs is usually not so big.
Performance testing for the new algorithm shows up to 30x speed improvement for `stream_context before N`
and up to 5x speed improvement for `stream_context after N` when applied to log stream with 50M logs.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7637
This endpoint returns the most frequent values per each field seen in the selected logs.
This endpoint is going to be used by VictoriaLogs web UI for faceted search.
Previously such expressions were improperly formatted, which could result
in incorrect calculations at vlogscli.
For example, 'x / (y / z)' was formatted as 'x / y / z',
while 'x - (y + z)' was formatted as 'x - y + z'.
`top` pipe can be confused with the `first` and `last` pipes, so add references to these pipes from the `top` pipe docs.
This should help users locating the needed pipes.
Requests processed by built-in HTTP server has the [origin
form](https://datatracker.ietf.org/doc/html/rfc7230#section-5.3) rather
than the absolute form.
So in[Request.URL](https://pkg.go.dev/net/http#Request), fields other than
Path and RawQuery will be empty.
> // For server requests, the URL is parsed from the URI
> // supplied on the Request-Line as stored in RequestURI. For
> // most requests, fields other than Path and RawQuery will be
> // empty. (See RFC 7230, Section 5.3)
Using `request.Host` field instead to match `src_hosts` fixes issue and allows to route requests properly.
An addition It allows user to route requests with customized `Host` header.
This fixes possible `index out of range` panic when -search.logImplicitConversion
or -search.disableImplicitConversion command-line flags are passed to vmselect
and it tries executing incorrect query with too small number of arguments passed
to rollup function.
`offset` and `limit` pipes cannot be applied individually per every step on the [start ... end] time range,
so they must be disallowed at /select/logsql/stats_query_range.
This is a follow-up for 534371031e
The `first N by (field)` pipe is a shorthand to `sort by (field) limit N`,
while the `last N by (field)` pipe is a shorthand to `sort by (field) desc limit N`.
While at it, add support for partitioning sort results by log groups and applying
individual limit per each group.
For example, the following query returns up to 3 logs per each host with the biggest value
for the `request_duration` field:
_time:5m | last 3 by (request_duration) partition by (host)
This query is equivalent to the following one:
_time:5m | sort by (request_duration) desc limit 3 partition by (host)
Automatically add the 'partition by (_time)` into `sort`, `first` and `last` pipes
used in the query to `/select/logsql/stats_query_range` API.
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7699
Previously streamFields were unconditionally added to log stream fields, even if they were listed in the ignoreFields.
Also do not add extraStreamFields to log stream fields if streamFields is non-nil, since this may confuse users.
This is a follow-up for 17b813ba28
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554
The storage isn't designed to work efficiently with logs containing too many log fields.
It is better to emit a warning to the user and ignore such logs instead of trying to store them.
This will allow fixing the issue by the user ASAP, and won't lead to excess resource usage
at VictoriaLogs side, such as RAM, CPU, disk IO and disk space.
While at it, ignore too long logs with the size exceeding the maximum block size during data ingestion.
This should prevent from possible issues when dealing with such long logs if they were stored in the storage.
Emit a warning in this case, so the user could identify and fix the issue ASAP.
This is a follow-up for 22e6385f56
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
Previously too long line in Elasticsearch bulk import protocol resulted in clsoing
the client stream and ignoring the rest of log messages in the stream.
Now only the too long message is ignored properly, while the rest of log messages
are read successfully.
This is a follow-up for 61e7c77ce25967269192ed2e201f67d8c48b972e
Previously vl_rows_ingested_total metric was tracked individually per each supported data ingestion protocols.
It is better from maintainability PoV tracking this metric consistently in a single place - at logMessageProcessor.AddRow() function
in the same way as vl_bytes_ingested_total metric is tracked.
This is a follow-up for 50bfa689c9
### Describe Your Changes
docs/vmanomaly: release v1.18.8
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* explicitly mention it is using HTTP protocol
* consistently use `victoriametrics_url` placeholder across the docs
* mention v2 influx format in docs
* consistently remove/add extra newlines for better formatting
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Add cluster replication tests. No group replication yet. Some necessary
enhancements to the apptest framework have been done as well. Also other
existing tests were revisitied to take advantage of new QueryOpts added
by @f41gh7 in #7635.
The tests verify the following scenarios:
1. Data is written to vmstorages multiple times
2. Vmselect deduplicates replicated data
3. Vmselect does not return partial result if it receives responses from
enough replicas
4. Vmselect does not wait for the rest from all replicas (skips slower
ones)
Something similar will be added for storage groups. These tests should
be used to prove that the fix for #6924 works and at the same time does
not break other aspects of replication.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
- docs/vmanomaly - release v1.18.7
- modified table markdown for proper rendering on vmanomaly component
pages
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
docs/vmanomaly - patch release v1.18.6 docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously Block columns wasn't properly limited by maxColumnsPerBlock.
And it was possible a case, when more columns per block added than
expected.
For example, if ingested log stream has many unuqie fields
and it's sum exceed maxColumnsPerBlock.
We only enforce fieldsPerBlock limit during row parsing, which limits
isn't enough to mitigate this issue. Also it
would be very expensive to apply maxColumnsPerBlock limit during
ingestion, since it requires to track all possible field tags
combinations.
This commit adds check for maxColumnsPerBlock limit during
MustInitFromRows function call. And it returns offset of the rows and
timestamps added to the block.
Function caller must create another block and ingest remaining rows
into it.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This metric tracks an approximate amounts of bytes processed when parsing the ingested logs.
The metric is exposed individually per every supported data ingestion protocol. The protocol name
is exposed via "type" label in order to be consistent with vl_rows_ingested_total metric.
Thanks to @tenmozes for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7682
While at it, remove the unneeded "format" label from vl_rows_ingested_total metric.
The "type" label must be enough for encoding the data ingestion format.
Comment broken tests for remote_read integration test.
Prometheus broke library compatibility and it's required to rewrite tests.
Also, test structure and format should be revisited and improved according to our test code style.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Removes global defaultAuthToken, since it's no longer needed.
It was added as fallback for 'remoteWrite.multitenantURL' feature.
This feature was deprecated at v1.102 version and removed.
Updates newRemoteWriteCtxs function, it shouldn't accept auth.Token no longer.
This was also a part of remove feature.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Previously, vmagent produced parsing error for 'multitenant' auth token
value for the cases:
* data ingestion with enableMultitentEndpoints
* data scrapping at promscrape
It's inconsistent to the other VictoriaMetrics components.
Since 'multitenant' is well-known token value for multitenancy via
labels. And vmagent is intended to be compatible with vminsert ingestion
endpoints.
This commit replaces NewToken with NewTokenPossibleMultitenant function
for token parsing. It allows to use multitenant value for it. And it
makes token values consistent for the all components.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7694
Previously ip_filters wasn't properly inited for this part of config.
It resulted to bypass requests for this section.
This commit properly inits `ip_filter`.
Previously, all requests rejected by `ip_filter` were silently aborted.
This commit adds new metrics:
* vmauth_user_ip_denies_total
* vmauth_global_ip_denies_total
* vmauth_unauthorized_user_ip_denies_total
It adds observability to this feature and allow to measure rejected requests.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6883
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Both vmalert and vmalert-tool support multiple `rule_files` and use
directory as a file, so it's ok if some files don't contain any rule
group. But vmalert-tool should warn the user if no rule group is found
in any of the `rule_files`.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7663
Previously, there was no option to replace value of `X-Forwarded-For`
HTTP Header. It was only possible to completely remove it. It's not good
solution, since backend may require this information. But using direct
value of this header is insecure. And requires complex knowledge of
infrastruce at backend side (see spoofing X-Forwarded-For articles).
This commit adds new flag, that replaces content of `X-Forwarded-For`
HTTP Header value with current `RemoteAddress` of client that send
request.
It should be used if `vmauth` is directly attached to the internet.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6883
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This commit allows vmauth to obtain client IP address from HTTP Headers.
Main scenario for it is vmauth located behind reverse-proxy.
It adds both global and per user configuration settings: -httpRealIPHeader and `real_ip_header` config option.
vmauth try to obtain IP from header if this setting is set. If header is not exists, vmauth fallbacks to `remoteAddress`.
Commit also updates incorrect benchmarks and align test package naming for ip_filters
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6883
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
- Fixes the handling of the `showLegend` flag.
- Fixes the handling of `alias`.
- Adds support for alias templates, allowing dynamic substitutions like
`{{label_name}}`.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7565
### Describe Your Changes
docs/vmanomaly - release 1.18.5 doc updates
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- **Memory Optimization**: Reduced memory consumption on the "Group" and
"JSON" tabs by approximately 30%.
- **Table Pagination**: Added pagination to the "Table" view with an
option to select the number of rows displayed (from 10 to 1000 items per
page, with a default of 1000). This change significantly reduced memory
usage by approximately 75%.
Related to #7185
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Fix wrong path to rules_file in vmalert-tool doc
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Viet Hung Nguyen <hvn@familug.org>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously, this filter did not apply to virtual
machine scale sets, causing all virtual machines to be discovered.
This commit conditionally adds `resource_group` filter for Azure service discovery on virtual
machine scale sets.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7630.
### Describe Your Changes
doing similar changes for both vmagent and vminsert (like one in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7399) ends up
with almost same implementations for each of packages instead of having
this shared code in one place. one of the reasons is the same Timeseries
and Labels structure from different prompb and prompbmarshal packages.
My proposal is to use structures from prompb package only to
marshal/unmarshal sent/received data, but for internal transformations
use only structures from prompbmarshal package
Another example, where it already can help to simplify code is streaming
aggregation pipeline for vmsingle (now it first marshals
prompb.Timeseries to storage.MetricRow and then if streaming aggregation
or deduplication is enabled it unmarshals all the series back but to
prompbmarshal.Timeseries)
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Additional info from the dump can be used to debug rotuing rules.
https://pkg.go.dev/net/http/httputil#DumpRequest
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
LTS versions are out of date, linking to old releases. This updates them
to the latest.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Previously, default dial timeout was used for kubernetes API server connection.
This commit changes it for custom dialer used by the all VictoriaMetrics components. It has lower connection timeout (30s by default).
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7127
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
* cross-link related doc chapters and guides about otel
* mention different URL format for cluster version of VM
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, clicking on Ingestion row could result in a visual blip.
Re-ordering panels within the row seems to fix it.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously multitenant cache was inited before flag.Parse call. It
didn't allow to change cache expiration value and default value was
always used.
This commit adds cache init at the first time cache was called.
Also this commit adds small cache improvements:
* chore for cleanup cache, it now uses common pattern for in-place items
filtering
* fail cache request fast if item is already expired
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This is a follow-up after 3120dc2
- Consistently use key for rollupCache in multitenant mode cache keys use different authTokens. Previously it could lead to panic in rare cases when cache state was inconsistent.
- Do not share `err` variable across goroutines for `processBlock` function. It could lead to data races.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: Andrei Baidarov <abaidarov@yandex.ru>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously, when the alert got resolved shortly before the vmalert
process shuts down, this could result in false alerts.
This change switches vmalert to use MetricsQL function during alerts state restore, which makes it
incompatible for state restoration with PromQL.
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
These are the integration tests that confirm that instant queries may
return stale NaNs when the query contains a rollup function.
The bug was reported at #5806. There is also a fix: #7275. The tests in
this PR will be used co confirm that the fix works.
Some test refactoring has been done along the way. Sorry, couldn't
resist.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
* mention stream aggregation
* rm statement that Prometheus can only pull data, which is not true anymore
* mention absence of backfilling limitations
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
As of right now by default aggregated output in streaming aggregation
takes a staleness interval and only starts sending first samples after
the staleness interval passes. We have a use case where we prefer to
start sending data as soon as we have any. This adds the option to
configure when we start sending first samples
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7116
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
1. **Add new `Raw Query` tab**
A new `Raw Query` tab has been added to the
[vmui](https://docs.victoriametrics.com/#vmui) interface for displaying
raw data. The tab uses the `/api/v1/export` API endpoint. Related issue:
[#7024](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7024)
2. **Fix rendering of isolated points on the graph**
Previously, isolated points (not connected to other points on the left
or right) were not visible on the graph. Now, they are rendered
correctly.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Previously, for `^` aka pow function calls, VictoriaMetrics returned `1`
if left arg was Nan. For example, given query=`(hour()==2)^1` returns 1
for NaN produced by hour() == 2 function. It added additional non-exist
datapoints to the timeseries.
This commit port bugfix from `metricql` package and adds test for it.
Now, VictoriaMetrics
correctly returns `NaN` for such cases.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7359
Signed-off-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Add test cases proving that it is possible to lose indexDB after
changing the retention period. See #7609
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This is a follow-up for 49fe403af1
Commit disables the verbosity in integration
tests after confirming that the tests run in both master and cluster
branches.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Updated the victorialogs data source version to the v0.8.0 release
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Add the ability to create a simple cluster configuration for tests that
do not verify the cluster-specific behavior but instead are focused on
the business logic tests, such as API surface or MetricsQL. For such
tests this cluster configuration will be enough in most cases.
Cluster-specific tests should continue creating custom configurations.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
fixes#7579 7579
Previously, dedup was added as a downsampling rule with 0s offset to all downsmapling rules with filters. That enforced a metric name lookup even in cases it is not needed.
For example, the following configuration: `-dedup.minScrapeInterval=10s -downsampling.period={__name__=~"node.*"}:1h:1m` would be parsed as: `{__name__=~"node.*"}:1h:1m {}:0s:10s`
This commit changes this logic and treats dedup as a separate case. This allows to perform metric name lookups only in cases when timestamp of current partition can be eligible to use some of downsampling filters. Newer parts will not trigger metric name lookup and will apply deduplication directly.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7440
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
After changes at commit 787b9cd. Minimal timestamps for extDB check was performed without context of the index search prefix.
It worked fine for Single node version, but for cluster version a different prefix was used for
metricID search requests. It may lead to incomplete results, if minimal missing timestamp was cached
for the tenant with different ingestion patterns.
Minimal reproducible case is:
- metrics were ingested for tenants 0 and 1
- at some point in time metrics ingestion for tenant 1 stopped
- index records have the following timestamps layout:
tenant 0: 1,2,3,4,5,6
tenant 1: 1,2,3,4
- after indexDB rotation, containsTimeRange lookups may produce
incorrect results:
time range request for tenant 1 - 5:6 caches 5 as min timestamp
request for the same or smaller time range for tenant 0 now returns
empty results.
Second case:
- requests for the tenant without metrics always updates atomic value with incorrect minimal time range for other tenants.
This commit replaces single atomic with map of search prefix keys. It should have slight performance overhead,
but work consistently for cluster version. minMissingTimestamp is cached by prefix search key, which included tenantID.
Since it will be only populated at runtime, it doesn't hold unused tenants for queries.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7417
This commit fixes panic for multitenant requests and empty storage node responses for tenants api.
It also optimizes `populateSqTenantTokensIfNeeded` function calls, by making it only once for query request. Previously it was incorrectly called multiple times per each storage node request.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
This commit makes vmauth respect the routing config for unauthorized
requests for requests that despite having Authorization header failed to
authorize successfully.
It covers the following use-cases:
- vmauth is used at load-balanacer and must forward requests as is. There is no any authorization configs.
- vmauth has authorization config, but it must forward requests with invalid credential tokens to some other backend.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7543
---------
Signed-off-by: Andrii <andriibeee@gmail.com>
### Describe Your Changes
`vmanomaly` docs update with patch release 1.18.3
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
1. Avoid storing the last evaluation results outside of rules, check for
stale time series as soon as possible;
2. remove duplicated template `Clone()`.
This pull request is primarily reducing memory usage when rules produce
large volumes of results, as seen in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6894.
The CPU time spent on garbage collection remains high and may be
addressed in a separate PR.
### Describe Your Changes
docs update for `vmanomaly` release 1.18.2
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Y-min set to 0 gives better understanding of changes, as it shows absolute change.
Otherwise, the panel will show relative change and could make a false impression
of the changes.
Other panels in dashboards are either instant (no historical data displayed),
or already set to Y-min: 0.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The following user-level options must be unconditionally inherited by url_map, since this is what most users expect:
- retry_status_codes
- load_balancing_policy
- drop_src_path_prefix_parts
- discover_backend_ips
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7519
### Describe Your Changes
when migrating from cluster to single, the native export API attach
`vm_account_id` and `vm_project_id` labels.
`--vm-native-disable-binary-protocol` is a workaround to remove them,
which is available since v1.93.0 but did not documented in vmctl doc.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4716
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
More key concept tests
- Verify how the time range points are calculated
- Vefify that a range query is equivalent to many instant queries
Fix docs accordingly.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
The fix at a0a154511a looks too complicated and fragile:
- It moves buMin initialization to the place, which is far from its usage.
- It embeds unclear logic on selecting the proper buMin if it is broken,
into unrelated loop.
The actual fix must be more clear:
$ git diff 95acca6b52 -- app/vmauth/
- if n := bu.concurrentRequests.Load(); n < minRequests {
+ if n := bu.concurrentRequests.Load(); n < minRequests || buMin.isBroken() {
This should simplify further maintenance of this code.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7489
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3061
### Describe Your Changes
doc updates for vmanomaly v1.18.1
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
…ides
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- The --delete option is needed to be passed to rsync during backups,
Since otherwise the backup may contain superfluous files after the second run of rsync,
because these files can be already removed at the source because of background merge
- the --delete option is needed when restoring from backup in order to remove superfluous files
from the destination directory. Otherwise these files may lead to inconsistent data at VictoriaLogs.
Move the repeated check for an empty value into statsUniqValuesProcessor.updateState() function.
This allow removing duplicate code for this check from statsUniqValuesProcessor.updateState() call sites.
* make test suite responisble for stopping apps
* reuse test suite fields to simplify function signatures
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e60cce54a8)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
By default the delay equals to 1 second.
While at it, document refresh_interval query arg at /select/logsql/tail endpoint.
Thanks to @Fusl for the idea and the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7428
This eliminates possible bugs related to forgotten Query.Optimize() calls.
This also allows removing optimize() function from pipe interface.
While at it, drop filterNoop inside filterAnd.
Update figures for the existing RELEX Oy case study.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Previously, vmauth could have pick `buMin` as least loaded backend
without checking its status. In result, vmauth could have respond to the
user with an error even if there were healthy backends. That could
happen if healthy backends already had non-zero amount of concurrent
requests executing at the moment of least-loaded backend choosing logic.
Steps to reproduce:
1. Setup vmauth with two backends: healthy and non-healthy
2. Execute a bunch of concurrent requests against vmauth (i.e. Grafana
dash reload)
3. Observe that some requests will fail with message that all backends
are unavailable
Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3061
---
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* rename vmsingle test to actually match the mask
* swap got/want arguments, as they were misplaced
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This PR continues the implementation of integration tests (#7199). It
adds the support for vm-single:
- A vmsingle app wrapper has been added
- Sample vmsingle tests that test the VM documentation related to
querying data (#7435)
- The tests use the go-cmp/{cmp,/cmpopts} packages, therefore they have
been added to ./vendor
- Minor refactoring: data objects have been moved to model.go
Advice on porting things to cluster branch:
- The build rule must include tests that start with TestVmsingle
(similarly to how TestCluster tests are skipped in master branch)
- The build rule must depend on `vmstorage vminsert vmselect` instead of
`victoria-metrics`
- The query_test.go can actually be implemented for cluster as well. To
do this the tests need to be renamed to start with TestCluster and the
tests must instantiace vm{storage,insert,select} instead of vmsingle.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
add sorting of logs by groups and within each group by time in desc
order. See #7184 and #7045
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously it incorrectly applied xFilesFactor, if it's value equal to 0.
This commit properly handles this case and returns result according to
the graphite documentation:
`xFilesFactor follows the same semantics as in Whisper storage schemas. Setting it to 0 (the default) means that only a single value in the series needs to be non-null for it to be considered non-empty, setting it to 1 means that all values in the series must be non-null. A setting of 0.5 means that at least half the values in the series must be non-null.`
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Evgeniy Negriy <einegriy@avito.ru>
Loki protocol supports optional `metadata` object for each ingested line. It's added as 3rd field at the (ts,msg,metadata) tuple. Previously, loki request json parsers rejected log line if tuple size != 2.
This commit allows optional tuple field. It parses it as json object and adds it as log metadata fields to the log message stream.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7431
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
User experience suggests that examples shouldn't have `-rule.defaultRuleType=vlogs` set,
as it may confuse users who run vmalert with their existing rules or only use
rules from examples for testing purposes.
This change is supposed to remove the confusion by removing `-rule.defaultRuleType=vlogs`
from default recommendations and explcitily specifying `type` on group level in examples.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix typos in rules definition. Otherwise, they can't pass validation
* add code types for rendered examples
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7301
When querying with condition like `WHERE a=1` (looking for series A),
InfluxDB can return data with the tag `a=1` (series A) and data with the
tag `a=1,b=1` (series B).
However, series B is will be queried later and it's data should not be
combined into series A's data.
This PR filter those series that are not identical to the original query
condition.
For table `example`:
```
// time host region value
// ---- ---- ------ -----
// 2024-10-25T02:12:13.469720983Z serverA us_west 0.64
// 2024-10-25T02:12:21.832755213Z serverA us_west 0.75
// 2024-10-25T02:12:32.351876479Z serverA 0.88
// 2024-10-25T02:12:37.766320484Z serverA 0.95
```
The query for series A (`example_value{host="serverA"}`) and result will
be:
```SQL
SELECT * FROM example WHERE host = "serverA"
```
```json
{
"results": [{
"statement_id": 0,
"series": [{
"name": "cpu",
"columns": ["time", "host", "region", "value"],
"values": [
["2024-10-25T02:12:13.469720983Z", "serverA", "us_west", 0.64],
["2024-10-25T02:12:21.832755213Z", "serverA", "us_west", 0.75],
["2024-10-25T02:12:32.351876479Z", "serverA", null, 0.88],
["2024-10-25T02:12:37.766320484Z", "serverA", null, 0.95]
]
}]
}]
}
```
We need to abandon `values[0]` and `values[1]` because the value of
**unwanted** column `region` is not null.
As for series B (`example_value{host="serverA", region="us_west"}`), no
change needed since the query filter out unwanted rows already.
### Note
This is a draft PR for verifying the fix.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This is a follow up for #7435. Images need to be updated too:
- The time is changed from 10 hrs to 08 hrs
- A missing data point is added to the range query image
- Source escalidraw has been updated as well
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Add puppetdb sd to changelog of `v1.106.0` version.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Adding a blog post that introduces VictoriaMetrics to third party
articles
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Smaine Kahlouch <smainklh@gmail.com>
### Describe Your Changes
Sync list of dashboards to be provided with Prometheus and
VictoriaMetrics' datasources.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This commit adds the following changes:
- Added support to push datadog logs with examples of how to ingest data
using Vector and Fluentbit
- Updated VictoriaLogs examples directory structure to have single
container image for victorialogs, agent (fluentbit, vector, etc) but
multiple configurations for different protocols
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6632
### Describe Your Changes
Christmas is early and you get the first present in the shape of
spelling fixes.
Sorry for the big amount :)
### Checklist
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This commit changes the following:
- The datetime has been fixed so it corresponds to the timestamps in example samples. The datetime now also include the UTC time zone and is changed to adhere ISO format.
- The data points in query range result have been fixed to match the inserted data.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
- remove reference to sparse cache as it was reverted in 9f9cc24e4c
- add reference to 1.102.6 and 1.97.11 LTS releases
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously vmgateway returned error for the requests with multitenant
tenant.
This commit allows to rate limit multitenant requests and apply global
rate limit for it.
Currently it supports only queries for rate limiting.
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7201
This commit also addresses gateway start-up crash if datasource.url is not accessible.
Previously vmgateway could crash at start-up with enabled rate limiting if datasource for metrics
was not avaiable for any reason. It seems, that crash is expected. But in fact it's not. For instance, datasource could be in restart phase.
Replaces crash with log message error. It increased availability of vmgateway component.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This commit adds `metric_relabel_configs` and `relabel_configs` fields
into the `global` section of scrape configuration file.
New fields are used as global relabeling rules for the scrape targets.
These relabel configs are prepended to the target relabel configs.
This feature is useful to:
* apply global rules to __meta labels from service discovery targets.
* drop noisy labels during scrapping.
* mutate labels without affecting metrics ingested via any of push
protocols.
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6966
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This commit fixes flaky test TestWriteRead/read/graphite/subquery-aggregation in app/victoria-metrics/main_test.go
The test fails when the test execution falls on the first second of a minute,
for example 6:59:00. In all other cases (such as 6:59:01) the test passes.
The test fails because of the way VictoriaMetrics implements sub-queries: it
aligns the time range to the step. The test config does not account for this.
Assuming that the implementation is correct, the fix is to adjust the test
config so that the data is inserted at intervals other than 1m.
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Updated the versions of the data sources to the latest releases
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Related issue: #7199
This is the initial version of the integration tests for cluster. See
`README.md` for details.
Currently cluster only, but it can also be used for vm-single if needed.
The code has been added to the apptest package that resides in the root
directory of the VM codebase. This is done to exclude the integration
tests from regular testing build targets because:
- Most of the test variants do not apply to integration testing (such as
pure or race).
- The integtation tests may also be slow because each test must wait for
2 seconds so vmstorage flushes pending content). It may be okay when
there are a few tests but when there is a 100 of them running tests will
require much more time which will affect the developer wait time and CI
workflows.
- Finally, the integration tests may be flaky especially short term.
An alternative approach would be placing apptest under app package and
exclude apptest from packages under test, but that is not trivial.
The integration tests rely on retrieving some application runtime info
from the application logs, namely the application's host:port. Therefore
some changes to lib/httpserver/httpserver.go were necessary, such as
reporting the effective host:port instead the one from the flag.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
(cherry picked from commit d7b3589dbd)
In this case the _msg field is set to the value specified in the -defaultMsgValue command-line flag.
This should simplify first-time migration to VictoriaLogs from other systems.
- Remove leading whitespace from the first lines in 'HTTP parameters' chapter.
This whitespace isn't needed for the markdown formatting.
- Add leading whitespace for the second sentence in the list bullet describing AccountID and ProjectID HTTP headers.
This fixes markdown formatting for this list bullet.
This msy be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get log message from the first non-empty field:
`message`, `event.data` and `some_field`.
If the number of output (bloom, values) shards is zero, then this may lead to panic
as shown at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7391 .
This panic may happen when parts with only constant fields with distinct values are merged into
output part with non-constant fields, which should be written to (bloom, values) shards.
### Describe Your Changes
"Single version" is unclear, since VM is also a single-executable. I
think "single-node" is clearer.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- made small typo fix in case studies
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
fix function name
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This allows reducing the amounts of data, which must be read during queries over logs with big number of fields (aka "wide events").
This, in turn, improves query performance when the data, which needs to be scanned during the query, doesn't fit OS page cache.
This improves performance of `field_values` pipe when it is applied to large number of data blocks.
This also improves performance of /select/logsql/field_values HTTP API.
### Describe Your Changes
docs/vmanomaly - release 1.18.0
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
It is possible for in-memory part to be empty if ingested samples are
removed by retention filters. In this case, data will not be discarded
due to retention before creating in memory part. After in-memory parts
merge samples will be removed resulting in creating completely empty
part at destination.
This commit checks for resulting part and skips it, if it's empty.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7334
available disk space should be
```
(vm_free_disk_space_bytes{job=~...} - vm_free_disk_space_limit_bytes{job=~...})
```
instead of
```
vm_free_disk_space_bytes{job=~...}
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixes issues with incorrect updating of query and limit fields, and
resolves the problem where the display tab resets.
Related issue: #7279 and #7290
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
- update to recent versions of components
- add information about the license key
- add example configuration for remote write with oAuth identity for
vmagent
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This commit adds Kubernetes Native Sidecar support.
It's the special type of init containers, that have restartPolicy == "Always" and continue to run after container initialization.
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7287
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182
- add a separate index cache for searches which might read through large
amounts of random entries. Primary use-case for this is retention and
downsampling filters, when applying filters background merge needs to
fetch large amount of random entries which pollutes an index cache.
Using different caches allows to reduce effect on memory usage and cache
efficiency of the main cache while still having high cache hit rate. A
separate cache size is 5% of allowed memory.
- reduce size of indexdb/dataBlocks cache in order to free memory for
new sparse cache. Reduced size by 5% and moved this to a separate cache.
- add a separate metricName search which does not cache metric names -
this is needed in order to allow disabling metric name caching when
applying downsampling/retention filters. Applying filters during
background merge accesses random entries, this fills up cache and does
not provide an actual improvement due to random access nature.
Merge performance and memory usage stats before and after the change:
- before

- after

---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This commit fixes the TestStorageRotateIndexDB flaky test reported at:
#6977. Sample test failure: https://pastebin.com/bTSs8HP1
The test fails because one goroutine adds items to the indexDB table
while another goroutine is closing that table. This may happen if
indexDB rotation happens twice during one Storage.add() operation:
- Storage.add() takes the current indexDB and adds index recods to it
- First index db rotation makes the current index DB a previous one
(still ok at this point)
- Second index db rotation removes the indexDB that was current two
rotations earlier. It does this by setting the mustDrop flag to true and
decrementing the ref counter. The ref counter reaches zero which cases
the underlying indexdb table to release its resources gracefully.
Graceful release assumes that the table is not written anymore. But
Storage.add() still adds items to it.
The solution is to increment the indexDB ref counters while it is used
inside add().
The unit test has been changed a little so that the test fails reliably.
The idea is to make add() function invocation to last much longer,
therefore the test inserts not just one record at a time but thouthands
of them.
To see the test fail, just replace the idbsLocked() func with:
```go
unc (s *Storage) idbsLocked2() (*indexDB, *indexDB, func()) {
return s.idbCurr.Load(), s.idbNext.Load(), func() {}
}
```
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Auto-adjust `-remoteWrite.concurrency` cmd-line flags with the number of
available CPU cores in the same way as vmagent does. With this change
the default behavior of vmalert in high-loaded installation should
become more resilient. This change also reduces
`-remoteWrite.flushInterval` from `5s` to `2s` to provide better data
freshness.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
- release 1.17.2 updates
- added sections on logging and CLI args to docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Changed highlight style for cmd flags
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
docs/vmanomaly: release 1.17.1
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixed the display of hits chart in VictoriaLogs.
See #7133
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
These caches aren't expected to grow big, so it is OK to use the most simplest cache based on sync.Map.
The benefit of this cache compared to workingsetcache is better scalability on systems with many CPU cores,
since it doesn't use mutexes at fast path.
An additional benefit is lower memory usage on average, since the size of in-memory cache equals
working set for the last 3 minutes.
The downside is that there is no upper bound for the cache size, so it may grow big during workload spikes.
But this is very unlikely for typical workloads.
Partition directories can be manually deleted and copied from another sources such as backups or other VitoriaLogs instances.
In this case the persisted cache becomes out of sync with partitions. This can result in missing index entries
during data ingestion or in incorrect results during querying. So it is better to do not persist caches.
This shouldn't hurt VictoriaLogs performance just after the restart too much, since its caches usually contain
small amounts of data, which can be quickly re-populated from the persisted data.
Unpack the full columnsHeader block instead of unpacking meta-information per each individual column
when the query, which selects all the columns, is executed. This improves performance when scanning
logs with big number of fields.
- Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems.
- Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.
### Describe Your Changes
When debugging unexpected query results, add reduce_mem_usage=1 param to
export query to preserve duplicates.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
docs/vmanomaly: release v1.17.0
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
After adding multitenant query feature at v1.104.0, searchQuery wasn't
properly unmarshalled at bottom vmselect in multi-level cluster setup.
It resulted into empty query responses.
This commit adds fallback to Unmarshal method of SearchQuery to fill
TenantTokens. It allows to properly execute search requests
at vmselect side.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7270
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
The purpose of this change is to reduce confusion between using
`flag.Duration` and `flagutils.Duration`. The reason is that
`flagutils.Duration` was mistakenly used for cases that required `m`
support. See
ab0d31a7b0
The change in name should clearly indicate the purpose of this data
type.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
`flagutil.Duration` docs state that `m` suffix stands for `minute`, but
in fact this suffix is not supported due to ambiguity with `month`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Alexander Frolov <winningpiece@gmail.com>
### Describe Your Changes
If a dict flag has only one value without a prefix it is supposed to
replace default value.
Previously, when flag was set to `-flag=2` and the default value in
`NewDictInt` was set to 1 the resulting value for any `flag.Get()` call
would be 1 which is not expected.
This commit updates default value for the flag in case there is only one
entry for flag and the entry is a number without a key.
This affects cluster version and specifically `replicationFactor` flag
usage with vmstorage [node
groups](https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect).
Previously, the following configuration would effectively be ignored:
```
/path/to/vmselect \
-replicationFactor=2 \
-storageNode=g1/host1,g1/host2,g1/host3 \
-storageNode=g2/host4,g2/host5,g2/host6 \
-storageNode=g3/host7,g3/host8,g3/host9
```
Changes from this PR will force default value for `replicationFactor`
flag to be set to `2` which is expected as the result of this
configuration.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
1. Verify if field in [fields
pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe)
exists. If not, it generates a metric with illegal float value "" for
prometheus metrics protocol.
2. check if multiple time range filters produce conflicted query time
range, for instance:
```
query: _time: 5m | stats count(),
start:2024-10-08T10:00:00.806Z,
end: 2024-10-08T12:00:00.806Z,
time: 2024-10-10T10:02:59.806Z
```
must give no result due to invalid final time range.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
It has been appeared that VictoriaLogs is frequently used for collecting logs with tens of fields.
For example, standard Kuberntes setup on top of Filebeat generates more than 20 fields per each log.
Such logs are also known as "wide events".
The previous storage format was optimized for logs with a few fields. When at least a single field
was referenced in the query, then the all the meta-information about all the log fields was unpacked
and parsed per each scanned block during the query. This could require a lot of additional disk IO
and CPU time when logs contain many fields. Resolve this issue by providing an (field -> metainfo_offset)
index per each field in every data block. This index allows reading and extracting only the needed
metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename ( columns_header_index.bin ).
This allows increasing performance for queries over wide events by 10x and more.
Another issue was that the data for bloom filters and field values across all the log fields except of _msg
was intermixed in two files - fieldBloomFilename ( field_bloom.bin ) and fieldValuesFilename ( field_values.bin ).
This could result in huge disk read IO overhead when some small field was referred in the query,
since the Operating System usually reads more data than requested. It reads the data from disk
in at least 4KiB blocks (usually the block size is much bigger in the range 64KiB - 512KiB).
So, if 512-byte bloom filter or values' block is read from the file, then the Operating System
reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible
for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache),
but this overhead may become very annoying when performing the query over large volumes of data
which isn't present in OS page cache.
The solution for this issue is to split bloom filters and field values across multiple shards.
This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards,
while the disk read IO overhead is completely removed in best case when the number of columns doesn't exceed N.
Currently the number of shards is 8 - see bloomValuesShardsCount . This solution increases
performance for queries over large volumes of newly ingested data by up to 1000x.
The new storage format is versioned as v1, while the old storage format is version as v0.
It is stored in the partHeader.FormatVersion.
Parts with the old storage format are converted into parts with the new storage format during background merge.
It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously it was incorrectly used append for pre-allocated slice of labels.
This commit fixes slice append by allocating zero length slice with needed capacity.
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
- Added functionality to cancel running queries on the Explore Logs and
Query pages.
- The loader was changed from a spinner to a top bar within the block.
This still indicates loading, but solves the issue of the spinner
"flickering," especially during graph dragging.
Related issue: #7097https://github.com/user-attachments/assets/98e59aeb-905b-4b9d-bbb2-688223b22a82
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.
### Describe Your Changes
Fixed VictoriaLogs HA examples references in docs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
s.partitions can be changed when new partition is registered or when old partition is dropped.
This could lead to data races and panics when s.partitions slice is accessed by concurrently executed queries.
The fix is to make a copy of the selected partitions under s.partitionsLock before performing the query.
This localizes blockSearch.getColumnsHeader() call at block_search.go .
This call is going to be optimized in the next commits in order to avoid
unmarshaling of header data for unneeded columns, which weren't requested
by getConstColumnValue() / getColumnHeader().
Refer the original byte slice with the marshaled columnsHeader for columns names and dictionary-encoded column values.
This improves query performance a bit when big number of blocks with big number of columns are scanned during the query.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Related issue: #7142
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
added opentelemetry exponential histograms support. Such histograms are automatically converted into
VictoriaMetrics histogram with `vmrange` buckets.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
there's an extra `"` at the end of the dashboard url for this alert;
remove it by making the quoting consistent with other alerts in this
file.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Co-authored-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
### Describe Your Changes
Fixed button name in the cloud docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Update images with updated interface of the cloud solution
### Checklist
The following checks are **mandatory**:
- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This PR is based on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6777. The
differences are the following:
* it keeps backward compatibility for links
* it re-structures only original document file
* it adds #common-mistakes section, re-phrased
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Improperly written pipes could be silently parsed as filter pipe.
For example, the following query:
* | by (x)
was silently parsed to:
* | filter "by" x
It is better to return error, so the user could identify and fix invalid pipe
instead of silently executing invalid query with `filter` pipe.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
docs/vmanomaly: updates for v1.16.3
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
The flags docs mention the flag that does not exist (and never existed).
Perhaps that was a typo.
`s/retryMaxInterval/retryMaxTime/g`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Currently, vmagent always uses a separate `http.Client` for every group
watcher in Kubernetes SD. With a high number of group watchers this
leads to large amount of opened connections.
This PR adds 2 changes to address this:
- re-use of existing `http.Client` - in case `http.Client` is connecting
to the same API server and uses the same parameters it will be re-used
between group watchers
- HTTP2 support - this allows to reuse connections more efficiently due
to ability of using streaming via existing connections.
See this issue for the details and test results -
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
evalInstantRollup could have overreport the number of fetched series if
`offset` checks will result into retry. This change updates fetched
series only if these checks were successful.
It also adds a comment to another potential place of over-reporting
series fetched. It doesn't fix it, because it would require spending
extra resources on such a check, while discrepancy in seriesFetched
doesn't affect calculations in any way.
Probably related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7170
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
### Describe Your Changes
docs/vmanomaly: remove duplicate header in VmWriter docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
docs for `vmanomaly`, updated after release 1.16.2
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This PR fixes#7062
For hijacked connections, one has to read from the connection buffer,
but still write directly to the connection. Otherwise, when reading
directly from such connections, the first byte may be lost. This, in
turn corrupts the ClientHello TLS handshake message and when the backend
server receives it, it closes the connection and reports the following
error in the log:
```
http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look
like a TLS handshake
```
The first byte may be lost because underlying HTTP request handler may
read it from the connection and put it into the buffer. As the result,
subsequent connection reads won't see that byte.
- See: https://github.com/golang/go/issues/27408
- The fix is taken from : https://github.com/k3s-io/k3s/pull/6216
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
fix of typos and improper version references in code snippets of example
usage
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
`vmanomaly` patch release 1.16.1 updates
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
`vm_rows_ignored_total` metric is a metric for users to signalize about
ingestion issues, such as bad timestamp or parsing error.
In commit
a5424e95b3
this metric started to increment each time vmstorage gets NaN. But NaN
is a valid value for Prometheus data model and for Prometheus metrics
exposition format. Exporters from Prometheus ecosystem could expose NaNs
as values for metrics and these values will be delivered to vmstorage
and increment the metric.
Since there is nothing user can do with this, in opposite to parsing
errors or bad timestamps, there is not much sense in incrementing this
metric. So this commit rolls-back `reason="nan_value"` increments.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
### Describe Your Changes
update `vnanomaly` versions in examples
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
doc updates for vmanomaly v1.16.0
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Added ability to query data across multiple tenants. See:
VictoriaMetrics/VictoriaMetrics#1434
Currently, the following endpoints work with multi-tenancy:
- /prometheus/api/v1/query
- /prometheus/api/v1/query_range
- /prometheus/api/v1/series
- /prometheus/api/v1/labels
- /prometheus/api/v1/label/<label_name>/values
- /prometheus/api/v1/status/active_queries
- /prometheus/api/v1/status/top_queries
- /prometheus/api/v1/status/tsdb
- /prometheus/api/v1/export
- /prometheus/api/v1/export/csv
- /vmui
A note regarding VMUI: endpoints such as `active_queries` and
`top_queries` have been updated to indicate whether query was a
single-tenant or multi-tenant, but UI needs to be updated to display
this info.
cc: @Loori-R
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Create blockResultColumn.forEachDictValue* helper functions for visiting matching
dictionary values. These helper functions should prevent from counting dictionary values
without matching logs in the future.
This is a follow-up for 0c0f013a60
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152
* Replaces deprecated graphs with Timeseries panels
* Adds new latency dashboards for rest client and golang scheduler
* Adds new overview panels
* Adds VM Datasource version of dashboard
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
When ingesting samples with the same labels(duplicated samples or
samples with the same labels after `by` or `without` options). They
could register different entries for the same labelset in
LabelsCompressor.
For example, both index 99 and 100 can be assigned to label `foo=1` in
two concurrent pushes. Then due to differing label indexes in encoded
keys, the samples will appear as distinct in aggrState, resulting in
duplicated results after decompressing the label indexes.
fbde238cdc/lib/streamaggr/streamaggr.go (L933)
In this pull request, since we need to store `idxToLabel` first to
ensure the idx can be searched after `lc.labelToIdxStore`,
the `lc.idxToLabel` still could contain a duplicated entries
[100]="foo=1". But given the low likelihood of this issue and the size
of idxToLabel, it should be fine.
Change the default value of the maxDeleteSeries flag to 1 million. This
is a follow up for ed5da38ede
---
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
Introduce the `-search.maxDeleteSeries` flag that limits the number of
time series that can be deleted with a single
`/api/v1/admin/tsdb/delete_series` call.
Currently, any number can be deleted and if the number is big (millions)
then the operation may result in unaccounted CPU and memory usage spikes
which in some cases may result in OOM kill (see #7027). The flag limits
the number to 30k by default and the users may override it if needed at
the vmstorage start time.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Current doc is using per-url deduplication, and users might use this example
when they have more than 1 remoteWrite URL. Which would result into extra resource usage.
Changing the example to use global dedup, as it makes more sense.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the phrase filter with `!` was treated unexpectedly.
For example, `foo!bar` filter was treated at `foo AND NOT bar`,
while most users expect that it matches "foo!bar" phrase.
This commit aligns with users' expectations.
encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values.
So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.
This simplifies pipeProcessor initialization logic a bit.
This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.
### Describe Your Changes
Renamed base compose files to prevent envs to be created from them
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This allows executing queries with `stats` pipe, which calculate multiple results with the same functions,
but with different `if (...)` conditions. For example:
_time:5m | count(), count() if (error)
Previously such queries couldn't be executed becasue automatically generated name for the second result
didn't include `if (error)`, so names for both results were identical - `count(*)`.
Now the following queries are equivalents:
_time:5s | sort by (_time)
_time:5s | order by (_time)
This is needed for convenience, since `order by` is commonly used in other query languages such as SQL.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Before, single and cluster deployments were provisioned with both
Grafana datasources: single and cluster. But this resulted into a
problem: single DS didn't work for cluster and vice versa. See
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7113
This PR splits datasource file into 2 files: single and cluster. Now,
these files are separately provisioned to single and cluster deployments
correspondingly.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Use fluentd logging driver in examples to have enriched data in
VictoriaLogs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The change should help users to understand what happens on labels
conflict.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Marked fluentd in victorialogs roadmap
Added fluentd syslog example setup
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- Show the time range in the tooltip when hovering over staircase
graphs.
- Use bolder lines for staircase graphs.
- Increase the number of steps on the staircase graph to 100.
- Reduce the maximum width of the tooltip to 1/3 of the screen.
- Insert only the label name under the cursor into the query input field
when `Ctrl`-clicking the line legend.
See [this
comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6545#issuecomment-2336805237).
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
1) Changed table settings from a popup to a modal window to simplify
future functionality additions.
2) Added functionality to save selected columns when data is modified or
the page is reloaded. See #7016.
<details>
<summary>Example screenshots</summary>
<img alt="demo-1" width="600"
src="https://github.com/user-attachments/assets/a5d9a910-363c-4931-8b12-18ea8b3d97d8"/>
</details>
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Previously only logs inside the selected time range could be returned by stream_context pipe.
For example, the following query could return up to 10 surrounding logs only for the last 5 minutes,
while most users expect this query should return up to 10 surrounding logs without restrictions on the time range.
_time:5m panic | stream_context before 10
This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 .
Reduce memory usage when returning stream context over big log streams with millions of entries.
The new logic scans over all the log messages for the selected log stream, while keeping in memory only
the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range
were loaded in memory before selecting the needed surrounding logs.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 .
Reduce the scan performance for big log streams by fetching only the requested fields. For example, the following
query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time:
panic | stream_context after 30 | fields _stream, _msg, _time
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.
While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.
Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
### Describe Your Changes
VictoriaLogs allows logs without `_msg` field or `_msg` field is empty.
This lead to incorrect search result. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
This pull request search for non-empty `_msg` field before log entry is
added to `LogRows`.
New counter `vl_rows_dropped_total{reason="msg_not_exist"}` is
introduced.
Example log output:
```
2024-09-23T02:33:19.719Z warn app/vlinsert/insertutils/common_params.go:189 dropping log line without _msg field; [{@timestamp 2024-09-18T13:42:16.600000000Z} {Attributes.array.attribute ["many","values"]} {Attributes.boolean.attribute true} {Attributes.double.attribute 637.704} {Attributes.int.attribute 10} {Attributes.map.attribute.some.map.key some value} {Attributes.string.attribute some string} {Body Example ddddddddddlog record} {Resource.service.name my.service} {Scope.my.scope.attribute some scope attribute} {Scope.name my.library} {Scope.version 1.0.0} {SeverityNumber 10} {SeverityText Information} {SpanId eee19b7ec3c1b174} {TraceFlags 0} {TraceId 5b8efff798038103d269b633813fc60c}]
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- [ ] Benchmark for potential performance loss.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously the original timestamp was used in the copied query, so _time:duration filters
were applied to the original time range: (timestamp-duration ... timestamp]. This resulted
in stopped live tailing, since new logs have timestamps bigger than the original time range.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028
This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query:
<query> | blocks_count
This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
This improves performance for analytical queries, which do not need column headers metadata.
For example, the following query doesn't need column headers metadata, since _stream and min(_time)
are stored in block header, which is read separately from colum headers metadata:
_time:1w | stats by (_stream) min(_time) min_time
This commit significantly improves the performance for this query.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
### Describe Your Changes
- Use common compose.yaml file for all victorialogs setups to set
version in a single place and override it on demand for each agent and
protocol
- Replaced multiple victorialogs instances in HA setup with single setup
with `deploy.replica` parameter set
- Added fluentd setup
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* `remoteWrite.maxQueueSize` from `100_000` to `1_000_000`, this should
improve resiliency of recording rules that produce many series;
* `remoteWrite.maxBatchSize` from `1_000` to `10_000`, this should be
more efficient to send from netwroking perspective;
* `remoteWrite.concurrency` from `1` to `4`, this should imrpove speed
of sending the generated series.
The new settings should improve remote write performance of vmalert with
default settings.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries.
This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU
has its own blockSearch instance.
This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch
cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.
The streamID.marshalString() is executed in hot path if the query selects _stream_id field.
Command to run the benchmark:
go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s
Results before the commit:
BenchmarkStreamIDMarshalString-16 438480714 14.04 ns/op 71.23 MB/s 0 B/op 0 allocs/op
Results after the commit:
BenchmarkStreamIDMarshalString-16 982459660 6.049 ns/op 165.30 MB/s 0 B/op 0 allocs/op
The change supposed to have more practical recommendations and reflect
the real processes for maintaining the project.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This test is very flaky and prevents other tests from running in CI.
Disabling this test should improve tests quality, since it isn't reliable anyway.
There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062
Once fixed, this test should be uncommented.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
New logos and usage guideline
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
By default, the `elasticsearchexporter` in OTel Collector puts the log
message under a field other than `_msg` (e.g., `Body`). Without
specifying via an HTTP header, those logs may not be queried correctly.
See also:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785.
This pull request updates the example configuration and notes for the
`elasticsearchexporter`.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Updated grafana plugins to the latest releases
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Rounding GOMAXPROCS to the upper interger value of cpuQuota increases chances of CPU starvation,
non-optimimal goroutine scheduling and additional CPU overhead related to context switching.
So it is better to round GOMAXPROCS to the lower integer value of cpuQuota.
It is expected that range_first and range_last functions return non-nan const value across all the points
if the original series contains at least a single non-NaN value. Previously this rule was violated for NaN data points
in the original series. This could confuse users.
While at it, add tests for series with NaN values across all the range_* and running_* functions, in order to maintain
consistent handling of NaN values across these functions.
* lib/license: add support of license key hot-reload
* docs: add info about license key hot reload
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The previous behaviour was incorrect, since it is unexpected that the -streamAggr.dedupInterval
and -remoteWrite.streamAggr.dedupInterval is applied to processed samples only if -streamAggr.config isn't set.
This is a follow-up for d523015f27
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6711
### Describe Your Changes
Currently it the metricID list is empty it won't be mashalled and as the
result won't be put into the tagFiltersToMetricIDsCache which causes the
cache misses for the corresponding tagFilters. In some setups this
causes severe search speed detradation (see #7009).
The empty metric IDs was covered before but then was accidentally
removed in 6c21439.
This PR restores the coverage of this case.
A new unit test can be used as a proof that empty metricID lists are not
added to the cache (just remove the fix in index_db.go and run the test
to see the result)
Also a benchmark has been added to see the implications of the
compression.
```
user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12 3237240 363.5 ns/op 0 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12 2831049 451.8 ns/op 0.4706 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12 1152764 1009 ns/op 1.667 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12 297055 3998 ns/op 5.755 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12 31172 34566 ns/op 8.484 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12 4900 289659 ns/op 9.416 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12 447 2341173 ns/op 9.456 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12 42 24926928 ns/op 9.468 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12 5 204098872 ns/op 9.467 compression-rate
PASS
ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 15.018s
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB
every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time
for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS),
like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 .
Optimize indexSearch.containsTimeRange() function in the following ways:
- Unconditionally return true if this function is called for the current indexDB, since there are very high
chances that the current indexDB contains the data with timestamps in the requested time range.
- Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB.
This is safe to do, since the previous indexDB is readonly.
This optimization eliminates potentially slow lookup in the previous indexDB for typical
use cases when the requested time range is close to the current time.
Previously indexDB.doExtDB() was returning boolean value, which was indicating whether f callback was called.
There is no need in returning this boolean value, since the f callback can determine on itself whether it was called.
This simplifies the code a bit.
While at it, document indexDB.doExtDB().
This fixes flaky test TestGetCommonTokensForOrFilters:
filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]
### Describe Your Changes
### Pull Request Description:
1. **HTML File Structure Optimization**: Adjusted the location of HTML
files for different builds to prevent redundant files in the final
output. See issue #6900
2. **Metadata Fixes**: Corrected metadata in HTML files for each build
configuration.
3. **Favicon Update**: Replaced PNG favicon (`14 KB` and `1.58 KB`) with
SVG (`1.35 KB`).
4. **Social Media Optimization**: Optimized the social preview image,
reducing its size by `60.2 KB`.
5. **Git Ignore Update**: Added `public/index.html` to `.gitignore` as
it is dynamically generated during the build process.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix a typo; `filebeat.yml` -> `fluent-bit.conf`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Thomas Danielsson <thomas@elajt.se>
…g from 1e3 to 5e3
This should improve visibility on errors produced by very long queries.
The change is classified as BUG in order to port it to LTS releases.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>
### Describe Your Changes
Added missing `{` in vmalert rule.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Improved VictoriaMetrics documentation for cloud
Related issue: https://github.com/VictoriaMetrics/cloud/issues/2143
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* Previously, only metricID->metricName missing index records were
tracked with deadline But it was possible a case for missing
metricID->TSID index records. IndexDB metrics fix exposed misleading
metric for such missing records.
* This commit adds check for metricID->TSID missing index records. And
delete missing metricID entry if it hit 60 second deadline.
Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Also tried to make it catch "Authorisation" in the future, fixed a lot
of other misspells along the way, but didn't make it catch
"Authorisation" anyway.
- Fix misspelled "Authorization" header name
- Fix misspelled "organization"
- Fix more misspells
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
…pair
`alert_relabel_configs` in [notifier
config](https://docs.victoriametrics.com/vmalert/#notifier-configuration-file)
can drop alert labels when used to filter different tenant alert message
to different notifier.
alertmanager would report error like `msg="Failed to validate alerts"
err="at least one label pair required"` in this case, but the rest of
the alerts inside one request would still be valid in alertmanager, so
it's not severe.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call
during iterative search for the time range with up to limit=N rows.
While at it, optimize queries, which find low number of matching logs, while spend a lot of CPU time for searching
across big number of logs. The optimization reduces the upper bound of the time range to search if the current time range
contains zero matching rows.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
Previously tokens from AND filters were extracted in random order. This could slow down
checking them agains bloom filters if the most specific tokens go at the beginning of the AND filters.
Preserve the original order of tokens when matching them against bloom filters,
so the user could control the performance of the query by putting the most specific AND filters
at the beginning of the query.
While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
Previously the following query could miss rows matching !bar if these rows do not contain foo:
foo OR !bar
This is because of incorrect detection of common tokens for OR filters - all the unsupported filters
were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned.
While at it, move repetiteve code in TestFilterAnd and TestFilterOr into f function.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
This is needed for avoiding confusion between the `|` operator at `math` pipe and `|` pipe delimiter.
For example, the following query was parsed unexpectedly:
* | math foo / bar | fields x
as
* | math foo / (bar | fields) as x
Substituting `|` with `or` inside `math` pipe fixes this ambiguity.
### Describe Your Changes
Add storage metrics that count records that failed to insert:
- `RowsReceivedTotal`: the number of records that have been received by
the storage from the clients
- `RowsAddedTotal`: the number of records that have actually been
persisted. This value must be equal to `RowsReceivedTotal` if all the
records have been valid ones. But it will be smaller otherwise. The
values of the metrics below should provide the insight of why some
records hasn't been added
- `NaNValueRows`: the number of records whose value was `NaN`
- `StaleNaNValueRows`: the number of records whose value was `Stale NaN`
- `InvalidRawMetricNames`: the number of records whose raw metric name
has failed to unmarshal.
The following metrics existed before this PR and are listed here for
completeness:
- `TooSmallTimestampRows`: the number of records whose timestamp is
negative or is older than retention period
- `TooBigTimestampRows`: the number of records whose timestamp is too
far in the future.
- `HourlySeriesLimitRowsDropped`: the number of records that have not
been added because the hourly series limit has been exceeded.
- `DailySeriesLimitRowsDropped`: the number of records that have not
been added because the daily series limit has been exceeded.
---
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
* app/vmgateway: allow skipping Bearer prefix, parsing access as string
- allow disabling of "Bearer" prefix check - This is needed in order to support OIDC systems where identity token is provided separately from access token and it does not contain "Bearer" prefix(such as Azure Entra ID, ex AD).a
- support parsing "vm_access" claim as a string - This is helpful for systems where claims can only be mapped to string.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/changelog: mention vmgateway updates
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block.
This could be slow when the number of tokens is big or when the number of blocks to scan is big.
Pre-calculate hashes for bloom filters and then use them for searching in bloom filters.
This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.
Previous bugfix at 49f63b2 only partially fixed pagination host validation error.
Before this fix it was:
```
unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\"
```
Now we only check the `Host` without schema.
However, when Azure respond `nextLink` in `Host:Port` format, the
`nextLink` check will fail:
```
unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\"
```
This pull request further relaxes the checks by only checking the
`Hostname`.
---
related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912
This patch reverts 1fd3385
After discussing it we've come to conclusion that this is a valid
behavior which can be avoided by deleting the time series only once the
corresponding stale NaNs have been received.
On the other hand, the fix leads to lost stale NaNs in some rare but
valid use cases. For example:
- In a cluster configuration the samples for a given time series are
normally sent to the same vmstorage replica. However, wminsert may
reroute the samples to another replica because the original one is down
or is overloaded. In this case the stale NaN may end up on a replica
that has no data for that time series, but we still want to record that
sample.
Thus, reverting that fix.
---
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
follow up
4ecc370acb
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`.
These tokens were used for checking against bloom filter for every data block, so the data block,
which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped.
This was incorrect, since such a block may contain logs matching the original OR filter.
The fix is to return common tokens from `OR`-delimted filters only if these tokens exist at EVERY such filter
for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters
do not contain common tokens, which could be used for checking against bloom filter.
While at it, add more tests covering various edge cases for filters delimited by AND and OR.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
### Describe Your Changes
Upgraded victoriametrics and victorialogs data source versions.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Commit adds the following changes:
* Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages
* json encoding is not supported for the following reasons:
- It brings a lot of fragile code, which works inefficiently.
- json encoding is impossible to use with language SDK.
* splits metrics and logs structures at lib/protoparser/opentelemetry/pb package.
* adds docs with examples for opentelemetry logs.
---
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839
Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
* updates change log
* adds VL-Debug http header
* updates doc
* extracts only the first value of http headers for VL-Stream-Fields and VL-Ignore-Fields.
It makes behaviour the same as Query string args. And allows to easily configure client applications.
Since most of the client collectors don't support multi value headers.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
* Many collectors don't support forwarding url query params to the remote system. It makes impossible to define stream fields for it. Workaround with proxy between VictoriaLogs and log shipper is too complicated solution.
* This commit adds the following changes:
* Adds fallback to to headers params, if query param is empty for:
_msg_field -> VL-Msg-Field
_stream_fields -> VL-Stream-Fields
_ignore_fields -> VL-Ignore-Fields
_time_field -> VL-Time-Field
* removes deprecations from victorialogs compose files, added more
output format examples for logstash, telegraf, fluent-bit
related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5310
Recent versions of `docker build` started generating the InvalidDefaultArgInFrom warning if Dockerfile contains
an ARG without default value. While this warning doesn't affect building Docker packages via `make package-*` commands,
it is better suppressing the warning, so it doesn't clutter `make package-*` output with the noise,
which can hide real issues in the future.
…specifying `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval` command-line flag
[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. Statement below says **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)
2. Another statement says the deduplication is performed individually
for the **matching samples**
>The de-deduplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).
Considering the following deduplication use cases:
1. To apply deduplication(globally or for specific remoteWrite
destination) for all the received data, scraped or pushed
--- using `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule `match`
filters
--- using `-remoteWrite.streamAggr.config` and specifiying
`dedup_interval` option in [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `streamAggr.config`
for some metrics
--- no way for a single vmagent now, need to set up two level vmagents
This PR implements case3.
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Correct the spelling error of 'vminsert' in the dashboards.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The 3 alerts for VMagent:
- `RejectedRemoteWriteDataBlocksAreDropped`
- `TooManyScrapeErrors`
- `TooManyWriteErrors`
missed the description annotation.
I moved the summary to description and added a generic summary to these
alerts.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Marco Maurer <marco.kilchhofer@gmail.com>
fix#6554
andfilter shouldn't return orfilter field which result in bloomfilter
return false.
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The prev links like `/changelog_2021/`
stopped working after 9dc8d1debd
because these files now require specifying the parent `changelog` in the path, like `/changelog/changelog_2021/`.
This fix adds an alias for an old link.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This is a follow-up PR: Unit tests introduced in #6872 can now use
RowsAddedTotal counter whose scope was fixed in #6841.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
fsync() ensures that the data is written to disk. In production this is
needed for data durability. However, during the development, when the
unit tests are run, this level of durability is not needed. Therefore
fsync() can be disabled which will makes test runs two times faster.
The disabling is done by setting the `DISABLE_FSYNC_FOR_TESTING`
environment variable. The valid values for this variable are the same as
the values of the arg of `go doc strconv.ParseBool`:
```
1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False.
```
Any other value means `false`.
The variable is set for all test build targets. Compare running times:
Build Target | DISABLE_FSYNC_FOR_TESTING=0 | DISABLE_FSYNC_FOR_TESTING=1
----------------- | ------------------------------------------------ |
-------------------------------------------------
make test | 1m5s | 0m22s
make test-race | 3m1s | 1m42s
make test-pure | 1m7s | 0m20s
make test-full | 1m21s | 0m32s
make test-full-386 | 1m42s | 0m36s
When running tests for a given package, fsync can be disabled as
follows:
```shell
DISABLE_FSYNC_FOR_TESTING=1 go test ./lib/storage
```
Disabling fsync() is intended for testing purposes only and the name of
the variables reflects that.
What could also have been done but haven't:
- lib/filestream/filestream.go: `Writer.MustFlush()` also uses f.Sync()
but nothing has been done to it, because the Writer.MustFlush() is not
used anywhere in the VM codebase. A side question: what is the general
policy for the unused code?
- lib/filestream/filestream.go: Writer.Write() calls `adviceDontNeed()`
which calls unix.Fdatasync(). Disabling it could potentially improve
running time, but running tests with this code disabled has shown
otherwise.
### Checklist
The following checks are **mandatory**:
- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
### Describe Your Changes
Add mentions of VictoriaMetrics Cloud to the documentation of vmalert
where this info is helpful to a user.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Describe steps to run VictoriaMetrics Single node or Cluster on
VictoriaMetrics Cloud
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This pull request fixes incorrect URLs in two places:
1. In the OTel guide, which has been corrected in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6880, but one
incorrect URL is still missing.
2. In the URL example, the cache reset endpoint for vmselect / Cluster
version is `/internal/resetRollupResultCache`, but it is mistakenly
noted as `/select/internal/resetRollupResultCache`, which misguides the
user. (introduced in #4468)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously, some extIndexDB metrics were not registered. It resulted
into missing metrics, if metric value was added to the extIndexDB. It's
a usual case for search requests at both indexes.
Current commit updates all metrics from extIndexDB according to the
current IndexDB. It must fix such cases
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
The anchor to "Other fields" section should be #other-fields (instead of
#other-field)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Cuong Le <cuongleqq@gmail.com>
`TL;DR` This PR improves the metric IDs search in IndexDB:
- Avoid seaching for metric IDs twice when `maxMetrics` limit is
exceeded
- Use correct error type for indicating that the `maxMetrics` limit is
exceded
- Simplify the logic of deciding between per-day and global index search
A unit test has been added to ensure that this refactoring does not
break anything.
---
Function calls before the fix:
```
idb.searchMetricIDs
|__ is.searchMetricIDs
|__ is.searchMetricIDsInternal
|__ is.updateMetricIDsForTagFilters
|__ is.tryUpdatingMetricIDsForDateRange
| |
|__ is.getMetricIDsForDateAndFilters
```
- `searchMetricIDsInternal` searches metric IDs for each filter set. It
maintains a metric ID set variable which is updated every time the
`updateMetricIDsForTagFilters` function is called. After each successful
call, the function checks the length of the updated metric ID set and if
it is greater than `maxMetrics`, the function returns `too many
timeseries` error.
- `updateMetricIDsForTagFilters` uses either per-day or global index to
search metric IDs for the given filter set. The decision of which index
to use is made is made within the `tryUpdatingMetricIDsForDateRange`
function and if it returns `fallback to global search` error then the
function uses global index by calling `getMetricIDsForDateAndFilters`
with zero date.
- `tryUpdatingMetricIDsForDateRange` first checks if the given time
range is larger than 40 days and if so returns `fallback to global
search` error. Otherwise it proceeds to searching for metric IDs within
that time range by calling `getMetricIDsForDateAndFilters` for each
date.
- `getMetricIDsForDateAndFilters` searches for metric IDs for the given
date and returns `fallback to global search` error if the number of
found metric IDs is greater than `maxMetrics`.
Problems with this solution:
1. The `fallback to global search` error returned by
`getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is
misleading.
2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search
and returns `fallback to global search` error (because
`getMetricIDsForDateAndFilters` returns it) then this will trigger
global search in `updateMetricIDsForTagFilters`. However the global
search uses the same maxMetrics value which means this search is
destined to fail too. I.e. the same search is performed twice and fails
twice.
3. `too many timeseries` error is already handled in
`searchMetricIDsInternal` and therefore handing this error in
`updateMetricIDsForTagFilters` is redundant
4. updateMetricIDsForTagFilters is a better place to make a decision on
whether to use per-day or global index.
Solution:
1. Use a dedicated error for `too many timeseries` case
2. Handle `too many timeseries` error in `searchMetricIDsInternal` only
3. Move the per-day or global search decision from
`tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and
remove `fallback to global search` error.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Once the timeseries is in tsidCache, new entries won't be created in
per-day index because the RegisterMetricNames() code does consider
different dates for the same timeseries. So this case has been added.
The same bug exists for AddRows() but it is not manifested because the
index entries are finally created in updatePerDateData().
RegisterMetricNames also updated to increase the newTimeseriesCreated
counter because it actually creates new time series in index.
A unit tests has been added that check all possible data patterns
(different metric names and dates) and code branches in both
RegisterMetricNames and AddRows. The total number of new unit tests is
around 100 which increaded the running time of storage tests by 50%.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
### Describe Your Changes
Reduced the scope of rowsAddedTotal variable from global to Storage.
This metric clearly belongs to a given Storage object as it counts the
number of records added by a given Storage instance.
Reducing the scope improves the incapsulation and allows to reset this
variable during the unit tests (i.e. every time a new Storage object is
created by a test, that object gets a new variable).
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
### Describe Your Changes
This is an attempt to document IndexDB. I guess I was trying to touch
the important points that might be of interest for the end users while
refraining from making it too detailed (such as I did not enumerate and
describe all the specific record types).
Please take a look and any suggestions are very welcome.
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
In the previous commit 8958cecad6
the default ports (80/443) were removed for both the `scrapeURL` and
`instance` label values for those targets without a port in
`__address__`. Different values in the `instance` label generate new
time series.
This commit reverts the changes made to the `instance` label. Now,
for those targets:
- `scrapeURL` will remain unchanged.
- The `instance` label value will include the default port.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792
### Describe Your Changes
release notes for patches 1.15.6 - 9
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
add command-line flag `-search.inmemoryBufSizeBytes` for configuring size of in-memory buffers used by vmselect during processing of vmstorage responses. A new summary metric `vm_tmp_blocks_inmemory_file_size_bytes` is exposed to show the size of the buffer during requests processing.
The new setting can be used by experienced users to adjust memory usage by vmselect when processing
many small read requests. Instead of allocating 4MB buffers each time, vmselect can be instructed to lower
the buffer size via `-search.inmemoryBufSizeBytes`. To make the decision whether this flag needs to be adjusted
users can consult with `vm_tmp_blocks_inmemory_file_size_bytes` which shows the actual size of buffers used
during query processing.
----------
The detailed information of this PR can be found in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6851
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit cab3ef8294)
## Describe Your Changes
Add RemoteWrite Retry Controls
This PR introduces two new flags to the remote write functionality:
- remoteWrite.retryMinInterval
- remoteWrite.retryMaxTime
These flags provide finer control over the retry behavior for
remoteWrite operations, allowing users to customize the minimum interval
between retries and the maximum duration for retry attempts.
Fixes#5486.
## Checklist
- [x] The following checks are mandatory:
My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Yury Akudovich <ya@matterlabs.dev>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Adds a link to the API example section that describes how to delete
metrics on VM Single.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This change is made in attempt to reduce memory usage by vmalert when
parsing big instant responses from VM/Prometheus.
In
a5c427bac4
vmalert switched from std json lib to fastjson lib in order to reduce
amount of allocations, as according to highloaded profiles of vmalert
the CPU is mostly spent on GC.
But switching to fastjson resulted into excessive memory usage for cases
when vmalert has to parse long json lines, which usually happens when
instant response contains many `metric` objects.
In this change we do a mixed parsing:
1. Slice of `metric` objects is parsed with std lib to keep mem low
2. Each `metric` object is parsed with fastjson to reduce allocs
The benchmark results are the following:
```
pkg: github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource
BenchmarkParsePrometheusResponse/Instant_std+fastjson-10 1760 668959 ns/op 280147 B/op 5781 allocs/op
MBs allocated at heap: 493.078392
mallocs: 18655472
BenchmarkParsePrometheusResponse/Instant_fastjson-10 6109 198258 ns/op 172839 B/op 5548 allocs/op
MBs allocated at heap: 1056.384464
mallocs: 34457184
BenchmarkParsePrometheusResponse/Instant_std-10 1287 950987 ns/op 451677 B/op 9619 allocs/op
MBs allocated at heap: 580.802976
mallocs: 13351636
```
The benchmark function code with mem measurement is available here
https://gist.github.com/hagen1778/b9c3ca7f8ca7d6b21aec9777112c5810
The benchmark contains 3 results:
1. Instant_std+fastjson is the implementation in this change
2. Instant_fastjson-10 is the implementation from
a5c427bac4
3. BenchmarkParsePrometheusResponse/Instant_std-10 is implementation
before
a5c427bac4
According to these results, this new implementation is slower than
previous, but faster than before switching to fastjson. It also has
lower number of allocations and roughly the same memory allocation on
heap with GC turned off.
---------
Other changes:
1. rm BenchmarkMetrics as it doesn't measure anything
2. simplify BenchmarkParsePrometheusResponse into
BenchmarkPromInstantUnmarshal
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Moving key-concepts-related docs to a separate dir should make it easier
to navigate in `docs/` folder and helps to avoid adding prefixes to
image assets.
-----------
The change shouldn't have any visual changes or changes to the links.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Moving changelog-related docs to a separate dir should make it easier to
navigate in `docs/` folder.
-----------
The change shouldn't have any visual changes or changes to the links.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- fix TS lint
- anomaly: remove /vmui
- anomaly: minor inspections fix
- docs: fix broken links to headings
### Describe Your Changes
Initially vmanomaly opened with `/vmui` in serverUrl, remove it.
* Adds custom dial func for HTTP-Connect and socks5 proxy tunnels.
Standard golang http.transport exposes GetProxyConnectHeader function,
but it doesn't allow to use separate tls config for proxy.
It also not possible to enforce HTTP-Connect with standard http lib.
* For http scrape targets, by default http.Transport.Proxy function must
be used. Since it has special case with full uri forward.
* Adds proxy.URL json methods that allow to properly copy internal
fields, like User/Password.
It should fix bug with proxy_url. When credentials specified at URL was
ignored.
* Adds tests for scrape client proxy requests
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771
* It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed.
* An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HA proxy.
It's expected that after upgrade __address_ label may change. But it should be rare case. 80/443 ports are not widely used at monitoring ecosystem. And it shouldn't have much impact.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792
Co-authored-by: Nikolay <nik@victoriametrics.com>
to allow configuring additional headers in each request to the
corresponding notifier.
Other flags like `-datasource.headers`, `-remoteWrite.headers` already
use `^^` as delimiter, it's consistent to use it in `-notifier.headers`
as well.
related https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3260
vmalert can integrate with alertmanager that supports multi-tenant by
adding tenantID header`X-Scope-OrgID` in requests.
In multitenancy, vmalert can also filter alerts which send to different
notifier addresses(or with different header settings) using
`alert_relabel_configs`.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3260
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
docs: vmanomaly - v1.15.5 patch notes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Updated model list in Anomaly Detection Overview
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
release notes for 1.15.4 patch
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- change links from relative to absolute under Anomaly Detection section
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
changelog updates to v1.15.3 patch of `vmanomaly`
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
small update to `data_range` parameter in uppermost config conversion
example
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
update vmanomaly docs to forthcoming release v1.15.2
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Production workload shows that it's useful optimisation.
Channel based objects pool allows to handle irregural data ingestion
requests and make memory allocations more smooth.
It's improves sync.Pool efficiency, since objects from sync.Pool removed
after 2 GC cycles. With GOGC=30 value, GC runs significantly more often.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6733
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f255800da3)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
# Conflicts:
# app/vminsert/common/insert_ctx_pool.go
Describe Your Changes
When I use usePromCompatibleNaming with vmagent to process data that
needs to be formatted from different sources such as InfluxDB, I find
that it doesn’t work
However, it works in vminsert. I found that vminsert uses the
HasRelabeling method to determine whether to relabel.
```go
func HasRelabeling() bool {
pcs := pcsGlobal.Load()
return pcs.Len() > 0 || *usePromCompatibleNaming
}
```
in vmagent, the decision to relabel is determined only by
pcsGlobal.Len() > 0. However, in the applyRelabeling method, the
usePromCompatibleNaming logic is also used to determine whether to
relabel in the error handling.
```go
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
if pcs.Len() == 0 && !*usePromCompatibleNaming {
// Nothing to change.
return tss
}
```
So I think that the logic for determining whether to relabel in vmagent
is not as expected.
Checklist
The following checks are mandatory:
[✅]My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
…eep_metric_names` options in stream aggregation config together
With aggregated data and raw data under the same metric, results would
be confusing.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
typos fix & clarity improvement of vmanomaly docs after v1.15.1 release
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Updated user management guide with new cloud content
This PR should be merged after the cloud PR
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Before, buffer growth was always x2 of its size, which could lead to
excessive memory usage when processing big amount of data.
For example, scraping a target with hundreds of MBs in response could
result into hih memory spikes in vmagent because buffer has to double
its size to fit the response. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759
The change smoothes out the growth rate, trading higher allocation rate
for lower mem usage at certain conditions.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
* `sort` param is unused by the current website engine, and was present only for compatibility
with previous website engine. It is time to remove it as it makes no effect
* re-structure guides content into folders to simplify assets management
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixing remaining typos and missing words after v.1.15.0 updates
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- updated docs on `vmanomaly` with v1.15.0
- additional chapters of FAQ and model pages
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
VM has different responses to equivalent queries for MetricsQL and
GraphiteQL in case of failed access to one of vmstorage node of the
cluster vmstorage nodes. For GraphiteQL, the denyPartialResponse feature
is not used, it is always true, which is not always correct (depending
on the configuration).
In the PR I have removed the hardcoded denyPartialResponse for
GraphiteQL, just like MetricsQL does.
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
(cherry picked from commit 79008b712f)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark.
This benchmark was testing aggregate state flush performance by keeping the same state across flushes.
The benhmark didn't reflect the performance and scalability of stream aggregation in production,
while it led to non-trivial code changes related to resetState arg handling.
So let's drop the benchmark together with all the code related to resetState handling,
in order to simplify the code at lib/streamaggr a bit.
Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
Prevsiously every aggregation output was using its own timestamp for the output aggregated samples
in a single aggregation interval. This could result in unexpected inconsitent timesetamps for the output
aggregated samples.
This commit consistently uses the same timestamp across all the output aggregated samples.
This commit makes sure that the duration between subsequent timestamps strictly equals
the configured aggregation interval.
Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
Make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type so it could
accept multiple values which can be applied to the corresponding`-remoteWrite.url`.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Fix `-streamAggr.dropInputLabels` behavior when global deduplication is enabled without `-streamAggr.config`.
Previously, `-remoteWrite.streamAggr.dropInputLabels` is misapplied.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
By introducing this feature, users will have the ability to customize
the sampleLimit parameter on a per-target basis, providing more
flexibility and control over the job execution behavior.
The error check was needed before a84491324d
It was kept by mistake and makes no sense to have rn.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
fixed yaml header in a guide doc, that causes hugo build error
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- Adds support for displaying the top 5 log streams in the hits graph,
grouping the remaining streams into an "other" label.
#6545
- Adds options to customize the graph display with bar, line, stepped
line, and points views.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The changes are based on SEO report and supposed to improve
ranking and indexation by search engines by using prompt and unique titles
and by updating unreachable links.
It also updates links to have a simplified form and replaces relative links with absolute links
according to https://docs.victoriametrics.com/#documentation
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Adds Prometheus Grafana Alloy and vmagent to the data ingestion
protocols. Grafana Agent was not added since it has been deprecated in
favor of alloy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
https://victoriametrics.slack.com/archives/C05UNTPAEDN/p1722833182319299
Sometimes users may use the wrong (lower) version of Grafana when
setting up the VictoriaLogs datasource.
It would be good to document the requirements of Grafana versions to use
VictoriaMetrics/VictoriaLogs datasource.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix a typo in docs.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Added `--vm-backoff-retries`, `--vm-backoff-factor`,
`--vm-backoff-min-duration` and `--vm-native-backoff-retries`,
`--vm-native-backoff-factor`, `--vm-native-backoff-min-duration`
command-line flags to the `vmctl` app. Those changes will help to
configure the retry backoff policy for different situations.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6622
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* add a toggle button to the "Group" tab that allows users to expand or collapse all groups at once
* introduce the ability to select a key for grouping logs within the "Group" tab
* display the number of entries within each log group.
* move the Markdown toggle to the general settings panel in the upper left corner.
Changes to .ts files in vmui or vmui for logs require re-building
static files that will be included into compiled binary afterwards.
We don't update static files on each .ts change PR because it results
in too many changes and complicates review.
So we need to update these static files before the actual release.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fixes panic if incorrect metricsql expression passed to the prettifier API.
Prettify function had misleading panic for duration expression formatting. It expected all WITH templates to be already parsed.
But WITH expression expand was removed.
Bug was introduced at e712a49898
and present at v1.98.0+ releases
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6736
Signed-off-by: f41gh7 <nik@victoriametrics.com>
* add Logo guidelines to GitHub Readme, as it may have higher chances to be viewed by users
* rm Logo guidelines from cluster version docs, as it makes no sense anymore for this page
* rm `picutre` tag from cluster version docs, as it is not used by GitHub anymore
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
- replace docs in root README with a link to official documentation
- remove old make commands for documentation
- remove redundant "VictoriaMetrics" from document titles
- merge changelog docs into a section
- rm content of Single-server-VictoriaMetrics.md as it can be included from docs/README
- add basic information to README in the root folder, so it will be useful for github users
- rm `picture` tag from docs/README as it was needed for github only, we don't display VM logo at docs.victoriametrics.com
- update `## documentation` section in docs/README to reflect the changes
- rename DD pictures, as they now belong to docs/README
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The new panel will show the 99th quantile of scrape duration in seconds.
This should help identifying vmagent instances that experiences too high scraping durations.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
These changes were made as part of the title transition from Managed
VictoriaMetrics to VictoriaMetrics Cloud
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
After breadcrumb was added to docs.victoriametrics.com there's no need
to specify parent page name in a title
<img width="1437" alt="Screenshot 2024-07-27 at 10 20 09"
src="https://github.com/user-attachments/assets/733f41f4-a727-4f52-a7c0-6019edf1b803">
Also added vmdocs to gitignore to avoid committing it
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This reverts commit e280d90e9a.
Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case
when rows contain timestamps belonging to ptws[0].
The performance may be improved in theory for the case when all the rows belong to partiton other than ptws[0],
but this partition is automatically moved to ptws[0] by the code at lines
6aad1d43e9/lib/storage/table.go (L287-L298) ,
so the next time the typical case will work.
Also the updated code makes the code harder to follow, since it introduces an additional level of indirection
with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function.
This function needs to be inspected and understood when reading the code at table.MustAddRows().
This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized
many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding
the code even more.
The previous code was using clearer loop over rows with the clear call to partition.HasTimestamp()
for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows()
function multiple times. This makes the use of partition.HasTimestamp() call more consistent,
easier to understand and easier to maintain comparing to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition()
calls.
Aslo, there is no need in documenting some hardcore software engineering refactoring at docs/CHANGLELOG.md,
since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering.
The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users.
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6677
Relative links in docs are much harder to maintain in consistent state comparing to absolute links:
- It is non-trivial to figure out the proper relative link path when creating and editing docs.
- Relative links break after moving the doc files to another paths, and it is non-trivial
to figure which links are broken after that.
See also f357ee57ef , d5809f8e12 and 8cb1822b94
- Document `make docs-debug` command at https://docs.victoriametrics.com/#documentation
- Remove unneeded ROOTDIR, REPODIR and WORKDIR env vars from docs/Makefile ,
since it is documented and expected that all the Makefile commands are run from the repository root.
- Use `docker --rm` for running Docker container with local docs server, so it is automatically
removed after pressing `Ctrl+C`. This makes the container cleanup automatic.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6677
### Describe Your Changes
The original logic is not only highly complex but also poorly readable,
so it can be modified to increase readability and reduce time
complexity.
---------
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Replaced global http links in docs with relative markdown ones
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Use relative markdown references, removed `{{< ref >}}` shortcodes
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- moved files from root to VictoriaMetrics folder to be able to mount
operator docs and VictoriaMetrics docs independently
- added ability to run website locally
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Change response code to 502 to align it with behaviour of other existing
reverse proxies. Currently, the following reverse proxies will return
502 in case an upstream is not available: nginx, traefik, caddy, apache.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Improve documentation to help:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6670
Currently, the documentation of Kafka Integration did not describe:
- How to switch between different remote write protocols in producer
side and consumer side.
### Describe Your Changes
Add a note to clarify usage of azure credentials when multiple
credentials are available.
The user is required to specify AZURE_CLIENT_ID as otherwise Azure API
will return an error: "Multiple user assigned identities exist, please
specify the clientId / resourceId of the identity in the token request"
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The /some_path/.+ regexp matches /some_path/ followed by at least a single char.
This is unexpected by most users, since they expect it should match /some_path/.
Substitute .+ with .*, so this regexp matches /some_path/ .
There is no need in enumerating all the files, which must be updated, since these files
may be moved to other locations. It is enough to mention that versions for all the VictoriaMetrics
components must be updated.
### Describe Your Changes
Add patch note v1.13.3 to CHANGELOG doc page for `vmanomaly`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs.
- Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables
- Use string literals as is for env variable names instead of indirecting them via string constants.
This makes easier to read and understand the code. These environment variable names aren't going to change
in the future, so there is no sense in hiding them under string constants with some other names.
- Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages
when auth creds are improperly configured. This should simplify figuring out how to fix the error.
- Simplify the code a bit at FS.newClient(), so it is easier to follow it now.
While at it, remove the check when superflouos environment variables are set, since it is too fragile
and it looks like it doesn't help properly configuring vmbackup / vmrestore.
- Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline.
This simplifies code reading and understanding.
- Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code,
so it should be easier to maintain in the future.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.
This is a follow-up for c0caa69939
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
- Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md
- Do not sanitize tag values - only metric names and tag names must be sanitized,
since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values.
- Properly replace more than two consecutive dots with a single dot.
- Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana
do not support them.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077
### Describe Your Changes
`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.
This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.
Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
The purpose of docs/CHANGELOG.md is to provide VictoriaMetrics users clear and concise information
on what's changed at VictoriaMetrics components. Technical details of the change are unclear
to most of VictoriaMetrics users, who are not familiar with VictoriaMetrics source code.
These details complicate reading the docs/CHANGELOG.md by ordinary users, so do not clutter
the changelog with technical details. If the user wants technical details, he can click
the link to the related GitHub issue and/or pull request and dive into all the details he wants.
- Rename overrideHostHeader() function to hasEmptyHostHeader()
- Rename overrideHostHeader field at UserInfo to useBackendHostHeader
This should simplify the future maintenance of the code
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6525
This reverts commit 4d66e042e3.
Reasons for revert:
- The commit makes unrelated invalid changes to docs/CHANGELOG.md
- The changes at app/vmauth/main.go are too complex. It is better splitting them into two parts:
- pooling readTrackingBody struct for reducing pressure on GC
- avoiding to use readTrackingBody when -maxRequestBodySizeToRetry command-line flag is set to 0
Let's make this in the follow-up commits!
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6445
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6533
- Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API,
since it looks like IMDBSv2 API isn't supported by Yandex Cloud
according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token :
> So far, Yandex Cloud does not support version 2, so it is strongly recommended
> to technically disable getting a service account token via the Amazon EC2 metadata service.
- Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1.
This should prevent from auth errors for instances with disabled GCE-like auth API.
This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884
- Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue.
P.S. This change wasn't tested in prod because I have no access to Yandex Cloud.
It is recommended to test this change by @ITD27M01 and @vmazgo , who filed
the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524
This reverts commit 6b128da811.
Reason for revert: this complicates and slows down CI/CD without giving significant benefits in return.
The idea of automatic building, publishing and deploying Docker images to our playground on every pull request
and commit isn't very bright because of the following reasons:
- It slows down CI/CD pipeline
- It increases costs on CPU time spent at CI/CD pipeline
- It contradicts goal #7 at https://docs.victoriametrics.com/goals/#goals and non-goal #8 at https://docs.victoriametrics.com/goals/#non-goals
The previous workflow was much better - if we need to deploy some new Docker image at playground or staging environment,
then just __manually__ build and deploy the needed Docker image there. If the manual process requires making too many
steps, then think on how to automate these steps into a single Makefile command.
Updates https://github.com/VictoriaMetrics/ops/pull/1297
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6515
- Move the test for SRV discovery into a separate function. This allows verifying round-robin discovery across SRV records.
- Restore the original netutil.Resolver after the test finishes, so it doesn't interfere with other tests.
- Move the description of the bugfix into the correct place at docs/CHANGELOG.md - it should be placed under v1.102.0-rc2
instead of v1.102.0-rc1.
- Remove unneeded code in URLPrefix.sanitizeAndInitialize(), since it is expected this function is called only once
for finishing URLPrefix initializiation. In this case URLPrefix.nextDiscoveryDeadline and URLPrefix.n are equal to 0
according to https://pkg.go.dev/sync/atomic#Uint64
- Properly fix the bug at URLPrefix.discoverBackendAddrsIfNeeded() - it is expected that hostToAddrs map uses
the original hostname keys, including 'srv+' prefix, so it shouldn't be removed when looping over up.busOriginal.
Instead, the 'srv+' prefix must be removed from the hostname only locally before passing the hostname to netutil.Resolver.LookupSRV.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6401
- Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric
This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
This improves code maintainability a bit.
- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
This prevents from potential resource leaks.
- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.
- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.
- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
This label should simplify investigation of the exposed metrics.
- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
it is hard to use these labels in query filters and aggregation functions.
- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
This metric shows the number of samples generated by the corresponding streaming aggregation rule.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
which could modify the behaviour of the constructed streaming aggregator.
Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.
- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
This also allows passing nil instead of Options when default options are enough.
- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
so they have consistent set of labels comparing to the rest of streaming aggregation metrics.
- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
`aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
`output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
metrics. This is a follow-up for the commit 2eb1bc4f81 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604
- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
in the commit 2eb1bc4f81 .
- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
### Describe Your Changes
initial docs to implement #6618 more platforms can be added on this
branch or on future commits.
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Doc updates after v1.13.2 release of `vmanomaly`
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
…ion about Grafana Datasource in quering
### Describe Your Changes
Update Roadmap and Querying documentation for VictoriaLogs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval,
so there is no any sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk)
to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval
for flushing pending rows and items from memory to disk.
This is a follow-up for 4c80b17027
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221
- Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs.
This should simplify future maintenance of the corresponding code and docs.
- Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs.
- Make more clear the docs for `rate_sum()` and `rate_avg()` outputs.
- Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related
to incorrect suffix passing to newRateAggrState().
- Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates
counter increase in the current aggregation window.
- Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample.
This makes more clear the code logic.
- Move the code for removing outdated entries at rateAggrState into removeOldEntries() function.
This make the code logic inside rateAggrState.flushState() more clear.
- Do not write output sample with zero value if there are no input series, which could be used
for calculating the rate, e.g. if only a single sample is registered for every input series.
- Do not take into account input series with a single registered sample when calculating rate_avg(),
since this leads to incorrect results.
- Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar.
This shuld simplify future mantenance.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243
The old link was changed globally to the new link in the commit f4b1cbfef0 .
Unfortunately, old links are still posted in new commits :(
This is a follow-up for 680b8c25c8 .
While at it, remove duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() functions,
since this check is already made at the beginning of the function.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
systems are configured with the disabled on-disk queue, every in-memory queue is full
and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
relabeling and other processing in this case.
- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
and on-disk queue is disabled.
The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
processes pending data, so it drops aggregated samples in this case.
- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
so it is clear that this flag can be set individually per each -remoteWrite.url.
- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
are configured with the disabled on-disk queue, then there is no sense in keeping samples
on some of these systems, while dropping samples on the remaining systems, since this
will result in global stall on the remote storage system with the disabled on-disk queue
and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
This allows proceeding with newly scraped / pushed samples by sending them to the remaining
remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.
- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.
- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
This makes test code more clear and reduces the number of code lines by 500.
This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e
While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.
This is a follow-up for a9525da8a4
This simplifies debugging tests and makes the test code more clear and concise.
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e
While at is, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.
This is a follow-up for a9525da8a4
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Consistently using t.Fatal* simplifies the test code and makes it less fragile, since it is common error
to forget to make proper cleanup after t.Error* call. Also t.Error* calls do not provide any practical
benefits when some tests fail. They just clutter test output with additional noise information,
which do not help in fixing failing tests most of the time.
While at it, improve errors generated at app/victoria-metrics tests, so they contain more useful information
when debugging failed tests.
This is a follow-up for a9525da8a4
### Describe Your Changes
Implement spellcheck command:
- add cspell configuration files
- dockerize spellchecking process
- add Makefile targets
This PR adds a standalone `make spellcheck` target to check `docs/*.md` files for spelling
errors. The target process is dockerized to be run in a separate npm environment.
Some `docs/` typo fixes also included.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Arkadii Yakovets <ark@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Replace VM- with VictoriaMetrics in QuickStart
Keep the previous anchors for backward compatibility
### Checklist
The following checks are **mandatory**:
- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
### Describe Your Changes
In most cases histograms are exposed in sorted manner with lower buckets
being first. This means that during scraping buckets with lower bounds
have higher chance of being updated earlier than upper ones.
Previously, values were propagated from upper to lower bounds, which
means that in most cases that would produce results higher than expected
once all buckets will become updated.
Propagating from upper bound effectively limits highest value of
histogram to the value of previous scrape. Once the data will become
consistent in the subsequent evaluation this causes spikes in the
result.
Changing propagation to be from lower to higher buckets reduces value
spikes in most cases due to nature of the original inconsistency.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
An example histogram with previous(red) and updated(blue) versions:

This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete.
Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
vmbackup, vmrestore and vmbackupmanager use the same libs
for integrations with object storage. That means the auth can be configured
in the same way for all of them. So the docs should have either identical
config section for all 3 components, or we should cross-link to one source of truth.
This change removes incomplete auth options from vmrestore docs and adds link
to complete auth options in vmbackup instead.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
These changes support using Azure Managed Identity for the `vmbackup`
utility. It adds two new environment variables:
* `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to
build a connection using the [Azure Default
Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential)
mode. This causes the Azure SDK to check for a variety of environment
variables to try and make a connection. By default, it tries to use
managed identity if that is set up.
This will close
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Testing
However you normally test the `vmbackup` utility using Azure Blob should
continue to work without any changes. The set up for that is environment
specific and not listed out here.
Once regression testing has been done you can set up [Azure Managed
Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
so your resource (AKS, VM, etc), can use that credential method. Once it
is set up, update your environment variables according to the updated
documentation.
I added unit tests to the `FS.Init` function, then made my changes, then
updated the unit tests to capture the new branches.
I tested this in our environment, but with SAS token auth and managed
identity and it works as expected.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Justin Rush <jarush@epic.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e
While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads
to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.
* restore old anchor names to keep links compatibility.
See https://docs.victoriametrics.com/#documentation requirements
* consistently use the same format for commands `sh` as it makes it better
renderred and automatically adds `copy` button to fileds with commands
* simplify the text by removing extra points in the list
* add recommendations for installing the cluster setup
* explicitly mention the ports services are listening on
* add description for `storageNode` cmd-line flag to inform the reader what
values need to be put into it
* fix the incorrect vmui link in cluster installation recommendation
* rename component anchors to be more unique, because URL doesn't respect
hierarchy for the anchored links and may result into conflicts in future
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Updated Quickstart guide for VIctoriaMetrics and VictoriaMetrics Cluster to include instructions for installing the binaries by hand
### Checklist
The following checks are **mandatory**:
- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixed Custom Model guide according to newer `vmanomaly` versions
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
docs: fix typos
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
This PR is aimed to change the currently in place configuration of
running Go related jobs for code changes that don't contain actual Go
files ([example
1](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6517/checks)
- 2m32s , [example
2](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6543/checks)
- 4m11s).
In order to do that the `build` workflow was extracted from Go related
workflow (now it doesn't require lint as a `need` step -- let me know if
it's something we want to keep). It will run upon the same triggers as
before the change.
The `main` workflow now will be triggered by `**.go` pattern only and
contains lint/test steps that are relevant for Go file changes.
I expect this PR +
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6540 to improve
CI minutes usage.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Arkadii Yakovets <ark@victoriametrics.com>
Reason for revert:
There are many statsd servers exist:
- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics
These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).
Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.
The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.
P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:
- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)
It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.
In the mean time it is recommended continue using external statsd server.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
This reverts commit 5a3abfa041.
Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .
Issues with Prometheus exemplars:
- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage
- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .
- It looks like exemplars are supported for Histogram metric types only -
see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
Exemplars aren't supported for Counter, Gauge and Summary metric types.
- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
contains histogram_quantile() function. It queries exemplars via special Prometheus API -
https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.
- Exemplars are usually connected to traces. While traces are good for some
I doubt exemplars will become production-ready in the near future because of the issues outlined above.
Alternative to exemplars:
Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.
- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
so it could be used in tests for building []prompbmarshal.TimeSeries from string.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
Move the code responsible for relabelCtx clearing into deferred function.
This allows making more clear the remoteWriteCtx.TryPush code.
This is a follow-up for 879771808b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
While at it, clarify the description of the bugfix at docs/CHANGELOG.md
Use metricsql.IsLikelyInvalid() function for determining whether the given query is likely invalid,
e.g. there is high change the query is incorrectly written, so it will return unexpected results.
The query is invalid most of the time if it passes something other than series selector into rollup function.
For example:
- rate(sum(foo))
- rate(foo + bar)
- rate(foo > bar)
Improtant note: the query is considered valid if it misses the lookbehind window in square brackes inside rollup function,
e.g. rate(foo), since this is very convenient MetricsQL extention to PromQL, and this query returns the expected results
most of the time.
Other unsafe query types can be added in the future into metricsql.IsLikelyInvalid().
TODO: probably, the -search.disableImplicitConversion command-line flag must be set by default in the future releases of VictoriaMetrics.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6180
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6450
- Clarify docs for `Ignore aggregation intervals on start` feature.
- Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions.
It is better from readability and maintainability PoV using distinct a.flush() calls
for distinct cases instead of merging them into a single a.flush() call.
- Take into account the first incomplete interval when tracking the number of skipped aggregation intervals,
since this behaviour is easier to understand by the end users.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
### Describe Your Changes
Removed snap packages support as it requires time for maintenance and
it's not popular at all
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
- rwctx - global or number starting from 1
- aggrid - aggregator id starting from 1
- aggrSuffix - <interval>_(by|without)_label1_label2_labeln
e.g: `name="global:1:1m_without_instance_pod"`
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The set of log fields in the found logs may differ from the set of log fields present in the log stream.
So compare only the log fields in the found logs when searching for the matching log entry in the log stream.
While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI
for grouping logs by log streams.
Fix typo
### Describe Your Changes
Fix typo in the vmalert docs. In the docs it states rules at `VMAgent`'s
namespace when it should be `VMAlert`'s namespace.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Small improvements to a QuickStart guide of `vmanomaly`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fixes#6453
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix Date metricid cache consistency under concurrent use.
When one goroutine calls Has() and does not find the cache entry in the
immutable map it will acquire a lock and check the mutable map. And it
is possible that before that lock is acquired, the entry is moved from
the mutable map to the immutable map by another goroutine causing a
cache miss.
The fix is to check the immutable map again once the lock is acquired.
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
- Fixed the update of the relative time range when `Execute Query` is
clicked
- Optimized server requests: now, if an error occurs in the `/query`
request, the `/hits` request will not be executed.
#6345 (duplicates: #6440, #6312)
Updating documentation around the opentelemetry endpoint for metrics and
the "How to use OpenTelemetry metrics with VictoriaMetrics" guide so
that it shows not only how to directly write but also how to write to
the otel collector and view metrics in vmui.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Fixed config example in QuickStart vmanomaly docs for 1.13 version
compatibility
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
The `make clean-checkers` command may be useful when the locally installed checker apps are too old,
so they must be updated to the newly requested versions by `make check-all`. In this case the fix is to run
make clean-checkers check-all
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.
While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
rm custom scripts for downloading Grafana plugins for
VictoriaMetrics and VictoriaLogs. Use `GF_INSTALL_PLUGINS` instead.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reverts commit 6e395048d3.
Reason for revert: the previous logic was correct.
The purpose of `-search.maxSamplesPerQuery` command-line flag is to limit the amounts of CPU resources,
which could be taken by a single query - see https://docs.victoriametrics.com/#resource-usage-limits .
VictoriaMetrics processes samples in blocks during querying - it reads the block, then unpacks it,
then filters out samples outside the selected time range. This means that it _spends CPU time_
on reading and unpacking of _all the samples_ in every block on the requested time range,
even if only a single sample per each block matches the given time range.
The previous logic was effectively limiting CPU time a single query could take.
The new logic fails limiting CPU time a single query could take in some pathological cases
when only a small fraction of samples per each requested block fit the requested time range.
This allows performing multiplication DoS-attacks by querying very narrow time ranges over historical blocks,
which tend to be full. For example, if the `-search.maxSamplesPerQuery` equals to a billion,
and the query requests a single sample out of 8K samples per each block, this means that the query
may unpack a billion of such blocks without exceeding the limit, e.g. it may unpack and process 8K*1e9=8e12 samples.
This is not what the resource usage limits were created for originally - see https://docs.victoriametrics.com/#resource-usage-limits
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5851
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6464
This reverts commit 92b22581e6.
Reason for revert: too complex and slow approach for spellchecking task.
This approach may significantly slow down development pace. It also may take non-trivial
amounts of additional time and resources at CI/CD because of all this npm shit at cspell directory.
Note to @arkid15r : the idea with the ability to run spellchecker on all the VictoriaMetrics
codebase is great. But this shouldn't be mandatory pre-commit check. It is enough to have
a Makefile rule like `make spellcheck`, which could be run manually whenever spellcheck is needed
(e.g. once per month or once per quarter).
Reason for revert: this commit doesn't resolve real security issues,
while it complicates the resulting code in subtle ways (aka security circus).
Comparison of two strings (passwords, auth keys) takes a few nanoseconds.
This comparison is performed in non-trivial http handler, which takes thousands
of nanoseconds, and the request handler timing is non-deterministic because of Go runtime,
Go GC and other concurrently executed goroutines. The request handler timing is even
more non-deterministic when the application is executed in shared environments
such as Kubernetes, where many other applications may run on the same host and use
shared resources of this host (CPU, RAM bandwidth, network bandwidth).
Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections.
Establishing TLS connections takes additional non-trivial time (millions of nanoseconds),
which depends on many factors such as network latency, network congestion, etc.
This makes impossible to conduct timing attack on passwords and auth keys in VictoriaMetrics components.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392
mention change for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6457
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
With the recent release of Managed VictoriaMetrics users are able to
create and execute Alerting & Recording rules and send notifications via
hosted Alertmanager.
So, we're publishing Alertmanager configuration docs for Managed
VictoriaMetrics.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
* rm extra interface method for rw Client, as it has low applicability
and doesn't fit multitenancy well
* add `GetDroppedRows` method instead
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fixed a small typo in a comment about the mutex inside the FastQueue
struct
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Trimming content which is loaded from an external pass leads to obscure
issues in case user-defined input contained trimmed chars. For example.
user-defined password "foo\n" will become "foo" while user will expect
it to contain a new line.
---
For example, a user defines a password which ends with `\n`. This often
happens when user Kubernetes secrets and manually encodes value as
base64-encoded string.
In this case vmauth configuration might look like:
```
users:
- url_prefix:
- http://vminsert:8480/insert/0/prometheus/api/v1/write
name: foo
username: foo
password: "foobar\n"
```
vmagent configuration for this setup will use the following flags:
```
-remoteWrite.url=http://vmauth:8427/
-remoteWrite.basicAuth.passwordFile=/tmp/vmagent-password
-remoteWrite.basicAuth.username="foo"
```
Where `/tmp/vmagent-password` is a file with `foobar\n` password.
Before this change such configuration will result in `401 Unauthorized`
response received by vmagent since after file content will become
`foobar`.
---
An example with Kubernetes operator which uses a secret to reference the
same password in multiple configurations.
<details>
<summary>See full manifests</summary>
`Secret`:
```
apiVersion: v1
data:
name: Zm9v # foo
password: Zm9vYmFy # foobar\n
username: Zm9v= # foo
kind: Secret
metadata:
name: vmuser
```
`VMUser`:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
name: vmagents
spec:
generatePassword: false
name: vmagents
targetRefs:
- crd:
kind: VMAgent
name: some-other-agent
namespace: example
username: foo
# note - the secret above is referenced to provide password
passwordRef:
name: vmagent
key: password
```
`VMAgent`:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
name: example
spec:
selectAllByDefault: true
scrapeInterval: 5s
replicaCount: 1
remoteWrite:
- url: "http://vmauth-vmauth-example:8427/api/v1/write"
# note - the secret above is referenced as well
basicAuth:
username:
name: vmagent
key: username
password:
name: vmagent
key: password
```
</details>
Since both config target exactly the same `Secret` object it is expected
to work, but apparently the result will be `401 Unauthrized` error.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
- Added a bar chart displaying the number of log entries over a time
range.
#6404
- When `_msg` is empty, all fields are displayed in a single line.
- Added double quotes when copying pairs: `key: "value"`.
- Minor style adjustments.
Check for ranged vector arguments in aggregate expressions when
`-search.disableImplicitConversion` or `-search.logImplicitConversion`
are enabled.
For example, `sum(up[5m])` will fail to execute if these flags are set.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [*] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Added config example for `min_dev_from_expected` arg; also, small
styling fixes and alignments
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
small fix of typos in v1.13 presets (vmanomaly docs)
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
* check if `lastValue` was seen at least twice with different
timestamps. Otherwise, the difference between last timestamp and
previous timestamp could be `0` and will result into `NaN` calculation
* check if there items left in lastValue map after staleness cleanup.
Otherwise, `rate_avg` could have produce `NaN` result.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously byte slices up to 2^20 bytes (e.g. 1Mb) were cached because of a typo in the commit c14dafce43 .
This could result in increased memory usage when vmagent scrapes many regular targets, which expose
relatively small number of metrics (e.g. up to a few thousand per target) and a few large targets such as kube-state-metrics,
which expose more than 10 thousand metrics. This is common case for Kubernetes monitoring.
While at it, remove pools for very small byte slices, since they are rarely used during scraping.
- Make it in a separate goroutine, so it doesn't slow down regular intern() calls.
- Do not lock internStringMap.mutableLock during the cleanup routine, since now
it is called from a single goroutine and reads only the readonly part of the internStringMap.
This should prevent from locking regular intern() calls for new strings during cleanups.
- Add jitter to the cleanup interval in order to prevent from synchornous increase in resource usage
during cleanups.
- Run the cleanup twice per -internStringCacheExpireDuration . This should save 30% CPU time spent
on cleanup comparing to the previous code, which was running the cleanup 3 times per -internStringCacheExpireDuration .
This change fixes the following panic:
```
2024-06-04T11:16:52.899Z warn app/vmauth/auth_config.go:353 cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally
panic: runtime error: integer divide by zero
goroutine 9 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1()
/Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58
panic({0x103115100?, 0x10338d700?})
/Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124
main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?)
/Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210
main.(*URLPrefix).getBackendURL(0x140000aa080)
/Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8
```
---------
Co-authored-by: Haley Wang <haley@victoriametrics.com>
### Describe Your Changes
Fix 404 relative img links for v1.13.0 update of vmanomaly docs
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Corrected spelling mistake in the operator json to be "working" instead
of "wokring"
### Checklist
The following checks are **mandatory**:
- [ x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
value→valid
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Lapo Luchini <lapo@lapo.it>
This should reduce GC overhead when tens of millions of strings are interned (for example, during stream deduplication
of millions of active time series).
Add support for markdown format and emoji for the `_msg` field in the
"Group" view.
Add markdown rendering toggle. Disabled by default. Value is stored in
`localStorage`.
These functions are called every time `/metrics` page is scraped, so it would be great
if they could be sped up for the cases when dedupAggr tracks tens of millions of active time series.
* adds idleConnTimeout flag, which must reduce probability of `broken
pipe` and `connection reset` errors.
* one-time retry trivial network requests for the same backend
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
- Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples().
This should reduce string allocation rate, since strings can be re-used between aggrState flushes.
- Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map.
- Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples().
- Add missing string interning at rateAggrState.pushSamples().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402
The main change is getting rid of interning of sample key. It was
discovered that for cases with many unique time series aggregated by
vmagent interned keys could grow up to hundreds of millions of objects.
This has negative impact on the following aspects:
1. It slows down garbage collection cycles, as GC has to scan all inuse
objects periodically. The higher is the number of inuse objects, the
longer it takes/the more CPU it takes.
2. It slows down the hot path of samples aggregation where each key
needs to be looked up in the map first.
The change makes code more fragile, but suppose to provide performance
optimization for heavy-loaded vmagents with stream aggregation enabled.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
Added streamaggr metrics to:
- `vm_streamaggr_samples_lag_seconds` - samples lag
- `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN
samples
- `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old
samples
Unsupported path is already handled by `lib/httpserver`.
This prevents from misleading errors in logs caused by double-writing response headers.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Scratch based images will be using a separate tag: "(version)-scratch"
and will be built for the same architecture as regular images.
This is useful for environments with higher security standards. In this
case using alpine as base layer requires updating images more frequently
in order to get the latest updates for the base image, even in case the
user did not need to update VictoriaMetrics version.
Tested that scratch images work for:
- vmagent - enterprise with kafka and opensource
- cluster
- single-node
No issues observed so far.
cc: @tenmozes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
* "*.idleConnTimeout" flags must reduce probability of `write: broken
pipe` and `read: connection reset by peer` errors Those errors may occur
if remote server closes TCP socket for connection, while it's still
exist at client.
* single time retries for `write: broken pipe` and `read: connection
reset by peer` must handle a case for incorrectly configured timeouts at
middleware proxies, mitigate minor network issues.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5661
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Update wording to highlight that cache is not persistent if flag is
value is empty. Previously, it was not clear if cache is not used at all
or just not persistent.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* It must reduce memory usage for misbehaving clients. Since
VictoriaMetrics stores sparse index inmemory.
* Reduce disk space usage for indexdb.
* Prevent possible indexDB items drops.
* It may trigger slow insert and new timeseries registration due to
default value for flag change
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This PR fixes an issue where parsing long `_msg` values caused errors,
resulting in some log records not being displayed.
The error occurred due to partial processing of strings. In some cases,
a long record could be split into multiple chunks, causing only part of
the record to be processed instead of the entire entry.
#6281
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should improve precision of `restarts` and `version change` annotations when
zooming-in/zooming-out on the dashboards.
The change also makes `restarts` dashboard visible on the panels, so user can disable it from
displaying if needed. This could be useful when restarts overlap with version change events.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Prevent excessive resource usage when stream aggregation config file
contains no matchers by prevent pushing data into Aggregators object.
Before this change a lot of extra work was invoked without reason.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Occasionally, vmagent sends empty blocks to downstream servers. If a
downstream server returns an unexpected response, vmagent gets stuck in
a retry loop. While vmagent handles 400 and 409 errors, there are
various prometheus remote write implementations that return different
error codes. For example, vector returns a 422 error. To mitigate the
risk of vmagent getting stuck in a retry loop, it is advisable to skip
sending empty blocks to downstream servers.
Co-authored-by: hao.peng <hao.peng@smartx.com>
Co-authored-by: Zhu Jiekun <jiekun.dev@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* adds datadog extensions for statsd:
- multiple packed values (v1.1)
- additional types distribution, histogram
* adds type check and append metric type to the labels with special tag
name `__statsd_metric_type__`. It simplifies streaming aggregation
config.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Allocations are reduced by implementing custom json parser via fastjson
lib.
The change also re-uses `promInstant` object in attempt to reduce number
of
allocations when parsing big responses, as usually happens with heavy
recording rules.
```
name old allocs/op new allocs/op delta
ParsePrometheusResponse/Instant-10 9.65k ± 0% 5.60k ± 0% ~ (p=1.000 n=1+1)
```
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Allocations are reduced by re-using the byte buffer when converting
labels to string keys.
```
name old allocs/op new allocs/op delta
GetStaleSeries-10 703 ± 0% 203 ± 0% ~ (p=1.000 n=1+1)
```
Signed-off-by: hagen1778 <roman@victoriametrics.com>
it's needed to remove Summary metric type from the global state of
metrics package. metrics package tracks each bucket of summary and
periodically swaps old buckets with new.
Simple set unregister is not enough to release memory used by Set
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.
This improves performance of these functions in hot loops of VictoriaLogs a bit.
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Added makefile rule for `GOARCH=loong64` to support building all
VictoriaMetrics components on the `loongarch64` platform.
### Checklist
The following checks are **mandatory**:
* [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: qiangxuhui <qiangxuhui@loongson.cn>
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Though labels compressor is quite resource intensive, each aggregator
and deduplicator instance has it's own compressor. Made it shared across
all aggregators to consume less resources while using multiple
aggregators.
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041
#### Added
- Added service discovery support for Vultr.
#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.
#### Note
- Useful links:
- Vultr API:
https://www.vultr.com/api/#tag/instances/operation/list-instances
- Vultr client SDK: https://github.com/vultr/govultr
- Prometheus SD:
https://github.com/prometheus/prometheus/tree/main/discovery/vultr
---
### Checklist
The following checks are mandatory:
- [X] I have read the [Contributing
Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md)
- [x] All commits are signed and include `Signed-off-by` line. Use `git
commit -s` to include `Signed-off-by` your commits. See this
[doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about
how to sign your commits.
- [x] Tests are passing locally. Use `make test` to run all tests
locally.
- [x] Linting is passing locally. Use `make check-all` to run all
linters locally.
Further checks are optional for External Contributions:
- [X] Include a link to the GitHub issue in the commit message, if issue
exists.
- [x] Mention the change in the
[Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
Explain what has changed and why. If there is a related issue or
documentation change - link them as well.
Tips for writing a good changelog message::
* Write a human-readable changelog message that describes the problem
and solution.
* Include a link to the issue or pull request in your changelog message.
* Use specific language identifying the fix, such as an error message,
metric name, or flag name.
* Provide a link to the relevant documentation for any new features you
add or modify.
- [ ] After your pull request is merged, please add a message to the
issue with instructions for how to test the fix or try the feature you
added. Here is an
[example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726)
- [x] Do not close the original issue before the change is released.
Please note, in some cases Github can automatically close the issue once
PR is merged. Re-open the issue in such case.
- [x] If the change somehow affects public interfaces (a new flag was
added or updated, or some behavior has changed) - add the corresponding
change to documentation.
Signed-off-by: Jiekun <jiekun.dev@gmail.com>
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.
---------
Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
The panel will show how many ongoing select queries are processed by vmstorage
and should help to identify resource bottlenecks. See panel description for more details.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new flag type is supposed to be used for specifying URL values which
could contain sensitive information such as auth tokens in GET params or
HTTP basic authentication.
The URL flag also allows loading its value from files if `file://`
prefix is specified. As example, the new flag type was used in
app/vmbackup as it requires specifying `authKey` param for making the
snapshot.
See related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973
Thanks to @wasim-nihal for initial implementation
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The cumulative number of active merges could be red herring
as it its value depends on the number of vmstorages.
For example, vmstorage could be added or removed and this will affect
the panel.
Or, each vmstorage could start a merging process (i.e. for downsampling)
and visiually it could look like a massive change.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140
We don't cover this corner case as it has low chance for reproduction.
Precisely, the requirements are following:
1. vmagent need to be configured with multiple identical `remoteWrite.url` flags;
2. At least one of the persistent queues need to be non-empty, which already
signalizes about issues with setup;
3. vmagent need to be restarted with removing of one of `remoteWrite.url` flags.
We do not document this case in vmagent.md as it seems to be a rare corner case
and its explanation will require too much of explanation and confuse users.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Starting from v1.99.0 vmalert could ignore anchors pointing to specific
rule groups if `search` param was present in URL.
This change makes anchors compatible with `search` param in UI.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Using `sh` or `console` formatting doesn't do word-breaking on render. This makes flags description
harder to read, as users need to scroll the web page horizontally.
Removing the formatting renders the description with normal word-breaking.
Stream aggregation may yield inaccurate results if it processes incomplete data.
This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent.
If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations.
To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues
will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations
will be correct.
It is better from visibility PoV if the CONTRIBUTING.md file is visible at https://docs.victoriametrics.com/contributing/ .
This page can be indexed by search engines and searched by our users later.
It is also easier to provide a link to this page now comparing to the old link, which is much harder to remember:
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md
While at it, syncrhonize docs/CONTRIBUTING.md with .github/PULL_REQUEST_TEMPLATE/pull_request_template.md ,
so they do not contain duplicate information, which can be outdated over time. Now all the relevant information
is located at docs/CONTRIBUTING.md, while .github/PULL_REQUEST_TEMPLATE/pull_request_template.md refers this doc.
This is a follow-up for c006db1798
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6040
Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage.
So it is better to use plain sync.Pool from readability and maintainability PoV.
This is a follow-up for 8942f290eb
The memory usage for plain sync.Pool doesn't increase comparing to the memory usage for the hybrid scheme,
so it is better to use plain sync.Pool in order to simplify the code and make it more readable and maintainable.
This is a follow-up for c22da2f917
Data ingestion benchmark doesn't show memory usage difference between two approaches,
so let's use simpler approach in order to improve code readability and maintainability.
This is a follow-up for 77c597738c
This scheme was used for reducing memory usage when vmagent runs on a machine with big number of CPU cores
and the ingestion rate isn't too big. The scheme with channel-based pool could reduce memory usage,
since it minimizes the number of PushCtx structs in the pool in this case.
Performance tests didn't reveal significant difference in memory usage under both low and high ingestion rate
between plain sync.Pool and the current hybrid scheme, so replace the scheme with plain sync.Pool in order
to simplify the code.
There was a sleep statement in the test, waiting for Group
to perform a couple of evaluation. But looks like
it worked unreliable for some CI tests like the one below
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/8718213844/job/23915007958?pr=6115
This commit changes the sleep statement on a function that
waits for a specific number of evaluations. It should make this
test faster in general case, and more reliable for slow environemnts.
It is incorrect applying the limit on the number of values to search without applying filters,
since the returned subset of label values may miss the label values matching the given filters.
This is a follow-up for 66630c7960
This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case:
/api/v1/label/__name__/values?match[]={__name__=~"*.some_value.*"}
When the user types `some_value` in the query input field.
This simplifies further maintenance and opens doors for additional config options
supported by lib/promauth. For example, an ability to specify client TLS certificates.
- Use exact matching by default for the query arg value provided via arg=value syntax at src_query_args.
Regex matching can be enabled by using =~ instead of = . For example, arg=~regex.
This ensures that the exact matching works as expected without the need to escape special regex chars.
- Add helper functions for creating QueryArg, Header and Regex structs in tests.
This improves maintainability of the tests.
- Remove url.QueryUnescape() call on the url in TestCreateTargetURLSuccess(), since this is bogus approach.
The url.QueryUnescape() must be applied to individual query args, and it mustn't be applied to the whole url,
since in this case it may perform invalid unescaping in the context of the url, or make the resulting url invalid.
While at it, properly marshal all the fields inside UserInfo config to yaml in tests.
Previously Header and QueryArg structs were improperly marshaled because the custom MarshalYAML
is called only on pointers to Header and QueryArg structs. This improves test coverage.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6070
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6115
If one out of 5 vmstorage nodes is unavailable, then the remaining 4 vmstorage nodes will recieve 1/5=20% increase of workload, not 5%.
If one out of 10 vmstorage nodes is unavailable, then the remaining 9 vmstorage nodes will receive 1/10=10% increase of workload, not 1%.
This is a follow-up for 458338afa5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6099
* app/vmauth: do not increment backend_errors when hitting concurrency limit
Previously, both "vmauth_concurrent_requests_limit_reached_total" and "vmauth_user_request_backend_errors_total" were incremented.
This was based on the assumption that if concurrency limit is hit the backend must be failing to handle the request thus meaning an error.
This assumption does not work in case the endpoint can be overloaded by the misbehaving client sending too many requests within the timeframe.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This should improve debuggability of unexpected deletion of directories inside partitions.
While at it, log the proper path to parts.json when the directory for big part is missing in the partition.
parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.
* docs/cluster: update cluster resizing info
- add example of resources distribution
- add info about how to handle uneven disk usage after adding a new storage node
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* vmui: fix parsing of fractional values
* vmui/vmanomaly: update display logic to align with vmanomaly /query_range API
* vmui/vmanomaly: rename flag anomalyView to isAnomalyView
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
* lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk
Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change.
Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`.
This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: update NewRoundTripper API
Update API to allow user to update only parameters required for transport.
Add warning log when reloading Root CA failed.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: fix mutex acquire logic
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: replace RWMutex with regular mutex to simplify the code
- remove additional mutex used for getRootCABytes - require callee to use mutex
- replace RWMutex with regular mutex
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: refactor
- hold the mutex lock to avoid round tripper being re-created twice
- move recreation logic into separate func to simplify the code
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
- Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming
- Clarify the description of the change at docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens
- Properly split metric names at '_' char in promerlabel.SplitMetricNameToTokens.
- Add tests for various edge cases for Prometheus metric names' normalization
according to the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go)
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
- Make the configuration more clear by accepting the list of ignored labels during sharding
via a dedicated command-line flag - -remoteWrite.shardByURL.ignoreLabels.
This prevents from overloading the meaning of -remoteWrite.shardByURL.labels command-line flag.
- Removed superfluous memory allocation per each processed sample if sharding by remote storage is enabled.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5938
Remove description for -search.maxExportDuration and -search.maxStatusRequestDuration command-line flags
from the 'Resource usage limits' chapter, since these flags are rarely used for limiting resource usage
and they are already documented in the 'List of command-line flags' chapter.
- Fix docs for new functions at app/vmselect/graphite/functions.json
- Properly drain series lists on errors in aggregateSeriesListsGeneric() and aggregateSeriesList()
- Add links to docs for the added functions at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5809
- Allow specifying only a single HTTP header for reading auth tokens via -httpAuthHeader command-line flag.
This is better from security PoV, since this prevents from accidental reading of auth token from undesired
HTTP header. By default the -httpAuthHeader equals to Authorization. When it is overridden, then
auth token isn't read from Authorization header - it is read only from the specified header.
- Document the -httpAuthHeader command-line flag at https://docs.victoriametrics.com/vmauth/#reading-auth-tokens-from-other-http-headers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6009
* docs/vmanomaly: v1.12.0 & link updates
* add autotuned description to model section
* - update refs of vmanomaly on enterprise and vmalert pages
- add diagrams for model types
- update self-monitoring section
* - fix typos
- remove .index.html from links
This reverts commit cb23685681.
Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders
before their use. This may complicate detecting and fixing such bugs in the future.
There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 :
- To configure the OS to do not drop data from the system-wide temporary directory (aka /tmp).
- To run VictoriaMetrics with -cacheDataPath command-line flag, which points to the directory,
which cannot be removed automatically by the OS.
The case when the user accidentally deletes the directory with some files created by VictoriaMetrics
shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically.
It is much better from operation and debuggability PoV is to crash with the clear `directory doesn't exist` error
in this case.
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().
While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.
This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
Custom HTTP headers are set via net/http.Header.Set or net/http.Header.Add functions.
These functions always convert header keys to canonical form. So the change at b577413d3b
isn't visible to users of VictoriaMetrics components.
There is no need in documenting this change at docs/CHANGELOG.md, since it doesn't give any useful information to users.
This is a follow-up for e6dd52b04c
* lib/storage: add ability to use downsampling for the given series filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add information about downsampling filters
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: fix MetricsQL filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: treat missing downsampling filter as a bug
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/part_header: verify correctness of downsampling filters when opening partition
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: save only appliable rules in part metadata
Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: update log messages for final dedup
Properly specify a reason of re-running deduplication for partition.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules
Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Follow-up
- Apply the first matching downsampling period if multiple filters match the given time series.
This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled
These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* rm recommendation to keep look-behind window empty, as it is not correct
* mention the change of default value for `-search.latencyOffset`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
During shutdown period of vmalert, remotewrite client retrieve all pending time series from buffer queue, compose them into 1 batch and execute remote write.
This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others).
This changes ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write
destination.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025
To horizontally scale streaming aggregation, you might want to deploy a separate hashing tier
of vmagents that route to a separate aggregation tier. The hashing tier should shard by all labels
except the instance-level labels, to ensure the input metrics are routed correctly to the aggregator
instance responsible for those labels.
For this to achieve we introduce `remoteWrite.shardByURL.inverseLabels` flag to inverse logic of `remoteWrite.shardByURL.labels`
---------
Co-authored-by: Eugene Ma <eugene.ma@airbnb.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* vmalert: fix sending alert messages
1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered;
2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile.
* docs: update changelog notes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Store the deadline when the metricID entries must be deleted from indexdb
if metricID->metricName entry isn't found after the deadline. This should
make the code more clear comparing the the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.
Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
* lib/storage/table: properly wait for force merges to be completed during shutdown
Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add changelog entry
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
vmselect uses a cache folder in file system for two purposes:
1. Storing rollup cache results on shutdown;
2. Storing temporary search results from vmstorage during query executions.
It could happen that cache folder is deleted accidentally by user, or by OS
during cleanup routines. This would cause vmselect to:
1. panic on /metrics call, because `MustGetFreeSpace` will fail;
2. return query error user, as it won't be able to store temporary search results.
The changes in this commit are the following:
1. Make `MustGetFreeSpace` to try re-creating the cache folder if it is missing;
2. Make vmselect to try re-creating the cache folder if it can't persist tmp search
results.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* [vmagent] added ingestion rate limiting with new flag `-maxIngestionRate`. This flag can be used to limit the number of samples ingested by vmagent per second. If the limit is exceeded, the ingestion rate will be throttled.
* fix changelog
* fix review comment
When `--vm-native-step-interval` is specified, explore phase will be executed
within specified intervals. Discovered metric names will be associated with
time intervals at which they were discovered. This suppose to reduce number
of requests vmctl makes per metric name since it will skip time intervals
when metric name didn't exist.
This should also reduce probability of exceeding complexity limits
for number of selected series in one request during explore phase.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5369
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Links with upper-case simply don't work for unknown reason.
Once the reason is fixed on docs side, this commit can be reverted.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It is confusing for cluster users to find that VMUI is available at vmselect as it is only mentioned in the list of URLs. Explicit mention of vmselect URL in docs will make it easier to discover.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
This is a follow-up for 2d31fd7855
It is assumed that URLPrefix.busOriginal will be initialized
durin Unmarshal of the config. But in tests we set fields manually,
so this field never get initialized properly.
Fixes the error `panic: runtime error: integer divide by zero`
at `vmauth.getLeastLoadedBackendURL`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
If tlsServerName isn't empty, then it is likely the https request is sent to IP instead of hostname.
In this case the request will fail, since Go automatically sets the Host header to the IP instead
of the desired hostname at tlsServerName. So set the Host header to tlsServerName if itsn't empty.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5802
The `match[]` filter is mandatory at /api/v1/series, so it mustn't be dropped here.
There is no sense in dropping `match[]` filter together with `extra_label` and `extra_filters[]`
at /api/v1/labels and /api/v1/label/.../values if -search.ignoreExtraFiltersAtLabelsAPI commnad-line flag is set,
since:
- the `match[]` filter triggers slow path at these APIs;
- the `extra_label` and `extra_filters[]` filters narrow down the number of matched time series,
so they improve performance comparing to the case when only `match[]` filter is left,
while `extra_label` and `extra_filters[]` filters are dropped.
This is a follow-up for 0b7a23a91d
Improved trace display for better visual separation of branches:
* Increased left padding for each element
* Added padding for the last element in the branch
The change also removes misleading `default` value from README for `maxConcurrentInserts`
cmd-line flag.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reverts commit eb40395a1c.
Reason for revert: it has been appeared that the performance gain on multiple CPU cores
wasn't visible because the benchmark was generating incorrect pushSample.key.
See a207e0bf687d65f5198207477248d70c69284296
Previously samples were dropped on the first incomplete interval and the next complete interval.
Also make sure that the de-duplication is performed just before flushing the aggregate state.
This should help the case then dedup_interval = interval.
* use docs.victoriametrics.com instead of github docs
* add links to common terms used in VictoriaMetrics
Signed-off-by: hagen1778 <roman@victoriametrics.com>
For example, if `interval: 1m`, then data flush occurs at the end of every minute,
while `interval: 1h` leads to data flush at the end of every hour.
Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.
The labelsMap struct employs the fact that label indexes are condensed around 0,
so it stores the referred labels in a slice instead of map and uses slice index as label key.
This allows increasing the LabelsCompressor.Decompress performance by up to 3x.
This also reduces the latency of data flush in stream aggregation.
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
The match[] at /api/v1/labels and /api/v1/label/.../values also may lead to slow requests and
high resource usage if it matches big number of time series. So it must be igrnored if -search.ignoreExtraFiltersAtLabelsAPI
command-line flag is set.
This is a follow-up for fab02faa3f
* app/{vmagent,vminsert}: adds /v1/metrics suffix for opentelemetry route path
it must fix compatibility with opentemetry-collector [spec](https://opentelemetry.io/docs/specs/otlp/\#otlphttp-request)
this suffix is hard-coded and cannot be changed with collector configuration
* Apply suggestions from code review
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* metricsql: fix label_join() when `dst_label` is equal to one of the `src_label`
* Update app/vmselect/promql/transform.go
* Update docs/CHANGELOG.md
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md
- Simplify parsing Firehose data. There is no need in trying to optimize the parsing with fastjson
and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering.
It is better to write clear code for better maintanability in the future.
- Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose,
since it is used only by opentelemetry parser.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893
* deployment: create a separate env for VictoriaLogs
The new environment consists of the following components:
* VictoriaLogs
* fluentbit for collecting logs and sending to VictoriaLogs
* VictoriaMetrics for scraping and storing metrics from fluentbit and VictoriaLogs
* Grafana with VictoriaLogs datasource for monitoring
-----------------
The motivation for creating a separate environment is to simplify existing environments
and make it easier to update or modify them in future.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Prevsiously they were swapped - the first arg should be the label name and the second arg should be label filters
This is a follow-up for e389b7b959e8144fdff5075bf7a5a39b2b0c6dd3
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
This solves two issues:
1. The vm_backups_uploaded_bytes_total metric will grow more smoothly
2. This prevents from int overflow at metrics.Counter.Add() when uploading files bigger than 2GiB
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.
See ea9e2b19a5
The issue has been introduced in bace9a2501
The improper fix was in the d4c0615dcd ,
since it fixed the issue just by an accident, because Go comiler aligned the rawRowsShards field
by 4-byte boundary inside partition struct.
The proper fix is to use atomic.Int64 field - this guarantees that the access to this field
won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
It has been appeared that there are VictoriaMetrics users, who rely on the fact that
VictoriaMetrics components were closing incoming connections to -httpListenAddr every 2 minutes
by default. So let's return back this value by default in order to fix the breaking change
made at d8c1db7953 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1961891450 .
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling).
The issue has been introduced in 431aa16c8d
This commit returns back limits for these endpoints, which have been removed at 5d66ee88bd ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
* vmui: add a time picker to the "Logs Explorer" page #5673
* Update app/vmui/packages/vmui/src/pages/ExploreLogs/hooks/useFetchLogs.ts
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Do not convert shard items to part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan smaller number of parts.
* app/vmselect: adds milliseconds to the csv export response for rfc3339
* milliseconds is a standard prescion for VictoriaMetrics query request responses
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5837
* app/victoria-metrics: adds tests for csv export/import
follow-up after 3541a8d0cf96dd4f8563624c4aab6816615d0756
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Previously the interval between item addition and its conversion to searchable in-memory part
could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp()
to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking
the time when buffered items must be converted to searchable in-memory parts, since time.Now()
calls aren't located in hot paths.
Increase the flush interval for converting buffered samples to searchable in-memory parts
from one second to two seconds. This should reduce the number of blocks, which are needed
to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage.
While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal
data ingestion pefromance according to load tests. This reduces memory usage and CPU usage on systems
with big amounts of RAM under high data ingestion rate.
The pooled rawRowsBlock objects occupies big amounts of memory between flushes,
and the flushes are relatively rare. So it is better to don't use the pool
and to allocate rawRow blocks on demand. This should reduce the average
memory usage between flushes.
The buffer can be quite big under high ingestion rate (e.g. more than 100MB).
This leads to increased memory usage between buffer flushes.
So it is better to re-create the buffer on every flush in order to reduce memory usage
between buffer flushes.
This should improve maintainability of the code related to rollup functions,
since it is located in rollup.go
While at it, properly return empty results from holt_winters(), rate_over_sum(),
sum2_over_time(), geomean_over_time() and distinct_over_time() when there are no real samples
on the selected lookbehind window. Previously the previous sample value was mistakenly
returned from these functions.
* [lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
* fixed tests
* fixed test
* Revert "fixed test"
This reverts commit 8a29764806.
* Revert "fixed tests"
This reverts commit 9ce13d1042.
* Revert "[lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)"
This reverts commit a7a04bd4
* [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
Currently the docker-compose examples for loading `victoriametrics-datasource` uses 2 environment variables:
- `GF_ALLOW_LOADING_UNSIGNED_PLUGINS`
- `GF_DEFAULT_APP_MODE`
I believe both of the env vars are trying to achieve the same thing. `GF_DEFAULT_APP_MODE` disables code signing for all plugins and `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` intends to disable code signing for just `victoriametrics-datasource`.
Keeping the scope narrowed to just `victoriametrics-datasource` would be preferable in this case.
Unfortunately `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` is misspelled. According to [grafana docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables), the format is supposed to be `GF_<SectionName>_<KeyName>`. In other words the current env var is missing the section name.
This PR proposes to:
1. fix the typo
2. remove the global disablement of code signing
Alternatively, if you prefer to keep codesigning disabled globally, please remove `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` env var as it confuses things
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog
b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
* - apply v1.10 changes
- chapter on model types (uni/multivariate and rolling)
* - update self-monitoring labels description
- fix typos
* fix duplicated text and link rendering
* app/vmauth: properly release memory during config reload
previously metrics package hold a refrence for channels for users concurrent requests.
it case of churn at `name` field of users configuration, new metric was created. But previous one wasn't deleted.
It prevented full parsed configuration from being garbace collected.
now all config related metrics are bound to corresponding metrics.Set and unregistered during config reload process.
It also must fix an issue with incorrect values for current concurrent user requests
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Instead, log a sample of these long items once per 5 seconds into error log,
so users could notice and fix the issue with too long labels or too many labels.
Previously this panic could occur in production when ingesting samples with too long labels.
The 3c246cdf00 added an optimization where the previous metaindexRow
could be saved to disk when the current block header couldn't be added indexBlock because the resulting
indexBlock size became too big. This could result in an empty metaindexRow.firstItem for the next metaindexRow.
This allows removing importing unneeded command-line flags into binaries, which import lib/storage,
which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions.
This is a follow-up for 83e55456e2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
Closing client connections every 2 minutes doesn't help load balancing -
this just leads to "jumpy" connections between multiple backend servers,
e.g. the load isn't spread evenly among backend servers, and instead jumps
between the servers every 2 minutes.
It is still possible periodically closing client connections by specifying non-zero -http.connTimeout command-line flag.
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037
This is a follow-up for d387da142e
Sync description after changes in 83e55456e2
Remove extra text in flags description for vmbackupmanager as it breaks layout
when rendered at docs.victoriametrics.com
Signed-off-by: hagen1778 <roman@victoriametrics.com>
There is no sense in storing commonPrefix for blockHeader containing only a single item,
since this only increases blockHeader size without any benefits.
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent
Signed-off-by: hagen1778 <roman@victoriametrics.com>
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than
rate(vm_rows_added_to_storage_total) and it could last quite some time,
which causes negative values of Storage full ETA and confuses users, see playground.
Instead of trying to get more accurate results during downsampling, I think it's ok to ignore
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.
The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values:
- the first and the last one are set to the default value for `-fooDuration`
- the second one is set to 10s
* vmui: fix select closing on click outside (#5728)
* vmui: clear entered text in select after selecting a value (#5727)
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should significantly reduce the number of open ReaderAt files
on VictoriaMetrics and VictoriaLogs startup.
The open files can be tracked via vm_fs_readers metric
GOGC can be already set via environment variable. There is no need in adding
new approaches for setting the GOGC (such as command-line flag), since they complicate operations.
It should help identifying cases when too much CPU is spent on garbage collection,
and advice users on how this can be addressed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Remove temporary file before closing it in order to signal the OS that it shouldn't
store the file contents from page cache to disk when the file is closed.
Gracefully handle the case when the file cannot be removed before being closed -
in this case remove the file after closing it. This allows working on Windows.
Also remove superflouos opening of temporary file for reading - re-use already opened file handle for writing.
This is a follow-up for 9b1e002287
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
The easyproto-based marshaler is 2x slower than the previous custom marshaler,
so let's stick with it. This improves the performance for sending data to remote storage at vmagent
and reduces CPU usage to pre-v1.97.0 levels.
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
* app/vmselect: set proper timestamp for cached instant responses
The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.
The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.
This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
This limit has little sense for these APIs, since:
- Thses APIs frequently result in scanning of all the time series on the given time range.
For example, if extra_filters={datacenter="some_dc"} .
- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.
Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:
- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
* port un-synced changed from docs/readme to readme
* consistently use `sh` instead of `console` highlight, as it looks like
a more appropriate syntax highlight
* consistently use `sh` instead of `bash`, as it is shorter
* consistently use `yaml` instead of `yml`
See syntax codes here https://gohugo.io/content-management/syntax-highlighting/
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Optimize the performance of data merge: decimal.CalibrateScale() from 49633 ns/op to 9146 ns/op
* Optimize the performance of data merge: decimal.CalibrateScale()
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: fix data race during hot-config reload
During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.
The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Revert "app/vmalert: fix data race during hot-config reload"
This reverts commit a4bb7e8932.
* app/vmalert: fix data race during hot-config reload
During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.
The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
Previously a shared pool was used for merging all the part types.
A single merge worker could merge parts with mixed types at once. For example,
it could merge simultaneously an in-memory part plus a big file part.
Such a merge could take hours for big file part. During the duration of this merge
the in-memory part was pinned in memory and couldn't be persisted to disk
under the configured -inmemoryDataFlushInterval .
Another common issue, which could happen when parts with mixed types are merged,
is uncontrolled growth of in-memory parts or small parts when all the merge workers
were busy with merging big files. Such growth could lead to significant performance
degradataion for queries, since every query needs to check ever growing list of parts.
This could also slow down the registration of new time series, since VictoriaMetrics
searches for the internal series_id in the indexdb for every new time series.
The third issue is graceful shutdown duration, which could be very long when a background
merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
since it merges in-memory parts.
A separate pool of merge workers per every part type elegantly resolves both issues:
- In-memory parts are merged to file-based parts in a timely manner, since the maximum
size of in-memory parts is limited.
- Long-running merges for big parts do not block merges for in-memory parts and small parts.
- Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
Merging for file parts is instantly canceled on graceful shutdown now.
- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
should automatically self-tune according to the number of available CPU cores.
- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge
- Tune the number of shards for pending rows and items before the data goes to in-memory parts
and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
for registration of new time series. This should reduce the duration of data ingestion slowdown
in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
unavailable.
- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.
This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
* docs: remove raw and endraw tags as they are not needed for the new version of site
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
* revert formating in vmaler
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
The maxFileParts usage has been accidentally removed in fa566c68a6
While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.
The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.
The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.
The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.
The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.
The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
Measuring read / write duration from / to in-memory buffers has little sense,
since it will be always fast. It is better to measure read / write duration from / to
real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics.
This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls
per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers.
This is a follow-up for 2f63dec2e3
* lib/promscrape: respect `0` value for `series_limit` param
Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.
This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update docs/CHANGELOG.md
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* vmui: fix the logic of closing the popper #5470
* vmui: fix the logic of caching autocomplete results #5472
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.
This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.
Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):
- The interval will be much smaller than 10 seconds under high churn rate.
This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
This should reduce the uneeded work on merging of mutable cache into immutable cache.
The change removes artificial delay before returning error, which sometimes
caused less retry events than expected.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice
Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.
* remove mislead comment
* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640
* wip
* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls
groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.
Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.
P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.
* typo fix
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Examples:
1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url
The flag value is automatically updated when the file contents changes.
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()
* fix test
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
It is likely this value was borrowed from panel `Slow inserts` panel
from Grafana dasbhoard for VM single/cluster installations. This is a mistake.
There is no such thing as "tolerable churn rate", as tolerancy depends on the amount
of allocated resources.
Although, it is unclear what is meant by 5%. If it refers to 5% of new time series per second,
then such churn rate is extremely high. It would mean that the avg life of a time series is 20s.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
app/victoriametrics: update the test suite
* simplify /query and /query_range test cases configuration and tests
* support instant queries with lookbehind window like `query=foo[5m]`
* support instant queries selecting scalar value like `query=42`
* add query_range test for prometheus
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmui/vmanomaly: add support models that produce only `anomaly_score`
* vmui/vmanomaly: fix display legend
* vmui/vmanomaly: update comment on anomaly threshold
- Clarify the bugfix description at docs/CHANGELOG.md
- Simplify the code by accessing prefetchedMetricIDs struct under the lock
instead of using lockless access to immutable struct.
This shouldn't worsen code scalability too much on busy systems with many CPU cores,
since the code executed under the lock is quite small and fast.
This allows removing cloning of prefetchedMetricIDs struct every time
new metric names are pre-fetched. This should reduce load on Go GC,
since the cloning of uin64set.Set struct allocates many new objects.
Add docs to explain that `latest` backup folder can be accessed
more frequently than others. Hence, it is recommended to keep it
on storages with low access price.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, this cache was limited only by size.
Cache invalidation by time happens with jitter to prevent thundering herd problem.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It was calculating the number of dropped time series instead of the number of dropped samples.
While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated -
at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating
the size of snappy-compressed prompbmarshal.WriteRequest protobuf message.
Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.
* update link to book a demo for AD & RCA
* fix invalid refs in components/model
* - fix staging -> prod links
- replace capitalized FAQ headers
- change the section order on main page
- replace :latest tag with current stable
* add panels for detailed visualization of traffic usage between vmstorage, vminsert, vmselect
components and their clients. New panels are available in the rows dedicated to specific components.
* update "Slow Queries" panel to show percentage of the slow queries to the total number of read queries
served by vmselect. The percentage value should make it more clear for users whether there is a service degradation.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmselect: drop `rollupDefault` function as duplicate
It is unclear why there are two identical fns `rollupDefault`
and `rollupDistinct`. Dropping one of them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmselect/promql/rollup.go
* Update app/vmselect/promql/rollup.go
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The user may which to control the endpoint parameters for instance to
set the audience when requesting an access token. Exposing the
parameters as a map allows for additional use cases without requiring
modification.
* lib/awsapi: properly assume role with webIdentity token
introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token.
First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials.
Common use case for it - cross AWS accounts authorization
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Some clients may ingest samples via OpenTelemetry protocol without Resource labels.
Previously VictoriaMetrics was silently dropping such samples.
The commit 317834f876 added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"}
counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459
It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.
Before, retries happened only on writes into a network connection
between source and destination. But errors returned by server after
all the data was transmitted were logged, but not retried.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding metric should simplify debugging and make it improve debuggability.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
app/vmselect/promql/eval.go:evalAggrFunc shunts evaluation
of AggrFuncExpr over rollupFunc over MetricsExpr to an optimized
path. tryGetArgRollupFuncWithMetricExpr() checks whether expression
can be shunted, but it mangles the AggrFuncExpr when the aggregation
function has more than one argument. This results in queries like
`sum(aggr_over_time("avg_over_time",m))` failing with error message
'expecting at least 2 args to "aggr_over_time"; got 1 args' while
the analogous query `sum(avg_over_time(m))` executes successfully.
This fix removes the unnecessary mangling.
Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>
Previously `retry_status_codes: []` and `drop_src_path_prefix_parts: 0` at `url_map` were equivalent to missing values.
This was resulting in using the user-level values instead.
* vmui: add show quick tip for autocomplete
* vmui: auto-completion usability improvements #5348
* vmui: add const for min symbols in autocomplete
* Use proper queries to VictoriaMetrics
* vmui: fix comments for autocomplete
* app/vmselect: run `make vmui-update`
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* fix typos in docs
* add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This is follow-up after
75196d7234
It updates some of the alerting rules to remove unnecessary aggregations.
It keeps aggregations for expressions which are using multiple time series
filters to make sure their label will match.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Aggregations with by() have one sideeffect, that any custom labels you add to hosts are dropped too which can be used for alerts routing.
Therefore, some good practice could be to use without() instead, with labels, like without(path) , or without(url) to get same aggregations but with any external labels left intact.
Before, vmalert would send notifications with labels containing characters
not supported by Alertmanager validator, resulting into validation errors
like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions
While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
* vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers
* add extra panel for new metric
- Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags.
- Wait until message processing goroutines are stopped before returning from gcppubsub.Stop().
- Prevent from multiple calls to Init() without Stop().
- Drop message if tenantID cannot be parsed properly.
- Take into account tenantID for all the supported message formats.
- Support gzip-compressed messages for graphite format.
- Use exponential backoff sleep when the message cannot be pushed to remote storage systems
because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence
- Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called.
- Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub
- Mention Google PubSub support at docs/enterprise.md
- Make Google PubSub docs more clear at docs/vmagent.md
This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984
and e6eab781ce42285a6a1750dc01eba6801dd35516 .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format
* app/vmalert: hide updates if query param not set
* app/vmalert: fix panic (recursion call)
* app/vmalert: add needed group name and file name
* app/vmalert: fix comment, update behavior
* app/vmalert: fix description
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
E.g. replace `fs.Dir + filePath` with `path.Join(fs.Dir, filePath)`
The fs.Dir is guaranteed to end with slash - see Init() functions.
The filePath may start with slash. If it starts with slash, then `fs.Dir + filePath` constructs
an incorrect path with double slashes.
path.Join() properly substitutes duplicate slashes with a single slash in this case.
While at it, also substitute incorrect usage of filepath.Join() with path.Join()
for constructing paths to object storage systems, which expect forward slashes in paths.
filepath.Join() substittues forward slashes with backslashes on Windows, so this may break
creating or managing backups from Windows.
This is a follow-up for 0399367be602b577baf6a872ca81bf0f99ba401b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/719
* lib/backup/s3remote: remove prev object versions for recursive delete
- fix error caused by sending empty objects list to be deleted. This was possible in case old versions of objects where deleted, but root-level entries where still available. This caused paginator to return an empty page which wasn't skipped.
- delete previous versions of objects recursively for S3 remote
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/changelog: add vmbackupmanager fix entry
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/backup/s3remote: unify path construction for S3 objects
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests
command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages,
since `vmui` and these pages couldn't be opened on overloaded vmselect.
Thanks to @f41gh7 for the idea.
Previously the /service-discovery page didn't show targets dropped because of sharding
( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).
Show also the reason why every target is dropped at /service-discovery page.
This should improve debuging why particular targets are dropped.
While at it, do not remove dropped targets from the list at /service-discovery page
until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets .
Previously the list was cleaned up every 10 minutes from the entries, which weren't updated
for the last minute. This could complicate debugging of dropped targets.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
* lib/streamaggr: properly reference slice with labels
by limiting slice capacity. It must fix issues with slice modification, in case of append new slice will be allocated, instead of modifying refrenced slice
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402
* Reduce memory allocations when output_relabel_configs adds new labels to output samples
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* prevent /api/v1 from panic on parsing rows
* add tests for Extract function for v1 and v2 api's
* separate request types in different pools to prevent different objects mixing
* add changelog line
543f218fe9
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
for the result returned from these functions.
- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.
- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.
- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
cannot keep up with the data ingestion rate.
- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.
- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
data to persistent queue when -remoteWrite.disableOnDiskQueue is set.
- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
because they are replaced with vmagent_remotewrite_samples_dropped_total metric.
- Update 'Disabling on-disk persistence' docs at docs/vmagent.md
- Update stale comments in the code
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.
New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.
Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
* adds test, updates readme
* apply review suggestions
* update docs for vmagent
* makes linter happy
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Tests showed that importing a single line with 70MB size takes 5.3GiB
RSS memory for VictoriaMetrics single-node.
In the scenario when user exports and imports data from one VM to another,
it could possibly lead to OOM exception for destination VM.
Importing a single line with 16MB size taks 1.3GiB RSS memory.
Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB
to improve reliability of VictoriaMetrics during imports.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The default caching for Go artifacts from actions/setup-go@v4 uses the hash of go.sum file
as a cache key - see https://github.com/actions/cache/blob/main/examples.md#go---modules .
This isn't enough for VictoriaMetrics case, since different makefile actions build different the Go artifacts,
which need to be cached. So embed the action name in the cache key.
This case is possible after the following steps:
1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data.
2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk.
3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression.
4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data.
5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy.
The solution is the same as 12cd32fd75, just fall back to zstd decompression if Snappy decompression fails.
Previously the number of memory allocations inside copyTimeseriesShallow() was equal to 1+len(tss)
Reduce this number to 2 by pre-allocating a slice of timeseries structs with len(tss) length.
The `path!="/favicon.ico"` filter has little sense, since there are many other special paths,
which may be filtered out - /metrics, /flags, /health, /ping, /robots.txt, /-/healthy, /-/ready, /reload, etc.
See /lib/httpserver/httpserver.go for more details.
It will be hard or impossible to maintain filters for all these paths, so it is better to drop this filter
in order to simplify queries and improve the consistency of these queries.
The buffered connection could have exceeded the underlying connection
deadline during reading or writing to an internal buffer.
With this change, buffered connection struct additionally checks
for a deadline in Read/Write methods.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:
- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
For example, quantile(0.9, rate({__name__=~"foo|bar"})
In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.
Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
`version` label won't show the difference if various flavors of the same
version were deployed. But `short_version` will.
For example, on the sandbox env we test VM builds before new version release.
Without this change, the version update won't be visible on dashboard.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/querytracer: makes package concurrent safe to use
it must fix various issues with concurrent code usage.
Especially, when it's not reasonable to wait for all goroutines to be finished
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
then the optimization can be applied.
- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
then the optimization can be applied.
* vmui: reduced the number of server requests
* run `make vmui-update vmui-logs-update`
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This case is possible after the following steps:
1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether
the remote storage supports zstd-compressed data.
2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression
for the data sent to the remote storage.
3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk.
4. The remote storage becomes available.
5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data.
6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage,
while falsely advertizing it sends zstd-compressed data.
7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd.
The solution is to just fall back to Snappy decompression if zstd decompression fails.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
We have failed test on master branch.
```
--- FAIL: TestFormatLogMessage (0.00s)
logger_test.go:24: unexpected result; got
"foo: abcde, \"foo bar baz\", xx"
want
"foo: a..e, \"f..z\", xx"
```
if failed because maxArgs maxLen <= 4 in the `LimitStringLen` in that case we always will return the income string
but in the test we limit the maxLen by value 4
```
f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)
Previously the `Host` header was remained unchanged when passing it in requests to backends.
This may improperly work if the backend uses host-based routing.
While at it, allows http/2.0 requests to backends. While VictoriaMetrics components
do not accept http/2.0 requests, other backends can require such requests.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
- Re-use identically configured http.Transport across multiple users.
This fixes handling of the limit on the number of connection, which can be established per each backend
via -maxIdleConnsPerBackend command-line flag. This limit stopped working after 323f3720ed
- Add docs about backend TLS setup at https://docs.victoriametrics.com/vmauth.html#backend-tls-setup
- Add ability to disable backend TLS verification for all the users via -backend.tlsInsecureSkipVerify command-line flag.
This flag may be useful when -auth.config contains big number of users, and every user must disable backend TLS verification.
- Add ability to specify TLS Root CA via tls_ca_file option at per-user basis and via -backend.tlsCAFile command-line flag
across all the users.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
Previously entries which were accessed only 1 time weren't cached.
It has been appeared that some rarely executed heavy queries may read indexdb block twice
in a row instead of once. There is no need in caching such a block then.
This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed.
Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning
the number of cache misses needed before storing the block in the caching.
* app/vmalert: update remote-write process
* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update docs/CHANGELOG.md
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
reduce the number of queries for restoring alerts state on start-up.
The change should speed up the restore process and reduce pressure on `remoteRead.url`.
* vmauth: add browser authorization request for http requests without credentials to a route that is not in the `unauthorized_user` section (when `unauthorized_user` is specified).
* add link to issue in CHANGELOG
* Extend vmauth docs
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reduction is based on production testing.
Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
vmalert expects string value for stats.seriesFetched, so it is impossible
switching to number without breaking compatibility with old vmalert releases :(
It is still unclear why stats.seriesFetched has string type in the first place...
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:
- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()
The basic idea of optimization is to calculate
rf(m[d] @ t)
as
rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))
where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously
The offset may be in the range of up to 1 hour.
The SECURITY label should be applied only to changes, which fix security issues.
The change at ad839aa492 adds new command-line flags, which can be used
for improving security in some cases. They do not fix any security issues.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5111
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new rule for vmalert supposed to detect groups that miss their
evaulations due to slow queries.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The panel `Errors rate to Alertmanager` had `group` label filter
applied to the expression, while the metric `vmalert_alerts_send_errors_total`
doesn't have that label. This resulted into always empty results.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
fix possible missing firing states for alerting rules in replay mode
Before if one firing stage is bigger than single query request range, like rule with a big `for`, alerting rule won't able to be detected as firing.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options`
HTTP headers in all VictoriaMetrics components.
The values for headers can be specified by users via the following flags:
`-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file
and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file.
Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files
without the need to restart vmagent.
This is a follow-up for 90427abc65
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243
Previously url watchers for pod, service and node objects could be mistakenly closed
when service discovery was set up only for endpoints and endpointslice roles,
since watchers for these roles may start start pod, service and node url watchers
with nil apiWatcher passed to groupWatcher.startWatchersForRole().
Now all the url watchers, which belong to a particular groupWatcher, are stopped at once
when this groupWatcher has no apiWatcher subscribers.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216
The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
* do not print redundant error logs when failed to scrape consul or nomad target
prometheus performs the same because it uses consul lib which just drops the error(1806bcb38c/api/api.go (L1134))
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
don't prevent from processing the corresponding scrape targets after the file becomes correct,
without the need to restart vmagent.
Previously scrape targets with invalid TLS CA file or TLS client certificate files
were permanently dropped after the first attempt to initialize them, and they didn't
appear until the next vmagent reload or the next change in other places of the loaded scrape configs.
- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
Previously the old TLS CA was used until vmagent restart.
- Properly handle errors during http request creation for the second attempt to send data to remote system
at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
since the returned request is nil on error.
- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
Previously it could miss details on the source of the request.
- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
of every http request issued by vmagent during service discovery or target scraping.
Re-use the HTTP client instead until the corresponding scrape config changes.
- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
when auth header cannot be obtained because of temporary error.
- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
when scraping big number of targets with the same tls_config.
- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.
- Improve test coverage at lib/promauth
- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
Previously vmagent was exitting in this case.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
It could return either `failed to read` or `failed to parse` errors depending
on whether the given url can be loaded or not under the current environment
c9375cac5e
Descriptions were updated in attempt to make it more clear for readers,
re-phrasing and linking missing docs.
`eval_delay` was added to tests to verify it can be unmarshalled.
`eval_delay` is now applied before timestamp alignment to make it more predictable.
Before, if delay < interval the timestamp won't be aligned.
`eval_delay` and `eval_offset` was added to API output.
`PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Using `min_over_time` should reduce the amount of false positives when
component is running in near-the-threshold state. Now it should trigger
only if all collected samples were above the threshold on 10m interval.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before filling a bug report it would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
Before filling a bug report it would be great to [upgrade](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html) first.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/victoriametrics/troubleshooting/) first.
- type:textarea
id:describe-the-bug
attributes:
@@ -60,12 +60,12 @@ body:
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
*[Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
*[Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
*[Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229)
*[Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
See how to setup monitoring here:
*[monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
*[monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
*[monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
*[monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
VictoriaMetrics is a fast, cost-saving, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.
## Folder Structure
-`/app`: Contains the compilable binaries.
-`/lib`: Contains the golang reusable libraries
-`/docs/victoriametrics`: Contains documentation for the project.
-`/apptest/tests`: Contains integration tests.
## Libraries and Frameworks
- Backend: Golang, no framework. Use third-party libraries sparingly.
- Frontend: React.
## Code review guidelines
Ensure the feature or bugfix includes a changelog entry in /docs/victoriametrics/changelog/CHANGELOG.md.
Verify the entry is under the ## tip section and matches the structure and style of existing entries.
Chore-only changes may be omitted from the changelog.
Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development goals](https://docs.victoriametrics.com/victoriametrics/goals/).
which govulncheck || go install golang.org/x/vuln/cmd/govulncheck@latest
remove-govulncheck:
rm -rf `which govulncheck`
install-wwhrd:
which wwhrd || go install github.com/frapposelli/wwhrd@latest
check-licenses:install-wwhrd
wwhrd check -f .wwhrd.yml
copy-docs:
# The 'printf' function is used instead of 'echo' or 'echo -e' to handle line breaks (e.g. '\n') in the same way on different operating systems (MacOS/Ubuntu Linux/Arch Linux) and their shells (bash/sh/zsh/fish).
# For details, see https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4548#issue-1782796419 and https://stackoverflow.com/questions/8467424/echo-newline-in-bash-prints-literal-n
httpListenAddr=flag.String("httpListenAddr",":8428","TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flag.Bool("httpListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","TCP addresses to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol=flagutil.NewArrayBool("httpListenAddr.useProxyProtocol","Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval=flag.Duration("dedup.minScrapeInterval",0,"Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
"equal to -dedup.minScrapeInterval > 0. See also -streamAggr.dedupInterval and https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication")
dryRun=flag.Bool("dryRun",false,"Whether to check config files without running VictoriaMetrics. The following config files are checked: "+
"-promscrape.config, -relabelConfig and -streamAggr.config. Unknown config entries aren't allowed in -promscrape.config by default. "+
"This can be changed with -promscrape.config.strictParse=false command-line flag")
@@ -39,16 +39,30 @@ var (
"The saved data survives unclean shutdowns such as OOM crash, hardware reset, SIGKILL, etc. "+
"Bigger intervals may help increase the lifetime of flash storage with limited write cycles (e.g. Raspberry PI). "+
"Smaller intervals increase disk IO load. Minimum supported value is 1s")
maxIngestionRate=flag.Int("maxIngestionRate",0,"The maximum number of samples vmsingle can receive per second. Data ingestion is paused when the limit is exceeded. "+
"By default there are no limits on samples ingestion rate.")
finalDedupScheduleInterval=flag.Duration("storage.finalDedupScheduleCheckInterval",time.Hour,"The interval for checking when final deduplication process should be started."+
"Storage unconditionally adds 25% jitter to the interval value on each check evaluation."+
" Changing the interval to the bigger values may delay downsampling, deduplication for historical data."+
" See also https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication")
)
funcmain(){
// VictoriaMetrics is optimized for reduced memory allocations,
// so it can run with the reduced GOGC in order to reduce the used memory,
// while keeping CPU usage spent in GC at low levels.
//
// Some workloads may need increased GOGC values. Then such values can be set via GOGC environment variable.
// It is recommended increasing GOGC if go_memstats_gc_cpu_fraction metric exposed at /metrics page
// exceeds 0.05 for extended periods of time.
cgroup.SetGOGC(30)
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage=usage
envflag.Parse()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
ifpromscrape.IsDryRun(){
*dryRun=true
@@ -67,36 +81,49 @@ func main() {
return
}
logger.Infof("starting VictoriaMetrics at %q...",*httpListenAddr)
listenAddrs:=*httpListenAddrs
iflen(listenAddrs)==0{
listenAddrs=[]string{":8428"}
}
logger.Infof("starting VictoriaMetrics at %q...",listenAddrs)
maxConcurrentRequests=flag.Int("search.maxConcurrentRequests",getDefaultMaxConcurrentRequests(),"The maximum number of concurrent search requests. "+
"It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. "+
"See also -search.maxQueueDuration")
maxQueueDuration=flag.Duration("search.maxQueueDuration",10*time.Second,"The maximum time the search request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
maxQueryDuration=flag.Duration("search.maxQueryDuration",time.Second*30,"The maximum duration for query execution")
)
funcgetDefaultMaxConcurrentRequests()int{
n:=cgroup.AvailableCPUs()
ifn<=4{
n*=2
}
ifn>16{
// A single request can saturate all the CPU cores, so there is no sense
// in allowing higher number of concurrent requests - they will just contend
<!doctype html><htmllang="en"><head><metacharset="utf-8"/><linkrel="icon"href="./favicon.ico"/><metaname="viewport"content="width=device-width,initial-scale=1,maximum-scale=5"/><metaname="theme-color"content="#000000"/><metaname="description"content="UI for VictoriaMetrics"/><linkrel="apple-touch-icon"href="./apple-touch-icon.png"/><linkrel="icon"type="image/png"sizes="32x32"href="./favicon-32x32.png"><linkrel="manifest"href="./manifest.json"/><title>VM UI</title><scriptsrc="./dashboards/index.js"type="module"></script><metaname="twitter:card"content="summary_large_image"><metaname="twitter:image"content="./preview.jpg"><metaname="twitter:title"content="UI for VictoriaMetrics"><metaname="twitter:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaname="twitter:site"content="@VictoriaMetrics"><metaproperty="og:title"content="Metric explorer for VictoriaMetrics"><metaproperty="og:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaproperty="og:image"content="./preview.jpg"><metaproperty="og:type"content="website"><scriptdefer="defer"src="./static/js/main.02178f4b.js"></script><linkhref="./static/css/main.9a224445.css"rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><divid="root"></div></body></html>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.