The match[] at /api/v1/labels and /api/v1/label/.../values also may lead to slow requests and
high resource usage if it matches big number of time series. So it must be igrnored if -search.ignoreExtraFiltersAtLabelsAPI
command-line flag is set.
This is a follow-up for fab02faa3f
* app/{vmagent,vminsert}: adds /v1/metrics suffix for opentelemetry route path
it must fix compatibility with opentemetry-collector [spec](https://opentelemetry.io/docs/specs/otlp/\#otlphttp-request)
this suffix is hard-coded and cannot be changed with collector configuration
* Apply suggestions from code review
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* metricsql: fix label_join() when `dst_label` is equal to one of the `src_label`
* Update app/vmselect/promql/transform.go
* Update docs/CHANGELOG.md
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md
- Simplify parsing Firehose data. There is no need in trying to optimize the parsing with fastjson
and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering.
It is better to write clear code for better maintanability in the future.
- Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose,
since it is used only by opentelemetry parser.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893
* deployment: create a separate env for VictoriaLogs
The new environment consists of the following components:
* VictoriaLogs
* fluentbit for collecting logs and sending to VictoriaLogs
* VictoriaMetrics for scraping and storing metrics from fluentbit and VictoriaLogs
* Grafana with VictoriaLogs datasource for monitoring
-----------------
The motivation for creating a separate environment is to simplify existing environments
and make it easier to update or modify them in future.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Prevsiously they were swapped - the first arg should be the label name and the second arg should be label filters
This is a follow-up for e389b7b959e8144fdff5075bf7a5a39b2b0c6dd3
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
This solves two issues:
1. The vm_backups_uploaded_bytes_total metric will grow more smoothly
2. This prevents from int overflow at metrics.Counter.Add() when uploading files bigger than 2GiB
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.
See ea9e2b19a5
The issue has been introduced in bace9a2501
The improper fix was in the d4c0615dcd ,
since it fixed the issue just by an accident, because Go comiler aligned the rawRowsShards field
by 4-byte boundary inside partition struct.
The proper fix is to use atomic.Int64 field - this guarantees that the access to this field
won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
It has been appeared that there are VictoriaMetrics users, who rely on the fact that
VictoriaMetrics components were closing incoming connections to -httpListenAddr every 2 minutes
by default. So let's return back this value by default in order to fix the breaking change
made at d8c1db7953 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1961891450 .
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling).
The issue has been introduced in 431aa16c8d
This commit returns back limits for these endpoints, which have been removed at 5d66ee88bd ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
* vmui: add a time picker to the "Logs Explorer" page #5673
* Update app/vmui/packages/vmui/src/pages/ExploreLogs/hooks/useFetchLogs.ts
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Do not convert shard items to part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan smaller number of parts.
* app/vmselect: adds milliseconds to the csv export response for rfc3339
* milliseconds is a standard prescion for VictoriaMetrics query request responses
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5837
* app/victoria-metrics: adds tests for csv export/import
follow-up after 3541a8d0cf96dd4f8563624c4aab6816615d0756
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Previously the interval between item addition and its conversion to searchable in-memory part
could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp()
to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking
the time when buffered items must be converted to searchable in-memory parts, since time.Now()
calls aren't located in hot paths.
Increase the flush interval for converting buffered samples to searchable in-memory parts
from one second to two seconds. This should reduce the number of blocks, which are needed
to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage.
While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal
data ingestion pefromance according to load tests. This reduces memory usage and CPU usage on systems
with big amounts of RAM under high data ingestion rate.
The pooled rawRowsBlock objects occupies big amounts of memory between flushes,
and the flushes are relatively rare. So it is better to don't use the pool
and to allocate rawRow blocks on demand. This should reduce the average
memory usage between flushes.
The buffer can be quite big under high ingestion rate (e.g. more than 100MB).
This leads to increased memory usage between buffer flushes.
So it is better to re-create the buffer on every flush in order to reduce memory usage
between buffer flushes.
This should improve maintainability of the code related to rollup functions,
since it is located in rollup.go
While at it, properly return empty results from holt_winters(), rate_over_sum(),
sum2_over_time(), geomean_over_time() and distinct_over_time() when there are no real samples
on the selected lookbehind window. Previously the previous sample value was mistakenly
returned from these functions.
* [lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
* fixed tests
* fixed test
* Revert "fixed test"
This reverts commit 8a29764806.
* Revert "fixed tests"
This reverts commit 9ce13d1042.
* Revert "[lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)"
This reverts commit a7a04bd4
* [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
Currently the docker-compose examples for loading `victoriametrics-datasource` uses 2 environment variables:
- `GF_ALLOW_LOADING_UNSIGNED_PLUGINS`
- `GF_DEFAULT_APP_MODE`
I believe both of the env vars are trying to achieve the same thing. `GF_DEFAULT_APP_MODE` disables code signing for all plugins and `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` intends to disable code signing for just `victoriametrics-datasource`.
Keeping the scope narrowed to just `victoriametrics-datasource` would be preferable in this case.
Unfortunately `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` is misspelled. According to [grafana docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables), the format is supposed to be `GF_<SectionName>_<KeyName>`. In other words the current env var is missing the section name.
This PR proposes to:
1. fix the typo
2. remove the global disablement of code signing
Alternatively, if you prefer to keep codesigning disabled globally, please remove `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` env var as it confuses things
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog
b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
* - apply v1.10 changes
- chapter on model types (uni/multivariate and rolling)
* - update self-monitoring labels description
- fix typos
* fix duplicated text and link rendering
* app/vmauth: properly release memory during config reload
previously metrics package hold a refrence for channels for users concurrent requests.
it case of churn at `name` field of users configuration, new metric was created. But previous one wasn't deleted.
It prevented full parsed configuration from being garbace collected.
now all config related metrics are bound to corresponding metrics.Set and unregistered during config reload process.
It also must fix an issue with incorrect values for current concurrent user requests
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Instead, log a sample of these long items once per 5 seconds into error log,
so users could notice and fix the issue with too long labels or too many labels.
Previously this panic could occur in production when ingesting samples with too long labels.
The 3c246cdf00 added an optimization where the previous metaindexRow
could be saved to disk when the current block header couldn't be added indexBlock because the resulting
indexBlock size became too big. This could result in an empty metaindexRow.firstItem for the next metaindexRow.
This allows removing importing unneeded command-line flags into binaries, which import lib/storage,
which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions.
This is a follow-up for 83e55456e2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
Closing client connections every 2 minutes doesn't help load balancing -
this just leads to "jumpy" connections between multiple backend servers,
e.g. the load isn't spread evenly among backend servers, and instead jumps
between the servers every 2 minutes.
It is still possible periodically closing client connections by specifying non-zero -http.connTimeout command-line flag.
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037
This is a follow-up for d387da142e
Sync description after changes in 83e55456e2
Remove extra text in flags description for vmbackupmanager as it breaks layout
when rendered at docs.victoriametrics.com
Signed-off-by: hagen1778 <roman@victoriametrics.com>
There is no sense in storing commonPrefix for blockHeader containing only a single item,
since this only increases blockHeader size without any benefits.
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent
Signed-off-by: hagen1778 <roman@victoriametrics.com>
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than
rate(vm_rows_added_to_storage_total) and it could last quite some time,
which causes negative values of Storage full ETA and confuses users, see playground.
Instead of trying to get more accurate results during downsampling, I think it's ok to ignore
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.
The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values:
- the first and the last one are set to the default value for `-fooDuration`
- the second one is set to 10s
* vmui: fix select closing on click outside (#5728)
* vmui: clear entered text in select after selecting a value (#5727)
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should significantly reduce the number of open ReaderAt files
on VictoriaMetrics and VictoriaLogs startup.
The open files can be tracked via vm_fs_readers metric
GOGC can be already set via environment variable. There is no need in adding
new approaches for setting the GOGC (such as command-line flag), since they complicate operations.
It should help identifying cases when too much CPU is spent on garbage collection,
and advice users on how this can be addressed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Remove temporary file before closing it in order to signal the OS that it shouldn't
store the file contents from page cache to disk when the file is closed.
Gracefully handle the case when the file cannot be removed before being closed -
in this case remove the file after closing it. This allows working on Windows.
Also remove superflouos opening of temporary file for reading - re-use already opened file handle for writing.
This is a follow-up for 9b1e002287
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
The easyproto-based marshaler is 2x slower than the previous custom marshaler,
so let's stick with it. This improves the performance for sending data to remote storage at vmagent
and reduces CPU usage to pre-v1.97.0 levels.
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
* app/vmselect: set proper timestamp for cached instant responses
The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.
The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.
This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
This limit has little sense for these APIs, since:
- Thses APIs frequently result in scanning of all the time series on the given time range.
For example, if extra_filters={datacenter="some_dc"} .
- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.
Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:
- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
* port un-synced changed from docs/readme to readme
* consistently use `sh` instead of `console` highlight, as it looks like
a more appropriate syntax highlight
* consistently use `sh` instead of `bash`, as it is shorter
* consistently use `yaml` instead of `yml`
See syntax codes here https://gohugo.io/content-management/syntax-highlighting/
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Optimize the performance of data merge: decimal.CalibrateScale() from 49633 ns/op to 9146 ns/op
* Optimize the performance of data merge: decimal.CalibrateScale()
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: fix data race during hot-config reload
During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.
The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Revert "app/vmalert: fix data race during hot-config reload"
This reverts commit a4bb7e8932.
* app/vmalert: fix data race during hot-config reload
During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.
The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
Previously a shared pool was used for merging all the part types.
A single merge worker could merge parts with mixed types at once. For example,
it could merge simultaneously an in-memory part plus a big file part.
Such a merge could take hours for big file part. During the duration of this merge
the in-memory part was pinned in memory and couldn't be persisted to disk
under the configured -inmemoryDataFlushInterval .
Another common issue, which could happen when parts with mixed types are merged,
is uncontrolled growth of in-memory parts or small parts when all the merge workers
were busy with merging big files. Such growth could lead to significant performance
degradataion for queries, since every query needs to check ever growing list of parts.
This could also slow down the registration of new time series, since VictoriaMetrics
searches for the internal series_id in the indexdb for every new time series.
The third issue is graceful shutdown duration, which could be very long when a background
merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
since it merges in-memory parts.
A separate pool of merge workers per every part type elegantly resolves both issues:
- In-memory parts are merged to file-based parts in a timely manner, since the maximum
size of in-memory parts is limited.
- Long-running merges for big parts do not block merges for in-memory parts and small parts.
- Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
Merging for file parts is instantly canceled on graceful shutdown now.
- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
should automatically self-tune according to the number of available CPU cores.
- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge
- Tune the number of shards for pending rows and items before the data goes to in-memory parts
and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
for registration of new time series. This should reduce the duration of data ingestion slowdown
in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
unavailable.
- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.
This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
* docs: remove raw and endraw tags as they are not needed for the new version of site
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
* revert formating in vmaler
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
---------
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
The maxFileParts usage has been accidentally removed in fa566c68a6
While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.
The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.
The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.
The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.
The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.
The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
Measuring read / write duration from / to in-memory buffers has little sense,
since it will be always fast. It is better to measure read / write duration from / to
real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics.
This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls
per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers.
This is a follow-up for 2f63dec2e3
* lib/promscrape: respect `0` value for `series_limit` param
Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.
This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update docs/CHANGELOG.md
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* vmui: fix the logic of closing the popper #5470
* vmui: fix the logic of caching autocomplete results #5472
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.
This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.
Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):
- The interval will be much smaller than 10 seconds under high churn rate.
This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
This should reduce the uneeded work on merging of mutable cache into immutable cache.
The change removes artificial delay before returning error, which sometimes
caused less retry events than expected.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice
Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.
* remove mislead comment
* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640
* wip
* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls
groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.
Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.
P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.
* typo fix
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Examples:
1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url
The flag value is automatically updated when the file contents changes.
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()
* fix test
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
It is likely this value was borrowed from panel `Slow inserts` panel
from Grafana dasbhoard for VM single/cluster installations. This is a mistake.
There is no such thing as "tolerable churn rate", as tolerancy depends on the amount
of allocated resources.
Although, it is unclear what is meant by 5%. If it refers to 5% of new time series per second,
then such churn rate is extremely high. It would mean that the avg life of a time series is 20s.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
app/victoriametrics: update the test suite
* simplify /query and /query_range test cases configuration and tests
* support instant queries with lookbehind window like `query=foo[5m]`
* support instant queries selecting scalar value like `query=42`
* add query_range test for prometheus
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmui/vmanomaly: add support models that produce only `anomaly_score`
* vmui/vmanomaly: fix display legend
* vmui/vmanomaly: update comment on anomaly threshold
- Clarify the bugfix description at docs/CHANGELOG.md
- Simplify the code by accessing prefetchedMetricIDs struct under the lock
instead of using lockless access to immutable struct.
This shouldn't worsen code scalability too much on busy systems with many CPU cores,
since the code executed under the lock is quite small and fast.
This allows removing cloning of prefetchedMetricIDs struct every time
new metric names are pre-fetched. This should reduce load on Go GC,
since the cloning of uin64set.Set struct allocates many new objects.
Add docs to explain that `latest` backup folder can be accessed
more frequently than others. Hence, it is recommended to keep it
on storages with low access price.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, this cache was limited only by size.
Cache invalidation by time happens with jitter to prevent thundering herd problem.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It was calculating the number of dropped time series instead of the number of dropped samples.
While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated -
at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating
the size of snappy-compressed prompbmarshal.WriteRequest protobuf message.
Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.
* update link to book a demo for AD & RCA
* fix invalid refs in components/model
* - fix staging -> prod links
- replace capitalized FAQ headers
- change the section order on main page
- replace :latest tag with current stable
* add panels for detailed visualization of traffic usage between vmstorage, vminsert, vmselect
components and their clients. New panels are available in the rows dedicated to specific components.
* update "Slow Queries" panel to show percentage of the slow queries to the total number of read queries
served by vmselect. The percentage value should make it more clear for users whether there is a service degradation.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmselect: drop `rollupDefault` function as duplicate
It is unclear why there are two identical fns `rollupDefault`
and `rollupDistinct`. Dropping one of them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmselect/promql/rollup.go
* Update app/vmselect/promql/rollup.go
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The user may which to control the endpoint parameters for instance to
set the audience when requesting an access token. Exposing the
parameters as a map allows for additional use cases without requiring
modification.
* lib/awsapi: properly assume role with webIdentity token
introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token.
First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials.
Common use case for it - cross AWS accounts authorization
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Some clients may ingest samples via OpenTelemetry protocol without Resource labels.
Previously VictoriaMetrics was silently dropping such samples.
The commit 317834f876 added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"}
counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459
It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.
Before, retries happened only on writes into a network connection
between source and destination. But errors returned by server after
all the data was transmitted were logged, but not retried.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding metric should simplify debugging and make it improve debuggability.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
app/vmselect/promql/eval.go:evalAggrFunc shunts evaluation
of AggrFuncExpr over rollupFunc over MetricsExpr to an optimized
path. tryGetArgRollupFuncWithMetricExpr() checks whether expression
can be shunted, but it mangles the AggrFuncExpr when the aggregation
function has more than one argument. This results in queries like
`sum(aggr_over_time("avg_over_time",m))` failing with error message
'expecting at least 2 args to "aggr_over_time"; got 1 args' while
the analogous query `sum(avg_over_time(m))` executes successfully.
This fix removes the unnecessary mangling.
Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>
Previously `retry_status_codes: []` and `drop_src_path_prefix_parts: 0` at `url_map` were equivalent to missing values.
This was resulting in using the user-level values instead.
* vmui: add show quick tip for autocomplete
* vmui: auto-completion usability improvements #5348
* vmui: add const for min symbols in autocomplete
* Use proper queries to VictoriaMetrics
* vmui: fix comments for autocomplete
* app/vmselect: run `make vmui-update`
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* fix typos in docs
* add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This is follow-up after
75196d7234
It updates some of the alerting rules to remove unnecessary aggregations.
It keeps aggregations for expressions which are using multiple time series
filters to make sure their label will match.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Aggregations with by() have one sideeffect, that any custom labels you add to hosts are dropped too which can be used for alerts routing.
Therefore, some good practice could be to use without() instead, with labels, like without(path) , or without(url) to get same aggregations but with any external labels left intact.
Before, vmalert would send notifications with labels containing characters
not supported by Alertmanager validator, resulting into validation errors
like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions
While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
* vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers
* add extra panel for new metric
- Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags.
- Wait until message processing goroutines are stopped before returning from gcppubsub.Stop().
- Prevent from multiple calls to Init() without Stop().
- Drop message if tenantID cannot be parsed properly.
- Take into account tenantID for all the supported message formats.
- Support gzip-compressed messages for graphite format.
- Use exponential backoff sleep when the message cannot be pushed to remote storage systems
because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence
- Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called.
- Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub
- Mention Google PubSub support at docs/enterprise.md
- Make Google PubSub docs more clear at docs/vmagent.md
This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984
and e6eab781ce42285a6a1750dc01eba6801dd35516 .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format
* app/vmalert: hide updates if query param not set
* app/vmalert: fix panic (recursion call)
* app/vmalert: add needed group name and file name
* app/vmalert: fix comment, update behavior
* app/vmalert: fix description
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: simplify API for /api/v1/rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
E.g. replace `fs.Dir + filePath` with `path.Join(fs.Dir, filePath)`
The fs.Dir is guaranteed to end with slash - see Init() functions.
The filePath may start with slash. If it starts with slash, then `fs.Dir + filePath` constructs
an incorrect path with double slashes.
path.Join() properly substitutes duplicate slashes with a single slash in this case.
While at it, also substitute incorrect usage of filepath.Join() with path.Join()
for constructing paths to object storage systems, which expect forward slashes in paths.
filepath.Join() substittues forward slashes with backslashes on Windows, so this may break
creating or managing backups from Windows.
This is a follow-up for 0399367be602b577baf6a872ca81bf0f99ba401b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/719
* lib/backup/s3remote: remove prev object versions for recursive delete
- fix error caused by sending empty objects list to be deleted. This was possible in case old versions of objects where deleted, but root-level entries where still available. This caused paginator to return an empty page which wasn't skipped.
- delete previous versions of objects recursively for S3 remote
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/changelog: add vmbackupmanager fix entry
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/backup/s3remote: unify path construction for S3 objects
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests
command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages,
since `vmui` and these pages couldn't be opened on overloaded vmselect.
Thanks to @f41gh7 for the idea.
Previously the /service-discovery page didn't show targets dropped because of sharding
( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).
Show also the reason why every target is dropped at /service-discovery page.
This should improve debuging why particular targets are dropped.
While at it, do not remove dropped targets from the list at /service-discovery page
until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets .
Previously the list was cleaned up every 10 minutes from the entries, which weren't updated
for the last minute. This could complicate debugging of dropped targets.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
* lib/streamaggr: properly reference slice with labels
by limiting slice capacity. It must fix issues with slice modification, in case of append new slice will be allocated, instead of modifying refrenced slice
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402
* Reduce memory allocations when output_relabel_configs adds new labels to output samples
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* prevent /api/v1 from panic on parsing rows
* add tests for Extract function for v1 and v2 api's
* separate request types in different pools to prevent different objects mixing
* add changelog line
543f218fe9
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
for the result returned from these functions.
- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.
- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.
- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
cannot keep up with the data ingestion rate.
- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.
- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
data to persistent queue when -remoteWrite.disableOnDiskQueue is set.
- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
because they are replaced with vmagent_remotewrite_samples_dropped_total metric.
- Update 'Disabling on-disk persistence' docs at docs/vmagent.md
- Update stale comments in the code
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.
New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.
Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
* adds test, updates readme
* apply review suggestions
* update docs for vmagent
* makes linter happy
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Tests showed that importing a single line with 70MB size takes 5.3GiB
RSS memory for VictoriaMetrics single-node.
In the scenario when user exports and imports data from one VM to another,
it could possibly lead to OOM exception for destination VM.
Importing a single line with 16MB size taks 1.3GiB RSS memory.
Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB
to improve reliability of VictoriaMetrics during imports.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The default caching for Go artifacts from actions/setup-go@v4 uses the hash of go.sum file
as a cache key - see https://github.com/actions/cache/blob/main/examples.md#go---modules .
This isn't enough for VictoriaMetrics case, since different makefile actions build different the Go artifacts,
which need to be cached. So embed the action name in the cache key.
This case is possible after the following steps:
1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data.
2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk.
3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression.
4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data.
5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy.
The solution is the same as 12cd32fd75, just fall back to zstd decompression if Snappy decompression fails.
Previously the number of memory allocations inside copyTimeseriesShallow() was equal to 1+len(tss)
Reduce this number to 2 by pre-allocating a slice of timeseries structs with len(tss) length.
The `path!="/favicon.ico"` filter has little sense, since there are many other special paths,
which may be filtered out - /metrics, /flags, /health, /ping, /robots.txt, /-/healthy, /-/ready, /reload, etc.
See /lib/httpserver/httpserver.go for more details.
It will be hard or impossible to maintain filters for all these paths, so it is better to drop this filter
in order to simplify queries and improve the consistency of these queries.
The buffered connection could have exceeded the underlying connection
deadline during reading or writing to an internal buffer.
With this change, buffered connection struct additionally checks
for a deadline in Read/Write methods.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:
- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
For example, quantile(0.9, rate({__name__=~"foo|bar"})
In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.
Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
`version` label won't show the difference if various flavors of the same
version were deployed. But `short_version` will.
For example, on the sandbox env we test VM builds before new version release.
Without this change, the version update won't be visible on dashboard.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/querytracer: makes package concurrent safe to use
it must fix various issues with concurrent code usage.
Especially, when it's not reasonable to wait for all goroutines to be finished
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
then the optimization can be applied.
- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
then the optimization can be applied.
* vmui: reduced the number of server requests
* run `make vmui-update vmui-logs-update`
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This case is possible after the following steps:
1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether
the remote storage supports zstd-compressed data.
2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression
for the data sent to the remote storage.
3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk.
4. The remote storage becomes available.
5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data.
6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage,
while falsely advertizing it sends zstd-compressed data.
7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd.
The solution is to just fall back to Snappy decompression if zstd decompression fails.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
We have failed test on master branch.
```
--- FAIL: TestFormatLogMessage (0.00s)
logger_test.go:24: unexpected result; got
"foo: abcde, \"foo bar baz\", xx"
want
"foo: a..e, \"f..z\", xx"
```
if failed because maxArgs maxLen <= 4 in the `LimitStringLen` in that case we always will return the income string
but in the test we limit the maxLen by value 4
```
f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)
Previously the `Host` header was remained unchanged when passing it in requests to backends.
This may improperly work if the backend uses host-based routing.
While at it, allows http/2.0 requests to backends. While VictoriaMetrics components
do not accept http/2.0 requests, other backends can require such requests.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
- Re-use identically configured http.Transport across multiple users.
This fixes handling of the limit on the number of connection, which can be established per each backend
via -maxIdleConnsPerBackend command-line flag. This limit stopped working after 323f3720ed
- Add docs about backend TLS setup at https://docs.victoriametrics.com/vmauth.html#backend-tls-setup
- Add ability to disable backend TLS verification for all the users via -backend.tlsInsecureSkipVerify command-line flag.
This flag may be useful when -auth.config contains big number of users, and every user must disable backend TLS verification.
- Add ability to specify TLS Root CA via tls_ca_file option at per-user basis and via -backend.tlsCAFile command-line flag
across all the users.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
Previously entries which were accessed only 1 time weren't cached.
It has been appeared that some rarely executed heavy queries may read indexdb block twice
in a row instead of once. There is no need in caching such a block then.
This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed.
Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning
the number of cache misses needed before storing the block in the caching.
* app/vmalert: update remote-write process
* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update docs/CHANGELOG.md
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
reduce the number of queries for restoring alerts state on start-up.
The change should speed up the restore process and reduce pressure on `remoteRead.url`.
* vmauth: add browser authorization request for http requests without credentials to a route that is not in the `unauthorized_user` section (when `unauthorized_user` is specified).
* add link to issue in CHANGELOG
* Extend vmauth docs
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reduction is based on production testing.
Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
vmalert expects string value for stats.seriesFetched, so it is impossible
switching to number without breaking compatibility with old vmalert releases :(
It is still unclear why stats.seriesFetched has string type in the first place...
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:
- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()
The basic idea of optimization is to calculate
rf(m[d] @ t)
as
rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))
where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously
The offset may be in the range of up to 1 hour.
The SECURITY label should be applied only to changes, which fix security issues.
The change at ad839aa492 adds new command-line flags, which can be used
for improving security in some cases. They do not fix any security issues.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5111
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new rule for vmalert supposed to detect groups that miss their
evaulations due to slow queries.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The panel `Errors rate to Alertmanager` had `group` label filter
applied to the expression, while the metric `vmalert_alerts_send_errors_total`
doesn't have that label. This resulted into always empty results.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
fix possible missing firing states for alerting rules in replay mode
Before if one firing stage is bigger than single query request range, like rule with a big `for`, alerting rule won't able to be detected as firing.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options`
HTTP headers in all VictoriaMetrics components.
The values for headers can be specified by users via the following flags:
`-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file
and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file.
Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files
without the need to restart vmagent.
This is a follow-up for 90427abc65
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243
Previously url watchers for pod, service and node objects could be mistakenly closed
when service discovery was set up only for endpoints and endpointslice roles,
since watchers for these roles may start start pod, service and node url watchers
with nil apiWatcher passed to groupWatcher.startWatchersForRole().
Now all the url watchers, which belong to a particular groupWatcher, are stopped at once
when this groupWatcher has no apiWatcher subscribers.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216
The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
* do not print redundant error logs when failed to scrape consul or nomad target
prometheus performs the same because it uses consul lib which just drops the error(1806bcb38c/api/api.go (L1134))
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
don't prevent from processing the corresponding scrape targets after the file becomes correct,
without the need to restart vmagent.
Previously scrape targets with invalid TLS CA file or TLS client certificate files
were permanently dropped after the first attempt to initialize them, and they didn't
appear until the next vmagent reload or the next change in other places of the loaded scrape configs.
- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
Previously the old TLS CA was used until vmagent restart.
- Properly handle errors during http request creation for the second attempt to send data to remote system
at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
since the returned request is nil on error.
- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
Previously it could miss details on the source of the request.
- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
of every http request issued by vmagent during service discovery or target scraping.
Re-use the HTTP client instead until the corresponding scrape config changes.
- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
when auth header cannot be obtained because of temporary error.
- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
when scraping big number of targets with the same tls_config.
- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.
- Improve test coverage at lib/promauth
- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
Previously vmagent was exitting in this case.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
It could return either `failed to read` or `failed to parse` errors depending
on whether the given url can be loaded or not under the current environment
c9375cac5e
Descriptions were updated in attempt to make it more clear for readers,
re-phrasing and linking missing docs.
`eval_delay` was added to tests to verify it can be unmarshalled.
`eval_delay` is now applied before timestamp alignment to make it more predictable.
Before, if delay < interval the timestamp won't be aligned.
`eval_delay` and `eval_offset` was added to API output.
`PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Using `min_over_time` should reduce the amount of false positives when
component is running in near-the-threshold state. Now it should trigger
only if all collected samples were above the threshold on 10m interval.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
httpListenAddr=flag.String("httpListenAddr",":9428","TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flag.Bool("httpListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","TCP address to listen for incoming http requests. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flagutil.NewArrayBool("httpListenAddr.useProxyProtocol","Whether to use proxy protocol for connections accepted at the given -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
gogc=flag.Int("gogc",100,"GOGC to use. See https://tip.golang.org/doc/gc-guide")
)
funcmain(){
@@ -34,27 +32,31 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage=usage
envflag.Parse()
cgroup.SetGOGC(*gogc)
buildinfo.Init()
logger.Init()
pushmetrics.Init()
logger.Infof("starting VictoriaLogs at %q...",*httpListenAddr)
listenAddrs:=*httpListenAddrs
iflen(listenAddrs)==0{
listenAddrs=[]string{":9428"}
}
logger.Infof("starting VictoriaLogs at %q...",listenAddrs)
httpListenAddr=flag.String("httpListenAddr",":8428","TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flag.Bool("httpListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","TCP addresses to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol=flagutil.NewArrayBool("httpListenAddr.useProxyProtocol","Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval=flag.Duration("dedup.minScrapeInterval",0,"Leave only the last sample in every time series per each discrete interval "+
@@ -48,7 +48,6 @@ func main() {
envflag.Parse()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
ifpromscrape.IsDryRun(){
*dryRun=true
@@ -67,30 +66,37 @@ func main() {
return
}
logger.Infof("starting VictoriaMetrics at %q...",*httpListenAddr)
listenAddrs:=*httpListenAddrs
iflen(listenAddrs)==0{
listenAddrs=[]string{":8428"}
}
logger.Infof("starting VictoriaMetrics at %q...",listenAddrs)
t.Fatalf("not expected number of csv lines want=%d\ngot=%d test=%s.%s\n\response=%q",len(data),test.ExpectedResultLinesCount,q,test.Issue,strings.Join(data,"\n"))
<!doctype html><htmllang="en"><head><metacharset="utf-8"/><linkrel="icon"href="./favicon.ico"/><metaname="viewport"content="width=device-width,initial-scale=1,maximum-scale=5"/><metaname="theme-color"content="#000000"/><metaname="description"content="UI for VictoriaMetrics"/><linkrel="apple-touch-icon"href="./apple-touch-icon.png"/><linkrel="icon"type="image/png"sizes="32x32"href="./favicon-32x32.png"><linkrel="manifest"href="./manifest.json"/><title>VM UI</title><scriptsrc="./dashboards/index.js"type="module"></script><metaname="twitter:card"content="summary_large_image"><metaname="twitter:image"content="./preview.jpg"><metaname="twitter:title"content="UI for VictoriaMetrics"><metaname="twitter:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaname="twitter:site"content="@VictoriaMetrics"><metaproperty="og:title"content="Metric explorer for VictoriaMetrics"><metaproperty="og:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaproperty="og:image"content="./preview.jpg"><metaproperty="og:type"content="website"><scriptdefer="defer"src="./static/js/main.02178f4b.js"></script><linkhref="./static/css/main.9a224445.css"rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><divid="root"></div></body></html>
<!doctype html><htmllang="en"><head><metacharset="utf-8"/><linkrel="icon"href="./favicon.ico"/><metaname="viewport"content="width=device-width,initial-scale=1,maximum-scale=5"/><metaname="theme-color"content="#000000"/><metaname="description"content="UI for VictoriaMetrics"/><linkrel="apple-touch-icon"href="./apple-touch-icon.png"/><linkrel="icon"type="image/png"sizes="32x32"href="./favicon-32x32.png"><linkrel="manifest"href="./manifest.json"/><title>VM UI</title><scriptsrc="./dashboards/index.js"type="module"></script><metaname="twitter:card"content="summary_large_image"><metaname="twitter:image"content="./preview.jpg"><metaname="twitter:title"content="UI for VictoriaMetrics"><metaname="twitter:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaname="twitter:site"content="@VictoriaMetrics"><metaproperty="og:title"content="Metric explorer for VictoriaMetrics"><metaproperty="og:description"content="Explore and troubleshoot your VictoriaMetrics data"><metaproperty="og:image"content="./preview.jpg"><metaproperty="og:type"content="website"><scriptdefer="defer"src="./static/js/main.53048302.js"></script><linkhref="./static/css/main.bc07cc78.css"rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><divid="root"></div></body></html>
httpListenAddr=flag.String("httpListenAddr",":8429","TCP address to listen for http connections. "+
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","TCP address to listen for incoming http requests. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flag.Bool("httpListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol=flagutil.NewArrayBool("httpListenAddr.useProxyProtocol","Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
influxListenAddr=flag.String("influxListenAddr","","TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
@@ -69,7 +69,8 @@ var (
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol=flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey=flag.String("configAuthKey","","Authorization key for accessing /config page. It must be passed via authKey query arg")
configAuthKey=flagutil.NewPassword("configAuthKey","Authorization key for accessing /config page. It must be passed via authKey query arg")
reloadAuthKey=flagutil.NewPassword("reloadAuthKey","Auth key for /-/reload http endpoint. It must be passed as authKey=...")
dryRun=flag.Bool("dryRun",false,"Whether to check config files without running vmagent. The following files are checked: "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
@@ -96,7 +97,6 @@ func main() {
remotewrite.InitSecretFlags()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
ifpromscrape.IsDryRun(){
iferr:=promscrape.CheckConfig();err!=nil{
@@ -119,13 +119,17 @@ func main() {
return
}
logger.Infof("starting vmagent at %q...",*httpListenAddr)
listenAddrs:=*httpListenAddrs
iflen(listenAddrs)==0{
listenAddrs=[]string{":8429"}
}
logger.Infof("starting vmagent at %q...",listenAddrs)
tlsHandshakeTimeout=flagutil.NewArrayDuration("remoteWrite.tlsHandshakeTimeout",20*time.Second,"The timeout for estabilishing tls connections to the corresponding -remoteWrite.url")
tlsInsecureSkipVerify=flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify","Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile=flagutil.NewArrayString("remoteWrite.tlsCertFile","Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
@@ -58,8 +60,10 @@ var (
oauth2ClientID=flagutil.NewArrayString("remoteWrite.oauth2.clientID","Optional OAuth2 clientID to use for the corresponding -remoteWrite.url")
oauth2ClientSecret=flagutil.NewArrayString("remoteWrite.oauth2.clientSecret","Optional OAuth2 clientSecret to use for the corresponding -remoteWrite.url")
oauth2ClientSecretFile=flagutil.NewArrayString("remoteWrite.oauth2.clientSecretFile","Optional OAuth2 clientSecretFile to use for the corresponding -remoteWrite.url")
oauth2TokenURL=flagutil.NewArrayString("remoteWrite.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for the corresponding -remoteWrite.url")
oauth2Scopes=flagutil.NewArrayString("remoteWrite.oauth2.scopes","Optional OAuth2 scopes to use for the corresponding -remoteWrite.url. Scopes must be delimited by ';'")
oauth2EndpointParams=flagutil.NewArrayString("remoteWrite.oauth2.endpointParams","Optional OAuth2 endpoint parameters to use for the corresponding -remoteWrite.url . "+
`The endpoint parameters must be set in JSON format: {"param1":"value1",...,"paramN":"valueN"}`)
oauth2TokenURL=flagutil.NewArrayString("remoteWrite.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for the corresponding -remoteWrite.url")
oauth2Scopes=flagutil.NewArrayString("remoteWrite.oauth2.scopes","Optional OAuth2 scopes to use for the corresponding -remoteWrite.url. Scopes must be delimited by ';'")
awsUseSigv4=flagutil.NewArrayBool("remoteWrite.aws.useSigv4","Enables SigV4 request signing for the corresponding -remoteWrite.url. "+
"It is expected that other -remoteWrite.aws.* command-line flags are set if sigv4 request signing is enabled")
remoteWriteURLs=flagutil.NewArrayString("remoteWrite.url","Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol "+
"or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. "+
"The data can be sharded among the configured remote storage systems if -remoteWrite.shardByURL flag is set. "+
"See also -remoteWrite.multitenantURL")
"The data can be sharded among the configured remote storage systems if -remoteWrite.shardByURL flag is set")
remoteWriteMultitenantURLs=flagutil.NewArrayString("remoteWrite.multitenantURL","Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. "+
"This flag is deprecated in favor of -enableMultitenantHandlers . See https://docs.victoriametrics.com/vmagent.html#multitenancy")
enableMultitenantHandlers=flag.Bool("enableMultitenantHandlers",false,"Whether to process incoming data via multitenant insert handlers according to "+
"https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format . By default incoming data is processed via single-node insert handlers "+
"according to https://docs.victoriametrics.com/#how-to-import-time-series-data ."+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details")
shardByURL=flag.Bool("remoteWrite.shardByURL",false,"Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url . "+
"By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#sharding-among-remote-storages")
tmpDataPath=flag.String("remoteWrite.tmpDataPath","vmagent-remotewrite-data","Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
shardByURLLabels=flagutil.NewArrayString("remoteWrite.shardByURL.labels","Optional list of labels, which must be used for sharding outgoing samples "+
"among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain "+
"even distribution of series over the specified -remoteWrite.url systems")
tmpDataPath=flag.String("remoteWrite.tmpDataPath","vmagent-remotewrite-data","Path to directory for storing pending data, which isn't sent to the configured -remoteWrite.url . "+
"See also -remoteWrite.maxDiskUsagePerURL and -remoteWrite.disableOnDiskQueue")
keepDanglingQueues=flag.Bool("remoteWrite.keepDanglingQueues",false,"Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
"Useful when -remoteWrite.url is changed temporarily and persistent queue files will be needed later on.")
queues=flag.Int("remoteWrite.queues",cgroup.AvailableCPUs()*2,"The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
@@ -80,6 +91,11 @@ var (
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval=flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval",0,"Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
disableOnDiskQueue=flag.Bool("remoteWrite.disableOnDiskQueue",false,"Whether to disable storing pending data to -remoteWrite.tmpDataPath "+
"when the configured remote storage systems cannot keep up with the data ingestion rate. See https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence ."+
"See also -remoteWrite.dropSamplesOnOverload")
dropSamplesOnOverload=flag.Bool("remoteWrite.dropSamplesOnOverload",false,"Whether to drop samples when -remoteWrite.disableOnDiskQueue is set and if the samples "+
"cannot be pushed into the configured remote storage systems in a timely manner. See https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence")
)
var(
@@ -92,11 +108,19 @@ var (
// Data without tenant id is written to defaultAuthToken if -remoteWrite.multitenantURL is specified.
defaultAuthToken=&auth.Token{}
// ErrQueueFullHTTPRetry must be returned when TryPush() returns false.
logger.Warnf("rounding the -remoteWrite.maxDiskUsagePerURL=%d to the minimum supported value: %d",maxPendingBytes,persistentqueue.DefaultChunkFileSize)
vmalert-tool unittest is compatible with [Prometheus config format for tests](https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/#test-file-format)
except `promql_expr_test` field. Use `metricsql_expr_test` field name instead. The name is different because vmalert-tool
validates and executes [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) expressions,
which aren't always backward compatible with [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).
### Test file format
The configuration format for files specified in `--files` cmd-line flag is the following:
```yaml
# Path to the files or http url containing [rule groups](https://docs.victoriametrics.com/vmalert.html#groups) configuration.
# Enterprise version of vmalert-tool supports S3 and GCS paths to rules.
rule_files:
[- <string> ]
# The evaluation interval for rules specified in `rule_files`
[ evaluation_interval:<duration> | default = 1m ]
# Groups listed below will be evaluated by order.
# Not All the groups need not be mentioned, if not, they will be evaluated by define order in rule_files.
group_eval_order:
[- <string> ]
# The list of unit test files to be checked during evaluation.
tests:
[- <test_group> ]
```
#### `<test_group>`
```yaml
# Interval between samples for input series
interval:<duration>
# Time series to persist into the database according to configured <interval> before running tests.
input_series:
[- <series> ]
# Name of the test group, optional
[ name:<string> ]
# Unit tests for alerting rules
alert_rule_test:
[- <alert_test_case> ]
# Unit tests for Metricsql expressions.
metricsql_expr_test:
[- <metricsql_expr_test> ]
# External labels accessible for templating.
external_labels:
[ <labelname>:<string> ... ]
```
#### `<series>`
```yaml
# series in the following format '<metric name>{<label name>=<label value>, ...}'
tlsCAFile=flag.String("datasource.tlsCAFile","",`Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used`)
tlsServerName=flag.String("datasource.tlsServerName","",`Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used`)
oauth2ClientID=flag.String("datasource.oauth2.clientID","","Optional OAuth2 clientID to use for -datasource.url. ")
oauth2ClientSecret=flag.String("datasource.oauth2.clientSecret","","Optional OAuth2 clientSecret to use for -datasource.url.")
oauth2ClientSecretFile=flag.String("datasource.oauth2.clientSecretFile","","Optional OAuth2 clientSecretFile to use for -datasource.url. ")
oauth2TokenURL=flag.String("datasource.oauth2.tokenUrl","","Optional OAuth2 tokenURL to use for -datasource.url.")
oauth2Scopes=flag.String("datasource.oauth2.scopes","","Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
oauth2ClientID=flag.String("datasource.oauth2.clientID","","Optional OAuth2 clientID to use for -datasource.url")
oauth2ClientSecret=flag.String("datasource.oauth2.clientSecret","","Optional OAuth2 clientSecret to use for -datasource.url")
oauth2ClientSecretFile=flag.String("datasource.oauth2.clientSecretFile","","Optional OAuth2 clientSecretFile to use for -datasource.url")
oauth2EndpointParams=flag.String("datasource.oauth2.endpointParams","","Optional OAuth2 endpoint parameters to use for -datasource.url . "+
`The endpoint parameters must be set in JSON format: {"param1":"value1",...,"paramN":"valueN"}`)
oauth2TokenURL=flag.String("datasource.oauth2.tokenUrl","","Optional OAuth2 tokenURL to use for -datasource.url")
oauth2Scopes=flag.String("datasource.oauth2.scopes","","Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
lookBack=flag.Duration("datasource.lookback",0,`Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
lookBack=flag.Duration("datasource.lookback",0,`Will be deprecated soon, please adjust "-search.latencyOffset" at datasource side `+
`or specify "latency_offset" in rule group's params. Lookback defines how far into the past to look when evaluating queries. `+
`For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep=flag.Duration("datasource.queryStep",5*time.Minute,"How far a value can fallback to when evaluating queries. "+
"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query. "+
"If set to 0, rule's evaluation interval will be used instead.")
logger.Warnf("flag `datasource.queryTimeAlignment` is deprecated and will be removed in next releases, please use `eval_alignment` in rule group instead")
logger.Warnf("flag `-datasource.queryTimeAlignment` is deprecated and will be removed in next releases. Please use `eval_alignment` in rule group instead.")
}
if*lookBack!=0{
logger.Warnf("flag `-datasource.lookback` will be deprecated soon. Please use `-rule.evalDelay` command-line flag instead. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 for details.")
@@ -47,8 +47,8 @@ all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
`)
ruleTemplatesPath=flagutil.NewArrayString("rule.templates",`Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
ruleTemplatesPath=flagutil.NewArrayString("rule.templates",`Path or glob pattern to location with go template definitions`+
`for rules annotations templating. Flag can be specified multiple times.
Examples:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
@@ -59,8 +59,8 @@ absolute path to all .tpl files in root.
configCheckInterval=flag.Duration("configCheckInterval",0,"Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr=flag.String("httpListenAddr",":8880","Address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol=flag.Bool("httpListenAddr.useProxyProtocol",false,"Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs=flagutil.NewArrayString("httpListenAddr","Address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol=flagutil.NewArrayBool("httpListenAddr.useProxyProtocol","Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
evaluationInterval=flag.Duration("evaluationInterval",time.Minute,"How often to evaluate the rules")
@@ -96,7 +96,6 @@ func main() {
notifier.InitSecretFlags()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
if!*remoteReadIgnoreRestoreErrors{
logger.Warnf("flag `remoteRead.ignoreRestoreErrors` is deprecated and will be removed in next releases.")
"It is hidden by default, since it can contain sensitive info such as auth key")
blackHole=flag.Bool("notifier.blackhole",false,"Whether to blackhole alerting notifications. "+
"Enable this flag if you want vmalert to evaluate alerting rules without sending any notifications to external receivers (eg. alertmanager). "+
"`-notifier.url`, `-notifier.config` and `-notifier.blackhole` are mutually exclusive.")
"-notifier.url, -notifier.config and -notifier.blackhole are mutually exclusive.")
basicAuthUsername=flagutil.NewArrayString("notifier.basicAuth.username","Optional basic auth username for -notifier.url")
basicAuthPassword=flagutil.NewArrayString("notifier.basicAuth.password","Optional basic auth password for -notifier.url")
@@ -46,6 +46,8 @@ var (
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2ClientSecretFile=flagutil.NewArrayString("notifier.oauth2.clientSecretFile","Optional OAuth2 clientSecretFile to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2EndpointParams=flagutil.NewArrayString("notifier.oauth2.endpointParams","Optional OAuth2 endpoint parameters to use for the corresponding -notifier.url . "+
`The endpoint parameters must be set in JSON format: {"param1":"value1",...,"paramN":"valueN"}`)
oauth2TokenURL=flagutil.NewArrayString("notifier.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2Scopes=flagutil.NewArrayString("notifier.oauth2.scopes","Optional OAuth2 scopes to use for -notifier.url. Scopes must be delimited by ';'. "+
returnfmt.Errorf("failed to marshal the given time series: %w",err)
}
data:=wr.MarshalProtobuf(nil)
returnc.send(data)
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.