The system links are absolute, e.g. they start from `/`, so there are high chances
they won't work as expected when requested via proxy such as vmselect with -vmalert.proxyURL
command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3424
http.Request was used as a part of state struct
for generating the curl command when viewing the rule's
state changes.
It appears, that holding a referencing is far more expensive
than generating the curl command immediately.
On the test with 40k rules, this change reduces memory
and CPU usage by 50%.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape/discovery/azure: remove API server from URL returned by azure
* lib/promscrape/discovery/azure: validate nextLink contains same URL as apiServer
This should simplify further debugging, since the first thing to start the debugging by query trace
is to know the version of VictoriaMetrics, which produced this trace.
Alert `RequestErrorsToAPI` could be permanently triggered due to
mistakes in clients configuration. However, such requests are unlikely
to cause VM health state change. So there is no need in displaying
this alert because there will be no correlation caused by it.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Document the change at docs/CHANGELOG.md
- Run `make docs-sync` for copying app/vmgateway/README.md to docs/vmgateway.md
in order to propagate docs' changes to https://docs.victoriametrics.com/vmgateway.html
* vmalert: correctly return error for RW failures
By mistake, in 0989649ad0 the error
for remote write failures weren't return to user.
This change fixes it.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The main purpose of this command-line flag is to increase the lifetime of low-end flash storage
with the limited number of write operations it can perform. Such flash storage is usually
installed on Raspberry PI or similar appliances.
For example, `-inmemoryDataFlushInterval=1h` reduces the frequency of disk write operations
to up to once per hour if the ingested one-hour worth of data fits the limit for in-memory data.
The in-memory data is searchable in the same way as the data stored on disk.
VictoriaMetrics automatically flushes the in-memory data to disk on graceful shutdown via SIGINT signal.
The in-memory data is lost on unclean shutdown (hardware power loss, OOM crash, SIGKILL).
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
The new panels have been added to the vmstorage and drilldown rows.
`Disk space usage %` is supposed to show disk space usage percentage.
This panel is now also referred by `DiskRunsOutOfSpace` alerting rule.
This panel has Drilldown option to show absolute values.
`Disk space usage % by type` shows the relation between datapoints
and indexdb size. It supposed to help identify cases when indexdb
starts to take too much disk space.
This panel has Drilldown option to show absolute values.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Method `metrics()` now pre-allocates slices for labels
and results from query responses. This reduces the number
of allocations on the hot path for instant requests.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The recent change in modifying default value
of `datasource.queryStep` flag resulted in situation
where replay mode was always running queries with
step=`datasource.queryStep`. When it should always
use rule's evaluation interval.
The fix is related not to replay mode only, but
for all Range requests. Now step param is set
individually for each mode.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Fixes a missing `&` char in data link for ETA panel
on cluster dashboards. Without `&` char it generates
wrong link when click on Drilldown menu.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: add `remoteWrite.sendTimeout` command-line flag to configure timeout for sending data to `remoteWrite.url`
* vmalert: remove WriteTimeout from clients Cfg
No need to have it as a part of configuration struct:
* the client isn't used by other packages;
* there are no internal tests to check the WriteTimeout.
* vmalert: remove DisablePathAppend from clients Cfg
No need to have it as a part of configuration struct:
* the client isn't used by other packages;
* there are no internal tests to check the DisablePathAppend.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
- Return meta-labels for the discovered targets via promutils.Labels
instead of map[string]string. This improves the speed of generating
meta-labels for discovered targets by up to 5x.
- Remove memory allocations in hot paths during ScrapeWork generation.
The ScrapeWork contains scrape settings for a single discovered target.
This improves the service discovery speed by up to 2x.
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old "Graph" panel with "TimeSeries" panel;
* show % usage of Mem and CPU additionally to of absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts.yml` change.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old Graph panel with TimeSeries panel;
* add RemoteWrite section;
* allow configuring topK elements for some of the panels;
* Preer grouping by job instead of grouping by instance.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* flag reference update
there is no flag `-datasource.disablePathAppend` and datasource actually checking for `-remoteRead.disablePathAppend`
* update source for doc as well
The change list is the following:
* bump Grafana version to 9.2.6;
* add version change annotations;
* switch to per-job panels instead of per-instance;
* add drilldown option for resource usage panels.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change list is the following:
* bump Grafana version to 9.2.6;
* remove artifacts in data links.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* {app/vmstorage,app/vmselect}: add API to get list of existing tenants
* {app/vmstorage,app/vmselect}: add API to get list of existing tenants
* app/vmselect: fix error message
* {app/vmstorage,app/vmselect}: fix error messages
* app/vmselect: change log level for error handling
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* some unexpected DS UIDs were removed;
* replace `$instance.*` filter with `$instance` since we respect
the instance port anyway;
* remove predefined datasource for `clusterbytenant`
in favour of datasource variable `ds`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The purpose of the update is to make the dash more usable
for large installations with many instances. Panels which showed
metrics per-instance (Mem, CPU) now are showing metrics per-job or min/max/avg
aggregations in % instead. This supposed to help immediately to identify
resource shortage and remain usable for small and big installations.
For cases when detailed info is needed, to the bottom of the dashboard
a new row `Drilldown` was added. Panels like Mem or CPU now contain
a `data-link` named `Drilldown` (cis shown on line click) which takes
user to more detailed panel.
The change list is the following:
* bump Grafana version to 9.1.0;
* replace old "Graph" panel with "TimeSeries" panel;
* improve Uptime panel to show number of instances per job;
* show % usage of Mem and CPU instead of absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add `Drilldown` section for detailed resource usage;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts-cluster.yml` change.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix: reset the value of the switches trace and cache
* fix: add cursor text for inputs
* fix: solve the Infinite loop of useFetchQuery.ts
* fix: change condition for show/hide autocomplete
* fix: add limit error length for input
The default list of alerting rules contains the basic
rules for checking vmalert's health state and is recommended
to use for monitoring vmalert deployments.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It has been appeared that some production workloads could suffer for some time
after every reset of the previous cache when it gets less than 5% of requests
after the needed item isn't found in the current cache. This could result
in reduced cache hit rates, which, in turn, could increase CPU, disk IO and RAM
usage needed for reading, unpacking and caching the missed data from disk.
This commit reduces the cache miss threshold for resetting the previous cache from 5% to 1%.
This should reduce the possible negative impact after each cache reset by at least 5x,
while reducing the total memory used by caches.
This is a follow-up for d906d8573e
* docs/operator: change VMAgentRemoteWriteSettings.MaxDiskUsagePerURL type from int32 to int64
* docs: operator updates api description
Co-authored-by: f41gh7 <nik@victoriametrics.com>
- Remove trailing whitespace at the end of lines
- Remove redundant sentence stating that time series matching the given selector will be deleted.
It should be clear from the surrounding context.
* deployment/docker/docker-compose-cluster.yml: bump VictoriaMetrics Cluster components to the latest v1.83.0 version
* deployment/docker/docker-compose.yml: bump VictoriaMetrics Single node and vmutils to the latest v1.83.0 version
The issue was in the `labels := dst[offset:]` line in the beginning of appendExtraLabels() function.
The `dst` may be re-allocated when adding extra labels to it. In this case the addition of `exported_`
prefix to labels inside `labels` slice become invisible in the returned `dst` labels.
While at it, properly handle some corner cases:
- Add additional `exported_` prefix to clashing metric labels with already existing `exported_` prefix.
- Store scraped metric names in `exported___name__` label if scrape target contains `__name__` label.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3278
Thanks to @jplanckeel for the initial attempt to fix this issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3281
* feat: apply serverURL on down Enter
* fix: change method of set time range
* fix: remove prevent run fetch without changes
* fix: prevent reset timerange when autorefresh
Previously the `quotesEscape` function was escaping only double quotes.
This wasn't enough, since the input string could contain other special chars,
which must be escaped when put inside JSON string. For example, carriage return and line feed chars (\n\r),
backslash char, etc. This led to the following issues, which were improperly fixed:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890 - this issue
was "fixed" by introducing the `crlfEscape` function, which led to unnecessary
complications in user templates, while not fixing various corner cases
such as backslash chars in the input string.
See 1de15ad490
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 - this issue
was "fixed" by urlencoding the whole string passed to -external.alert.source
command-line flag. This led to invalid urls, which couldn't be parsed by Grafana.
See 00c838353d
and 4bd0244599
This commit properly encodes the input string passed to `quotesEscape`, so it can be safely embedded inside JSON strings.
This commit deprecates crlfEscape template function and adds the following new template functions:
- strvalue and stripDomain - these functions are supported by Prometheus, so they were added
for compatibility purposes.
- jsonEscape and htmlEscape for converting the input string to valid quoted JSON string
and for html-escaping the input string, so it could be safely embedded as a plaintext
into html.
This commit also documents all supported template functions at https://docs.victoriametrics.com/vmalert.html#template-functions
The deprecated crlfEscape function isn't documented on purpose, since its usefulness is negative in general case.
This reverts commit 00c838353d.
Reason for revert: it incorrectly fixes the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 .
Now `-external.alert.source=explore?orgId=1&left=...` is converted to the following invalid url, which cannot be handled by Grafana:
https://grafana.example.com/explore%3ForgId%3D1%26left%3D...
The next commit will contain the correct fix of the issue - the `quotesEscape` function must
properly escape the string, so it could be embedded into JSON string. This function must
properly escape \n\r chars too. In this case the `crlfEscape` function becomes unnecessary.
Actually, the next commit makes the `crlfEscape` function deprecated.
Previously netstorage.MustStop() call didn't free up all the resources,
so the subsequent call to nestorage.Init() would panic.
This allows writing tests, which call nestorage.Init() + nestorage.MustStop() in a loop.
Blocks outside the configured retention are eventually deleted during background merge.
But such blocks may reside in the storage for long time until background merge.
Previously VictoriaMetrics could spend additional CPU time on processing such blocks
during search queries. Now these blocks are skipped.
The searchTSIDs function was searching for metricIDs matching the the given tag filters
and then was locating the corresponding TSID entries for the found metricIDs.
The TSID entries aren't needed when searching for time series names (aka MetricName),
so this commit removes the uneeded TSID search from the implementation of /api/v1/series API.
This improves perfromance of /api/v1/series calls.
This commit also improves performance a bit for /api/v1/query and /api/v1/query_range calls,
since now these calls cache small metricIDs instead of big TSID entries
in the indexdb/tagFilters cache (now this cache is named indexdb/tagFiltersToMetricIDs)
without the need to compress the saved entries in order to save cache space.
This commit also removes concurrency limiter during searching for matching time series,
which was introduced in 8f16388428, since the concurrency
for all the read queries is already limited with -search.maxConcurrentRequests command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
This increases the maximum time for cache population with new entries from 20 minutes to 40 minutes.
This
This change shouldn't increase memory usage for caches, since the prev cache cleaner
should free up memory by deleting unused prev cache as soon as possible.
See 08ca45d238 for details on prev cache cleaner.
The panic has been introduced in 68f3a02589
While at it, add padding to shard structs in order to avoid false sharing on mordern CPUs
This should improve scalability on systems with many CPU cores
This means that the majority of requests are successfully served from the current cache,
so the previous cache can be reset in order to free up memory.
The message about dropped data still remains at `error` level.
The change supposed to make log message more clear about how
serious it is.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* feat: add maximum display series by tabs
* feat: add warning on PredefinedPanels.tsx
* docs/CHANGELOG.md: vmui limit number of plotted series
* docs/CHANGELOG.md: vmui limit number of plotted series
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/backup: set s3 default region to us-west-2
it should fix an error with region detection for bucket, if AWS_REGION env var is not set
* Update lib/backup/s3remote/s3.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* Multi retention standardization of docs
While reading through the docs the implementation details had different formatting for storageNode. This standardizes them and adds a link to the retention docs.
* Update docs/guides/guide-vmcluster-multiple-retention-setup.md
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
The default value of `-datasource.queryStep` has changed, so we update
the troubleshooting docs accordingly.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Sort labels explicitly after calling the ParsedConfigs.Apply() when needed.
This reduces CPU usage when performing metric-level relabeling, where labels' sorting isn't needed.
Previously the `__address__` label could contain only `host:port` part of the target url,
while the scheme and metrics path were obtained from `__scheme__` and `__metrics_path__`
labels. Now it is possible to set the full url in `__address__` label.
This makes valid the following scrape config, which is frequently used by novice users:
scrape_configs:
- job_name: foo
static_configs:
- targets:
- http://host1/metrics1
- https://host2/metrics2
The stored backups would help to identify docs corruption
but aren't needed for commiting. So `.tmp` backup files
are also git-ignored.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Optimize fast path for /api/v1/import when importing numeric values
* Move the docs about the change from features to bugfixes at docs/CHANGELOG.md
* Update tests at lib/protoparser/vmimport
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3161
* app/vmselect: properly work when export import json from `api/v1/{export, import}` API
* app/vmselect: update convert function
* app/vmselect: export null if `math.IsNaN(v)`
* app/vmselect: get float from json
* lib/protoparser: add test
* docs: add change log
* lib/protoparser: make export import api compatible
* Document the addition of Azure blob storage support in vmbackup / vmrestore
* List the supported storage system types at docs/vmrestore.md
* Mention about azblob storage system support at -src and -dst command-line flags
for vmbackup / vmrestore tools.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1029
* Use vm_account_id and vm_project_id labels to be consistent with https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels
* Document the feature that vmalert now exposes vm_account_id and vm_project_id
labels if -clusterMode is set.
* Use literal strings instead of string constants for vm_account_id and vm_project_id.
This improves code readability.
The change is supposed to provide additional flexibility for generating alert's
source link based on label values.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The `docs-sync` command was updated to modify images path
for assets in `docs/` folder. The change allows to refer images from `docs`
without copying them to the root folder.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Incorrect 301 redirects can be cached by user agents such as web browsers.
This can complicate recovery procedure after the incorrect redirect is fixed,
e.g. web browser cache must be reset.
The related issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1752
* app/vminsert: allows parsing tenant id from labels
it should help mitigate issues with vmagent's multiTenant mode, which works incorrectly at heavy load
and it cannot handle more then 100 different tenants.
This functional hidden with flag and do not change vminsert default behaviour
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2970
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* wip
* app/vminsert/netstorage: clean remaining labels in order to free up GC
* docs/Cluster-VictoriaMetrics.md: typo fix
* wip
* wip
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Cache `action: replace` results for non-trivial regexs and return them next time
instead of performing CPU-intensive regex replacement.
Optimize also `action: labelmap_all` and `action: replace_all` in the same way.
Previously empty series (e.g. series with all NaN samples) were passed to aggregate functions.
Such series must be ingored by all the aggregate functions.
So it is better from consistency PoV filtering out empty series before applying aggregate functions.
* app/vmselect: ignore empty series for `limit_offset`
VictoriaMetrics doesn't return empty series (with all NaN values) to
the user. But such series are filtered after transform functions.
It means `limit_offset` will account for empty series as well.
For example, let's consider following data set:
```
time series:
foo{label="1"} NaN, NaN, NaN, NaN // empty series
foo{label="2"} 1, 2, 3, 4
foo{label="3"} 4, 3, 2, 1
```
When user requests all series for metric `foo` the empty series
will be filtered out:
```
/query=foo:
foo{label="v2"} 1, 2, 3, 4
foo{label="v3"} 4, 3, 2, 1
```
But `limit_offset(1, 1, foo)` is applied to original series, not filtered yet.
So it will return `foo{label="v2"}` (skips the first in list)
```
/query=limit_offset(1, 1, foo):
foo{label="v2"} 1, 2, 3, 4
```
Expected result would be to apply `limit_offset` to already filtered list,
so in result we receive `foo{label="v3"}`:
```
/query=limit_offset(1, 1, foo):
foo{label="v3"} 4, 3, 2, 1
```
The change does exactly that - filters empty series before applying `limit_offset`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmselect: ignore empty series for `limit_offset`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
According to Ruler specification, only labels returned within time series
should be available for use in annotations.
For long time, vmalert didn't respect this rule. And in PR
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2403
this was fixed for the sake of compatibility. However, this resulted
into users confusion, as they expected all configured and extra labels
to be available - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3013
This fix allows to use extra labels in Annotations. But in the case of conflicts
the original labels (extracted from time series) are preferred.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/{httpserver,netutil}: allow to define min and max TLS version of the http server
* lib/httpserver: added descriptions about tls supported versions
* lib/netutil: check minimal tls version, added supported tls versions to error
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The workaround was introduced to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/962.
However, it didn't prove itself useful. Instead, it is recommended using `increase_pure` function.
Removing the workaround makes VM to produce accurate results when calculating
`delta` or `increase` functions over slow-changing counters with vary intervals
between data points.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Clarify the description for -datasource.queryStep command-line flag
- Consistently use a single dash in front of -datasource.queryStep command-line flag
- Update -help output at docs/vmalert.md
- Consistently use single dash in front of command-line flags instead of double dashes.
- Add a warning that too small -search.latencyOffset may lead to incomplete query results.
Change default value for command-line flag `datasource.queryStep` from `0s` to `5m`.
Param `step` is added by vmalert to every rule evaluation request sent to datasource.
Before this change, `step` was equal to group's evaluation interval by default.
Param `step` for instant queries defines how far VM can look back for the last written data point.
The change supposed to improve reliability of the rules evaluation when evaluation interval
is lower than scraping interval.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Cluster components always have `-cluster` suffix. The change fixes
incorrect image tag in docker-compose manifest.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* deployment/docker/docker-compose.yml: adds version tags for VictoriaMetrics containers
* deployment/docker/docker-compose-cluster.yml: adds version tags for VictoriaMetrics containers
* deployment/docker: move cluster compose env to master branch
The change supposed to simplify the process of maintaining for
single/cluster docker-compose envs, alerts, dashboards. It also
supposes to reduce confusion for users when looking for cluster
related alerts/configs.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* deployment/docker: move cluster compose env to master branch
Review updates.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Now vmalert will print the following messages on dupliсates:
```
"recording rule \"record\"; expr: \"up == 1\"; labels: summary={{ value|query }}" is a duplicate within the group "test"
"alerting rule \"alert\"; expr: \"up == 1\"; labels: description={{ value|query }}" is a duplicate within the group "test"
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3127
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The standard Snappy encoder from github.com/golang/snappy shows quite good performance number
for compressing the Prometheus remote_write proto messages according to the added benchmarks,
so there is no need in switching to github.com/klauspost/compress/s2 yet.
* dashboards/cluster: few updates
* apply consistent formatting across panels;
* make resource usage panels per component more detailed;
* add extra panels to vmselect for displaying
`vm_rows_read_per_query`, `vm_rows_scanned_per_query`,
`vm_rows_read_per_series` and `vm_series_read_per_query` metrics.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/single: few updates
* apply consistent formatting across panels;
* add extra panels to Performance for displaying
`vm_rows_read_per_query`, `vm_rows_scanned_per_query`,
`vm_rows_read_per_series` and `vm_series_read_per_query` metrics.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmagent: few updates
* apply consistent formatting across panels;
* add panels for showing number of samples ingested
or scraped;
* adapt resource usage panels for multiple selected jobs/instances;
* add adhoc variable;
* display vmagent's version in Stats.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmalert: few updates
* apply consistent formatting across panels;
* adapt resource usage panels for multiple selected jobs/instances;
* show vmalert version in Stats section.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: always re-evaluate Annotations
Previously, Annotations were evaluated only:
1. On alert creating.
2. On alert's value change.
This is premature optimization. It was assumed that since annotations
could contain only text with alert's labels or value - there is no need
in spending resources to re-compile Annotations.
Later, template function `query` was added, which can execute
arbitrary queries and return different results on every evaluation.
So if it was used in annotations, it would be executed only on init
or value change.
Another case when optimization caused an issue - annotations hot reload.
In this case, annotations of the active alert won't change even if Rule's
annotations were changed.
This fix enables Annotations re-evaluation on each iteration to resolve
issues above. It would have some impact on performance, but it is unlikely
it will be noticeable.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: add tp Changelog
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change adds an example of `curl` command to the Rule's page.
The command is generated for each recorded state. It is supposed
user can just copy&execute the command to see what was returned
to vmalert.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmselect/promql: add alphanumeric sort by label (sort_by_label_numeric)
* vmselect/promql: fix tests, add documentation
* vmselect/promql: update test
* vmselect/promql: update for alphanumeric sorting, fix tests
* vmselect/promql: remove comments
* vmselect/promql: cleanup
* vmselect/promql: avoid memory allocations, update functions descriptions
* vmselect/promql: make linter happy (remove ineffectual assigment)
* vmselect/promql: add test case, fix behavior when strings are equal
* vendor: update github.com/VictoriaMetrics/metricsql from v0.44.1 to v0.45.0
this adds support for sort_by_label_numeric and sort_by_label_numeric_desc functions
* wip
* lib/promscrape: read response body into memory in stream parsing mode before parsing it
This reduces scrape duration for targets returning big responses.
The response body was already read into memory in stream parsing mode before this change,
so this commit shouldn't increase memory usage.
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
vmalert: add experimental feature of storing Rule's evaluation state
The new feature keeps last 20 state changes of each Rule
in memory. The state are available for view on the Rule's
view page. The page can be opened by clicking on `Details`
link next to Rule's name on the `/groups` page.
States change suppose to help in investigating cases when Rule
doesn't generate alerts or records.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This reduces scrape duration for targets returning big responses.
The response body was already read into memory in stream parsing mode before this change,
so this commit shouldn't increase memory usage.
- Rename logDebug() to logDebugf() and pass format string together
with format args directly to logDebugf(). This eliminates fmt.Sprintf()
overhead at logDebug() call site when debugging is disabled.
- Format labels in debug message in Prometheus format, e.g. {label1="value1",...labelN="valueN"}
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3025
`go install` is the preferred way for installing go binaries starting
from the minimum supported Go version for VictoriaMetrics - Go1.18 -
see https://tip.golang.org/doc/go1.18#go-command
The automated push of release tags to Github require specifying the remote repository name
when doing `git push <remote-repo-name> v1.xx.y`.
The remote repository name can differ in different environments,
so it cannot be put into Makefile rule.
TODO: create a Makefile rule, which generates standard remote names for public
and private repositories in Git, so `git push` for release tags could be automated then.
- Add getCommonParamsWithDefaultDuration function and use it at /api/v1/series, /api/v1/labels and /api/v1/label/.../values
- Document the default behaviour for setting 5 minutes time range if start arg isn't passed to /api/v1/series, /api/v1/labels and /api/v1/label/.../values
- Document the change at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3052
The panic may trigger during data blocks' processing received
from vmstorage nodes when some of vmstorage nodes return an error
or when `-replicationFactor` is set to values higher than 2 at `vmselect`.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3058
Note that the parallel execution of `union()` args may take more memory and CPU time
than the sequential execution if args contain heavy queries, which may load all the available CPU,
disk and memory resources and vmselect and vmstorage levels.
- Use getScalar() function for obtaining the expected scalar from phi arg
- Reduce the error message returned to the user when incorrect phi is passed to histogram_quantiles
- Improve the description of this bugfix in the docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3026
Cache sanitized label names and return them next time.
This reduces the number of allocations and speeds up the SanitizeLabelName()
function for common case when the number of unique label names is smaller than 100k
The following regex patterns are optimized:
- literal string match, e.g. "foo"
- prefix match, e.g. "foo.*" and "foo.+"
- substring match, e.g. ".*foo.*" and ".+foo.+"
- alternate values match, e.g. "foo|bar|baz"
This is OK, since the anchors are implicitly applied to the whole regexp.
This optimization should improve the speed for regexp series filters with explicit $ and ^ anchors.
For example, `{label="^(foo|bar)$"}`
- Document the change at docs/CHANGELOG.md
- Move auth token parsing from app/vmagent/opentsdbhttp/ to app/vmagent/main.go,
since it must be parsed only when multitenancy support is enabled at vmagent side.
See https://docs.victoriametrics.com/vmagent.html#multitenancy
These metrics allow alerting when the number of unique series approach the limit.
For example, the following query alerts when the number of series reaches 90% of the configured limit:
vm_hourly_series_limit_current_series / vm_hourly_series_limit_max_series > 0.9
The io/ioutil package is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is time to remove the io/ioutil from source code
This is a follow-up for 02ca2342ab
ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll.
This is a follow-up for 02ca2342ab
The ioutil.ReadDir is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is time to switch from io.ReadDir to os.ReadDir
This is a follow-up for 02ca2342ab
The ioutil.{Read|Write}File is deprecated since Go1.16 -
see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics needs at least Go1.18, so it is safe to remove ioutil usage
from source code.
This is a follow-up for 02ca2342ab
* lib/storage: bump max merge concurrency for small parts to 15
The change is based on the feedback from users on github.
Thier examples show, that limit of 8 sometimes become a
bottleneck. Users report that without limit concurrency
can climb up to 15-20 merges at once.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update lib/storage/partition.go
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Flags `-search.maxTagKeys` and `-search.maxTagValues` are present at vmstorage and not at vmselect
```
# ./vmselect-prod -h | egrep 'search.maxTagKeys|search.maxTagValues'
# ./vmstorage-prod -h | egrep 'search.maxTagKeys|search.maxTagValues'
-search.maxTagKeys int
-search.maxTagValues int
```
We switch default alert's source link to redirect user
to vmalert's UI instead of previous JSON object. While it breaks
compatibility, it also supposed to improve user's experience.
The old behavior can be achieved by updating `-external.alert.source`
command-line flag.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The 429 status code means that the server is overwhelmed with requests.
The client can retry the request after some wait time.
Implement this strategy for service discovery and scrape requests.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2940
* Explicitly store a pointer to UserReadableError in the error interface.
Previously Go automatically converted the value to a pointer before storing in the error interface.
* Add Unwrap() method to UserReadableError, so it can be used transparently with the other code,
which calls errors.Is() and errors.As().
* Document the change in docs/CHANGELOG.md
When read query fails, VM returns rich error message with
all the details. While these details might be useful
for debugging specific cases, they're usually too verbose
for users.
Introducing a new error type `UserReadableError` is supposed
to allow to return to user only the most important parts
of the error trace. This supposed to improve error readability
in web interfaces such as VMUI or Grafana.
The full error trace is still logged with the full context
and can be found in vmselect logs.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously a single syncwg.WaitGroup was used for tracking the lifetime of processBlock callbacks
across all the per-vmstorage goroutines. This could be slow on systems with many CPU cores
because of inter-CPU synchronization overhead.
Use a separate per-vmstorage sync.WaitGroup instead in order to reduce inter-CPU synchronization overhead.
This should imrpove performance for heavy queries over big number of blocks on multi-CPU systems.
Other components, such as `vmagent`, mark these flags as sensitive and
hide them from the `/metrics` endpoint by default. This commit adds
similar handling to the `vmalert` component, hiding them by default, to
prevent logging of secrets inappropriately.
Showing of these values is controlled by an additional flag.
Follow up to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2947
* lib/storage: prevent excessive loops when storage is in RO
Returning nil error when storage is in RO mode results
into excessive loops and function calls which could
result into CPU exhaustion. Returning an err instead
will trigger delays in the for loop and save some resources.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* document the change
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
vmalert can be successfully used with datasources
compatible with Prometheus HTTP API. So we remove comments or
notes in Readme which are saying opposite.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/auth: add tests from NewToken function
* lib/auth: update test, fix problem with type conversion
* lib/auth: update test description
* lib/auth: simplify failure tests
The `-mod=vendor` is automatically set when there is a `vendor` directory
starting from Go1.14 - see https://go.dev/doc/go1.14#go-command
Since the minimum supported Go version for VictoriaMetrics is Go1.17,
then the `-mod=vendor` option is no longer needed.
* Update README.md: fix typo to "HA pairs"
Update README.md to fix typo from "HA paris" to "HA pairs"
* Update docs/README.md : Fix of typo from "HA paris"
Update docs/README.md : Fix of typo from "HA paris" to "HA pairs"
* Update of Single-server-VictoriaMetrics.md to fix typo "HA paris" to "HA pairs"
Update of Single-server-VictoriaMetrics.md to fix typo "HA paris" to "HA pairs"
This adds the following push-related metrics when -pushmetrics.url is set:
- metrics_push_interval_seconds
- metrics_push_total
- metrics_push_errors_total
- metrics_push_bytes_pushed_total
- metrics_push_duration_seconds
- metrics_push_block_size_bytes
Updates https://github.com/VictoriaMetrics/metrics/issues/35
* deployment/docker/Makefile: added docker-scan
docker-scan based on native 'docker scan' function that use snyk.io, see https://docs.docker.com/engine/scan/
* set to call 'docker-scan after release binaries but before publishing
Reduce inter-CPU communications when processing the query over big number of time series.
This should improve performance for queries over big number of time series
on systems with many CPU cores.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2896
Based on b596ac3745
Thanks to @zqyzyq for the idea.
Usernames could be duplicate if it has uniq password.
vmauth makes routing based on auth token and username + password combination must be unique for this case.
The new metric `vmagent_remotewrite_queues` exports a static value of
number of configured remote write queus. This metric is useful to
calculate total saturation per each configured URL with given number
of queues. See corresponding changes to vmagent alerts and dashboard.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change allows to specify default value for `getScrapeInterval`
function when actual interval can't be calculated.
Before the change, function were returning `maxSilenceInterval` (5m)
in such cases, which may be not correct for instant queries processing.
The specific scenario where using `maxSilenceInterval` caused issues
is the following:
1. Series becomes stale;
2. Client (in this case vmalert) continues to request series every 15s;
3. Database returns empty results as expected;
4. But at some specific moment of time database returns datapoints from `now()-5m`,
because lookback window was extended to `maxSilenceInterval`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmselect: cover special cases for vmalert's routing in single-node version
* remove trailing `/` from requests
* redirect to vmalert's home page when `/vmalert` is requested.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: fix review comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmselect/main.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Use binary search instead of linear scan when locating the run of smallest timestamps
in blocks with intersected time ranges. This should improve performance
when merging blocks with big number of samples
- Skip samples with duplicate timestamps. This should increase query performance
in cluster version of VictoriaMetrics with the enabled replication.
* vmalert: deprecate alert's status link
Deprecate alert's status link `/api/v1/<groupID>/<alertID>/status` in favour of
`api/v1/alerts?group_id=<group_id>&alert_id=<alert_id>"`.
The change was needed for simplifying logic in vmselect for proxying vmalert's requests.
The old alert's status link will be still supported for a few versions but will be removed in the future.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: fix review comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the time series could be put into dateMetricIDCache without
registering in the per-day inverted index if GetOrCreateTSIDByName
finds TSID entry in the global index. This could lead to missing
series in query results.
The issue has been introduced in the commit 55e7afae3a,
which has been included in VictoriaMetrics v1.78.0
* docs: warn about potential issue with read queries for 1.78.0
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs: warn about potential issue with read queries for 1.78.0
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- show dates in human-readable format, e.g. 2022-05-07, instead of a numeric value
- limit the maximum length of queries and filters shown in trace messages
Production experience shows that 100k is too big for /api/v1/series .
It leads to increased CPU usage when Grafana queries /api/v1/series over VictoriaMetrics
with big number of time series during auto-completion and when modifying template variables.
Previously SearchMetricNames was returning unmarshaled metric names.
This wasn't great for vmstorage, which should spend additional CPU time
for marshaling the metric names before sending them to vmselect.
While at it, remove possible duplicate metric names, which could occur when
multiple samples for new time series are ingested via concurrent requests.
Also sort the metric names before returning them to the client.
This simplifies debugging of the returned metric names across repeated requests to /api/v1/series
querytracer has been added to the following storage.Storage methods:
- RegisterMetricNames
- DeleteMetrics
- SearchTagValueSuffixes
- SearchGraphitePaths
Previously the cache could store 10K unique regexps. When every regexp is huge (e.g. hundreds of kilobytes),
then the total cache size could grow to multiples of gigabytes. Now the cache size is limited by the total length
of all cached regexps. So huge regexps won't result in high memory usage for the cache.
alerts: filter out non error log messages for `TooManyLogs`
Info and Warn error levels aren't always a result of malfunctioning
or faulty state. So we filter them out.
If the previous dial attempt was unsuccessful, then all the new dial attempts are skipped
until the background goroutine determines that the given address can be successfully dialed.
This reduces query latency when some of vmstorage nodes are unavailable and dialing them is slow.
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
This commit is based on ideas from the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2756
The main differences are:
- The check for healthy/unhealthy storage nodes is moved one level lower from app/vmselect/netstorage to lib/netutil.ConnPool.
This makes possible re-using this feature everywhere lib/netutil.ConnPool is used.
- The check doesn't take into account handshake errors for already established connections.
Handshake errors usually mean improperly configured VictoriaMetrics cluster, so they shouldn't be ignored.
The commit 5fb45173ae takes into account only newly registered series
when applying cardinality limits. This means that the cardinality limit could be exceeded with already registered series.
This commit returns back accounting for already registered series when applying cardinality limits.
Previously the creation of per-day indexes and global indexes
for the newly registered time series was decoupled.
Now global indexes and per-day indexes for the current day are created toghether for new time series.
This should speed up registering new time series a bit.
* vmalert: remove head of line blocking for sending alerts
This change makes sending alerts to notifiers concurrent instead
of sequential. This eliminates head of line blocking, where first
faulty notifier address prevents the rest of notifiers from
receiving notifications.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: make default timeout for sending alerts 10s
Previous value of 1m was too high and was inconsistent
with default timeout defined for notifiers via
configuration file.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: linter checks fix
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmselect: limit `end` param max value by 2d in future
The change is applied only to service handlers like `/labels` or `/series`
and limits the `end` param by max value <= now() + 2 days. The same limit
is applied for the ingested data, so no reason to allow to request data
in future far than that.
The change is also needed for corner cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2669
where too high `end` value triggers inefficient global index search.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs/CHANGELOG.md: document the bugfix
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This allows filling the seriesCountByFocusLabelValue list in the /api/v1/status/tsdb response
with label values for the specified focusLabel, which contain the highest number of time series.
TODO: add this to Cardinality explorer at VMUI - https://docs.victoriametrics.com/#cardinality-explorer
Suggest `-search.maxStalenessInterval=10s` instead of `-search.maxStalenessInterval=1ms`,
since `1ms` would result in empty graphs in most cases, since the interval between data points
on the graph is usually much higher than 1ms. For example, if the graph shows time range of one hour
and it contains 1000 points, then the interval between points on the graph would equal to
3600s/1000=3.6 seconds.
* feat: make datepicker to be set to last 30 min by default
* fix: correct spinner while loading data
* feat: change legend style
* app/vmselect: `make vmui-update`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
vmalert: support `limit` param in groups definition
`limit` param limits number of time series samples produced by a single rule
during execution.
On reaching the limit rule will return an err.
Signed-off-by: lihaowei <haoweili35@gmail.com>
The change updates histogram for registering SD update duration
only SD is considered as `active`. SD is active if at least
one scraper for this SD has started.
This change supposed to reduce metrics cardinality produced
by duration histogram which gets updated even if SD isn't configured.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2671
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Workers count for merges affects the max part size during merges. Such behaviour
protects storage from running out of disk space for scenario when all workers
are merging parts with the max size.
This works very well for most cases. But for systems where high number of CPUs
is allocated for vmstorage components this could significantly impact the max
part size and result in more unmerged parts than expected.
While checking multiple production highly loaded setups it was discovered that
`max_over_time(vm_active_merges{type="storage/big}[1h]}"` rarely exceeds 2,
and `max_over_time(vm_active_merges{type="storage/small}[1h]}"` rarely exceeds 4.
The change in this commit limits the max value for concurrency accordingly.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Remove unused js bloatware from /targets page. This strips down binary size by more than 100Kb
- Add /service-discovery page for API compatibility with Prometheus
- Properly load bootstrap.min.css from /prometheus/targets
- Serve static contents for /targets page from app/vminsert instead of app/vmselect, because /targets page is served from there
- Take into account `ca`, `key` and `cert` values when generating string representation of TLSConfig.
Print hashes instead of real values because of security considerations.
- Properly update Config.tlsCertDigets when `key` and `cert` values are set.
This allows properly updating scrape targets after these values are updated in configs.
- Do not re-generate certificate from `key` and `cert` values per each call to getTLSCert,
because these values are immutable.
- Do not set `ca` value from `ca_file` value, so it isn't exposed at `/config` page.
- Generate proper error messages on incorrect `key`, `cert` or `ca` values.
Fixes security vulnerability in nth-check version <=1.0.2
My previous version pin was insufficient, as it was imported again through a different (svgo -> css-select).
- data modification
- more applications for gauges
- recommendations for instrumenting app with metrics
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape/discovery/kubernetes: properly updates discovered scrape works
previously, added or updated scrapeworks may override previuosly
discovered.
it happens because swosByKey may contain small subset of kubernetes
objects with it's labels.
It happens for objectsUpdated and objectsAdded maps, which include only changed elements
* Properly calculate vm_promscrape_discovery_kubernetes_scrape_works
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The new metric shows the configured evaluation interval per group.
Metric updates its value when group's interval is changed during
hot reload.
The new metric can be used to estimate how close group
is to start missing evaluation rounds. The following query
will show the % of used time by the group to evaluate all rules
before the next round:
```
(max(vmalert_iteration_duration_seconds{quantile="0.99"}) / vmalert_iteration_interval_seconds) * 100
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2618
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This adds the ability to utilize sigv4 signing for all AWS services not
just "aps". When the newly introduced property "service" is not set it
will default to "aps".
Signed-off-by: Boris Petersen <boris.petersen@idealo.de>
docs: update docs for beginners
QuickStart page was updated with more relevant information.
Key Concepts was added to cover basics for the VictoriaMetrics.
Unexpectedly, Grafana makes an extra request to `/rules`
handler in addition to `/api/v1/rules` calls in alerts UI.
This happens only for Grafana versions older than 8.5.*.
Apparently, this is related to support of other monitoring
systems.
Prometheus responds with `text/html` content for UI page `/rules`
to such requests. Actually, returning just a blank page with
SC=200 works as well.
Returning actual response of `/api/v1/rules`
results in error in Grafana since it expects a `yaml` (?) in response.
So we add a placeholder to `vmalert`.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2583
Signed-off-by: hagen1778 <roman@victoriametrics.com>
For liquid text processor double braces `{{` `}}`
are special chars for templating.
Since we use them in some of our docs with different purpose,
we must escape them to avoid syntax errors from liquid.
For escaping curly braces we use bult-in plugin which helps
to enclose sections of text via `{% raw %}` and `{% endraw %}`.
This approach prevents liquid syntax errors and makes render correct.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Rules executor within group tracks series sent to remote write
in order to mark them as stale if they had disappeared in next
evaluation round.
The executor uses rules ID as a key to identifies series which belong to rule.
On config reload, executor remains active but the set of rules could change.
Hence, we need to properly cleanup the tracker for rules which has been disappeared
on config reload.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* deployment/docker: pass `-buildvs=false` to `go build` for production builds
This should resolve the `error obtaining VCS status: exit status 128` error
when the environment contains incorrect version of git or has incorrect access rights
to the directory with VictoriaMetrics source code.
See the following links for additional info:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2508#issuecomment-1117126702 ,
- https://github.com/google/ko/issues/672
- https://github.com/golang/go/issues/49004
* lib/netutil: limit the number of concurrently established connections when calling ConnPool.Get()
This should reduce potential spikes in the number of established connections in the following cases:
- when the connection establishing procedure becomes temporarily slow
- after a temporary spike in the rate of ConnPool.Get() calls
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2552
* docs/CHANGELOG.md: document c8af625bcc
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1322#issuecomment-1120276146
* docs/Cluster-VictoriaMetrics.md: typo fix: `by by` -> `by`
* docs: add `resource usage limits` docs, which describe fine-grained tuning for various resource usage limits
* docs/Cluster-VictoriaMetrics.md: the `/api/v1/label/.../values` query can take CPU and ram at both vmstorage and vmselect
* Update root Readme and root vmagent readme
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should reduce potential spikes in the number of established connections in the following cases:
- when the connection establishing procedure becomes temporarily slow
- after a temporary spike in the rate of ConnPool.Get() calls
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2552
* vmalert: calculate time for firing alert based on the given timestamp
Previously, current time was used for checking the `firing` threshold.
This is not correct, since alerts are evaluated at specific timestamps.
Hence, this specific timestamp supposed to be used in the calculation.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: properly calculate evaluation timestamp for rules
Timestamp for rules evaluation should be calculated after
the artifical delay for groups start. Otherwise, evaluation
timestamp can fall back too far in time.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Make it possible to migrate timeseries while restoring the
original timeseries name previously written from Prometheus
to InfluxDB v1 via remote_write.
Fixes: https://github.com/VictoriaMetrics/vmctl/issues/8
Previously ScrapeConfig.clone() was improperly copying promauth.Secret fields -
their contents was replaced with `<secret>` value.
This led to inability to use passwords and secrets in `-promscrape.config` file.
The bug has been introduced in v1.77.0 in the commit 67b10896d2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2551
Do not assume the db label to be the last one and also
make sure we are not skipping it and everything afterwards.
Breaking the loop would cause following labels to be empty.
* {lib/promscrape,app/vmagent}: adds sigv4 support for vmagent remoteWrite
moves aws related code into separate lib from lib/promscrape
it allows to write data from vmagent to the AWS managed prometheus (cortex)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287
* Apply suggestions from code review
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
vmctl: fix vmctl blocking on process interrupt
This change prevents vmctl from indefinite blocking on
receiving the interrupt signal. The update touches all
import modes and suppose to improve tool reliability.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2491
This adds a metric for the rate limit.
The limit is present as a flag currently:
`flag{name="remoteWrite.rateLimit", value="500000", is_set="true"} 1`
We are running many instances of vmagent and when creating alerts it is harder than it needs to be when extracting the value from the flag.
With this change it should be easier to monitor how close to the limit we are.
`((100/vmagent_remotewrite_rate_limit{account="account"})*sum (rate(vmagent_remotewrite_conn_bytes_written_total{account="account"}))) and ON (account) flag{name="remoteWrite.rateLimit"} == 1`
Function `ValidateTemplates`, used on the vmalert startup,
is supposed to check whether used templates and functions
in loaded rules are correct. The function was parsing
and executing loaded templates.
However, rules may contain functions which can't be executed
without values (label values or query results), like `slice`.
Because of this, validation for completely valid expression
`{{ slice $labels.job 9 }}` will fail since `$labels.job`
is empty during validation.
This PR updates `ValidateTemplates` function to only parse
templates without executing them.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2514
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/{storage,flagutil} - Add option for snapshot autoremoval
- add prometheus-like duration as command flag
- add option to delete stale snapshots
- update duration.go flag to re-use own code
* wip
* lib/flagutil: re-use Duration.Set() call in NewDuration
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
add progress bars to the VM importer
The new progress bars supposed to display the processing speed per each
VM importer worker. This info should help to identify if there is a bottleneck
on the VM side during the import process, without waiting for its finish.
The new progress bars can be disabled by passing `vm-disable-progress-bar` flag.
Plotting multiple progress bars requires using experimental progress bar pool
from github.com/cheggaaa/pb/v3. Switch to progress bar pool required changes
in all import modes.
The openTSDB mode wasn't changed due to its implementation, which implies individual progress
bars per each series. Because of this, using the pool wasn't possible.
Signed-off-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* Export "null" in jsonl instead of NaN
The NaN appeared because of staleness markers that were added for compatibility. I think it's better to use json `null`, implemented here.
Also maybe it also makes sense to add a flag like `?skip-staleness-markers=true` to `/export`, to skip nulls at all?
* Update app/vmselect/prometheus/export.qtpl
* app/vmselect/prometheus/export.qtpl.go: `make quicktemplate-gen`
* docs/CHANGELOG.md: document the change
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* app/vmselect: adds API /api/v1/status/buildinfo
it should fix an compability error with grafana 8.5 prometheus datasource
https://github.com/grafana/grafana/pull/46771
* Update main.go
* dashboards: remove index filter from stats panel for DiskUsage
The diskUsage stats panel was showing disk usage without including
size of the index, which is not correct. The filter was removed
to reflect the total disk usage.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2368
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: add adhoc filter to dasbhoard variables
The adhoc filter allows to quickly apply global filters without
modifying the panels.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: add new panel `IndexDB items rate`
The new panel supposed to reflect the pressure on indexDB
caused by churn rate or new series registration.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: rm "Deferred merges" panel since it could be misleading
See more context here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1682#issuecomment-938608067
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: replace fixed interval of `5m` for `rate` expressions
Before we used fixed `5m` interval for expressions with `rate` func.
Unfortunately, this interval wasn't a fit for all the cases. So we
switch to `$__rate_interval` instead.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: bump version requirement
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: rm `vm_indexdb_items_added_size_bytes_total` expression
Rate over `vm_indexdb_items_added_size_bytes_total` doesn't seem to be useful
on the dasbhoard panel.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This should prevent from `duplicate time series` errors when executing the following query:
kube_pod_container_resource_requests{resource="cpu"} * on (namespace,pod) group_left() (kube_pod_status_phase{phase=~"Pending|Running"}==1)
where `kube_pod_status_phase{phase=~"Pending|Running"}==1` filters out diplicate time series
replaces internStringMap with sync.Map - it greatly reduces lock contention
concurently reload scrape work for api watcher - each object labels added by dedicated CPU
changes can be tested with following script https://gist.github.com/f41gh7/6f8f8d8719786aff1f18a85c23aebf70
Reduce the number of memory allocations in this function. This improves its performance by up to 50%.
This should improve service discovery speed when big number of potential targets with big number of meta-labels
are generated by service discovery.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2270
This reduces the number of allocations and improves the performance for updating dropped targets' map.
This map is exposed at /api/v1/targets as in droppedTargets list.
* lib/promscrape: adds job restart method
it must restart only ScrapeConfig with changed content
this change greatly reduce time, that needed for job restart
and it should decrease possible data loss when config frequently changed at kubernetes based deployments
Apply suggestions from code review
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* wip
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/httpserver: added tlsCipherSuites flag
* lib/httpserver: compare lower case strings
* lib/httpserver: use EqualFold
* lib/httpserver: used flagutil.NewArray, supported only strings cipher suites
* lib/httpserver: updated flag description, added flag to documentation
* Update lib/httpserver/httpserver.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This change fixes incorrect marshalling for ScrapeConfig
it affects http endpoint and ScrapeConfig checksum.
With omitempty, custom Marshaller is not called if field is not a pointer.
Previously this issue happened at vmalert
* lib/promscrape: allows to use k8s pod name as clusterMemberNum
it must improve user expirience and simplify clustering scrapers.
it must allow to use vmagent cluster with distroless images
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2359
* Apply suggestions from code review
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
previously, if native block cannot be unmarshaled, wg.Done wasn't called by unmarshal work.
It leads to connection blocking and possible dead-lock at client side
Before, relabeling for notifier configured via file was supported
only for target labels discovered via SD.
With this change, new config field `alert_relabel_configs` is introduced
for applying relabeling to labels of sent alerts.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This metric is equivalent to `vm_available_memory_bytes`, but it has better name,
since the metric is related to a process, not VictoriaMetrics itself.
Leave `vm_available_memory_bytes` for backwards compatibility.
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.
For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.
See more details here https://github.com/prometheus/compliance/issues/80
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This should improve the performance for items sorting inside inmemoryBlock.MarshalUnsortedData
if they have common prefix.
While at it, improve the performance for inmemoryBlock.updateCommonPrefix for sorted items.
This should improve performance for inmemoryBlock.MarshalSortedData during background merge.
This reduces memory usage under production workloads by up to 10%,
while CPU spent on GC remains roughly the same.
The CPU spent on GC can be monitored with go_memstats_gc_cpu_fraction metric
The following actions are taken:
- Increase the TLS hashdshake timeout from 5 seconds to 10 seconds
- Increase dial timeout from 5 seconds to 30 seconds
- Specify DialContext instead of Dial in http.Transport. This allows properly handling
the Context arg during dialing the remote storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
* lib/protoparser: changes ParseStream for native format
uses reader instead of http.Request
updates app/vmagent and app/vmagent method usage
* app/vmctl: add verify-block subcommand
it allows to check exported from VictoriaMetrics data block in native format
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2362
Update app/vmctl/README.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
The new flag `datasource.disableKeepAlive` allows disabling keepalive
connections. This may be useful if there are multiple datasource
replicas (e.g. vmselects) behind the HTTP balancer to avoid uneven
load spread because of long-lived connections.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Executor recently gain field for storing previously sent series.
Since the same executor object can be used in multiple goroutines,
the access to this field should be serialized.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: split alert's `Start` field into `ActiveAt` and `Start`
The `ActiveAt` field identifies when alert becomes active for rules
with `for > 0`. Previously, this value was stored in field `Start`.
The field `Start` now identifies the moment alert became `FIRING`.
The split is needed in order to distinguish these two moments
in the API responses for alerts.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: support specific moment of time for rules evaluation
The Querier interface was extended to accept a new argument
used as a timestamp at which evaluation should be made.
It is needed to align rules execution time within the group.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: mark disappeared series as stale
Series generated by alerting rules, which were sent to remote write
now will be marked as stale if they will disappear on the next
evaluation. This would make ALERTS and ALERTS_FOR_TIME series
more precise.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: evaluate rules at fixed timestamp
Before, time at which rules were evaluated was calculated
right before rule execution. The change makes sure
that timestamp is calculated only once per evalution round
and all rules are using the same timestamp.
It also updates the logic of resending of already resolved
alert notification.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: allow overridin `alertname` label value if it is present in response
Previously, `alertname` was always equal to the Alerting Rule name. Now,
its value can be overriden if series in response containt the different value
for this label.
The change is needed for improving compatibility with Prometheus.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: align rules evaluation in time
Now, evaluation timestamp for rules evaluates as if
there was no delay in rules evaluation. It means, that
rules will be evaluated at fixed timestamps+group_interval.
This way provides more consistent evaluation results and
improves compatibility with Prometheus,
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: add metric for missed iterations
New metric `vmalert_iteration_missed_total` will show
whether rules evaluation round was missed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: reduce delay before the initial rule evaluation in group
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: rollback alertname override
According to the spec:
```
The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
```
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: throw err immediately on dedup detection
```
The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts
or add samples to samples receiver if there is more than one alert with the same labels
```
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: cleanup
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: use strings builder to reduce allocs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache
It should decrease memory usage for regexp caching
with storing cacheEntry by pointer - golang map should be able to effectivly shrink it's size
original issue with this case - unexpected map grows and storage OOM
Apply suggestions from code review
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Adds missing metrics for regexp cache and regexpPrefixes cache
* wip
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* Update url-examples.md
+additional examples
* Update
* Update url-examples
Some changes requested by Roman.
* Update url-examples.md
* Update url-example
* Update url-examples
Additional info and marking for /labels part
* Update url-example
Added example with complex query which needs encoding:
How to execute the query similar to - sum(increase(foo{status="bar"}[5m])) by (status)
* Update url-samples
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* fix for issue 2255 - matchTagFilters for positive empty-match filters
* add example to comments
* formatting
* add test for positive empty match
* formatting
* vmalert: add support for `sortByLabel` template function
* vmalert: update API according to Prometheus conformance program
The changes to the API, field names and URL path has been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md
* vmalert: fix the timestamp of the evaluated rules
The timestamp used for alert's `EndsAt` was calculated
before sending the notification. While the correct way
is to use the timestamp taken right before rules evaluation.
* vmalert: add `-datasource.queryTimeAlignment` flag
The flag is supposed to provide ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.
The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Some of the releases could negatively affect performance for a limited
period of time due to some changes in core. Update details are meant to
warn users about expected changes in peformance after the update.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The lifetime of storageBlock is much shorter comparing to the lifetime of inmemoryPart,
so sync.Pool usage should reduce overall memory usage and improve performance
because of better locality of reference when marshaling inmemoryBlock to inmemoryPart.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
There is no need to sort the underlying data according to sorted items there.
This should reduce cpu usage when registering new time series in `indexdb`.
Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2245
* fixes jwt token parse with correct base64Url decoding
it must be applied according to jwt RFC that requires token to be URL safe
added slow path for decoding tokens with std base64 decoding
adds error logging for vmgateway
* docs/CHANGELOG.md: document the bugfix
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/discovery/consul: update services on the watcher's start
Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.
The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: remove workarounds for consul SD
Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/discovery/consul: update after review
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* added missed runbook for udpating k8s VM Cluster in DO
* Update deployment/marketplace/digitialocean/one-click-droplet/RELEASE_GUIDE.md
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Lowering threshold from 50% to 5% will be more sufficient
for discovering un-healthy system state. It also goes in
sync with alert definition in cluster branch.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The new row Caches adds more visibility for cache utilization by VM.
It replaces the old `Cache size` panel.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: plot cpu limits for vmagent, vmalert and vm-single dashboards
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* alerts: add `TooHighCPUUsage` alert for all VM components
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: bump components version requirements
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Remove unneeded dependency on `numeral` package
* Properly parse numbers obtained from /api/v1/query_range according to
https://prometheus.io/docs/prometheus/latest/querying/api/#expression-query-result-formats
* Optimize updating processing the received data from /api/v1/query_range
* Make smoother zoom on `ctrl+scroll`
* Reduce the number of points received from /api/v1/query_range by 2x in order to reduce load on backend
- Postpone the pre-poulation to the last hour of the current day. This should reduce the number
of useless entries in the next per-day index, which shouldn't be created there,
when the corresponding time series are stopped to be pushed during the current day.
- Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself
when calculating the need for for the given MetricID pre-population.
- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
after indexdb rotation. This should improve code maintainability.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
* lib/index: reduce read/write load after indexDB rotation
IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.
IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.
The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.
To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.
When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.
This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.
The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/promscrape: support prometheus-like duration in scrape configs
The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/blockcache: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape: support prometheus-like duration in scrape configs
* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
* wip
* docs/CHANGELOG.md: document the feature
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* adds CGO build for arm64
it must improve performance for arm64 based deployments of vmstorage and
vmsingle for 15-20%
it depends on gozstd package update for correct musl gozstd vendoring
* typo fixes
* docs/CHANGELOG.md: document the change
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should improve data ingestion speed if time series samples are ingested with interval bigger than 2 minutes.
The actual interval could exceed 2 minutes if the original interval between samples doesn't exceed 2 minutes
in the case of slow inserts. Slow inserts may appear in the following cases:
* Big number of new time series are pushed to VictoriaMetrics, so they couldn't be registered in 2 minutes.
* MetricName->tsid cache reset on indexdb rotation or due to unclean shutdown.
In this case VictoriaMetrics needs to load MetricName->tsid entries for all the incoming series from IndexDB.
IndexDB uses the block cache for increasing lookup performance. If the cache has no the needed block,
then IndexDB reads and unpacks the block from disk. This requires an extra disk read IO and CPU.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
This also should increase performance for periodically executed queries with intervals from 2 minutes to 5 minutes.
See the previous similar commit - 43103be011
It is possible that the timeout can be increased further. Let's collect production numbers for this change
so the timeout could be adjusted further.
Previously limits for new caches were taken from cache stats.
These limits could mismatch the original limits. This could result in failed cache load
if the stored cache has been created with the limits obtained from cache stats.
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
vmalert: support configuration file for notifiers
* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
This metric shows the number of CPU cores available to the process.
This allows creating alerting rules on CPU saturation with the following query:
rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
* optimized code ,because only the first error,so no need var errors []error
* optimized code ,because only the first error,so no need var errors []error
Co-authored-by: lirenzuo <lirenzuo@shein.com>
* FAQ update: how downsampling and deduplication will work at the same time
* Update docs/FAQ.md
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
This adds more optimization cases for https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization
For example:
* Multi-level transform functions. For example, abs(round(foo{a="b"})) + bar{x="y"}
is now optimized to abs(round(foo{a="b",x="y"})) + bar{a="b",x="y"}
* Binary operations with `on()`, `without()`, `group_left()` and `group_right()` modifiers.
For example, foo{a="b"} on (a) + bar is now optimized to foo{a="b"} on (a) + bar{a="b"}
* Multi-level binary operations. For example, foo{a="b"} + bar{x="y"} + baz{z="q"}
is now optimized to foo{a="b",x="y",z="q"} + bar{a="b",x="y",z="q"} + baz{a="b",x="y",z="q"}
* Aggregate functions. For example, sum(foo{a="b"}) by (c) + bar{c="d"}
is now optimized to sum(foo{a="b",c="d"}) by (c) + bar{c="d"}
Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice.
This wasted CPU cycles and memory bandwidth in some places, where the original slice contents wasn't needed
after slize resizing. Switch such places to bytesutil.ResizeNoCopy().
Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability.
Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice
exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls.
This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache).
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
- Optimize Cache.RemoveBlocksFromPart(), so it doesn't need to iterate over all the cached blocks.
- Cache blocks if there were no cache misses during the last 2 minutes.
This may be the case when new blocks are added simultaneously to the storage and to the cache.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
* fix: add date validate for time range
* app/vmselect/vmui: `make vmui-update`
* docs/CHANGELOG.md: document the bugfix
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded,
which could result to out of memory errors.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
* feat: replace @codemirror to text field
* feat: switch to Preact from React
* fix: optimize mui imports
* fix: remove unused vars
* update package-lock.json
* app/vmselect/vmui: `make vmui-update`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Document the bugfix at docs/CHANGELOG.md
- Set __address__ field after copying commonLabels to the resulting map of discovered labels.
This makes sure that the correct __address__ label is used.
This reverts commit 2104330d4c.
This check doesn't work well for community pull requests, since third-party users
aren't motivated to rebase pull requests to branch head after they are created.
This check is useful for private repositories though.
* Simplify queries to OpenTSDB (and make them properly appear in OpenTSDB query stats) and also tweak defaults a bit
* Convert seconds to milliseconds before writing to VictoriaMetrics and increase subquery size
Signed-off-by: John Seekins <jseekins@datto.com>
On import process interruption `vmctl` now prints the max and min timestamps of:
* last failed batch if import ended with error;
* last sent batch if import was cancelled by user.
To get more details for each timeseries in batch user needs to specify `--verbose` flag.
The change does not relate to `vm-native` mode, since `vmctl` has no control over
transferred data in this mode.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1236
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It will prevent merging in a branch that's not based on its base branch HEAD, leading to streamlined history.
Note it will not prevent squash commits, nor commits directly to base branch.
* feat: add a reset query by clicking the logo
* feat: add sequence number for query fields
* feat: invert behavior on the graph's legend
* app/vmselect/vmui: `make vmui-update`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* vmalert: check if remoteWrite is configured for replay mode
The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmalert/main.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* vmagent: add error log for skipped data block when rejected by receiving side
Previously, rejected data blocks were silently dropped - only metrics were update.
From operational perspective, having an additional logging for such cases is preferable.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmagent: throttle log messages about skipped blocks
The new type of logger was added to logger pacakge.
This new type supposed to control number of logged messages
by time.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/logger: make LogThrottler public, so its methods can be inspected by external packages
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should handle the case when the original job_name has been changed in -promscrape.config ,
while the resulting job label remains the same because it is overriden via relabeling.
* fix: remove disabling custom step when zooming
* feat: add a dynamic calc of the width of the graph
* fix: add validate y-axis limits
* fix: correct axis limits for value 0
* fix: change logic create time series
* fix: change types for tooltip
* fix: correct points on the line
* fix: change the logic for set graph width
* fix: stop checking the period when auto-refresh is enabled
* app/vmselect/vmui: `make vmui-update`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* dashboards/vmsingle: add "Merges deferred" panel
The new panel supposed to show if there were deferred merges
due to insufficient disk space.
It goes within alerting rule which suppose to send a signal
in such cases.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmsingle: add "Cache usage" panel
The new panel supposed to show the % of the used cache
compared to allowed size by type.
It should help to determine underutilized types of caches.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmsingle: bump version requirement
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmsingle: rm alert for `vm_merge_need_free_disk_space`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update FAQ.md
Adding explanation "Why do same metrics have differences in VictoriaMetrics and Prometheus dashboards?"
* Update FAQ
* Update FAQ.md
* Apply suggestions from code review
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
* Document changes_prometheus(), increase_prometheus() and delta_prometheus() functions.
* Simplify their implementation
* Mention these functions in docs/CHANGELOG.md
* dashboards/vmagent: shuffle panels for better visibility
More important error/dropped panels were moved higher on the main row.
Network usage panel moved to Resource usage row.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmagent: add Troubleshooting row to show top 5 instances/jobs by churn rate
New panels are supposed to show top 5 jobs or targets which generate the most
of the churn rate. They were placed into a new row "Troubleshooting".
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmagent: add panels for showing persistent queue saturation
New panels were added to Torubleshooting row to show the persistent queue
saturation. The corresponding alerts were added and linked to these
panels as well.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards/vmagent: add alert "RejectedRemoteWriteDataBlocksAreDropped"
New alert suppose to send a notification when vmagent starts to drop
data blocks rejected by configured remote write destiantion.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Simplify queries to OpenTSDB (and make them properly appear in OpenTSDB query stats) and also tweak defaults a bit
Signed-off-by: John Seekins <jseekins@datto.com>
When using `vmalert` with older Prometheus versions, the passed
`step=2m` may be parsed by Prometheus with an err: "cannot parse \"2m0s\" to a valid duration".
In order to improve compatibility vmalert will always convert step duration to seconds.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943
Signed-off-by: hagen1778 <roman@victoriametrics.com>
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.
* feat: add a label for the Query field
* fix: change zoom position
* fix: add description and error code to alerts
* fix: correct logic query history
* fix: correct update query history
* feat: add custom step
* update package-lock.json
* docs: document that VMUI now supports overriding of `step` query arg, which is passed to `/api/v1/query_range`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Service labels like `alertname` or `alertgroup` were attached
after template expanding for `labels` section. Because of this,
labels `alertname` or `alertgroup` weren't available for templating
in `labels` section of alert's definition.
This commit changes the order of labels attaching and adds a test
for verifying these labels availability.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* feat: change duration by "enter"
* fix: optimize data processing for chart
* feat: set minimum step to 1ms
* update dependencies
* feat: remove save the last query to local storage
* fix: handle an error in a table with subqueries
* feat: store display type in URL
* Revert "feat: store display type in URL"
This reverts commit ccc242c69a.
* feat: store display type in URL
* refactor: move the time setting to a folder
* refactor: move the query configurator to a folder
* refactor: move the auth settings to a folder
* feat: improve styles
* feat: add multi query
* update package-lock
* feat: add display multiple queries
* feat: add limits for multiple queries
* update dependencies
* feat: add history for multiple queries
* feat: add line type to legend
* feat: change style for switch
* feat: change the logic for axes limits for multiple queries
* update package-lock.json
* update dependencies
* feat: add the filter to legend
* wip
* lib/httpserver: add missing 127.0.0.1 hostname to the logged address for http and pprof server if the address starts with ':'
This allows copy-pasting the url to http server from logs.
* lib/httpserver: add missing 127.0.0.1 hostname to the logged address for http and pprof server if the address starts with ':'
This allows copy-pasting the url to http server from logs.
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* Document the ability to specify http or https urls in `-auth.config` at docs/CHANGELOG.md
* Move the ReadFileOrHTTP to lib/fs, so it can be re-used in other places where a file
should be read from the given path. For example, in `-promscrape.config` at `vmagent`.
* add support for reading remote auth_config file via http
* fix lint
* fix defer on close body
Co-authored-by: Tiago Magalhães <tmagalhaes@wavecom.pt>
* vmalert: introduce additional HTTP URL params per-group configuration
The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.
Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.
`params` fields are supported by both Prometheus and Graphite datasource types.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: provide more examples for `params` field
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: set higher priority for `params` setting
If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The bump was required for `vmalert` package.
`vmalert` docs now also contain an updated description.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:
vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
* removes FileSize from backup part key
it should fix download restoration for backups
* Update lib/backup/common/part.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The exported data isn't de-duplicated by default due to performance reasons.
It is expected that the de-duplication is applied during importing the exported data.
The deduplication is applied only when exporting data via /api/v1/export if `reduce_mem_usage=1` query arg isn't passed to the request.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1837
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt
Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
For a long time notifier.Addr flag was required. The assumption was that vmalert will
be always used for alerting. However, practice shows that some users need only
recording rules. In this case, requirement of notifier.Addr is ambigious.
The change verifies if loaded config contains recording or alerting rules and
if there are corresponding flags set. This is true for initial config load
and hot reload.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix: handle an error in a table with subqueries
* feat: store display type in URL
* Revert "feat: store display type in URL"
This reverts commit ccc242c69a.
* Simplify queries to OpenTSDB (and make them properly appear in OpenTSDB query stats) and also tweak defaults a bit
Signed-off-by: John Seekins <jseekins@datto.com>
* remove extraneous printlns
Signed-off-by: John Seekins <jseekins@datto.com>
* remove empty line
Signed-off-by: John Seekins <jseekins@datto.com>
* fix bug in offset calcuation and closer to working with simpler queries
Signed-off-by: John Seekins <jseekins@datto.com>
* fix boolean eval
Signed-off-by: John Seekins <jseekins@datto.com>
* fix casting and check for multiple series
Signed-off-by: John Seekins <jseekins@datto.com>
Describe remoteWrite.url is used to persist rules and alerts state info,
and add an additional paragraph explaining the separation between
-remoteRead.url and -datasource.url.
Fixes#1810.
* feat: add query history
* fix: change detect keyUp for nav query history
* feat: set default query history
* feat: change graph legend
* update dependencies
* update codemirror version
* fix: correct update period time after zoom/pan
* fix: optimize data processing for the graph
* fix: eliminate memory leaks related to mouse events
* fix: correct display of straight line
* Merge branch 'master' into vmui-fix-reset-graph
* app/vmselect: `make vmui-update`
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This removes the unneeded level of indirection and improves code readability.
The "prometheus" and "graphite" constants aren't going to change in the future, so there is no sense in hiding them behind constants.
This should improve the maximum data ingestion speed for highly-loaded vmagent instances
which run on beefy servers with many CPU cores and big amounts of RAM
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
Prometheus allows to have groups with no rules, so we should support
it in vmalert as well for compatibility reasons.
It is also allowed to hot-reload empty groups by adding or removing rules.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
httpListenAddr=flag.String("httpListenAddr",":8428","TCP address to listen for http connections")
minScrapeInterval=flag.Duration("dedup.minScrapeInterval",0,"Leave only the first sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details")
minScrapeInterval=flag.Duration("dedup.minScrapeInterval",0,"Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
dryRun=flag.Bool("dryRun",false,"Whether to check only -promscrape.config and then exit. "+
"Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse")
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse=false command-line flag")
inmemoryDataFlushInterval=flag.Duration("inmemoryDataFlushInterval",5*time.Second,"The interval for guaranteed saving of in-memory data to disk. "+
"The saved data survives unclean shutdown such as OOM crash, hardware reset, SIGKILL, etc. "+
"Bigger intervals may help increasing lifetime of flash storage with limited write cycles (e.g. Raspberry PI). "+
"Smaller intervals increase disk IO load. Minimum supported value is 1s")
)
funcmain(){
@@ -37,6 +42,7 @@ func main() {
envflag.Parse()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
ifpromscrape.IsDryRun(){
*dryRun=true
@@ -51,7 +57,8 @@ func main() {
logger.Infof("starting VictoriaMetrics at %q...",*httpListenAddr)
measurementFieldSeparator=flag.String("influxMeasurementFieldSeparator","_","Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol")
skipSingleField=flag.Bool("influxSkipSingleField",false,"Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field")
skipMeasurement=flag.Bool("influxSkipMeasurement",false,"Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'")
dbLabel=flag.String("influxDBLabel","db","Default label for the DB name sent over '?db={db_name}' query parameter")
httpListenAddr=flag.String("httpListenAddr",":8429","TCP address to listen for http connections. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
influxListenAddr=flag.String("influxListenAddr","","TCP and UDP address to listen for InfluxDB line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
influxListenAddr=flag.String("influxListenAddr","","TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr=flag.String("graphiteListenAddr","","TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr=flag.String("opentsdbListenAddr","","TCP and UDP address to listen for OpentTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty")
opentsdbHTTPListenAddr=flag.String("opentsdbHTTPListenAddr","","TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty")
configAuthKey=flag.String("configAuthKey","","Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun=flag.Bool("dryRun",false,"Whether to check only config files without running vmagent. The following files are checked: "+
"Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse")
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
rateLimit=flagutil.NewArrayInt("remoteWrite.rateLimit","Optional rate limit in bytes per second for data sent to -remoteWrite.url. "+
rateLimit=flagutil.NewArrayInt("remoteWrite.rateLimit","Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage")
sendTimeout=flagutil.NewArrayDuration("remoteWrite.sendTimeout","Timeout for sending a single block of data to -remoteWrite.url")
proxyURL=flagutil.NewArray("remoteWrite.proxyURL","Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. "+
tlsInsecureSkipVerify=flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify","Whether to skip tls verification when connecting to -remoteWrite.url")
tlsCertFile=flagutil.NewArray("remoteWrite.tlsCertFile","Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
tlsKeyFile=flagutil.NewArray("remoteWrite.tlsKeyFile","Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
tlsCAFile=flagutil.NewArray("remoteWrite.tlsCAFile","Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. "+
"By default system CA is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
tlsServerName=flagutil.NewArray("remoteWrite.tlsServerName","Optional TLS server name to use for connections to -remoteWrite.url. "+
"By default the server name from -remoteWrite.url is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
tlsInsecureSkipVerify=flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify","Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile=flagutil.NewArrayString("remoteWrite.tlsCertFile","Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
tlsKeyFile=flagutil.NewArrayString("remoteWrite.tlsKeyFile","Optional path to client-side TLS certificate key to use when connecting to the corresponding -remoteWrite.url")
tlsCAFile=flagutil.NewArrayString("remoteWrite.tlsCAFile","Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. "+
"By default system CA is used")
tlsServerName=flagutil.NewArrayString("remoteWrite.tlsServerName","Optional TLS server name to use for connections to the corresponding -remoteWrite.url. "+
"By default the server name from -remoteWrite.url is used")
basicAuthUsername=flagutil.NewArray("remoteWrite.basicAuth.username","Optional basic auth username to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPassword=flagutil.NewArray("remoteWrite.basicAuth.password","Optional basic auth password to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPasswordFile=flagutil.NewArray("remoteWrite.basicAuth.passwordFile","Optional path to basic auth password to use for -remoteWrite.url. "+
"The file is re-read every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerToken=flagutil.NewArray("remoteWrite.bearerToken","Optional bearer auth token to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerTokenFile=flagutil.NewArray("remoteWrite.bearerTokenFile","Optional path to bearer token file to use for -remoteWrite.url. "+
"The token is re-read from the file every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
headers=flagutil.NewArrayString("remoteWrite.headers","Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. "+
"For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. "+
"Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'")
oauth2ClientID=flagutil.NewArray("remoteWrite.oauth2.clientID","Optional OAuth2 clientID to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecret=flagutil.NewArray("remoteWrite.oauth2.clientSecret","Optional OAuth2 clientSecret to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecretFile=flagutil.NewArray("remoteWrite.oauth2.clientSecretFile","Optional OAuth2 clientSecretFile to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2TokenURL=flagutil.NewArray("remoteWrite.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2Scopes=flagutil.NewArray("remoteWrite.oauth2.scopes","Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthUsername=flagutil.NewArrayString("remoteWrite.basicAuth.username","Optional basic auth username to use for the corresponding -remoteWrite.url")
basicAuthPassword=flagutil.NewArrayString("remoteWrite.basicAuth.password","Optional basic auth password to use for the corresponding -remoteWrite.url")
basicAuthPasswordFile=flagutil.NewArrayString("remoteWrite.basicAuth.passwordFile","Optional path to basic auth password to use for the corresponding -remoteWrite.url. "+
"The file is re-read every second")
bearerToken=flagutil.NewArrayString("remoteWrite.bearerToken","Optional bearer auth token to use for the corresponding -remoteWrite.url")
bearerTokenFile=flagutil.NewArrayString("remoteWrite.bearerTokenFile","Optional path to bearer token file to use for the corresponding -remoteWrite.url. "+
"The token is re-read from the file every second")
oauth2ClientID=flagutil.NewArrayString("remoteWrite.oauth2.clientID","Optional OAuth2 clientID to use for the corresponding -remoteWrite.url")
oauth2ClientSecret=flagutil.NewArrayString("remoteWrite.oauth2.clientSecret","Optional OAuth2 clientSecret to use for the corresponding -remoteWrite.url")
oauth2ClientSecretFile=flagutil.NewArrayString("remoteWrite.oauth2.clientSecretFile","Optional OAuth2 clientSecretFile to use for the corresponding -remoteWrite.url")
oauth2TokenURL=flagutil.NewArrayString("remoteWrite.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for the corresponding -remoteWrite.url")
oauth2Scopes=flagutil.NewArrayString("remoteWrite.oauth2.scopes","Optional OAuth2 scopes to use for the corresponding -remoteWrite.url. Scopes must be delimited by ';'")
awsUseSigv4=flagutil.NewArrayBool("remoteWrite.aws.useSigv4","Enables SigV4 request signing for the corresponding -remoteWrite.url. "+
"It is expected that other -remoteWrite.aws.* command-line flags are set if sigv4 request signing is enabled")
awsEC2Endpoint=flagutil.NewArrayString("remoteWrite.aws.ec2Endpoint","Optional AWS EC2 API endpoint to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
awsSTSEndpoint=flagutil.NewArrayString("remoteWrite.aws.stsEndpoint","Optional AWS STS API endpoint to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
awsRegion=flagutil.NewArrayString("remoteWrite.aws.region","Optional AWS region to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
awsRoleARN=flagutil.NewArrayString("remoteWrite.aws.roleARN","Optional AWS roleARN to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
awsAccessKey=flagutil.NewArrayString("remoteWrite.aws.accessKey","Optional AWS AccessKey to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
awsService=flagutil.NewArrayString("remoteWrite.aws.service","Optional AWS Service to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set. "+
"Defaults to \"aps\"")
awsSecretKey=flagutil.NewArrayString("remoteWrite.aws.secretKey","Optional AWS SecretKey to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set")
flushInterval=flag.Duration("remoteWrite.flushInterval",time.Second,"Interval for flushing the data to remote storage. "+
"This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url")
maxUnpackedBlockSize=flagutil.NewBytes("remoteWrite.maxBlockSize",8*1024*1024,"The maximum size in bytes of unpacked request to send to remote storage. "+
"It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics")
maxUnpackedBlockSize=flagutil.NewBytes("remoteWrite.maxBlockSize",8*1024*1024,"The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxRowsPerBlock")
maxRowsPerBlock=flag.Int("remoteWrite.maxRowsPerBlock",10000,"The maximum number of samples to send in each block to remote storage. Higher number may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxBlockSize")
)
// the maximum number of rows to send per each block.
constmaxRowsPerBlock=10000
// the maximum number of labels to send per each block.
unparsedLabelsGlobal=flagutil.NewArray("remoteWrite.label","Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
unparsedLabelsGlobal=flagutil.NewArrayString("remoteWrite.label","Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal=flag.String("remoteWrite.relabelConfig","","Optional path to file with relabel_config entries. These entries are applied to all the metrics "+
relabelConfigPathGlobal=flag.String("remoteWrite.relabelConfig","","Optional path to file with relabel_config entries. "+
"The path can point either to local file or to http url. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelDebugGlobal=flag.Bool("remoteWrite.relabelDebug",false,"Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. "+
"If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs")
relabelConfigPaths=flagutil.NewArray("remoteWrite.urlRelabelConfig","Optional path to relabel config for the corresponding -remoteWrite.url")
relabelDebug=flagutil.NewArrayBool("remoteWrite.urlRelabelDebug","Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. "+
"If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. "+
"This is useful for debugging the relabeling configs")
relabelConfigPaths=flagutil.NewArrayString("remoteWrite.urlRelabelConfig","Optional path to relabel config for the corresponding -remoteWrite.url. "+
"The path can point either to local file or to http url")
usePromCompatibleNaming=flag.Bool("usePromCompatibleNaming",false,"Whether to replace characters unsupported by Prometheus with underscores "+
"in the ingested metric names and label names. For example, foo.bar{a.b='c'} is transformed into foo_bar{a_b='c'} during data ingestion if this flag is set. "+
"See https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels")
remoteWriteURLs=flagutil.NewArray("remoteWrite.url","Remote storage URL to write data to. It must support Prometheus remote_write API. "+
remoteWriteURLs=flagutil.NewArrayString("remoteWrite.url","Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteMultitenantURLs=flagutil.NewArray("remoteWrite.multitenantURL","Base path for multitenant remote storage URL to write data to. "+
remoteWriteMultitenantURLs=flagutil.NewArrayString("remoteWrite.multitenantURL","Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
tmpDataPath=flag.String("remoteWrite.tmpDataPath","vmagent-remotewrite-data","Path to directory where temporary data for remote write component is stored. "+
@@ -38,9 +39,9 @@ var (
"isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs")
showRemoteWriteURL=flag.Bool("remoteWrite.showURL",false,"Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL=flagutil.NewBytes("remoteWrite.maxDiskUsagePerURL",0,"The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
maxPendingBytesPerURL=flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL","The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
"for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. "+
"Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. "+
"Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500MB. "+
"Disk usage is unlimited if the value is set to 0")
significantFigures=flagutil.NewArrayInt("remoteWrite.significantFigures","The number of significant figures to leave in metric values before writing them "+
"to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. "+
addr=flag.String("datasource.url","","VictoriaMetrics or vmselect url. Required parameter. "+
"E.g. http://127.0.0.1:8428")
appendTypePrefix=flag.Bool("datasource.appendTypePrefix",false,"Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.")
addr=flag.String("datasource.url","","Datasource compatible with Prometheus HTTP API. It can be single node VictoriaMetrics or vmselect URL. Required parameter. "+
"E.g. http://127.0.0.1:8428 . See also -remoteRead.disablePathAppend and -datasource.showURL")
appendTypePrefix=flag.Bool("datasource.appendTypePrefix",false,"Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.")
showDatasourceURL=flag.Bool("datasource.showURL",false,"Whether to show -datasource.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
headers=flag.String("datasource.headers","","Optional HTTP extraHeaders to send with each request to the corresponding -datasource.url. "+
"For example, -datasource.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -datasource.url. "+
"Multiple headers must be delimited by '^^': -datasource.headers='header1:value1^^header2:value2'")
basicAuthUsername=flag.String("datasource.basicAuth.username","","Optional basic auth username for -datasource.url")
basicAuthPassword=flag.String("datasource.basicAuth.password","","Optional basic auth password for -datasource.url")
basicAuthPasswordFile=flag.String("datasource.basicAuth.passwordFile","","Optional path to basic auth password to use for -datasource.url")
bearerToken=flag.String("datasource.bearerToken","","Optional bearer auth token to use for -datasource.url.")
bearerTokenFile=flag.String("datasource.bearerTokenFile","","Optional path to bearer token file to use for -datasource.url.")
bearerToken =flag.String("datasource.bearerToken","","Optional bearer auth token to use for -datasource.url.")
bearerTokenFile=flag.String("datasource.bearerTokenFile","","Optional path to bearer token file to use for -datasource.url.")
tlsInsecureSkipVerify=flag.Bool("datasource.tlsInsecureSkipVerify",false,"Whether to skip tls verification when connecting to -datasource.url")
tlsCertFile=flag.String("datasource.tlsCertFile","","Optional path to client-side TLS certificate file to use when connecting to -datasource.url")
@@ -25,24 +36,41 @@ var (
tlsCAFile=flag.String("datasource.tlsCAFile","",`Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used`)
tlsServerName=flag.String("datasource.tlsServerName","",`Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used`)
oauth2ClientID=flag.String("datasource.oauth2.clientID","","Optional OAuth2 clientID to use for -datasource.url. ")
oauth2ClientSecret=flag.String("datasource.oauth2.clientSecret","","Optional OAuth2 clientSecret to use for -datasource.url.")
oauth2ClientSecretFile=flag.String("datasource.oauth2.clientSecretFile","","Optional OAuth2 clientSecretFile to use for -datasource.url. ")
oauth2TokenURL=flag.String("datasource.oauth2.tokenUrl","","Optional OAuth2 tokenURL to use for -datasource.url.")
oauth2Scopes=flag.String("datasource.oauth2.scopes","","Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
lookBack=flag.Duration("datasource.lookback",0,`Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep=flag.Duration("datasource.queryStep",0,"queryStep defines how far a value can fallback to when evaluating queries. "+
"For example, if datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query."+
"If queryStep isn't specified, rule's evaluationInterval will be used instead.")
queryStep=flag.Duration("datasource.queryStep",5*time.Minute,"How far a value can fallback to when evaluating queries. "+
"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query."+
"If set to 0, rule's evaluation interval will be used instead.")
queryTimeAlignment=flag.Bool("datasource.queryTimeAlignment",true,`Whether to align "time" parameter with evaluation interval.`+
"Alignment supposed to produce deterministic results despite of number of vmalert replicas or time they were started. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1257")
maxIdleConnections=flag.Int("datasource.maxIdleConnections",100,`Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state.`)
roundDigits=flag.Int("datasource.roundDigits",0,`Adds "round_digits" GET param to datasource requests. `+
disableKeepAlive=flag.Bool("datasource.disableKeepAlive",false,`Whether to disable long-lived connections to the datasource. `+
`If true, disables HTTP keep-alives and will only use the connection to the server for a single HTTP request.`)
roundDigits=flag.Int("datasource.roundDigits",0,`Adds "round_digits" GET param to datasource requests. `+
`In VM "round_digits" limits the number of digits after the decimal point in response values.`)
)
// InitSecretFlags must be called after flag.Parse and before any logging
funcInitSecretFlags(){
if!*showDatasourceURL{
flagutil.RegisterSecretFlag("datasource.url")
}
}
// Param represents an HTTP GET param
typeParamstruct{
Key,Valuestring
}
// Init creates a Querier from provided flag values.
// Provided extraParams will be added as GET params to
// Provided extraParams will be added as GET params for
rulePath=flagutil.NewArray("rule",`Path to the file with alert rules.
rulePath=flagutil.NewArrayString("rule",`Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
@@ -34,8 +36,18 @@ Examples:
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
ruleTemplatesPath=flagutil.NewArrayString("rule.templates",`Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
Examples:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
absolute path to all .tpl files in root.`)
rulesCheckInterval=flag.Duration("rule.configCheckInterval",0,"Interval for checking for changes in '-rule' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes")
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead")
configCheckInterval=flag.Duration("configCheckInterval",0,"Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr=flag.String("httpListenAddr",":8880","Address to listen for http connections")
evaluationInterval=flag.Duration("evaluationInterval",time.Minute,"How often to evaluate the rules")
@@ -44,19 +56,24 @@ Rule files may contain %{ENV_VAR} placeholders, which are substituted by the cor
validateExpressions=flag.Bool("rule.validateExpressions",true,"Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration=flag.Duration("rule.maxResolveDuration",0,"Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
resendDelay=flag.Duration("rule.resendDelay",0,"Minimum amount of time to wait before resending an alert to notifier")
externalURL=flag.String("external.url","","External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource=flag.String("external.alert.source","",`External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
externalLabels=flagutil.NewArray("external.label","Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+
externalAlertSource=flag.String("external.alert.source","",`External Alert Source allows to override the Source link for alerts sent to AlertManager `+
`for cases where you want to build a custom link to Grafana, Prometheus or any other service. `+
`Supports templating - see https://docs.victoriametrics.com/vmalert.html#templating . `+
`For example, link to Grafana: -external.alert.source='explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":{{$expr|jsonEscape|queryEscape}} },{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' . `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
externalLabels=flagutil.NewArrayString("external.label","Optional label in the form 'Name=value' to add to all generated recording rules and alerts. "+
"Pass multiple -label flags in order to add multiple label sets.")
remoteReadLookBack=flag.Duration("remoteRead.lookback",time.Hour,"Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors=flag.Bool("remoteRead.ignoreRestoreErrors",true,"Whether to ignore errors from remote storage when restoring alerts state on startup.")
disableAlertGroupLabel=flag.Bool("disableAlertgroupLabel",false,"Whether to disable adding group's name as label to generated alerts and time series.")
disableAlertGroupLabel=flag.Bool("disableAlertgroupLabel",false,"Whether to disable adding group's Name as label to generated alerts and time series.")
dryRun=flag.Bool("dryRun",false,"Whether to check only config files without running vmalert. The rules file are validated. The `-rule` flag must be specified.")
dryRun=flag.Bool("dryRun",false,"Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.")
)
varalertURLGeneratorFnnotifier.AlertURLGenerator
@@ -66,13 +83,20 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage=usage
envflag.Parse()
remoteread.InitSecretFlags()
remotewrite.InitSecretFlags()
datasource.InitSecretFlags()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
err:=templates.Load(*ruleTemplatesPath,true)
iferr!=nil{
logger.Fatalf("failed to parse %q: %s",*ruleTemplatesPath,err)
addrs=flagutil.NewArray("notifier.url","Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093")
basicAuthUsername=flagutil.NewArray("notifier.basicAuth.username","Optional basic auth username for -notifier.url")
basicAuthPassword=flagutil.NewArray("notifier.basicAuth.password","Optional basic auth password for -notifier.url")
configPath=flag.String("notifier.config","","Path to configuration file for notifiers")
suppressDuplicateTargetErrors=flag.Bool("notifier.suppressDuplicateTargetErrors",false,"Whether to suppress 'duplicate target' errors during discovery")
addrs=flagutil.NewArrayString("notifier.url","Prometheus alertmanager URL, e.g. http://127.0.0.1:9093")
basicAuthUsername=flagutil.NewArrayString("notifier.basicAuth.username","Optional basic auth username for -notifier.url")
basicAuthPassword=flagutil.NewArrayString("notifier.basicAuth.password","Optional basic auth password for -notifier.url")
basicAuthPasswordFile=flagutil.NewArrayString("notifier.basicAuth.passwordFile","Optional path to basic auth password file for -notifier.url")
bearerToken=flagutil.NewArrayString("notifier.bearerToken","Optional bearer token for -notifier.url")
bearerTokenFile=flagutil.NewArrayString("notifier.bearerTokenFile","Optional path to bearer token file for -notifier.url")
tlsInsecureSkipVerify=flagutil.NewArrayBool("notifier.tlsInsecureSkipVerify","Whether to skip tls verification when connecting to -notifier.url")
tlsCertFile=flagutil.NewArray("notifier.tlsCertFile","Optional path to client-side TLS certificate file to use when connecting to -notifier.url")
tlsKeyFile=flagutil.NewArray("notifier.tlsKeyFile","Optional path to client-side TLS certificate key to use when connecting to -notifier.url")
tlsCAFile=flagutil.NewArray("notifier.tlsCAFile","Optional path to TLS CA file to use for verifying connections to -notifier.url. "+
tlsCertFile=flagutil.NewArrayString("notifier.tlsCertFile","Optional path to client-side TLS certificate file to use when connecting to -notifier.url")
tlsKeyFile=flagutil.NewArrayString("notifier.tlsKeyFile","Optional path to client-side TLS certificate key to use when connecting to -notifier.url")
tlsCAFile=flagutil.NewArrayString("notifier.tlsCAFile","Optional path to TLS CA file to use for verifying connections to -notifier.url. "+
"By default system CA is used")
tlsServerName=flagutil.NewArray("notifier.tlsServerName","Optional TLS server name to use for connections to -notifier.url. "+
tlsServerName=flagutil.NewArrayString("notifier.tlsServerName","Optional TLS server name to use for connections to -notifier.url. "+
"By default the server name from -notifier.url is used")
oauth2ClientID=flagutil.NewArrayString("notifier.oauth2.clientID","Optional OAuth2 clientID to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2ClientSecret=flagutil.NewArrayString("notifier.oauth2.clientSecret","Optional OAuth2 clientSecret to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2ClientSecretFile=flagutil.NewArrayString("notifier.oauth2.clientSecretFile","Optional OAuth2 clientSecretFile to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2TokenURL=flagutil.NewArrayString("notifier.oauth2.tokenUrl","Optional OAuth2 tokenURL to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
oauth2Scopes=flagutil.NewArrayString("notifier.oauth2.scopes","Optional OAuth2 scopes to use for -notifier.url. Scopes must be delimited by ';'. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
)
// Init creates a Notifier object based on provided flags.
funcInit(genAlertURLGenerator)([]Notifier,error){
iflen(*addrs)==0{
returnnil,fmt.Errorf("at least one `-notifier.url` must be set")
// cw holds a configWatcher for configPath configuration file
// configWatcher provides a list of Notifier objects discovered
// from static config or via service discovery.
// cw is not nil only if configPath is provided.
varcw*configWatcher
// Reload checks the changes in configPath configuration file
// and applies changes if any.
funcReload()error{
ifcw==nil{
returnnil
}
returncw.reload(*configPath)
}
varstaticNotifiersFnfunc()[]Notifier
var(
// externalLabels is a global variable for holding external labels configured via flags
// It is supposed to be inited via Init function only.
externalLabelsmap[string]string
// externalURL is a global variable for holding external URL value configured via flag
// It is supposed to be inited via Init function only.
externalURLstring
)
// Init returns a function for retrieving actual list of Notifier objects.
// Init works in two mods:
// - configuration via flags (for backward compatibility). Is always static
// and don't support live reloads.
// - configuration via file. Supports live reloads and service discovery.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.