Commit Graph

32 Commits

Author SHA1 Message Date
Hui Wang
3e834b5853 vmauth: add new counters to track the number of user request errors
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10177

Add `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` which track
the number of user request errors, and aligned with
`vmauth_user_requests_total`.

The existing `vmauth_http_request_errors_total` currently only counts
requests with `invalid_auth_token`. Once authorization has passed, any
subsequent request errors are tracked under
`xxx_user_request_backend_requests_total`.
2025-12-22 13:06:42 +01:00
Aliaksandr Valialkin
ebb5ccbfcf deployment/docker/rules/alerts-health.yml: clarify the description of the TooManyTSIDMisses alert after the commit 30641b201b
It is expected that the number of TSIDs misses over the last 5 minutes is zero in steady state.
If it is non-zero, then something wrong happens. That's why it is better to use increase() instead of rate() function
for this alert.
2025-11-10 14:37:36 +01:00
Aliaksandr Valialkin
d7e4d8aa7a deployment/docker/rules/alerts-health.yml: clarify the description for the TooManyTSIDMisses alert
This alert is expected after unclean shutdown (OOM, power off, kill -9) of VictoriaMetrics.
It should go away in a few minutes after the restart while VictoriaMetrics deletes metricIDs
for the missing MetricID->TSID entries which were created for the newly registered time series
just before unclean shutdown. It is OK to delete such metricIDs, since the corresponding time series
will be re-registered again. See the commit 20812008a7 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502
2025-11-10 14:31:30 +01:00
Andrii Chubatiuk
ea172f7bc7 {dashboards,rules}: update storage ETA calculations in both dashboards and rules (#9848)
currently index file size is calculated as average across all storages
from all clusters, updated it to get more valid calculations. also PR
[fixes helm chart issue
](https://github.com/VictoriaMetrics/helm-charts/issues/2474).

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-10-17 12:50:56 +03:00
Andrii Chubatiuk
d8ec4894b5 deployment/rules: set proper job filters for rules (#9587)
### Describe Your Changes

related issue https://github.com/VictoriaMetrics/helm-charts/issues/2350

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

(cherry picked from commit 7e05200c60)
2025-08-21 15:37:24 +02:00
Corporte Gadfly
6494508f00 fix typo in sentence 2025-08-18 22:54:35 +02:00
Fred Navruzov
28f1986a0c docs/vmanomaly: release v1.25.1 (#9496)
### Describe Your Changes

Documentation updates for vmanomaly release v1.25.1

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-07-25 15:48:25 +02:00
Aliaksandr Valialkin
23222ecd69 deployment/docker: remove all the code related to VictoriaLogs, since it has been migrated to https://github.com/VictoriaMetrics/VictoriaLogs/ 2025-07-07 03:43:19 +02:00
Hui Wang
77a754678a alerts: fix the alerting rule ScrapePoolHasNoTargets (#9045)
as it may cause false positive in [sharding
mode](https://docs.victoriametrics.com/victoriametrics/vmagent/#scraping-big-number-of-targets)

related https://github.com/VictoriaMetrics/helm-charts/issues/2200

(cherry picked from commit 309f1898b3)
2025-05-29 11:52:02 +02:00
Hui Wang
87ba3d429a alerts: improve disk full estimation (#8955)
enhance alerting rule `DiskRunsOutOfSpaceIn3Days` and
`NodeBecomesReadonlyIn3Days` to account for
[deduplication](https://docs.victoriametrics.com/#deduplication) and
[indexDB](https://docs.victoriametrics.com/#indexdb) when calculating
disk consumption rate.

And try bring the `Storage full ETA` panel back.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 454ad7a1b4)
2025-05-16 16:08:45 +02:00
Roman Khavronenko
c10b6d4542 deployment: update vlogs demo installation (#8894)
* move example alerts out of /rules folder, since this folder should
contain only useful rules
* expose ports for vlogs and vmetrics for local debug
* add some comments explamining vmalert config

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 001dc7c985)
2025-05-08 14:20:43 +02:00
Roman Khavronenko
451aa69164 deployment/rules: add alerting rule ScrapePoolHasNoTargets to vmagent (#8868)
The new rule should notify user when there is a job with 0 configured or
discovered targets, which is usually a sign of misconfiguration.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit b5c9284748)
2025-05-05 15:57:10 +02:00
Aliaksandr Valialkin
9df1e8e71e use new canonical urls to stream-aggregation docs: https://docs.victoriametrics.com/victoriametrics/stream-aggregation/
This avoids a redirect from the old link https://docs.victoriametrics.com/stream-aggregation/ to https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ ,
and fixes `backwards` navigation for these links across VictoriaMetrics docs.

This is a follow-up for f152021521
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595#issuecomment-2831598274
2025-04-30 22:48:10 +02:00
Aliaksandr Valialkin
a9e637162e use new canonical urls to troubleshooting docs: https://docs.victoriametrics.com/victoriametrics/troubleshooting/
This avoids a redirect from the old link https://docs.victoriametrics.com/troubleshooting/ to https://docs.victoriametrics.com/victoriametrics/troubleshooting/ ,
and fixes `backwards` navigation for these links across VictoriaMetrics docs.

This is a follow-up for f152021521
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595#issuecomment-2831598274
2025-04-30 17:32:22 +02:00
Aliaksandr Valialkin
7f1febcd4b all: use new canonical urls to vmauth docs: https://docs.victoriametrics.com/victoriametrics/vmauth/
This avoids a redirect from the old link https://docs.victoriametrics.com/vmauth/ to https://docs.victoriametrics.com/victoriametrics/vmauth/ ,
and fixes `backwards` navigation for these links across VictoriaMetrics docs.

This is a follow-up for f152021521
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595#issuecomment-2831598274
2025-04-30 16:40:36 +02:00
Aliaksandr Valialkin
f740f1106d all: use new canonical urls to vmalert docs: https://docs.victoriametrics.com/victoriametrics/vmalert/
This avoids a redirect from the old link https://docs.victoriametrics.com/vmalert/ to https://docs.victoriametrics.com/victoriametrics/vmalert/ ,
and fixes `backwards` navigation for these links across VictoriaMetrics docs.

This is a follow-up for f152021521
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8595#issuecomment-2831598274
2025-04-30 16:04:18 +02:00
Aliaksandr Valialkin
d756e83f80 all: consistently use Grafana dashboard links ending with dashboard ID
The suffix after the ID may change in dashboard settings.
If it changes, the link becomes broken (dubious decision at grafana.com/grafana/dashboards/ ).
That's why it is better to drop all the suffixes and use links for Grafana dashbooards ending with IDs.
In this case they are automatically redirected to the url with the proper suffix.

This is a follow-up for 3e4c38c56c

See also the previous commits 9c0863babc and 0a5ffb3bc1
2025-04-22 14:20:55 +02:00
Andrii Chubatiuk
57aadb94a2 deployment/docker: added victorialogs cluster docker compose setup (#8725)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8694

additionally removed container_name, docker network, renamed all
compose, config files for consistency

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit f38736343d)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-18 14:33:31 +02:00
hagen1778
35b0233b5b dashboards: rm ETA panel from single and cluster dashboards
The panel was producing wrong predictions as it is almost impossible,
without making too expensive queries, to make a precise predictions.

More details on reasoning why it is better to remove it than fix it
is here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8492.

This change also removes ETA panels from alerting rules annotations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit ef16681dbf)
2025-03-27 10:41:14 +01:00
Guillem Jover
1d8b7faf71 spelling and grammar fixes via codespell (#8497)
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames.

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards.
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable.
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>

(cherry picked from commit 76d205feae)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-17 16:38:11 +01:00
Roman Khavronenko
04a94793b7 deployment/rules: add alerting rule TooHighQueryLoad (#8365)
TooHighQueryLoad should trigger when vmsingle or vmselect can't start
processing read queries for last 15min.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8f87427c81)
2025-02-25 09:30:21 +01:00
Artem Fetishev
1ea2586856 RemoteWriteConnectionIsSaturated alert: add another saturation cause to the alert description (#8195)
### Describe Your Changes

Currently the alert descrption considers only one end of the connection
(vmagent). While saturation can also be caused by slowness of the
receiving components (vminsert, vmstorage). Update the alert description
with a brief suggestion to also check the dashboards of these
components.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 30af662d84)
2025-02-01 22:31:56 +01:00
Mathias Palmersheim
fb76aad365 fixed #7804 Added NoSelfMonitoringMetrics rule (#7805)
### Describe Your Changes

fixes #7804 by adding alert for missing uptime metric in vmanomaly

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-12-18 22:40:51 +01:00
f41gh7
78ad858ff7 app/{vminsert,vmagent}: drop time series on exceeding labels limits.
Previously, time series with labels exceeding the configured limits were truncated and written to storage, potentially causing data inconsistency. This could lead to collisions between time series and make it difficult to identify the source due to truncated labels.

This commit changes the behavior:
*  Such time series are now rejected outright.
* Rejected time series are logged to stdout, and corresponding counters are incremented.
* removes `vm_too_long_label_values_total`, `vm_too_long_label_names_total`, `vm_metrics_with_dropped_labels_total` metrics.
* adds new values `[too_many_labels,too_long_label_name,too_long_label_value]`  to `reason` label of the `vm_rows_ignored_total` metric name

related issues:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6928
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7661
2024-12-10 22:15:38 +01:00
Andrii Chubatiuk
3be3705097 deployment/rules: updated sum expressions in alerts to be able to inject cluster labels in helm charts scripts (#7670)
### Describe Your Changes

Many users are running k8s-stack in multiple kubernetes clusters and to
configure a proper routing in alertmanager it's required to support
`cluster` label in alerting rules. It's now implemented in helm-chart
hack scripts, but it's tricky part to define if cluster label should be
added or not, when functions has no `by` expression. Updated existing
alerts to provide later an ability to inject cluster label later

Also take into an account `storage.minFreeDiskSpaceBytes` in
`DiskRunsOutOfSpace` alerts

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit fefa3e7936)
2024-12-09 12:23:31 +01:00
Hui Wang
5f9db9a61f alerts-vmalert: reserve rule name for description (#7659) 2024-11-26 18:50:30 +01:00
Fred Navruzov
e53a69c27d docs/vmanomaly: add self-monitoring section (#7558)
- Added self-monitoring guide for `vmanomaly`.
- Added cross-referencing on other pages.
- Slight improvements in wording on related pages
- Update references to v1.18.4
- [x] publish Grafana dashboard to
https://grafana.com/orgs/victoriametrics/dashboards:
https://grafana.com/grafana/dashboards/22337-victoriametrics-vmanomaly/

@AndrewChubatiuk , JFYI if it somehow impacts your work on supporting
`vmanomaly` in operator.

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-11-20 16:32:45 +01:00
Hui Wang
aa817602fa add vlogs type of rule in example (#7548) 2024-11-20 16:31:55 +01:00
Hui Wang
1bff6c1bbd dashboards: add file label filter to vmalert dashboard panels (#7515)
Previously, metrics from groups with the same name but in different
files could be mixed in the results.

e.g. the evaluation time
[here](https://grafana.maas.victoriametrics.com/d/LzldHAVnz/victoriametrics-vmalert?orgId=1&var-ds=PE8D8DB4BEE4E4B22&var-job=All&var-instance=All&var-file=%2Fetc%2Fvmalert%2Fconfig%2Fvm-per-tenant-rulefiles-0%2Fmaas-tenant-1011-maas-1011-vm-health.yaml&var-group=All&var-topk=5&editPanel=23)
is the total for multiple groups from different tenants.
2024-11-14 17:21:31 +01:00
Roman Khavronenko
d71c745d19 deployment/alerts: add RemoteWriteDroppingData to vmalert rules (#7393)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3f0e2ab3b2)
2024-10-31 14:11:10 +01:00
hagen1778
0e6ed4171b deployment/alerts: consistently update path to alerting rules
Follow-up after 68bad22fd2

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6494606924)
2024-10-30 16:44:51 +01:00
Hui Wang
9616814728 vmalert: integrate with victorialogs (#7255)
address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706.
See
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md.

Related fix
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254.

Note: in this pull request, vmalert doesn't support
[backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling)
for rules with a customized time filter. It might be added in the
future, see [this
issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289)
for details.

Feature can be tested with image
`victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 68bad22fd2)
2024-10-29 16:32:00 +01:00