Compare commits

..

13 Commits

Author SHA1 Message Date
Max Kotliar
d33bda1b13 upd changelog 2026-07-02 15:03:41 +03:00
Max Kotliar
38c1b65b64 Update deployment/docker/rules/alerts-vmauth.yml
Co-authored-by: Hui Wang <haley@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
2026-07-02 14:04:10 +03:00
Max Kotliar
75aea9f609 Update deployment/docker/rules/alerts-vmauth.yml
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
2026-07-02 13:26:57 +03:00
Max Kotliar
7a4d4bc6d4 upd 2026-07-02 13:15:49 +03:00
Max Kotliar
af4ae4d458 upd alert 2026-07-02 13:13:51 +03:00
Max Kotliar
b96f63b588 upd desc 2026-07-01 15:23:22 +03:00
Max Kotliar
6b980bdb6f add changelog 2026-07-01 15:09:06 +03:00
Max Kotliar
222bfb0f0e fix dashboard link 2026-07-01 15:06:15 +03:00
Max Kotliar
13036b9297 upd 2026-07-01 12:53:51 +03:00
Max Kotliar
359c634157 Merge remote-tracking branch 'opensource/master' into vmauth-invalid-auth-token-alert 2026-07-01 12:48:31 +03:00
Max Kotliar
290029897b specialize alert 2026-06-30 20:09:39 +03:00
Max Kotliar
9a150c309c fix dashboard link 2026-06-30 19:42:06 +03:00
Max Kotliar
2a40a40e9e deployment: add TooManyInvalidAuthTokenRequests alert
The alert should help surface client auth misconfigurations or potential
brute force attacks.

Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11180
2026-06-30 19:31:05 +03:00
4 changed files with 19 additions and 19 deletions

View File

@@ -1687,10 +1687,6 @@ func assertInstantValues(tss []*timeseries) {
var memoryIntensiveQueries = metrics.NewCounter(`vm_memory_intensive_queries_total`)
var _ = metrics.NewGauge(`vm_max_memory_per_query`, func() float64 {
return float64(maxMemoryPerQuery.N)
})
func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcName string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, windowExpr *metricsql.DurationExpr,
) ([]*timeseries, error) {

View File

@@ -223,16 +223,4 @@ groups:
Unexpected TSID misses for \"{{ $labels.job }}\" ({{ $labels.instance }}) for the last 15 minutes.
If this happens after unclean shutdown of VictoriaMetrics process (via \"kill -9\", OOM or power off),
then this is OK - the alert must go away in a few minutes after the restart.
Otherwise this may point to the corruption of index data.
- alert: VMSelectConcurrentQueriesExceedMemoryLimit
expr: (vm_max_memory_per_query * on(job, instance) vm_concurrent_select_capacity) > on(job, instance) vm_available_memory_bytes
for: 5m
labels:
severity: warning
annotations:
summary: "vmselect ({{ $labels.instance }}) concurrent query memory may exceed pod limit"
description: "Current concurrent queries ({{ $value | humanize1024 }} combined max memory) exceed
the available memory on instance {{ $labels.instance }}.
This may result in OOM kills. Consider reducing -maxConcurrentRequests,
lowering -maxMemoryPerQuery, or scaling up pod memory limits."
Otherwise this may point to the corruption of index data.

View File

@@ -56,3 +56,20 @@ groups:
summary: "Too many errors served for user {{ $labels.username }} (instance {{ $labels.instance }})"
description: "Requests from user {{ $labels.username }} are receiving errors.
Please check the vmauth logs to verify that the configuration is correct and clients are sending valid requests."
- alert: InvalidAuthTokenRequestErrors
expr: sum(increase(vmauth_http_request_errors_total{reason="invalid_auth_token"}[5m])) without (instance, reason) > 0
for: 15m
labels:
severity: warning
annotations:
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=16&var-job={{ $labels.job }}"
summary: "vmauth {{ $labels.job }} is receiving many requests with invalid auth tokens"
description: |
vmauth {{ $labels.job }} received {{ $value }} requests with invalid auth tokens in the last 5 minutes.
This may indicate:
- credentials have been updated on vmauth but not on clients
- client misconfiguration or use of an expired token
- a brute-force attack.
Check vmauth metrics for longevity and scale of the issue.
Check access log for detailed information: https://docs.victoriametrics.com/victoriametrics/vmauth/#access-log

View File

@@ -28,12 +28,11 @@ See also [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-rel
* SECURITY: upgrade base docker image (Alpine) from 3.23.4 to 3.24.1. See [Alpine 3.24.1 release notes](https://www.alpinelinux.org/posts/Alpine-3.24.1-released.html).
* FEATURE: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): expose `vm_max_memory_per_query` metric reflecting the `-search.maxMemoryPerQuery` limit. Create `VMSelectConcurrentQueriesExceedMemoryLimit` alert to warn when OOMs are possible due to misconfiguration of `-search.maxMemoryPerQuery` and max concurrent queries.
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): add `default_vm_access_claim` field into `jwt` section of auth config. It could be used at [JWT claim placeholders](https://docs.victoriametrics.com/victoriametrics/vmauth/#jwt-claim-based-request-templating), if `JWT` token doesn't have `vm_access` claim. See [#11054](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11054).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): reduces CPU usage by 10% at [sharding among remote storages](https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages). See [#11113](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11113). Thanks to @bennf for contribution.
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add `optimize_repeated_binary_op_subexprs=1` query arg to [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query) for executing binary operator sides sequentially when they share the same optimized aggregate rollup result expression. This allows the second side to reuse rollup result cache populated by the first side. See [#10575](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10575).
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): prevent possible password brute-force attacks with an artificial 2-3 second delay as recommended by [OWASP](https://owasp.org/Top10/2025/A07_2025-Authentication_Failures). See [#11180](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11180).
* FEATURE: [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/rules): add `InvalidAuthTokenRequestErrors` alerting rule to [vmauth alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/rules/alerts-vmauth.yml). The new rule notifies when vmauth receives requests with invalid or missing auth tokens, which may indicate a client misconfiguration, expired token use, or brute-force attack. See [#11180](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11180).
* BUGFIX: all VictoriaMetrics components: cancel in-flight HTTP requests shortly before `-http.maxGracefulShutdownDuration` elapses during graceful shutdown, so they can drain and the shutdown completes cleanly within that window instead of timing out and exiting via `logger.Fatalf` -> `os.Exit`. This prevents skipping the storage flush and losing in-memory data when long-lived requests are in flight (such as VictoriaLogs live tailing). See [#1502](https://github.com/VictoriaMetrics/VictoriaLogs/issues/1502).
* BUGFIX: `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fixes unexpected rare rerouting. See [#11162](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11162).