Compare commits
72 Commits
dmc-only
...
feature/re
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
f395e5db49 | ||
|
|
6db36e244c | ||
|
|
abfd742a0f | ||
|
|
937e3654f3 | ||
|
|
bcbe6d98cc | ||
|
|
c00ecdde57 | ||
|
|
ef5174fef3 | ||
|
|
b3f57c113b | ||
|
|
686c9a21ff | ||
|
|
8f215137e7 | ||
|
|
ed5dc35876 | ||
|
|
13ab8cfb78 | ||
|
|
f8a101e45e | ||
|
|
a1a35fd870 | ||
|
|
0d5df2722d | ||
|
|
db3353c6e1 | ||
|
|
cfbc5ae31d | ||
|
|
fdb3c96fc1 | ||
|
|
486d923351 | ||
|
|
f8552bdc96 | ||
|
|
893c981c57 | ||
|
|
3d7ff783b6 | ||
|
|
78543b7f87 | ||
|
|
f54d22562a | ||
|
|
b672e05dce | ||
|
|
847871b916 | ||
|
|
2aecca1163 | ||
|
|
d1efb2dd37 | ||
|
|
6882c72075 | ||
|
|
60eb543dba | ||
|
|
7db42b0659 | ||
|
|
8d924f0631 | ||
|
|
791679253d | ||
|
|
a745bb797a | ||
|
|
3607c53b7c | ||
|
|
7969647553 | ||
|
|
5f887b66c5 | ||
|
|
d3e2946791 | ||
|
|
603dc03c7d | ||
|
|
1cc471a6c1 | ||
|
|
d40adb1e58 | ||
|
|
8056806d5f | ||
|
|
3d67942a65 | ||
|
|
23bdd14cee | ||
|
|
18a2955553 | ||
|
|
570a9ef627 | ||
|
|
40e27fc2c8 | ||
|
|
befbf9afca | ||
|
|
65d0a8e129 | ||
|
|
c2841ca36c | ||
|
|
cd2026e430 | ||
|
|
216821aa1c | ||
|
|
ef507d372b | ||
|
|
e383b62f59 | ||
|
|
8f34284dd2 | ||
|
|
8f4eca39f7 | ||
|
|
d467faf739 | ||
|
|
673b2ca7db | ||
|
|
40ccf0c333 | ||
|
|
fe341a4204 | ||
|
|
83ebf00659 | ||
|
|
5e602726f5 | ||
|
|
a6200cc83d | ||
|
|
a5811d3c3b | ||
|
|
5962b47c31 | ||
|
|
9a4edc738a | ||
|
|
30d01e9cae | ||
|
|
6b46f3920c | ||
|
|
97b11146ee | ||
|
|
2ef74bd6ea | ||
|
|
845161e377 | ||
|
|
f176a6624a |
23
.github/copilot-instructions.md
vendored
@@ -1,23 +0,0 @@
|
||||
# Project Overview
|
||||
|
||||
VictoriaMetrics is a fast, cost-saving, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.
|
||||
|
||||
## Folder Structure
|
||||
|
||||
- `/app`: Contains the compilable binaries.
|
||||
- `/lib`: Contains the golang reusable libraries
|
||||
- `/docs/victoriametrics`: Contains documentation for the project.
|
||||
- `/apptest/tests`: Contains integration tests.
|
||||
|
||||
## Libraries and Frameworks
|
||||
|
||||
- Backend: Golang, no framework. Use third-party libraries sparingly.
|
||||
- Frontend: React.
|
||||
|
||||
## Code review guidelines
|
||||
|
||||
Ensure the feature or bugfix includes a changelog entry in /docs/victoriametrics/changelog/CHANGELOG.md.
|
||||
Verify the entry is under the ## tip section and matches the structure and style of existing entries.
|
||||
Chore-only changes may be omitted from the changelog.
|
||||
|
||||
|
||||
4
.github/workflows/test.yml
vendored
@@ -86,7 +86,7 @@ jobs:
|
||||
- run: go version
|
||||
|
||||
- name: Run tests
|
||||
run: GOGC=10 make ${{ matrix.scenario}}
|
||||
run: make ${{ matrix.scenario}}
|
||||
|
||||
- name: Publish coverage
|
||||
uses: codecov/codecov-action@v5
|
||||
@@ -95,7 +95,7 @@ jobs:
|
||||
|
||||
apptest:
|
||||
name: apptest
|
||||
runs-on: ubuntu-latest
|
||||
runs-on: apptest
|
||||
|
||||
steps:
|
||||
- name: Code checkout
|
||||
|
||||
25
SECURITY.md
@@ -12,6 +12,31 @@ The following versions of VictoriaMetrics receive regular security fixes:
|
||||
|
||||
See [this page](https://victoriametrics.com/security/) for more details.
|
||||
|
||||
## Software Bill of Materials (SBOM)
|
||||
|
||||
Every VictoriaMetrics container{{% available_from "#" %}} image published to
|
||||
[Docker Hub](https://hub.docker.com/u/victoriametrics)
|
||||
and [Quay.io](https://quay.io/organization/victoriametrics)
|
||||
includes an [SPDX](https://spdx.dev/) SBOM attestation
|
||||
generated automatically by BuildKit during
|
||||
`docker buildx build`.
|
||||
|
||||
To inspect the SBOM for an image:
|
||||
|
||||
```sh
|
||||
docker buildx imagetools inspect \
|
||||
docker.io/victoriametrics/victoria-metrics:latest \
|
||||
--format "{{ json .SBOM }}"
|
||||
```
|
||||
|
||||
To scan an image using its SBOM attestation with
|
||||
[Trivy](https://github.com/aquasecurity/trivy):
|
||||
|
||||
```sh
|
||||
trivy image --sbom-sources oci \
|
||||
docker.io/victoriametrics/victoria-metrics:latest
|
||||
```
|
||||
|
||||
## Reporting a Vulnerability
|
||||
|
||||
Please report any security issues to <security@victoriametrics.com>
|
||||
|
||||
@@ -49,6 +49,11 @@ func insertRows(at *auth.Token, sketches []*datadogsketches.Sketch, extraLabels
|
||||
Name: "__name__",
|
||||
Value: m.Name,
|
||||
})
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10557
|
||||
labels = append(labels, prompb.Label{
|
||||
Name: "host",
|
||||
Value: sketch.Host,
|
||||
})
|
||||
for _, label := range m.Labels {
|
||||
labels = append(labels, prompb.Label{
|
||||
Name: label.Name,
|
||||
@@ -57,9 +62,6 @@ func insertRows(at *auth.Token, sketches []*datadogsketches.Sketch, extraLabels
|
||||
}
|
||||
for _, tag := range sketch.Tags {
|
||||
name, value := datadogutil.SplitTag(tag)
|
||||
if name == "host" {
|
||||
name = "exported_host"
|
||||
}
|
||||
labels = append(labels, prompb.Label{
|
||||
Name: name,
|
||||
Value: value,
|
||||
|
||||
@@ -12,6 +12,7 @@ import (
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
"gopkg.in/yaml.v2"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
@@ -82,30 +83,58 @@ func WriteRelabelConfigData(w io.Writer) {
|
||||
_, _ = w.Write(*p)
|
||||
}
|
||||
|
||||
// GetRemoteWriteRelabelConfigString returns -remoteWrite.relabelConfig contents in string
|
||||
func GetRemoteWriteRelabelConfigString() string {
|
||||
var bb bytesutil.ByteBuffer
|
||||
WriteRelabelConfigData(&bb)
|
||||
if bb.Len() == 0 {
|
||||
return ""
|
||||
}
|
||||
return string(bb.B)
|
||||
}
|
||||
|
||||
type UrlRelabelCfg struct {
|
||||
Url string `yaml:"url"`
|
||||
RelabelConfig any `yaml:"relabel_config"`
|
||||
|
||||
RelabelConfigStr string
|
||||
}
|
||||
|
||||
// WriteURLRelabelConfigData writes -remoteWrite.urlRelabelConfig contents to w
|
||||
func WriteURLRelabelConfigData(w io.Writer) {
|
||||
p := remoteWriteURLRelabelConfigData.Load()
|
||||
if p == nil {
|
||||
cs := GetURLRelabelConfigData()
|
||||
if cs == nil {
|
||||
// Nothing to write to w
|
||||
return
|
||||
}
|
||||
type urlRelabelCfg struct {
|
||||
Url string `yaml:"url"`
|
||||
RelabelConfig any `yaml:"relabel_config"`
|
||||
d, _ := yaml.Marshal(cs)
|
||||
_, _ = w.Write(d)
|
||||
}
|
||||
|
||||
// GetURLRelabelConfigData is similar to WriteURLRelabelConfigData but returning data in []UrlRelabelCfg.
|
||||
func GetURLRelabelConfigData() []UrlRelabelCfg {
|
||||
p := remoteWriteURLRelabelConfigData.Load()
|
||||
if p == nil {
|
||||
return nil
|
||||
}
|
||||
var cs []urlRelabelCfg
|
||||
var cs []UrlRelabelCfg
|
||||
for i, url := range *remoteWriteURLs {
|
||||
cfgData := (*p)[i]
|
||||
var cfgDataBytes []byte
|
||||
if cfgData != nil {
|
||||
cfgDataBytes, _ = yaml.Marshal(cfgData)
|
||||
}
|
||||
if !*showRemoteWriteURL {
|
||||
url = fmt.Sprintf("%d:secret-url", i+1)
|
||||
}
|
||||
cs = append(cs, urlRelabelCfg{
|
||||
cs = append(cs, UrlRelabelCfg{
|
||||
Url: url,
|
||||
RelabelConfig: cfgData,
|
||||
|
||||
RelabelConfigStr: string(cfgDataBytes),
|
||||
})
|
||||
}
|
||||
d, _ := yaml.Marshal(cs)
|
||||
_, _ = w.Write(d)
|
||||
return cs
|
||||
}
|
||||
|
||||
func reloadRelabelConfigs() {
|
||||
|
||||
@@ -81,12 +81,9 @@ func (g *Group) Validate(validateTplFn ValidateTplFn, validateExpressions bool)
|
||||
if g.Interval.Duration() < 0 {
|
||||
return fmt.Errorf("interval shouldn't be lower than 0")
|
||||
}
|
||||
if g.EvalOffset.Duration() < 0 {
|
||||
return fmt.Errorf("eval_offset shouldn't be lower than 0")
|
||||
}
|
||||
// if `eval_offset` is set, interval won't use global evaluationInterval flag and must bigger than offset.
|
||||
if g.EvalOffset.Duration() > g.Interval.Duration() {
|
||||
return fmt.Errorf("eval_offset should be smaller than interval; now eval_offset: %v, interval: %v", g.EvalOffset.Duration(), g.Interval.Duration())
|
||||
// if `eval_offset` is set, the group interval must be specified explicitly(instead of inherited from global evaluationInterval flag) and must bigger than offset.
|
||||
if g.EvalOffset.Duration().Abs() > g.Interval.Duration() {
|
||||
return fmt.Errorf("the abs value of eval_offset should be smaller than interval; now eval_offset: %v, interval: %v", g.EvalOffset.Duration(), g.Interval.Duration())
|
||||
}
|
||||
if g.EvalOffset != nil && g.EvalDelay != nil {
|
||||
return fmt.Errorf("eval_offset cannot be used with eval_delay")
|
||||
|
||||
@@ -176,11 +176,17 @@ func TestGroupValidate_Failure(t *testing.T) {
|
||||
}, false, "interval shouldn't be lower than 0")
|
||||
|
||||
f(&Group{
|
||||
Name: "wrong eval_offset",
|
||||
Name: "too big eval_offset",
|
||||
Interval: promutil.NewDuration(time.Minute),
|
||||
EvalOffset: promutil.NewDuration(2 * time.Minute),
|
||||
}, false, "eval_offset should be smaller than interval")
|
||||
|
||||
f(&Group{
|
||||
Name: "too big negative eval_offset",
|
||||
Interval: promutil.NewDuration(time.Minute),
|
||||
EvalOffset: promutil.NewDuration(-2 * time.Minute),
|
||||
}, false, "eval_offset should be smaller than interval")
|
||||
|
||||
limit := -1
|
||||
f(&Group{
|
||||
Name: "wrong limit",
|
||||
|
||||
@@ -186,6 +186,11 @@ func (c *Client) run(ctx context.Context) {
|
||||
return
|
||||
case <-ticker.C:
|
||||
c.flush(ctx, wr)
|
||||
// drain the potential stale tick to avoid small or empty flushes after a slow flush.
|
||||
select {
|
||||
case <-ticker.C:
|
||||
default:
|
||||
}
|
||||
case ts, ok := <-c.input:
|
||||
if !ok {
|
||||
continue
|
||||
|
||||
@@ -31,8 +31,8 @@ var (
|
||||
"0 means no limit.")
|
||||
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
|
||||
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
|
||||
resendDelay = flag.Duration("rule.resendDelay", 0, "MiniMum amount of time to wait before resending an alert to notifier.")
|
||||
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maxiMum duration for automatic alert expiration, "+
|
||||
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier.")
|
||||
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
|
||||
"which by default is 4 times evaluationInterval of the parent group")
|
||||
evalDelay = flag.Duration("rule.evalDelay", 30*time.Second, "Adjustment of the 'time' parameter for rule evaluation requests to compensate intentional data delay from the datasource. "+
|
||||
"Normally, should be equal to '-search.latencyOffset' (cmd-line flag configured for VictoriaMetrics single-node or vmselect). "+
|
||||
@@ -484,8 +484,15 @@ func (g *Group) UpdateWith(newGroup *Group) {
|
||||
// delayBeforeStart calculates delay based on Group ID, so all groups will start at different moments of time.
|
||||
func (g *Group) delayBeforeStart(ts time.Time, maxDelay time.Duration) time.Duration {
|
||||
if g.EvalOffset != nil {
|
||||
offset := *g.EvalOffset
|
||||
// adjust the offset for negative evalOffset, the rule is:
|
||||
// `eval_offset: -x` is equivalent to `eval_offset: y` for `interval: x+y`.
|
||||
// For example, `eval_offset: -6m` is equivalent to `eval_offset: 4m` for `interval: 10m`.
|
||||
if offset < 0 {
|
||||
offset += g.Interval
|
||||
}
|
||||
// if offset is specified, ignore the maxDelay and return a duration aligned with offset
|
||||
currentOffsetPoint := ts.Truncate(g.Interval).Add(*g.EvalOffset)
|
||||
currentOffsetPoint := ts.Truncate(g.Interval).Add(offset)
|
||||
if currentOffsetPoint.Before(ts) {
|
||||
// wait until the next offset point
|
||||
return currentOffsetPoint.Add(g.Interval).Sub(ts)
|
||||
|
||||
@@ -606,6 +606,15 @@ func TestGroupStartDelay(t *testing.T) {
|
||||
f("2023-01-01T00:03:30.000+00:00", "2023-01-01T00:08:00.000+00:00")
|
||||
f("2023-01-01T00:08:00.000+00:00", "2023-01-01T00:08:00.000+00:00")
|
||||
|
||||
// test group with negative offset -2min, which is equivalent to 3min offset for 5min interval
|
||||
offset = -2 * time.Minute
|
||||
g.EvalOffset = &offset
|
||||
|
||||
f("2023-01-01T00:00:15.000+00:00", "2023-01-01T00:03:00.000+00:00")
|
||||
f("2023-01-01T00:01:00.000+00:00", "2023-01-01T00:03:00.000+00:00")
|
||||
f("2023-01-01T00:03:30.000+00:00", "2023-01-01T00:08:00.000+00:00")
|
||||
f("2023-01-01T00:08:00.000+00:00", "2023-01-01T00:08:00.000+00:00")
|
||||
|
||||
maxDelay = time.Minute * 1
|
||||
g.EvalOffset = nil
|
||||
|
||||
|
||||
@@ -13,6 +13,7 @@ import (
|
||||
"net/url"
|
||||
"os"
|
||||
"regexp"
|
||||
"slices"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
@@ -28,6 +29,7 @@ import (
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs/fscore"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
|
||||
@@ -90,6 +92,8 @@ type UserInfo struct {
|
||||
|
||||
MetricLabels map[string]string `yaml:"metric_labels,omitempty"`
|
||||
|
||||
AccessLog *AccessLog `yaml:"access_log,omitempty"`
|
||||
|
||||
concurrencyLimitCh chan struct{}
|
||||
concurrencyLimitReached *metrics.Counter
|
||||
|
||||
@@ -102,11 +106,37 @@ type UserInfo struct {
|
||||
requestsDuration *metrics.Summary
|
||||
}
|
||||
|
||||
// AccessLog represents configuration for access log settings.
|
||||
type AccessLog struct {
|
||||
Filters *AccessLogFilters `yaml:"filters"`
|
||||
}
|
||||
|
||||
// AccessLogFilters represents list of filters for access logs printing
|
||||
type AccessLogFilters struct {
|
||||
// SkipStatusCodes is a list of HTTP status codes for which access logs will be skipped
|
||||
SkipStatusCodes []int `yaml:"skip_status_codes"`
|
||||
}
|
||||
|
||||
func (ui *UserInfo) logRequest(r *http.Request, userName string, statusCode int) {
|
||||
filters := ui.AccessLog.Filters
|
||||
if filters != nil && len(filters.SkipStatusCodes) > 0 {
|
||||
if slices.Contains(filters.SkipStatusCodes, statusCode) {
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
|
||||
requestURI := httpserver.GetRequestURI(r)
|
||||
logger.Infof("access_log request_host=%q request_uri=%q status_code=%d remote_addr=%s user_agent=%q referer=%q username=%q",
|
||||
r.Host, requestURI, statusCode, remoteAddr, r.UserAgent(), r.Referer(), userName)
|
||||
}
|
||||
|
||||
// HeadersConf represents config for request and response headers.
|
||||
type HeadersConf struct {
|
||||
RequestHeaders []*Header `yaml:"headers,omitempty"`
|
||||
ResponseHeaders []*Header `yaml:"response_headers,omitempty"`
|
||||
KeepOriginalHost *bool `yaml:"keep_original_host,omitempty"`
|
||||
RequestHeaders []*Header `yaml:"headers,omitempty"`
|
||||
ResponseHeaders []*Header `yaml:"response_headers,omitempty"`
|
||||
KeepOriginalHost *bool `yaml:"keep_original_host,omitempty"`
|
||||
hasAnyPlaceHolders bool
|
||||
}
|
||||
|
||||
func (ui *UserInfo) beginConcurrencyLimit(ctx context.Context) error {
|
||||
@@ -349,6 +379,7 @@ func (bus *backendURLs) add(u *url.URL) {
|
||||
url: u,
|
||||
healthCheckContext: bus.healthChecksContext,
|
||||
healthCheckWG: &bus.healthChecksWG,
|
||||
hasPlaceHolders: hasAnyPlaceholders(u),
|
||||
})
|
||||
}
|
||||
|
||||
@@ -366,6 +397,8 @@ type backendURL struct {
|
||||
concurrentRequests atomic.Int32
|
||||
|
||||
url *url.URL
|
||||
|
||||
hasPlaceHolders bool
|
||||
}
|
||||
|
||||
func (bu *backendURL) isBroken() bool {
|
||||
@@ -903,6 +936,9 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
|
||||
if ui.Name != "" {
|
||||
return nil, fmt.Errorf("field name can't be specified for unauthorized_user section")
|
||||
}
|
||||
if err := parseJWTPlaceholdersForUserInfo(ui, false); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if err := ui.initURLs(); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
@@ -960,6 +996,10 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
|
||||
at, ui.Username, ui.Name, uiOld.Username, uiOld.Name)
|
||||
}
|
||||
}
|
||||
|
||||
if err := parseJWTPlaceholdersForUserInfo(ui, false); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if err := ui.initURLs(); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
@@ -1059,6 +1099,7 @@ func (ui *UserInfo) initURLs() error {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
for _, e := range ui.URLMaps {
|
||||
if len(e.SrcPaths) == 0 && len(e.SrcHosts) == 0 && len(e.SrcQueryArgs) == 0 && len(e.SrcHeaders) == 0 {
|
||||
return fmt.Errorf("missing `src_paths`, `src_hosts`, `src_query_args` and `src_headers` in `url_map`")
|
||||
@@ -1118,6 +1159,9 @@ func (ui *UserInfo) name() string {
|
||||
h := xxhash.Sum64([]byte(ui.AuthToken))
|
||||
return fmt.Sprintf("auth_token:hash:%016X", h)
|
||||
}
|
||||
if ui.JWT != nil {
|
||||
return `jwt`
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
|
||||
@@ -276,6 +276,50 @@ users:
|
||||
url_prefix: http://foo.bar
|
||||
metric_labels:
|
||||
not-prometheus-compatible: value
|
||||
`)
|
||||
// placeholder in url_prefix
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
password: bar
|
||||
url_prefix: 'http://ahost/{{a_placeholder}}/foobar'
|
||||
`)
|
||||
// placeholder in a header
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
password: bar
|
||||
headers:
|
||||
- 'X-Foo: {{a_placeholder}}'
|
||||
url_prefix: 'http://ahost'
|
||||
`)
|
||||
// placeholder in url_prefix
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
password: bar
|
||||
url_prefix: 'http://ahost/{{a_placeholder}}/foobar'
|
||||
`)
|
||||
// placeholder in a header in url_map
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
password: bar
|
||||
url_map:
|
||||
- src_paths: ["/select/.*"]
|
||||
headers:
|
||||
- 'X-Foo: {{a_placeholder}}'
|
||||
url_prefix: 'http://ahost'
|
||||
`)
|
||||
|
||||
// placeholder in a header in url_map
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
password: bar
|
||||
url_map:
|
||||
- src_paths: ["/select/.*"]
|
||||
url_prefix: 'http://ahost/{{a_placeholder}}/foobar'
|
||||
`)
|
||||
}
|
||||
|
||||
@@ -637,6 +681,31 @@ users:
|
||||
URLPrefix: mustParseURL("http://aaa:343/bbb"),
|
||||
},
|
||||
}, nil)
|
||||
|
||||
// Multiple users with access logs enabled
|
||||
f(`
|
||||
users:
|
||||
- username: foo
|
||||
url_prefix: http://foo
|
||||
access_log: {}
|
||||
- username: bar
|
||||
url_prefix: https://bar/x/
|
||||
access_log:
|
||||
filters:
|
||||
skip_status_codes: [404]
|
||||
`, map[string]*UserInfo{
|
||||
getHTTPAuthBasicToken("foo", ""): {
|
||||
Username: "foo",
|
||||
URLPrefix: mustParseURL("http://foo"),
|
||||
AccessLog: &AccessLog{},
|
||||
},
|
||||
getHTTPAuthBasicToken("bar", ""): {
|
||||
Username: "bar",
|
||||
URLPrefix: mustParseURL("https://bar/x/"),
|
||||
AccessLog: &AccessLog{Filters: &AccessLogFilters{SkipStatusCodes: []int{404}}},
|
||||
},
|
||||
}, nil)
|
||||
|
||||
}
|
||||
|
||||
func TestParseAuthConfigPassesTLSVerificationConfig(t *testing.T) {
|
||||
|
||||
@@ -125,3 +125,8 @@ unauthorized_user:
|
||||
- http://vmselect-az1/?deny_partial_response=1
|
||||
- http://vmselect-az2/?deny_partial_response=1
|
||||
retry_status_codes: [503, 500]
|
||||
# log access for requests routed to this user
|
||||
access_log:
|
||||
filters:
|
||||
# except requests with Status Codes below
|
||||
skip_status_codes: [200, 202]
|
||||
|
||||
@@ -2,7 +2,9 @@ package main
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net/url"
|
||||
"os"
|
||||
"slices"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
@@ -10,6 +12,35 @@ import (
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
)
|
||||
|
||||
const (
|
||||
metricsTenantPlaceholder = `{{.MetricsTenant}}`
|
||||
metricsExtraLabelsPlaceholder = `{{.MetricsExtraLabels}}`
|
||||
metricsExtraFiltersPlaceholder = `{{.MetricsExtraFilters}}`
|
||||
|
||||
logsAccountIDPlaceholder = `{{.LogsAccountID}}`
|
||||
logsProjectIDPlaceholder = `{{.LogsProjectID}}`
|
||||
logsExtraFiltersPlaceholder = `{{.LogsExtraFilters}}`
|
||||
logsExtraStreamFiltersPlaceholder = `{{.LogsExtraStreamFilters}}`
|
||||
|
||||
placeholderPrefix = `{{`
|
||||
)
|
||||
|
||||
var allPlaceholders = []string{
|
||||
metricsTenantPlaceholder,
|
||||
metricsExtraLabelsPlaceholder,
|
||||
metricsExtraFiltersPlaceholder,
|
||||
logsAccountIDPlaceholder,
|
||||
logsProjectIDPlaceholder,
|
||||
logsExtraFiltersPlaceholder,
|
||||
logsExtraStreamFiltersPlaceholder,
|
||||
}
|
||||
|
||||
var urlPathPlaceHolders = []string{
|
||||
metricsTenantPlaceholder,
|
||||
logsAccountIDPlaceholder,
|
||||
logsProjectIDPlaceholder,
|
||||
}
|
||||
|
||||
type jwtCache struct {
|
||||
// users contain UserInfo`s from AuthConfig with JWTConfig set
|
||||
users []*UserInfo
|
||||
@@ -68,6 +99,9 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, error) {
|
||||
|
||||
jwtToken.verifierPool = vp
|
||||
}
|
||||
if err := parseJWTPlaceholdersForUserInfo(&ui, true); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
if err := ui.initURLs(); err != nil {
|
||||
return nil, err
|
||||
@@ -101,7 +135,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, error) {
|
||||
jui = append(jui, &ui)
|
||||
}
|
||||
|
||||
// the limitation will be lifted once claim based matching will be implemented
|
||||
// TODO: the limitation will be lifted once claim based matching will be implemented
|
||||
if len(jui) > 1 {
|
||||
return nil, fmt.Errorf("multiple users with JWT tokens are not supported; found %d users", len(jui))
|
||||
}
|
||||
@@ -109,10 +143,10 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, error) {
|
||||
return jui, nil
|
||||
}
|
||||
|
||||
func getUserInfoByJWTToken(ats []string) *UserInfo {
|
||||
func getUserInfoByJWTToken(ats []string) (*UserInfo, *jwt.Token) {
|
||||
js := *jwtAuthCache.Load()
|
||||
if len(js.users) == 0 {
|
||||
return nil
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
for _, at := range ats {
|
||||
@@ -131,6 +165,8 @@ func getUserInfoByJWTToken(ats []string) *UserInfo {
|
||||
}
|
||||
if tkn.IsExpired(time.Now()) {
|
||||
if *logInvalidAuthTokens {
|
||||
// TODO: add more context:
|
||||
// token claims with issuer
|
||||
logger.Infof("jwt token is expired")
|
||||
}
|
||||
continue
|
||||
@@ -138,7 +174,7 @@ func getUserInfoByJWTToken(ats []string) *UserInfo {
|
||||
|
||||
for _, ui := range js.users {
|
||||
if ui.JWT.SkipVerify {
|
||||
return ui
|
||||
return ui, tkn
|
||||
}
|
||||
|
||||
if err := ui.JWT.verifierPool.Verify(tkn); err != nil {
|
||||
@@ -148,9 +184,190 @@ func getUserInfoByJWTToken(ats []string) *UserInfo {
|
||||
continue
|
||||
}
|
||||
|
||||
return ui
|
||||
return ui, tkn
|
||||
}
|
||||
}
|
||||
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
func replaceJWTPlaceholders(bu *backendURL, hc HeadersConf, vma *jwt.VMAccessClaim) (*url.URL, HeadersConf) {
|
||||
if !bu.hasPlaceHolders && !hc.hasAnyPlaceHolders {
|
||||
return bu.url, hc
|
||||
}
|
||||
targetURL := bu.url
|
||||
data := jwtClaimsData(vma)
|
||||
if bu.hasPlaceHolders {
|
||||
// template url params and request path
|
||||
// make a copy of url
|
||||
uCopy := *bu.url
|
||||
for _, uph := range urlPathPlaceHolders {
|
||||
replacement := data[uph]
|
||||
uCopy.Path = strings.ReplaceAll(uCopy.Path, uph, replacement[0])
|
||||
}
|
||||
query := uCopy.Query()
|
||||
var foundAnyQueryPlaceholder bool
|
||||
var templatedValues []string
|
||||
for param, values := range query {
|
||||
templatedValues = templatedValues[:0]
|
||||
// filter in-place values with placeholders
|
||||
// and accumulate replacements
|
||||
// it will change the order of param values
|
||||
// but it's not guaranteed
|
||||
// and will be changed in any way with multiple arg templates
|
||||
var cnt int
|
||||
for _, value := range values {
|
||||
if dv, ok := data[value]; ok {
|
||||
foundAnyQueryPlaceholder = true
|
||||
templatedValues = append(templatedValues, dv...)
|
||||
continue
|
||||
}
|
||||
values[cnt] = value
|
||||
cnt++
|
||||
}
|
||||
values = values[:cnt]
|
||||
values = append(values, templatedValues...)
|
||||
query[param] = values
|
||||
}
|
||||
if foundAnyQueryPlaceholder {
|
||||
uCopy.RawQuery = query.Encode()
|
||||
}
|
||||
targetURL = &uCopy
|
||||
}
|
||||
if hc.hasAnyPlaceHolders {
|
||||
// make a copy of headers and update only values with placeholder
|
||||
rhs := make([]*Header, 0, len(hc.RequestHeaders))
|
||||
for _, rh := range hc.RequestHeaders {
|
||||
if dv, ok := data[rh.Value]; ok {
|
||||
rh := &Header{
|
||||
Name: rh.Name,
|
||||
Value: strings.Join(dv, ","),
|
||||
}
|
||||
rhs = append(rhs, rh)
|
||||
continue
|
||||
}
|
||||
rhs = append(rhs, rh)
|
||||
}
|
||||
hc.RequestHeaders = rhs
|
||||
}
|
||||
|
||||
return targetURL, hc
|
||||
}
|
||||
|
||||
func jwtClaimsData(vma *jwt.VMAccessClaim) map[string][]string {
|
||||
data := map[string][]string{
|
||||
// TODO: optimize at parsing stage
|
||||
metricsTenantPlaceholder: {fmt.Sprintf("%d:%d", vma.MetricsAccountID, vma.MetricsProjectID)},
|
||||
metricsExtraLabelsPlaceholder: vma.MetricsExtraLabels,
|
||||
metricsExtraFiltersPlaceholder: vma.MetricsExtraFilters,
|
||||
|
||||
// TODO: optimize at parsing stage
|
||||
logsAccountIDPlaceholder: {fmt.Sprintf("%d", vma.LogsAccountID)},
|
||||
logsProjectIDPlaceholder: {fmt.Sprintf("%d", vma.LogsProjectID)},
|
||||
logsExtraFiltersPlaceholder: vma.LogsExtraFilters,
|
||||
logsExtraStreamFiltersPlaceholder: vma.LogsExtraStreamFilters,
|
||||
}
|
||||
return data
|
||||
}
|
||||
|
||||
func parseJWTPlaceholdersForUserInfo(ui *UserInfo, isAllowed bool) error {
|
||||
if ui.URLPrefix != nil {
|
||||
if err := validateJWTPlaceholdersForURL(ui.URLPrefix, isAllowed); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
if err := parsePlaceholdersForHC(&ui.HeadersConf, isAllowed); err != nil {
|
||||
return err
|
||||
}
|
||||
if ui.DefaultURL != nil {
|
||||
if err := validateJWTPlaceholdersForURL(ui.DefaultURL, isAllowed); err != nil {
|
||||
return fmt.Errorf("invalid `default_url` placeholders: %w", err)
|
||||
}
|
||||
}
|
||||
for i := range ui.URLMaps {
|
||||
e := &ui.URLMaps[i]
|
||||
if e.URLPrefix != nil {
|
||||
if err := validateJWTPlaceholdersForURL(e.URLPrefix, isAllowed); err != nil {
|
||||
return fmt.Errorf("invalid `url_map` `url_prefix` placeholders: %w", err)
|
||||
}
|
||||
}
|
||||
if err := parsePlaceholdersForHC(&e.HeadersConf, isAllowed); err != nil {
|
||||
return fmt.Errorf("invalid `url_map` headers placeholders: %w", err)
|
||||
}
|
||||
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func validateJWTPlaceholdersForURL(up *URLPrefix, isAllowed bool) error {
|
||||
for _, bu := range up.busOriginal {
|
||||
ok := strings.Contains(bu.Path, placeholderPrefix)
|
||||
if ok && !isAllowed {
|
||||
return fmt.Errorf("placeholder: %q is only allowed at JWT token context", bu.Path)
|
||||
}
|
||||
if ok {
|
||||
p := bu.Path
|
||||
for _, ph := range allPlaceholders {
|
||||
p = strings.ReplaceAll(p, ph, ``)
|
||||
}
|
||||
if strings.Contains(p, placeholderPrefix) {
|
||||
return fmt.Errorf("invalid placeholder found in URL request path: %q, supported values are: %s", bu.Path, strings.Join(allPlaceholders, ", "))
|
||||
|
||||
}
|
||||
}
|
||||
for param, values := range bu.Query() {
|
||||
for _, value := range values {
|
||||
ok := strings.Contains(value, placeholderPrefix)
|
||||
if ok && !isAllowed {
|
||||
return fmt.Errorf("query param: %q with placeholder: %q is only allowed at JWT token context", param, value)
|
||||
}
|
||||
if ok {
|
||||
// possible placeholder
|
||||
if !slices.Contains(allPlaceholders, value) {
|
||||
return fmt.Errorf("query param: %q has unsupported placeholder string: %q, supported values are: %s", param, value, strings.Join(allPlaceholders, ", "))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func parsePlaceholdersForHC(hc *HeadersConf, isAllowed bool) error {
|
||||
for _, rhs := range hc.RequestHeaders {
|
||||
ok := strings.Contains(rhs.Value, placeholderPrefix)
|
||||
if ok && !isAllowed {
|
||||
return fmt.Errorf("request header: %q placeholder: %q is only supported at JWT context", rhs.Name, rhs.Value)
|
||||
}
|
||||
if ok {
|
||||
if !slices.Contains(allPlaceholders, rhs.Value) {
|
||||
return fmt.Errorf("request header: %q has unsupported placeholder: %q, supported values are: %s", rhs.Name, rhs.Value, strings.Join(allPlaceholders, ", "))
|
||||
}
|
||||
hc.hasAnyPlaceHolders = true
|
||||
}
|
||||
}
|
||||
for _, rhs := range hc.ResponseHeaders {
|
||||
if strings.Contains(rhs.Value, placeholderPrefix) {
|
||||
return fmt.Errorf("response header placeholders are not supported; found placeholder prefix at header: %q with value: %q", rhs.Name, rhs.Value)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func hasAnyPlaceholders(u *url.URL) bool {
|
||||
if strings.Contains(u.Path, placeholderPrefix) {
|
||||
return true
|
||||
}
|
||||
if len(u.Query()) == 0 {
|
||||
return false
|
||||
}
|
||||
for _, values := range u.Query() {
|
||||
for _, value := range values {
|
||||
if strings.HasPrefix(value, placeholderPrefix) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
@@ -32,14 +32,14 @@ XOtclIk1uhc03oL9nOQ=
|
||||
ac, err := parseAuthConfig([]byte(s))
|
||||
if err != nil {
|
||||
if expErr != err.Error() {
|
||||
t.Fatalf("unexpected error; got %q; want %q", err.Error(), expErr)
|
||||
t.Fatalf("unexpected error; got\n%q\nwant\n%q", err.Error(), expErr)
|
||||
}
|
||||
return
|
||||
}
|
||||
users, err := parseJWTUsers(ac)
|
||||
if err != nil {
|
||||
if expErr != err.Error() {
|
||||
t.Fatalf("unexpected error; got %q; want %q", err.Error(), expErr)
|
||||
t.Fatalf("unexpected error; got\n%q\nwant \n%q", err.Error(), expErr)
|
||||
}
|
||||
return
|
||||
}
|
||||
@@ -164,6 +164,38 @@ users:
|
||||
- `+publicKeyFile+`
|
||||
url_prefix: http://foo.bar
|
||||
`, "cannot parse public key from file \""+publicKeyFile+"\": failed to parse key \"invalidPEM\": failed to decode PEM block containing public key")
|
||||
|
||||
// unsupported placeholder in a header
|
||||
f(`
|
||||
users:
|
||||
- jwt:
|
||||
skip_verify: true
|
||||
url_prefix: http://foo.bar/{{.UnsupportedPlaceholder}}/foo`,
|
||||
"invalid placeholder found in URL request path: \"/{{.UnsupportedPlaceholder}}/foo\", supported values are: {{.MetricsTenant}}, {{.MetricsExtraLabels}}, {{.MetricsExtraFilters}}, {{.LogsAccountID}}, {{.LogsProjectID}}, {{.LogsExtraFilters}}, {{.LogsExtraStreamFilters}}",
|
||||
)
|
||||
// unsupported placeholder in a header
|
||||
f(`
|
||||
users:
|
||||
- jwt:
|
||||
skip_verify: true
|
||||
headers:
|
||||
- "AccountID: {{.UnsupportedPlaceholder}}"
|
||||
url_prefix: http://foo.bar
|
||||
`,
|
||||
"request header: \"AccountID\" has unsupported placeholder: \"{{.UnsupportedPlaceholder}}\", supported values are: {{.MetricsTenant}}, {{.MetricsExtraLabels}}, {{.MetricsExtraFilters}}, {{.LogsAccountID}}, {{.LogsProjectID}}, {{.LogsExtraFilters}}, {{.LogsExtraStreamFilters}}",
|
||||
)
|
||||
|
||||
// spaces in templating not allowed
|
||||
f(`
|
||||
users:
|
||||
- jwt:
|
||||
skip_verify: true
|
||||
headers:
|
||||
- "AccountID: {{ .LogsAccountID }}"
|
||||
url_prefix: http://foo.bar
|
||||
`,
|
||||
"request header: \"AccountID\" has unsupported placeholder: \"{{ .LogsAccountID }}\", supported values are: {{.MetricsTenant}}, {{.MetricsExtraLabels}}, {{.MetricsExtraFilters}}, {{.LogsAccountID}}, {{.LogsProjectID}}, {{.LogsExtraFilters}}, {{.LogsExtraStreamFilters}}",
|
||||
)
|
||||
}
|
||||
|
||||
func TestJWTParseAuthConfigSuccess(t *testing.T) {
|
||||
|
||||
@@ -16,6 +16,7 @@ import (
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/jwt"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
|
||||
@@ -173,7 +174,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
|
||||
// Process requests for unauthorized users
|
||||
ui := authConfig.Load().UnauthorizedUser
|
||||
if ui != nil {
|
||||
processUserRequest(w, r, ui)
|
||||
processUserRequest(w, r, ui, nil)
|
||||
return true
|
||||
}
|
||||
|
||||
@@ -182,17 +183,21 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
|
||||
}
|
||||
|
||||
if ui := getUserInfoByAuthTokens(ats); ui != nil {
|
||||
processUserRequest(w, r, ui)
|
||||
processUserRequest(w, r, ui, nil)
|
||||
return true
|
||||
}
|
||||
if ui := getUserInfoByJWTToken(ats); ui != nil {
|
||||
processUserRequest(w, r, ui)
|
||||
if ui, tkn := getUserInfoByJWTToken(ats); ui != nil {
|
||||
if tkn == nil {
|
||||
logger.Panicf("BUG: unexpected nil jwt token for user %q", ui.name())
|
||||
}
|
||||
|
||||
processUserRequest(w, r, ui, tkn)
|
||||
return true
|
||||
}
|
||||
|
||||
uu := authConfig.Load().UnauthorizedUser
|
||||
if uu != nil {
|
||||
processUserRequest(w, r, uu)
|
||||
processUserRequest(w, r, uu, nil)
|
||||
return true
|
||||
}
|
||||
|
||||
@@ -221,7 +226,37 @@ func getUserInfoByAuthTokens(ats []string) *UserInfo {
|
||||
return nil
|
||||
}
|
||||
|
||||
func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
// responseWriterWithStatus is a wrapper around http.ResponseWriter that captures the status code written to the response.
|
||||
type responseWriterWithStatus struct {
|
||||
http.ResponseWriter
|
||||
status int
|
||||
}
|
||||
|
||||
// WriteHeader records the status so it can be easily retrieved later
|
||||
func (rws *responseWriterWithStatus) WriteHeader(status int) {
|
||||
rws.status = status
|
||||
rws.ResponseWriter.WriteHeader(status)
|
||||
}
|
||||
|
||||
// Flush implements net/http.Flusher interface
|
||||
//
|
||||
// This is needed for the copyStreamToClient()
|
||||
func (rws *responseWriterWithStatus) Flush() {
|
||||
flusher, ok := rws.ResponseWriter.(http.Flusher)
|
||||
if !ok {
|
||||
logger.Panicf("BUG: it is expected http.ResponseWriter (%T) supports http.Flusher interface", rws.ResponseWriter)
|
||||
}
|
||||
flusher.Flush()
|
||||
}
|
||||
|
||||
// Unwrap returns the original ResponseWriter wrapped by rws.
|
||||
//
|
||||
// This is needed for the net/http.ResponseController - see https://pkg.go.dev/net/http#NewResponseController
|
||||
func (rws *responseWriterWithStatus) Unwrap() http.ResponseWriter {
|
||||
return rws.ResponseWriter
|
||||
}
|
||||
|
||||
func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *jwt.Token) {
|
||||
startTime := time.Now()
|
||||
defer ui.requestsDuration.UpdateDuration(startTime)
|
||||
|
||||
@@ -230,6 +265,19 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
ctx, cancel := context.WithTimeout(r.Context(), *maxQueueDuration)
|
||||
defer cancel()
|
||||
|
||||
userName := ui.name()
|
||||
if userName == "" {
|
||||
userName = "unauthorized"
|
||||
}
|
||||
|
||||
if ui.AccessLog != nil {
|
||||
w = &responseWriterWithStatus{ResponseWriter: w}
|
||||
defer func() {
|
||||
rws := w.(*responseWriterWithStatus)
|
||||
ui.logRequest(r, userName, rws.status)
|
||||
}()
|
||||
}
|
||||
|
||||
// Acquire global concurrency limit.
|
||||
if err := beginConcurrencyLimit(ctx); err != nil {
|
||||
handleConcurrencyLimitError(w, r, err)
|
||||
@@ -248,10 +296,6 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
}
|
||||
|
||||
// Read the initial chunk for the request body.
|
||||
userName := ui.name()
|
||||
if userName == "" {
|
||||
userName = "unauthorized"
|
||||
}
|
||||
bb, err := bufferRequestBody(ctx, r.Body, userName)
|
||||
if err != nil {
|
||||
httpserver.Errorf(w, r, "%s", err)
|
||||
@@ -272,7 +316,7 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
defer ui.endConcurrencyLimit()
|
||||
|
||||
// Process the request.
|
||||
processRequest(w, r, ui)
|
||||
processRequest(w, r, ui, tkn)
|
||||
}
|
||||
|
||||
func beginConcurrencyLimit(ctx context.Context) error {
|
||||
@@ -345,7 +389,7 @@ func bufferRequestBody(ctx context.Context, r io.ReadCloser, userName string) (i
|
||||
return bb, nil
|
||||
}
|
||||
|
||||
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *jwt.Token) {
|
||||
u := normalizeURL(r.URL)
|
||||
up, hc := ui.getURLPrefixAndHeaders(u, r.Host, r.Header)
|
||||
isDefault := false
|
||||
@@ -377,6 +421,10 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
break
|
||||
}
|
||||
targetURL := bu.url
|
||||
if tkn != nil {
|
||||
// for security reasons allow templating only for configured url values and headers
|
||||
targetURL, hc = replaceJWTPlaceholders(bu, hc, tkn.VMAccess())
|
||||
}
|
||||
if isDefault {
|
||||
// Don't change path and add request_path query param for default route.
|
||||
query := targetURL.Query()
|
||||
@@ -386,7 +434,6 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
|
||||
// Update path for regular routes.
|
||||
targetURL = mergeURLs(targetURL, u, up.dropSrcPathPrefixParts, up.mergeQueryArgs)
|
||||
}
|
||||
|
||||
wasLocalRetry := false
|
||||
again:
|
||||
ok, needLocalRetry := tryProcessingRequest(w, r, targetURL, hc, up.retryStatusCodes, ui, bu)
|
||||
|
||||
@@ -17,6 +17,7 @@ import (
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sort"
|
||||
"strings"
|
||||
"sync/atomic"
|
||||
"testing"
|
||||
@@ -571,22 +572,41 @@ func TestJWTRequestHandler(t *testing.T) {
|
||||
|
||||
return payload + "." + signatureB64
|
||||
}
|
||||
genToken(t, nil, false)
|
||||
|
||||
f := func(cfgStr string, r *http.Request, responseExpected string) {
|
||||
t.Helper()
|
||||
|
||||
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
if _, err := w.Write([]byte(r.RequestURI + "\n")); err != nil {
|
||||
if _, err := w.Write([]byte("path: " + r.URL.Path + "\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
if v := r.Header.Get(`extra_label`); v != "" {
|
||||
if _, err := w.Write([]byte(`extra_label=` + v + "\n")); err != nil {
|
||||
if _, err := w.Write([]byte("query:\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
names := make([]string, 0, len(r.URL.Query()))
|
||||
query := r.URL.Query()
|
||||
for n := range query {
|
||||
names = append(names, n)
|
||||
}
|
||||
sort.Strings(names)
|
||||
for _, n := range names {
|
||||
for _, v := range query[n] {
|
||||
if _, err := w.Write([]byte(" " + n + "=" + v + "\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if _, err := w.Write([]byte("headers:\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
if v := r.Header.Get(`AccountID`); v != "" {
|
||||
if _, err := w.Write([]byte(` AccountID=` + v + "\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
}
|
||||
if v := r.Header.Get(`extra_filters`); v != "" {
|
||||
if _, err := w.Write([]byte(`extra_filters=` + v + "\n")); err != nil {
|
||||
if v := r.Header.Get(`ProjectID`); v != "" {
|
||||
if _, err := w.Write([]byte(` ProjectID=` + v + "\n")); err != nil {
|
||||
panic(fmt.Errorf("cannot write response: %w", err))
|
||||
}
|
||||
}
|
||||
@@ -632,7 +652,7 @@ users:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/foo`, string(publicKeyPEM))
|
||||
noVMAccessClaimToken := genToken(t, nil, true)
|
||||
defaultVMAccessClaimToken := genToken(t, map[string]any{
|
||||
minimalToken := genToken(t, map[string]any{
|
||||
"exp": time.Now().Add(10 * time.Minute).Unix(),
|
||||
"vm_access": map[string]any{},
|
||||
}, true)
|
||||
@@ -645,6 +665,30 @@ users:
|
||||
"vm_access": map[string]any{},
|
||||
}, false)
|
||||
|
||||
fullToken := genToken(t, map[string]any{
|
||||
"exp": time.Now().Add(10 * time.Minute).Unix(),
|
||||
"vm_access": map[string]any{
|
||||
"metrics_account_id": 123,
|
||||
"metrics_project_id": 234,
|
||||
"metrics_extra_labels": []string{
|
||||
"label1=value1",
|
||||
"label2=value2",
|
||||
},
|
||||
"metrics_extra_filters": []string{
|
||||
`{label3="value3"}`,
|
||||
`{label4="value4"}`,
|
||||
},
|
||||
"logs_account_id": 345,
|
||||
"logs_project_id": 456,
|
||||
"logs_extra_filters": []string{
|
||||
`{"namespace":"my-app","env":"prod"}`,
|
||||
},
|
||||
"logs_extra_stream_filters": []string{
|
||||
`{"team":"dev"}`,
|
||||
},
|
||||
},
|
||||
}, true)
|
||||
|
||||
// missing authorization
|
||||
request := httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
|
||||
responseExpected := `
|
||||
@@ -682,7 +726,9 @@ Unauthorized`
|
||||
request.Header.Set(`Authorization`, `Bearer `+invalidSignatureToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
/foo/abc`
|
||||
path: /foo/abc
|
||||
query:
|
||||
headers:`
|
||||
f(`
|
||||
users:
|
||||
- jwt:
|
||||
@@ -691,15 +737,17 @@ users:
|
||||
|
||||
// token with default valid vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+defaultVMAccessClaimToken)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
/foo/abc`
|
||||
path: /foo/abc
|
||||
query:
|
||||
headers:`
|
||||
f(simpleCfgStr, request, responseExpected)
|
||||
|
||||
// jwt token used but no matching user with JWT token in config
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+defaultVMAccessClaimToken)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=401
|
||||
Unauthorized`
|
||||
@@ -715,16 +763,479 @@ users:
|
||||
t.Fatalf("failed to write public key file: %s", err)
|
||||
}
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+defaultVMAccessClaimToken)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
/foo/abc`
|
||||
path: /foo/abc
|
||||
query:
|
||||
headers:`
|
||||
f(fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_key_files:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/foo`, string(publicKeyFile)), request, responseExpected)
|
||||
url_prefix: {BACKEND}/foo`, publicKeyFile), request, responseExpected)
|
||||
|
||||
// ---- VictoriaMetrics specific tests ----
|
||||
|
||||
// extra_label and extra_filters dropped if empty in vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/0:0/api/v1/query
|
||||
query:
|
||||
headers:`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// extra_label and extra_filters set if present in vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters={label3="value3"}
|
||||
extra_filters={label4="value4"}
|
||||
extra_label=label1=value1
|
||||
extra_label=label2=value2
|
||||
headers:`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// extra_label and extra_filters from vm_access claim merged with statically defined
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters=aStaticFilter
|
||||
extra_filters={label3="value3"}
|
||||
extra_filters={label4="value4"}
|
||||
extra_label=aStaticLabel
|
||||
extra_label=label1=value1
|
||||
extra_label=label2=value2
|
||||
headers:`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label=aStaticLabel&extra_filters=aStaticFilter&extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// extra_labels and extra_filters set from vm_access claim should override user provided query args
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query?extra_label=userProvidedLabel&extra_filters=userProvidedFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters={label3="value3"}
|
||||
extra_filters={label4="value4"}
|
||||
extra_label=label1=value1
|
||||
extra_label=label2=value2
|
||||
headers:`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// merge user provided query args with extra_labels and extra_filters from vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query?extra_label=userProvidedLabel&extra_filters=userProvidedFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters={label3="value3"}
|
||||
extra_filters={label4="value4"}
|
||||
extra_filters=userProvidedFilter
|
||||
extra_label=label1=value1
|
||||
extra_label=label2=value2
|
||||
extra_label=userProvidedLabel
|
||||
headers:`
|
||||
f(fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
merge_query_args: [extra_filters, extra_label]
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// pass user provided query args if vm_access claim has no extra_labels and extra_filters
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query?extra_label=userProvidedLabel&extra_filters=userProvidedFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters=userProvidedFilter
|
||||
extra_label=userProvidedLabel
|
||||
headers:`
|
||||
f(fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
merge_query_args: [extra_filters, extra_label]
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// pass user provided query args if vm_access claim has no extra_labels and extra_filters
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query?extra_label=userProvidedLabel&extra_filters=userProvidedFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters=userProvidedFilter
|
||||
extra_label=userProvidedLabel
|
||||
headers:`
|
||||
f(fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// placeholders in url_map
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/api/v1/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/123:234/api/v1/query
|
||||
query:
|
||||
extra_filters={label3="value3"}
|
||||
extra_filters={label4="value4"}
|
||||
extra_label=label1=value1
|
||||
extra_label=label2=value2
|
||||
headers:`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_map:
|
||||
- src_paths: ["/api/.*"]
|
||||
url_prefix: {BACKEND}/select/{{.MetricsTenant}}/?extra_label={{.MetricsExtraLabels}}&extra_filters={{.MetricsExtraFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// ---- VictoriaLogs specific tests ----
|
||||
|
||||
// tenant headers not overwritten if set statically
|
||||
// extra_filters extra_stream_filters dropped if empty in vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
headers:
|
||||
AccountID=555
|
||||
ProjectID=666`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: 555"
|
||||
- "ProjectID: 666"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// tenant headers are overwritten if set as placeholders
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
headers:
|
||||
AccountID=0
|
||||
ProjectID=0`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// tenant headers are overwritten if set as placeholders
|
||||
// extra_filters extra_stream_filters from vm_access claim merged with statically defined
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters=aStaticFilter
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters=aStaticStreamFilter
|
||||
extra_stream_filters={"team":"dev"}
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters=aStaticFilter&extra_stream_filters=aStaticStreamFilter&extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// tenant headers are overwritten if set as placeholders
|
||||
// extra_filters extra_stream_filters from vm_access claim merged with statically defined
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters=aStaticFilter
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters=aStaticStreamFilter
|
||||
extra_stream_filters={"team":"dev"}
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters=aStaticFilter&extra_stream_filters=aStaticStreamFilter&extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// claim info should overwrite user provided query args and headers
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query?extra_filters=aUserFilter&extra_stream_filters=aUserStreamFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
request.Header.Set(`AccountID`, `aUserAccountID`)
|
||||
request.Header.Set(`ProjectID`, `aUserProjectID`)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters={"team":"dev"}
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// merge user provided query args with extra_filters and extra_stream_filters from vm_access claim
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query?extra_filters=aUserFilter&extra_stream_filters=aUserStreamFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_filters=aUserFilter
|
||||
extra_stream_filters={"team":"dev"}
|
||||
extra_stream_filters=aUserStreamFilter
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
merge_query_args: [extra_filters, extra_stream_filters]
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// pass user provided query args if vm_access claim has no extra_labels and extra_filters
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query?extra_filters=aUserFilter&extra_stream_filters=aUserStreamFilter", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+minimalToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters=aUserFilter
|
||||
extra_stream_filters=aUserStreamFilter
|
||||
headers:
|
||||
AccountID=0
|
||||
ProjectID=0`
|
||||
f(
|
||||
fmt.Sprintf(`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
merge_query_args: [extra_filters, extra_stream_filters]
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// placeholders in url_map
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters={"team":"dev"}
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_map:
|
||||
- src_paths: ["/query"]
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
// multiple placeholders in url_map for the same param
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters={"team":"dev"}
|
||||
tenant_info=static=value
|
||||
tenant_info=345
|
||||
tenant_info=456
|
||||
headers:
|
||||
AccountID=345
|
||||
ProjectID=456`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_map:
|
||||
- src_paths: ["/query"]
|
||||
headers:
|
||||
- "AccountID: {{.LogsAccountID}}"
|
||||
- "ProjectID: {{.LogsProjectID}}"
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}&tenant_info=static=value&tenant_info={{.LogsAccountID}}&tenant_info={{.LogsProjectID}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
// client request params must be ignored by placeholders
|
||||
request = httptest.NewRequest(`GET`, "http://some-host.com/query?template_attack={{.LogsExtraFilters}}", nil)
|
||||
request.Header.Set(`Authorization`, `Bearer `+fullToken)
|
||||
request.Header.Set(`AccountID`, `{{.LogsAccountID}}`)
|
||||
responseExpected = `
|
||||
statusCode=200
|
||||
path: /select/logsql/query
|
||||
query:
|
||||
extra_filters={"namespace":"my-app","env":"prod"}
|
||||
extra_stream_filters={"team":"dev"}
|
||||
template_attack={{.LogsExtraFilters}}
|
||||
headers:
|
||||
AccountID={{.LogsAccountID}}`
|
||||
f(fmt.Sprintf(
|
||||
`
|
||||
users:
|
||||
- jwt:
|
||||
public_keys:
|
||||
- %q
|
||||
url_map:
|
||||
- src_paths: ["/query"]
|
||||
url_prefix: {BACKEND}/select/logsql/?extra_filters={{.LogsExtraFilters}}&extra_stream_filters={{.LogsExtraStreamFilters}}`, string(publicKeyPEM)),
|
||||
request,
|
||||
responseExpected,
|
||||
)
|
||||
|
||||
}
|
||||
|
||||
type fakeResponseWriter struct {
|
||||
|
||||
@@ -45,15 +45,14 @@ func insertRows(sketches []*datadogsketches.Sketch, extraLabels []prompb.Label)
|
||||
ms := sketch.ToSummary()
|
||||
for _, m := range ms {
|
||||
ctx.Labels = ctx.Labels[:0]
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10557
|
||||
ctx.AddLabel("host", sketch.Host) // newly added
|
||||
ctx.AddLabel("", m.Name)
|
||||
for _, label := range m.Labels {
|
||||
ctx.AddLabel(label.Name, label.Value)
|
||||
}
|
||||
for _, tag := range sketch.Tags {
|
||||
name, value := datadogutil.SplitTag(tag)
|
||||
if name == "host" {
|
||||
name = "exported_host"
|
||||
}
|
||||
ctx.AddLabel(name, value)
|
||||
}
|
||||
for j := range extraLabels {
|
||||
|
||||
@@ -321,19 +321,23 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
|
||||
return true
|
||||
case "/tags/tagSeries":
|
||||
graphiteTagsTagSeriesRequests.Inc()
|
||||
if err := graphite.TagsTagSeriesHandler(startTime, w, r); err != nil {
|
||||
graphiteTagsTagSeriesErrors.Inc()
|
||||
httpserver.Errorf(w, r, "%s", err)
|
||||
return true
|
||||
err := &httpserver.ErrorWithStatusCode{
|
||||
Err: fmt.Errorf("graphite tag registration has been disabled and is planned to be removed in future. " +
|
||||
"See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10544"),
|
||||
StatusCode: http.StatusNotImplemented,
|
||||
}
|
||||
graphiteTagsTagSeriesErrors.Inc()
|
||||
httpserver.Errorf(w, r, "%s", err)
|
||||
return true
|
||||
case "/tags/tagMultiSeries":
|
||||
graphiteTagsTagMultiSeriesRequests.Inc()
|
||||
if err := graphite.TagsTagMultiSeriesHandler(startTime, w, r); err != nil {
|
||||
graphiteTagsTagMultiSeriesErrors.Inc()
|
||||
httpserver.Errorf(w, r, "%s", err)
|
||||
return true
|
||||
err := &httpserver.ErrorWithStatusCode{
|
||||
Err: fmt.Errorf("graphite tag registration has been disabled and is planned to be removed in future. " +
|
||||
"See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10544"),
|
||||
StatusCode: http.StatusNotImplemented,
|
||||
}
|
||||
graphiteTagsTagMultiSeriesErrors.Inc()
|
||||
httpserver.Errorf(w, r, "%s", err)
|
||||
return true
|
||||
case "/tags":
|
||||
graphiteTagsRequests.Inc()
|
||||
|
||||
@@ -12,6 +12,7 @@ import (
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
"unicode/utf8"
|
||||
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
"github.com/VictoriaMetrics/metricsql"
|
||||
@@ -528,6 +529,14 @@ func LabelValuesHandler(qt *querytracer.Tracer, startTime time.Time, labelName s
|
||||
return err
|
||||
}
|
||||
sq := storage.NewSearchQuery(cp.start, cp.end, cp.filterss, *maxLabelsAPISeries)
|
||||
|
||||
if strings.HasPrefix(labelName, "U__") {
|
||||
// This label seems to be Unicode-encoded according to the Prometheus spec.
|
||||
// See https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values
|
||||
// Spec: https://github.com/prometheus/proposals/blob/main/proposals/0028-utf8.md
|
||||
labelName = unescapePrometheusLabelName(labelName)
|
||||
}
|
||||
|
||||
labelValues, err := netstorage.LabelValues(qt, labelName, sq, limit, cp.deadline)
|
||||
if err != nil {
|
||||
return fmt.Errorf("cannot obtain values for label %q: %w", labelName, err)
|
||||
@@ -1330,3 +1339,70 @@ func calculateMaxUniqueTimeSeriesForResource(maxConcurrentRequests, remainingMem
|
||||
func GetMaxUniqueTimeSeries() int {
|
||||
return maxUniqueTimeseriesValue
|
||||
}
|
||||
|
||||
// copied from https://github.com/prometheus/common/blob/adea6285c1c7447fcb7bfdeb6abfc6eff893e0a7/model/metric.go#L483
|
||||
// it's not possible to use direct import due to increased binary size
|
||||
func unescapePrometheusLabelName(name string) string {
|
||||
// lower function taken from strconv.atoi.
|
||||
lower := func(c byte) byte {
|
||||
return c | ('x' - 'X')
|
||||
}
|
||||
if len(name) == 0 {
|
||||
return name
|
||||
}
|
||||
escapedName, found := strings.CutPrefix(name, "U__")
|
||||
if !found {
|
||||
return name
|
||||
}
|
||||
|
||||
var unescaped strings.Builder
|
||||
TOP:
|
||||
for i := 0; i < len(escapedName); i++ {
|
||||
// All non-underscores are treated normally.
|
||||
if escapedName[i] != '_' {
|
||||
unescaped.WriteByte(escapedName[i])
|
||||
continue
|
||||
}
|
||||
i++
|
||||
if i >= len(escapedName) {
|
||||
return name
|
||||
}
|
||||
// A double underscore is a single underscore.
|
||||
if escapedName[i] == '_' {
|
||||
unescaped.WriteByte('_')
|
||||
continue
|
||||
}
|
||||
// We think we are in a UTF-8 code, process it.
|
||||
var utf8Val uint
|
||||
for j := 0; i < len(escapedName); j++ {
|
||||
// This is too many characters for a utf8 value based on the MaxRune
|
||||
// value of '\U0010FFFF'.
|
||||
if j >= 6 {
|
||||
return name
|
||||
}
|
||||
// Found a closing underscore, convert to a rune, check validity, and append.
|
||||
if escapedName[i] == '_' {
|
||||
utf8Rune := rune(utf8Val)
|
||||
if !utf8.ValidRune(utf8Rune) {
|
||||
return name
|
||||
}
|
||||
unescaped.WriteRune(utf8Rune)
|
||||
continue TOP
|
||||
}
|
||||
r := lower(escapedName[i])
|
||||
utf8Val *= 16
|
||||
switch {
|
||||
case r >= '0' && r <= '9':
|
||||
utf8Val += uint(r) - '0'
|
||||
case r >= 'a' && r <= 'f':
|
||||
utf8Val += uint(r) - 'a' + 10
|
||||
default:
|
||||
return name
|
||||
}
|
||||
i++
|
||||
}
|
||||
// Didn't find closing underscore, invalid.
|
||||
return name
|
||||
}
|
||||
return unescaped.String()
|
||||
}
|
||||
|
||||
@@ -1166,6 +1166,61 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
|
||||
},
|
||||
}
|
||||
return evalExpr(qt, ec, be)
|
||||
// the cached rate result could be inaccurate in edge cases, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10098
|
||||
case "rate":
|
||||
if iafc != nil {
|
||||
if !strings.EqualFold(iafc.ae.Name, "sum") {
|
||||
qt.Printf("do not apply instant rollup optimization for incremental aggregate %s()", iafc.ae.Name)
|
||||
return evalAt(qt, timestamp, window)
|
||||
}
|
||||
qt.Printf("optimized calculation for sum(rate(m[d])) as (sum(increase(m[d])) / d)")
|
||||
afe := expr.(*metricsql.AggrFuncExpr)
|
||||
fe := afe.Args[0].(*metricsql.FuncExpr)
|
||||
feIncrease := *fe
|
||||
feIncrease.Name = "increase"
|
||||
// copy RollupExpr to drop possible offset,
|
||||
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9762
|
||||
newArg := copyRollupExpr(fe.Args[0].(*metricsql.RollupExpr))
|
||||
newArg.Offset = nil
|
||||
feIncrease.Args = []metricsql.Expr{newArg}
|
||||
d := newArg.Window.Duration(ec.Step)
|
||||
if d == 0 {
|
||||
d = ec.Step
|
||||
}
|
||||
afeIncrease := *afe
|
||||
afeIncrease.Args = []metricsql.Expr{&feIncrease}
|
||||
be := &metricsql.BinaryOpExpr{
|
||||
Op: "/",
|
||||
KeepMetricNames: true,
|
||||
Left: &afeIncrease,
|
||||
Right: &metricsql.NumberExpr{
|
||||
N: float64(d) / 1000,
|
||||
},
|
||||
}
|
||||
return evalExpr(qt, ec, be)
|
||||
}
|
||||
qt.Printf("optimized calculation for instant rollup rate(m[d]) as (increase(m[d]) / d)")
|
||||
fe := expr.(*metricsql.FuncExpr)
|
||||
feIncrease := *fe
|
||||
feIncrease.Name = "increase"
|
||||
// copy RollupExpr to drop possible offset,
|
||||
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9762
|
||||
newArg := copyRollupExpr(fe.Args[0].(*metricsql.RollupExpr))
|
||||
newArg.Offset = nil
|
||||
feIncrease.Args = []metricsql.Expr{newArg}
|
||||
d := newArg.Window.Duration(ec.Step)
|
||||
if d == 0 {
|
||||
d = ec.Step
|
||||
}
|
||||
be := &metricsql.BinaryOpExpr{
|
||||
Op: "/",
|
||||
KeepMetricNames: fe.KeepMetricNames,
|
||||
Left: &feIncrease,
|
||||
Right: &metricsql.NumberExpr{
|
||||
N: float64(d) / 1000,
|
||||
},
|
||||
}
|
||||
return evalExpr(qt, ec, be)
|
||||
case "max_over_time":
|
||||
if iafc != nil {
|
||||
if !strings.EqualFold(iafc.ae.Name, "max") {
|
||||
|
||||
@@ -4018,6 +4018,12 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(scalar)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(123, 456, time())`
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-no-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_quantile(0.6, label_set(100, "foo", "bar"))`
|
||||
@@ -4030,6 +4036,12 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-no-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(123,456, label_set(100, "foo", "bar"))`
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-invalid-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_quantile(0.6, label_set(100, "le", "foobar"))`
|
||||
@@ -4042,6 +4054,12 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-invalid-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(50, 60, label_set(100, "le", "foobar"))`
|
||||
resultExpected := []netstorage.Result{}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-inf-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_quantile(0.6, label_set(100, "le", "+Inf"))`
|
||||
@@ -4183,6 +4201,28 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-valid-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(0, 100, label_set(100, "le", "200"))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.5, 0.5, 0.5, 0.5, 0.5, 0.5},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-valid-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(200, 300, label_set(100, "le", "200"))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0, 0, 0, 0, 0, 0},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-valid-le, boundsLabel)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(histogram_quantile(0.6, label_set(100, "le", "200"), "foobar"))`
|
||||
@@ -4212,7 +4252,7 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r1, r2, r3}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-valid-le, boundsLabel)`, func(t *testing.T) {
|
||||
t.Run(`histogram_share(single-value-valid-le, boundsLabel)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(histogram_share(120, label_set(100, "le", "200"), "foobar"))`
|
||||
r1 := netstorage.Result{
|
||||
@@ -4311,7 +4351,37 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_share(single-value-valid-le-mid-le)`, func(t *testing.T) {
|
||||
t.Run(`histogram_fraction(single-value-valid-le-max-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(0,100, (
|
||||
label_set(100, "le", "100"),
|
||||
label_set(40, "le", "50"),
|
||||
label_set(0, "le", "10"),
|
||||
))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{1, 1, 1, 1, 1, 1},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-valid-le-min-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(0,10, (
|
||||
label_set(100, "le", "100"),
|
||||
label_set(40, "le", "50"),
|
||||
label_set(0, "le", "10"),
|
||||
))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0, 0, 0, 0, 0, 0},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_share(single-value-valid-le-mid-le-1)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_share(105, (
|
||||
label_set(100, "le", "200"),
|
||||
@@ -4325,6 +4395,34 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_share(single-value-valid-le-mid-le-2)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_share(55, (
|
||||
label_set(100, "le", "200"),
|
||||
label_set(0, "le", "55"),
|
||||
))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0, 0, 0, 0, 0, 0},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(single-value-valid-le-mid-le)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(55,105, (
|
||||
label_set(100, "le", "200"),
|
||||
label_set(0, "le", "55"),
|
||||
))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.3448275862068966, 0.3448275862068966, 0.3448275862068966, 0.3448275862068966, 0.3448275862068966, 0.3448275862068966},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(single-value-valid-le-min-phi-no-zero-bucket)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_quantile(0, label_set(100, "le", "200"))`
|
||||
@@ -4358,6 +4456,17 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(scalar-phi)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(25, time() / 8, label_set(100, "le", "200"))`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.5, 0.625, 0.75, 0.875, 0.875, 0.875},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(duplicate-le)`, func(t *testing.T) {
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3225
|
||||
t.Parallel()
|
||||
@@ -4439,6 +4548,36 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r1, r2}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(valid)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(histogram_fraction(0, 25,
|
||||
label_set(90, "foo", "bar", "le", "10")
|
||||
or label_set(100, "foo", "bar", "le", "30")
|
||||
or label_set(300, "foo", "bar", "le", "+Inf")
|
||||
or label_set(200, "tag", "xx", "le", "10")
|
||||
or label_set(300, "tag", "xx", "le", "30")
|
||||
))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.325, 0.325, 0.325, 0.325, 0.325, 0.325},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
}}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.9166666666666666, 0.9166666666666666, 0.9166666666666666, 0.9166666666666666, 0.9166666666666666, 0.9166666666666666},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("tag"),
|
||||
Value: []byte("xx"),
|
||||
}}
|
||||
resultExpected := []netstorage.Result{r1, r2}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(negative-bucket-count)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_quantile(0.6,
|
||||
@@ -4555,6 +4694,25 @@ func TestExecSuccess(t *testing.T) {
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_fraction(normal-bucket-count)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `histogram_fraction(22,35,
|
||||
label_set(0, "foo", "bar", "le", "10")
|
||||
or label_set(100, "foo", "bar", "le", "30")
|
||||
or label_set(300, "foo", "bar", "le", "+Inf")
|
||||
)`
|
||||
r := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{0.1333333333333333, 0.1333333333333333, 0.1333333333333333, 0.1333333333333333, 0.1333333333333333, 0.1333333333333333},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
}}
|
||||
resultExpected := []netstorage.Result{r}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`histogram_quantile(normal-bucket-count, boundsLabel)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(histogram_quantile(0.2,
|
||||
|
||||
@@ -51,6 +51,7 @@ var transformFuncs = map[string]transformFunc{
|
||||
"exp": newTransformFuncOneArg(transformExp),
|
||||
"floor": newTransformFuncOneArg(transformFloor),
|
||||
"histogram_avg": transformHistogramAvg,
|
||||
"histogram_fraction": transformHistogramFraction,
|
||||
"histogram_quantile": transformHistogramQuantile,
|
||||
"histogram_quantiles": transformHistogramQuantiles,
|
||||
"histogram_share": transformHistogramShare,
|
||||
@@ -662,13 +663,13 @@ func transformHistogramShare(tfa *transformFuncArg) ([]*timeseries, error) {
|
||||
if math.IsNaN(leReq) || len(xss) == 0 {
|
||||
return nan, nan, nan
|
||||
}
|
||||
fixBrokenBuckets(i, xss)
|
||||
if leReq < 0 {
|
||||
return 0, 0, 0
|
||||
}
|
||||
if math.IsInf(leReq, 1) {
|
||||
return 1, 1, 1
|
||||
}
|
||||
fixBrokenBuckets(i, xss)
|
||||
var vPrev, lePrev float64
|
||||
for _, xs := range xss {
|
||||
v := xs.ts.Values[i]
|
||||
@@ -729,6 +730,85 @@ func transformHistogramShare(tfa *transformFuncArg) ([]*timeseries, error) {
|
||||
return rvs, nil
|
||||
}
|
||||
|
||||
// histogram_fraction is a shortcut for `histogram_share(upperLe, buckets) - histogram_share(lowerLe, buckets)`;
|
||||
// histogram_fraction(x, y) = histogram_fraction(-Inf, y) - histogram_fraction(-Inf, x) = histogram_share(y) - histogram_share(x).
|
||||
// This function is supported by PromQL.
|
||||
func transformHistogramFraction(tfa *transformFuncArg) ([]*timeseries, error) {
|
||||
args := tfa.args
|
||||
if err := expectTransformArgsNum(args, 3); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
lowerles, err := getScalar(args[0], 0)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot parse lower le: %w", err)
|
||||
}
|
||||
upperles, err := getScalar(args[1], 1)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot parse upper le: %w", err)
|
||||
}
|
||||
if lowerles[0] >= upperles[0] {
|
||||
return nil, fmt.Errorf("lower le cannot be greater than upper le; got lower le: %f, upper le: %f", lowerles[0], upperles[0])
|
||||
}
|
||||
|
||||
// Convert buckets with `vmrange` labels to buckets with `le` labels.
|
||||
tss := vmrangeBucketsToLE(args[2])
|
||||
|
||||
// Group metrics by all tags excluding "le"
|
||||
m := groupLeTimeseries(tss)
|
||||
|
||||
fraction := func(i int, lowerle, upperle float64, xss []leTimeseries) (q float64) {
|
||||
if math.IsNaN(lowerle) || math.IsNaN(upperle) || len(xss) == 0 {
|
||||
return nan
|
||||
}
|
||||
fixBrokenBuckets(i, xss)
|
||||
share := func(leReq float64) float64 {
|
||||
if leReq < 0 {
|
||||
return 0
|
||||
}
|
||||
if math.IsInf(leReq, 1) {
|
||||
return 1
|
||||
}
|
||||
var vPrev, lePrev float64
|
||||
for _, xs := range xss {
|
||||
v := xs.ts.Values[i]
|
||||
le := xs.le
|
||||
if leReq >= le {
|
||||
vPrev = v
|
||||
lePrev = le
|
||||
continue
|
||||
}
|
||||
// precondition: lePrev <= leReq < le
|
||||
vLast := xss[len(xss)-1].ts.Values[i]
|
||||
lower := vPrev / vLast
|
||||
if math.IsInf(le, 1) {
|
||||
return lower
|
||||
}
|
||||
if lePrev == leReq {
|
||||
return lower
|
||||
}
|
||||
q = lower + (v-vPrev)/vLast*(leReq-lePrev)/(le-lePrev)
|
||||
return q
|
||||
}
|
||||
return 1
|
||||
}
|
||||
return share(upperle) - share(lowerle)
|
||||
}
|
||||
rvs := make([]*timeseries, 0, len(m))
|
||||
for _, xss := range m {
|
||||
sort.Slice(xss, func(i, j int) bool {
|
||||
return xss[i].le < xss[j].le
|
||||
})
|
||||
xss = mergeSameLE(xss)
|
||||
dst := xss[0].ts
|
||||
for i := range dst.Values {
|
||||
q := fraction(i, lowerles[i], upperles[i], xss)
|
||||
dst.Values[i] = q
|
||||
}
|
||||
rvs = append(rvs, dst)
|
||||
}
|
||||
return rvs, nil
|
||||
}
|
||||
|
||||
func transformHistogramAvg(tfa *transformFuncArg) ([]*timeseries, error) {
|
||||
args := tfa.args
|
||||
if err := expectTransformArgsNum(args, 1); err != nil {
|
||||
|
||||
@@ -1227,7 +1227,10 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
|
||||
#### buckets_limit
|
||||
|
||||
`buckets_limit(limit, buckets)` is a [transform function](#transform-functions), which limits the number
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
|
||||
The result will preserve the first and the last bucket to improve accuracy for min and max values.
|
||||
So, if the `limit` is greater than 0 and less than 3, the function will still return 3 buckets: the first bucket, the last bucket, and a selected bucket.
|
||||
|
||||
See also [prometheus_buckets](#prometheus_buckets) and [histogram_quantile](#histogram_quantile).
|
||||
|
||||
@@ -1381,6 +1384,15 @@ It can be used for calculating the average over the given time range across mult
|
||||
For example, `histogram_avg(sum(histogram_over_time(response_time_duration_seconds[5m])) by (vmrange,job))` would return the average response time
|
||||
per each `job` over the last 5 minutes.
|
||||
|
||||
#### histogram_fraction
|
||||
|
||||
`histogram_fraction(lowerLe, upperLe, buckets)` is a [transform function](#transform-functions), which calculates the share (in the range `[0...1]`) for `buckets` that fall between `lowerLe` and `upperLe`.
|
||||
The result of `histogram_fraction(lowerLe, upperLe, buckets)` is equivalent to `histogram_share(upperLe, buckets) - histogram_share(lowerLe, buckets)`.
|
||||
|
||||
This function is supported by PromQL.
|
||||
|
||||
See also [histogram_share](#histogram_share).
|
||||
|
||||
#### histogram_quantile
|
||||
|
||||
`histogram_quantile(phi, buckets)` is a [transform function](#transform-functions), which calculates `phi`-[percentile](https://en.wikipedia.org/wiki/Percentile)
|
||||
@@ -37,7 +37,7 @@
|
||||
<meta property="og:title" content="UI for VictoriaMetrics">
|
||||
<meta property="og:url" content="https://victoriametrics.com/">
|
||||
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
|
||||
<script type="module" crossorigin src="./assets/index-C1hTBemk.js"></script>
|
||||
<script type="module" crossorigin src="./assets/index-DIRuq0ns.js"></script>
|
||||
<link rel="modulepreload" crossorigin href="./assets/vendor-BR6Q0Fin.js">
|
||||
<link rel="stylesheet" crossorigin href="./assets/vendor-D1GxaB_c.css">
|
||||
<link rel="stylesheet" crossorigin href="./assets/index-D7CzMv1O.css">
|
||||
|
||||
@@ -62,7 +62,7 @@ var (
|
||||
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-limiter . "+
|
||||
"See also -storage.maxHourlySeries")
|
||||
|
||||
minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 10e6, "The minimum free disk space at -storageDataPath after which the storage stops accepting new data")
|
||||
minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 100e6, "The minimum free disk space at -storageDataPath after which the storage stops accepting new data")
|
||||
|
||||
cacheSizeStorageTSID = flagutil.NewBytes("storage.cacheSizeStorageTSID", 0, "Overrides max size for storage/tsid cache. "+
|
||||
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning")
|
||||
|
||||
@@ -1227,7 +1227,10 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
|
||||
#### buckets_limit
|
||||
|
||||
`buckets_limit(limit, buckets)` is a [transform function](#transform-functions), which limits the number
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
|
||||
The result will preserve the first and the last bucket to improve accuracy for min and max values.
|
||||
So, if the `limit` is greater than 0 and less than 3, the function will still return 3 buckets: the first bucket, the last bucket, and a selected bucket.
|
||||
|
||||
See also [prometheus_buckets](#prometheus_buckets) and [histogram_quantile](#histogram_quantile).
|
||||
|
||||
@@ -1381,6 +1384,15 @@ It can be used for calculating the average over the given time range across mult
|
||||
For example, `histogram_avg(sum(histogram_over_time(response_time_duration_seconds[5m])) by (vmrange,job))` would return the average response time
|
||||
per each `job` over the last 5 minutes.
|
||||
|
||||
#### histogram_fraction
|
||||
|
||||
`histogram_fraction(lowerLe, upperLe, buckets)` is a [transform function](#transform-functions), which calculates the share (in the range `[0...1]`) for `buckets` that fall between `lowerLe` and `upperLe`.
|
||||
The result of `histogram_fraction(lowerLe, upperLe, buckets)` is equivalent to `histogram_share(upperLe, buckets) - histogram_share(lowerLe, buckets)`.
|
||||
|
||||
This function is supported by PromQL.
|
||||
|
||||
See also [histogram_share](#histogram_share).
|
||||
|
||||
#### histogram_quantile
|
||||
|
||||
`histogram_quantile(phi, buckets)` is a [transform function](#transform-functions), which calculates `phi`-[percentile](https://en.wikipedia.org/wiki/Percentile)
|
||||
|
||||
@@ -33,6 +33,8 @@ type PrometheusQuerier interface {
|
||||
// separate interface or rename this interface to allow for multiple querier
|
||||
// types.
|
||||
GraphiteMetricsIndex(t *testing.T, opts QueryOpts) GraphiteMetricsIndexResponse
|
||||
GraphiteTagsTagSeries(t *testing.T, record string, opts QueryOpts)
|
||||
GraphiteTagsTagMultiSeries(t *testing.T, records []string, opts QueryOpts)
|
||||
}
|
||||
|
||||
// Writer contains methods for writing new data
|
||||
|
||||
@@ -60,3 +60,60 @@ func TestClusterMetricsIndex(t *testing.T) {
|
||||
|
||||
testMetricsIndex(tc.T(), sut)
|
||||
}
|
||||
|
||||
// testTagSeries tests the registration of new time series in index.
|
||||
//
|
||||
// See https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb.
|
||||
func testTagSeries(tc *apptest.TestCase, sut apptest.PrometheusWriteQuerier, getStorageMetric func(string) int) {
|
||||
t := tc.T()
|
||||
|
||||
assertNewTimeseriesCreatedTotal := func(want int) {
|
||||
tc.Assert(&apptest.AssertOptions{
|
||||
Msg: "unexpected vm_new_timeseries_created_total",
|
||||
Got: func() any {
|
||||
return getStorageMetric("vm_new_timeseries_created_total")
|
||||
},
|
||||
Want: want,
|
||||
})
|
||||
}
|
||||
|
||||
rec := "disk.used;rack=a1;datacenter=dc1;server=web01"
|
||||
sut.GraphiteTagsTagSeries(t, rec, apptest.QueryOpts{})
|
||||
assertNewTimeseriesCreatedTotal(0)
|
||||
|
||||
recs := []string{
|
||||
"metric.yyy;t2=a;t1=b;t3=c",
|
||||
"metric.zzz;t5=d;t4=e;t6=f",
|
||||
"metric.xxx;t8=g;t7=h;t9=i",
|
||||
}
|
||||
sut.GraphiteTagsTagMultiSeries(t, recs, apptest.QueryOpts{})
|
||||
assertNewTimeseriesCreatedTotal(0)
|
||||
}
|
||||
|
||||
func TestSingleTagSeries(t *testing.T) {
|
||||
tc := apptest.NewTestCase(t)
|
||||
defer tc.Stop()
|
||||
|
||||
sut := tc.MustStartDefaultVmsingle()
|
||||
getStorageMetric := func(name string) int {
|
||||
return sut.GetIntMetric(t, name)
|
||||
}
|
||||
|
||||
testTagSeries(tc, sut, getStorageMetric)
|
||||
}
|
||||
|
||||
func TestClusterTagSeries(t *testing.T) {
|
||||
tc := apptest.NewTestCase(t)
|
||||
defer tc.Stop()
|
||||
|
||||
sut := tc.MustStartDefaultCluster()
|
||||
getStorageMetric := func(name string) int {
|
||||
var v int
|
||||
for _, s := range sut.Vmstorages {
|
||||
v += s.GetIntMetric(t, name)
|
||||
}
|
||||
return v
|
||||
}
|
||||
|
||||
testTagSeries(tc, sut, getStorageMetric)
|
||||
}
|
||||
|
||||
@@ -32,6 +32,8 @@ func TestSingleInstantQuery(t *testing.T) {
|
||||
testInstantQueryDoesNotReturnStaleNaNs(t, sut)
|
||||
|
||||
testQueryRangeWithAtModifier(t, sut)
|
||||
|
||||
testLabelValuesWithUTFNames(t, sut)
|
||||
}
|
||||
|
||||
func TestClusterInstantQuery(t *testing.T) {
|
||||
@@ -44,6 +46,8 @@ func TestClusterInstantQuery(t *testing.T) {
|
||||
testInstantQueryDoesNotReturnStaleNaNs(t, sut)
|
||||
|
||||
testQueryRangeWithAtModifier(t, sut)
|
||||
|
||||
testLabelValuesWithUTFNames(t, sut)
|
||||
}
|
||||
|
||||
func testInstantQueryWithUTFNames(t *testing.T, sut apptest.PrometheusWriteQuerier) {
|
||||
@@ -236,3 +240,46 @@ func testQueryRangeWithAtModifier(t *testing.T, sut apptest.PrometheusWriteQueri
|
||||
t.Fatalf("unexpected error: %q", resp.Error)
|
||||
}
|
||||
}
|
||||
|
||||
// This test checks that label values are decoded from UTF-8 according to Prometheus spec.
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10446
|
||||
// Spec: https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values
|
||||
func testLabelValuesWithUTFNames(t *testing.T, sut apptest.PrometheusWriteQuerier) {
|
||||
|
||||
timestamp := millis("2025-01-01T00:00:00Z")
|
||||
data := prompb.WriteRequest{
|
||||
Timeseries: []prompb.TimeSeries{
|
||||
{
|
||||
Labels: []prompb.Label{
|
||||
{Name: "__name__", Value: "labelvals"},
|
||||
{Name: "kubernetes_something/special&' chars", Value: "漢©®€£"},
|
||||
{Name: "3👋tfにちは", Value: "漢©®€£"},
|
||||
},
|
||||
Samples: []prompb.Sample{
|
||||
{Value: 1, Timestamp: timestamp},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
sut.PrometheusAPIV1Write(t, data, apptest.QueryOpts{})
|
||||
sut.ForceFlush(t)
|
||||
|
||||
cmpOptions := []cmp.Option{}
|
||||
|
||||
// encoded via prometheus model.EscapeName(string,model.ValueEncodingEscaping)
|
||||
want := map[string][]string{
|
||||
"__name__": {"labelvals"},
|
||||
"U__kubernetes__something_2f_special_26__27__20_chars": {"漢©®€£"},
|
||||
"U___33__1f44b_tf_306b__3061__306f_": {"漢©®€£"},
|
||||
}
|
||||
for labelName, expected := range want {
|
||||
got := sut.PrometheusAPIV1LabelValues(t, labelName, `{__name__="labelvals"}`, apptest.QueryOpts{
|
||||
Start: fmt.Sprintf("%d", timestamp),
|
||||
End: fmt.Sprintf("%d", timestamp),
|
||||
})
|
||||
if diff := cmp.Diff(expected, got.Data, cmpOptions...); diff != "" {
|
||||
t.Errorf("unexpected response (-want, +got):\n%s", diff)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -171,6 +171,26 @@ func TestClusterMultiTenantSelect(t *testing.T) {
|
||||
t.Errorf("unexpected response (-want, +got):\n%s", diff)
|
||||
}
|
||||
|
||||
// /api/v1/label/../value with extra_filters
|
||||
|
||||
wantVR := apptest.NewPrometheusAPIV1LabelValuesResponse(t,
|
||||
`{"data": [
|
||||
"5"
|
||||
]
|
||||
}`)
|
||||
wantSR.Sort()
|
||||
gotVR := vmselect.PrometheusAPIV1LabelValues(t, "vm_account_id", "foo", apptest.QueryOpts{
|
||||
Start: "2022-05-10T08:00:00.000Z",
|
||||
End: "2022-05-10T08:30:00.000Z",
|
||||
ExtraFilters: []string{`{vm_account_id="5"}`},
|
||||
Tenant: "multitenant",
|
||||
})
|
||||
gotSR.Sort()
|
||||
|
||||
if diff := cmp.Diff(wantVR, gotVR, cmpopts.IgnoreFields(apptest.PrometheusAPIV1LabelValuesResponse{}, "Status", "IsPartial")); diff != "" {
|
||||
t.Errorf("unexpected response (-want, +got):\n%s", diff)
|
||||
}
|
||||
|
||||
// Delete series from specific tenant
|
||||
vmselect.APIV1AdminTSDBDeleteSeries(t, "foo_bar", apptest.QueryOpts{
|
||||
Tenant: "5:15",
|
||||
|
||||
@@ -61,8 +61,8 @@ func TestClusterSearchWithDisabledPerDayIndex(t *testing.T) {
|
||||
|
||||
type startSUTFunc func(name string, disablePerDayIndex bool) apptest.PrometheusWriteQuerier
|
||||
|
||||
// testDisablePerDayIndex_Search shows what search results to expect when data
|
||||
// is first inserted with per-day index enabled and then with per-day index
|
||||
// testSearchWithDisabledPerDayIndex shows what search results to expect when
|
||||
// data is first inserted with per-day index enabled and then with per-day index
|
||||
// disabled.
|
||||
//
|
||||
// The data inserted with enabled per-day index must be searchable with disabled
|
||||
@@ -112,8 +112,8 @@ func testSearchWithDisabledPerDayIndex(tc *apptest.TestCase, start startSUTFunc)
|
||||
})
|
||||
}
|
||||
|
||||
// Start vmsingle with enabled per-day index, insert sample1, and confirm it
|
||||
// is searchable.
|
||||
// Start SUT with enabled per-day index, insert sample1, and confirm it is
|
||||
// searchable.
|
||||
sut := start("with-per-day-index", false)
|
||||
sample1 := []string{"metric1 111 1704067200000"} // 2024-01-01T00:00:00Z
|
||||
sut.PrometheusAPIV1ImportPrometheus(t, sample1, apptest.QueryOpts{})
|
||||
@@ -130,8 +130,8 @@ func testSearchWithDisabledPerDayIndex(tc *apptest.TestCase, start startSUTFunc)
|
||||
},
|
||||
})
|
||||
|
||||
// Restart vmsingle with disabled per-day index, insert sample2, and confirm
|
||||
// that both sample1 and sample2 is searchable.
|
||||
// Restart SUT with disabled per-day index, insert sample2, and confirm that
|
||||
// both sample1 and sample2 is searchable.
|
||||
tc.StopPrometheusWriteQuerier(sut)
|
||||
sut = start("without-per-day-index", true)
|
||||
sample2 := []string{"metric2 222 1704067200000"} // 2024-01-01T00:00:00Z
|
||||
@@ -156,8 +156,8 @@ func testSearchWithDisabledPerDayIndex(tc *apptest.TestCase, start startSUTFunc)
|
||||
},
|
||||
})
|
||||
|
||||
// Insert sample1 but for a different date, restart vmsingle with enabled
|
||||
// per-day index and confirm that:
|
||||
// Insert sample1 but for a different date, restart SUT with enabled per-day
|
||||
// index and confirm that:
|
||||
// - sample1 is searchable within the time range of Jan 1st
|
||||
// - sample1 is not searchable within the time range of Jan 20th
|
||||
// - sample1 is searchable within the time range of Jan 1st-20th (because
|
||||
|
||||
@@ -298,13 +298,14 @@ func (app *Vminsert) String() string {
|
||||
func (app *Vminsert) sendBlocking(t *testing.T, numRecordsToSend int, send func()) {
|
||||
t.Helper()
|
||||
|
||||
wantRowsSentCount := app.rpcRowsSentTotal(t) + numRecordsToSend
|
||||
|
||||
send()
|
||||
|
||||
const (
|
||||
retries = 20
|
||||
period = 100 * time.Millisecond
|
||||
)
|
||||
wantRowsSentCount := app.rpcRowsSentTotal(t) + numRecordsToSend
|
||||
for range retries {
|
||||
d := app.rpcRowsSentTotal(t)
|
||||
if d >= wantRowsSentCount {
|
||||
|
||||
@@ -307,6 +307,37 @@ func (app *Vmselect) GraphiteMetricsIndex(t *testing.T, opts QueryOpts) Graphite
|
||||
return index
|
||||
}
|
||||
|
||||
// GraphiteTagsTagSeries is a test helper function that registers Graphite tags
|
||||
// for a single time series by sending a HTTP POST request to
|
||||
// /graphite/tags/tagSeries vmsingle endpoint.
|
||||
func (app *Vmselect) GraphiteTagsTagSeries(t *testing.T, record string, opts QueryOpts) {
|
||||
t.Helper()
|
||||
|
||||
url := fmt.Sprintf("http://%s/select/%s/graphite/tags/tagSeries", app.httpListenAddr, opts.getTenant())
|
||||
values := opts.asURLValues()
|
||||
values.Add("path", record)
|
||||
|
||||
_, statusCode := app.cli.PostForm(t, url, values)
|
||||
if got, want := statusCode, http.StatusNotImplemented; got != want {
|
||||
t.Fatalf("unexpected status code: got %d, want %d", got, want)
|
||||
}
|
||||
}
|
||||
|
||||
func (app *Vmselect) GraphiteTagsTagMultiSeries(t *testing.T, records []string, opts QueryOpts) {
|
||||
t.Helper()
|
||||
|
||||
url := fmt.Sprintf("http://%s/select/%s/graphite/tags/tagMultiSeries", app.httpListenAddr, opts.getTenant())
|
||||
values := opts.asURLValues()
|
||||
for _, rec := range records {
|
||||
values.Add("path", rec)
|
||||
}
|
||||
|
||||
_, statusCode := app.cli.PostForm(t, url, values)
|
||||
if got, want := statusCode, http.StatusNotImplemented; got != want {
|
||||
t.Fatalf("unexpected status code: got %d, want %d", got, want)
|
||||
}
|
||||
}
|
||||
|
||||
// APIV1AdminTenants sends a query to a /admin/tenants endpoint
|
||||
func (app *Vmselect) APIV1AdminTenants(t *testing.T) *AdminTenantsResponse {
|
||||
t.Helper()
|
||||
|
||||
@@ -414,6 +414,37 @@ func (app *Vmsingle) GraphiteMetricsIndex(t *testing.T, _ QueryOpts) GraphiteMet
|
||||
return index
|
||||
}
|
||||
|
||||
// GraphiteTagsTagSeries is a test helper function that registers Graphite tags
|
||||
// for a single time series by sending a HTTP POST request to
|
||||
// /graphite/tags/tagSeries vmsingle endpoint.
|
||||
func (app *Vmsingle) GraphiteTagsTagSeries(t *testing.T, record string, opts QueryOpts) {
|
||||
t.Helper()
|
||||
|
||||
url := fmt.Sprintf("http://%s/graphite/tags/tagSeries", app.httpListenAddr)
|
||||
values := opts.asURLValues()
|
||||
values.Add("path", record)
|
||||
|
||||
_, statusCode := app.cli.PostForm(t, url, values)
|
||||
if got, want := statusCode, http.StatusNotImplemented; got != want {
|
||||
t.Fatalf("unexpected status code: got %d, want %d", got, want)
|
||||
}
|
||||
}
|
||||
|
||||
func (app *Vmsingle) GraphiteTagsTagMultiSeries(t *testing.T, records []string, opts QueryOpts) {
|
||||
t.Helper()
|
||||
|
||||
url := fmt.Sprintf("http://%s/graphite/tags/tagMultiSeries", app.httpListenAddr)
|
||||
values := opts.asURLValues()
|
||||
for _, rec := range records {
|
||||
values.Add("path", rec)
|
||||
}
|
||||
|
||||
_, statusCode := app.cli.PostForm(t, url, values)
|
||||
if got, want := statusCode, http.StatusNotImplemented; got != want {
|
||||
t.Fatalf("unexpected status code: got %d, want %d", got, want)
|
||||
}
|
||||
}
|
||||
|
||||
// APIV1StatusMetricNamesStats sends a query to a /api/v1/status/metric_names_stats endpoint
|
||||
// and returns the statistics response for given params.
|
||||
//
|
||||
|
||||
@@ -31,12 +31,6 @@
|
||||
"id": "table",
|
||||
"name": "Table",
|
||||
"version": ""
|
||||
},
|
||||
{
|
||||
"type": "datasource",
|
||||
"id": "victoriametrics-metrics-datasource",
|
||||
"name": "VictoriaMetrics",
|
||||
"version": "0.16.0"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
@@ -61,6 +55,7 @@
|
||||
}
|
||||
]
|
||||
},
|
||||
"description": "Overview of alerts state in time based on metrics generated by VictoriaMetrics vmalert.",
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
@@ -179,7 +174,7 @@
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "sort_desc(topk_max($topk, sum(vmalert_alerts_firing{group=~\"$group\"}) by (alertname)))",
|
||||
"expr": "sort_desc(topk_max($topk, sum(vmalert_alerts_firing{job=~\"$job\",instance=~\"$instance\",group=~\"$group\"}) by (alertname)))",
|
||||
"format": "time_series",
|
||||
"instant": false,
|
||||
"legendFormat": "__auto",
|
||||
@@ -247,7 +242,7 @@
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "count(count(vmalert_alerting_rules_errors_total{group=~\"$group\"}) by (group))",
|
||||
"expr": "count(count(vmalert_alerting_rules_errors_total{job=~\"$job\",instance=~\"$instance\",group=~\"$group\"}) by (group))",
|
||||
"interval": "",
|
||||
"legendFormat": "",
|
||||
"range": true,
|
||||
@@ -314,7 +309,7 @@
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "count(vmalert_alerting_rules_errors_total{group=~\"$group\"})",
|
||||
"expr": "count(vmalert_alerting_rules_errors_total{job=~\"$job\",instance=~\"$instance\",group=~\"$group\"})",
|
||||
"instant": false,
|
||||
"interval": "",
|
||||
"legendFormat": "",
|
||||
@@ -403,7 +398,7 @@
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "topk_max(100, sum(increases_over_time(vmalert_alerts_firing{group=~\"$group\"}[$__range])) by(group, alertname) > 0)",
|
||||
"expr": "topk_max(100, sum(increases_over_time(vmalert_alerts_firing{job=~\"$job\",instance=~\"$instance\",group=~\"$group\"}[$__range])) by(group, alertname) > 0)",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"key": "Q-3934f0fb-8ad6-4519-a98d-c26d0fc6b312-0",
|
||||
@@ -506,6 +501,24 @@
|
||||
"value": 200
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byName",
|
||||
"options": "Alert"
|
||||
},
|
||||
"properties": [
|
||||
{
|
||||
"id": "links",
|
||||
"value": [
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Alert",
|
||||
"url": "/alerting/${ds:text}/${__value.text}/find"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
@@ -538,7 +551,7 @@
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "topk_max($topk, sum(increases_over_time(vmalert_alerts_firing{group=~\"$group\"}[$__range])) by (group, alertname) > 0)",
|
||||
"expr": "topk_max($topk, sum(increases_over_time(vmalert_alerts_firing{job=~\"$job\",instance=~\"$instance\",group=~\"$group\"}[$__range])) by (group, alertname) > 0)",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"key": "Q-3934f0fb-8ad6-4519-a98d-c26d0fc6b312-0",
|
||||
@@ -590,6 +603,46 @@
|
||||
"regex": "",
|
||||
"type": "datasource"
|
||||
},
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"definition": "label_values(vm_app_version{version=~\"^vmalert.*\"},job)",
|
||||
"includeAll": true,
|
||||
"multi": true,
|
||||
"name": "job",
|
||||
"options": [],
|
||||
"query": {
|
||||
"query": "label_values(vm_app_version{version=~\"^vmalert.*\"},job)",
|
||||
"refId": "StandardVariableQuery"
|
||||
},
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"type": "query"
|
||||
},
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"definition": "label_values(vm_app_version{job=~\"$job\"},instance)",
|
||||
"includeAll": true,
|
||||
"multi": true,
|
||||
"name": "instance",
|
||||
"options": [],
|
||||
"query": {
|
||||
"query": "label_values(vm_app_version{job=~\"$job\"},instance)",
|
||||
"refId": "StandardVariableQuery"
|
||||
},
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"type": "query"
|
||||
},
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
@@ -659,4 +712,4 @@
|
||||
"uid": "ehXxUsGSk",
|
||||
"version": 1,
|
||||
"weekStart": ""
|
||||
}
|
||||
}
|
||||
@@ -91,8 +91,26 @@
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {},
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
@@ -103,17 +121,42 @@
|
||||
},
|
||||
"id": 24,
|
||||
"options": {
|
||||
"code": {
|
||||
"language": "plaintext",
|
||||
"showLineNumbers": false,
|
||||
"showMiniMap": false
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"percentChangeColorMode": "standard",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "/^short_version$/",
|
||||
"values": false
|
||||
},
|
||||
"content": "<div style=\"text-align: center;\">$version</div>",
|
||||
"mode": "markdown"
|
||||
"showPercentChange": false,
|
||||
"textMode": "value",
|
||||
"wideLayout": true
|
||||
},
|
||||
"pluginVersion": "12.3.0",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"interval": "",
|
||||
"legendFormat": "{{short_version}}",
|
||||
"range": false,
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Version",
|
||||
"type": "text"
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
|
||||
@@ -1521,7 +1521,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=203&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=203&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1650,12 +1650,12 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - RSS memory usage",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=189&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=189&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
},
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - Memory usage breakdown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=225&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=225&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1770,7 +1770,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1888,7 +1888,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5077,7 +5077,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=224&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=224&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5129,7 +5129,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
|
||||
"legendFormat": "__auto",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@@ -6107,7 +6107,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_storage&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_storage&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6257,7 +6257,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_storage&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_storage&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6908,7 +6908,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=200&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=200&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -7178,7 +7178,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=201&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=201&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -8042,7 +8042,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_select&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_select&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -8186,7 +8186,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_select&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_select&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -9375,7 +9375,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_insert&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=192&var-job=$job_insert&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -9519,7 +9519,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_insert&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz?viewPanel=190&var-job=$job_insert&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -11153,7 +11153,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
|
||||
"legendFormat": "{{instance}} ({{job}})",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
|
||||
@@ -1521,7 +1521,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=154&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=154&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1644,12 +1644,12 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - RSS memory usage",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=148&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=148&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
},
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - Memory usage breakdown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=141&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=141&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1764,7 +1764,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=151&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=151&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1882,7 +1882,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=149&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=149&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5122,7 +5122,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=140&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=140&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5174,7 +5174,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
|
||||
"legendFormat": "__auto",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@@ -5356,7 +5356,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=150&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=150&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5973,7 +5973,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=153&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=153&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6237,7 +6237,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk?viewPanel=152&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk?viewPanel=152&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -7667,7 +7667,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
|
||||
"legendFormat": "{{instance}} ({{job}})",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
|
||||
@@ -92,29 +92,72 @@
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "victoriametrics-metrics-datasource",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {},
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 3,
|
||||
"h": 4,
|
||||
"w": 4,
|
||||
"x": 0,
|
||||
"y": 1
|
||||
},
|
||||
"id": 24,
|
||||
"options": {
|
||||
"code": {
|
||||
"language": "plaintext",
|
||||
"showLineNumbers": false,
|
||||
"showMiniMap": false
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"percentChangeColorMode": "standard",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "/^short_version$/",
|
||||
"values": false
|
||||
},
|
||||
"content": "<div style=\"text-align: center;\">$version</div>",
|
||||
"mode": "markdown"
|
||||
"showPercentChange": false,
|
||||
"textMode": "value",
|
||||
"wideLayout": true
|
||||
},
|
||||
"pluginVersion": "12.3.0",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "victoriametrics-metrics-datasource",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": false,
|
||||
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"interval": "",
|
||||
"legendFormat": "{{short_version}}",
|
||||
"range": false,
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Version",
|
||||
"type": "text"
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
|
||||
@@ -1522,7 +1522,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=203&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=203&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1651,12 +1651,12 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - RSS memory usage",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=189&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=189&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
},
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - Memory usage breakdown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=225&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=225&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1771,7 +1771,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1889,7 +1889,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5078,7 +5078,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=224&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=224&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5130,7 +5130,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
|
||||
"legendFormat": "__auto",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@@ -6108,7 +6108,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_storage&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_storage&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6258,7 +6258,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_storage&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_storage&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6909,7 +6909,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=200&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=200&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -7179,7 +7179,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=201&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=201&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -8043,7 +8043,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_select&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_select&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -8187,7 +8187,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_select&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_select&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -9376,7 +9376,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_insert&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=192&var-job=$job_insert&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -9520,7 +9520,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_insert&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/oS7Bi_0Wz_vm?viewPanel=190&var-job=$job_insert&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -11154,7 +11154,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
|
||||
"legendFormat": "{{instance}} ({{job}})",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
|
||||
@@ -1522,7 +1522,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=154&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=154&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1645,12 +1645,12 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - RSS memory usage",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=148&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=148&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
},
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown - Memory usage breakdown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=141&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=141&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1765,7 +1765,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=151&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=151&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1883,7 +1883,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=149&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=149&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5123,7 +5123,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=140&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=140&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5175,7 +5175,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
|
||||
"legendFormat": "__auto",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@@ -5357,7 +5357,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=150&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=150&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -5974,7 +5974,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=153&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=153&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -6238,7 +6238,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=152&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/wNf0q_kZk_vm?viewPanel=152&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -7668,7 +7668,7 @@
|
||||
"uid": "${ds}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
|
||||
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
|
||||
"legendFormat": "{{instance}} ({{job}})",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
|
||||
@@ -964,7 +964,7 @@
|
||||
"links": [
|
||||
{
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=123&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=123&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1231,7 +1231,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=162&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=162&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1743,7 +1743,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=117&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=117&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1858,7 +1858,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=119&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=119&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -2332,7 +2332,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=121&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz_vm?viewPanel=121&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
|
||||
@@ -1612,7 +1612,7 @@
|
||||
"type": "victoriametrics-metrics-datasource",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"expr": "sum(go_memstats_sys_bytes{job=~\"$job\", instance=~\"$instance\"}) + sum(vm_cache_size_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"expr": "sum(go_memstats_sys_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
@@ -1624,7 +1624,7 @@
|
||||
"type": "victoriametrics-metrics-datasource",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"expr": "sum(go_memstats_heap_inuse_bytes{job=~\"$job\", instance=~\"$instance\"}) + sum(vm_cache_size_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"expr": "sum(go_memstats_heap_inuse_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
|
||||
@@ -963,7 +963,7 @@
|
||||
"links": [
|
||||
{
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=123&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=123&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1230,7 +1230,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=162&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=162&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1742,7 +1742,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=117&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=117&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -1857,7 +1857,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=119&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=119&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
@@ -2331,7 +2331,7 @@
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "Drilldown",
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=121&var-job=${__field.labels.job}&var-ds=$ds&var-instance=$instance&${__url_time_range}"
|
||||
"url": "/d/G7Z9GzMGz?viewPanel=121&var-job=${__field.labels.job}&var-ds=$ds&${__url_time_range}"
|
||||
}
|
||||
],
|
||||
"mappings": [],
|
||||
|
||||
@@ -1611,7 +1611,7 @@
|
||||
"type": "prometheus",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"expr": "sum(go_memstats_sys_bytes{job=~\"$job\", instance=~\"$instance\"}) + sum(vm_cache_size_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"expr": "sum(go_memstats_sys_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
@@ -1623,7 +1623,7 @@
|
||||
"type": "prometheus",
|
||||
"uid": "$ds"
|
||||
},
|
||||
"expr": "sum(go_memstats_heap_inuse_bytes{job=~\"$job\", instance=~\"$instance\"}) + sum(vm_cache_size_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"expr": "sum(go_memstats_heap_inuse_bytes{job=~\"$job\", instance=~\"$instance\"})",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
|
||||
@@ -100,6 +100,7 @@ publish-via-docker:
|
||||
) \
|
||||
-o type=image \
|
||||
--provenance=false \
|
||||
--sbom=true \
|
||||
-f app/$(APP_NAME)/multiarch/Dockerfile \
|
||||
--push \
|
||||
bin
|
||||
@@ -120,6 +121,7 @@ publish-via-docker:
|
||||
) \
|
||||
-o type=image \
|
||||
--provenance=false \
|
||||
--sbom=true \
|
||||
-f app/$(APP_NAME)/multiarch/Dockerfile \
|
||||
--push \
|
||||
bin
|
||||
|
||||
@@ -3,7 +3,7 @@ services:
|
||||
# It scrapes targets defined in --promscrape.config
|
||||
# And forward them to --remoteWrite.url
|
||||
vmagent:
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
depends_on:
|
||||
- "vmauth"
|
||||
ports:
|
||||
@@ -33,18 +33,19 @@ services:
|
||||
- ./../../dashboards/vmagent.json:/var/lib/grafana/dashboards/vmagent.json
|
||||
- ./../../dashboards/vmalert.json:/var/lib/grafana/dashboards/vmalert.json
|
||||
- ./../../dashboards/vmauth.json:/var/lib/grafana/dashboards/vmauth.json
|
||||
- ./../../dashboards/alert-statistics.json:/var/lib/grafana/dashboards/alert-statistics.json
|
||||
|
||||
# vmstorage shards. Each shard receives 1/N of all metrics sent to vminserts,
|
||||
# where N is number of vmstorages (2 in this case).
|
||||
vmstorage-1:
|
||||
image: victoriametrics/vmstorage:v1.136.0-cluster
|
||||
image: victoriametrics/vmstorage:v1.137.0-cluster
|
||||
volumes:
|
||||
- strgdata-1:/storage
|
||||
command:
|
||||
- "--storageDataPath=/storage"
|
||||
restart: always
|
||||
vmstorage-2:
|
||||
image: victoriametrics/vmstorage:v1.136.0-cluster
|
||||
image: victoriametrics/vmstorage:v1.137.0-cluster
|
||||
volumes:
|
||||
- strgdata-2:/storage
|
||||
command:
|
||||
@@ -54,7 +55,7 @@ services:
|
||||
# vminsert is ingestion frontend. It receives metrics pushed by vmagent,
|
||||
# pre-process them and distributes across configured vmstorage shards.
|
||||
vminsert-1:
|
||||
image: victoriametrics/vminsert:v1.136.0-cluster
|
||||
image: victoriametrics/vminsert:v1.137.0-cluster
|
||||
depends_on:
|
||||
- "vmstorage-1"
|
||||
- "vmstorage-2"
|
||||
@@ -63,7 +64,7 @@ services:
|
||||
- "--storageNode=vmstorage-2:8400"
|
||||
restart: always
|
||||
vminsert-2:
|
||||
image: victoriametrics/vminsert:v1.136.0-cluster
|
||||
image: victoriametrics/vminsert:v1.137.0-cluster
|
||||
depends_on:
|
||||
- "vmstorage-1"
|
||||
- "vmstorage-2"
|
||||
@@ -75,7 +76,7 @@ services:
|
||||
# vmselect is a query fronted. It serves read queries in MetricsQL or PromQL.
|
||||
# vmselect collects results from configured `--storageNode` shards.
|
||||
vmselect-1:
|
||||
image: victoriametrics/vmselect:v1.136.0-cluster
|
||||
image: victoriametrics/vmselect:v1.137.0-cluster
|
||||
depends_on:
|
||||
- "vmstorage-1"
|
||||
- "vmstorage-2"
|
||||
@@ -85,7 +86,7 @@ services:
|
||||
- "--vmalert.proxyURL=http://vmalert:8880"
|
||||
restart: always
|
||||
vmselect-2:
|
||||
image: victoriametrics/vmselect:v1.136.0-cluster
|
||||
image: victoriametrics/vmselect:v1.137.0-cluster
|
||||
depends_on:
|
||||
- "vmstorage-1"
|
||||
- "vmstorage-2"
|
||||
@@ -100,7 +101,7 @@ services:
|
||||
# read requests from Grafana, vmui, vmalert among vmselects.
|
||||
# It can be used as an authentication proxy.
|
||||
vmauth:
|
||||
image: victoriametrics/vmauth:v1.136.0
|
||||
image: victoriametrics/vmauth:v1.137.0
|
||||
depends_on:
|
||||
- "vmselect-1"
|
||||
- "vmselect-2"
|
||||
@@ -114,7 +115,7 @@ services:
|
||||
|
||||
# vmalert executes alerting and recording rules
|
||||
vmalert:
|
||||
image: victoriametrics/vmalert:v1.136.0
|
||||
image: victoriametrics/vmalert:v1.137.0
|
||||
depends_on:
|
||||
- "vmauth"
|
||||
ports:
|
||||
|
||||
@@ -3,7 +3,7 @@ services:
|
||||
# It scrapes targets defined in --promscrape.config
|
||||
# And forward them to --remoteWrite.url
|
||||
vmagent:
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
ports:
|
||||
@@ -18,7 +18,7 @@ services:
|
||||
# VictoriaMetrics instance, a single process responsible for
|
||||
# storing metrics and serve read requests.
|
||||
victoriametrics:
|
||||
image: victoriametrics/victoria-metrics:v1.136.0
|
||||
image: victoriametrics/victoria-metrics:v1.137.0
|
||||
ports:
|
||||
- 8428:8428
|
||||
- 8089:8089
|
||||
@@ -50,11 +50,12 @@ services:
|
||||
- ./../../dashboards/victoriametrics.json:/var/lib/grafana/dashboards/vm.json
|
||||
- ./../../dashboards/vmagent.json:/var/lib/grafana/dashboards/vmagent.json
|
||||
- ./../../dashboards/vmalert.json:/var/lib/grafana/dashboards/vmalert.json
|
||||
- ./../../dashboards/alert-statistics.json:/var/lib/grafana/dashboards/alert-statistics.json
|
||||
restart: always
|
||||
|
||||
# vmalert executes alerting and recording rules
|
||||
vmalert:
|
||||
image: victoriametrics/vmalert:v1.136.0
|
||||
image: victoriametrics/vmalert:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
- "alertmanager"
|
||||
|
||||
@@ -27,7 +27,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} will run out of disk space in 3 days"
|
||||
description: "Taking into account current ingestion rate, free disk space will be enough only
|
||||
for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.\n
|
||||
@@ -51,7 +51,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} will become read-only in 3 days"
|
||||
description: "Taking into account current ingestion rate, free disk space and -storage.minFreeDiskSpaceBytes
|
||||
instance {{ $labels.instance }} will remain writable for {{ $value | humanizeDuration }}.\n
|
||||
@@ -68,7 +68,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
|
||||
description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
|
||||
Having less than 20% of free disk space could cripple merges processes and overall performance.
|
||||
@@ -81,7 +81,7 @@ groups:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many errors served for {{ $labels.job }} path {{ $labels.path }} (instance {{ $labels.instance }})"
|
||||
description: "Requests to path {{ $labels.path }} are receiving errors.
|
||||
Please verify if clients are sending correct requests."
|
||||
@@ -100,7 +100,7 @@ groups:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many RPC errors for {{ $labels.job }} (instance {{ $labels.instance }})"
|
||||
description: "RPC errors are interconnection errors between cluster components.\n
|
||||
Possible reasons for errors are misconfiguration, overload, network blips or unreachable components."
|
||||
@@ -116,7 +116,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=102"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=102"
|
||||
summary: "Churn rate is more than 10% for the last 15m"
|
||||
description: "VM constantly creates new time series.\n
|
||||
This effect is known as Churn Rate.\n
|
||||
@@ -132,7 +132,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=102"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=102"
|
||||
summary: "Too high number of new series created over last 24h"
|
||||
description: "The number of created new time series over last 24h is 3x times higher than
|
||||
current number of active series.\n
|
||||
@@ -151,7 +151,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=108"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=108"
|
||||
summary: "Percentage of slow inserts is more than 5% for the last 15m"
|
||||
description: "High rate of slow inserts may be a sign of resource exhaustion
|
||||
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series.
|
||||
@@ -164,7 +164,7 @@ groups:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance }}"
|
||||
summary: "Connection between vminsert on {{ $labels.instance }} and vmstorage on {{ $labels.addr }} is saturated"
|
||||
description: "The connection between vminsert (instance {{ $labels.instance }}) and vmstorage (instance {{ $labels.addr }})
|
||||
is saturated by more than 90% and vminsert won't be able to keep up.\n
|
||||
|
||||
@@ -15,7 +15,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=49&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=49&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} is dropping data from persistent queue"
|
||||
description: "Vmagent dropped {{ $value | humanize1024 }} from persistent queue
|
||||
on instance {{ $labels.instance }} for the last 10m."
|
||||
@@ -26,7 +26,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance }}"
|
||||
summary: "Vmagent is dropping data blocks that are rejected by remote storage"
|
||||
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} drops the rejected by
|
||||
remote-write server data blocks. Check the logs to find the reason for rejects."
|
||||
@@ -37,7 +37,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=31&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=31&var-instance={{ $labels.instance }}"
|
||||
summary: "Vmagent fails to scrape one or more targets"
|
||||
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} fails to scrape targets for last 15m"
|
||||
|
||||
@@ -61,7 +61,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=77&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=77&var-instance={{ $labels.instance }}"
|
||||
summary: "Vmagent responds with too many errors on data ingestion protocols"
|
||||
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} responds with errors to write requests for last 15m."
|
||||
|
||||
@@ -71,7 +71,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=61&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=61&var-instance={{ $labels.instance }}"
|
||||
summary: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} fails to push to remote storage"
|
||||
description: "Vmagent fails to push data via remote write protocol to destination \"{{ $labels.url }}\"\n
|
||||
Ensure that destination is up and reachable."
|
||||
@@ -87,7 +87,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=84&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=84&var-instance={{ $labels.instance }}"
|
||||
summary: "Remote write connection from \"{{ $labels.job }}\" (instance {{ $labels.instance }}) to {{ $labels.url }} is saturated"
|
||||
description: "The remote write connection between vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }}) and destination \"{{ $labels.url }}\"
|
||||
is saturated by more than 90% and vmagent won't be able to keep up.\n
|
||||
@@ -101,7 +101,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance }}"
|
||||
summary: "Persistent queue writes for instance {{ $labels.instance }} are saturated"
|
||||
description: "Persistent queue writes for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
|
||||
are saturated by more than 90% and vmagent won't be able to keep up with flushing data on disk.
|
||||
@@ -113,7 +113,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance }}"
|
||||
summary: "Persistent queue reads for instance {{ $labels.instance }} are saturated"
|
||||
description: "Persistent queue reads for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
|
||||
are saturated by more than 90% and vmagent won't be able to keep up with reading data from the disk.
|
||||
@@ -124,7 +124,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=88&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=88&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} reached 90% of the limit"
|
||||
description: "Max series limit set via -remoteWrite.maxHourlySeries flag is close to reaching the max value.
|
||||
Then samples for new time series will be dropped instead of sending them to remote storage systems."
|
||||
@@ -134,7 +134,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=90&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=90&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} reached 90% of the limit"
|
||||
description: "Max series limit set via -remoteWrite.maxDailySeries flag is close to reaching the max value.
|
||||
Then samples for new time series will be dropped instead of sending them to remote storage systems."
|
||||
|
||||
@@ -23,7 +23,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=13&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=13&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
summary: "Alerting rules are failing for vmalert instance {{ $labels.instance }}"
|
||||
description: "Alerting rules execution is failing for \"{{ $labels.alertname }}\" from group \"{{ $labels.group }}\" in file \"{{ $labels.file }}\".
|
||||
Check vmalert's logs for detailed error message."
|
||||
@@ -34,7 +34,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=30&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=30&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
summary: "Recording rules are failing for vmalert instance {{ $labels.instance }}"
|
||||
description: "Recording rules execution is failing for \"{{ $labels.recording }}\" from group \"{{ $labels.group }}\" in file \"{{ $labels.file }}\".
|
||||
Check vmalert's logs for detailed error message."
|
||||
@@ -45,7 +45,7 @@ groups:
|
||||
labels:
|
||||
severity: info
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=33&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=33&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
|
||||
summary: "Recording rule {{ $labels.recording }} ({{ $labels.group }}) produces no data"
|
||||
description: "Recording rule \"{{ $labels.recording }}\" from group \"{{ $labels.group }}\ in file \"{{ $labels.file }}\"
|
||||
produces 0 samples over the last 30min. It might be caused by a misconfiguration
|
||||
|
||||
@@ -11,7 +11,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
summary: "vmauth ({{ $labels.instance }}) reached concurrent requests limit"
|
||||
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentRequests flag value,
|
||||
deploy additional vmauth replicas, check requests latency at backend service.
|
||||
@@ -22,7 +22,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
summary: "vmauth ({{ $labels.instance }}) has reached concurrent requests limit for username {{ $labels.username }}"
|
||||
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentPerUserRequests flag value,
|
||||
deploy additional vmauth replicas, check requests latency at backend service."
|
||||
@@ -32,7 +32,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
|
||||
summary: "vmauth ({{ $labels.instance }}) has reached concurrent requests limit for unauthorized user"
|
||||
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentPerUserRequests flag value,
|
||||
deploy additional vmauth replicas, check requests latency at backend service."
|
||||
@@ -42,7 +42,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many errors served for unauthorized user (instance {{ $labels.instance }})"
|
||||
description: "Requests from unauthorized user are receiving errors.
|
||||
Please check the vmauth logs to verify that the configuration is correct and clients are sending valid requests."
|
||||
@@ -52,7 +52,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many errors served for user {{ $labels.username }} (instance {{ $labels.instance }})"
|
||||
description: "Requests from user {{ $labels.username }} are receiving errors.
|
||||
Please check the vmauth logs to verify that the configuration is correct and clients are sending valid requests."
|
||||
|
||||
@@ -27,7 +27,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} will run out of disk space soon"
|
||||
description: "Taking into account current ingestion rate, free disk space will be enough only
|
||||
for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.\n
|
||||
@@ -51,7 +51,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} will become read-only in 3 days"
|
||||
description: "Taking into account current ingestion rate and free disk space
|
||||
instance {{ $labels.instance }} is writable for {{ $value | humanizeDuration }}.\n
|
||||
@@ -68,7 +68,7 @@ groups:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
|
||||
summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
|
||||
description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
|
||||
Having less than 20% of free disk space could cripple merge processes and overall performance.
|
||||
@@ -80,7 +80,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
|
||||
description: "Requests to path {{ $labels.path }} are receiving errors.
|
||||
Please verify if clients are sending correct requests."
|
||||
@@ -96,7 +96,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
|
||||
summary: "Churn rate is more than 10% on \"{{ $labels.instance }}\" for the last 15m"
|
||||
description: "VM constantly creates new time series on \"{{ $labels.instance }}\".\n
|
||||
This effect is known as Churn Rate.\n
|
||||
@@ -112,7 +112,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
|
||||
summary: "Too high number of new series on \"{{ $labels.instance }}\" created over last 24h"
|
||||
description: "The number of created new time series over last 24h is 3x times higher than
|
||||
current number of active series on \"{{ $labels.instance }}\".\n
|
||||
@@ -131,7 +131,7 @@ groups:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=68&var-instance={{ $labels.instance }}"
|
||||
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=68&var-instance={{ $labels.instance }}"
|
||||
summary: "Percentage of slow inserts is more than 5% on \"{{ $labels.instance }}\" for the last 15m"
|
||||
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
|
||||
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series.
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
services:
|
||||
vmagent:
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
ports:
|
||||
@@ -14,7 +14,7 @@ services:
|
||||
restart: always
|
||||
|
||||
victoriametrics:
|
||||
image: victoriametrics/victoria-metrics:v1.136.0
|
||||
image: victoriametrics/victoria-metrics:v1.137.0
|
||||
ports:
|
||||
- 8428:8428
|
||||
volumes:
|
||||
@@ -40,7 +40,7 @@ services:
|
||||
restart: always
|
||||
|
||||
vmalert:
|
||||
image: victoriametrics/vmalert:v1.136.0
|
||||
image: victoriametrics/vmalert:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
ports:
|
||||
|
||||
@@ -136,21 +136,21 @@ models:
|
||||
|
||||
Here's how default (backward-compatible) behavior looks like - anomalies will be tracked in `both` directions (`y > yhat` or `y < yhat`). This is useful when there is no domain expertise to filter the required direction.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
When set to `above_expected`, anomalies are tracked only when `y > yhat`.
|
||||
|
||||
*Example metrics*: Error rate, response time, page load time, number of failed transactions - metrics where *lower values are better*, so **higher** values are typically tracked.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
When set to `below_expected`, anomalies are tracked only when `y < yhat`.
|
||||
|
||||
*Example metrics*: Service Level Agreement (SLA) compliance, conversion rate, Customer Satisfaction Score (CSAT) - metrics where *higher values are better*, so **lower** values are typically tracked.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
Config with a split example:
|
||||
@@ -199,13 +199,13 @@ reader:
|
||||
|
||||
Visualizations below demonstrate this concept; the green zone defined as the `[yhat - min_dev_from_expected, yhat + min_dev_from_expected]` range excludes actual data points (`y`) from generating anomaly scores if they fall within that range.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
Example config of how to use this param based on query results:
|
||||
|
||||
|
||||
|
Before Width: | Height: | Size: 32 KiB After Width: | Height: | Size: 32 KiB |
|
Before Width: | Height: | Size: 29 KiB After Width: | Height: | Size: 29 KiB |
|
Before Width: | Height: | Size: 32 KiB After Width: | Height: | Size: 32 KiB |
|
Before Width: | Height: | Size: 30 KiB After Width: | Height: | Size: 30 KiB |
|
Before Width: | Height: | Size: 30 KiB After Width: | Height: | Size: 30 KiB |
|
Before Width: | Height: | Size: 32 KiB After Width: | Height: | Size: 32 KiB |
@@ -10,9 +10,9 @@ sitemap:
|
||||
|
||||
- To use *vmanomaly*, part of the enterprise package, a license key is required. Obtain your key [here](https://victoriametrics.com/products/enterprise/trial/) for this tutorial or for enterprise use.
|
||||
- In the tutorial, we'll be using the following VictoriaMetrics components:
|
||||
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.136.0)
|
||||
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.136.0)
|
||||
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.136.0)
|
||||
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.137.0)
|
||||
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.137.0)
|
||||
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.137.0)
|
||||
- [Grafana](https://grafana.com/) (v.10.2.1)
|
||||
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/)
|
||||
- [Node exporter](https://github.com/prometheus/node_exporter#node-exporter) (v1.7.0) and [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (v0.27.0)
|
||||
@@ -323,7 +323,7 @@ Let's wrap it all up together into the `docker-compose.yml` file.
|
||||
services:
|
||||
vmagent:
|
||||
container_name: vmagent
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
ports:
|
||||
@@ -340,7 +340,7 @@ services:
|
||||
|
||||
victoriametrics:
|
||||
container_name: victoriametrics
|
||||
image: victoriametrics/victoria-metrics:v1.136.0
|
||||
image: victoriametrics/victoria-metrics:v1.137.0
|
||||
ports:
|
||||
- 8428:8428
|
||||
volumes:
|
||||
@@ -373,7 +373,7 @@ services:
|
||||
|
||||
vmalert:
|
||||
container_name: vmalert
|
||||
image: victoriametrics/vmalert:v1.136.0
|
||||
image: victoriametrics/vmalert:v1.137.0
|
||||
depends_on:
|
||||
- "victoriametrics"
|
||||
ports:
|
||||
|
||||
@@ -7,7 +7,7 @@ sitemap:
|
||||
disable: true
|
||||
---
|
||||
|
||||
This guide walks you through deploying VictoriaMetrics and VictoriaLogs on Kubernetes, and collecting [metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) and [logs](https://docs.victoriametrics.com/victorialogs/data-ingestion/opentelemetry/) from a Go application either directly or via the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/).
|
||||
This guide walks you through deploying VictoriaMetrics and VictoriaLogs on Kubernetes, and collecting [metrics](https://docs.victoriametrics.com/victoriametrics/data-ingestion/opentelemetry-collector/) and [logs](https://docs.victoriametrics.com/victorialogs/data-ingestion/opentelemetry/) from a Go application either directly or via the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/).
|
||||
|
||||
## Pre-Requirements
|
||||
|
||||
@@ -316,4 +316,4 @@ using query `service.name: unknown_service:otel`.
|
||||
## Limitations
|
||||
|
||||
- VictoriaMetrics and VictoriaLogs do not support experimental JSON encoding [format](https://github.com/open-telemetry/opentelemetry-proto/blob/main/examples/metrics.json).
|
||||
- VictoriaMetrics supports only the `AggregationTemporalityCumulative` type for [histogram](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram) and [summary](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#summary-legacy). Either consider using cumulative temporality or use the [`delta-to-cumulative processor`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/deltatocumulativeprocessor) to convert to cumulative temporality in OpenTelemetry Collector.
|
||||
|
||||
|
||||
@@ -249,27 +249,27 @@ services:
|
||||
- grafana_data:/var/lib/grafana/
|
||||
|
||||
vmsingle:
|
||||
image: victoriametrics/victoria-metrics:v1.136.0
|
||||
image: victoriametrics/victoria-metrics:v1.137.0
|
||||
command:
|
||||
- -httpListenAddr=0.0.0.0:8429
|
||||
|
||||
vmstorage:
|
||||
image: victoriametrics/vmstorage:v1.136.0-cluster
|
||||
image: victoriametrics/vmstorage:v1.137.0-cluster
|
||||
|
||||
vminsert:
|
||||
image: victoriametrics/vminsert:v1.136.0-cluster
|
||||
image: victoriametrics/vminsert:v1.137.0-cluster
|
||||
command:
|
||||
- -storageNode=vmstorage:8400
|
||||
- -httpListenAddr=0.0.0.0:8480
|
||||
|
||||
vmselect:
|
||||
image: victoriametrics/vmselect:v1.136.0-cluster
|
||||
image: victoriametrics/vmselect:v1.137.0-cluster
|
||||
command:
|
||||
- -storageNode=vmstorage:8401
|
||||
- -httpListenAddr=0.0.0.0:8481
|
||||
|
||||
vmagent:
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
volumes:
|
||||
- ./scrape.yaml:/etc/vmagent/config.yaml
|
||||
command:
|
||||
@@ -278,7 +278,7 @@ services:
|
||||
- -remoteWrite.url=http://vmsingle:8429/api/v1/write
|
||||
|
||||
vmgateway-cluster:
|
||||
image: victoriametrics/vmgateway:v1.136.0-enterprise
|
||||
image: victoriametrics/vmgateway:v1.137.0-enterprise
|
||||
ports:
|
||||
- 8431:8431
|
||||
volumes:
|
||||
@@ -294,7 +294,7 @@ services:
|
||||
- -auth.oidcDiscoveryEndpoints=http://keycloak:8080/realms/master/.well-known/openid-configuration
|
||||
|
||||
vmgateway-single:
|
||||
image: victoriametrics/vmgateway:v1.136.0-enterprise
|
||||
image: victoriametrics/vmgateway:v1.137.0-enterprise
|
||||
ports:
|
||||
- 8432:8431
|
||||
volumes:
|
||||
@@ -405,7 +405,7 @@ Once iDP configuration is done, vmagent configuration needs to be updated to use
|
||||
|
||||
```yaml
|
||||
vmagent:
|
||||
image: victoriametrics/vmagent:v1.136.0
|
||||
image: victoriametrics/vmagent:v1.137.0
|
||||
volumes:
|
||||
- ./scrape.yaml:/etc/vmagent/config.yaml
|
||||
- ./vmagent-client-secret:/etc/vmagent/oauth2-client-secret
|
||||
|
||||
@@ -6,67 +6,62 @@ build:
|
||||
sitemap:
|
||||
disable: true
|
||||
---
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of a [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) in [Kubernetes](https://kubernetes.io/) via Helm charts
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com) tsdb
|
||||
This guide walks you through deploying a VictoriaMetrics cluster version on Kubernetes.
|
||||
|
||||
**Precondition**
|
||||
By the end of this guide, you will know:
|
||||
|
||||
- How to install and configure [VictoriaMetrics cluster version](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) using Helm.
|
||||
- How to scrape metrics from Kubernetes components using service discovery.
|
||||
- How to store metrics in [VictoriaMetrics](https://victoriametrics.com) time-series database.
|
||||
- How to visualize metrics in Grafana
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.31.1-gke.1678000](https://cloud.google.com/kubernetes-engine)
|
||||
> We use GKE cluster from [GCP](https://cloud.google.com/) but this guide is also applied on any Kubernetes cluster. For example [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3.14+](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.31](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||

|
||||
- [Kubernetes cluster 1.34+](https://cloud.google.com/kubernetes-engine)
|
||||
- [Helm 4.1+](https://helm.sh/docs/intro/install)
|
||||
- [kubectl 1.34+](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
> We use a GKE cluster from [GCP](https://cloud.google.com/), but this guide can also be applied to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/) or an on-premises cluster.
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
You need to add the VictoriaMetrics Helm repository to install VictoriaMetrics components. We’re going to use [VictoriaMetrics Cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/). You can do this by running the following command:
|
||||
To start, add the VictoriaMetrics Helm repository with the following commands:
|
||||
|
||||
```shell
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
Update Helm repositories:
|
||||
|
||||
```shell
|
||||
helm repo update
|
||||
```
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
To verify that everything is set up correctly, you may run this command:
|
||||
|
||||
```shell
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
You should see a list similar to this:
|
||||
|
||||
```text
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-logs-single 0.9.3 v1.16.0 Victoria Logs Single version - high-performance...
|
||||
vm/victoria-metrics-agent 0.17.2 v1.113.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.15.0 v1.113.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-anomaly 1.9.0 v1.21.0 Victoria Metrics Anomaly Detection - a service ...
|
||||
vm/victoria-metrics-auth 0.10.0 v1.113.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.19.2 v1.113.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-common 0.0.42 Victoria Metrics Common - contains shared templ...
|
||||
vm/victoria-metrics-distributed 0.9.0 v1.113.0 A Helm chart for Running VMCluster on Multiple ...
|
||||
vm/victoria-metrics-gateway 0.8.0 v1.113.0 Victoria Metrics Gateway - Auth & Rate-Limittin...
|
||||
vm/victoria-metrics-k8s-stack 0.39.0 v1.113.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.43.0 v0.54.1 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.15.1 v1.113.0 Victoria Metrics Single version - high-performa...
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-cluster 0.34.0 v1.135.0 VictoriaMetrics Cluster version - high-performa...
|
||||
vm/victoria-metrics-agent 0.31.0 v1.135.0 VictoriaMetrics Agent - collects metrics from v...
|
||||
...(the list continues)...
|
||||
```
|
||||
|
||||
## 2. Install VictoriaMetrics Cluster from the Helm chart
|
||||
|
||||
Run this command in your terminal:
|
||||
A [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) consists of three services:
|
||||
|
||||
- `vminsert`: receives incoming metrics and distributes them across `vmstorage` nodes via consistent hashing on metric names and labels.
|
||||
- `vmstorage`: stores raw data and serves queries filtered by time range and labels.
|
||||
- `vmselect`: executes queries by fetching data across all configured `vmstorage` nodes.
|
||||
|
||||

|
||||
|
||||
To get started, create a config file for the VictoriaMetrics Helm chart:
|
||||
|
||||
```sh
|
||||
cat <<EOF | helm install vmcluster vm/victoria-metrics-cluster -f -
|
||||
cat <<EOF >victoria-metrics-cluster-values.yml
|
||||
vmselect:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
@@ -84,19 +79,26 @@ vmstorage:
|
||||
EOF
|
||||
```
|
||||
|
||||
* By running `Helm install vmcluster vm/victoria-metrics-cluster` we install [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) to default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) inside your cluster.
|
||||
* By adding `podAnnotations: prometheus.io/scrape: "true"` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods.
|
||||
* By adding `podAnnotations:prometheus.io/port: "some_port" ` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods from their ports as well.
|
||||
The config file defines two settings for the VictoriaMetrics services:
|
||||
|
||||
- `podAnnotations: prometheus.io/scrape: "true"` enables automatic service discovery and metric scraping from the VictoriaMetrics pods.
|
||||
- `podAnnotations:prometheus.io/port: "<port-number>"` defines which port numbers to target for scraping metrics from the VictoriaMetrics pods.
|
||||
|
||||
As a result of this command you will see the following output:
|
||||
Next, install VictoriaMetrics cluster version with the following command:
|
||||
|
||||
```sh
|
||||
helm install vmcluster vm/victoria-metrics-cluster -f victoria-metrics-cluster-values.yml
|
||||
```
|
||||
|
||||
The expected output should look like this:
|
||||
|
||||
```text
|
||||
NAME: vmcluster
|
||||
LAST DEPLOYED: Fri Mar 21 11:55:50 2025
|
||||
LAST DEPLOYED: Wed Feb 4 12:00:55 2026
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
DESCRIPTION: Install complete
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
Write API:
|
||||
@@ -141,16 +143,29 @@ for example - inside the Kubernetes cluster:
|
||||
http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local.:8481/select/0/prometheus/
|
||||
```
|
||||
|
||||
For us it’s important to remember the url for the datasource (copy lines from the output).
|
||||
Note the following endpoint URLs:
|
||||
|
||||
- The `remote_write` URL will be required on [Step 3](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/#id-3-install-vmagent-from-the-helm-chart) to configure where the `vmagent` service sends telemetry data.
|
||||
|
||||
```text
|
||||
remote_write:
|
||||
- url: http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local.:8480/insert/0/prometheus/
|
||||
```
|
||||
|
||||
- The `VictoriaMetrics read api` will be required on [Step 4](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/#id-4-install-and-connect-grafana-to-victoriametrics-with-helm) to configure the Grafana datasource.
|
||||
|
||||
```text
|
||||
The VictoriaMetrics read api can be accessed via port 8481 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local.
|
||||
```
|
||||
|
||||
Verify that [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) pods are up and running by executing the following command:
|
||||
|
||||
|
||||
```sh
|
||||
kubectl get pods
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
You should see a list of pods similar to this:
|
||||
|
||||
```text
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
@@ -164,266 +179,75 @@ vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running
|
||||
|
||||
## 3. Install vmagent from the Helm chart
|
||||
|
||||
To scrape metrics from Kubernetes with a [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) we need to install [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) with additional configuration. To do so, please run these commands in your terminal:
|
||||
In order to collect metrics from the Kubernetes cluster, we need to install [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/). This service scrapes, relabels, and sends metrics to the `vminsert` service running in the cluster.
|
||||
|
||||
Run the following command to install the `vmagent` service in your cluster:
|
||||
|
||||
```shell
|
||||
helm install vmagent vm/victoria-metrics-agent -f https://docs.victoriametrics.com/guides/examples/guide-vmcluster-vmagent-values.yaml
|
||||
```
|
||||
|
||||
Here is full file content `guide-vmcluster-vmagent-values.yaml`
|
||||
Here are the key settings in the chart values file `guide-vmcluster-vmagent-values.yaml`:
|
||||
|
||||
```yaml
|
||||
remoteWrite:
|
||||
- url: http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
- `remoteWrite` defines the `vminsert` endpoint that receives telemetry from [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/). This value should match exactly the URL for the `remote_write` in the output of [Step 2](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/#id-2-install-victoriametrics-cluster-from-the-helm-chart).
|
||||
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 10s
|
||||
```yaml
|
||||
remoteWrite:
|
||||
- url: http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
```
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
metrics_path: /metrics/cadvisor
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- source_labels: [__metrics_path__]
|
||||
target_label: metrics_path
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
- job_name: "kubernetes-service-endpoints"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-service-endpoints-slow"
|
||||
scrape_interval: 5m
|
||||
scrape_timeout: 30s
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-services"
|
||||
metrics_path: /probe
|
||||
params:
|
||||
module: [http_2xx]
|
||||
kubernetes_sd_configs:
|
||||
- role: service
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_probe]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
- target_label: __address__
|
||||
replacement: blackbox
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
- job_name: "kubernetes-pods"
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
target_label: __address__
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_pod_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
action: replace
|
||||
target_label: kubernetes_pod_name
|
||||
```
|
||||
|
||||
* By updating `remoteWrite` we're configuring [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) to write scraped metrics into the `vminsert` service.
|
||||
* The second part of this yaml file is needed to add the `metric_relabel_configs` section that helps us to show Kubernetes metrics on the Grafana dashboard.
|
||||
- `metric_relabel_configs` defines label-rewriting rules that help us show Kubernetes metrics in the Grafana dashboard later on.
|
||||
|
||||
```yaml
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
```
|
||||
|
||||
Verify that `vmagent`'s pod is up and running by executing the following command:
|
||||
|
||||
|
||||
```shell
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
Check that the pod is in `Running` state:
|
||||
|
||||
```text
|
||||
vmagent-victoria-metrics-agent-69974b95b4-mhjph 1/1 Running 0 11m
|
||||
```
|
||||
|
||||
|
||||
## 4. Install and connect Grafana to VictoriaMetrics with Helm
|
||||
|
||||
Add the Grafana Helm repository.
|
||||
|
||||
Add the Grafana Community Helm repository:
|
||||
|
||||
```shell
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo add grafana-community https://grafana-community.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
See more information on Grafana ArtifactHUB [https://artifacthub.io/packages/helm/grafana/grafana](https://artifacthub.io/packages/helm/grafana/grafana)
|
||||
|
||||
To install the chart with the release name `my-grafana`, add the VictoriaMetrics datasource with official dashboard and the Kubernetes dashboard:
|
||||
> [!NOTE] Tip
|
||||
> See more information on Grafana in [ArtifactHUB](https://artifacthub.io/packages/helm/grafana-community/grafana)
|
||||
|
||||
Create a values config file to define the data sources and dashboards for VictoriaMetrics in the Grafana service:
|
||||
|
||||
```sh
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
cat <<EOF > grafana-cluster-values.yml
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
@@ -454,60 +278,111 @@ cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 11176
|
||||
revision: 18
|
||||
datasource: victoriametrics
|
||||
vmagent:
|
||||
gnetId: 12683
|
||||
revision: 7
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from the Helm repository.
|
||||
* Provision a VictoriaMetrics data source with the url from the output above which we remembered.
|
||||
* Add [this dashboard](https://grafana.com/grafana/dashboards/11176) for [VictoriaMetrics Cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/).
|
||||
* Add [this dashboard](https://grafana.com/grafana/dashboards/12683) for [VictoriaMetrics Agent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
|
||||
* Add [this dashboard](https://grafana.com/grafana/dashboards/14205) to see Kubernetes cluster metrics.
|
||||
The config file defines the following settings for Grafana:
|
||||
|
||||
- Provides a VictoriaMetrics data source. This value must match the `VictoriaMetrics read api` endpoint and port obtained in [Step 2](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/#id-2-install-victoriametrics-cluster-from-the-helm-chart) during the VictoriaMetrics cluster installation.
|
||||
- Adds three starter dashboards:
|
||||
- [VictoriaMetrics - cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/) for the [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/).
|
||||
- [VictoriaMetrics - vmagent](https://grafana.com/grafana/dashboards/12683-victoriametrics-vmagent/) for the [VictoriaMetrics agent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
|
||||
- [Kubernetes cluster Monitoring (via Prometheus)](https://grafana.com/grafana/dashboards/14205-kubernetes-cluster-monitoring-via-prometheus/) to show Kubernetes cluster metrics.
|
||||
|
||||
Run the following command to install the Grafana chart with the name `my-grafana`:
|
||||
|
||||
```sh
|
||||
helm install my-grafana grafana-community/grafana -f grafana-cluster-values.yml
|
||||
```
|
||||
|
||||
You should get the following output:
|
||||
|
||||
```text
|
||||
NAME: my-grafana
|
||||
LAST DEPLOYED: Wed Feb 4 15:00:28 2026
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
DESCRIPTION: Install complete
|
||||
NOTES:
|
||||
1. Get your 'admin' user password by running:
|
||||
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
|
||||
|
||||
Please see the output log in your terminal. Copy, paste and run these commands.
|
||||
The first one will show `admin` password for the Grafana admin.
|
||||
The second and the third will forward Grafana to `127.0.0.1:3000`:
|
||||
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
|
||||
|
||||
my-grafana.default.svc.cluster.local
|
||||
|
||||
Get the Grafana URL to visit by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
|
||||
3. Login with the password from step 1 and the username: admin
|
||||
#################################################################################
|
||||
###### WARNING: Persistence is disabled!!! You will lose your data when #####
|
||||
###### the Grafana pod is terminated. #####
|
||||
#################################################################################
|
||||
```
|
||||
|
||||
Use the first command in the output to obtain the password for the `admin` user:
|
||||
|
||||
```shell
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
```
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
The second part of the output shows how to port-forward the Grafana service in order to access it locally on `127.0.0.1:3000`:
|
||||
|
||||
```shell
|
||||
export pod_name=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $pod_name 3000
|
||||
```
|
||||
|
||||
## 5. Check the result you obtained in your browser
|
||||
|
||||
To check that [VictoriaMetrics](https://victoriametrics.com) collects metrics from k8s cluster open in browser [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) and choose the `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` for login and `password` that you previously got from kubectl.
|
||||
To check that [VictoriaMetrics](https://victoriametrics.com) collects metrics from the Kubernetes cluster, open in your browser `http://127.0.0.1:3000/dashboards`. Use `admin` for login and `password` obtained in the previous step.
|
||||
|
||||
You should see three dashboards installed. Select "Kubernetes Cluster Monitoring".
|
||||
|
||||

|
||||
<figcaption style="text-align: center; font-style: italic;">List of pre-installed dashboards in Grafana</figcaption>
|
||||
|
||||
You will see something like this:
|
||||
This is the main dashboard, which shows activity across your Kubernetes cluster:
|
||||
|
||||

|
||||

|
||||
<figcaption style="text-align: center; font-style: italic;">Grafana dashboard for Kubernetes metrics</figcaption>
|
||||
|
||||
The VictoriaMetrics dashboard is also available to use:
|
||||
The VictoriaMetrics cluster dashboard is also available to monitor telemetry ingestion and resource utilization:
|
||||
|
||||

|
||||

|
||||
<figcaption style="text-align: center; font-style: italic;">Grafana dashboard for VictoriaMetrics services</figcaption>
|
||||
|
||||
vmagent has its own dashboard:
|
||||
And vmagent has a separate dashboard to monitor scraping and queue activity:
|
||||
|
||||

|
||||

|
||||
<figcaption style="text-align: center; font-style: italic;">Grafana dashboard for vmagent ingestion and resource usage</figcaption>
|
||||
|
||||
## 6. Final thoughts
|
||||
|
||||
* We set up TimeSeries Database for your Kubernetes cluster.
|
||||
* We collected metrics from all running pods,nodes, … and stored them in a VictoriaMetrics database.
|
||||
* We visualized resources used in the Kubernetes cluster by using Grafana dashboards.
|
||||
- We set up a TimeSeries Database for your Kubernetes cluster.
|
||||
- We collected metrics from all running pods, nodes, and services and stored them in a VictoriaMetrics database.
|
||||
- We visualized resources used in the Kubernetes cluster by using Grafana dashboards.
|
||||
|
||||
Consider reading these resources to complete your setup:
|
||||
|
||||
- VictoriaMetrics
|
||||
- [Learn more about the cluster version](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/)
|
||||
- [Migrate existing metric data into VictoriaMetrics with vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/)
|
||||
- [Setup alerts](https://docs.victoriametrics.com/victoriametrics/vmalert/)
|
||||
- Grafana
|
||||
- [Enable persistent storage](https://grafana.com/docs/grafana/latest/setup-grafana/installation/helm/#enable-persistent-storage-recommended)
|
||||
- [Configure private TLS authority](https://grafana.com/docs/grafana/latest/setup-grafana/installation/helm/#configure-a-private-ca-certificate-authority)
|
||||
|
||||
|
Before Width: | Height: | Size: 60 KiB After Width: | Height: | Size: 516 KiB |
|
Before Width: | Height: | Size: 18 KiB After Width: | Height: | Size: 244 KiB |
|
After Width: | Height: | Size: 506 KiB |
|
After Width: | Height: | Size: 434 KiB |
@@ -24,7 +24,7 @@ Resources:
|
||||
VictoriaMetrics single-node, vmagent and vminsert components support ingestion of metrics via OpenTelemetry Protocol (OTLP)
|
||||
from [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) and applications instrumented with [OpenTelemetry SDKs](https://opentelemetry.io/docs/languages/).
|
||||
|
||||
See the detailed description about protocol support [here](https://docs.victoriametrics.com/victoriametrics/#sending-data-via-opentelemetry).
|
||||
See the detailed description about protocol support [here](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
|
||||
|
||||
> See a practical guide [How to use OpenTelemetry metrics with VictoriaMetrics](https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/).
|
||||
|
||||
|
||||
@@ -21,9 +21,9 @@ It is recommended to run the latest available release of VictoriaMetrics from [t
|
||||
There is no need to tune VictoriaMetrics because it uses reasonable defaults for command-line flags. These flags are automatically adjusted for the available CPU and RAM resources. There is no need in Operating System tuning because VictoriaMetrics is optimized for default OS settings. The only option is to increase the limit on the [number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a), so VictoriaMetrics could accept more incoming connections and could keep open more data files.
|
||||
|
||||
## Swap
|
||||
For machines running [vmstorage](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#storage) or [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), it is recommended to disable swap.
|
||||
These components rely on available RAM for high performance operations.
|
||||
If swap is enabled, the operating system may move active data from fast RAM to the much slower disk as memory usage approaches system limits or configured thresholds.
|
||||
|
||||
It is recommended to disable swap for machines running [vmstorage](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#storage) or [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/).
|
||||
If swap is enabled, the operating system may move actively used data from fast RAM to much slower disk storage when memory usage approaches system limits or configured thresholds.
|
||||
This leads to performance degradation and latency spikes. On systemd-based Linux distributions run:
|
||||
|
||||
```sh
|
||||
@@ -34,7 +34,8 @@ systemctl mask swap.target
|
||||
Reboot the host after applying the commands.
|
||||
|
||||
If you're unsure whether swap-related issues are occurring, check the `Troubleshooting – Major page faults`
|
||||
and `Resource usage – Memory pressure` panels in official Grafana dashboards.
|
||||
and `Resource usage – Memory pressure` panels on [official Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards) for VictoriaMetrics.
|
||||
See how to [monitor VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/#monitoring).
|
||||
|
||||
## Filesystem
|
||||
|
||||
|
||||
@@ -595,7 +595,7 @@ Check practical examples of [VictoriaMetrics API](https://docs.victoriametrics.c
|
||||
- `prometheus/api/v1/import/native` - for importing data obtained via `api/v1/export/native` on `vmselect` (see below).
|
||||
- `prometheus/api/v1/import/csv` - for importing arbitrary CSV data. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-csv-data) for details.
|
||||
- `prometheus/api/v1/import/prometheus` - for importing data in [Prometheus text exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) and in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md). This endpoint also supports [Pushgateway protocol](https://github.com/prometheus/pushgateway#url). See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-data-in-prometheus-exposition-format) for details.
|
||||
- `opentelemetry/v1/metrics` - for ingesting data via [OpenTelemetry protocol for metrics](https://github.com/open-telemetry/opentelemetry-specification/blob/ffddc289462dfe0c2041e3ca42a7b1df805706de/specification/metrics/data-model.md). See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry).
|
||||
- `opentelemetry/v1/metrics` - for ingesting data via [OpenTelemetry protocol for metrics](https://github.com/open-telemetry/opentelemetry-specification/blob/97c826b70e2f89cfdf655d5150791f3f0c2bae19/specification/metrics/data-model.md). See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
|
||||
- `datadog/api/v1/series` - for ingesting data with DataDog submit metrics API v1. See [these docs](https://docs.victoriametrics.com/victoriametrics/url-examples/#datadogapiv1series) for details.
|
||||
- `datadog/api/v2/series` - for ingesting data with [DataDog submit metrics API](https://docs.datadoghq.com/api/latest/metrics/#submit-metrics). See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/) for details.
|
||||
- `datadog/api/beta/sketches` - for ingesting data with [DataDog lambda extension](https://docs.datadoghq.com/serverless/libraries_integrations/extension/).
|
||||
@@ -1015,7 +1015,7 @@ to ensure query results consistency, even if storage layer didn't complete dedup
|
||||
## Metrics Metadata
|
||||
|
||||
Cluster version of VictoriaMetrics can store metric metadata (TYPE, HELP, UNIT) {{% available_from "v1.130.0" %}}.
|
||||
Metadata ingestion is enabled by default{{% available_from "#" %}}. To disable it, set `-enableMetadata=false` on `vminsert`, and `vmagent`.
|
||||
Metadata ingestion is enabled by default{{% available_from "v1.137.0" %}}. To disable it, set `-enableMetadata=false` on `vminsert`, and `vmagent`.
|
||||
|
||||
The metadata is cached in-memory in a ring buffer and can use up to 1% of available memory by default (see `-storage.maxMetadataStorageSize` cmd-line flag).
|
||||
When in-memory size is exceeded, the least updated entries are dropped first. Entries that weren't updated for 1h are cleaned up automatically.
|
||||
|
||||
@@ -557,8 +557,8 @@ and proportionally to the total length of all the labels seen across all the reg
|
||||
|
||||
Typical monitoring in Kubernetes generates moderate-to-high churn rate for time series because every restart of the `pod` creates a new set of time series
|
||||
for all the [metrics](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#what-is-a-metric) exposed by that pod, with a new `pod` label.
|
||||
The number of labels and the summary length of `label=value` pairs per every time series in Kubernetes is quite large
|
||||
(~30-40 labels with ~1KB summary length of `label=value` pairs per time series). This contributes to quick growth of the `indexdb` over time,
|
||||
The number of labels and the total length of `label=value` pairs per every time series in Kubernetes is quite large
|
||||
(~30-40 labels with ~1KB total length of `label=value` pairs per time series). This contributes to quick growth of the `indexdb` over time,
|
||||
so its' size may exceed the size of the `data` folder by up to 2x in typical production cases.
|
||||
|
||||
There are the following workarounds, which can reduce the growth rate of the `indexdb`:
|
||||
|
||||
@@ -1227,7 +1227,10 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
|
||||
#### buckets_limit
|
||||
|
||||
`buckets_limit(limit, buckets)` is a [transform function](#transform-functions), which limits the number
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
|
||||
|
||||
The result will preserve the first and the last bucket to improve accuracy for min and max values.
|
||||
So, if the `limit` is greater than 0 and less than 3, the function will still return 3 buckets: the first bucket, the last bucket, and a selected bucket.
|
||||
|
||||
See also [prometheus_buckets](#prometheus_buckets) and [histogram_quantile](#histogram_quantile).
|
||||
|
||||
@@ -1381,6 +1384,15 @@ It can be used for calculating the average over the given time range across mult
|
||||
For example, `histogram_avg(sum(histogram_over_time(response_time_duration_seconds[5m])) by (vmrange,job))` would return the average response time
|
||||
per each `job` over the last 5 minutes.
|
||||
|
||||
#### histogram_fraction
|
||||
|
||||
`histogram_fraction(lowerLe, upperLe, buckets)` is a [transform function](#transform-functions), which calculates the share (in the range `[0...1]`) for `buckets` that fall between `lowerLe` and `upperLe`.
|
||||
The result of `histogram_fraction(lowerLe, upperLe, buckets)` is equivalent to `histogram_share(upperLe, buckets) - histogram_share(lowerLe, buckets)`.
|
||||
|
||||
This function is supported by PromQL.
|
||||
|
||||
See also [histogram_share](#histogram_share).
|
||||
|
||||
#### histogram_quantile
|
||||
|
||||
`histogram_quantile(phi, buckets)` is a [transform function](#transform-functions), which calculates `phi`-[percentile](https://en.wikipedia.org/wiki/Percentile)
|
||||
|
||||
@@ -58,9 +58,9 @@ Download the newest available [VictoriaMetrics release](https://docs.victoriamet
|
||||
from [DockerHub](https://hub.docker.com/r/victoriametrics/victoria-metrics) or [Quay](https://quay.io/repository/victoriametrics/victoria-metrics?tab=tags):
|
||||
|
||||
```sh
|
||||
docker pull victoriametrics/victoria-metrics:v1.136.0
|
||||
docker pull victoriametrics/victoria-metrics:v1.137.0
|
||||
docker run -it --rm -v `pwd`/victoria-metrics-data:/victoria-metrics-data -p 8428:8428 \
|
||||
victoriametrics/victoria-metrics:v1.136.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
|
||||
victoriametrics/victoria-metrics:v1.137.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
|
||||
```
|
||||
|
||||
_For Enterprise images see [this link](https://docs.victoriametrics.com/victoriametrics/enterprise/#docker-images)._
|
||||
|
||||
@@ -75,7 +75,7 @@ VictoriaMetrics has the following prominent features:
|
||||
* [Native binary format](#how-to-import-data-in-native-format).
|
||||
* [DataDog agent or DogStatsD](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/).
|
||||
* [NewRelic infrastructure agent](https://docs.victoriametrics.com/victoriametrics/integrations/newrelic/#sending-data-from-agent).
|
||||
* [OpenTelemetry metrics format](#sending-data-via-opentelemetry).
|
||||
* [OpenTelemetry metrics format](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
|
||||
* [Zabbix Connector streaming format](https://docs.victoriametrics.com/victoriametrics/integrations/zabbixconnector/#send-data-from-zabbix-connector).
|
||||
* It supports powerful [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/), which can be used as a [statsd](https://github.com/statsd/statsd) alternative.
|
||||
* It supports metrics [relabeling](#relabeling).
|
||||
@@ -708,7 +708,7 @@ Using the delete API is not recommended in the following cases, since it brings
|
||||
time series occupy disk space until the next merge operation, which can never occur when deleting too old data.
|
||||
[Forced merge](#forced-merge) may be used for freeing up disk space occupied by old data.
|
||||
Note that VictoriaMetrics doesn't delete entries from [IndexDB](#indexdb) for the deleted time series.
|
||||
IndexDB is cleaned up once per the configured [retention](#retention).
|
||||
IndexDB is cleaned up along with the corresponding data partition once it becomes outside the [-retentionPeriod](#retention).
|
||||
|
||||
It's better to use the `-retentionPeriod` command-line flag for efficient pruning of old data.
|
||||
|
||||
@@ -852,7 +852,7 @@ Additionally, VictoriaMetrics can accept metrics via the following popular data
|
||||
* DataDog `submit metrics` API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/) for details.
|
||||
* InfluxDB line protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/#influxdb-compatible-agents-such-as-telegraf) for details.
|
||||
* Graphite plaintext protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#ingesting) for details.
|
||||
* OpenTelemetry http API. See [these docs](#sending-data-via-opentelemetry) for details.
|
||||
* OpenTelemetry http API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) for details.
|
||||
* OpenTSDB telnet put protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentsdb/#sending-data-via-telnet) for details.
|
||||
* OpenTSDB http `/api/put` protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentsdb/#sending-data-via-http) for details.
|
||||
* `/api/v1/import` for importing data obtained from [/api/v1/export](#how-to-export-data-in-json-line-format).
|
||||
@@ -863,7 +863,7 @@ Additionally, VictoriaMetrics can accept metrics via the following popular data
|
||||
* `/api/v1/import/prometheus` for importing data in Prometheus exposition format and in [Pushgateway format](https://github.com/prometheus/pushgateway#url).
|
||||
See [these docs](#how-to-import-data-in-prometheus-exposition-format) for details.
|
||||
|
||||
Please note, most of the ingestion APIs (except [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), [OpenTelemetry](#sending-data-via-opentelemetry) and [Influx Line Protocol](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/#influxdb-compatible-agents-such-as-telegraf))
|
||||
Please note, most of the ingestion APIs (except [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) and [Influx Line Protocol](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/#influxdb-compatible-agents-such-as-telegraf))
|
||||
are optimized for performance and processes data in a streaming fashion.
|
||||
It means that client can transfer unlimited amount of data through the open connection. Because of this, import APIs
|
||||
may not return parsing errors to the client, as it is expected for data stream to be not interrupted.
|
||||
@@ -1031,53 +1031,6 @@ Note that it could be required to flush response cache after importing historica
|
||||
|
||||
VictoriaMetrics also may scrape Prometheus targets - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
|
||||
|
||||
### Sending data via OpenTelemetry
|
||||
|
||||
VictoriaMetrics supports data ingestion via [OpenTelemetry protocol (OTLP) for metrics](https://github.com/open-telemetry/opentelemetry-specification/blob/ffddc289462dfe0c2041e3ca42a7b1df805706de/specification/metrics/data-model.md) at `/opentelemetry/v1/metrics` path.
|
||||
It expects `protobuf`-encoded requests at `/opentelemetry/v1/metrics`. For gzip-compressed workload set HTTP request header `Content-Encoding: gzip`.
|
||||
|
||||
Use the following OpenTelemetry collector exporter configuration to push metrics to VictoriaMetrics:
|
||||
|
||||
```yaml
|
||||
exporters:
|
||||
otlphttp/victoriametrics:
|
||||
compression: gzip
|
||||
encoding: proto
|
||||
endpoint: http://<collector/vmagent>.<namespace>.svc.cluster.local:<port>/opentelemetry
|
||||
```
|
||||
|
||||
> Note, [cluster version of VM](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format) expects specifying tenant ID, i.e. `http://<vminsert>:<port>/insert/<accountID>/opentelemetry`.
|
||||
> See more about [multitenancy](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy).
|
||||
|
||||
Remember to add the exporter to the desired service pipeline to activate the exporter.
|
||||
|
||||
```yaml
|
||||
service:
|
||||
pipelines:
|
||||
metrics:
|
||||
exporters:
|
||||
- otlphttp/victoriametrics
|
||||
receivers:
|
||||
- otlp
|
||||
```
|
||||
|
||||
By default, VictoriaMetrics stores the ingested OpenTelemetry [metric samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples) as is **without any transformations**.
|
||||
The following label transformations can be enabled:
|
||||
* `--usePromCompatibleNaming` - replaces characters unsupported by Prometheus with `_` in metric names and labels **for all ingestion protocols**.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time{service_name="foo"}`.
|
||||
* `--opentelemetry.usePrometheusNaming` - converts metric names and labels according to [OTLP Metric points to Prometheus specification](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.33.0/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus) for metrics ingested via OTLP.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time_seconds_total{service_name="foo"}`.
|
||||
* `-opentelemetry.convertMetricNamesToPrometheus` - converts **only metric names** according to [OTLP Metric points to Prometheus specification](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.33.0/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus) for metrics ingested via OTLP.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time_seconds_total{service.name="foo"}`. See more about this use case [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830).
|
||||
|
||||
> These flags can applied on vmagent, vminsert or VictoriaMetrics single-node.
|
||||
|
||||
OpenTelemetry [exponential histogram](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram) is automatically converted
|
||||
to [VictoriaMetrics histogram format](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350).
|
||||
|
||||
See [How to use OpenTelemetry metrics with VictoriaMetrics](https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/).
|
||||
See more about [OpenTelemetry in VictoriaMetrics](https://docs.victoriametrics.com/opentelemetry/).
|
||||
|
||||
## JSON line format
|
||||
|
||||
VictoriaMetrics accepts data in JSON line format at [/api/v1/import](#how-to-import-data-in-json-line-format)
|
||||
@@ -1378,7 +1331,7 @@ see [these docs](https://docs.victoriametrics.com/victoriametrics/stream-aggrega
|
||||
## Metrics Metadata
|
||||
|
||||
Single-node VictoriaMetrics can store metric metadata (`TYPE`, `HELP`, `UNIT`) {{% available_from "v1.130.0" %}}.
|
||||
Metadata ingestion and querying are enabled by default{{% available_from "#" %}}. To disable them, set `-enableMetadata=false`.
|
||||
Metadata ingestion and querying are enabled by default{{% available_from "v1.137.0" %}}. To disable them, set `-enableMetadata=false`.
|
||||
|
||||
The metadata is cached in-memory in a ring buffer and can use up to 1% of available memory by default (see `-storage.maxMetadataStorageSize` cmd-line flag).
|
||||
When in-memory size is exceeded, the least updated entries are dropped first. Entries that weren't updated for 1h are cleaned up automatically.
|
||||
@@ -1466,21 +1419,22 @@ See also [how to work with snapshots](#how-to-work-with-snapshots) and [IndexDB]
|
||||
## IndexDB
|
||||
|
||||
VictoriaMetrics identifies
|
||||
[time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) by
|
||||
`TSID` (time series ID) and stores
|
||||
[raw samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples) sorted
|
||||
by TSID (see [Storage](#storage)). Thus, the TSID is a primary index and could
|
||||
be used for searching and retrieving raw samples. However, the TSID is never
|
||||
exposed to the clients, i.e. it is for internal use only.
|
||||
[time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series)
|
||||
by `TSID` (time series ID) and stores
|
||||
[raw samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples)
|
||||
sorted by TSID (see [Storage](#storage)). Thus, the TSID is a primary index and
|
||||
could be used for searching and retrieving raw samples. However, the TSID is
|
||||
never exposed to the clients, i.e. it is for internal use only.
|
||||
|
||||
Instead, VictoriaMetrics maintains an **inverted index** that enables searching
|
||||
the raw samples by metric name, label name, and label value by mapping these
|
||||
values to the corresponding TSIDs.
|
||||
Instead, VictoriaMetrics maintains an **inverted index** (known as `indexDB`)
|
||||
that enables searching the raw samples by metric name, label name, and label
|
||||
value by mapping these values to the corresponding TSIDs. Every data
|
||||
[partition](#storage) has its own indexDB.
|
||||
|
||||
VictoriaMetrics uses two types of inverted indexes:
|
||||
|
||||
* Global index. Searches using this index is performed across the entire
|
||||
retention period.
|
||||
partition time range.
|
||||
* Per-day index. This index stores mappings similar to ones in global index
|
||||
but also includes the date in each mapping. This speeds up data retrieval
|
||||
for queries within a shorter time range (which is often just the last day).
|
||||
@@ -1488,19 +1442,18 @@ VictoriaMetrics uses two types of inverted indexes:
|
||||
When the search query is executed, VictoriaMetrics decides which index to use
|
||||
based on the time range of the query:
|
||||
|
||||
* Per-day index is used if the search time range is 40 days or less.
|
||||
* Global index is used for search queries with a time range greater than 40
|
||||
days.
|
||||
* Per-day index is used if the search time range is less than the partition time range.
|
||||
* Global index is used for search queries with a time range that matches exactly
|
||||
or greater than the partition time range.
|
||||
|
||||
Mappings are added to the indexes during the data ingestion:
|
||||
|
||||
* In global index each mapping is created only once per retention period.
|
||||
* In global index each mapping is created only once per partition.
|
||||
* In the per-day index each mapping is created for each unique date that
|
||||
has been seen in the samples for the corresponding time series.
|
||||
|
||||
IndexDB respects [retention period](#retention) and once it is over, the indexes
|
||||
are dropped. For the new retention period, the indexes are gradually populated
|
||||
again as the new samples arrive.
|
||||
Since indexDB is a part of a partition, it is dropped along with it as it
|
||||
becomes outside the [retention period](#retention).
|
||||
|
||||
See also [Why IndexDB size is so large?](https://docs.victoriametrics.com/victoriametrics/faq/#why-indexdb-size-is-so-large).
|
||||
|
||||
@@ -2192,10 +2145,13 @@ It is also possible removing [rollup result cache](#rollup-result-cache) on star
|
||||
|
||||
### Rollup result cache
|
||||
|
||||
VictoriaMetrics caches query responses by default. This allows increasing performance for repeated queries
|
||||
VictoriaMetrics caches query responses by default and utilizes the cache for future queries when possible. This improves performance for repeated queries
|
||||
to [`/api/v1/query`](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#instant-query) and [`/api/v1/query_range`](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query)
|
||||
with the increasing `time`, `start` and `end` query args.
|
||||
|
||||
> For range query: the cache can be used for queries with the same expression and step.
|
||||
> For instant query: the cache can be used for queries with the same expression that uses a lookbehind window larger than `-search.minWindowForInstantRollupOptimization` and specific functions such as `xx_over_time`, `increase`, `rate`. (For `rate`, the cache result may be inaccurate in edge cases, see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10098#issuecomment-3895011084) for details)
|
||||
|
||||
This cache may work incorrectly when ingesting historical data into VictoriaMetrics. See [these docs](#backfilling) for details.
|
||||
|
||||
The rollup cache can be disabled either globally by running VictoriaMetrics with `-search.disableCache` command-line flag
|
||||
@@ -2501,6 +2457,11 @@ Moved to [integrations/opentsdb#sending-data-via-telnet](https://docs.victoriame
|
||||
|
||||
Moved to [integrations/opentsdb#sending-data-via-http](https://docs.victoriametrics.com/victoriametrics/integrations/opentsdb/#sending-data-via-http).
|
||||
|
||||
###### Sending data via OpenTelemetry
|
||||
|
||||
- See [OpenTelemetry integration](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) for protocol details, metric naming and histogram conversion.
|
||||
- See [OpenTelemetry Collector](https://docs.victoriametrics.com/victoriametrics/data-ingestion/opentelemetry-collector/) for collector configuration.
|
||||
|
||||
###### How to send data from NewRelic agent
|
||||
|
||||
Moved to [integrations/newrelic](https://docs.victoriametrics.com/victoriametrics/integrations/newrelic/).
|
||||
|
||||
@@ -134,6 +134,14 @@ and the candidate is deployed to the sandbox environment.
|
||||
* linux/ppc64le
|
||||
* linux/386
|
||||
This step can be run manually with the command `make publish` from the needed git tag.
|
||||
* c) [SPDX](https://spdx.dev/) SBOM attestations are
|
||||
generated automatically by BuildKit during
|
||||
`docker buildx build` (`--sbom=true`). SBOMs can
|
||||
be inspected with
|
||||
`docker buildx imagetools inspect <image> --format "{{ json .SBOM }}"`
|
||||
or consumed by vulnerability scanners such as
|
||||
[Trivy](https://github.com/aquasecurity/trivy) via
|
||||
`trivy image --sbom-sources oci <image-ref>`.
|
||||
|
||||
1. Run `TAG=v1.xx.y make github-create-release github-upload-assets`. This command performs the following tasks:
|
||||
|
||||
@@ -166,7 +174,7 @@ Issues included in the release are closed, with the comment.
|
||||
|
||||
1. Review the performance of the release candidate in the sandbox environment.
|
||||
If any issues are found, they must be addressed, and the release process restarted from [Step 1](#step-1) with an incremented release candidate version.
|
||||
1. Run `TAG=v1.xx.y EXTRA_DOCKER_TAG_SUFFIX=-rc1 make publish-final-images`. This command publishes the final release images from release candidate image for given `EXTRA_DOCKER_TAG_SUFFIX` and updates `latest` Docker image tag for the given `TAG`.
|
||||
1. Run `TAG=v1.xx.y EXTRA_DOCKER_TAG_SUFFIX=-rc1 make publish-final-images`. This command publishes the final release images from release candidate image for given `EXTRA_DOCKER_TAG_SUFFIX` and updates `latest` Docker image tag for the given `TAG`. SBOM attestations are preserved from the RC images by `imagetools create`.
|
||||
This command must be run only for the latest officially published release. It must be skipped when publishing other releases such as
|
||||
[LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/) or some test releases.
|
||||
1. Deploy the final images to the sandbox environment and perform a quick smoke test to verify basic functionality works.
|
||||
|
||||
@@ -12,96 +12,78 @@ aliases:
|
||||
- /troubleshooting/index.html
|
||||
- /troubleshooting/
|
||||
---
|
||||
This document contains troubleshooting guides for the most common issues when working with VictoriaMetrics:
|
||||
|
||||
- [General troubleshooting checklist](#general-troubleshooting-checklist)
|
||||
- [Unexpected query results](#unexpected-query-results)
|
||||
- [Slow data ingestion](#slow-data-ingestion)
|
||||
- [Slow queries](#slow-queries)
|
||||
- [Out of memory errors](#out-of-memory-errors)
|
||||
- [Cluster instability](#cluster-instability)
|
||||
- [Too much disk space used](#too-much-disk-space-used)
|
||||
- [Monitoring](#monitoring)
|
||||
This document contains troubleshooting guides for the most common issues when working with VictoriaMetrics.
|
||||
|
||||
## General troubleshooting checklist
|
||||
|
||||
If you hit some issue or have some question about VictoriaMetrics components,
|
||||
then please follow these steps in order to quickly find the solution:
|
||||
If you encounter an issue or have a question about VictoriaMetrics components, follow these steps to quickly find a solution:
|
||||
|
||||
1. Check the version of VictoriaMetrics component, you are troubleshooting and compare
|
||||
it to [the latest available version](https://docs.victoriametrics.com/victoriametrics/changelog/).
|
||||
If the used version is lower than the latest available version, then there are high chances
|
||||
that the issue is already resolved in newer versions. Carefully read [the changelog](https://docs.victoriametrics.com/victoriametrics/changelog/)
|
||||
between your version and the latest version and check whether the issue is already fixed there.
|
||||
1. Check the version of the VictoriaMetrics component you are troubleshooting and compare
|
||||
it with [the latest available version](https://docs.victoriametrics.com/victoriametrics/changelog/).
|
||||
|
||||
If the issue is already fixed in newer versions, then upgrade to the newer version and verify whether the issue is fixed:
|
||||
If you are running an older version, the issue may already be fixed. Review the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/)
|
||||
for all releases between your version and the latest release to see whether the problem has been resolved.
|
||||
|
||||
If the issue is fixed in a newer release, upgrade and verify that the problem no longer occurs:
|
||||
|
||||
- [How to upgrade single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-upgrade-victoriametrics)
|
||||
- [How to upgrade VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#updating--reconfiguring-cluster-nodes)
|
||||
|
||||
Upgrade procedure for other VictoriaMetrics components is as simple as gracefully stopping the component
|
||||
by sending `SIGINT` signal to it and starting the new version of the component.
|
||||
The upgrade procedure for other VictoriaMetrics components is as simple as gracefully stopping the component
|
||||
by sending it a `SIGINT` signal and starting the new version of the component.
|
||||
|
||||
There may be breaking changes between different versions of VictoriaMetrics components in rare cases.
|
||||
These cases are documented in [the changelog](https://docs.victoriametrics.com/victoriametrics/changelog/).
|
||||
So please read the changelog before the upgrade.
|
||||
In rare cases, upgrades may include breaking changes. These cases are documented in the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/),
|
||||
especially check the **Update notes** near the top of the changelog, as they point out any special actions or considerations to take when upgrading.
|
||||
|
||||
1. Inspect command-line flags passed to VictoriaMetrics components and remove flags that have unclear outcomes for your workload.
|
||||
VictoriaMetrics components are designed to work optimally with the default command-line flag values (e.g. when these flags aren't set explicitly).
|
||||
It is recommended to remove flags with unclear outcomes, since they may result in unexpected issues.
|
||||
1. Review command-line flags passed to VictoriaMetrics components and remove any flags whose impact on your workload is unclear.
|
||||
VictoriaMetrics components are optimized to work well with default settings (that is, when flags aren't explicitly set).
|
||||
Unnecessary or poorly understood flags can lead to unexpected behavior, so it's best to remove them unless you clearly understand why they are needed.
|
||||
|
||||
1. Check for logs in VictoriaMetrics components. They may contain useful information about cause of the issue
|
||||
and how to fix the issue. If the log message doesn't have enough useful information for troubleshooting,
|
||||
then search the log message in Google. There are high chances that the issue is already reported
|
||||
somewhere (docs, StackOverflow, Github issues, etc.) and the solution is already documented there.
|
||||
1. Check logs. They often contain useful details about the root cause and possible fixes.
|
||||
|
||||
1. If VictoriaMetrics logs have no relevant information, then try searching for the issue in Google
|
||||
via multiple keywords and phrases specific to the issue. There are high chances that the issue
|
||||
and the solution is already documented somewhere.
|
||||
If the logs don't provide enough information, try searching the error message on Google. In many cases, the issue has
|
||||
already been discussed (in documentation, on Stack Overflow, or in GitHub issues), and a solution may already be available.
|
||||
|
||||
1. Try searching for the issue at [VictoriaMetrics GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
|
||||
The signal/noise quality of search results here is much lower than in Google, but sometimes it may help
|
||||
finding the relevant information about the issue when Google fails to find the needed information.
|
||||
If you located the relevant GitHub issue, but it misses some information on how to diagnose or troubleshoot it,
|
||||
then please provide this information in comments to the issue. This increases chances that it will be resolved soon.
|
||||
1. If VictoriaMetrics logs do not have relevant information, then try searching for the issue on Google
|
||||
using multiple keywords and phrases specific to the issue. In many cases, both the issue and its solution are already documented.
|
||||
|
||||
1. Try searching for information about the issue in [VictoriaMetrics source code](https://github.com/search?q=repo%3AVictoriaMetrics%2FVictoriaMetrics&type=code).
|
||||
GitHub code search may be not very good in some cases, so it is recommended [checking out VictoriaMetrics source code](https://github.com/VictoriaMetrics/VictoriaMetrics/)
|
||||
and perform local search in the checked out code.
|
||||
Note that the source code for VictoriaMetrics cluster is located in [the cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) branch.
|
||||
1. Try searching for the issue in [VictoriaMetrics GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
|
||||
The signal-to-noise ratio of search results here is much lower than on Google, but sometimes it can help
|
||||
find relevant information when Google fails.
|
||||
If you located the relevant GitHub issue, but it lacks details for diagnosis or troubleshooting,
|
||||
then please add them in the issue comments. This increases the chance that it will be resolved soon.
|
||||
|
||||
1. Try searching for information about the issue in the history of [VictoriaMetrics Slack chat](https://victoriametrics.slack.com).
|
||||
There are non-zero chances that somebody already stuck with the same issue and documented the solution at Slack.
|
||||
1. Try searching for information about the issue in the [VictoriaMetrics source code](https://github.com/search?q=repo%3AVictoriaMetrics%2FVictoriaMetrics&type=code).
|
||||
GitHub code search may not be very effective in some cases, so it is recommended [to check out the VictoriaMetrics source code](https://github.com/VictoriaMetrics/VictoriaMetrics/)
|
||||
and perform a local search in the code.
|
||||
Note that the source code for the VictoriaMetrics cluster is located in [the cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) branch.
|
||||
|
||||
1. If steps above didn't help finding the solution to the issue, then please [file a new issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new/choose)
|
||||
by providing the maximum details on how to reproduce the issue.
|
||||
1. If the steps above didn't help to find the solution to the issue, then please [file a new issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new/choose)
|
||||
with as many details as possible on how to reproduce it.
|
||||
|
||||
After that you can post the link to the issue to [VictoriaMetrics Slack chat](https://victoriametrics.slack.com),
|
||||
so VictoriaMetrics community could help finding the solution to the issue. It is better filing the issue at VictoriaMetrics GitHub
|
||||
before posting your question to VictoriaMetrics Slack chat, since GitHub issues are indexed by Google,
|
||||
while Slack messages aren't indexed by Google. This simplifies searching for the solution to the issue for future VictoriaMetrics users.
|
||||
After that you can post the link to the issue in the [VictoriaMetrics Slack chat](https://victoriametrics.slack.com),
|
||||
so the VictoriaMetrics community can help find a solution. It is better to file the issue on VictoriaMetrics GitHub
|
||||
before posting your question to the VictoriaMetrics Slack chat, since GitHub issues are indexed by Google,
|
||||
while Slack messages are not. This simplifies finding a solution to the issue for future VictoriaMetrics users.
|
||||
|
||||
1. Pro tip 1: if you see that [VictoriaMetrics docs](https://docs.victoriametrics.com/victoriametrics/) contain incomplete or incorrect information,
|
||||
then please create a pull request with the relevant changes. This will help VictoriaMetrics community.
|
||||
|
||||
All the docs published at `https://docs.victoriametrics.com` are located in the [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs)
|
||||
folder inside VictoriaMetrics repository.
|
||||
then please create a pull request with the relevant changes or a new issue explaining the problem. This will help the VictoriaMetrics community.
|
||||
|
||||
1. Pro tip 2: please provide links to existing docs / GitHub issues / StackOverflow questions
|
||||
instead of copy-n-pasting the information from these sources when asking or answering questions
|
||||
from VictoriaMetrics community. If the linked resources have no enough information,
|
||||
then it is better posting the missing information in the web resource before providing links
|
||||
to this information in Slack chat. This will simplify searching for this information in the future
|
||||
instead of copying and pasting the information from these sources when asking or answering questions
|
||||
to the VictoriaMetrics community. If the linked resources do not have enough information,
|
||||
then it is better to add the missing information to the original web resource before linking it to Slack chat. This will simplify searching for this information in the future
|
||||
for VictoriaMetrics users via Google and [Perplexity](https://www.perplexity.ai/).
|
||||
|
||||
1. Pro tip 3: if you are answering somebody's question about VictoriaMetrics components
|
||||
at GitHub issues / Slack chat / StackOverflow, then the best answer is a direct link to the information
|
||||
regarding the question.
|
||||
in GitHub issues / Slack chat / StackOverflow, then the best answer is a direct link to the information
|
||||
with the answer or solution to the question.
|
||||
The better answer is a concise message with multiple links to the relevant information.
|
||||
The worst answer is a message with misleading or completely wrong information.
|
||||
|
||||
1. Pro tip 4: if you can fix the issue on yourself, then please do it and provide the corresponding pull request!
|
||||
We are glad to get pull requests from VictoriaMetrics community.
|
||||
1. Pro tip 4: If you can fix the issue on your own, then please do it and provide the corresponding pull request!
|
||||
We are happy to get pull requests from the VictoriaMetrics community.
|
||||
|
||||
## Unexpected query results
|
||||
|
||||
@@ -111,150 +93,150 @@ If you see unexpected or unreliable query results from VictoriaMetrics, then try
|
||||
`sum(rate(http_requests_total[5m])) by (job)`, then check whether the following queries return
|
||||
expected results:
|
||||
|
||||
- Remove the outer `sum` and execute `rate(http_requests_total[5m])`,
|
||||
since aggregations could hide some missing series, gaps in data or anomalies in existing series.
|
||||
If this query returns too many time series, then try adding more specific label filters to it.
|
||||
- Remove the outer `sum` and execute `rate(http_requests_total[5m])`.
|
||||
Aggregations could hide missing series, data gaps, or anomalies.
|
||||
|
||||
- If the query returns too many series, try adding more specific label filters.
|
||||
For example, if you see that the original query returns unexpected results for the `job="foo"`,
|
||||
then use `rate(http_requests_total{job="foo"}[5m])` query.
|
||||
If this isn't enough, then continue adding more specific label filters, so the resulting query returns
|
||||
manageable number of time series.
|
||||
then use the `rate(http_requests_total{job="foo"}[5m])` query.
|
||||
Continue adding more specific label filters until the resulting query returns a manageable number of time series.
|
||||
|
||||
- Remove the outer `rate` and execute `http_requests_total`. Additional label filters may be added here in order
|
||||
to reduce the number of returned series.
|
||||
- Remove the outer `rate` and execute `http_requests_total`. Add label filters to reduce the number of returned series
|
||||
if needed.
|
||||
|
||||
Sometimes the query may be improperly constructed, so it returns unexpected results.
|
||||
It is recommended reading and understanding [MetricsQL docs](https://docs.victoriametrics.com/victoriametrics/metricsql/),
|
||||
Sometimes the query may be improperly constructed, leading to unexpected results.
|
||||
It is recommended to read and understand [MetricsQL docs](https://docs.victoriametrics.com/victoriametrics/metricsql/),
|
||||
especially [subqueries](https://docs.victoriametrics.com/victoriametrics/metricsql/#subqueries)
|
||||
and [rollup functions](https://docs.victoriametrics.com/victoriametrics/metricsql/#rollup-functions) sections.
|
||||
|
||||
|
||||
1. If the simplest query continues returning unexpected / unreliable results, then try verifying correctness
|
||||
of raw unprocessed samples for this query via [/api/v1/export](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-export-data-in-json-line-format)
|
||||
on the given `[start..end]` time range and check whether they are expected:
|
||||
of raw unprocessed samples in [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) via the `Raw Query` tab.
|
||||
|
||||
Responses returned from [/api/v1/query](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#instant-query)
|
||||
and [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query) contain **evaluated** data
|
||||
instead of stored raw samples. In some cases, [staleness](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness),
|
||||
[deduplication](https://docs.victoriametrics.com/victoriametrics/#deduplication), or irregular scrapes can affect evaluations.
|
||||
See [this short video](https://www.youtube.com/watch?v=7AyVCC6uKfI) for details.
|
||||
|
||||
Raw data can be downloaded via the `Export` button in vmui's `Raw Query` tab or via [/api/v1/export](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-export-data-in-json-line-format)
|
||||
query on the given `[start..end]` time range and check whether they are expected:
|
||||
|
||||
```sh
|
||||
single-node: curl http://victoriametrics:8428/api/v1/export -d 'match[]=http_requests_total' -d 'start=...' -d 'end=...' -d 'reduce_mem_usage=1'
|
||||
|
||||
cluster: curl http://<vmselect>:8481/select/<tenantID>/prometheus/api/v1/export -d 'match[]=http_requests_total' -d 'start=...' -d 'end=...' -d 'reduce_mem_usage=1'
|
||||
```
|
||||
When raising a GitHub ticket about query issues, please also attach the raw data, so maintainers can reproduce your case locally.
|
||||
|
||||
Note that responses returned from [/api/v1/query](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#instant-query)
|
||||
and from [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query) contain **evaluated** data
|
||||
instead of raw samples stored in VictoriaMetrics. See [these docs](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness)
|
||||
for details. The raw samples can be also viewed in [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) in `Raw Query` tab and shared via `export` button.
|
||||
1. Try executing the query with [tracer](https://docs.victoriametrics.com/victoriametrics/#query-tracing) enabled. The trace
|
||||
contains a lot of additional information about query execution, series matching, caches, and internal modifications.
|
||||
When raising a GitHub ticket about query issues, please also attach the trace so maintainers can investigate.
|
||||
|
||||
If you migrate from InfluxDB, then pass `-search.setLookbackToStep` command-line flag to single-node VictoriaMetrics
|
||||
or to `vmselect` in VictoriaMetrics cluster. See also [how to migrate from InfluxDB to VictoriaMetrics](https://docs.victoriametrics.com/guides/migrate-from-influx/).
|
||||
1. If you observe gaps when plotting series, it is likely caused by irregular intervals for metrics collection (network delays
|
||||
or targets unavailability during scrapes, irregular pushes, irregular timestamps).
|
||||
VictoriaMetrics automatically [fills the gaps](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query)
|
||||
based on the median interval between [data samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples).
|
||||
This may yield incorrect results for irregular data, as the median will be skewed. In this case, it is recommended to either fix the
|
||||
irregularities or switch to the static interval for gaps filling by setting `-search.minStalenessInterval=5m` command-line flag (`5m` is
|
||||
used by Prometheus by default).
|
||||
|
||||
1. Sometimes response caching may lead to unexpected results when samples with older timestamps
|
||||
1. Sometimes, response caching may lead to unexpected results when samples with older timestamps
|
||||
are ingested into VictoriaMetrics (aka [backfilling](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#backfilling)).
|
||||
Try disabling response cache and see whether this helps. This can be done in the following ways:
|
||||
Try disabling response cache and see whether this helps:
|
||||
|
||||
- By clicking on the toggle `Disable cache` in vmui.
|
||||
|
||||
- By passing `-search.disableCache` command-line flag to a single-node VictoriaMetrics
|
||||
or to all the `vmselect` components if cluster version of VictoriaMetrics is used.
|
||||
or to all the `vmselect` components if the cluster version of VictoriaMetrics is used.
|
||||
|
||||
- By passing `nocache=1` query arg to every request to `/api/v1/query` and `/api/v1/query_range`.
|
||||
If you use Grafana, then this query arg can be specified in `Custom Query Parameters` field
|
||||
at Prometheus datasource settings - see [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
|
||||
If you use Grafana, then this query arg can be specified in the `Custom Query Parameters` field
|
||||
in Prometheus datasource settings. See [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
|
||||
|
||||
If the problem was in the cache, try resetting it via [resetRollupCache handler](https://docs.victoriametrics.com/victoriametrics/url-examples/#internalresetrollupresultcache).
|
||||
If the problem was in the cache, try resetting it via the [resetRollupCache handler](https://docs.victoriametrics.com/victoriametrics/url-examples/#internalresetrollupresultcache).
|
||||
|
||||
1. If you use cluster version of VictoriaMetrics, then it may return partial responses by default
|
||||
when some of `vmstorage` nodes are temporarily unavailable - see [cluster availability docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-availability)
|
||||
for details. If you want to prioritize query consistency over cluster availability,
|
||||
then you can pass `-search.denyPartialResponse` command-line flag to all the `vmselect` nodes.
|
||||
In this case VictoriaMetrics returns an error during querying if at least a single `vmstorage` node is unavailable.
|
||||
Another option is to pass `deny_partial_response=1` query arg to `/api/v1/query` and `/api/v1/query_range`.
|
||||
If you use Grafana, then this query arg can be specified in `Custom Query Parameters` field
|
||||
at Prometheus datasource settings - see [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
|
||||
1. Cluster version of VictoriaMetrics may return partial responses by default when some of the `vmstorage` nodes are temporarily
|
||||
unavailable. See [cluster availability docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-availability).
|
||||
If you want to prioritize query consistency over cluster availability, then pass `-search.denyPartialResponse` command-line flag to all the `vmselect` nodes.
|
||||
This causes VictoriaMetrics to return an error during query execution if at least one `vmstorage` node is unavailable.
|
||||
Another option is to pass `deny_partial_response=1` query argument to `/api/v1/query` and `/api/v1/query_range`.
|
||||
If you use Grafana, then this query argument can be specified in the `Custom Query Parameters` field
|
||||
in Prometheus/VictoriaMetrics datasource settings. See [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
|
||||
|
||||
1. If you pass `-replicationFactor` command-line flag to `vmselect`, then it is recommended removing this flag from `vmselect`,
|
||||
1. If you pass the `-replicationFactor` command-line flag to `vmselect`, then it is recommended to remove this flag from `vmselect`,
|
||||
since it may lead to incomplete responses when `vmstorage` nodes contain less than `-replicationFactor`
|
||||
copies of the requested data.
|
||||
|
||||
1. If you observe gaps when plotting time series try simplifying your query according to p2 and follow the list.
|
||||
If problem still remains, then it is likely caused by irregular intervals for metrics collection (network delays
|
||||
or targets unavailability on scrapes, irregular pushes, irregular timestamps).
|
||||
VictoriaMetrics automatically [fills the gaps](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query)
|
||||
based on median interval between [data samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples).
|
||||
This may work incorrectly for irregular data as median will be skewed. In this case it is recommended to switch
|
||||
to the static interval for gaps filling by setting `-search.minStalenessInterval=5m` command-line flag (`5m` is
|
||||
the static interval used by Prometheus).
|
||||
|
||||
1. If you observe recently written data is not immediately visible/queryable, then read more about
|
||||
1. If you observe that recently written data is not immediately visible/queryable, then read more about
|
||||
[query latency](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#query-latency) behavior.
|
||||
|
||||
1. Try upgrading to the [latest available version of VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
|
||||
and verifying whether the issue is fixed there.
|
||||
|
||||
1. Try executing the query with `trace=1` query arg. This enables query tracing, that may contain
|
||||
useful information on why the query returns unexpected data. See [query tracing docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing) for details.
|
||||
1. Inspect command-line flags passed to VictoriaMetrics components. If you don't clearly understand the purpose
|
||||
or the effect of some flags, then remove them from the list of flags.
|
||||
VictoriaMetrics components are optimized to work well with default settings (that is, when flags aren't explicitly set).
|
||||
Unnecessary or poorly understood flags can lead to unexpected behavior, so it's best to remove them unless you clearly understand why they are needed.
|
||||
|
||||
1. Inspect command-line flags passed to VictoriaMetrics components. If you don't understand clearly the purpose
|
||||
or the effect of some flags, then remove them from the list of flags passed to VictoriaMetrics components,
|
||||
because some command-line flags may change query results in unexpected ways when set to improper values.
|
||||
VictoriaMetrics is optimized for running with default flag values (e.g. when they aren't set explicitly).
|
||||
|
||||
1. If the steps above didn't help identifying the root cause of unexpected query results,
|
||||
then [file a bugreport](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new) with details on how to reproduce the issue.
|
||||
Instead of sharing screenshots in the issue, consider sharing query and [trace](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing)
|
||||
results in [VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) by clicking on `Export query` button in top right corner of the graph area.
|
||||
1. If the steps above didn't help identify the root cause of unexpected query results,
|
||||
then [file a bug report](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new) with details on how to reproduce the issue.
|
||||
Instead of sharing screenshots in the issue, consider sharing the query, [raw samples](https://docs.victoriametrics.com/victoriametrics/#vmui) and [trace](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing)
|
||||
results via [VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui).
|
||||
|
||||
## Slow data ingestion
|
||||
|
||||
These are the most commons reasons for slow data ingestion in VictoriaMetrics:
|
||||
These are the most common reasons for slow data ingestion in VictoriaMetrics:
|
||||
|
||||
1. Memory shortage for the given amounts of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series).
|
||||
|
||||
VictoriaMetrics (or `vmstorage` in cluster version of VictoriaMetrics) maintains an in-memory cache
|
||||
for quick search for internal series ids per each incoming metric.
|
||||
This cache is named `storage/tsid`. VictoriaMetrics automatically determines the maximum size for this cache
|
||||
depending on the available memory on the host where VictoriaMetrics (or `vmstorage`) runs. If the cache size isn't enough
|
||||
for holding all the entries for active time series, then VictoriaMetrics locates the needed data on disk,
|
||||
unpacks it, re-constructs the missing entry and puts it into the cache. This takes additional CPU time and disk read IO.
|
||||
VictoriaMetrics (or `vmstorage` in the cluster version of VictoriaMetrics) maintains an in-memory cache `storage/tsid`
|
||||
for a quick search for internal series IDs for each incoming metric. VictoriaMetrics automatically determines the maximum
|
||||
size for this cache depending on the available memory on the host where VictoriaMetrics (or `vmstorage`) runs.
|
||||
If the cache size isn't enough to hold all the entries for active time series, then VictoriaMetrics locates the required data on disk,
|
||||
unpacks it, reconstructs the missing entry, and adds it to the cache. This takes additional CPU time and disk read I/O.
|
||||
|
||||
The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
contain `Slow inserts` graph, that shows the cache miss percentage for `storage/tsid` cache
|
||||
during data ingestion. If `slow inserts` graph shows values greater than 5% for more than 10 minutes,
|
||||
then it is likely the current number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series)
|
||||
contain a `Slow inserts` graph that shows the cache miss percentage for the `storage/tsid` cache during data ingestion.
|
||||
If the `slow inserts` graph shows values greater than 5% for more than 10 minutes,
|
||||
then it is likely that the current number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series)
|
||||
cannot fit the `storage/tsid` cache.
|
||||
|
||||
These are the solutions that exist for this issue:
|
||||
|
||||
- To increase the available memory on the host where VictoriaMetrics runs until `slow inserts` percentage
|
||||
will become lower than 5%. If you run VictoriaMetrics cluster, then you need increasing total available
|
||||
memory at `vmstorage` nodes. This can be done in two ways: either to increase the available memory
|
||||
per each existing `vmstorage` node or to add more `vmstorage` nodes to the cluster.
|
||||
- Increase the available memory on the host where VictoriaMetrics runs until the `slow inserts` percentage
|
||||
drops to 5% or less. If you run a VictoriaMetrics cluster, then you need to increase the total available
|
||||
memory at all `vmstorage` nodes. This can be done in two ways: either to increase the available memory
|
||||
for each `vmstorage` node or to add more `vmstorage` nodes to the cluster to spread the load.
|
||||
|
||||
- To reduce the number of active time series. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
contain a graph showing the number of active time series. Recent versions of VictoriaMetrics
|
||||
provide [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer),
|
||||
that can help determining and fixing the source of [high cardinality](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-cardinality).
|
||||
- Reduce the number of active time series. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
contain a graph showing the number of active time series. Use the [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer)
|
||||
to determine and fix the source of [high cardinality](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-cardinality).
|
||||
|
||||
- Insert performance can degrade when the same time series arrives with labels in different order.
|
||||
- Insert performance can degrade when the same time series arrives with labels in a different order.
|
||||
Ensure your ingestion client always sends labels in a consistent order for each series.
|
||||
Prometheus and `vmagent` already guarantee this, but custom or third-party clients might not.
|
||||
As a fallback, you can enable `-sortLabels=true` on VictoriaMetrics or on `vminsert` in cluster mode.
|
||||
This forces the server to normalize label order, though it increases CPU usage during ingestion.
|
||||
|
||||
1. [High churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate),
|
||||
e.g. when old time series are substituted with new time series at a high rate.
|
||||
When VictoriaMetrics encounters a sample for new time series, it needs to register the time series
|
||||
in the internal index (aka `indexdb`), so it can be quickly located on subsequent select queries.
|
||||
1. [High churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
|
||||
When VictoriaMetrics encounters a sample for a new time series, it needs to register the time series
|
||||
in the internal index (aka `indexdb`), so it can be quickly located during select queries.
|
||||
The process of registering new time series in the internal index is an order of magnitude slower
|
||||
than the process of adding new sample to already registered time series.
|
||||
than the process of adding a new sample to an already registered time series.
|
||||
So VictoriaMetrics may work slower than expected under [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
|
||||
|
||||
The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
provides `Churn rate` graph, that shows the average number of new time series registered
|
||||
provide a `Churn rate` graph, which shows the average number of new time series registered
|
||||
during the last 24 hours. If this number exceeds the number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series),
|
||||
then you need to identify and fix the source of [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
|
||||
The most common source of high churn rate is a label, that frequently changes its value. Try avoiding such labels.
|
||||
The [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer) can help identifying
|
||||
The most common source of high churn rate is a label that frequently changes its value (like timestamp, session_id). **Try avoiding such labels.**
|
||||
The [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer) can help identify
|
||||
such labels.
|
||||
|
||||
1. Resource shortage. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
contain `resource usage` graphs, that show memory usage, CPU usage, disk IO usage and free disk size.
|
||||
Make sure VictoriaMetrics has enough free resources for graceful handling of potential spikes in workload
|
||||
contain `Resource usage` graphs that show memory usage, CPU usage, disk I/O usage, etc.
|
||||
Make sure VictoriaMetrics has enough free resources for gracefully handling potential spikes in workload
|
||||
according to the following recommendations:
|
||||
|
||||
- 50% of free CPU
|
||||
@@ -262,52 +244,51 @@ These are the most commons reasons for slow data ingestion in VictoriaMetrics:
|
||||
- 20% of free disk space
|
||||
|
||||
If VictoriaMetrics components have lower amounts of free resources, then this may lead
|
||||
to **significant** performance degradation after workload increases slightly.
|
||||
to **significant** performance degradation when workload increases slightly.
|
||||
For example:
|
||||
|
||||
- If the percentage of free CPU is close to 0, then VictoriaMetrics
|
||||
may experience arbitrary long delays during data ingestion when it cannot keep up
|
||||
with slightly increased data ingestion rate.
|
||||
may experience arbitrarily long delays during data ingestion, even with slight increases in ingestion rate.
|
||||
|
||||
- If the percentage of free memory reaches 0, then the Operating System where VictoriaMetrics components run,
|
||||
may not have enough memory for [page cache](https://en.wikipedia.org/wiki/Page_cache).
|
||||
VictoriaMetrics relies on page cache for quick queries over recently ingested data.
|
||||
If the operating system has no enough free memory for page cache, then it needs
|
||||
to re-read the requested data from disk. This may **significantly** increase disk read IO
|
||||
- If the percentage of free memory reaches 0, then the Operating System where VictoriaMetrics components run
|
||||
may not have enough memory for the [page cache](https://en.wikipedia.org/wiki/Page_cache).
|
||||
VictoriaMetrics relies on the page cache for quick queries over recently ingested data.
|
||||
If the operating system does not have enough free memory for the page cache, then it must
|
||||
re-read the requested data from disk. This may **significantly** increase disk read I/O
|
||||
and slow down both queries and data ingestion.
|
||||
|
||||
- If free disk space is lower than 20%, then VictoriaMetrics is unable to perform optimal
|
||||
background merge of the incoming data. This leads to increased number of data files on disk,
|
||||
that, in turn, slows down both data ingestion and querying. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#storage) for details.
|
||||
- If free disk space is below 20%, then VictoriaMetrics may be unable to perform optimal
|
||||
background merge of the incoming data. This results in more data files on disk.
|
||||
That, in turn, slows down both data ingestion and querying. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#storage) for details.
|
||||
|
||||
1. If you run cluster version of VictoriaMetrics, then make sure `vminsert` and `vmstorage` components
|
||||
are located in the same network with small network latency between them.
|
||||
`vminsert` packs incoming data into batch packets and sends them to `vmstorage` one-by-one.
|
||||
It waits until `vmstorage` returns back `ack` response before sending the next packet.
|
||||
1. If you run the cluster version of VictoriaMetrics, then make sure `vminsert` and `vmstorage` components
|
||||
are located in the same network with a low network latency between them.
|
||||
`vminsert` packs incoming data into batch packets and sends them to `vmstorage` one by one.
|
||||
It waits until `vmstorage` returns back an `ack` response before sending the next packet.
|
||||
If the network latency between `vminsert` and `vmstorage` is high (for example, if they run in different datacenters),
|
||||
then this may become limiting factor for data ingestion speed.
|
||||
then this may become a limiting factor for data ingestion speed.
|
||||
|
||||
The [official Grafana dashboard for cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
contain `connection saturation` graph for `vminsert` components. If this graph reaches 100% (1s),
|
||||
The [official Grafana dashboard for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
contains a `connection saturation` panel for `vminsert` components. If this graph reaches 100% (1s),
|
||||
then it is likely you have issues with network latency between `vminsert` and `vmstorage`.
|
||||
Another possible issue for 100% connection saturation between `vminsert` and `vmstorage`
|
||||
is resource shortage at `vmstorage` nodes. In this case you need to increase amounts
|
||||
of available resources (CPU, RAM, disk IO) at `vmstorage` nodes or to add more `vmstorage` nodes to the cluster.
|
||||
is a resource shortage in the `vmstorage` nodes. In this case, you need to increase the amount
|
||||
of available resources (CPU, RAM, disk I/O) at `vmstorage` nodes or add more `vmstorage` nodes to the cluster.
|
||||
|
||||
1. Noisy neighbor. Make sure VictoriaMetrics components run in an environment without other resource-hungry apps.
|
||||
Such apps may steal RAM, CPU, disk IO and network bandwidth, that is needed for VictoriaMetrics components.
|
||||
Issues like this are very hard to catch via [official Grafana dashboard for cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
Such apps may steal RAM, CPU, disk I/O, and network bandwidth that are needed for VictoriaMetrics components.
|
||||
Issues like this are hard to catch via the [official Grafana dashboard for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
and proper diagnosis would require checking resource usage on the instances where VictoriaMetrics runs.
|
||||
|
||||
1. If you see `TooHighSlowInsertsRate` [alert](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring) when single-node VictoriaMetrics or `vmstorage` has enough
|
||||
free CPU and RAM, then increase `-cacheExpireDuration` command-line flag at single-node VictoriaMetrics or at `vmstorage` to the value,
|
||||
1. If you see a `TooHighSlowInsertsRate` [alert](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring) when single-node VictoriaMetrics or `vmstorage` has enough
|
||||
free CPU and RAM, then increase the `-cacheExpireDuration` command-line flag at single-node VictoriaMetrics or at `vmstorage` to a value
|
||||
that exceeds the interval between ingested samples for the same time series (aka `scrape_interval`).
|
||||
See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183) for more details.
|
||||
|
||||
1. If you see constant and abnormally high CPU usage of VictoriaMetrics component, check `CPU spent on GC` panel
|
||||
on the corresponding [Grafana dashboard](https://grafana.com/orgs/victoriametrics) in `Resource usage` section. If percentage of CPU time spent on garbage collection
|
||||
is high, then CPU usage of the component can be reduced at the cost of higher memory usage by changing [GOGC](https://tip.golang.org/doc/gc-guide#GOGC) environment variable
|
||||
to higher values. By default VictoriaMetrics components use `GOGC=30`. Try running VictoriaMetrics components with `GOGC=100` and see whether this helps reducing CPU usage.
|
||||
1. If you see constant and abnormally high CPU usage for the VictoriaMetrics component, check the `CPU spent on GC` panel
|
||||
on the corresponding [Grafana dashboard](https://grafana.com/orgs/victoriametrics) in the `Resource usage` section. If the percentage of CPU time spent on garbage collection
|
||||
is high, then CPU usage of the component can be reduced at the cost of higher memory usage by increasing the [GOGC](https://tip.golang.org/doc/gc-guide#GOGC) environment variable.
|
||||
By default, VictoriaMetrics components use `GOGC=30`. Try running VictoriaMetrics components with `GOGC=100` and see whether this helps reduce CPU usage.
|
||||
Note that higher `GOGC` values may increase memory usage.
|
||||
|
||||
## Slow queries
|
||||
@@ -316,40 +297,43 @@ Some queries may take more time and resources (CPU, RAM, network bandwidth) than
|
||||
VictoriaMetrics logs slow queries if their execution time exceeds the duration passed
|
||||
to `-search.logSlowQueryDuration` command-line flag (5s by default).
|
||||
|
||||
VictoriaMetrics provides [`top queries` page at VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#top-queries), that shows
|
||||
queries that took the most time to execute.
|
||||
VictoriaMetrics provides a [`top queries` page in VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#top-queries) that shows
|
||||
the longest-running queries. And [Query execution stats](https://docs.victoriametrics.com/victoriametrics/query-stats/) for dumping slow queries
|
||||
to logs.
|
||||
|
||||
These are the solutions that exist for improving performance of slow queries:
|
||||
These are the solutions that exist for improving the performance of slow queries:
|
||||
|
||||
- Investigating the bottleneck in query execution using [query tracing](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing).
|
||||
It will show the percentage of time spent on each execution step and help understand the volume of processed data.
|
||||
|
||||
- Adding more CPU and memory to VictoriaMetrics, so it may perform the slow query faster.
|
||||
If you use cluster version of VictoriaMetrics, then migrating `vmselect` nodes to machines
|
||||
with more CPU and RAM should help improving speed for slow queries. Query performance
|
||||
is always limited by resources of one `vmselect` that processes the query. For example, if 2vCPU cores on `vmselect`
|
||||
isn't enough to process query fast enough, then migrating `vmselect` to a machine with 4vCPU cores should increase heavy query performance by up to 2x.
|
||||
If the line on `concurrent select` graph form the [official Grafana dashboard for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
If you use the cluster version of VictoriaMetrics, then migrating `vmselect` nodes to machines
|
||||
with more CPU and RAM should help improve speed for slow queries. Query performance
|
||||
is always limited by the resources of **one** `vmselect` that processes the query. For example, if 2 vCPU cores on `vmselect`
|
||||
can't process queries fast enough, then migrating `vmselect` to a machine with 4 vCPU cores should increase heavy query performance by up to 2x.
|
||||
If the line on the `concurrent select` graph from the [official Grafana dashboard for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
|
||||
is close to the limit, then prefer adding more `vmselect` nodes to the cluster.
|
||||
Sometimes adding more `vmstorage` nodes also can help improving the speed for slow queries.
|
||||
Sometimes adding more `vmstorage` nodes can also help improve the speed for slow queries.
|
||||
|
||||
- Rewriting slow queries, so they become faster. Unfortunately it is hard determining
|
||||
whether the given query is slow by just looking at it.
|
||||
- Rewriting slow queries, so they become faster.
|
||||
|
||||
The main source of slow queries in practice is [alerting and recording rules](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules)
|
||||
with long lookbehind windows in square brackets. These queries are frequently used in SLI/SLO calculations such as [Sloth](https://github.com/slok/sloth).
|
||||
|
||||
For example, `avg_over_time(up[30d]) > 0.99` needs to read and process
|
||||
all the [raw samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples)
|
||||
for `up` [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) over the last 30 days
|
||||
each time it executes. If this query is executed frequently, then it can take significant share of CPU, disk read IO, network bandwidth and RAM.
|
||||
for the `up` [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) over the last 30 days
|
||||
each time it executes. If this query is executed frequently, it can take a significant share of CPU, disk read I/O, network bandwidth, and RAM.
|
||||
Such queries can be optimized in the following ways:
|
||||
|
||||
- To reduce the lookbehind window in square brackets. For example, `avg_over_time(up[10d])` takes up to 3x less compute resources
|
||||
- To reduce the look-behind window in square brackets. For example, `avg_over_time(up[10d])` takes up to 3x less compute resources
|
||||
than `avg_over_time(up[30d])` at VictoriaMetrics.
|
||||
- To increase evaluation interval for alerting and recording rules, so they are executed less frequently.
|
||||
For example, increasing `-evaluationInterval` command-line flag value at [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/)
|
||||
from `1m` to `2m` should reduce compute resource usage at VictoriaMetrics by 2x.
|
||||
- To increase the evaluation interval for alerting and recording rules, so they are executed less frequently.
|
||||
For example, increasing the `-evaluationInterval` command-line flag value at [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/)
|
||||
from `1m` to `2m` should reduce compute resource usage by VictoriaMetrics 2x.
|
||||
|
||||
Another source of slow queries is improper use of [subqueries](https://docs.victoriametrics.com/victoriametrics/metricsql/#subqueries).
|
||||
It is recommended avoiding subqueries if you don't understand clearly how they work.
|
||||
It is recommended to avoid subqueries if you don't clearly understand how they work.
|
||||
It is easy to create a subquery without knowing about it.
|
||||
For example, `rate(sum(some_metric))` is implicitly transformed into the following subquery
|
||||
according to [implicit conversion rules for MetricsQL queries](https://docs.victoriametrics.com/victoriametrics/metricsql/#implicit-query-conversions):
|
||||
@@ -365,67 +349,64 @@ These are the solutions that exist for improving performance of slow queries:
|
||||
It is likely this query won't return the expected results. Instead, `sum(rate(some_metric))` must be used instead.
|
||||
See [this article](https://www.robustperception.io/rate-then-sum-never-sum-then-rate/) for more details.
|
||||
|
||||
VictoriaMetrics provides [query tracing](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing) feature,
|
||||
that can help determining the source of slow query.
|
||||
See also [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986),
|
||||
that explains how to determine and optimize slow queries.
|
||||
which explains how to identify and optimize slow queries.
|
||||
|
||||
## Out of memory errors
|
||||
|
||||
There are the following most common sources of out of memory (aka OOM) crashes in VictoriaMetrics:
|
||||
The following are the most common sources of out-of-memory (aka OOM) crashes in VictoriaMetrics:
|
||||
|
||||
1. Improper command-line flag values. Inspect command-line flags passed to VictoriaMetrics components.
|
||||
If you don't understand clearly the purpose or the effect of some flags - remove them
|
||||
from the list of flags passed to VictoriaMetrics components. Improper command-line flags values
|
||||
may lead to increased memory and CPU usage. The increased memory usage increases chances for OOM crashes.
|
||||
VictoriaMetrics is optimized for running with default flag values (e.g. when they aren't set explicitly).
|
||||
If you don't clearly understand the purpose or the effect of some flags, remove them
|
||||
from the list of flags passed to VictoriaMetrics components. Improper command-line flag values
|
||||
may lead to increased memory and CPU usage. Increased memory usage increases the risk of OOM crashes.
|
||||
VictoriaMetrics is optimized to run with default flag values (e.g., when they aren't explicitly set).
|
||||
|
||||
For example, it isn't recommended tuning cache sizes in VictoriaMetrics, since it frequently leads to OOM exceptions.
|
||||
[These docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning) refer command-line flags, that aren't
|
||||
recommended to tune. If you see that VictoriaMetrics needs increasing some cache sizes for the current workload,
|
||||
then it is better migrating to a host with more memory instead of trying to tune cache sizes manually.
|
||||
For example, it isn't recommended to change cache sizes in VictoriaMetrics, as this frequently leads to OOM exceptions.
|
||||
[These docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning) refer to command-line flags that aren't
|
||||
recommended to tune. If you see that VictoriaMetrics needs to increase some cache sizes for the current workload,
|
||||
then it is better to migrate to a host with more memory instead of trying to tune cache sizes manually.
|
||||
|
||||
1. Unexpected heavy queries. The query is considered as heavy if it needs to select and process millions of unique time series.
|
||||
Such query may lead to OOM exception, since VictoriaMetrics needs to keep some of per-series data in memory.
|
||||
VictoriaMetrics provides [various settings](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#resource-usage-limits),
|
||||
1. Unexpected heavy queries. The query is considered heavy if it needs to select and process millions of unique time series.
|
||||
Such a query may cause an OOM exception, as VictoriaMetrics needs to keep some per-series data in memory.
|
||||
VictoriaMetrics provides [various settings](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#resource-usage-limits)
|
||||
that can help limit resource usage.
|
||||
For more context, see [How to optimize PromQL and MetricsQL queries](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986).
|
||||
VictoriaMetrics also provides [query tracer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing)
|
||||
to help identify the source of heavy query.
|
||||
to help identify the source of heavy queries. Slow queries can be logged with additional details via [Query execution stats](https://docs.victoriametrics.com/victoriametrics/query-stats/).
|
||||
|
||||
1. Lack of free memory for processing workload spikes. If VictoriaMetrics components use almost all the available memory
|
||||
under the current workload, then it is recommended migrating to a host with bigger amounts of memory.
|
||||
under the current workload, then it is recommended to migrate to a host with larger amounts of memory.
|
||||
This would protect from possible OOM crashes on workload spikes. It is recommended to have at least 50%
|
||||
of free memory for graceful handling of possible workload spikes.
|
||||
of free memory to gracefully handle possible workload spikes.
|
||||
See [capacity planning for single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#capacity-planning)
|
||||
and [capacity planning for cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#capacity-planning).
|
||||
and [capacity planning for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#capacity-planning).
|
||||
|
||||
## Cluster instability
|
||||
|
||||
VictoriaMetrics cluster may become unstable if there is no enough free resources (CPU, RAM, disk IO, network bandwidth)
|
||||
The VictoriaMetrics cluster may become unstable if there are not enough free resources (CPU, RAM, disk I/O, network bandwidth)
|
||||
for processing the current workload.
|
||||
|
||||
The most common sources of cluster instability are:
|
||||
|
||||
- Workload spikes. For example, if the number of active time series increases by 2x while
|
||||
the cluster has no enough free resources for processing the increased workload,
|
||||
then it may become unstable.
|
||||
VictoriaMetrics provides various configuration settings, that can be used for limiting unexpected workload spikes.
|
||||
the cluster does not have enough free resources for processing the increased workload, then it may become unstable.
|
||||
VictoriaMetrics provides several configuration settings to limit unexpected workload spikes.
|
||||
See [these docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#resource-usage-limits) for details.
|
||||
|
||||
- Various maintenance tasks such as rolling upgrades or rolling restarts during configuration changes.
|
||||
- Various maintenance tasks, such as rolling upgrades or rolling restarts, during configuration changes.
|
||||
For example, if a cluster contains `N=3` `vmstorage` nodes and they are restarted one-by-one (aka rolling restart),
|
||||
then the cluster will have only `N-1=2` healthy `vmstorage` nodes during the rolling restart.
|
||||
This means that the load on healthy `vmstorage` nodes increases by at least `100%/(N-1)=50%`
|
||||
comparing to the load before rolling restart. E.g. they need to process 50% more incoming
|
||||
compared to the load before rolling restart. E.g., they need to process 50% more incoming
|
||||
data and to return 50% more data during queries. In reality, the load on the remaining `vmstorage`
|
||||
nodes increases even more because they need to register new time series, that were re-routed
|
||||
from temporarily unavailable `vmstorage` node. If `vmstorage` nodes had less than 50%
|
||||
of free resources (CPU, RAM, disk IO) before the rolling restart, then it
|
||||
nodes increases even more because they need to register new time series that were re-routed
|
||||
from a temporarily unavailable `vmstorage` node. If `vmstorage` nodes had less than 50%
|
||||
of free resources (CPU, RAM, disk I/O) before the rolling restart, then it
|
||||
can lead to cluster overload and instability for both data ingestion and querying.
|
||||
|
||||
The workload increase during rolling restart can be reduced by increasing
|
||||
the number of `vmstorage` nodes in the cluster. For example, if VictoriaMetrics cluster contains
|
||||
the number of `vmstorage` nodes in the cluster. For example, if the VictoriaMetrics cluster contains
|
||||
`N=11` `vmstorage` nodes, then the workload increase during rolling restart of `vmstorage` nodes
|
||||
would be `100%/(N-1)=10%`. It is recommended to have at least 8 `vmstorage` nodes in the cluster.
|
||||
The recommended number of `vmstorage` nodes should be multiplied by `-replicationFactor` if replication is enabled -
|
||||
@@ -433,11 +414,11 @@ The most common sources of cluster instability are:
|
||||
for details.
|
||||
|
||||
- Time series sharding. Received time series [are consistently sharded](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#architecture-overview)
|
||||
by `vminsert` between configured `vmstorage` nodes. As a sharding key `vminsert` is using time series name and labels,
|
||||
respecting their order. If the order of labels in time series is constantly changing, this could cause wrong sharding
|
||||
calculation and result in un-even and sub-optimal time series distribution across available vmstorages. It is expected
|
||||
that metrics pushing client is responsible for consistent labels order (like `Prometheus` or `vmagent` during scraping).
|
||||
If this can't be guaranteed, set `-sortLabels=true` command-line flag to `vminsert`. Please note, sorting may increase
|
||||
by `vminsert` between configured `vmstorage` nodes. As a sharding key, `vminsert` is using time series name and labels,
|
||||
respecting their order. If the order of labels in a time series is constantly changing, this could cause wrong sharding
|
||||
calculation and result in uneven and suboptimal time series distribution across available vmstorages. It is expected
|
||||
that the client who is pushing metrics is responsible for consistent label order (like `Prometheus` or `vmagent` during scraping).
|
||||
If this can't be guaranteed, set `-sortLabels=true` command-line flag to `vminsert`. Please note that sorting may increase
|
||||
CPU usage for `vminsert`.
|
||||
|
||||
- Network instability between cluster components (`vminsert`, `vmselect`, `vmstorage`) may lead to increased error rates, timeouts, or degraded performance.
|
||||
@@ -447,14 +428,14 @@ The most common sources of cluster instability are:
|
||||
but can still cause transient network failures. In such cases, check CPU usage at the OS level with higher-resolution tools.
|
||||
Consider increasing `-vmstorageDialTimeout` and `-rpc.handshakeTimeout`{{% available_from "v1.124.0" %}} to mitigate the effects of CPU spikes.
|
||||
|
||||
If resource usage looks normal but networking issues still occur, then the root cause is likely outside VictoriaMetrics.
|
||||
If resource usage appears normal but networking issues persist, the root cause is likely outside VictoriaMetrics.
|
||||
This may be caused by unreliable or congested network links, especially across availability zones or regions.
|
||||
In multi-AZ setups, consider [a multi-level cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multi-level-cluster-setup) with region-local load balancers to reduce cross-zone connections.
|
||||
If the network cannot be improved, increasing timeouts such as `-vmstorageDialTimeout`, `-rpc.handshakeTimeout`{{% available_from "v1.124.0" %}}, or `-search.maxQueueDuration` may help, but should be done cautiously, as higher timeouts can impact cluster stability in other ways.
|
||||
Keep in mind that VictoriaMetrics assumes reliable networking between components. If the network is unstable, the overall cluster stability may degrade regardless of resource availability.
|
||||
|
||||
The obvious solution against VictoriaMetrics cluster instability is to make sure cluster components
|
||||
have enough free resources for graceful processing of the increased workload.
|
||||
The obvious solution to VictoriaMetrics cluster instability is to make sure cluster components
|
||||
have sufficient free resources to handle the increased workload gracefully.
|
||||
See [capacity planning docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#capacity-planning)
|
||||
and [cluster resizing and scalability docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-resizing-and-scalability)
|
||||
for details.
|
||||
@@ -464,10 +445,10 @@ for details.
|
||||
If too much disk space is used by a [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) or by `vmstorage` component
|
||||
at [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/), then please check the following:
|
||||
|
||||
- Make sure that there are no old snapshots, since they can occupy disk space. See [how to work with snapshots](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-work-with-snapshots)
|
||||
, [snapshot troubleshooting](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#snapshot-troubleshooting) and [vmbackup troubleshooting](https://docs.victoriametrics.com/victoriametrics/vmbackup/#troubleshooting).
|
||||
- Make sure that there are no old snapshots, since they can occupy disk space. See [how to work with snapshots](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-work-with-snapshots),
|
||||
[snapshot troubleshooting](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#snapshot-troubleshooting) and [vmbackup troubleshooting](https://docs.victoriametrics.com/victoriametrics/vmbackup/#troubleshooting).
|
||||
|
||||
- Under normal conditions the size of `<-storageDataPath>/indexdb` folder must be smaller than the size of `<-storageDataPath>/data` folder, where `-storageDataPath`
|
||||
- Under normal conditions, the size of `<-storageDataPath>/indexdb` folder must be smaller than the size of `<-storageDataPath>/data` folder, where `-storageDataPath`
|
||||
is the corresponding command-line flag value. This can be checked by the following query if [VictoriaMetrics monitoring](#monitoring) is properly set up:
|
||||
|
||||
```metricsql
|
||||
@@ -476,22 +457,22 @@ at [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cl
|
||||
sum(vm_data_size_bytes{type=~"(storage|indexdb)/.+"}) without(type)
|
||||
```
|
||||
|
||||
If this query returns values bigger than 0.5, then it is likely there is a [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate) issue,
|
||||
that results in excess disk space usage for both `indexdb` and `data` folders under `-storageDataPath` folder.
|
||||
The solution is to identify and fix the source of high churn rate with [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer).
|
||||
If this query returns values greater than 0.5, then it is likely there is a [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate) issue,
|
||||
that results in excess disk space usage for both the `indexdb` and `data` folders under the `-storageDataPath` folder.
|
||||
The solution is to identify and fix the source of the high churn rate with the [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer).
|
||||
|
||||
## Monitoring
|
||||
|
||||
Having proper [monitoring](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
|
||||
would help identify and prevent most of the issues listed above.
|
||||
can help identify and prevent most of the issues listed above.
|
||||
|
||||
[Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards) contain panels reflecting the
|
||||
health state, resource usage and other specific metrics for VictoriaMetrics components.
|
||||
health state, resource usage, and other specific metrics for VictoriaMetrics components.
|
||||
|
||||
The list of [recommended alerting rules](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts)
|
||||
for VictoriaMetrics components will notify about issues and provide recommendations for how to solve them.
|
||||
Check the list of [recommended alerting rules](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts)
|
||||
for VictoriaMetrics components to receive notifications about issues and receive recommendations for resolving them.
|
||||
|
||||
Internally, we heavily rely both on dashboards and alerts, and constantly improve them.
|
||||
Internally, we rely heavily on both dashboards and alerts, and we constantly improve them.
|
||||
It is important to stay up to date with such changes.
|
||||
|
||||
|
||||
@@ -500,11 +481,12 @@ It is important to stay up to date with such changes.
|
||||
On some ZFS filesystems, mixing reads from memory-mapped files (`mmap`) with usage of the `mincore()` syscall can trigger a bug in the ZFS in-memory cache (ARC), potentially resulting in **data read corruption** in VictoriaMetrics processes. This scenario has been observed when VictoriaMetrics instances access data directories on ZFS.
|
||||
|
||||
Symptoms:
|
||||
Note that the source code for the VictoriaMetrics cluster is located in [the cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) branch.
|
||||
- Unexpected read errors when accessing data on ZFS.
|
||||
- Corrupted or inconsistent query results.
|
||||
- Crashes or panics in storage/query components when reading from ZFS.
|
||||
|
||||
It could be mitigated with `--fs.disableMincore` flag:
|
||||
It could be mitigated with the `--fs.disableMincore` flag:
|
||||
|
||||
```text
|
||||
./bin/victoria-metrics --storageDataPath /path/to/zfs/data --fs.disableMincore
|
||||
|
||||
@@ -26,15 +26,46 @@ See also [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-rel
|
||||
|
||||
## tip
|
||||
|
||||
* FEATURE: all VictoriaMetrics components: implement proper CORS preflight handling by responding 204 No Content to HTTP OPTIONS requests. See [#5563](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5563).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): add `access_log` configuration option for each user that will log requests to stdout, and support filtering by HTTP status codes. See more in [docs](https://docs.victoriametrics.com/victoriametrics/vmauth/#access-log). See [#5936](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5936).
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): support negative values for the group `eval_offset` option, which allows starting group evaluation at `groupInterval-abs(eval_offset)` within `[0...groupInterval]`. See [#10424](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10424).
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): Disable `/graphite/tags/tagSeries` and `/graphite/tags/tagMultiSeries` for Graphite tag registration since it is unlikely it is used in context of VictoriaMetrics. See [10544](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10544).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): enhance metrics relabel debug by adding remote write relabel configs to the relabel configs input. See [#9918](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9918).
|
||||
|
||||
* BUGFIX: [dashboards/vmauth](https://grafana.com/grafana/dashboards/21394): fix `requested from system` and `heap inuse` expressions in the memory usage panel. See [#10574](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10574).
|
||||
* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup/), [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): do not enable ACL when uploading backups to S3-compatible endpoints by default. ACL is not always supported by S3-compatible endpoints and it is not recommended to use ACLs to limit access to objects. See [#10539](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10539) for more details.
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly attach `host` label to the time series ingested via [/datadog/api/beta/sketches](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/#) API. See [#10557](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10557).
|
||||
|
||||
## [v1.137.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.137.0)
|
||||
|
||||
Released at 2026-02-27
|
||||
|
||||
**Update Note 1:** [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): default value of the flag `-promscrape.dropOriginalLabels` changed from `true` to `false`.
|
||||
It enables back `Discovered targets` debug UI by default.
|
||||
|
||||
* FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup/): can now copy backups between different storage backends, such as from s3 to local disk or gcs to s3. See [#10401](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10401). Thanks to @BenNF for the contribution.
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): add JWT token authentication support with signature verification based on provided `public_keys`. Read more about configuration in [JWT Token auth proxy](https://docs.victoriametrics.com/victoriametrics/vmauth/#jwt-token-auth-proxy) documentation. See [#10445](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10445).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): support dynamic rewriting of upstream URLs and request headers using placeholders populated from JWT `vm_access` claim fields. This allows routing requests to the correct tenant backend without maintaining a separate user config entry per tenant. Read more in [JWT claim-based request templating](https://docs.victoriametrics.com/victoriametrics/vmauth/#jwt-claim-based-request-templating) documentation. See [#10492](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10492).
|
||||
* FEATURE: all VictoriaMetrics components: expose `process_cpu_seconds_total`, `process_resident_memory_bytes`, and other process-level metrics when running on macOS. See [metrics#75](https://github.com/VictoriaMetrics/metrics/issues/75).
|
||||
* FEATURE: [dashboards/vmauth](https://grafana.com/grafana/dashboards/21394): add `Request body buffering duration` panel to the `Troubleshooting` section. This panel shows the time spent buffering incoming client request bodies, helping identify slow client uploads and potential concurrency issues. The panel is only available when `-requestBufferSize` is non-zero. See [#10309](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): reduce CPU and memory usage when `-promscrape.dropOriginalLabels` command-line flag is set. See [#9952](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9952).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): enable [ingestion](https://docs.victoriametrics.com/victoriametrics/vmagent/#metric-metadata) and in-memory [storage](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#metrics-metadata) of metrics metadata by default. Metadata ingestion can be disabled with `-enableMetadata=false`. See [#2974](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974).
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): decode UTF-8 label names in the `label/<name>/values` API according to the Prometheus API specifications. See [#10446](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10446). Thanks to @utrack for the contribution.
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): increase default value for `-storage.minFreeDiskSpaceBytes` flag from 10M to 100M to reduce risk of panics under high ingestion on small disks. See [#9561](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9561).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): improve [InfluxDB ingestion](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/) parsing error message when a closing quote is missing for a quoted field value, by adding a hint that this may be caused by a raw newline (`\n`) inside the quoted field value. See [#10067](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067). Thanks to @hklhai for the contribution.
|
||||
* FEATURE: [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/): add [histogram_fraction](https://docs.victoriametrics.com/victoriametrics/metricsql/#histogram_fraction) function to calculate the fraction of buckets falling between lowerLe and upperLe. See [#5346](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5346).
|
||||
* FEATURE: [dashboards/alert-statistics](https://grafana.com/grafana/dashboards/24553): add `job` and `instance` filters to the `VictoriaMetrics - Alert statistics` dashboard. This allows users running multiple independent [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) instances to filter and analyze alerts statistics per specific instance, making it easier to identify issues in a particular vmalert deployment. See [#10549](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10549).
|
||||
* FEATURE: [dashboards/alert-statistics](https://grafana.com/grafana/dashboards/24553): add a link to a specific alerting rule on the table of firing alerts. See [#10508](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10508). Thanks to @sias32 for the contribution.
|
||||
* FEATURE: [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/rules): use `$externalURL` instead of `localhost` in the alerting rules. This should improve usability of the rules if `$externalURL` is correctly configured, without need to update rules annotations. See [#10508](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10508). Thanks to @sias32 for the contribution.
|
||||
* FEATURE: all VictoriaMetrics components: publish [SPDX](https://spdx.dev/) SBOM attestations for container images on `docker.io` and `quay.io`. See [SECURITY.md](https://docs.victoriametrics.com/victoriametrics/security/) and [#10474](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10474). Thanks to @smuda for the contribution.
|
||||
|
||||
* BUGFIX: all VictoriaMetrics components: return gzip-compressed response instead of zstd-compressed response to the client if `Accept-Encoding` request header contains both `gzip` and `zstd`. This is needed because some clients and proxies improperly handle zstd-compressed responses. See [#10535](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10535).
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): properly check expired client certificate during mTLS requests. See [#10393](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10393).
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent panic `error parsing regexp: expression nests too deeply` triggered by large repetition ranges in regex, for example `{"__name__"=~"a{0,1000}"}`. See [VictoriaLogs#1112](https://github.com/VictoriaMetrics/VictoriaLogs/issues/1112).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): fix escaping for label names with special characters. See [#10485](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10485).
|
||||
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly search tenants for [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) query request. See [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422).
|
||||
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly apply `extra_filters[]` filter when querying `vm_account_id` or `vm_project_id` labels via [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) request for `/api/v1/label/…/values` API. Before, `extra_filters` was ignored. See [#10503](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10503).
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): revert the use of rollup result cache for [instant queries](https://docs.victoriametrics.com/keyConcepts.html#instant-query) that contain [`rate`](https://docs.victoriametrics.com/MetricsQL.html#rate) function with a lookbehind window larger than `-search.minWindowForInstantRollupOptimization`. The cache usage was removed since [v1.132.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.132.0). See [#10098](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10098#issuecomment-3895011084) for more details.
|
||||
|
||||
## [v1.136.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.0)
|
||||
|
||||
@@ -63,6 +94,8 @@ Released at 2026-02-13
|
||||
|
||||
Released at 2026-01-30
|
||||
|
||||
**Update Note 1:** `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): has a bug affecting the `/select/multitenant/*` APIs. Due to an issue in the tenant search logic [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422), these endpoints may return incorrect results. The bug has been fixed and the correction will be included in `v1.137.0`.
|
||||
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): improved scrape size display. Sizes below 1024 bytes are now shown in `B`, and larger sizes are shown as whole `KiB` (rounded up). This prevents confusion where values like 123.456 KiB were interpreted as 123456 KiB, while the actual size was only 123 KiB. See [#10307](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10307).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): allow buffering request bodies before proxying them to backends. This reduces load on backends when processing requests from slow clients such as IoT devices connected to `vmauth` via slow networks. See [#10309](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309) and [request body buffering docs](https://docs.victoriametrics.com/victoriametrics/vmauth/#request-body-buffering).
|
||||
* FEATURE: [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): allow completely disabling scheduled backups by using `-disableScheduledBackups` command-line flag. This is useful to run `vmbackupmanager` only for on-demand backups and restores triggered via API. See [#10364](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10364).
|
||||
@@ -186,6 +219,19 @@ See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/ch
|
||||
|
||||
See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/changelog_2025/#v11230)
|
||||
|
||||
## [v1.122.16](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.16)
|
||||
|
||||
Released at 2026-02-27
|
||||
|
||||
**v1.122.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.122.x line will be supported for at least 12 months since [v1.122.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11220) release**
|
||||
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent panic `error parsing regexp: expression nests too deeply` triggered by large repetition ranges in regex, for example `{"__name__"=~"a{0,1000}"}`. See [VictoriaLogs#1112](https://github.com/VictoriaMetrics/VictoriaLogs/issues/1112).
|
||||
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly search tenants for [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) query request. See [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422).
|
||||
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly apply `extra_filters[]` filter when querying `vm_account_id` or `vm_project_id` labels via [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) request for `/api/v1/label/…/values` API. Before, `extra_filters` was ignored. See [#10503](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10503).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): fix escaping for label names with special characters. See [#10485](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10485).
|
||||
|
||||
## [v1.122.15](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.15)
|
||||
|
||||
Released at 2026-02-13
|
||||
@@ -209,6 +255,8 @@ The v1.122.x line will be supported for at least 12 months since [v1.122.0](http
|
||||
|
||||
Released at 2026-01-30
|
||||
|
||||
**Update Note 1:** `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): has a bug affecting the `/select/multitenant/*` APIs. Due to an issue in the tenant search logic [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422), these endpoints may return incorrect results. The bug has been fixed and the correction will be included in `v1.122.16`.
|
||||
|
||||
**v1.122.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.122.x line will be supported for at least 12 months since [v1.122.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11220) release**
|
||||
@@ -336,6 +384,16 @@ See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/ch
|
||||
|
||||
See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/changelog_2025/#v11110)
|
||||
|
||||
## [v1.110.31](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.110.31)
|
||||
|
||||
Released at 2026-02-27
|
||||
|
||||
**v1.110.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.110.x line will be supported for at least 12 months since [v1.110.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11100) release**
|
||||
|
||||
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly search tenants for [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) query request. See [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422).
|
||||
|
||||
## [v1.110.30](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.110.30)
|
||||
|
||||
Released at 2026-02-13
|
||||
@@ -358,6 +416,8 @@ The v1.110.x line will be supported for at least 12 months since [v1.110.0](http
|
||||
|
||||
Released at 2026-01-30
|
||||
|
||||
**Update Note 1:** `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): has a bug affecting the `/select/multitenant/*` APIs. Due to an issue in the tenant search logic [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422), these endpoints may return incorrect results. The bug has been fixed and the correction will be included in `v1.110.31`.
|
||||
|
||||
**v1.110.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.110.x line will be supported for at least 12 months since [v1.110.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11100) release**
|
||||
|
||||
@@ -425,7 +425,7 @@ The previous behavior can be restored in the following ways:
|
||||
- `WITH (f(window, step, off) = m[window:step] offset off) f(5m, 10s, 1h)` is automatically transformed to `m[5m:10s] offset 1h`
|
||||
Thanks to @lujiajing1126 for the initial idea and [implementation](https://github.com/VictoriaMetrics/metricsql/pull/13). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4025).
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): added a new page with the list of currently running queries. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4598) and [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#active-queries).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add support for data ingestion via [OpenTelemetry protocol](https://opentelemetry.io/docs/reference/specification/metrics/). See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry), [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2424) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add support for data ingestion via [OpenTelemetry protocol](https://opentelemetry.io/docs/reference/specification/metrics/). See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/), [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2424) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow sharding outgoing time series among the configured remote storage systems. This can be useful for building horizontally scalable [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/), when samples for the same time series must be aggregated by the same `vmagent` instance at the second level. See [these docs](https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4637) for details.
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow configuring staleness interval in [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/) config. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4667) for details.
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow specifying a list of [series selectors](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#filtering) inside `if` option of relabeling rules. The corresponding relabeling rule is executed when at least a single series selector matches. See [these docs](https://docs.victoriametrics.com/victoriametrics/relabeling/#relabeling-enhancements).
|
||||
|
||||
@@ -145,7 +145,7 @@ It is recommended upgrading to [v1.107.0](https://docs.victoriametrics.com/victo
|
||||
|
||||
**Update note 1: `-search.maxUniqueTimeseries` limit on `vmselect` can no longer exceed `-search.maxUniqueTimeseries` limit on `vmstorage`. If you don't set this flag at `vmstorage`, then it will be automatically calculated based on available resources. This can result into rejecting expensive read queries if they exceed auto-calculated limit. The limit can be overridden by manually setting `-search.maxUniqueTimeseries` at vmstorage, but for better reliability we recommend sticking to default values. Refer to the CHANGELOG below and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930).**
|
||||
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add support of [exponential histograms](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram) ingested via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry). Such histograms will be automatically converted to [VictoriaMetrics histogram format](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6354).
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add support of [exponential histograms](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram) ingested via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/). Such histograms will be automatically converted to [VictoriaMetrics histogram format](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6354).
|
||||
* FEATURE: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): automatically set `-search.maxUniqueTimeseries` limit based on available memory and `-search.maxConcurrentRequests`. The more memory is available to the process and the lower is `-search.maxConcurrentRequests`, the higher will be `-search.maxUniqueTimeseries` limit. This should protect vmstorage from expensive queries without the need to manually set `-search.maxUniqueTimeseries`. The calculated limit will be printed during process start-up logs and exposed as `vm_search_max_unique_timeseries` metric. Set `-search.maxUniqueTimeseries` manually to override auto calculation. Please note, `-search.maxUniqueTimeseries` on vmselect can't exceed the same name limit on vmstorage, it can only be set to lower values. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930).
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): disable stream processing mode for data [ingested via InfluxDB](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/#influxdb-compatible-agents-such-as-telegraf) HTTP endpoints by default. With this change, the data is processed in batches (see `-influx.maxRequestSize`) and user will get parsing errors immediately as they happen. This also improves users' experience and resiliency against thundering herd problems caused by clients without backoff policies like telegraf. To enable stream mode back, pass HTTP header `Stream-Mode: 1` with each request. For data sent via TCP and UDP (see `-influxListenAddr`) protocols streaming processing remains enabled.
|
||||
* FEATURE: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): set default value for `-search.maxUniqueTimeseries` to `0`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930).
|
||||
@@ -637,7 +637,7 @@ Released at 2024-04-04
|
||||
* FEATURE: [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/): split [explore phase](https://docs.victoriametrics.com/victoriametrics/vmctl/victoriametrics/) in `vm-native` mode by time intervals when [--vm-native-step-interval](https://docs.victoriametrics.com/victoriametrics/vmctl/#using-time-based-chunking-of-migration) is specified. This should reduce probability of exceeding complexity limits for number of selected series during explore phase. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5369).
|
||||
* FEATURE: [vmgateway](https://docs.victoriametrics.com/victoriametrics/vmgateway/): add `-logInvalidAuthTokens` command-line flag, which can be used for logging invalid auth tokens. This is useful for debugging of auth token format issues. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6029).
|
||||
* FEATURE: [graphite](https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#render-api): add support for [aggregateSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregateSeriesLists), [diffSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.diffSeriesLists), [multiplySeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.multiplySeriesLists) and [sumSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.sumSeriesLists) functions. Thanks to @rbizos for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5809).
|
||||
* FEATURE: [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry): add `-opentelemetry.usePrometheusNaming` command-line flag, which can be used for enabling automatic conversion of the ingested metric names and labels into Prometheus-compatible format. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037).
|
||||
* FEATURE: [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/): add `-opentelemetry.usePrometheusNaming` command-line flag, which can be used for enabling automatic conversion of the ingested metric names and labels into Prometheus-compatible format. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037).
|
||||
|
||||
* BUGFIX: prevent from automatic deletion of newly registered time series when it is queried immediately after the addition. The probability of this bug has been increased significantly after [v1.99.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.99.0) because of optimizations related to registering new time series. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959) issue.
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): properly set `Host` header in requests to scrape targets if it is specified via [`headers` option](https://docs.victoriametrics.com/victoriametrics/sd_configs/#http-api-client-options). Thanks to @fholzer for [the bugreport](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969) and [the fix](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970).
|
||||
@@ -1021,7 +1021,7 @@ The v1.97.x line will be supported for at least 12 months since [v1.97.0](https:
|
||||
* BUGFIX: properly return the list of matching label names and label values from [`/api/v1/labels`](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1labels) and [`/api/v1/label/.../values`](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1labelvalues) when the database contains more than `-search.maxUniqueTimeseries` unique [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) on the selected time range. Previously VictoriaMetrics could return `the number of matching timeseries exceeds ...` error in this case. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055).
|
||||
* BUGFIX: properly return errors from [export APIs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-export-time-series). Previously these errors were silently suppressed. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5649).
|
||||
* BUGFIX: [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly return full results when `-search.skipSlowReplicas` command-line flag is passed to `vmselect` and when [vmstorage groups](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#vmstorage-groups-at-vmselect) are in use. Previously partial results could be returned in this case.
|
||||
* BUGFIX: `vminsert`: properly accept samples via [OpenTelemetry data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) when these samples have no [resource attributes](https://opentelemetry.io/docs/instrumentation/go/resources/). Previously such samples were silently skipped.
|
||||
* BUGFIX: `vminsert`: properly accept samples via [OpenTelemetry data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) when these samples have no [resource attributes](https://opentelemetry.io/docs/instrumentation/go/resources/). Previously such samples were silently skipped.
|
||||
* BUGFIX: `vmstorage`: added missing `-inmemoryDataFlushInterval` command-line flag, which was missing in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) after implementing [this feature](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337) in [v1.85.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.85.0).
|
||||
* BUGFIX: `vmstorage`: properly expire `storage/prefetchedMetricIDs` cache. Previously this cache was never expired, so it could grow big under [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate). This could result in increasing CPU load over time.
|
||||
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): check `-external.url` schema when starting vmalert, must be `http` or `https`. Before, alertmanager could reject alert notifications if `-external.url` contained no or wrong schema.
|
||||
@@ -1168,7 +1168,7 @@ The v1.93.x line will be supported for at least 12 months since [v1.93.0](https:
|
||||
|
||||
* SECURITY: upgrade Go builder from Go1.21.5 to Go1.21.6. See [the list of issues addressed in Go1.21.6](https://github.com/golang/go/issues?q=milestone%3AGo1.21.6+label%3ACherryPickApproved).
|
||||
|
||||
* BUGFIX: `vminsert`: properly accept samples via [OpenTelemetry data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) when these samples have no [resource attributes](https://opentelemetry.io/docs/instrumentation/go/resources/). Previously such samples were silently skipped.
|
||||
* BUGFIX: `vminsert`: properly accept samples via [OpenTelemetry data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) when these samples have no [resource attributes](https://opentelemetry.io/docs/instrumentation/go/resources/). Previously such samples were silently skipped.
|
||||
* BUGFIX: `vmstorage`: added missing `-inmemoryDataFlushInterval` command-line flag, which was missing in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) after implementing [this feature](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337) in [v1.85.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.85.0).
|
||||
* BUGFIX: `vmstorage`: properly expire `storage/prefetchedMetricIDs` cache. Previously this cache was never expired, so it could grow big under [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate). This could result in increasing CPU load over time.
|
||||
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): check `-external.url` schema when starting vmalert, must be `http` or `https`. Before, alertmanager could reject alert notifications if `-external.url` contained no or wrong schema.
|
||||
|
||||
@@ -71,7 +71,7 @@ Released at 2026-01-02
|
||||
|
||||
Released at 2025-12-12
|
||||
|
||||
**Known issue: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): may leak memory when ingesting data via the [OpenTelemetry protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry).
|
||||
**Known issue: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): may leak memory when ingesting data via the [OpenTelemetry protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
|
||||
The problem introduced in [293d809](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/293d80910ce14c247e943c63cd19467df5767c3c), and is already fixed in commits [fastjson#18c81211](https://github.com/valyala/fastjson/commit/18c812114b638d460f0fc6d8e2b86b719e171389) and [19009836](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/19009836c704a75a295c11b5d55a171c206646bd).
|
||||
If you rely on OpenTelemetry ingestion, skip this version or [build from master](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-build-from-sources) to avoid the leak.
|
||||
Read [VictoriaLogs#869](https://github.com/VictoriaMetrics/VictoriaLogs/issues/869) for more details.**
|
||||
@@ -86,7 +86,7 @@ Read [VictoriaLogs#869](https://github.com/VictoriaMetrics/VictoriaLogs/issues/8
|
||||
* FEATURE: [dashboards/single](https://grafana.com/grafana/dashboards/10229), [dashboards/cluster](https://grafana.com/grafana/dashboards/11176): add `Memory usage breakdown` panels to `Drilldown` section. These panels help analyze overall memory distribution and diagnose anomalies or leaks. See [#10139](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10139).
|
||||
* FEATURE: [dashboards/single](https://grafana.com/grafana/dashboards/10229), [dashboards/cluster](https://grafana.com/grafana/dashboards/11176): add `Major page faults rate` panels to `Troubleshooting` and `Drilldown` sections. See [#9974](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9974)
|
||||
* FEATURE: [Influx line protocol data ingestion](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/): reduce CPU and memory usage when parsing Influx lines with escaped chars - `,`, `\\`, `=` and ` `. See [#10053](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10053).
|
||||
* FEATURE: [OpenTelemetry data ingestion](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry): reduce CPU usage when parsing metrics received via OpenTelemetry protocol. See [293d809](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/293d80910ce14c247e943c63cd19467df5767c3c).
|
||||
* FEATURE: [OpenTelemetry data ingestion](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/): reduce CPU usage when parsing metrics received via OpenTelemetry protocol. See [293d809](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/293d80910ce14c247e943c63cd19467df5767c3c).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add a warning to active targets panel when `-dropOriginalLabels=true` is set (default), indicating that some debug information may not be available. See [#9901](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9901).
|
||||
* FEATURE: [vmbackup](https://docs.victoriametrics.com/victoriametrics/vmbackup/), [vmrestore](https://docs.victoriametrics.com/victoriametrics/vmrestore/), [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): add support for SSE KMS Key ID and ACL for use with S3-compatible storages. See [#9796](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9796). Thanks to @sylr for the contribution.
|
||||
* FEATURE: `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): improve slowness-based rerouting logic. Now rerouting occurs only for the slowest storage node, and only if the cluster as a whole has enough available capacity to handle the additional load. This prevents unnecessary rerouting when the entire cluster is under heavy load or avoid "rerouting storm". The logic is disabled by default; to enable set `-disableRerouting=false`. See [#9890](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890) for details.
|
||||
@@ -194,7 +194,7 @@ It disables `Discovered targets` debug UI by default.
|
||||
* SECURITY: upgrade base docker image (Alpine) from 3.22.1 to 3.22.2. See [Alpine 3.22.2 release notes](https://www.alpinelinux.org/posts/Alpine-3.19.9-3.20.8-3.21.5-3.22.2-released.html).
|
||||
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add `opentelemetry` format for [kafka](https://docs.victoriametrics.com/victoriametrics/integrations/kafka/#reading-metrics) consumer. See this issue [#9734](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9734) for details.
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): add `-opentelemetry.convertMetricNamesToPrometheus` command-line flag, which can be used for enabling automatic conversion of the ingested metric names into Prometheus-compatible format. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) and this issue [#9830](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): add `-opentelemetry.convertMetricNamesToPrometheus` command-line flag, which can be used for enabling automatic conversion of the ingested metric names into Prometheus-compatible format. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) and this issue [#9830](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): add `-secret.flags` command-line flag to configure flags to be hidden in logs and on `/metrics`. This is useful for protecting sensitive flag values (for example `-remoteWrite.headers`) from being exposed in logs or metrics. See [#6938](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6938). Thank you @truepele for the issue and PR
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): change default value of the flag `-promscrape.dropOriginalLabels` from `false` to `true`. This helps reducing CPU and Memory usage by dropping targets original labels during [service discovery](https://docs.victoriametrics.com/victoriametrics/sd_configs/), but disables [relabel debugging UI](https://docs.victoriametrics.com/victoriametrics/relabeling/#relabel-debugging). This change gives constant improvement on resource usage, and relabel debugging can be turned on on demand. See this issue [#9665](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9665) for details.
|
||||
* FEATURE: add `-fs.maxConcurrency` command-line flag for adjusting the limit on the number of parallel operations with files. This flag may be useful for tuning data ingestion performance on systems with high-latency storage such as NFS or Ceph. This flag can be useful also for reducing Go scheduling latency on systems with small number of CPU cores. See [VictoriaLogs#774](https://github.com/VictoriaMetrics/VictoriaLogs/issues/774).
|
||||
@@ -775,7 +775,7 @@ If you are impacted by this, please upgrade to [v1.114.0](https://github.com/Vic
|
||||
* FEATURE: upgrade Go builder from Go1.23.6 to Go1.24. See [Go1.24 release notes](https://tip.golang.org/doc/go1.24).
|
||||
* FEATURE: provide alternative registry for all VictoriaMetrics components at [Quay.io](https://quay.io/organization/victoriametrics).
|
||||
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add a new flag `--storage.trackMetricNamesStats` and a new HTTP API - `/api/v1/status/metric_names_stats`. It allows to track how frequent ingested [metric names](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#structure-of-a-metric) are used during [querying](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#query-data). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4458) for details and related [docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#track-ingested-metrics-usage)
|
||||
* FEATURE: [data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/): make `KeyValueList`, `ArrayValue` [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) attributes label values compatible with open-telemetry-collector format. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384).
|
||||
* FEATURE: [data ingestion](https://docs.victoriametrics.com/victorialogs/data-ingestion/): make `KeyValueList`, `ArrayValue` [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) attributes label values compatible with open-telemetry-collector format. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384).
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): disallow using [time buckets stats pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-by-time-buckets) in VictoriaLogs rule expressions. Such construction produces meaningless results for [stats query API](https://docs.victoriametrics.com/victorialogs/querying/#querying-log-stats) and may lead to cardinality issues.
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): remove random sleep before a group starts when `eval_offset` is specified, because `eval_offset` already disperses the group evaluation time, serving the same purpose as the random sleep. This change also enables chaining groups, see [this doc](https://docs.victoriametrics.com/victoriametrics/vmalert/#chaining-groups) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/860).
|
||||
* FEATURE: [vmalert-tool](https://docs.victoriametrics.com/victoriametrics/vmalert-tool/): add command-line flag `-httpListenPort` to specify the port used during testing. If not provided, a random unoccupied port will be assigned. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8393).
|
||||
@@ -1262,7 +1262,7 @@ Released at 2025-01-24
|
||||
* BUGFIX: [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent panic when `vmselect` receives an error response from `vmstorage` during the query execution and request processing for other `vmstorage` nodes is still in progress. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8114) for the details.
|
||||
* BUGFIX: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) components: properly trim whitespaces at the end of license provided via `-license` and `-licenseFile` command-line flags. Previously, the trailing whitespaces could cause the license verification to fail.
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): respect staleness detection in increase, increase_pure and delta functions when time series has gaps and `-search.maxStalenessInterval` is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072) for details.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) in the same way as Prometheus does.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) in the same way as Prometheus does.
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): fix an issue where pressing the "Enter" key in the query editor did not execute the query. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8058).
|
||||
* BUGFIX: [export API](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-export-time-series): cancel export process on client connection close. Previously client connection close was ignored and VictoriaMetrics started to hog CPU by exporting metrics to nowhere until it export all of them.
|
||||
|
||||
@@ -1272,7 +1272,7 @@ Released at 2025-01-17
|
||||
|
||||
* SECURITY: upgrade base docker image (Alpine) from 3.21.0 to 3.21.2. See [Alpine 3.21.1 release notes](https://alpinelinux.org/posts/Alpine-3.21.1-released.html) and [Alpine 3.21.2 release notes](https://alpinelinux.org/posts/Alpine-3.18.11-3.19.6-3.20.5-3.21.2-released.html).
|
||||
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix incorrect behavior of increase, increase_pure, delta caused by [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002). This fix reverts to the previous behavior before [v1.109.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11090). But allows controlling staleness detection for these functions explicitly via `-search.maxStalenessInterval`.
|
||||
* BUGFIX: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) components: remove unnecessary delay before failing if all online verification attempts have failed. This should reduce the time required for the component to proceed if all online verification attempts have failed.
|
||||
* BUGFIX: [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent panic when sending `multitenant` [read request](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels) to `/api/v1/series/count`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8126) for the details.
|
||||
@@ -1529,7 +1529,7 @@ Released at 2025-01-28
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.102.x line will be supported for at least 12 months since [v1.102.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11020) release**
|
||||
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): respect staleness detection in increase, increase_pure and delta functions when time series has gaps and `-search.maxStalenessInterval` is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072) for details.
|
||||
|
||||
## [v1.102.11](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.102.11)
|
||||
@@ -1548,7 +1548,7 @@ The v1.102.x line will be supported for at least 12 months since [v1.102.0](http
|
||||
* BUGFIX: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) components: properly trim whitespaces at the end of license provided via `-license` and `-licenseFile` command-line flags. Previously, the trailing whitespaces could cause the license verification to fail.
|
||||
* BUGFIX: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) components: remove unnecessary delay before failing if all online verification attempts have failed. This should reduce the time required for the component to proceed if all online verification attempts have failed.
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): don't take into account the last raw sample before the lookbehind window is sample exceeds the staleness interval. This affects correctness of increase, increase_pure, delta functions when performing calculations on time series with gaps. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002) for details.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) in the same way as Prometheus does.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) in the same way as Prometheus does.
|
||||
|
||||
## [v1.102.10](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.102.10)
|
||||
|
||||
@@ -1642,7 +1642,7 @@ Released at 2025-01-28
|
||||
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
|
||||
The v1.97.x line will be supported for at least 12 months since [v1.97.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v1970) release**
|
||||
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): log metric names for signals with unsupported delta temporality on ingestion via [OpenTelemetry protocol for metrics](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/). Thanks to @chenlujjj for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8018).
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): respect staleness detection in increase, increase_pure and delta functions when time series has gaps and `-search.maxStalenessInterval` is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072) for details.
|
||||
|
||||
## [v1.97.16](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.16)
|
||||
@@ -1657,7 +1657,7 @@ The v1.97.x line will be supported for at least 12 months since [v1.97.0](https:
|
||||
|
||||
* BUGFIX: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) components: remove unnecessary delay before failing if all online verification attempts have failed. This should reduce the time required for the component to proceed if all online verification attempts have failed.
|
||||
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and [vmselect](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): don't take into account the last raw sample before the lookbehind window is sample exceeds the staleness interval. This affects correctness of increase, increase_pure, delta functions when performing calculations on time series with gaps. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002) for details.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry) in the same way as Prometheus does.
|
||||
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): allow ingesting histograms with missing `_sum` metric via [OpenTelemetry ingestion protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) in the same way as Prometheus does.
|
||||
|
||||
## [v1.97.15](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.15)
|
||||
|
||||
|
||||
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: OpenTelemetry Collector
|
||||
weight: 7
|
||||
menu:
|
||||
docs:
|
||||
identifier: "opentelemetry-collector"
|
||||
parent: "data-ingestion"
|
||||
weight: 7
|
||||
tags:
|
||||
- metrics
|
||||
---
|
||||
|
||||
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) is a vendor-agnostic agent for receiving, processing, and exporting telemetry data.
|
||||
VictoriaMetrics supports the [OTLP metrics protocol](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) natively,
|
||||
so the collector can push metrics directly using the `otlphttp` exporter.
|
||||
|
||||
Use the following exporter configuration:
|
||||
|
||||
```yaml
|
||||
exporters:
|
||||
otlphttp/victoriametrics:
|
||||
compression: gzip
|
||||
encoding: proto
|
||||
metrics_endpoint: http://<vmsinle>:8428/opentelemetry/v1/metrics
|
||||
```
|
||||
|
||||
> For the [cluster version](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format) specify the tenant ID:
|
||||
> `http://<vminsert>:8480/insert/<accountID>/opentelemetry/v1/metrics`.
|
||||
> See more about [multitenancy](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy).
|
||||
|
||||
Add the exporter to the desired service pipeline to activate it:
|
||||
|
||||
```yaml
|
||||
service:
|
||||
pipelines:
|
||||
metrics:
|
||||
exporters:
|
||||
- otlphttp/victoriametrics
|
||||
receivers:
|
||||
- otlp
|
||||
```
|
||||
|
||||
See [OpenTelemetry integration](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) for details on metric naming and histogram conversion.
|
||||
@@ -22,10 +22,11 @@ so the urls in the rest of the documentation will look like `https://<victoriame
|
||||
## Documented Collectors/Agents
|
||||
|
||||
- [Telegraf](https://docs.victoriametrics.com/victoriametrics/data-ingestion/telegraf/)
|
||||
- [Vector](https://docs.victoriametrics.com/victoriametrics/data-ingestion/vector/)
|
||||
- [vmagent](https://docs.victoriametrics.com/victoriametrics/data-ingestion/vmagent/)
|
||||
- [Vector](https://docs.victoriametrics.com/victoriametrics/data-ingestion/vector/)
|
||||
- [Grafana Alloy](https://docs.victoriametrics.com/victoriametrics/data-ingestion/alloy/)
|
||||
- [Prometheus](https://docs.victoriametrics.com/victoriametrics/integrations/prometheus/)
|
||||
- [OpenTelemetry Collector](https://docs.victoriametrics.com/victoriametrics/data-ingestion/opentelemetry-collector/)
|
||||
|
||||
|
||||
## Supported Platforms
|
||||
|
||||
@@ -117,7 +117,7 @@ It is allowed to run VictoriaMetrics and VictoriaLogs Enterprise components in [
|
||||
|
||||
Binary releases of Enterprise components are available at [the releases page for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
|
||||
and [the releases page for VictoriaLogs](https://github.com/VictoriaMetrics/VictoriaLogs/releases/latest).
|
||||
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.136.0-enterprise.tar.gz`.
|
||||
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.137.0-enterprise.tar.gz`.
|
||||
|
||||
In order to run binary release of Enterprise component, please download the `*-enterprise.tar.gz` archive for your OS and architecture
|
||||
from the corresponding releases page and unpack it. Then run the unpacked binary.
|
||||
@@ -135,8 +135,8 @@ For example, the following command runs VictoriaMetrics Enterprise binary with t
|
||||
obtained at [this page](https://victoriametrics.com/products/enterprise/trial/):
|
||||
|
||||
```sh
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.136.0/victoria-metrics-linux-amd64-v1.136.0-enterprise.tar.gz
|
||||
tar -xzf victoria-metrics-linux-amd64-v1.136.0-enterprise.tar.gz
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.137.0/victoria-metrics-linux-amd64-v1.137.0-enterprise.tar.gz
|
||||
tar -xzf victoria-metrics-linux-amd64-v1.137.0-enterprise.tar.gz
|
||||
./victoria-metrics-prod -license=BASE64_ENCODED_LICENSE_KEY
|
||||
```
|
||||
|
||||
@@ -151,7 +151,7 @@ Alternatively, VictoriaMetrics Enterprise license can be stored in the file and
|
||||
It is allowed to run VictoriaMetrics and VictoriaLogs Enterprise components in [cases listed here](#valid-cases-for-victoriametrics-enterprise).
|
||||
|
||||
Docker images for Enterprise components are available at [VictoriaMetrics Docker Hub](https://hub.docker.com/u/victoriametrics) and [VictoriaMetrics Quay](https://quay.io/organization/victoriametrics).
|
||||
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.136.0-enterprise`.
|
||||
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.137.0-enterprise`.
|
||||
|
||||
In order to run Docker image of VictoriaMetrics Enterprise component, it is required to provide the license key via the command-line
|
||||
flag as described in the [binary-releases](#binary-releases) section.
|
||||
@@ -161,13 +161,13 @@ Enterprise license key can be obtained at [this page](https://victoriametrics.co
|
||||
For example, the following command runs VictoriaMetrics Enterprise Docker image with the specified license key:
|
||||
|
||||
```sh
|
||||
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.136.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
|
||||
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.137.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
|
||||
```
|
||||
|
||||
Alternatively, the license code can be stored in the file and then referred via `-licenseFile` command-line flag:
|
||||
|
||||
```sh
|
||||
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.136.0-enterprise -licenseFile=/path/to/vm-license
|
||||
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.137.0-enterprise -licenseFile=/path/to/vm-license
|
||||
```
|
||||
|
||||
Example docker-compose configuration:
|
||||
@@ -177,7 +177,7 @@ version: "3.5"
|
||||
services:
|
||||
victoriametrics:
|
||||
container_name: victoriametrics
|
||||
image: victoriametrics/victoria-metrics:v1.136.0
|
||||
image: victoriametrics/victoria-metrics:v1.137.0
|
||||
ports:
|
||||
- 8428:8428
|
||||
volumes:
|
||||
@@ -209,7 +209,7 @@ is used to provide the license key in plain-text:
|
||||
```yaml
|
||||
server:
|
||||
image:
|
||||
tag: v1.136.0-enterprise
|
||||
tag: v1.137.0-enterprise
|
||||
|
||||
license:
|
||||
key: {BASE64_ENCODED_LICENSE_KEY}
|
||||
@@ -220,7 +220,7 @@ In order to provide the license key via existing secret, the following values fi
|
||||
```yaml
|
||||
server:
|
||||
image:
|
||||
tag: v1.136.0-enterprise
|
||||
tag: v1.137.0-enterprise
|
||||
|
||||
license:
|
||||
secret:
|
||||
@@ -270,7 +270,7 @@ spec:
|
||||
license:
|
||||
key: {BASE64_ENCODED_LICENSE_KEY}
|
||||
image:
|
||||
tag: v1.136.0-enterprise
|
||||
tag: v1.137.0-enterprise
|
||||
```
|
||||
|
||||
In order to provide the license key via an existing secret, the following custom resource is used:
|
||||
@@ -287,7 +287,7 @@ spec:
|
||||
name: vm-license
|
||||
key: license
|
||||
image:
|
||||
tag: v1.136.0-enterprise
|
||||
tag: v1.137.0-enterprise
|
||||
```
|
||||
|
||||
Example secret with license key:
|
||||
@@ -338,7 +338,7 @@ Builds are available for amd64 and arm64 architectures.
|
||||
|
||||
Example archive:
|
||||
|
||||
`victoria-metrics-linux-amd64-v1.136.0-enterprise.tar.gz`
|
||||
`victoria-metrics-linux-amd64-v1.137.0-enterprise.tar.gz`
|
||||
|
||||
Includes:
|
||||
|
||||
@@ -347,7 +347,7 @@ Includes:
|
||||
|
||||
Example Docker image:
|
||||
|
||||
`victoriametrics/victoria-metrics:v1.136.0-enterprise-fips` – uses the FIPS-compatible binary and based on `scratch` image.
|
||||
`victoriametrics/victoria-metrics:v1.137.0-enterprise-fips` – uses the FIPS-compatible binary and based on `scratch` image.
|
||||
|
||||
## Monitoring license expiration
|
||||
|
||||
|
||||
@@ -23,7 +23,9 @@ VictoriaMetrics integrates with many popular monitoring solutions as remote stor
|
||||
* [Google PubSub](https://docs.victoriametrics.com/victoriametrics/integrations/pubsub/) (read, write)
|
||||
* [Kafka](https://docs.victoriametrics.com/victoriametrics/integrations/kafka/) (read, write)
|
||||
* [OpenShift](https://docs.victoriametrics.com/victoriametrics/integrations/openshift/) (read)
|
||||
* [Zabbix Connector](https://docs.victoriametrics.com/victoriametrics/integrations/zabbixconnector/)
|
||||
* [Zabbix Connector](https://docs.victoriametrics.com/victoriametrics/integrations/zabbixconnector/) (write)
|
||||
* [Bindplane](https://docs.victoriametrics.com/victoriametrics/integrations/bindplane/) (write)
|
||||
* [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) (write)
|
||||
|
||||
If you think that community will benefit from new integrations, open a [feature request on GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
|
||||
|
||||
|
||||
BIN
docs/victoriametrics/integrations/bindplane-add-sources.webp
Normal file
|
After Width: | Height: | Size: 53 KiB |
BIN
docs/victoriametrics/integrations/bindplane-library.webp
Normal file
|
After Width: | Height: | Size: 25 KiB |
BIN
docs/victoriametrics/integrations/bindplane-metrics-otel.webp
Normal file
|
After Width: | Height: | Size: 21 KiB |
37
docs/victoriametrics/integrations/bindplane.md
Normal file
@@ -0,0 +1,37 @@
|
||||
---
|
||||
title: Bindplane
|
||||
weight: 12
|
||||
menu:
|
||||
docs:
|
||||
identifier: "integrations-vm-bindplane"
|
||||
parent: "integrations-vm"
|
||||
weight: 12
|
||||
---
|
||||
|
||||
VictoriaMetrics integrates with [Bindplane](https://docs.bindplane.com/) via the [Bindplane application](https://app.bindplane.com/).
|
||||
|
||||
## Setup the destination
|
||||
|
||||
1. Sign up for a Bindplane account.
|
||||
2. Go to Agents and install the agent.
|
||||
3. Go to the Library and Add Destination. Choose VictoriaMetrics.
|
||||
4. Configure hostname, port, and headers.
|
||||
5. Name the destination and click on Save.
|
||||
|
||||

|
||||
|
||||
## Add a configuration
|
||||
|
||||
1. Go to Configurations, create Configuration.
|
||||
2. Give it a name and select the Agent Type and Platform.
|
||||
3. Add your telemetry sources such as OTLP, Prometheus scrape, or cloud services.
|
||||
4. Select the destination.
|
||||
|
||||

|
||||
|
||||
After that Bindplane will start sending metrics to VictoriaMetrics, and you can query them with PromQL/MetricsQL.
|
||||

|
||||
|
||||
You can check the global view in the Library to view the resource type, component type, and configurations.
|
||||
|
||||
For VictoriaLogs with Bindplane integration, check [this page](https://docs.victoriametrics.com/victorialogs/integrations/bindplane/).
|
||||
@@ -113,6 +113,56 @@ curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'ht
|
||||
```
|
||||
|
||||
An arbitrary number of lines delimited by '\n' (aka newline char) can be sent in a single request.
|
||||
> **Note:** The above refers to newline characters (`\n`) used as line separators
|
||||
> between multiple points. According to the [Influx Line Protocol specification](https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/),
|
||||
> raw newline bytes **must not** appear inside quoted tag or field values. For example:
|
||||
>
|
||||
> ```text
|
||||
> SystemProperties.line.separator="
|
||||
> "
|
||||
> ```
|
||||
>
|
||||
> A raw newline inside a quoted value will result in a parsing error.
|
||||
>
|
||||
> If metrics are generated by Telegraf and written to VictoriaMetrics, fields containing raw newline characters
|
||||
> (such as `SystemProperties.line.separator`) can be pre-processed using a `regex` processor to escape newline
|
||||
> characters before serialization. For example:
|
||||
|
||||
<details>
|
||||
<summary><strong>telegraf.conf</strong></summary>
|
||||
|
||||
```toml
|
||||
[agent]
|
||||
interval = "60s"
|
||||
flush_interval = "30s"
|
||||
debug = true
|
||||
|
||||
[[inputs.jolokia2_agent]]
|
||||
urls = ["http://springboot:8080/jolokia"]
|
||||
name_prefix = "springBoot."
|
||||
|
||||
[[inputs.jolokia2_agent.metric]]
|
||||
name = "JavaRuntime"
|
||||
mbean = "java.lang:type=Runtime"
|
||||
paths = ["SystemProperties"]
|
||||
|
||||
[[processors.regex]]
|
||||
namepass = ["springBoot.JavaRuntime"]
|
||||
|
||||
[[processors.regex.fields]]
|
||||
key = "SystemProperties.line.separator"
|
||||
pattern = '\n'
|
||||
replacement = '\\n'
|
||||
|
||||
[[outputs.influxdb]]
|
||||
urls = ["http://victoriametrics:8428"]
|
||||
skip_database_creation = true
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
> This ensures the newline is properly escaped and the metric can be ingested successfully.
|
||||
|
||||
After that the data may be read via [/api/v1/export](https://docs.victoriametrics.com/victoriametrics/#how-to-export-data-in-json-line-format) endpoint:
|
||||
```sh
|
||||
curl -G 'http://<victoriametrics-addr>:8428/api/v1/export' -d 'match={__name__=~"measurement_.*"}'
|
||||
|
||||
@@ -23,7 +23,7 @@ Use `-kafka.consumer.topic.defaultFormat` or `-kafka.consumer.topic.format` comm
|
||||
and [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md).
|
||||
* `graphite` - [Graphite plaintext format](https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol).
|
||||
* `jsonline` - [JSON line format](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-data-in-json-line-format).
|
||||
* `opentelemetry`{{% available_from "v1.128.0" %}} - [Opentelemetry format](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry)
|
||||
* `opentelemetry`{{% available_from "v1.128.0" %}} - [Opentelemetry format](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/)
|
||||
|
||||
For Kafka messages in the `promremotewrite` format, `vmagent` will automatically detect whether they are using [the Prometheus remote write protocol](https://prometheus.io/docs/specs/remote_write_spec/#protocol)
|
||||
or [the VictoriaMetrics remote write protocol](https://docs.victoriametrics.com/victoriametrics/vmagent/#victoriametrics-remote-write-protocol), and handle them accordingly.
|
||||
|
||||
48
docs/victoriametrics/integrations/opentelemetry.md
Normal file
@@ -0,0 +1,48 @@
|
||||
---
|
||||
title: OpenTelemetry
|
||||
weight: 13
|
||||
menu:
|
||||
docs:
|
||||
parent: "integrations-vm"
|
||||
identifier: "integrations-opentelemetry-vm"
|
||||
weight: 13
|
||||
---
|
||||
|
||||
VictoriaMetrics supports data ingestion via [OpenTelemetry protocol (OTLP) for metrics](https://github.com/open-telemetry/opentelemetry-specification/blob/97c826b70e2f89cfdf655d5150791f3f0c2bae19/specification/metrics/data-model.md) at `/opentelemetry/v1/metrics` path.
|
||||
It expects `protobuf`-encoded requests at `/opentelemetry/v1/metrics`. For gzip-compressed workload set HTTP request header `Content-Encoding: gzip`.
|
||||
|
||||
See how to configure [OpenTelemetry Collector](https://docs.victoriametrics.com/victoriametrics/data-ingestion/opentelemetry-collector/) to push metrics to VictoriaMetrics.
|
||||
|
||||
## Label sanitization
|
||||
|
||||
By default, VictoriaMetrics stores the ingested OpenTelemetry [metric points](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#metric-points) as is **without any transformations**.
|
||||
The following label sanitization options can be enabled:
|
||||
* `-usePromCompatibleNaming` - replaces characters unsupported by Prometheus with `_` in metric names and labels **for all ingestion protocols**.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time{service_name="foo"}`.
|
||||
* `-opentelemetry.usePrometheusNaming` - converts metric names and labels according to [OTLP Metric points to Prometheus specification](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.33.0/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus) for metrics ingested via OTLP.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time_seconds_total{service_name="foo"}`.
|
||||
* `-opentelemetry.convertMetricNamesToPrometheus` - converts **only metric names** according to [OTLP Metric points to Prometheus specification](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.33.0/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus) for metrics ingested via OTLP.
|
||||
For example, `process.cpu.time{service.name="foo"}` is converted to `process_cpu_time_seconds_total{service.name="foo"}`. See more about this use case [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9830).
|
||||
|
||||
> These flags can be applied on vmagent, vminsert or VictoriaMetrics single-node.
|
||||
|
||||
## Resource Attributes
|
||||
|
||||
By default, VictoriaMetrics promotes all [OpenTelemetry resource](https://opentelemetry.io/docs/specs/otel/resource/data-model/) attributes to labels and attaches them to all ingested OTLP metrics.
|
||||
|
||||
## Exponential histograms
|
||||
|
||||
OpenTelemetry [exponential histogram](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram) is automatically converted
|
||||
to [VictoriaMetrics histogram format](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) during ingestion. Since VictoriaMetrics histogram doesn't support negative observations, all buckets in the negative range are dropped.
|
||||
|
||||
## Delta Temporality
|
||||
|
||||
In OpenTelemetry, some metric types(including sums, histograms, and exponential histograms) support delta and cumulative aggregation temporality. VictoriaMetrics works best with cumulative temporality, and it's recommended to export metrics with cumulative temporality or convert delta to cumulative temporality using [OpenTelemetry Collector deltatocumulative processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/deltatocumulativeprocessor) before sending to VictoriaMetrics.
|
||||
VictoriaMetrics stores delta temporality metric values as is {{% available_from "v1.132.0" %}}, they can be queried with [sum_over_time()](https://docs.victoriametrics.com/victoriametrics/metricsql/#sum_over_time) and [rate_over_sum()](https://docs.victoriametrics.com/victoriametrics/metricsql/#rate_over_sum).
|
||||
|
||||
> Do not apply [deduplication](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) or [downsampling](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#downsampling) to delta temporality metrics, since it might cause data loss.
|
||||
|
||||
## References
|
||||
|
||||
- See [How to use OpenTelemetry metrics with VictoriaMetrics](https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/).
|
||||
- See more about [OpenTelemetry in VictoriaMetrics](https://docs.victoriametrics.com/opentelemetry/).
|
||||
@@ -35,8 +35,8 @@ scrape_configs:
|
||||
After you created the `scrape.yaml` file, download and unpack [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) to the same directory:
|
||||
|
||||
```sh
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.136.0/victoria-metrics-linux-amd64-v1.136.0.tar.gz
|
||||
tar xzf victoria-metrics-linux-amd64-v1.136.0.tar.gz
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.137.0/victoria-metrics-linux-amd64-v1.137.0.tar.gz
|
||||
tar xzf victoria-metrics-linux-amd64-v1.137.0.tar.gz
|
||||
```
|
||||
|
||||
Then start VictoriaMetrics and instruct it to scrape targets defined in `scrape.yaml` and save scraped metrics
|
||||
@@ -150,8 +150,8 @@ Then start [single-node VictoriaMetrics](https://docs.victoriametrics.com/victor
|
||||
|
||||
```yaml
|
||||
# Download and unpack single-node VictoriaMetrics
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.136.0/victoria-metrics-linux-amd64-v1.136.0.tar.gz
|
||||
tar xzf victoria-metrics-linux-amd64-v1.136.0.tar.gz
|
||||
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.137.0/victoria-metrics-linux-amd64-v1.137.0.tar.gz
|
||||
tar xzf victoria-metrics-linux-amd64-v1.137.0.tar.gz
|
||||
|
||||
# Run single-node VictoriaMetrics with the given scrape.yaml
|
||||
./victoria-metrics-prod -promscrape.config=scrape.yaml
|
||||
|
||||
@@ -213,12 +213,12 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/
|
||||
The maximum size in bytes of a single NewRelic request to /newrelic/infra/v2/metrics/events/bulk
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
|
||||
-opentelemetry.convertMetricNamesToPrometheus
|
||||
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry
|
||||
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
|
||||
-opentelemetry.maxRequestSize size
|
||||
The maximum size in bytes of a single OpenTelemetry request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
|
||||
-opentelemetry.usePrometheusNaming
|
||||
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry
|
||||
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
|
||||
-opentsdbHTTPListenAddr.useProxyProtocol
|
||||
@@ -287,7 +287,7 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/victoriametrics/sd_configs/#dockerswarm_sd_configs for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs (default true)
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs (default false)
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/victoriametrics/sd_configs/#ec2_sd_configs for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
|
||||
@@ -312,7 +312,7 @@ in addition to the pull-based Prometheus-compatible targets' scraping:
|
||||
* DataDog "submit metrics" API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/).
|
||||
* InfluxDB line protocol via `http://<vmagent>:8429/write`. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/).
|
||||
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#ingesting).
|
||||
* OpenTelemetry http API. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry).
|
||||
* OpenTelemetry http API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
|
||||
* NewRelic API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/newrelic/#sending-data-from-agent).
|
||||
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentsdb/).
|
||||
* Zabbix Connector streaming protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/zabbixconnector/#send-data-from-zabbix-connector).
|
||||
@@ -658,7 +658,7 @@ e.g. it sets `scrape_series_added` metric to zero. See [these docs](#automatical
|
||||
|
||||
## Metric metadata
|
||||
|
||||
`vmagent` accepts{{% available_from "#" %}} metric metadata exposed by scrape targets in [Prometheus exposition format](https://github.com/prometheus/docs/blob/main/docs/instrumenting/exposition_formats.md), received via [Prometheus remote write v1](https://prometheus.io/docs/specs/prw/remote_write_spec/) or [OpenTelemetry protocol](https://github.com/open-telemetry/opentelemetry-proto/blob/v1.7.0/opentelemetry/proto/metrics/v1/metrics.proto) by default. Set `-enableMetadata=false` to disable metadata processing{{% available_from "v1.125.1" %}}.
|
||||
`vmagent` accepts{{% available_from "v1.137.0" %}} metric metadata exposed by scrape targets in [Prometheus exposition format](https://github.com/prometheus/docs/blob/main/docs/instrumenting/exposition_formats.md), received via [Prometheus remote write v1](https://prometheus.io/docs/specs/prw/remote_write_spec/) or [OpenTelemetry protocol](https://github.com/open-telemetry/opentelemetry-proto/blob/v1.7.0/opentelemetry/proto/metrics/v1/metrics.proto) by default. Set `-enableMetadata=false` to disable metadata processing{{% available_from "v1.125.1" %}}.
|
||||
During processing, metadata won't be dropped or modified by [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) or [streaming aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/).
|
||||
|
||||
When `-enableMultitenantHandlers` is enabled, vmagent adds tenant info to metadata received via the [multitenant endpoints](https://docs.victoriametrics.com/victoriametrics/vmagent/#multitenancy) (`/insert/<accountID>/<suffix>`). However, if `vm_account_id` or `vm_project_id` labels are added directly to metrics before reaching vmagent, and vmagent writes to the [vminsert multitenant endpoints](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels), the tenant info won't be attached and the metadata will be stored under the default tenant of VictoriaMetrics cluster.
|
||||
|
||||
@@ -181,12 +181,12 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/vmagent/ .
|
||||
The maximum size in bytes of a single NewRelic request to /newrelic/infra/v2/metrics/events/bulk
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
|
||||
-opentelemetry.convertMetricNamesToPrometheus
|
||||
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry
|
||||
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
|
||||
-opentelemetry.maxRequestSize size
|
||||
The maximum size in bytes of a single OpenTelemetry request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
|
||||
-opentelemetry.usePrometheusNaming
|
||||
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#sending-data-via-opentelemetry
|
||||
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
|
||||
-opentsdbHTTPListenAddr.useProxyProtocol
|
||||
@@ -253,7 +253,7 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/vmagent/ .
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/victoriametrics/sd_configs/#dockerswarm_sd_configs for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs (default true)
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs (default false)
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/victoriametrics/sd_configs/#ec2_sd_configs for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
|
||||