VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2026-06-28 21:18:23 +03:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	d1d98f39b5	lib/{logstorage,prefixfilter}: remove these packages, since they have been moved to https://github.com/VictoriaMetrics/VictoriaLogs/ repository	2025-07-07 03:25:25 +02:00
Aliaksandr Valialkin	8646b73efc	lib/timeutil: put TryParseUnixTimestamp function here This allows avoiding precision loss at VictoriaLogs query time when parsing fractional unix timestamps with millisecond, microsecond and nanosecond precisions. This is a follow-up for `1d0e96c8d2` See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8767#discussion_r2051657518	2025-07-06 17:17:07 +02:00
Aliaksandr Valialkin	1fa5ecd981	lib/logstorage: optimize tryParseUint64() a bit	2025-07-06 16:31:26 +02:00
Aliaksandr Valialkin	1d0e96c8d2	lib/logstorage: rewrite TryParseUnixTimestamp(), so it doesn't lose precision when parsing fractional timestamps with millisecond, microsecond and nanosecond precision - Optimize TryParseUnixTimestamp() by using custom parsing logic instead of generic strconv.Parse* functions. - Add benchmarks for TryParseUnixTimestamp(). - Add more tests for edge cases with various unix timestamps. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8767#discussion_r2051657518 This is a follow-up for `7294ffcdfc`	2025-07-06 16:10:59 +02:00
Vadim Alekseev	7294ffcdfc	lib/logstorage: add support for parsing Unix timestamp in format pipe (#8767 ) ### Describe Your Changes This PR adds support for parsing Unix timestamps (both integer and float) in the format pipe using the `time:` prefix. The timestamp precision (seconds, milliseconds, microseconds, or nanoseconds) is automatically determined based on the value. It might be worth creating a new prefix for this rather than reusing `time:`, but I haven't found any compelling reason to extend the syntax. ### Checklist The following checks are mandatory: - [X] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2025-07-06 12:33:13 +02:00
Phuong Le	857dd8edec	VictoriaLogs: fix rate_sum() inconsistency between normal queries and vmalert recording rules (#9304 ) Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9303 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2025-07-03 07:25:08 +02:00
Nikolay	e553a41fa0	app: add vlagent component This commit introduces new component - VictoriaLogs Agent (vlagent). It accepts logs data via any data ingestion protocol supported by VictoriaLogs and forwards it to the provided remote storages. Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766	2025-06-30 17:01:05 +02:00
Aliaksandr Valialkin	442bfa6c35	lib/logstorage: add tests, which verify that NaN and Inf values cannot be parsed by tryParseFloat64 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474	2025-06-25 23:00:56 +02:00
Max Kotliar	a845fe815a	lib/logstorage: clarify comment on writeBlockResultFunc usage constraints (#9235 ) ### Describe Your Changes The `DataBlock` contains structs with string fields, and while the original comment mentioned not holding references to `br`, it wasn't immediately clear that this also applies to fields like strings within the data. This change clarifies that the `writeBlockResultFunc` must not retain references to any part of `br`, including its fields. This makes it explicit that even seemingly safe types like strings must be copied if needed. ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).	2025-06-19 16:30:56 +02:00
Aliaksandr Valialkin	eb7b088c91	lib/logstorage: provide standard string representation for all the priority and severity levels in Syslog and Journald protocols inside the "level" field It is better from usability PoV to provide string representation for all the priority and severity levels instead of merging some of them into a common groups. This is requested at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9209 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535	2025-06-19 14:39:20 +02:00
Vadim Alekseev	dc2da9a71b	app/logstorage: optimize pipes after appending limit pipe (#9201 ) ### Describe Your Changes This PR adds a call to optimize pipes after the `limit` pipe has been appended. Related: #9200 This approach is not ideal, since it forces us to re-optimize all pipes, but it is simpler. An alternative would be to reapply only the relevant optimizations specifically for this case, something like: ```go func (q Query) AddPipeLimit(n uint64) { if len(q.pipes) > 0 { ps, ok := q.pipes[len(q.pipes)-1].(pipeSort) if ok { if ps.limit == 0 || n < ps.limit { ps.limit = n } return } pu, ok := q.pipes[len(q.pipes)-1].(pipeUniq) if ok { if pu.limit == 0 || n < pu.limit { pu.limit = n } return } } q.pipes = append(q.pipes, &pipeLimit{ limit: n, }) } ``` ### Checklist The following checks are mandatory: - [X] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).	2025-06-19 00:57:24 +02:00
Nikolay	c95990f47f	lib/logstorage: properly iterate over ForEachRow (#9222 ) Previously, ForEachRow always reset the last row fields after iteration. This made concurrent iteration with ForEachRow impossible, since ForEachRow performed a hidden mutation of LogRows. This commit resolves the issue by removing the fields reference. Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076	2025-06-19 00:24:50 +02:00
Aliaksandr Valialkin	f0442e40a0	lib/logstorage: follow-up for `5d06c74e2b` Move the lex.isQuotedToken() check to the top of the lexer.isInvalidQuotedString() function in order to simplify understanding the code. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9219	2025-06-19 00:19:17 +02:00
Andrii Chubatiuk	5d06c74e2b	lib/logstorage: fix panic when not paired quotes are passed as a pipe value (#9219 ) ### Describe Your Changes nextToken method, which is called prior to getCompoundTokenExt already unquotes string, paired quotes check inside getCompoundTokenExt is redundant fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167 ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist). Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2025-06-19 00:15:44 +02:00
Aliaksandr Valialkin	272a77a9c3	lib/logstorage: optimize OR filters where one of these filters is `*` Such filters can be optimized to `*`. This avoids executing other OR filters. For example, `foo or * or bar` is optimized to `*`, while `foo` and `bar` filters aren't executed. Such filters are frequently generated by Grafana, so this should improve query performance there.	2025-06-18 16:53:35 +02:00
Aliaksandr Valialkin	e72a3fdb67	lib/logstorage: properly parse unquoted regexp filters ending with `` Use getCompoundToken() instead of getCompoundFuncArg() for obtaining regexp filter value, since getCompoundFuncArg() skips trailing '' chars. This allows detecting invalid queries in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8582 .	2025-06-18 16:53:34 +02:00
Aliaksandr Valialkin	ff9cb3f821	app/vlinsert: use string representation of log level at logs ingested into VictoriaLogs via syslog and journald protocols It is better from usability PoV to use string representation for the 'level' log field instead of numeric representation. Remove the -journald.priorityAsLevel and -syslog.severityAsLevel command-line flags, since there are zero practical reasons when the `level` log field shouldn't be initialized automatically. Move the CHANGELOG description for this feature into the correct place at docs/victorialogs/CHANGELOG.md, and make it more human-readable. Document the 'level' log field at https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/ and at https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/ This is a follow-up for `50969ca780` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8553	2025-06-17 10:49:23 +02:00
Andrii Chubatiuk	50969ca780	app/vlinsert: introduced flags, that enable syslog severity and journald priority fields casting to a level field (#8553 ) ### Describe Your Changes fixes #8535 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2025-06-17 08:29:17 +02:00
Aliaksandr Valialkin	ee940e81ec	lib/logstorage: improve performance for isTokenChar() by using 256-byte lookup table This increases performance for the isTokenChar() by up to 30%. Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69	2025-06-09 20:59:37 +02:00
Aliaksandr Valialkin	695532fc8d	lib/logstorage: call isTokenChar() for ascii chars passed to isTokenRune() This improves isTokenRune() performance for ascii chars by up to 30%. Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69	2025-06-09 20:59:37 +02:00
Aliaksandr Valialkin	539498058e	lib/atomicutil: add CacheLineSize const equal to the size of CPU cache line, and use this const for padding against false sharing across the code base This should reduce the waste of memory on the padding from 128 bytes to 64 bytes on GOARCH=amd64, while preserving bigger padding for platforms with bigger cache line sizes. See https://stackoverflow.com/questions/68320687/why-are-most-cache-line-sizes-designed-to-be-64-byte-instead-of-32-128byte-now Thanks to @tIGO for the hint	2025-06-06 10:21:40 +02:00
Aliaksandr Valialkin	1f5d02e059	lib: make sure that frequently updated global counters are padded in order to protect from false sharing issues on multi-CPU systems Go linker packs global variables close to each other in the memory. This may lead to false sharing (https://en.wikipedia.org/wiki/False_sharing) among these variables if frequently updated vars are put close to mostly read-only vars like described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8682 . This commit adds padding to frequently updated global vars. This guarantees that these variables are put into distinct CPU cache lines compared to the rest of global variables. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683#issuecomment-2943254119 Thanks to @tIGO for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683	2025-06-05 11:40:20 +02:00
Aliaksandr Valialkin	f5c9c5bf01	lib/logstorage: allow using prefix filters on log fields in some LogsQL pipes This should simplify working with big number of log fields in LogsQL queries. Examples: - `... | keep foo*` leaves only fields starting with `foo` prefix - `... | rm foo*` removes all the fields starting with `foo` prefix - `... | mv foo* bar*` replaces `foo` prefix with `bar` prefix in log fields - `... | sum(foo*)` sums all the log fields starting with `foo` prefix	2025-06-02 22:41:57 +02:00
Aliaksandr Valialkin	001f9218b1	app/vlselect: properly sort results for /select/logsql/query with `limit` query arg and for /select/logsql/tail The DataBlock.GetTimestamps() was returning a slice of strings, which belong to the DataBlock. These strings are changed whenever the DataBlock is re-used for the next block. So these strings couldn't be assigned to logRow.timestamp and to tailProcessor.lastTimestamps, which outlive the DataBlock. The commit aa8c18fc9f5d44091d7ca92be6935eeaf3b85d7f broke this assumption, which triggered the following bugs: 1. The bug, which could return incorrectly sorted results from /select/logsql/query when the 'limit' query arg is passed to it. The endpoint must return the last 'limit' log entries on the selected time range in this case, and these log entries must be sorted by _time. 2. The bug, which could return incorrect results from /select/logsql/tail (e.g. it could incorrectly skip some matching logs, it could return the same logs multiple times and it could return out-of-order logs without proper sorting by _time). The solution is to return parsed timestamps from the DataBlock.GetTimestamps() function, so they could be safely used by the caller without worries that they could be changed while in use.	2025-06-02 21:34:02 +02:00
Aliaksandr Valialkin	94f3302aca	lib/logstorage: properly handle `stats` pipe in multi-level cluster setup when a vlselect queries another vlselect, which, in turn, queries vlstorage or another vlselect The intermediate `vlselect` should properly proxy the `stats` state from the lower-level nodes to the upper-level `vlselect`. Previously it was finalizing the state instead of proxying it to the upper-level `vlselect`, so the upper-level `vlselect` couldn't read it. Fix this by introducing `proxy` mode for `stats` pipe. This mode accepts state from lower-level node, aggregates the state and then proxies it to the upper node. Thanks to @AndrewChubatiuk for the initial attempt to fix this issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023 . Thanks to @func25 for the idea with introduction of a new `proxy` mode for `stats` pipe at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023/files#r2107735835 , which has been implemented in this commit. This approach results in less code changes comparing to the approach taken at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8815	2025-05-29 20:59:05 +02:00
Aliaksandr Valialkin	1ddfd55e51	docs/victorialogs/logsql-examples.md: add an example how to get duration since the last seen log, which matches the given filter This is a follow-up for `5bb012b67b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9013	2025-05-28 14:04:14 +02:00
Phuong Le	5bb012b67b	logsql: math now() (#9014 ) Resolves https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9013	2025-05-28 13:43:23 +02:00
Phuong Le	78fb987bef	vlstorage: automatically recover missing parts.json files on startup (#9007 ) Fixes [#8873](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8873). Automatically recover missing `parts.json` files on startup. VictoriaLogs now scans existing part directories and recreates missing `parts.json` files instead of crashing. This aligns with VictoriaMetrics' approach. --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2025-05-28 13:19:05 +02:00
Phuong Le	f5ffbb4e00	logsql: Remove redundant suffix logic (#9022 ) 1. Add `!lex.isEnd()` to prevent an infinite loop. Although the current code doesn't trigger this bug, it's a latent issue that could occur if someone modifies the callers or adds new code paths without proper stop tokens.	2025-05-27 14:00:37 +02:00
Phuong Le	d8871f56ba	lib/logstorage/parse: fix incorrect endTime in AddTimeFilter (#8991 ) Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8985 When using `AddTimeFilter`, it creates a string representation with the exact same timestamps but doesn't transform the internal end value. This is different from the `parseFilterTime` function, which makes the behavior of these two paths different.	2025-05-22 16:45:12 +02:00
Aliaksandr Valialkin	632bab85cf	lib/logstorage: move fieldsFilter to lib/prefixfilter in the preparation for its use instead of fieldsSet While at it, make sure that _msg field name is uniformly treated as an empty field name ("") during data ingestion.	2025-05-12 08:34:07 +02:00
Aliaksandr Valialkin	b2bf237466	lib/logstorage: simplify blockResultColumn.getValuesEncoded() a bit by removing columnValuesEncodedCreator and searchValuesEncodedCreator abstractions There are three possible cases for blockResultColumn.getValuesEncoded(): - The blockResultColumn.valuesEncoded is already set. This is the case for manually constructed blockResultColumn, or if it is cloned via blockResult.clone(). - The blockResultColumn.chSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.bs, by applying br.bm filter. - The blockResultColumn.cSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.brSrc, by applying br.bm filter. It is better from maintainability and debuggability PoV to write this logic in a single getValuesEncoded() function instead of indirecting it via valuesEncodedCreator.	2025-05-10 13:12:54 +02:00
Aliaksandr Valialkin	3f379ee5fd	lib/logstorage: avoid reading timestamps when processing "filter" pipe and "if()" conditions at "stats" pipe This should speed up such queries a bit if the timestamps aren't used in such queries.	2025-05-10 12:37:47 +02:00
Aliaksandr Valialkin	208c3fe061	lib/logstorage: fix improper calculation of min / max over numeric columns VictoriaLogs stores min and max column values per every data block. These values were incorrectly used by min() and max() stats functions inside updateStatsForAllRows() function. It was assumed that this function could use min / max values stored in the block, since all the rows in the blockResult must be processed. But the blockResult contains _filtered_ rows, e.g. it may have less rows than the number of rows in the original block. In this case it is unsafe assuming that the min / max values from the original block exist in the filtered rows inside blockResult. Add blockResult.isFull() function, which returns true if the blockResult contains all rows from the original block (e.g. they aren't filtered). Use this function in fast path, while fall back to slow path, which triggers reading the column values and iterating over them.	2025-05-10 03:05:33 +02:00
Aliaksandr Valialkin	378cb83f67	lib/logstorage: treat '_msg' and '' as the same field names inside fieldsFilter The '_msg' and '' field names are interchangeable, so they must be treated the same in filters.	2025-05-09 23:23:35 +02:00
Aliaksandr Valialkin	623c8abf65	lib/logstorage: add `decolorize` pipe to LogsQL for removing ANSI color codes from the given log field	2025-05-08 16:50:32 +02:00
Aliaksandr Valialkin	c8cc2434e0	app/vlinsert: add an ability to remove ANSI color codes during data ingestion ANSI color codes may break or hinder search and analysis of the ingested logs, so it is a good idea to drop them during data ingestion.	2025-05-08 16:50:30 +02:00
Andrii Chubatiuk	ac414d8b93	docs: fixed typos (#8878 ) ### Describe Your Changes fixed typos in docs and code fixed collision in cloud docs ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).	2025-05-06 12:03:56 +02:00
Phuong Le	d9cc16772e	lib/logstorage: fix infinite loop and anchor misbehavior in replace_regexp This PR fixes two related bugs in the `replace_regexp` pipe: 1. Infinite loop on empty matches when `limit` is not set [#8625](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625) When a regex pattern like `\d`, `()`, or `\b` was used, the implementation could repeatedly match the same zero-width position without advancing the string, causing unbounded memory usage and eventual OOM. This is now fixed by collecting all matches up front, respecting the `limit`, and applying replacements in a single pass. 2. Incorrect handling of anchors (`^` and `$`) The previous implementation applied regex matching to progressively sliced substrings (`s = s[end:]`), which unintentionally caused anchor patterns like `^` (start-of-string) to match at every new substring's start. As a result, patterns that should have matched only once (e.g., `^|$`) ended up matching multiple times. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625	2025-05-06 11:17:10 +02:00
Vadim Alekseev	fdf530d6ab	app/vlinsert: better error reporting in the /jsonline handler This commit improves error messages in the /jsonline handler by returning a 400 Bad Request if all the JSON lines were invalid. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8818	2025-05-06 11:11:24 +02:00
Aliaksandr Valialkin	589ee65c83	lib/atomicutil: rename Slice.GetSlice to Slice.All for the sake of better readability	2025-04-30 16:13:57 +02:00
Aliaksandr Valialkin	ec6f33f526	lib/logstorage: prevent from slow memory leak at datadb.rb datadb.rb contains logRows shards, which weren't freed up after the data ingestion for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order to reduce the pressure on GC.	2025-04-26 22:40:32 +02:00
Aliaksandr Valialkin	3b7039679f	lib/logstorage: make golangci-lint happy by substituting unused function arg with _	2025-04-25 23:14:36 +02:00
Aliaksandr Valialkin	8ad81220d3	lib/logstorage: increase scalability of datadb.mustAddRows() on hosts with many CPU cores Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts. Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time. This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows one-by-one from concurrently running goroutines, by 2x.	2025-04-25 19:35:33 +02:00
Aliaksandr Valialkin	7455e6c0a5	lib/logstorage: re-use newTestLogRows() for creating LogRows inside BenchmarkStorageMustAddRows	2025-04-25 19:35:32 +02:00
Aliaksandr Valialkin	0cfe28c2fc	Revert "ci: temporary disable vlogs tests for i386 " This reverts commit `fa6a32a39d`. Reason for revert: the broken tests were fixed on GOARCH=386 by skipping the check for the state size after importing the state of stats function, since the state size depends on the hardware architecture. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710	2025-04-24 17:31:40 +02:00
Aliaksandr Valialkin	46d32af89a	lib/logstorage: add `sample N` for returning a random 1/Nth sample of matching logs	2025-04-22 16:37:07 +02:00
Aliaksandr Valialkin	5491d54c11	lib/logstorage: buffer the ingested log entries before converting them into searchable parts This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts when small number of log entries are passed to Storage.MustAddRows(). The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1, up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100. This should reduce CPU usage during data ingestion when every request contains small number of rows.	2025-04-22 13:49:17 +02:00
Aliaksandr Valialkin	14561a7ed3	lib/logstorage: add a benchmark for different number of rows added to the storage via Storage.MustAddRows()	2025-04-22 13:49:15 +02:00
Andrii Chubatiuk	0fee22e91a	lib/logstorage: expect message in a field with empty and _msg name (#8743 ) ### Describe Your Changes fixes #8707 ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2025-04-17 19:55:37 +02:00
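
Note on the precision loss mentioned in commits `1d0e96c8d2` and `8646b73efc`: the sketch below illustrates why parsing a fractional Unix timestamp through float64 can lose nanosecond digits, and how integer-only parsing avoids it. It is not the repository's `TryParseUnixTimestamp` implementation. The names `parseUnixSecondsToNanos` and `naiveNanos` are hypothetical, and the sketch assumes the input is always in seconds and non-negative, whereas the commits describe automatic detection of the timestamp unit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUnixSecondsToNanos is a hypothetical helper, not the VictoriaMetrics API.
// It converts a non-negative decimal Unix timestamp in seconds, with an optional
// fractional part of up to 9 digits, into integer nanoseconds without using float64.
func parseUnixSecondsToNanos(s string) (int64, error) {
	intPart, fracPart, _ := strings.Cut(s, ".")
	secs, err := strconv.ParseUint(intPart, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse seconds in %q: %w", s, err)
	}
	// Reject values that could overflow int64 after scaling to nanoseconds.
	const maxSecs = (1<<63 - 1) / 1_000_000_000
	if secs >= maxSecs {
		return 0, fmt.Errorf("timestamp %q is too big", s)
	}
	if len(fracPart) > 9 {
		return 0, fmt.Errorf("too many fractional digits in %q", s)
	}
	// Right-pad the fraction to 9 digits, so ".5" means 500000000 ns.
	fracPart += strings.Repeat("0", 9-len(fracPart))
	nanos, err := strconv.ParseUint(fracPart, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse fraction in %q: %w", s, err)
	}
	return int64(secs)*1_000_000_000 + int64(nanos), nil
}

// naiveNanos shows the lossy approach: float64 has a 53-bit mantissa,
// so it cannot hold 19 significant decimal digits exactly.
func naiveNanos(s string) (int64, error) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	return int64(f * 1e9), nil
}

func main() {
	const ts = "1751811067.123456789"
	exact, err := parseUnixSecondsToNanos(ts)
	if err != nil {
		panic(err)
	}
	lossy, err := naiveNanos(ts)
	if err != nil {
		panic(err)
	}
	fmt.Println("integer parsing:", exact)
	fmt.Println("float64 parsing:", lossy)
}
```

The integer path keeps every digit of the fraction, while the float64 path rounds to the nearest representable value before scaling, so the two results differ in the low-order digits for nanosecond-precision inputs.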

388 Commits