Commit Graph

388 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
d1d98f39b5 lib/{logstorage,prefixfilter}: remove these packages, since they have been moved to https://github.com/VictoriaMetrics/VictoriaLogs/ repository 2025-07-07 03:25:25 +02:00
Aliaksandr Valialkin
8646b73efc lib/timeutil: put TryParseUnixTimestamp function here
This allows avoiding precision loss at VictoriaLogs query time when parsing fractional unix timestamps with millisecond, microsecond and nanosecond precisions.

This is a follow-up for 1d0e96c8d2

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8767#discussion_r2051657518
2025-07-06 17:17:07 +02:00
Aliaksandr Valialkin
1fa5ecd981 lib/logstorage: optimize tryParseUint64() a bit 2025-07-06 16:31:26 +02:00
Aliaksandr Valialkin
1d0e96c8d2 lib/logstorage: rewrite TryParseUnixTimestamp(), so it doesnt loose precision when parsing fractional timestamps with millisecond, microsecond and nanosecond precision
- Optimize TryParseUnixTimestamp() by using custom parsing logic instead of generic strconv.Parse* functions.
- Add benchmarks for TryParseUnixTimestamp().
- Add more tests for edge cases with various unix timestamps.

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8767#discussion_r2051657518

This is a follow-up for 7294ffcdfc
2025-07-06 16:10:59 +02:00
Vadim Alekseev
7294ffcdfc lib/logstorage: add support for parsing Unix timestamp in format pipe (#8767)
### Describe Your Changes

This PR adds support for parsing Unix timestamps (both integer and
float) in the format pipe using the `time:` prefix.
The timestamp precision (seconds, milliseconds, microseconds, or
nanoseconds) is automatically determined based on the value.

It might be worth creating a new prefix for this rather than reusing
`time:`, but I haven't found any compelling reason to extend the syntax.

### Checklist

The following checks are **mandatory**:

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2025-07-06 12:33:13 +02:00
Phuong Le
857dd8edec VictoriaLogs: fix rate_sum() inconsistency between normal queries and vmalert recording rules (#9304)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9303

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-07-03 07:25:08 +02:00
Nikolay
e553a41fa0 app: add vlagent component
This commit introduces new component - VictoriaLogs Agent (vlagent).

It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
2025-06-30 17:01:05 +02:00
Aliaksandr Valialkin
442bfa6c35 lib/logstorage: add tests, which verify that NaN and Inf values cannot be parsed by tryParseFloat64
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8474
2025-06-25 23:00:56 +02:00
Max Kotliar
a845fe815a lib/logstorage: clarify comment on writeBlockResultFunc usage constraints (#9235)
### Describe Your Changes

The `DataBlock` contains structs with string fields, and while the
original comment mentioned not holding references to `br`, it wasn't
immediately clear that this also applies to fields like strings within
the data.

This change clarifies that the `writeBlockResultFunc` must not retain
references to any part of `br`, including its fields. This makes it
explicit that even seemingly safe types like strings must be copied if
needed.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 16:30:56 +02:00
Aliaksandr Valialkin
eb7b088c91 lib/logstorage: provide standard string representation for all the priority and severity levels in Syslog and Journald protocols inside the "level" field
It is better from usability PoV to provide string representation for all the priority and severity levels
instead of merging some of them into a common groups.

This is requested at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9209
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535
2025-06-19 14:39:20 +02:00
Vadim Alekseev
dc2da9a71b app/logstorage: optimize pipes after appending limit pipe (#9201)
### Describe Your Changes

This PR adds a call to optimize pipes after the `limit` pipe has been
appended.

Related: #9200

While this approach is not ideal, since it forces us to re-optimize all
pipes, but it is simpler.

An alternative would be to reapply only the relevant optimizations
specifically for this case, something like:

```go
func (q *Query) AddPipeLimit(n uint64) {
	if len(q.pipes) > 0 {
		ps, ok := q.pipes[len(q.pipes)-1].(*pipeSort)
		if ok {
			if ps.limit == 0 || n < ps.limit {
				ps.limit = n
			}
			return
		}
		pu, ok := q.pipes[len(q.pipes)-1].(*pipeUniq)
		if ok {
			if pu.limit == 0 || n < pu.limit {
				pu.limit = n
			}
			return
		}
	}
	q.pipes = append(q.pipes, &pipeLimit{
		limit: n,
	})
}
```

### Checklist

The following checks are **mandatory**:

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
2025-06-19 00:57:24 +02:00
Nikolay
c95990f47f lib/logstorage: properly iterate over ForEachRow (#9222)
Previously, ForEachRow always reset last row fields after iteration.
It makes impossible concurrent iteration with forEachRow, since
ForEachRow performed hidden mutation of LogRows.

 This commit resolves this issue by removal of fields reference.

Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076
2025-06-19 00:24:50 +02:00
Aliaksandr Valialkin
f0442e40a0 lib/logstorage: follow-up for 5d06c74e2b
Move the lex.isQuotedToken() check to the top of the lexer.isInvalidQuotedString() function
in order to simplify understanding the code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9219
2025-06-19 00:19:17 +02:00
Andrii Chubatiuk
5d06c74e2b lib/logstorage: fix panic when not paired quotes are passed as a pipe value (#9219)
### Describe Your Changes

nextToken method, which is called prior to getCompoundTokenExt already
unquotes string, paired quotes check inside getCompoundTokenExt is
redundant
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9167

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-06-19 00:15:44 +02:00
Aliaksandr Valialkin
272a77a9c3 lib/logstorage: optimize OR filters where one of these filters is *
Such filters can be optimized to `*`. This avoid executing other OR filters.
For example, `foo or * or bar` is optimized to `*`, while `foo` and `bar` filters aren't executed.

Such filters are frequently generated by Grafana, so this should improve query performance there.
2025-06-18 16:53:35 +02:00
Aliaksandr Valialkin
e72a3fdb67 lib/logstorage: properly parse unquoted regexp filters ending with *
Use getCompoundToken() instead of getCompoundFuncArg() for obtaining regexp filter value,
since getCompoundFuncArg() skips trailing '*' chars.

This allows detecting invalid queries in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8582 .
2025-06-18 16:53:34 +02:00
Aliaksandr Valialkin
ff9cb3f821 app/vlinsert: use string representation of log level at logs ingested into VictoriaLogs via syslog and journald protocols
It is better from usability PoV to use string representation for the 'level' log field
instead of numeric representation.

Remove the -journald.priorityAsLevel and -syslog.severityAsLevel command-line flags,
since there are zero practical reasons when the `level` log field shouldn't be initialized automatically.

Move the CHANGELOG description for this feature into the correct place at docs/victorialogs/CHANGELOG.md,
and make it more human-readable.

Document the 'level' log field at https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/
and at https://docs.victoriametrics.com/victorialogs/data-ingestion/journald/

This is a follow-up for 50969ca780

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8535
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8553
2025-06-17 10:49:23 +02:00
Andrii Chubatiuk
50969ca780 app/vlinsert: introduced flags, that enable syslog severity and journald priority fields casting to a level field (#8553)
### Describe Your Changes

fixes #8535 

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-06-17 08:29:17 +02:00
Aliaksandr Valialkin
ee940e81ec lib/logstorage: improve performance for isTokenChar() by using 256-byte lookup table
This increases performance for the isTokenChar() by up to 30%.

Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69
2025-06-09 20:59:37 +02:00
Aliaksandr Valialkin
695532fc8d lib/logstorage: call isTokenChar() for ascii chars passed to isTokenRune()
This improves isTokenRune() performance for ascii chars by up to 30%.

Thanks to @ahfuzhang for the initial idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9064/files#diff-27b31ccad49a8ceaf033f97deb3d876d62eab4119374cbb3ae65278e894f6c69
2025-06-09 20:59:37 +02:00
Aliaksandr Valialkin
539498058e lib/atomicutil: add CacheLineSize const equal to the size of CPU cache line, and use this const for padding against false sharing across the code base
This should reduce the waste of memory on the padding from 128 bytes to 64 bytes on GOARCH=amd64,
while preserving bigger padding for platforms with bigger cache line sizes.

See https://stackoverflow.com/questions/68320687/why-are-most-cache-line-sizes-designed-to-be-64-byte-instead-of-32-128byte-now

Thanks to @tIGO for the hint
2025-06-06 10:21:40 +02:00
Aliaksandr Valialkin
1f5d02e059 lib: make sure that frequently updated global counters are padded in order to protect from false sharing issues on multi-CPU systems
Go linker packs global variables close to each other in the memory. This may lead to false sharing (https://en.wikipedia.org/wiki/False_sharing)
among these variables if frequently updated vars are put close to mostly read-only vars like described
at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8682 .

This commit adds padding to frequently updated global vars. This guarantees that these variables are put into distinct CPU cache lines
comparing to the rest of global variables. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683#issuecomment-2943254119

Thanks to @tIGO for the intial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8683
2025-06-05 11:40:20 +02:00
Aliaksandr Valialkin
f5c9c5bf01 lib/logstorage: allow using prefix filters on log fields in some LogsQL pipes
This should simplify working with big number of log fields in LogsQL queries.
Examples:

- `... | keep foo*` leaves only fields starting with `foo` prefix
- `... | rm foo*` removes all the fields starting with `foo` prefix
- `... | mv foo* bar*` replaces `foo` prefix with `bar` prefix in log fields
- `... | sum(foo*)` sums all the log fields starting with `foo` prefix
2025-06-02 22:41:57 +02:00
Aliaksandr Valialkin
001f9218b1 app/vlselect: properly sort results for /select/logsql/query with limit query arg and for /select/logsql/tail
The DataBlock.GetTimestamps() was returning a slice of strings, which belong to the DataBlock.
These strings are changed whenever the DataBlock is re-used for the next block.
So these strings couldn't be assigned to logRow.timestamp and to tailProcessor.lastTimestamps,
which outlive the DataBlock. The commit aa8c18fc9f5d44091d7ca92be6935eeaf3b85d7f broke this assumption,
which triggered the following bugs:

1. The bug, which could return incorrectly sorted results from /select/logsql/query when the 'limit' query arg is passed to it.
   The endpoint must return the last 'limit' log entries on the selected time range in this case, and these log entries
   must be sorted by _time.

2. The bug, which could return incorrect results from /select/logsql/tail (e.g. it could incorrectly skip some matching logs,
   it could return the same logs multiple times and it could return out-of-order logs without proper sorting by _time).

The solution is to return parsed timestamps from the DataBlock.GetTimestamps() function, so they could be safely
used by the caller without worries that they could be changed while in use.
2025-06-02 21:34:02 +02:00
Aliaksandr Valialkin
94f3302aca lib/logstorage: properly handle stats pipe in multi-level cluster setup when a vlselect queries another vlselect, which, in turn, queries vlstorage or another vlselect
The intermediate `vlselect` should properly proxy the `stats` state from the lower-level nodes to the upper-level `vlselect`.
Previously it was finalizing the state instead of proxying it to the upper-level `vlselect, so the upper-level `vlselect`
couldn't read it.

Fix this by introducing `proxy` mode for `stats` pipe. This mode accepts state from lower-level node, aggregates the state
and then proxies it to the upper node.

Thanks to @AndrewChubatiuk for the initial attempt to fix this issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023 .
Thanks to @func25 for the idea with introduction of a new `proxy` mode for `stats` pipe at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023/files#r2107735835 ,
which has been implemented in this commit. This approach results in less code changes comparing to the approach
taken at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9023

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8815
2025-05-29 20:59:05 +02:00
Aliaksandr Valialkin
1ddfd55e51 docs/victorialogs/logsql-examples.md: add an example how to get duration since the last seen log, which matches the given filter
This is a follow-up for 5bb012b67b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9013
2025-05-28 14:04:14 +02:00
Phuong Le
5bb012b67b logsql: math now() (#9014)
Resolves https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9013
2025-05-28 13:43:23 +02:00
Phuong Le
78fb987bef vlstorage: automatically recover missing parts.json files on startup (#9007)
Fixes
[#8873](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8873).

Automatically recover missing `parts.json` files on startup.
VictoriaLogs now scans existing part directories and recreates missing
`parts.json` files instead of crashing. This aligns with
VictoriaMetrics' approach.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-05-28 13:19:05 +02:00
Phuong Le
f5ffbb4e00 logsql: Remove redundant suffix logic (#9022)
1. Add `!lex.isEnd()` to prevent an infinite loop. Although the current
code doesn't trigger this bug, it's a latent issue that could occur if
someone modifies the callers or adds new code paths without proper stop
tokens.
2025-05-27 14:00:37 +02:00
Phuong Le
d8871f56ba lib/logstorage/parse: fix incorrect endTime in AddTimeFilter (#8991)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8985

When using `AddTimeFilter`, it creates a string representation with the
exact same timestamps but doesn't transform the internal end value. This
is different from the `parseFilterTime` function, which makes the
behavior of these two paths different.
2025-05-22 16:45:12 +02:00
Aliaksandr Valialkin
632bab85cf lib/logstorage: move fieldsFilter to lib/prefixfilter in the preparation for its use instead of fieldsSet
While at it, make sure that _msg field name is uniformly treated as an empty field name ("") during data ingestion.
2025-05-12 08:34:07 +02:00
Aliaksandr Valialkin
b2bf237466 lib/logstorage: simplify blockResultColumn.getValuesEncoded() a bit by removing columnValuesEncodedCreator and searchValuesEncodedCreator abstractions
There are three possible cases for blockResultColumn.getValuesEncoded():

- The blockResultColumn.valuesEncoded is already set. This is the case for manually constructed blockResultColumn,
  or if it is cloned via blockResult.clone().

- The blockResultColumn.chSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.bs,
  by applying br.bm filter.

- The blockResultColumn.cSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.brSrc,
  by applying br.bm filter.

It is better from maintainability and debuggability PoV to write this logic in a single getValuesEncoded() function
instead of indirecting it via valuesEncodedCreator.
2025-05-10 13:12:54 +02:00
Aliaksandr Valialkin
3f379ee5fd lib/logstorage: avoid reading timestamps when processing "filter" pipe and "if()" conditions at "stats" pipe
This should speed up such queries a bit if the timestamps isn't used in such a queries.
2025-05-10 12:37:47 +02:00
Aliaksandr Valialkin
208c3fe061 lib/logstorage: fix improper calculation of min / max over numeric columns
VictoriaLogs stores min and max column values per every data block.
These values were incorrectly used by min() and max() stats functions
inside updateStatsForAllRows() function. It was assumed that this function
could use min / max values stored in the block, since all the rows in the blockResult
must be processed. But the blockResult contains _filtered_ rows,
e.g. it may have less rows than the number of rows in the original block.
In this case it is unsafe assuming that the min / max values from the original block
exist in the filtered rows inside blockResult.

Add blockResult.isFull() function, which returns true if the blockResult contains all rows
from the original block (e.g. they aren't filtered). Use this function in fast path,
while fall back to slow path, which triggers reading the column values and iterating over them.
2025-05-10 03:05:33 +02:00
Aliaksandr Valialkin
378cb83f67 lib/logstorage: treat '_msg' and '' as the same field names inside filedsFilter
The '_msg' and '' field names are interchangeable, so they must be treated the same in filters.
2025-05-09 23:23:35 +02:00
Aliaksandr Valialkin
623c8abf65 lib/logstorage: add decolorize pipe to LogsQL for removing ANSI color codes from the given log field 2025-05-08 16:50:32 +02:00
Aliaksandr Valialkin
c8cc2434e0 app/vlinsert: add an ability to remove ANSI color codes during data ingestion
ANSI color codes may break or make hard search and analysis of the ingested logs,
so it is a good idea to drop during data ingestion.
2025-05-08 16:50:30 +02:00
Andrii Chubatiuk
ac414d8b93 docs: fixed typos (#8878)
### Describe Your Changes

fixed typos in docs and code
fixed collision in cloud docs

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
2025-05-06 12:03:56 +02:00
Phuong Le
d9cc16772e lib/logstorage: fix infinite loop and anchor misbehavior in replace_regexp
This PR fixes two related bugs in the `replace_regexp` pipe:


1. **Infinite loop on empty matches when `limit` is not set**
[#8625](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625)
When a regex pattern like `\d*`, `()`, or `\b` was used, the
implementation could repeatedly match the same zero-width position
without advancing the string, causing unbounded memory usage and
eventual OOM. This is now fixed by collecting all matches up front,
respecting the `limit`, and applying replacements in a single pass.

2. **Incorrect handling of anchors (`^` and `$`)**  
The previous implementation applied regex matching to progressively
sliced substrings (`s = s[end:]`), which unintentionally caused anchor
patterns like `^` (start-of-string) to match at every new substring's
start. As a result, patterns that should have matched only once (e.g.,
`^|$`) ended up matching multiple times.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625
2025-05-06 11:17:10 +02:00
Vadim Alekseev
fdf530d6ab app/vlinsert: better error reporting in the /jsonline handler
This commit improves error messages in the /jsonline handler by
returning a 400 Bad Request if all the JSON lines were invalid.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8818
2025-05-06 11:11:24 +02:00
Aliaksandr Valialkin
589ee65c83 lib/atomicutil: rename Slice.GetSlice to Slice.All for the sake of better readability 2025-04-30 16:13:57 +02:00
Aliaksandr Valialkin
ec6f33f526 lib/logstorage: prevent from slow memory leak at datadb.rb
datadb.rb contains logRows shards, which weren't freed up after the data ingestion
for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs
for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards
after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order
to reduce the pressure on GC.
2025-04-26 22:40:32 +02:00
Aliaksandr Valialkin
3b7039679f lib/logstorage: make golangc-lint happy by substituting unused function arg with _ 2025-04-25 23:14:36 +02:00
Aliaksandr Valialkin
8ad81220d3 lib/logstorage: increase scalability of datadb.mustAddRows() on hosts with many CPU cores
Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts.
Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time.

This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows own-by-one
from concurrently running goroutines, by 2x.
2025-04-25 19:35:33 +02:00
Aliaksandr Valialkin
7455e6c0a5 lib/logstorage: re-use newTestLogRows() for creating LogRows inside BenchmarkStorageMustAddRows 2025-04-25 19:35:32 +02:00
Aliaksandr Valialkin
0cfe28c2fc Revert "ci: temporary disable vlogs tests for i386 "
This reverts commit fa6a32a39d.

Reason for revert: the broken tests were fixed on GOARCH=386 by skipping the check for the state size
after improting the state of stats function, since the state size depends on the hardware architecture.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710
2025-04-24 17:31:40 +02:00
Aliaksandr Valialkin
46d32af89a lib/logstorage: add sample N for returning a random 1/Nth sample of matching logs 2025-04-22 16:37:07 +02:00
Aliaksandr Valialkin
5491d54c11 lib/logstorage: buffer the ingested log entries before converting them into searchable parts
This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts
when small number of log entries are passed to Storage.MustAddRows().

The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1,
up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100.

This should reduce CPU usage during data ingestion when every request contains small number of rows.
2025-04-22 13:49:17 +02:00
Aliaksandr Valialkin
14561a7ed3 lib/logstorage: add a benchmark for different number of rows added to the storage via Storage.MustAddRows() 2025-04-22 13:49:15 +02:00
Andrii Chubatiuk
0fee22e91a lib/logstorage: expect message in a field with empty and _msg name (#8743)
### Describe Your Changes

fixes #8707

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-04-17 19:55:37 +02:00