### Describe Your Changes
This PR adds support for parsing Unix timestamps (both integer and
float) in the format pipe using the `time:` prefix.
The timestamp precision (seconds, milliseconds, microseconds, or
nanoseconds) is automatically determined based on the value.
It might be worth creating a new prefix for this rather than reusing
`time:`, but I haven't found any compelling reason to extend the syntax.
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
This commit introduces new component - VictoriaLogs Agent (vlagent).
It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
### Describe Your Changes
The `DataBlock` contains structs with string fields, and while the
original comment mentioned not holding references to `br`, it wasn't
immediately clear that this also applies to fields like strings within
the data.
This change clarifies that the `writeBlockResultFunc` must not retain
references to any part of `br`, including its fields. This makes it
explicit that even seemingly safe types like strings must be copied if
needed.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
### Describe Your Changes
This PR adds a call to optimize pipes after the `limit` pipe has been
appended.
Related: #9200
While this approach is not ideal, since it forces us to re-optimize all
pipes, but it is simpler.
An alternative would be to reapply only the relevant optimizations
specifically for this case, something like:
```go
func (q *Query) AddPipeLimit(n uint64) {
if len(q.pipes) > 0 {
ps, ok := q.pipes[len(q.pipes)-1].(*pipeSort)
if ok {
if ps.limit == 0 || n < ps.limit {
ps.limit = n
}
return
}
pu, ok := q.pipes[len(q.pipes)-1].(*pipeUniq)
if ok {
if pu.limit == 0 || n < pu.limit {
pu.limit = n
}
return
}
}
q.pipes = append(q.pipes, &pipeLimit{
limit: n,
})
}
```
### Checklist
The following checks are **mandatory**:
- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
Previously, ForEachRow always reset last row fields after iteration.
It makes impossible concurrent iteration with forEachRow, since
ForEachRow performed hidden mutation of LogRows.
This commit resolves this issue by removal of fields reference.
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076
Such filters can be optimized to `*`. This avoid executing other OR filters.
For example, `foo or * or bar` is optimized to `*`, while `foo` and `bar` filters aren't executed.
Such filters are frequently generated by Grafana, so this should improve query performance there.
Use getCompoundToken() instead of getCompoundFuncArg() for obtaining regexp filter value,
since getCompoundFuncArg() skips trailing '*' chars.
This allows detecting invalid queries in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8582 .
### Describe Your Changes
fixes#8535
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This should simplify working with big number of log fields in LogsQL queries.
Examples:
- `... | keep foo*` leaves only fields starting with `foo` prefix
- `... | rm foo*` removes all the fields starting with `foo` prefix
- `... | mv foo* bar*` replaces `foo` prefix with `bar` prefix in log fields
- `... | sum(foo*)` sums all the log fields starting with `foo` prefix
The DataBlock.GetTimestamps() was returning a slice of strings, which belong to the DataBlock.
These strings are changed whenever the DataBlock is re-used for the next block.
So these strings couldn't be assigned to logRow.timestamp and to tailProcessor.lastTimestamps,
which outlive the DataBlock. The commit aa8c18fc9f5d44091d7ca92be6935eeaf3b85d7f broke this assumption,
which triggered the following bugs:
1. The bug, which could return incorrectly sorted results from /select/logsql/query when the 'limit' query arg is passed to it.
The endpoint must return the last 'limit' log entries on the selected time range in this case, and these log entries
must be sorted by _time.
2. The bug, which could return incorrect results from /select/logsql/tail (e.g. it could incorrectly skip some matching logs,
it could return the same logs multiple times and it could return out-of-order logs without proper sorting by _time).
The solution is to return parsed timestamps from the DataBlock.GetTimestamps() function, so they could be safely
used by the caller without worries that they could be changed while in use.
1. Add `!lex.isEnd()` to prevent an infinite loop. Although the current
code doesn't trigger this bug, it's a latent issue that could occur if
someone modifies the callers or adds new code paths without proper stop
tokens.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8985
When using `AddTimeFilter`, it creates a string representation with the
exact same timestamps but doesn't transform the internal end value. This
is different from the `parseFilterTime` function, which makes the
behavior of these two paths different.
There are three possible cases for blockResultColumn.getValuesEncoded():
- The blockResultColumn.valuesEncoded is already set. This is the case for manually constructed blockResultColumn,
or if it is cloned via blockResult.clone().
- The blockResultColumn.chSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.bs,
by applying br.bm filter.
- The blockResultColumn.cSrc is non-nil. In this case the valuesEncoded must be read from the corresponding br.brSrc,
by applying br.bm filter.
It is better from maintainability and debuggability PoV to write this logic in a single getValuesEncoded() function
instead of indirecting it via valuesEncodedCreator.
VictoriaLogs stores min and max column values per every data block.
These values were incorrectly used by min() and max() stats functions
inside updateStatsForAllRows() function. It was assumed that this function
could use min / max values stored in the block, since all the rows in the blockResult
must be processed. But the blockResult contains _filtered_ rows,
e.g. it may have less rows than the number of rows in the original block.
In this case it is unsafe assuming that the min / max values from the original block
exist in the filtered rows inside blockResult.
Add blockResult.isFull() function, which returns true if the blockResult contains all rows
from the original block (e.g. they aren't filtered). Use this function in fast path,
while fall back to slow path, which triggers reading the column values and iterating over them.
### Describe Your Changes
fixed typos in docs and code
fixed collision in cloud docs
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
This PR fixes two related bugs in the `replace_regexp` pipe:
1. **Infinite loop on empty matches when `limit` is not set**
[#8625](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625)
When a regex pattern like `\d*`, `()`, or `\b` was used, the
implementation could repeatedly match the same zero-width position
without advancing the string, causing unbounded memory usage and
eventual OOM. This is now fixed by collecting all matches up front,
respecting the `limit`, and applying replacements in a single pass.
2. **Incorrect handling of anchors (`^` and `$`)**
The previous implementation applied regex matching to progressively
sliced substrings (`s = s[end:]`), which unintentionally caused anchor
patterns like `^` (start-of-string) to match at every new substring's
start. As a result, patterns that should have matched only once (e.g.,
`^|$`) ended up matching multiple times.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8625
datadb.rb contains logRows shards, which weren't freed up after the data ingestion
for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs
for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards
after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order
to reduce the pressure on GC.
Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts.
Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time.
This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows own-by-one
from concurrently running goroutines, by 2x.
This reverts commit fa6a32a39d.
Reason for revert: the broken tests were fixed on GOARCH=386 by skipping the check for the state size
after improting the state of stats function, since the state size depends on the hardware architecture.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8710
This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts
when small number of log entries are passed to Storage.MustAddRows().
The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1,
up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100.
This should reduce CPU usage during data ingestion when every request contains small number of rows.