Commit Graph

30 Commits

Author SHA1 Message Date
Nikolay
e553a41fa0 app: add vlagent component
This commit introduces new component - VictoriaLogs Agent (vlagent).

It accepts logs data via any data ingestion protocol supported by
VictoriaLogs and forwards it to the provided remote storages.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766
2025-06-30 17:01:05 +02:00
Nikolay
c95990f47f lib/logstorage: properly iterate over ForEachRow (#9222)
Previously, ForEachRow always reset last row fields after iteration.
It makes impossible concurrent iteration with forEachRow, since
ForEachRow performed hidden mutation of LogRows.

 This commit resolves this issue by removal of fields reference.

Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076
2025-06-19 00:24:50 +02:00
Aliaksandr Valialkin
632bab85cf lib/logstorage: move fieldsFilter to lib/prefixfilter in the preparation for its use instead of fieldsSet
While at it, make sure that _msg field name is uniformly treated as an empty field name ("") during data ingestion.
2025-05-12 08:34:07 +02:00
Aliaksandr Valialkin
c8cc2434e0 app/vlinsert: add an ability to remove ANSI color codes during data ingestion
ANSI color codes may break or make hard search and analysis of the ingested logs,
so it is a good idea to drop during data ingestion.
2025-05-08 16:50:30 +02:00
Andrii Chubatiuk
ac414d8b93 docs: fixed typos (#8878)
### Describe Your Changes

fixed typos in docs and code
fixed collision in cloud docs

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/).
2025-05-06 12:03:56 +02:00
Aliaksandr Valialkin
ec6f33f526 lib/logstorage: prevent from slow memory leak at datadb.rb
datadb.rb contains logRows shards, which weren't freed up after the data ingestion
for the given per-day datadb is stopped. This leads to slow memory leak when VictoriaLogs runs
for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards
after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order
to reduce the pressure on GC.
2025-04-26 22:40:32 +02:00
Aliaksandr Valialkin
8ad81220d3 lib/logstorage: increase scalability of datadb.mustAddRows() on hosts with many CPU cores
Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts.
Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time.

This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows own-by-one
from concurrently running goroutines, by 2x.
2025-04-25 19:35:33 +02:00
Aliaksandr Valialkin
5491d54c11 lib/logstorage: buffer the ingested log entries before converting them into searchable parts
This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts
when small number of log entries are passed to Storage.MustAddRows().

The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1,
up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100.

This should reduce CPU usage during data ingestion when every request contains small number of rows.
2025-04-22 13:49:17 +02:00
Andrii Chubatiuk
0fee22e91a lib/logstorage: expect message in a field with empty and _msg name (#8743)
### Describe Your Changes

fixes #8707

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-04-17 19:55:37 +02:00
Aliaksandr Valialkin
7a46af3920 victorialogs: add cluster mode
Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs.
In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag.
It also queries storage nodes during `select` queries.

Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters
and get global querying view.

See https://docs.victoriametrics.com/victorialogs/cluster/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223
2025-04-10 16:55:23 +02:00
Guillem Jover
76d205feae spelling and grammar fixes via codespell (#8497)
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames. 

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards. 
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable. 
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin
336f954056 lib/logstorage: switch the type of LogRows.streamTagCanonicals from [][]byte to []string
This reduces the size of LogRows.streamTagCanonicals by 1/3 because of the eliminated `cap` field
in the slice header (reflect.SliceHeader) compared to the string header (reflect.StringHeader).
2025-03-17 15:02:51 +01:00
Aliaksandr Valialkin
c60b4175bb app/vlinsert: add an ability to ignore log fields starting with the given prefixes
The `ignore_fields` HTTTP query args can contain prefixes ending with '*'.
For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.`
during data ingestion.
2025-03-15 00:03:02 +01:00
Aliaksandr Valialkin
dce5eb88d3 lib/logstorage: remove optimizations from LogRows.sortFieldsInRows
It has been appeared these optimizatios do not give measurable performance improvements,
while they complicate the code too much and may result in slowdown when the ingested logs have
different sets of fields.

This is a follow-up for 630601488e
2025-02-19 12:35:06 +01:00
Aliaksandr Valialkin
630601488e lib/logstorage: LogRows.mustAddInternal a bit
- Re-use column names and values from the previously added rows if possible.
  This increases locality of reference for field names and values, while improving
  access speed for the field names and values.

- Postpone sorting fields in the added rows until creating inmemory part from them.
  This allows optimizing the sorting for log fields with the same set of fields.
  This is usually the case for logs, which belong to the same logs stream.
2025-02-19 01:45:07 +01:00
Aliaksandr Valialkin
95f182053b lib/logstorage: remove unnecesary abstraction - RowsFormatter
It is better to use the AppendFieldsToJSON function directly
instead of hiding it under RowsFormatter abstraction.
2025-01-28 18:03:18 +01:00
Aliaksandr Valialkin
3c036e0d31 lib/logstorage: ignore logs with too long field names during data ingestion
Previously too long field names were silently truncated. This is not what most users expect.
It is better ignoring the whole log entry in this case and logging it with the WARNING message,
so human operator could notice and fix the ingestion of incorrect logs ASAP.

The commit also adds and updates the following entries to VictoriaLogs faq:

- https://docs.victoriametrics.com/victorialogs/faq/#how-many-fields-a-single-log-entry-may-contain
- https://docs.victoriametrics.com/victorialogs/faq/#what-is-the-maximum-supported-field-name-length
- https://docs.victoriametrics.com/victorialogs/faq/#what-length-a-log-record-is-expected-to-have

These entries are referred at `-insert.maxLineSizeBytes` and `-insert.maxFieldsPerLine` command-line descriptions
and at the WARNING messages, which are emitted when log entries are ignored because of some of these limits
are exceeded.
2025-01-28 16:55:48 +01:00
Aliaksandr Valialkin
256924e2d6 lib/logstorage: improve error message by adding a link with the explanation why VictoriaLogs ignores logs with the size exceeding 2MB
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7972
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7984
2025-01-26 22:50:26 +01:00
Aliaksandr Valialkin
cdc0db8ad7 lib/logstorage: properly ignore log fields when they are passed via streamFields arg to LogRows.MustAdd()
Previously streamFields were unconditionally added to log stream fields, even if they were listed in the ignoreFields.
Also do not add extraStreamFields to log stream fields if streamFields is non-nil, since this may confuse users.

This is a follow-up for 17b813ba28

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554
2024-12-04 21:45:06 +01:00
Aliaksandr Valialkin
17b813ba28 app/vlinsert: use default set of log stream fields for Loki and OpenTelemetry protocols if _stream_fields query arg is empty
Loki protocol supports a list of log stream labels - see https://grafana.com/docs/loki/latest/get-started/labels/

OpenTelemetry protocol also supports a list of log stream labels, which are named resource attributes there.
See https://opentelemetry.io/docs/concepts/resources/#semantic-attributes-with-sdk-provided-default-value

Simplify logs' ingestion into VictoriaLogs for these protocols by allowing the data ingestion without
the need to specify _stream_fields query arg or VL-Stream-Fields HTTP header. In this case the upstream log stream fields
are used during data ingestion. The set of log stream fields can be overriden via _stream_fields query arg
and via VL-Stream-Fields HTTP header if needed.

Thanks to @AndrewChubatiuk for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554
2024-12-04 13:57:23 +01:00
Aliaksandr Valialkin
6a71921565 lib/logstorage: ignore logs with too many fields instead of trying to store them
The storage isn't designed to work efficiently with logs containing too many log fields.
It is better to emit a warning to the user and ignore such logs instead of trying to store them.
This will allow fixing the issue by the user ASAP, and won't lead to excess resource usage
at VictoriaLogs side, such as RAM, CPU, disk IO and disk space.

While at it, ignore too long logs with the size exceeding the maximum block size during data ingestion.
This should prevent from possible issues when dealing with such long logs if they were stored in the storage.
Emit a warning in this case, so the user could identify and fix the issue ASAP.

This is a follow-up for 22e6385f56

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
2024-12-04 12:18:34 +01:00
Aliaksandr Valialkin
4478e48eb6 app/vlinsert: implement the ability to add extra fields to the ingested logs
This can be done via extra_fields query arg or via VL-Extra-Fields HTTP header.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7354#issuecomment-2448671445
2024-11-01 20:06:17 +01:00
Aliaksandr Valialkin
2b6a634ec0 lib/logstorage: work-in-progress 2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin
9dbd0f9085 lib/logstorage: initial implementation of pipes in LogsQL
See https://docs.victoriametrics.com/victorialogs/logsql/#pipes
2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin
8dce4eb189 lib/logstorage: follow-up for 94627113db
- Move uniqueFields from rows to blockStreamMerger struct.
  This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(),
  which should improve readability and maintainability of the code.

- Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock,
  since the provided logging didn't provide the solution for the issue with too many columns.
  I couldn't figure out the proper solution, which could be helpful for end user,
  so decided to remove the logging until we find the solution.

This commit also contains the following additional changes:

- It truncates field names longer than 128 chars during logs ingestion.
  This should prevent from ingesting bogus field names.
  This also should prevent from too big columnsHeader blocks,
  which could negatively affect search query performance,
  since columnsHeader is read on every scan of the corresponding data block.

- It limits the maximum length of const column value to 256.
  Longer values are stored in an ordinary columns.
  This helps limiting the size of columnsHeader blocks
  and improving search query performance by avoiding
  reading too long const columns on every scan of the corresponding data block.

- It deduplicates columns with identical names during data ingestion
  and background merging. Previously it was possible to pass columns with duplicate names
  to block.mustInitFromRows(), and they were stored as is in the block.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969
2023-10-02 19:19:08 +02:00
Aliaksandr Valialkin
7b33a27874 lib/logstorage: follow-up for 8a23d08c21
- Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes
  directly inside the Storage.IsReadOnly(). This should work fast in most cases.
  This simplifies the logic at lib/storage.

- Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since
  it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes.
  The background merge logic uses another mechanism for determining whether there is enough
  disk space for the merge - it reserves the needed disk space before the merge
  and releases it after the merge. This prevents from out of disk space errors during background merge.

- Properly handle corner cases for flushing in-memory data to disk when the storage
  enters read-only mode. This is better than losing the in-memory data.

- Return back Storage.MustAddRows() instead of Storage.AddRows(),
  since the only case when AddRows() can return error is when the storage is in read-only mode.
  This case must be handled by the caller by calling Storage.IsReadOnly()
  before adding rows to the storage.
  This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle
  errors returned by Storage.AddRows().

- Properly store parsed logs to Storage if parts of the request contain invalid log lines.
  Previously the parsed logs could be lost in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
2023-10-02 16:52:23 +02:00
Zakhar Bessarab
8a23d08c21 lib/logstorage: switch to read-only mode when running out of disk space (#4945)
* lib/logstorage: switch to read-only mode when running out of disk space

Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix error handling logic during merge

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix log level

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 11:55:38 +02:00
Aliaksandr Valialkin
f548adce0b app/vlinsert/loki: follow-up after 09df5b66fd
- Parse protobuf if Content-Type isn't set to `application/json` - this behavior is documented at https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki

- Properly handle gzip'ped JSON requests. The `gzip` header must be read from `Content-Encoding` instead of `Content-Type` header

- Properly flush all the parsed logs with the explicit call to vlstorage.MustAddRows() at the end of query handler

- Check JSON field types more strictly.

- Allow parsing Loki timestamp as floating-point number. Such a timestamp can be generated by some clients,
  which store timestamps in float64 instead of int64.

- Optimize parsing of Loki labels in Prometheus text exposition format.

- Simplify tests.

- Remove lib/slicesutil, since there are no more users for it.

- Update docs with missing info and fix various typos. For example, it should be enough to have `instance` and `job` labels
  as stream fields in most Loki setups.

- Allow empty of missing timestamps in the ingested logs.
  The current timestamp at VictoriaLogs side is then used for the ingested logs.
  This simplifies debugging and testing of the provided HTTP-based data ingestion APIs.

The remaining MAJOR issue, which needs to be addressed: victoria-logs binary size increased from 13MB to 22MB
after adding support for Loki data ingestion protocol at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4482 .
This is because of shitty protobuf dependencies. They must be replaced with another protobuf implementation
similar to the one used at lib/prompb or lib/prompbmarshal .
2023-07-20 16:48:21 -07:00
Aliaksandr Valialkin
00c3dbd15d app/victoria-logs: add ability to debug data ingestion by passing debug query arg to data ingestion API 2023-06-20 20:02:46 -07:00
Aliaksandr Valialkin
87b66db47d app/victoria-logs: initial code release 2023-06-19 22:55:12 -07:00