VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2026-05-17 08:36:55 +03:00

Author	SHA1	Message	Date
Nikolay	e553a41fa0	app: add vlagent component This commit introduces a new component - VictoriaLogs Agent (vlagent). It accepts log data via any data ingestion protocol supported by VictoriaLogs and forwards it to the provided remote storages. Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8766	2025-06-30 17:01:05 +02:00
Nikolay	c95990f47f	lib/logstorage: properly iterate over ForEachRow (#9222) Previously, ForEachRow always reset the fields of the last row after iteration. This made concurrent iteration with forEachRow impossible, since ForEachRow performed a hidden mutation of LogRows. This commit resolves the issue by removing the fields reference. Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9076	2025-06-19 00:24:50 +02:00
Aliaksandr Valialkin	632bab85cf	lib/logstorage: move fieldsFilter to lib/prefixfilter in preparation for its use instead of fieldsSet While at it, make sure that the _msg field name is uniformly treated as an empty field name ("") during data ingestion.	2025-05-12 08:34:07 +02:00
Aliaksandr Valialkin	c8cc2434e0	app/vlinsert: add an ability to remove ANSI color codes during data ingestion ANSI color codes may break or complicate search and analysis of the ingested logs, so it is a good idea to drop them during data ingestion.	2025-05-08 16:50:30 +02:00
Andrii Chubatiuk	ac414d8b93	docs: fixed typos (#8878) Fixed typos in docs and code. Fixed collision in cloud docs.	2025-05-06 12:03:56 +02:00
Aliaksandr Valialkin	ec6f33f526	lib/logstorage: prevent a slow memory leak at datadb.rb datadb.rb contains logRows shards, which weren't freed up after the data ingestion for the given per-day datadb is stopped. This leads to a slow memory leak when VictoriaLogs runs for multiple days without restarts. Avoid this memory leak by freeing up the logRows shards after converting them to in-memory parts. Re-use the freed up logRows shards via a pool in order to reduce the pressure on GC.	2025-04-26 22:40:32 +02:00
Aliaksandr Valialkin	8ad81220d3	lib/logstorage: increase scalability of datadb.mustAddRows() on hosts with many CPU cores Use multiple independent logRows shards for storing the pending log entries before converting them to searchable parts. Every shard is protected by its own mutex, so multiple CPU cores may add multiple log rows into datadb at the same time. This increases the performance of BenchmarkStorageMustAddRows/rowsPerInsert-1, which ingests log rows one-by-one from concurrently running goroutines, by 2x.	2025-04-25 19:35:33 +02:00
Aliaksandr Valialkin	5491d54c11	lib/logstorage: buffer the ingested log entries before converting them into searchable parts This reduces the overhead needed for converting the ingested log entries to searchable in-memory parts when a small number of log entries is passed to Storage.MustAddRows(). The BenchmarkStorageMustAddRows shows up to 10x performance increase for rowsPerInsert=1, up to 5x performance increase for rowsPerInsert=10 and up to 2x performance increase for rowsPerInsert=100. This should reduce CPU usage during data ingestion when every request contains a small number of rows.	2025-04-22 13:49:17 +02:00
Andrii Chubatiuk	0fee22e91a	lib/logstorage: expect message in a field with empty and _msg name (#8743) Fixes #8707 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2025-04-17 19:55:37 +02:00
Aliaksandr Valialkin	7a46af3920	victorialogs: add cluster mode Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs. In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag. It also queries storage nodes during `select` queries. Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters and get global querying view. See https://docs.victoriametrics.com/victorialogs/cluster/ Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223	2025-04-10 16:55:23 +02:00
Guillem Jover	76d205feae	spelling and grammar fixes via codespell (#8497) Fix many spelling errors and some grammar, including misspellings in filenames. The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`. While this is a breaking change, this metric isn't used in alerts or dashboards. So it seems to have low impact on users. The change also deprecates `cspell` as it is much heavier and less usable. Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com> Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>	2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin	336f954056	lib/logstorage: switch the type of LogRows.streamTagCanonicals from [][]byte to []string This reduces the size of LogRows.streamTagCanonicals by 1/3 because of the eliminated `cap` field in the slice header (reflect.SliceHeader) compared to the string header (reflect.StringHeader).	2025-03-17 15:02:51 +01:00
Aliaksandr Valialkin	c60b4175bb	app/vlinsert: add an ability to ignore log fields starting with the given prefixes The `ignore_fields` HTTP query args can contain prefixes ending with '*'. For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.` during data ingestion.	2025-03-15 00:03:02 +01:00
Aliaksandr Valialkin	dce5eb88d3	lib/logstorage: remove optimizations from LogRows.sortFieldsInRows It turned out that these optimizations do not give measurable performance improvements, while they complicate the code too much and may result in slowdown when the ingested logs have different sets of fields. This is a follow-up for `630601488e`	2025-02-19 12:35:06 +01:00
Aliaksandr Valialkin	630601488e	lib/logstorage: optimize LogRows.mustAddInternal a bit - Re-use column names and values from the previously added rows if possible. This increases locality of reference for field names and values, while improving access speed for the field names and values. - Postpone sorting fields in the added rows until creating an in-memory part from them. This allows optimizing the sorting for log rows with the same set of fields. This is usually the case for logs, which belong to the same log stream.	2025-02-19 01:45:07 +01:00
Aliaksandr Valialkin	95f182053b	lib/logstorage: remove unnecessary abstraction - RowsFormatter It is better to use the AppendFieldsToJSON function directly instead of hiding it under RowsFormatter abstraction.	2025-01-28 18:03:18 +01:00
Aliaksandr Valialkin	3c036e0d31	lib/logstorage: ignore logs with too long field names during data ingestion Previously too long field names were silently truncated. This is not what most users expect. It is better to ignore the whole log entry in this case and log it with a WARNING message, so a human operator can notice and fix the ingestion of incorrect logs ASAP. The commit also adds and updates the following entries in the VictoriaLogs FAQ: - https://docs.victoriametrics.com/victorialogs/faq/#how-many-fields-a-single-log-entry-may-contain - https://docs.victoriametrics.com/victorialogs/faq/#what-is-the-maximum-supported-field-name-length - https://docs.victoriametrics.com/victorialogs/faq/#what-length-a-log-record-is-expected-to-have These entries are referenced in the `-insert.maxLineSizeBytes` and `-insert.maxFieldsPerLine` command-line flag descriptions and in the WARNING messages, which are emitted when log entries are ignored because some of these limits are exceeded.	2025-01-28 16:55:48 +01:00
Aliaksandr Valialkin	256924e2d6	lib/logstorage: improve error message by adding a link with the explanation why VictoriaLogs ignores logs with the size exceeding 2MB Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7972 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7984	2025-01-26 22:50:26 +01:00
Aliaksandr Valialkin	cdc0db8ad7	lib/logstorage: properly ignore log fields when they are passed via streamFields arg to LogRows.MustAdd() Previously streamFields were unconditionally added to log stream fields, even if they were listed in the ignoreFields. Also do not add extraStreamFields to log stream fields if streamFields is non-nil, since this may confuse users. This is a follow-up for `17b813ba28` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554	2024-12-04 21:45:06 +01:00
Aliaksandr Valialkin	17b813ba28	app/vlinsert: use default set of log stream fields for Loki and OpenTelemetry protocols if _stream_fields query arg is empty Loki protocol supports a list of log stream labels - see https://grafana.com/docs/loki/latest/get-started/labels/ OpenTelemetry protocol also supports a list of log stream labels, which are named resource attributes there. See https://opentelemetry.io/docs/concepts/resources/#semantic-attributes-with-sdk-provided-default-value Simplify logs' ingestion into VictoriaLogs for these protocols by allowing the data ingestion without the need to specify _stream_fields query arg or VL-Stream-Fields HTTP header. In this case the upstream log stream fields are used during data ingestion. The set of log stream fields can be overridden via _stream_fields query arg and via VL-Stream-Fields HTTP header if needed. Thanks to @AndrewChubatiuk for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554	2024-12-04 13:57:23 +01:00
Aliaksandr Valialkin	6a71921565	lib/logstorage: ignore logs with too many fields instead of trying to store them The storage isn't designed to work efficiently with logs containing too many log fields. It is better to emit a warning to the user and ignore such logs instead of trying to store them. This allows the user to fix the issue ASAP, and won't lead to excess resource usage at VictoriaLogs side, such as RAM, CPU, disk IO and disk space. While at it, ignore too long logs with the size exceeding the maximum block size during data ingestion. This should prevent possible issues when dealing with such long logs if they were stored in the storage. Emit a warning in this case, so the user can identify and fix the issue ASAP. This is a follow-up for `22e6385f56` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568	2024-12-04 12:18:34 +01:00
Aliaksandr Valialkin	4478e48eb6	app/vlinsert: implement the ability to add extra fields to the ingested logs This can be done via extra_fields query arg or via VL-Extra-Fields HTTP header. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7354#issuecomment-2448671445	2024-11-01 20:06:17 +01:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin	9dbd0f9085	lib/logstorage: initial implementation of pipes in LogsQL See https://docs.victoriametrics.com/victorialogs/logsql/#pipes	2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin	8dce4eb189	lib/logstorage: follow-up for `94627113db` - Move uniqueFields from rows to blockStreamMerger struct. This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(), which should improve readability and maintainability of the code. - Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock, since the provided logging didn't provide the solution for the issue with too many columns. I couldn't figure out the proper solution, which could be helpful for end user, so decided to remove the logging until we find the solution. This commit also contains the following additional changes: - It truncates field names longer than 128 chars during logs ingestion. This should prevent ingesting bogus field names. This should also prevent too big columnsHeader blocks, which could negatively affect search query performance, since columnsHeader is read on every scan of the corresponding data block. - It limits the maximum length of const column value to 256. Longer values are stored in ordinary columns. This helps limit the size of columnsHeader blocks and improves search query performance by avoiding reading too long const columns on every scan of the corresponding data block. - It deduplicates columns with identical names during data ingestion and background merging. Previously it was possible to pass columns with duplicate names to block.mustInitFromRows(), and they were stored as is in the block. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969	2023-10-02 19:19:08 +02:00
Aliaksandr Valialkin	7b33a27874	lib/logstorage: follow-up for `8a23d08c21` - Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes directly inside the Storage.IsReadOnly(). This should work fast in most cases. This simplifies the logic at lib/storage. - Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes. The background merge logic uses another mechanism for determining whether there is enough disk space for the merge - it reserves the needed disk space before the merge and releases it after the merge. This prevents out-of-disk-space errors during background merge. - Properly handle corner cases for flushing in-memory data to disk when the storage enters read-only mode. This is better than losing the in-memory data. - Bring back Storage.MustAddRows() instead of Storage.AddRows(), since the only case when AddRows() can return an error is when the storage is in read-only mode. This case must be handled by the caller by calling Storage.IsReadOnly() before adding rows to the storage. This simplifies the code a bit, since the caller of Storage.MustAddRows() doesn't need to handle errors returned by Storage.AddRows(). - Properly store parsed logs to Storage if parts of the request contain invalid log lines. Previously the parsed logs could be lost in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945	2023-10-02 16:52:23 +02:00
Zakhar Bessarab	8a23d08c21	lib/logstorage: switch to read-only mode when running out of disk space (#4945) * lib/logstorage: switch to read-only mode when running out of disk space Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/logstorage: fix error handling logic during merge Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/logstorage: fix log level Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-29 11:55:38 +02:00
Aliaksandr Valialkin	f548adce0b	app/vlinsert/loki: follow-up after `09df5b66fd` - Parse protobuf if Content-Type isn't set to `application/json` - this behavior is documented at https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki - Properly handle gzip'ped JSON requests. The `gzip` header must be read from `Content-Encoding` instead of `Content-Type` header - Properly flush all the parsed logs with the explicit call to vlstorage.MustAddRows() at the end of query handler - Check JSON field types more strictly. - Allow parsing Loki timestamp as floating-point number. Such a timestamp can be generated by some clients, which store timestamps in float64 instead of int64. - Optimize parsing of Loki labels in Prometheus text exposition format. - Simplify tests. - Remove lib/slicesutil, since there are no more users for it. - Update docs with missing info and fix various typos. For example, it should be enough to have `instance` and `job` labels as stream fields in most Loki setups. - Allow empty or missing timestamps in the ingested logs. The current timestamp at VictoriaLogs side is then used for the ingested logs. This simplifies debugging and testing of the provided HTTP-based data ingestion APIs. The remaining MAJOR issue, which needs to be addressed: victoria-logs binary size increased from 13MB to 22MB after adding support for Loki data ingestion protocol at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4482 . This is because of shitty protobuf dependencies. They must be replaced with another protobuf implementation similar to the one used at lib/prompb or lib/prompbmarshal .	2023-07-20 16:48:21 -07:00
Aliaksandr Valialkin	00c3dbd15d	app/victoria-logs: add ability to debug data ingestion by passing `debug` query arg to data ingestion API	2023-06-20 20:02:46 -07:00
Aliaksandr Valialkin	87b66db47d	app/victoria-logs: initial code release	2023-06-19 22:55:12 -07:00
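
Commit 336f954056 above attributes a one-third size reduction to the missing `cap` field in a string header compared to a slice header. The following is a minimal sketch that checks this arithmetic with `unsafe.Sizeof`; the helper name `headerSizes` is made up for illustration and is not part of lib/logstorage. A slice header holds three machine words (pointer, length, capacity), while a string header holds two (pointer, length), so on a 64-bit platform the sizes should be 24 and 16 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes is a hypothetical helper. It returns the in-memory size of
// a []byte header and of a string header on the current platform.
func headerSizes() (sliceHdr, stringHdr uintptr) {
	var b []byte
	var s string
	return unsafe.Sizeof(b), unsafe.Sizeof(s)
}

func main() {
	sliceHdr, stringHdr := headerSizes()
	fmt.Printf("[]byte header: %d bytes\n", sliceHdr)
	fmt.Printf("string header: %d bytes\n", stringHdr)
	// Every element of a []string saves one machine word
	// compared to the same element of a [][]byte.
	fmt.Printf("saved per element: %d bytes\n", sliceHdr-stringHdr)
}
```

The saving applies only to the headers stored in the outer slice; the bytes the headers point to take the same space in both representations.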

30 Commits
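
Commit 8ad81220d3 describes storing pending log entries in multiple independent shards, each protected by its own mutex. The sketch below illustrates that general technique only. The types `rowsShard` and `shardedRows`, the round-robin shard selection, and the use of plain strings as rows are assumptions made for the example and do not reflect the actual datadb code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rowsShard is a hypothetical stand-in for a single shard of pending rows.
type rowsShard struct {
	mu   sync.Mutex
	rows []string
}

// shardedRows spreads pending rows across independent shards,
// so concurrent writers rarely contend on the same mutex.
type shardedRows struct {
	next   atomic.Uint64
	shards []rowsShard
}

func newShardedRows(n int) *shardedRows {
	return &shardedRows{shards: make([]rowsShard, n)}
}

// add appends the row to a shard picked in round-robin order (an assumption).
func (sr *shardedRows) add(row string) {
	idx := sr.next.Add(1) % uint64(len(sr.shards))
	sh := &sr.shards[idx]
	sh.mu.Lock()
	sh.rows = append(sh.rows, row)
	sh.mu.Unlock()
}

// totalRows returns the number of rows across all shards.
func (sr *shardedRows) totalRows() int {
	n := 0
	for i := range sr.shards {
		sh := &sr.shards[i]
		sh.mu.Lock()
		n += len(sh.rows)
		sh.mu.Unlock()
	}
	return n
}

// addConcurrently adds rows one by one from several goroutines.
func addConcurrently(sr *shardedRows, goroutines, rowsPerGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < rowsPerGoroutine; i++ {
				sr.add("log line")
			}
		}()
	}
	wg.Wait()
}

func main() {
	sr := newShardedRows(4)
	addConcurrently(sr, 8, 1000)
	fmt.Println("rows stored:", sr.totalRows())
}
```

With a single shard every writer would serialize on one mutex; spreading the writes over several shards lets goroutines on different CPU cores append at the same time, which is the effect the commit message reports for one-by-one ingestion.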