- Add a fast path for timestamps ending with 'Z'
- Use strings.LastIndexAny instead of strings.IndexAny for searching
for timezone offset at the end of the string. This works faster
for timestamps with sub-second precision.
(cherry picked from commit 335071cf3d)
Previously columns with negative int64 values were stored either as float64 or string
depending on whether the negative int64 values are bigger or smaller than -2^53.
If the integer values are smaller than -2^53, then they are stored as string, since float64 cannot
hold such values without precision loss. Now such values are stored as int64.
This should improve compression ratio and query performance over columns with negative int64 values.
Previously field values could be automatically converted to float64 with precision loss.
This could lead to unexpected results when querying such field values.
For example, "10007199254740992" was incorrectly represented as 10007199254740993.
This commit prevents from such lossy conversions when storing field values.
While at it, prevent from int64 overflow at tryParseBytes and tryParseDuration functions,
which are used for parsing constants in queries for byte sizes and durations.
Now these functions return 1<<63-1 (the maximum int64 value) for constants exceeding
this value. Previously they could return arbitrary garbage for such constants.
Refer the original byte slice with the marshaled columnsHeader for columns names and dictionary-encoded column values.
This improves query performance a bit when big number of blocks with big number of columns are scanned during the query.
(cherry picked from commit 279e25e7c8)
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.
While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.
Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.
While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.
This improves performance of these functions in hot loops of VictoriaLogs a bit.
- Move uniqueFields from rows to blockStreamMerger struct.
This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(),
which should improve readability and maintainability of the code.
- Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock,
since the provided logging didn't provide the solution for the issue with too many columns.
I couldn't figure out the proper solution, which could be helpful for end user,
so decided to remove the logging until we find the solution.
This commit also contains the following additional changes:
- It truncates field names longer than 128 chars during logs ingestion.
This should prevent from ingesting bogus field names.
This also should prevent from too big columnsHeader blocks,
which could negatively affect search query performance,
since columnsHeader is read on every scan of the corresponding data block.
- It limits the maximum length of const column value to 256.
Longer values are stored in an ordinary columns.
This helps limiting the size of columnsHeader blocks
and improving search query performance by avoiding
reading too long const columns on every scan of the corresponding data block.
- It deduplicates columns with identical names during data ingestion
and background merging. Previously it was possible to pass columns with duplicate names
to block.mustInitFromRows(), and they were stored as is in the block.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969