9 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
05943abc11 lib/persistentqueue: run go fix -rangeint 2026-02-18 14:28:31 +01:00
Aliaksandr Valialkin
e35a9a366c all: consistently use sync.WaitGroup.Go() instead of sync.WaitGroup.Add(1) + sync.WaitGroup.Done()
This improves code readability a bit.
2026-01-27 00:29:47 +01:00
Aliaksandr Valialkin
83da33d8cf lib/fs: simplify the code for directory removal and make it compatible with object storage (S3) and NFS
- Drop the code needed for asynchronous removal of the directory on NFS shares.
  This code was needed when VictoriaMetrics could keep open files after their deletion
  or renaming. This is no longer the case after the commit 43b24164ef .
  Now files are deleted only after all the readers close them.
  This updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61

- Unify MustRemoveAll() and MustRemoveDirAtomic() into MustRemoveDir() and MustRemovePath()
  functions:

  - The MustRemoveDir() deletes the given directory with all its contents, in an "atomic" way:
    it creates a special `.delete-this-dir` file in the directory, then removes all its contents
    except for this file, and later removes the `.delete-this-dir` file together with the directory
    itself. This makes it easy to determine whether the given directory needs to be deleted
    after unclean shutdown - if it contains the `.delete-this-dir` file or if it is empty, it must be deleted.
    Add IsPartiallyRemovedDir() function, which can be used for detecting whether the given directory must be removed
    at startup.

    Previously the MustRemoveDirAtomic() was using a "trick" for atomic directory removal: it was "atomically" renaming
    the directory to a temporary directory with `.must-remove.` marker in the directory name, and after that it
    was removing the renamed directory. On startup all the directories with the `.must-remove.` marker were deleted
    if they were left after unclean shutdown. This "trick" doesn't work for NFS and object storage such as S3,
    since these storage systems do not support atomic renaming of directories with multiple entries inside.
    The new MustRemoveDir() function doesn't use this "trick", so it can be safely used in NFS and S3-like storage systems.

    This is based on the pull request from @func25 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9486/files .

  - The MustRemovePath() deletes the given file or an empty directory.

- Delete the existing parts and partitions at startup if they were partially deleted.

- Consistently use fs.MustRemoveDir() and fs.MustRemovePath() instead of os.RemoveAll() across the codebase.
  This reduces the amount of boilerplate code related to error handling.

- Consistently use fs.MustWriteSync() instead of os.WriteFile() across the codebase.
2025-07-25 19:54:03 +02:00
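The marker-file removal scheme described in the commit above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual lib/fs code: the helper names (`beginRemoval`, `finishRemoval`, `isPartiallyRemoved`, `simulateInterruptedRemoval`) are hypothetical, errors are returned instead of being fatal as the `Must*` naming implies, and nothing is fsynced.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// deleteMarker is the marker file name quoted in the commit message.
const deleteMarker = ".delete-this-dir"

// beginRemoval creates the marker file and then deletes every other entry
// in dir. Unlike a production implementation, this sketch does not fsync.
func beginRemoval(dir string) error {
	if err := os.WriteFile(filepath.Join(dir, deleteMarker), nil, 0o644); err != nil {
		return err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if e.Name() == deleteMarker {
			continue
		}
		if err := os.RemoveAll(filepath.Join(dir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

// finishRemoval deletes the marker file and then the now-empty directory.
func finishRemoval(dir string) error {
	if err := os.Remove(filepath.Join(dir, deleteMarker)); err != nil {
		return err
	}
	return os.Remove(dir)
}

// isPartiallyRemoved reports whether dir contains the marker file or is
// empty, which is the startup check described in the commit message.
func isPartiallyRemoved(dir string) (bool, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return false, err
	}
	for _, e := range entries {
		if e.Name() == deleteMarker {
			return true, nil
		}
	}
	return len(entries) == 0, nil
}

// simulateInterruptedRemoval runs the steps on a temporary directory and
// reports what the startup check sees before removal starts and in the
// middle of it, and whether the directory is gone at the end.
func simulateInterruptedRemoval() (before, during, gone bool, err error) {
	dir, err := os.MkdirTemp("", "part-")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)
	if err = os.WriteFile(filepath.Join(dir, "data.bin"), []byte("x"), 0o644); err != nil {
		return
	}
	if before, err = isPartiallyRemoved(dir); err != nil {
		return
	}
	if err = beginRemoval(dir); err != nil {
		return
	}
	// A crash at this point would leave only the marker file behind.
	if during, err = isPartiallyRemoved(dir); err != nil {
		return
	}
	if err = finishRemoval(dir); err != nil {
		return
	}
	_, statErr := os.Stat(dir)
	gone = os.IsNotExist(statErr)
	return
}

func main() {
	fmt.Println(simulateInterruptedRemoval())
}
```

The sketch relies only on creating and deleting individual files, never on renaming a non-empty directory, which is the property the commit message cites as making the approach usable on NFS and S3-like storage.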
Aliaksandr Valialkin
5034aa0773 app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions that return a bool result, in order to improve readability and reduce the probability of missing the check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to persistent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples could be lost in such a case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:09:44 +02:00
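The interplay between the Try* naming, the drop-on-overload option and the two counters listed in the commit above can be sketched as follows. Everything here is a hypothetical stand-in: the `queue` and `pusher` types, their fields and the in-memory capacity check are assumptions for illustration, not vmagent's actual data structures, and the counters only mirror the roles of `vmagent_remotewrite_push_failures_total` and `vmagent_remotewrite_samples_dropped_total`.

```go
package main

import "fmt"

// queue is a hypothetical bounded in-memory queue standing in for the
// persistent queue when on-disk persistence is disabled.
type queue struct {
	maxSamples int // capacity in samples
	pending    int // samples currently queued
}

// tryWrite reports whether n samples fit into the queue. The Try prefix
// signals that the bool result must be checked by the caller.
func (q *queue) tryWrite(n int) bool {
	if q.pending+n > q.maxSamples {
		return false
	}
	q.pending += n
	return true
}

// pusher is a hypothetical wrapper that applies the overload policy.
type pusher struct {
	q              *queue
	dropOnOverload bool // plays the role of -remoteWrite.dropSamplesOnOverload
	samplesDropped int  // plays the role of the samples_dropped_total metric
	pushFailures   int  // plays the role of the push_failures_total metric
}

// tryPush returns false when the caller should report an overload error
// (for example 429 Too Many Requests) to the client. With dropOnOverload
// set, the samples are counted as dropped and the push is reported as
// accepted instead.
func (p *pusher) tryPush(samples []float64) bool {
	if p.q.tryWrite(len(samples)) {
		return true
	}
	p.pushFailures++
	if p.dropOnOverload {
		p.samplesDropped += len(samples)
		return true
	}
	return false
}

func main() {
	p := &pusher{q: &queue{maxSamples: 3}}
	fmt.Println(p.tryPush([]float64{1, 2})) // fits into the queue
	fmt.Println(p.tryPush([]float64{3, 4})) // queue is full, caller must handle it
	p.dropOnOverload = true
	fmt.Println(p.tryPush([]float64{3, 4})) // queue is full, samples are dropped
	fmt.Println(p.pushFailures, p.samplesDropped)
}
```

Whether the real implementation increments the failure counter when samples are dropped is an assumption of this sketch.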
Nikolay
090cb2c9de app/vmagent: allow disabling on-disk persistence (#5088)
* app/vmagent: allow disabling on-disk queue
Previously, it wasn't possible to build a data processing pipeline with a
chain of vmagents. When remoteWrite for the last vmagent in the chain
wasn't accessible, it persisted data only while it had enough disk
capacity. Once the disk queue became full, it started to silently drop
ingested metrics.

The new flags allow disabling on-disk persistence and immediately returning
an error if remoteWrite is not accessible anymore. This blocks any writes
and notifies the client that data ingestion isn't possible.

The main use case for this feature is using an external queue such as Kafka
for data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-24 13:42:11 +01:00
Aliaksandr Valialkin
edee262ecc Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin
49d7cb1a3f all: fix golangci-lint issues 2020-03-10 19:41:46 +02:00
Aliaksandr Valialkin
76036c1897 app/vmagent: add -remoteWrite.maxDiskUsagePerURL for limiting the maximum disk usage for each -remoteWrite.url buffer
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/352
2020-03-03 19:49:07 +02:00
Aliaksandr Valialkin
04762344c6 app/vmagent: initial implementation for vmagent 2020-02-23 13:36:03 +02:00