Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10680

We noticed that backup restores in our environment were much slower than the hardware/bandwidth constraints would suggest, and we traced this down to a couple of bottlenecks. This PR attempts to address all of them. (Illustrative sketches of each idea are collected in the appendix at the end of this description.)

#### Lack of pre-allocation of files

Writes far into files were quite slow because new blocks had to be allocated continually. This was particularly bad on ext4 for us, but it likely applies to most disks and filesystems. The implementation here is Linux-specific, mostly because I don't have a test environment for any other platform and didn't want to blindly make changes without a way to validate them. This comes with the downside of no longer being able to resume a restore mid-file, and of re-downloading parts that are already in the file, since the file will appear at full size from the very start. I think this is _generally_ a good tradeoff for the restore speed gains, but it is definitely a tradeoff, so I've included a flag to disable the pre-allocation behavior and fall back to the existing part diffing logic.

#### Fsync after each part

With many small parts in relatively few files, or in high-concurrency setups, the writerCloser fsync on each part (actually a double fsync, since `filestream.Writer.mustFlush` and `filestream.Writer.mustClose` both fsync) was causing slowdowns, because we were continually queuing fsyncs. With the pre-allocation pattern the file is only "ready" once renamed, so I moved to a single per-file fsync after the rename.

#### Concurrent read/write

The previous download pattern was to do a read from the remoteFs, with whatever latency that entailed, then sequentially do a write, again with whatever latency that entailed. This meant that throughput was limited to roughly `blockSize / (readLatency + writeLatency)`. Similar to how `crossTypeCopy` is implemented in the backup process, we can instead use `io.Pipe` to allow two goroutines to work in parallel with a small buffer between them.

#### Pagecache avoidance

`filestream.Writer` does quite a lot to avoid polluting the page cache, but this is not relevant in a restore context. With large sequential block writes it is much more efficient to let the OS flush the page cache whenever it wants rather than issuing a bunch of small syscalls to flush blocks. Therefore this switches over to a much simpler directWriterCloser that does direct file IO and lets the OS handle flushes while mid-write.

### Performance

Before the changes we were seeing write speeds of only ~100MB/s; this was a restore from EBS volumes (ext4) with 1GB/s of throughput available.

<img width="1613" height="586" alt="Screenshot 2026-03-16 at 1 29 46 PM" src="https://github.com/user-attachments/assets/5d54dcb7-cb59-43e0-9247-fda8c70feb2f" />

After these changes, in the same restore environment, we're seeing flat rates of ~600MB/s.

<img width="1611" height="471" alt="Screenshot 2026-03-16 at 1 31 33 PM" src="https://github.com/user-attachments/assets/ea8e2eb7-533a-48fa-99e0-0b38286e5572" />

Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
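#### Appendix: illustrative sketches

The snippets below are minimal sketches of the ideas described above, not the code in this PR; package names, helper names, and error handling are placeholders.

First, pre-allocation. A hedged sketch, assuming the Linux `fallocate(2)` syscall is reached via `golang.org/x/sys/unix`:

```go
package restore // hypothetical package name, for illustration only

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// preallocate reserves size bytes for f up front so that writes deep into the
// file do not have to allocate new blocks as they go. With mode=0 the file
// size is extended immediately, which is why the file appears at full size
// from the start and mid-file resume is no longer possible.
func preallocate(f *os.File, size int64) error {
	if err := unix.Fallocate(int(f.Fd()), 0, 0, size); err != nil {
		return fmt.Errorf("cannot preallocate %d bytes for %q: %w", size, f.Name(), err)
	}
	return nil
}
```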
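Next, the concurrent read/write pattern. This sketch mirrors the `crossTypeCopy`-style approach with `io.Pipe`; `readPart` is a hypothetical stand-in for the remote-FS download call:

```go
package restore // hypothetical package name, for illustration only

import (
	"io"
	"os"
)

// downloadPart overlaps remote read latency with local write latency by
// streaming the part through an io.Pipe: one goroutine fills the pipe from
// remote storage while this goroutine drains it into the destination file.
func downloadPart(readPart func(w io.Writer) error, dst *os.File) error {
	pr, pw := io.Pipe()
	go func() {
		// CloseWithError(nil) behaves like Close; a non-nil error is
		// surfaced to the reader side of the pipe.
		pw.CloseWithError(readPart(pw))
	}()
	_, err := io.Copy(dst, pr)
	return err
}
```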
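Finally, the per-file fsync and plain file IO. A sketch assuming parts are written straight to an `*os.File` (no per-part flushing, so the OS flushes the page cache on its own schedule) and a hypothetical `finalize` helper that syncs once and renames the file into place when all parts are down; the exact ordering of sync and rename in the real implementation may differ:

```go
package restore // hypothetical package name, for illustration only

import "os"

// finalize makes a fully downloaded file durable and visible: a single fsync
// for the whole file replaces the previous per-part fsyncs, and the rename is
// what marks the file as "ready".
func finalize(tmp *os.File, finalPath string) error {
	if err := tmp.Sync(); err != nil {
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), finalPath)
}
```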