mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2026-06-28 21:18:23 +03:00
The Go runtime executes all goroutines on GOMAXPROCS operating system threads. It cannot switch an OS thread to another goroutine while the current goroutine is stuck in a major page fault on a read from a memory-mapped file, because the Go runtime doesn't distinguish between reading from regular memory and reading from a memory-mapped file. The OS thread therefore stays blocked until the OS reads the data from the file at the requested memory address and returns control to the Go application. In the worst case all the GOMAXPROCS threads are stuck in major page faults at the same time, so the Go runtime stops executing goroutines altogether. This state is possible in environments with small GOMAXPROCS and high-latency disks such as NFS or small HDD-based disks at AWS. See https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d for more details.

This commit protects against such stalls by checking whether the given memory location in the memory-mapped file is already loaded in the OS page cache before reading from it. If the location isn't in the OS page cache, the read falls back to the pread() syscall. The Go runtime allocates extra OS threads for long-running syscalls, so it can continue executing goroutines across all the GOMAXPROCS threads while the data is read from slow storage via pread().

This commit uses the mincore() syscall to detect whether the given memory page is available in the OS page cache. It also caches mincore() results for up to a minute in order to reduce the overhead of the mincore() syscall.

This commit reduces the increase rate of the process_major_pagefaults_total metric by multiple orders of magnitude on systems with high-latency disks.
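
The check-then-fall-back logic described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: safeReader, newSafeReader, residencyFunc and memFile are hypothetical names, the page size is fixed at 4096 bytes, and the mincore() call is replaced by a pluggable function so the sketch stays portable. In real code the file would be an *os.File, whose ReadAt method issues pread() on Unix systems.

```go
package main

import (
	"io"
	"sync"
	"time"
)

// pageSize is an assumed value; real code would query os.Getpagesize().
const pageSize = 4096

// residencyFunc reports whether the page with the given index is in the
// OS page cache. It is a hypothetical stand-in for the mincore() syscall.
type residencyFunc func(pageIdx int64) bool

type cachedResult struct {
	resident bool
	deadline time.Time
}

// safeReader (hypothetical name) reads from a memory-mapped region only
// when the touched pages are believed to be resident in the page cache.
type safeReader struct {
	mmapData   []byte      // memory-mapped view of the file
	file       io.ReaderAt // *os.File in practice, so ReadAt means pread()
	isResident residencyFunc
	ttl        time.Duration

	mu    sync.Mutex
	cache map[int64]cachedResult
}

func newSafeReader(mmapData []byte, file io.ReaderAt, isResident residencyFunc) *safeReader {
	return &safeReader{
		mmapData:   mmapData,
		file:       file,
		isResident: isResident,
		ttl:        time.Minute, // results are reused for up to a minute
		cache:      make(map[int64]cachedResult),
	}
}

// pageResident returns the cached residency result for the page while it
// is fresh, and otherwise asks isResident and stores the answer.
func (r *safeReader) pageResident(pageIdx int64, now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.cache[pageIdx]; ok && now.Before(c.deadline) {
		return c.resident
	}
	resident := r.isResident(pageIdx)
	r.cache[pageIdx] = cachedResult{resident: resident, deadline: now.Add(r.ttl)}
	return resident
}

// readAt fills dst with the bytes at offset off. It copies from the mapping
// if every touched page is resident and falls back to file.ReadAt otherwise.
func (r *safeReader) readAt(dst []byte, off int64) error {
	if len(dst) == 0 {
		return nil
	}
	if off < 0 || off+int64(len(dst)) > int64(len(r.mmapData)) {
		return io.ErrUnexpectedEOF
	}
	now := time.Now()
	first := off / pageSize
	last := (off + int64(len(dst)) - 1) / pageSize
	for p := first; p <= last; p++ {
		if !r.pageResident(p, now) {
			// A blocking syscall lets the Go runtime keep scheduling
			// other goroutines while the slow read is in progress.
			_, err := r.file.ReadAt(dst, off)
			return err
		}
	}
	copy(dst, r.mmapData[off:])
	return nil
}

// memFile is an in-memory stand-in for *os.File, used only to exercise
// the sketch; reads counts the simulated pread() calls.
type memFile struct {
	data  []byte
	reads int
}

func (f *memFile) ReadAt(p []byte, off int64) (int, error) {
	f.reads++
	if off < 0 || off >= int64(len(f.data)) {
		return 0, io.EOF
	}
	n := copy(p, f.data[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}
```

A cached "resident" answer can go stale if the kernel evicts the page before the entry expires, so this approach reduces the number of major page faults rather than eliminating them. That is the trade-off accepted in exchange for fewer mincore() calls.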