I noticed a bug in the semaphore handling, when using the System V semaphore
backend:
$ LD_PRELOAD=./src/libfaketime.so.1 bash -c "echo foo | sed s/foo/bar/"
libfaketime: In lock_for_stat(), ft_sem_lock failed: Invalid argument
[...exited with error...]
(Beware, the above command-line is not 100% deterministic; sometimes it
succeeds.)
Looking at the strace for the above command-line, it seems the bash echo
builtin process (or thread?) decides to remove the semaphore upon
exiting, while it's still in use by the sed process. sed then gets an
EINVAL error ("Invalid argument") on its next semop call.
The root cause is a semantic difference between POSIX sem_unlink and
SysV semctl(..., IPC_RMID), the two implementations for ft_sem_unlink:
* sem_unlink allows the semaphore to be used afterwards, as long as a
  process still has a reference to the semaphore.
* semctl(..., IPC_RMID) removes the semaphore immediately, and further
  use results in an EINVAL error.
AFAICT, the simplest fix is to only let the owner of the semaphore (and
shared memory) do the cleanup, which is what this patch does. Both
semaphore backends pass the tests with this change.
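A minimal sketch of the SysV behaviour and of an owner-only cleanup,
assuming Linux System V semaphores. The demo_* names and the pid
comparison are hypothetical illustrations, not the code in this patch:

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <unistd.h>

/* Some platforms require the caller to define the semctl() argument union. */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Create a private SysV semaphore set, remove it, then try to lock it.
 * Returns the errno of the failing semop(), 0 if semop() unexpectedly
 * succeeded, or -1 if the set could not be created or removed.
 */
int demo_semop_after_rmid(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union demo_semun arg;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    /* Removal takes effect immediately, unlike sem_unlink(). */
    if (semctl(semid, 0, IPC_RMID) == -1)
        return -1;

    struct sembuf lock_op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    if (semop(semid, &lock_op, 1) == -1)
        return errno;   /* typically EINVAL on Linux; EIDRM is possible */
    return 0;
}

/*
 * Hypothetical owner check: only the creating process removes the set.
 * Returns 1 if removed, 0 if skipped, -1 on error.
 */
int demo_cleanup_if_owner(int semid, pid_t owner_pid)
{
    if (getpid() != owner_pid)
        return 0;       /* not the owner: leave the set in place */
    return semctl(semid, 0, IPC_RMID) == -1 ? -1 : 1;
}
```

How the real patch identifies the owner is not shown here; comparing
pids is just one way to express "only the creator cleans up".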
ft_sem_create() is called with an argument located on the stack, which
means it's a bad idea to keep a reference to it in the 'name' field of
ft_sem_t -- the pointed-to data goes out of scope, and later reads of it
result in unpredictable behaviour.
Fix it by making a copy of the semaphore name. Allocate a 256-char
buffer, to match existing code.
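A minimal sketch of the copy, assuming the buffer lives inside the
struct. demo_sem_t and demo_sem_set_name() are hypothetical stand-ins;
the real ft_sem_t has other fields and may allocate differently:

```c
#include <stdio.h>
#include <string.h>

#define DEMO_SEM_NAME_LEN 256   /* matches the 256-char buffer above */

/* Hypothetical stand-in for ft_sem_t. */
typedef struct {
    char name[DEMO_SEM_NAME_LEN];
} demo_sem_t;

/*
 * Copy the caller's name so the struct does not alias a stack buffer.
 * Returns 0 on success, -1 if the name does not fit.
 */
int demo_sem_set_name(demo_sem_t *sem, const char *name)
{
    int n = snprintf(sem->name, sizeof sem->name, "%s", name);
    return (n < 0 || (size_t)n >= sizeof sem->name) ? -1 : 0;
}
```

After the copy, the caller's buffer can be reused or go out of scope
without affecting the stored name.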
Fixes: 2649cdb156 ("Add semaphore abstraction layer")
musl defines stat64 as stat, leading to this build error:
gcc -o libfaketime.o -c -std=gnu99 -Wall -Wextra -Werror -DFAKE_PTHREAD -DFAKE_STAT -DFAKE_UTIME -DFAKE_SLEEP -DFAKE_TIMERS -DFAKE_INTERNAL_CALLS -fPIC -DPREFIX='"'/nix/store/qpyvvrcas950da98mssw6ixlw7ckvyrb-libfaketime-0.9.11'"' -DLIBDIRNAME='"'/lib'"' -Wno-nonnull-compare libfaketime.c
In file included from libfaketime.c:55:
libfaketime.c:1276:5: error: redefinition of ‘stat’
1276 | int stat64 (const char *path, struct stat64 *buf)
| ^~~~~~
/nix/store/g9cgi4yyn5vrd1f9axj8gxdvwzv5ssvk-musl-1.2.5-dev/include/sys/stat.h:80:5: note: previous definition of ‘stat’ with type ‘int(const char *, struct stat *)’
80 | int stat(const char *__restrict, struct stat *__restrict);
| ^~~~
make[1]: *** [Makefile:161: libfaketime.o] Error 1
Fix it by only defining stat64 when building against glibc, since it's
not straightforward to detect musl, and it's the safest approach; there
might be other libc implementations that behave like musl.
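A minimal sketch of such a guard, assuming the check is on __GLIBC__
(which glibc's headers define once any libc header is included). The
DEMO_* macro and helper are hypothetical and only report which branch
was compiled:

```c
#include <stdlib.h>     /* any libc header; glibc's headers define __GLIBC__ */

#ifdef __GLIBC__
#define DEMO_HAVE_STAT64_WRAPPER 1  /* the stat64 wrapper would be built */
#else
#define DEMO_HAVE_STAT64_WRAPPER 0  /* musl and others: skip the wrapper */
#endif

/* Reports whether the stat64 wrapper would be compiled in. */
int demo_have_stat64_wrapper(void)
{
    return DEMO_HAVE_STAT64_WRAPPER;
}
```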
Fixes: 53ba71e547 ("Handle stat64() call")
Add ft_sem_*() functions that use the POSIX semaphore API.
This is in preparation for adding System V semaphores as an alternative
to POSIX semaphores, because glibc breaks POSIX semaphores when
operating in mixed 32- and 64-bit environments[1].
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17980
To give more context, stat64 is a child of large-file support (LFS)
from back in 1996, during the transition from 32-bit to 64-bit. People
wanted 64-bit inodes on 32-bit systems, hence stat and stat64.
Nowadays, with almost everything being 64-bit, stat64 is mostly just an
alias for stat, as stat is already natively 64-bit. On modern
implementations like musl, stat64 is even dropped entirely as a sane
default. We observe the same in darwin's stat.h:
#if !__DARWIN_ONLY_64_BIT_INO_T
struct stat64 __DARWIN_STRUCT_STAT64;
#endif /* !__DARWIN_ONLY_64_BIT_INO_T */
Because struct stat64 never exists on aarch64-darwin, and we don't have
to worry about people using stat64 calls there, we can safely remove
all stat64 bloat when __DARWIN_ONLY_64_BIT_INO_T is set.
I nuked fake_stat64buf because only STAT64_HANDLER is using it, and
only non-darwin stat64 things use that handler. I didn't do more
because people might still use stat64 things on x86_64 (on glibc)
and other older 32-bit platforms, and we still need to hook those.
A loose follow-up to PR #453. Fixes the remaining clang warnings on
aarch64-darwin.
This fixes the recursive pthread_once deadlock on darwin platforms.
It looks something like this:
Trace/BPT trap: 5
BUG IN CLIENT OF LIBPLATFORM: Trying to recursively lock an os_once_t
The macro __APPLEOSX__ is never defined; __APPLE__ should be used instead.
This mistake inadvertently caused system_time_from_system() to always take
the Linux code path on darwin, leading to recursive calls during ftpl_init().
This was exposed by PR #488 which removed the ad-hoc recursion detection
that previously masked this issue.
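A minimal sketch of why the wrong macro went unnoticed: an #ifdef on an
undefined name silently selects the #else branch. The demo_* functions
are hypothetical and only report which branch was compiled:

```c
#include <string.h>

/* __APPLEOSX__ is not a predefined macro, so the first branch is dead. */
const char *demo_path_with_wrong_macro(void)
{
#ifdef __APPLEOSX__
    return "darwin";
#else
    return "linux";
#endif
}

/* __APPLE__ is predefined by Apple's compilers, so this selects correctly. */
const char *demo_path_with_right_macro(void)
{
#ifdef __APPLE__
    return "darwin";
#else
    return "linux";
#endif
}
```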
This reverts commit 8ef74e33b6
"Swapped out pthread_rwlock_xxlock() ..."
This could result in concurrent uses of pthread_cond_* erroneously
returning EAGAIN, which is not permitted by the spec and which the
application may well treat as a bug. This seems to be happening in
gem2deb in ci.debian.net.
The commit message in 8ef74e33b6 says (rewrapped):
    Swapped out pthread_rwlock_xxlock(), which doesn't return if it
    can't obtain the lock, with pthread_rwlock_xxtrylock() followed by
    sched yield and error code return. The issue is sometimes a thread
    calling pthread_cond_init() or pthread_cond_destroy() can't
    acquire the lock when another thread is waiting on a condition
    variable notification via pthread_cond_timedwait(), and thus the
    thread calling pthread_cond_init() or pthread_cond_destroy() end
    up hanging indefinitely.
I don't think this is true. The things that are done with
monotonic_conds_lock held are HASH_ADD_PTR HASH_FIND_PTR etc. on
monotonic_conds, which should all be fast and AFAICT don't in turn
take any locks. So it shouldn't deadlock.
I conjecture that the underlying bug being experienced by the author
of "Swapped out pthread_rwlock_xxlock" was the lack of ftpl_init, i.e.
access to an uninitialised pthread_rwlock_t. That might result in a
hang.