POSIX named semaphores (sem_t) have architecture-dependent internal
layout in glibc: 32 bytes on 64-bit, 16 bytes on 32-bit. When a
64-bit faketime wrapper creates a semaphore and spawns a 32-bit child,
the child misinterprets the counter and hangs on sem_wait forever.
Extract ft_sem_* abstraction into shared ft_sem.h/ft_sem.c with three
backends: FT_POSIX (existing), FT_SYSV (existing), FT_FLOCK (new
default). The flock backend uses kernel-mediated file locking on
/dev/shm/faketime_lock_<pid>, which is architecture-independent and
auto-releases on process death.
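
As a rough illustration of the flock idea (not the actual ft_sem.c code),
the sketch below wraps flock(2) on a lock file. The demo_flock_* names and
the handle layout are hypothetical; only open(2), flock(2) and close(2)
are real APIs.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical handle; the real ft_sem_t may look different. */
typedef struct {
    int fd; /* descriptor of the lock file, -1 when closed */
} demo_flock_sem;

/* Open (or create) the lock file. Returns 0 on success, -1 on error. */
int demo_flock_create(demo_flock_sem *s, const char *path)
{
    assert(s != NULL && path != NULL);
    s->fd = open(path, O_RDWR | O_CREAT, 0600);
    return s->fd < 0 ? -1 : 0;
}

/* Block until the exclusive lock is held. The lock state lives in the
 * kernel, so no userspace struct layout is shared between processes. */
int demo_flock_lock(demo_flock_sem *s)
{
    return flock(s->fd, LOCK_EX);
}

int demo_flock_unlock(demo_flock_sem *s)
{
    return flock(s->fd, LOCK_UN);
}

/* Closing the descriptor also drops the lock; the kernel does the same
 * when the process dies. */
int demo_flock_close(demo_flock_sem *s)
{
    int rc = close(s->fd);
    s->fd = -1;
    return rc;
}
```

A caller would bracket each access to the shared state with
demo_flock_lock() and demo_flock_unlock().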
Both libfaketime.so and the faketime wrapper now use the same shared
abstraction, ensuring protocol agreement regardless of backend.
Replace struct timespec (arch-dependent long/time_t) with fixed-width
int64_t pairs in ft_shared_s so that 32-bit and 64-bit processes
interpret the same shared memory layout identically.
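
A minimal sketch of the fixed-width idea, assuming a hypothetical pair
type (struct demo_time64 and the helper names are illustrative, not the
fields of ft_shared_s). _Static_assert is C11; gcc also accepts it in
gnu99 mode.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fixed-width replacement for struct timespec. */
struct demo_time64 {
    int64_t sec;
    int64_t nsec;
};

/* Two int64_t members occupy 16 bytes on both 32-bit and 64-bit ABIs. */
_Static_assert(sizeof(struct demo_time64) == 16,
               "shared layout must not depend on the ABI");

/* Convert at the boundary so only fixed-width fields reach shared memory. */
struct demo_time64 demo_from_timespec(const struct timespec *ts)
{
    struct demo_time64 t;
    t.sec = (int64_t)ts->tv_sec;
    t.nsec = (int64_t)ts->tv_nsec;
    return t;
}

struct timespec demo_to_timespec(const struct demo_time64 *t)
{
    struct timespec ts;
    ts.tv_sec = (time_t)t->sec;
    ts.tv_nsec = (long)t->nsec;
    return ts;
}
```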
Fix ftruncate calls that allocated sizeof(uint64_t) (8 bytes) instead
of sizeof(struct ft_shared_s) (64-80 bytes) for the shared memory
region. Fix munmap in ft_cleanup using the wrong size.
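
The sizing rule can be sketched as follows: the same
sizeof(struct ...) is passed to ftruncate, mmap and munmap. The struct and
the demo_* helpers are hypothetical stand-ins; the descriptor would come
from shm_open in the real code, but any descriptor that supports
ftruncate and mmap works for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for struct ft_shared_s. */
struct demo_shared {
    int64_t ticks;
    int64_t file_idx;
    int64_t start_sec;
    int64_t start_nsec;
};

/* Size the backing object for the whole struct, then map that many bytes. */
struct demo_shared *demo_map_shared(int fd)
{
    if (ftruncate(fd, (off_t)sizeof(struct demo_shared)) != 0)
        return NULL;
    void *p = mmap(NULL, sizeof(struct demo_shared),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (struct demo_shared *)p;
}

/* Unmap with the same length that was mapped. */
int demo_unmap_shared(struct demo_shared *p)
{
    return munmap(p, sizeof(struct demo_shared));
}
```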
Add struct layout test and cross-process shared memory functional test.
I noticed a bug in the semaphore handling, when using the System V semaphore
backend:
$ LD_PRELOAD=./src/libfaketime.so.1 bash -c "echo foo | sed s/foo/bar/"
libfaketime: In lock_for_stat(), ft_sem_lock failed: Invalid argument
[...exited with error...]
(Beware, the above command-line is not 100% deterministic; sometimes it
succeeds.)
Looking at the strace for the above command-line, it seems the bash echo
builtin process (or thread?) decides to remove the semaphore upon
exiting, while it's still in use by the sed process. sed then gets
EINVAL error ("Invalid argument") on its next semop call.
The root cause is a semantic difference between POSIX sem_unlink and
SysV semctl(..., IPC_RMID), the two implementations for ft_sem_unlink:
* sem_unlink allows the semaphore to be used afterwards, as long as a
  process has a reference to the semaphore.
* semctl(..., IPC_RMID) removes the semaphore immediately, and further
  use results in EINVAL error.
AFAICT, the simplest fix is to only let the owner of the semaphore (and
shared memory) do the clean up, which is what this patch does. Both
semaphore backends pass the tests with this change.
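
One way to picture owner-only cleanup is the sketch below. It assumes
ownership is tracked by the creator's pid, which may not be how the patch
does it; demo_ipc and its functions are hypothetical, and the removal
itself is replaced by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a semaphore plus shared memory pair. */
typedef struct {
    pid_t owner;  /* process that created the IPC objects */
    bool removed; /* stand-in for the real removal call */
} demo_ipc;

void demo_ipc_create(demo_ipc *ipc)
{
    ipc->owner = getpid();
    ipc->removed = false;
}

/* Returns true only if the caller is the owner and removal happened. */
bool demo_ipc_cleanup(demo_ipc *ipc, pid_t caller)
{
    if (caller != ipc->owner)
        return false; /* non-owners leave the objects in place */
    /* Real code would remove the semaphore and shared memory here. */
    ipc->removed = true;
    return true;
}
```

With this shape, a short-lived child exiting early cannot pull the
semaphore out from under a sibling that is still using it.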
ft_sem_create() is called with an argument located on the stack, which
means it's a bad idea to keep a reference to it in the 'name' field of
ft_sem_t -- the pointed-to data goes out of scope, and any later use
of it results in unpredictable behaviour.
Fix it by making a copy of the semaphore name. Allocate a 256 char
buffer, to match existing code.
Fixes: 2649cdb156 ("Add semaphore abstraction layer")
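
A minimal sketch of the copy, assuming the buffer lives inside the
handle (the patch may allocate it differently). demo_sem and
demo_sem_set_name are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_MAX 256

/* Hypothetical handle that owns its name instead of borrowing a pointer. */
typedef struct {
    char name[DEMO_NAME_MAX];
} demo_sem;

/* Copy the caller's string; returns -1 if it does not fit. */
int demo_sem_set_name(demo_sem *s, const char *name)
{
    int n = snprintf(s->name, sizeof s->name, "%s", name);
    return (n < 0 || (size_t)n >= sizeof s->name) ? -1 : 0;
}
```

The handle stays valid after the caller's stack buffer is gone, because
nothing in it points back into the caller's frame.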
musl defines stat64 as stat, leading to this build error:
gcc -o libfaketime.o -c -std=gnu99 -Wall -Wextra -Werror -DFAKE_PTHREAD -DFAKE_STAT -DFAKE_UTIME -DFAKE_SLEEP -DFAKE_TIMERS -DFAKE_INTERNAL_CALLS -fPIC -DPREFIX='"'/nix/store/qpyvvrcas950da98mssw6ixlw7ckvyrb-libfaketime-0.9.11'"' -DLIBDIRNAME='"'/lib'"' -Wno-nonnull-compare libfaketime.c
In file included from libfaketime.c:55:
libfaketime.c:1276:5: error: redefinition of ‘stat’
1276 | int stat64 (const char *path, struct stat64 *buf)
| ^~~~~~
/nix/store/g9cgi4yyn5vrd1f9axj8gxdvwzv5ssvk-musl-1.2.5-dev/include/sys/stat.h:80:5: note: previous definition of ‘stat’ with type ‘int(const char *, struct stat *)’
80 | int stat(const char *__restrict, struct stat *__restrict);
| ^~~~
make[1]: *** [Makefile:161: libfaketime.o] Error 1
Fix it by only defining stat64 when building against glibc, since it's
not straightforward to detect musl, and it's the safest approach; there
might be other libc implementations that behave like musl.
Fixes: 53ba71e547 ("Handle stat64() call")
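
The guard can be sketched like this, assuming the check is on __GLIBC__
(the exact condition used in libfaketime.c may differ). The DEMO_ macro
and function are hypothetical.

```c
#include <sys/stat.h> /* glibc headers pull in <features.h>, which defines __GLIBC__ */

/* Only glibc gets the separate stat64 wrapper in this sketch. */
#if defined(__GLIBC__)
#define DEMO_HAVE_STAT64_WRAPPER 1
#else
#define DEMO_HAVE_STAT64_WRAPPER 0
#endif

/* Report which branch the preprocessor took. */
int demo_have_stat64_wrapper(void)
{
    return DEMO_HAVE_STAT64_WRAPPER;
}
```

Testing for the libc that is known to need the wrapper, rather than for
the ones that do not, keeps unknown libcs on the safe side.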
Add ft_sem_*() functions that use the POSIX semaphore API.
In preparation for adding System V semaphores as an alternative to POSIX
semaphores, because glibc breaks POSIX semaphores when operating in
mixed 32- and 64-bit environments[1].
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17980
To give more context, stat64 is a product of large-file support (LFS),
dating back to 1996, during the transition from 32-bit to 64-bit. People
wanted 64-bit inodes on 32-bit systems, hence stat and stat64.
Nowadays, when almost everything is 64-bit, stat64 is mostly just an
alias for stat, as stat is already natively 64-bit. On modern
implementations like musl, stat64 is even dropped entirely as a sane
default. We observe the same in darwin's stat.h:
#if !__DARWIN_ONLY_64_BIT_INO_T
struct stat64 __DARWIN_STRUCT_STAT64;
#endif /* !__DARWIN_ONLY_64_BIT_INO_T */
Because struct stat64 doesn't ever exist on aarch64-darwin, and we
don't have to worry about people using stat64 calls, we can safely
remove all stat64 bloat, according to __DARWIN_ONLY_64_BIT_INO_T.
I nuked fake_stat64buf because only STAT64_HANDLER is using it, and
only non-darwin stat64 things use that handler. I didn't do more
because people might still use stat64 things on x86_64 (on glibc)
and other older 32-bit platforms, and we still need to hook those.
A loose follow-up to PR #453. Fixes the remaining clang warnings on
aarch64-darwin.
This fixes the recursive pthread_once deadlock on darwin platforms.
It looks something like this:
Trace/BPT trap: 5
BUG IN CLIENT OF LIBPLATFORM: Trying to recursively lock an os_once_t
The macro __APPLEOSX__ is never defined; __APPLE__ should be used instead.
This mistake inadvertently caused system_time_from_system() to always take
the linux code path on darwin, leading to recursive calls during ftpl_init().
This was exposed by PR #488 which removed the ad-hoc recursion detection
that previously masked this issue.
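
The effect of the wrong macro can be seen in a small sketch. The demo_*
functions are hypothetical; __APPLE__ is the macro Apple's compilers
predefine, and no compiler defines __APPLEOSX__.

```c
/* Returns 1 if the misspelled macro is defined; in practice always 0. */
int demo_appleosx_defined(void)
{
#ifdef __APPLEOSX__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 on darwin, 0 elsewhere, using the macro that really exists. */
int demo_is_darwin(void)
{
#ifdef __APPLE__
    return 1;
#else
    return 0;
#endif
}
```

Code guarded by the first test is silently compiled out everywhere, which
is how the darwin build ended up on the linux path.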
This reverts commit 8ef74e33b6
"Swapped out pthread_rwlock_xxlock() ..."
This could result in concurrent uses of pthread_cond_* erroneously
returning EAGAIN, which is not permitted by the spec and which the
application may well treat as a bug. This seems to be happening in
gem2deb in ci.debian.net.
The commit message in 8ef74e33b6 says (rewrapped)
Swapped out pthread_rwlock_xxlock(), which doesn't return if it
can't obtain the lock, with pthread_rwlock_xxtrylock() followed by
sched yield and error code return. The issue is sometimes a thread
calling pthread_cond_init() or pthread_cond_destroy() can't
acquire the lock when another thread is waiting on a condition
variable notification via pthread_cond_timedwait(), and thus the
thread calling pthread_cond_init() or pthread_cond_destroy() end
up hanging indefinitely.
I don't think this is true. The things that are done with
monotonic_conds_lock held are HASH_ADD_PTR HASH_FIND_PTR etc. on
monotonic_conds, which should all be fast and AFAICT don't in turn
take any locks. So it shouldn't deadlock.
I conjecture that the underlying bug being experienced by the author
of "Swapped out pthread_rwlock_xxlock" was the lack of ftpl_init - ie,
access to an uninitialised pthread_rwlock_t. That might result in a
hang.
Otherwise we can use this in an uninitialised state, which is not
allowed.
We call ftpl_init in pthread_cond_init_232, but the application might
not have called that. For example, it might have a static condition
variable set up with PTHREAD_COND_INITIALIZER.
timespec.tv_nsec is 32-bit, even though timeval.tv_usec is
64-bit (weirdly). This doesn't matter very much in practice because
* on little endian architectures (which is all our 32-bit release
arches) writing to a too big integer ends up writing the
desired value in the desired location, and
* it doesn't affect the overall struct size on any of our actual
architectures (which align the uint64_t to 8 so must make the
whole struct 16 not 12), so the write overflow is harmless.
> #include <time.h>
> #include <sys/time.h>
> #include <stdio.h>
> struct timeval tv;
> struct timespec ts;
> int main(void) {
> printf("time_t %lld\n", (unsigned long long) sizeof(time_t));
> printf("timeval %lld %lld %lld\n",
> (unsigned long long) sizeof(tv),
> (unsigned long long) sizeof(tv.tv_sec),
> (unsigned long long) sizeof(tv.tv_usec)
> );
> printf("timespec %lld %lld %lld\n",
> (unsigned long long) sizeof(ts),
> (unsigned long long) sizeof(ts.tv_sec),
> (unsigned long long) sizeof(ts.tv_nsec)
> );
> }
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$ gcc t.c
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$ ./a.out
> time_t 8
> timeval 16 8 8
> timespec 16 8 4
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$
Since Debian generally added 64-bit time support on 32-bit
arches, glibc now sometimes calls the clock_gettime64 syscall
(and library wrapper). This function was not intercepted before, and
the interception is added here.
Patch originally supplied here
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064555
The shared semaphore is closed, but the pointer is not set to null
afterwards. Setting it is required because the logic checks the
semaphore's status whenever the pointer is not null. Because of this,
we sometimes get a core dump.
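
The fix amounts to the usual close-and-clear pattern, sketched here with
a hypothetical helper (demo_close_sem is not a libfaketime function):

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stddef.h>

/* Close the semaphore and clear the caller's pointer in one step, so a
 * later "is it non-null?" check cannot see a stale handle. */
void demo_close_sem(sem_t **semp)
{
    if (*semp != NULL) {
        sem_close(*semp);
        *semp = NULL;
    }
}
```

Calling the helper a second time is then harmless, because the pointer is
already null.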
The use of shared memory has side effects. Currently, the only way to
opt out of shared memory is by compiling with -DFAKE_STATELESS.
To allow disabling shared memory without recompiling, this patch
introduces the --disable-shm option to `faketime`, equivalent to
setting the `FAKETIME_DISABLE_SHM=1` environment variable.
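
On the library side, the run-time check could look roughly like the
sketch below. It assumes the variable must be exactly "1"; the real
library may accept other values, and demo_shm_disabled is a hypothetical
name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True when shared memory should be skipped for this process. */
bool demo_shm_disabled(void)
{
    const char *v = getenv("FAKETIME_DISABLE_SHM");
    /* Assumption: only the literal value "1" disables shared memory. */
    return v != NULL && strcmp(v, "1") == 0;
}
```

Because the setting travels in the environment, child processes started
under the wrapper inherit it without further plumbing.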