To give more context, stat64 is a child of large-file support (LFS)
back in 1996, during the transition from 32-bit to 64-bit. People
wanted 64-bit inodes in 32-bit systems, hence stat and stat64.
Nowadays where everything is 64-bit, stat64 is mostly just an alias
to stat, as stat is already native 64-bit. On modern implementations
like musl, stat64 is even dropped entirely as a sane default. We
observe the same in darwin's stat.h:
#if !__DARWIN_ONLY_64_BIT_INO_T
struct stat64 __DARWIN_STRUCT_STAT64;
#endif /* !__DARWIN_ONLY_64_BIT_INO_T */
Because struct stat64 doesn't ever exist on aarch64-darwin, and we
don't have to worry about people using stat64 calls, we can safely
remove all stat64 bloat, according to __DARWIN_ONLY_64_BIT_INO_T.
I nuked fake_stat64buf because only STAT64_HANDLER is using it, and
only non-darwin stat64 things use that handler. I didn't do more
because people might still use stat64 things on x86_64 (on glibc)
and other older 32-bit platforms, and we still need to hook those.
A loose follow up to PR #453. Fixes the remaining clang warnings on
aarch64-darwin.
This fixes the recursive pthread_once deadlock on darwin platforms.
It looks something like this:
Trace/BPT trap: 5
BUG IN CLIENT OF LIBPLATFORM: Trying to recursively lock an os_once_t
The macro __APPLEOSX__ is never defined, instead __APPLE__ should be used.
This mistake inadvertently caused system_time_from_system() to always take
the linux code path on darwin, leading to recursive calls during ftpl_init().
This was exposed by PR #488 which removed the ad-hoc recursion detection
that previously masked this issue.
This reverts commit 8ef74e33b6
"Swapped out pthread_rwlock_xxlock() ..."
This could result in concurrent uses of pthread_cond_* erroneously
returning EAGAIN, which is not permitted by the spec and which the
application way well treat as a bug. This seems to be happening in
gem2deb in ci.debian.net.
The commit message in 8ef74e33b6 says (rewrapped)
Swapped out pthread_rwlock_xxlock(), which doesn't return if it
can't obtain the lock, with pthread_rwlock_xxtrylock() followed by
sched yield and error code return. The issue is sometimes a thread
calling pthread_cond_init() or pthread_cond_destroy() can't
acquire the lock when another thread is waiting on a condition
variable notification via pthread_cond_timedwait(), and thus the
thread calling pthread_cond_init() or pthread_cond_destroy() end
up hanging indefinitely.
I don't think this is true. The things that are done with
monotonic_conds_lock held are HASH_ADD_PTR HASH_FIND_PTR etc. on
monotonic_conds, which should all be fast and AFAICT don't in turn
take any locks. So it shouldn't deadlock.
I conjecture that the underlying bug being experienced by the author
of "Swapped out pthread_rwlock_xxlock" was the lack of ftpl_init - ie,
access to an uninitialised pthread_rwlock_t. That might result in a
hang.
Otherwise we can use this in an uninitialised state, which is not
allowed.
We call ftpl_init in pthread_cond_init_232, but the application might
not have called that. For example, it might have a static condition
variable set up with PTHREAD_COND_INITIALIZER.
timespec.tv_nsec is 32-bit, even though timeval.tv_usec is
64-bit (weirdly). This doesn't matter very much in practice because
* on little endian architectures (which is all our 32-bit release
arches) writing to a too big integer ends up writing the
desired value in the desired location, and
* it doesn't affect the overall struct size on any of our actual
architectures (which align the uint64_t to 8 so must make the
whole struct 16 not 12), so the write overflow is harmless.
> #include <time.h>
> #include <sys/time.h>
> #include <stdio.h>
> struct timeval tv;
> struct timespec ts;
> int main(void) {
> printf("time_t %lld\n", (unsigned long long) sizeof(time_t));
> printf("timeval %lld %lld %lld\n",
> (unsigned long long) sizeof(tv),
> (unsigned long long) sizeof(tv.tv_sec),
> (unsigned long long) sizeof(tv.tv_usec)
> );
> printf("timespec %lld %lld %lld\n",
> (unsigned long long) sizeof(ts),
> (unsigned long long) sizeof(ts.tv_sec),
> (unsigned long long) sizeof(ts.tv_nsec)
> );
> }
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$ gcc t.c
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$ ./a.out
> time_t 8
> timeval 16 8 8
> timespec 16 8 4
> (sid_armhf-dchroot)iwj@amdahl:~/Faketime/test$
Since debian generally added 64-bit time support on 32-bit
arches, now glibc sometimes calls the clock_gettime64 syscall
(and library wrapper). This function was missing, and is added here.
Patch originally supplied here
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064555
We should ignore the return value for logging function, to fix a new gcc ftbfs
libmallocintercept.c: In function ‘print_msg’:
libmallocintercept.c:27:9: error: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
27 | write(0, msg, strlen(msg));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
The shared semaphore is closed but it's not assigned to null.
That's required because the logic check the semaphore status if it's not null. For this reason, we are getting a core some times.
The use of shared memory has side effects. Currently, the only way to
opt out of shared memory is by compiling with -DFAKE_STATELESS.
To allow disabling shared memory without recompiling, this patch
introduces the --disable-shm option to `faketime`, equivalent to
setting the `FAKETIME_DISABLE_SHM=1` environment variable.
One callsite of ftpl_init wasn't protected by the "if (!initialized)"
condition, specifically:
static void ftpl_init (void) __attribute__ ((constructor));
If another "constructor" was called before this one, and that other
constructor used time or filesystem functions, ftlp_init would be
initialized by that other constructor, and then reinitialized by the
ftpl_init constructor. At that point, confusion ensues.
Function `timespec_get` is not guaranteed to be declared in MacOS
since its standard library is non-conformance to modern standards.
Therefore, skip patching `timespec_get` if it is undeclared by the
standard library.
The detection of `timespec_get` is based on the conjecture that the
macro `TIME_UTC` is only defined when `timespec_get` is declared.
Cast t to unsigned long and use the %lu format specifier instead of %zd.
This is more portable to 32-bit arches, avoiding the following compiler
error:
snippets/time.c:2:31: error: format ‘%zd’ expects argument of type ‘signed size_t’, but argument 3 has type ‘time_t’ {aka ‘long int’} [-Werror=format=]
2 | printf("[%s] time() yielded %zd\n", where, t);
| ~~^ ~
| | |
| int time_t {aka long int}
| %ld
cc1: all warnings being treated as errors
Without the changes Sun studio 12.8 compiler fails, for example
Note that only the errors are fixed. On the compiler above, some
warnings still remain, so -Werror has to be removed as well from the
compiler switches in order for the compilation to succeed.
We have a long-running program that we want to run under sanitizers for
extra error checking and under faketime to speed up the clock. This
program hangs after a while. Backtraces suggest that the hangs occur
because of recursion in the memory allocator, which apparently locks a
non-recursive mutex.
Specifically, what we see is that due to our use of FAKETIME_NO_CACHE=1,
libfaketime wants to reload the config file inside a (random) call to
clock_gettime(). libfaketime then uses fopen() / fclose() for reading
the config files. These function allocate / free a buffer for reading
data and specifically the call to free() that happens inside fclose()
ends up calling clock_gettime() again. At this point, libfaketime locks
up because it has a time_mutex that is locked and none-recursive.
Sadly, I did not manage to come up with a stand-alone reproducer for
this problem. Also, the above description is from memory after half a
week of vacation, so it might be (partially) wrong.
More information can be found here:
- https://github.com/wolfcw/libfaketime/issues/365#issuecomment-1115802530
- https://github.com/wolfcw/libfaketime/issues/365#issuecomment-1116178907
This commit first adds a test case. This new test uses the already
existing libmallocintercept.so to cause calls to clock_gettime() during
memory allocation routines. The difference to the already existing
version is that additionally FAKETIME_NO_CACHE and
FAKETIME_TIMESTAMP_FILE are set. This new test hung with its last output
suggesting a recursive call to malloc:
Called malloc() from libmallocintercept...successfully
Called free() on from libmallocintercept...successfully
Called malloc() from libmallocintercept...Called malloc() from libmallocintercept...
Sadly, gdb was unable to provide a meaningful backtrace for this hang.
Next, I replaced the use of fopen()/fgets()/fgets() with
open()/read()/close(). This code no longer reads the config file
line-by-line, but instead it reads all of it at once and then "filters
out" the result (ignores comment lines, removes end of line markers).
I tried to keep the behaviour of the code the same, but I know at least
one difference: Previously, the config file was read line-by-line and
lines that began with a comment character were immediately ignored. The
new code only reads the config once and then removes comment lines.
Since the buffer that is used contains only 256 characters, it is
possible that config files that were previously parsed correctly now
longer parse. A specific example: if a file begins with 500 '#'
characters in its first line and then a timestamp in the second line,
the old code was able to parse this file while the new code would only
see an empty file.
After this change, the new test no longer hangs. Sadly, I do not
actually know its effect on the "actual bug" that I wanted to fix, but
since there are no longer any calls to fclose(), there cannot be any
hangs inside fclose().
Signed-off-by: Uli Schlachter <psychon@znc.in>
This commit adds a new define FAIL_PRE_INIT_CALLS. When that define is
set, calls to clock_gettime() that occur before ftpl_init() was called
(due to being marked with __attribute__((constructor))) will just fail
and return -1.
After this commit, the test case added in the previous commit no longer
hangs. To make this actually work, this new define is enabled by
default.
Fixes: https://github.com/wolfcw/libfaketime/issues/365
Signed-off-by: Uli Schlachter <psychon@znc.in>
Backtraces suggest that AddressSanitizer replaces malloc() with a
function that
- locks a mutex and
- calls clock_gettime() while the mutex is held
This commit adds a test that implements a trivial malloc() that behaves
similarly. Currently, this test hangs.
Signed-off-by: Uli Schlachter <psychon@znc.in>
faketime \- manipulate the system time for a given command
.SHSYNOPSIS
@@ -32,6 +32,9 @@ use the advanced timestamp specification format.
\fB\--exclude-monotonic\fR
Do not fake time when the program makes a call to clock_gettime with a CLOCK_MONOTONIC clock.
.TP
\fB\--disable-shm\fR
Disable use of shared memory by libfaketime.
.TP
\fB\--date-prog <PATH>\fR
Use a specific GNU-date compatible implementation of the helper used to transform "timestamp format" strings into programmatically usable dates, instead of a compile-time default guess for the generic target platform.
printf("(Intentionally sleeping 1 second..., see docs about CLOCK_MONOTONIC test)\n");
printf("(Intentionally sleeping 1 second...)\n");
printf("(If this test hangs for more than a few seconds, please report\n your glibc version and system details as FORCE_MONOTONIC_FIX\n issue at https://github.com/wolfcw/libfaketime)\n");
fflush(stdout);
pthread_mutex_lock(&fakeMutex);
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.