Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)]
Remove <libc-tsd.h>
Use __thread variables directly instead. The macros do not save any
typing. It seems unlikely that a future port will lack __thread
variable support.
Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names. Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.
Florian Weimer [Fri, 16 May 2025 14:47:02 +0000 (16:47 +0200)]
manual: Clarifications for listing directories
Support for seeking is limited. Using the d_off and d_reclen members
of struct dirent is discouraged, especially with readdir. Concurrent
modification of directories during iteration may result in duplicate
or missing etnries.
Existing benchtests for malloc infrastructure seem to be rather generic
to test global malloc implementation performance. This new benchtest
focus on reducing any non tcache related side effects, allowing to more
realistically predict performance impacts of tcache code changes.
The test was inpired in bench-[cm]alloc-thread code, with severe
simplifications:
- forces single thread execution, reducing concurrency side-effects,
like cache incoherence penalties due simultaneous writes to the same
cache pages;
- Focus on allocating and deallocating a single size for all the
duration of the benchmark. Since all it does is allocate and
deallocate, it will measure the tcache hotpath without any
side-effects.
- Allows to specify the allocation size as input argument.
Joseph Myers [Wed, 14 May 2025 10:51:46 +0000 (10:51 +0000)]
Implement C23 rootn.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the rootn functions, which compute the Yth root of X for
integer Y (with a domain error if Y is 0, even if X is a NaN). The
integer exponent has type long long int in C23; it was intmax_t in TS
18661-4, and as with other interfaces changed after their initial
appearance in the TS, I don't think we need to support the original
version of the interface.
As with pown and compoundn, I strongly encourage searching for worst
cases for ulps error for these implementations (necessarily
non-exhaustively, given the size of the input space). I also expect a
custom implementation for a given format could be much faster as well
as more accurate, although the implementation is simpler than those
for pown and compoundn.
This completes adding to glibc those TS 18661-4 functions (ignoring
DFP) that are included in C23. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23
mathematical functions (not just the TS 18661-4 ones) missing built-in
functions in GCC, where such functions might usefully be added.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Wilco Dijkstra [Fri, 9 May 2025 16:02:14 +0000 (16:02 +0000)]
malloc: Improve performance of __libc_calloc
Improve performance of __libc_calloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_calloc doesn't need
to setup a frame.
On Neoverse V2, bench-calloc-simple improves by 5.0% overall.
Bench-calloc-thread 1 improves by 24%.
Stefan Liebler [Tue, 13 May 2025 11:28:56 +0000 (13:28 +0200)]
powerpc64le: Remove configure check for objcopy >= 2.26.
Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore. Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
The current minimum bintuils version 2.25 was released end of 2014. This patch
now raises the minimum binutils version to 2.39 which was also released in 2022.
The hint for ARC is not needed anymore.
In sysdeps/[alpha|hppa|csky]/configure.ac, PIE is unsupported with this comment:
PIE builds fail on binutils 2.37 and earlier, see:
https://sourceware.org/bugzilla/show_bug.cgi?id=28672
This patch keeps PIE unsupported and let the machine maintainers test and
enable it later.
In sysdeps/arm/configure.ac, there is a check whether TPOFF relocs with addends
are assembled correctly, which is known to be broken in binutils 2.24 and 2.25.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=18383
This patch keeps the check as is and let the machine maintainers check if it
still required.
According to Florian Weimer:
Having at least binutils 2.38 will allow us to assume that this linker
bug is fixed:
Bug 28743 - -z relro creats holes in the process image on GNU/Linux
<https://sourceware.org/bugzilla/show_bug.cgi?id=28743> Reviewed-by: Florian Weimer <fweimer@redhat.com>
Yury Khrustalev [Thu, 8 May 2025 12:53:38 +0000 (13:53 +0100)]
aarch64: fix unwinding in longjmp
Previously, longjmp() on aarch64 was using CFI directives around the
call to __libc_arm_za_disable() after CFA was redefined at the start
of longjmp(). This may result in unwinding issues. Move the call and
surrounding CFI directives to the beginning of longjmp().
Wilco Dijkstra [Thu, 1 May 2025 19:58:38 +0000 (19:58 +0000)]
malloc: Improve malloc initialization
Move malloc initialization to __libc_early_init. Use a hidden __ptmalloc_init
for initialization and a weak call to avoid pulling in the system malloc in a
static binary. All previous initialization checks can now be removed.
Joseph Myers [Mon, 12 May 2025 14:56:07 +0000 (14:56 +0000)]
Document all CLOCK_* values
The manual documents CLOCK_REALTIME and CLOCK_MONOTONIC but not other
CLOCK_* values. Add documentation of the POSIX clocks
CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, along with a
reference to the Linux man pages for the semantics of the
Linux-specific clocks supported (as with some other functionality
coming direct from the Linux kernel where the man pages can be
considered the main documentation).
Note: CLOCK_MONOTONIC_RAW, CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE are also defined in the toplevel bits/time.h,
as used for Hurd. Nevertheless, I see no sign that the Hurd code in
glibc actually has any support for those clocks, so I think it is
correct to document them as Linux-specific (and to refer only to the
Linux man pages for their semantics).
David Lau [Mon, 12 May 2025 11:42:17 +0000 (11:42 +0000)]
malloc: Improved double free detection in the tcache
The previous double free detection did not account for an attacker to
use a terminating null byte overflowing from the previous
chunk to change the size of a memory chunk is being sorted into.
So that the check in 'tcache_double_free_verify' would pass
even though it is a double free.
Solution:
Let 'tcache_double_free_verify' iterate over all tcache entries to
detect double frees.
This patch only protects from buffer overflows by one byte.
But I would argue that off by one errors are the most common
errors to be made.
Alternatives Considered:
Store the size of a memory chunk in big endian and thus
the chunk size would not get overwritten because entries in the
tcache are not that big.
Move the tcache_key before the actual memory chunk so that it
does not have to be checked at all, this would work better in general
but also it would increase the memory usage.
Signed-off-by: David Lau <david.lau@fau.de> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Joseph Myers [Fri, 9 May 2025 15:17:27 +0000 (15:17 +0000)]
Implement C23 compoundn
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1). The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.
Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).
As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space). I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.
As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).
Tested for x86_64 and x86, and with build-many-glibcs.py.
Collin Funk [Mon, 5 May 2025 02:31:34 +0000 (19:31 -0700)]
nss: remove undefined behavior and optimize getaddrinfo
On x86-64 and compiling with -O2 using stdc_leading_zeros compiles to
the bsr instruction. The fls function removed by this patch is inlined
but still loops while checking each bit individually.
* nss/getaddrinfo.c: Include <stdbit.h>.
(fls): Remove function. This function contains a left shift of 31 on an
'int' which is undefined.
(rfc3484_sort): Use stdc_leading_zeros instead of fls.
These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.
DJ Delorie [Sat, 3 May 2025 00:51:18 +0000 (20:51 -0400)]
manual: add more pthread functions
Add stubs and partial docs for many undocumented pthreads functions.
While neither exhaustive nor complete, gives minimal usage docs
for many functions and expands the pthreads chapters, making it
easier to continue improving this section in the future.
Stefan Liebler [Tue, 29 Apr 2025 11:28:58 +0000 (13:28 +0200)]
S390: Add new s390 platform z17.
The glibc-hwcaps subdirectories are extended by "z17". Libraries are loaded if
the z17 facility bits are active:
- Miscellaneous-instruction-extensions facility 4
- Vector-enhancements-facility 3
- Vector-Packed-Decimal-Enhancement Facility 3
- CPU: Concurrent-Functions Facility
tst-glibc-hwcaps.c is extended in order to test z17 via new marker6.
In case of running on a z17 with a kernel not recognizing z17 yet,
AT_PLATFORM will be z900 but vector-bit in AT_HWCAP is set. This situation
is now recognized and this testcase does not fail.
A fatal glibc error is dumped if glibc was build with architecture
level set for z17, but run on an older machine (See dl-hwcap-check.h).
Note, you might get an SIGILL before this check if you don't use:
configure --with-rtld-early-cflags=-march=<older-machine>
ld.so --list-diagnostics now also dumps information about s390.cpu_features.
Independent from z17, the s390x kernel won't introduce new HWCAP-Bits if there
is no special handling needed in kernel itself. For z17, we don't have new
HWCAP flags, but have to check the facility bits retrieved by
stfle-instruction.
Instead of storing all the stfle-bits (currently four 64bit values) in the
cpu_features struct, we now only store those bits, which are needed within
glibc itself. Note that we have this list twice, one with original values and
the other one which can be filtered with GLIBC_TUNABLES=glibc.cpu.hwcaps.
Those new fields are stored in so far reserved space in cpu_features struct.
Thus processes started in between the update of glibc package and we e.g. have
a new ld.so and an old libc.so, won't crash. The glibc internal ifunc-resolvers
would not select the best optimized variant.
The users of stfle-bits are also updated:
- parsing of GLIBC_TUNABLES=glibc.cpu.hwcaps
- glibc internal ifunc-resolvers
- __libc_ifunc_impl_list
- sysconf
Joseph Myers [Thu, 1 May 2025 22:28:59 +0000 (22:28 +0000)]
Correct test descriptors in libm-test-pown.inc
While working on implementing compoundn, I noticed that
libm-test-pown.inc was wrongly using TEST_ff_f and AUTO_TESTS_ff_f
when the actual types involved meant fL_f should be used instead of
ff_f; fix to use the correct descriptor strings for pown. (These
strings affect how gen-libm-test.py generates a C file in some cases.
The structure type test_fL_f_data for expected results and the use of
RUN_TEST_LOOP_fL_f in the ALL_RM_TEST call were already correct.)
Tested for x86_64. The generated libm-test-pown.c was actually
unchanged, but the old descriptor strings were still logically
incorrect.
Inline tcache_try_malloc into calloc since it is the only caller. Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.
nptl: Fix pthread_getattr_np when modules with execstack are allowed (BZ 32897)
The BZ 32653 fix (12a497c716f0a06be5946cabb8c3ec22a079771e) kept the
stack pointer zeroing from make_main_stack_executable on
_dl_make_stack_executable. However, previously the 'stack_endp'
pointed to temporary variable created before the call of
_dl_map_object_from_fd; while now we use the __libc_stack_end
directly.
Since pthread_getattr_np relies on correct __libc_stack_end, if
_dl_make_stack_executable is called (for instance, when
glibc.rtld.execstack=2 is set) __libc_stack_end will be set to zero,
and the call will always fail.
The __libc_stack_end zero was used a mitigation hardening, but since 52a01100ad011293197637e42b5be1a479a2f4ae it is used solely on
pthread_getattr_np code. So there is no point in zeroing anymore.
Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Sam James <sam@gentoo.org>
Julian Zhu [Fri, 8 Nov 2024 13:41:43 +0000 (21:41 +0800)]
RISC-V: Use builtin for ffs and ffsll while supported extension available
Hardware ctz instructions are available in the RISC-V Zbb and XTheadBb extension. With special `-march` flags defined, we can generate more simplified code compared to the generic implementation of `ffs`/`ffsll`.
The __printf_fp_buffer_1 issues count_leading_zeros with 0 argument,
which might leads to call __builtin_ctz depending of the ABI.
Replace with stdbit.h function instead.
Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
benchtest: Correct shell script related to bench-malloc-thread
This patch changes the shell script that selects which arguments are used
for the execution of bench-malloc-thread.
The problem seems to have been introduced in commit:
With current condition, the following error "/bin/sh: 3: [[: not found"
occurs when executing `make bench BENCHSET="malloc-thread"` and the else
path is taken, using incorrect arguments for bench test execution.
H. Peter Anvin [Fri, 25 Apr 2025 05:30:59 +0000 (07:30 +0200)]
linux/termio: remove <termio.h> and struct termio
The <termio.h> interface is absolutely ancient: it was obsoleted by
<termios.h> already in the first version of POSIX (1988) and thus
predates the very first version of Linux. Unfortunately, some constant
macros are used both by <termio.h> and <termios.h>; particularly
problematic is the baud rate constants since the termio interface
*requires* that the baud rate is set via an enumeration as part of
c_cflag.
In preparation of revamping the termios interface to support the
arbitrary baud rate capability that the Linux kernel has supported
since 2008, remove <termio.h> in the hope that no one still uses this
archaic interface.
Note that there is no actual code in glibc to support termio: it is
purely an unabstracted ioctl() interface.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
elf: tst-audit10: split AVX512F code into dedicated functions [BZ #32882]
"Recent" GCC versions (since commit fc62716fe8d1, backported to stable
branches) emit a vzeroupper instruction at the end of functions
containing AVX instructions. This causes the tst-audit10 test to fail
on CPUs lacking AVX instructions, despite the AVX512F check. The crash
occurs in the pltenter function of tst-auditmod10b.c.
Fix that by moving the code guarded by the check_avx512 function into
specific functions using the target ("avx512f") attribute. Note that
since commit 5359c3bc91cc ("x86-64: Remove compiler -mavx512f check") it
is safe to assume that the compiler has AVX512F support, thus the
__AVX512F__ checks can be dropped.
Joseph Myers [Tue, 22 Apr 2025 17:00:34 +0000 (17:00 +0000)]
Add AT_* constants from Linux 6.12
Linux 6.12 adds AT_RENAME_* aliases for RENAME_* flags for renameat2,
and also AT_HANDLE_MNT_ID_UNIQUE. Add the first set of aliases to
stdio.h alongside the RENAME_* names, and AT_HANDLE_MNT_ID_UNIQUE to
bits/fcntl-linux.h.
Samuel Thibault [Mon, 21 Apr 2025 20:21:17 +0000 (22:21 +0200)]
hurd: Make symlink return EEXIST on existing target directory
8ef17919509e ("hurd: Fix EINVAL error on linking to a slash-trailing path
[BZ #32569]) made symlink return ENOTDIR, but the gnulib testsuite does
not recognize it for such a situation, and EEXIST is indeed more
comprehensible to users.
Samuel Thibault [Mon, 21 Apr 2025 17:42:27 +0000 (19:42 +0200)]
hurd: Do not restore xstate when it is not initialized
If the process has never used fp before getting a signal, xstate is set
(and thus the x87 state is not initialized) but xstate->initialized is still
0, and we should not restore anything.
Luca Dariz [Wed, 19 Mar 2025 17:11:18 +0000 (18:11 +0100)]
hurd: save xstate during signal handling
* hurd/Makefile: add new tests
* hurd/test-sig-rpc-interrupted.c: check xstate save and restore in
the case where a signal is delivered to a thread which is waiting
for an rpc. This test implements the rpc interruption protocol used
by the hurd servers. It was so far passing on Debian thanks to the
local-intr-msg-clobber.diff patch, which is now obsolete.
* hurd/test-sig-xstate.c: check xstate save and restore in the case
where a signal is delivered to a running thread, making sure that
the xstate is modified in the signal handler.
* hurd/test-xstate.h: add helpers to test xstate
* sysdeps/mach/hurd/i386/bits/sigcontext.h: add xstate to the
sigcontext structure.
+ sysdeps/mach/hurd/i386/sigreturn.c: restore xstate from the saved
context
* sysdeps/mach/hurd/x86/trampoline.c: save xstate if
supported. Otherwise we fall back to the previous behaviour of
ignoring xstate.
* sysdeps/mach/hurd/x86_64/bits/sigcontext.h: add xstate to the
sigcontext structure.
* sysdeps/mach/hurd/x86_64/sigreturn.c: restore xstate from the saved
context
Signed-off-by: Luca Dariz <luca@orpolo.org> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-ID: <20250319171118.142163-1-luca@orpolo.org>
This patch moves any calls of tcache_init away after tcache hot paths.
Since there is no reason to initialize tcaches in the hot path and since
we need to be able to check tcache != NULL in any case, because of
tcache_thread_shutdown function, moving tcache_init away from hot path
can only be beneficial.
The patch also removes the initialization of tcaches within the
__libc_free call. It only makes sense to initialize tcaches for the
thread after it calls one of the allocation functions. Also the patch
removes the save/restore of errno from tcache_init code, as it is no
longer needed.
Andrew Pinski [Fri, 21 Feb 2025 23:13:53 +0000 (15:13 -0800)]
aarch64: Add back non-temporal load/stores from oryon-1's memset
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memset for large sizes. It
was not recommended for their use with device memory and memset is already
not valid to be used with device memory.
This reverts commit e6590f0c86632c36c9a784cf96075f4be2e920d2. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Andrew Pinski [Fri, 21 Feb 2025 23:10:18 +0000 (15:10 -0800)]
aarch64: Add back non-temporal load/stores from oryon-1's memcpy
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memcpy for large sizes. It
was not recommended for their use with device memory and memcpy is already
not valid to be use with device memory.
This reverts commit eb5eeb47403e0a91de834868e501b4d62b8d2cb9. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Wilco Dijkstra [Mon, 31 Mar 2025 12:19:06 +0000 (12:19 +0000)]
malloc: Use tailcalls in __libc_free
Use tailcalls to avoid the overhead of a frame on the free fastpath.
Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail()
which can be tailcalled without forcing a frame like no-return functions.
Change tcache_double_free_verify() to retry via __libc_free() after clearing
the key.
Wilco Dijkstra [Mon, 31 Mar 2025 11:44:02 +0000 (11:44 +0000)]
malloc: Improve free checks
The checks on size can be merged and use __builtin_add_overflow. Since
tcache only handles small sizes (and rejects sizes < MINSIZE), delay this
check until after tcache.
Wilco Dijkstra [Mon, 24 Mar 2025 18:23:37 +0000 (18:23 +0000)]
malloc: Move mmap code out of __libc_free hotpath
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it. Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first. Move saving of errno
to cases that actually need it. Remove a safety check that fails on mmap
chunks and a check that mmap chunks cannot be added to tcache.
Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.
Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
manual: Update standardization of getline and getdelim [BZ #32830]
* manual/stdio.texi (Line Input): Document that getline and getdelim
where GNU extensions until standardized in POSIX.1-2008. Add restrict
to function prototypes.
Since one path uses _IO_SYNC and the other _IO_OVERFLOW, the newly added
test cases verifies that `fflush (FILE)` and `fflush (NULL)` are
semantically equivalent from the FILE perspective.
libio: Synthesize ESPIPE error if lseek returns 0 after reading bytes
This is required so that fclose, when trying to seek to the right
position after filling the input buffer, does not fail with EINVAL.
This fclose code path only ignores ESPIPE errors.
Reported by Petr Pisar on
<https://bugzilla.redhat.com/show_bug.cgi?id=2358265>.
As we discussed, this patch just makes C23 include every check that is
performed by C11.
I tested the commit by adding the ISO23 Make and Python variables to be
the same as ISO11. So the only difference was compiling with -DISO23
instead of -DISO11. And changed the temporary directories to instead
use the format f'/tmp/glibc-{self.standard}-{self.header}'. Then I used
a shell script to run 'cmp' on each file in the ISO11 and ISO23
directories for each header to make sure they were the same.
-- 8< --
Make C23 checks include every test that is performed by C11. Done by
running the following command:
find conform -name '*.h-data' | xargs sed -i \
-e 's| !defined ISO11| !defined ISO11 \&\& !defined ISO23|g' \
-e 's| defined ISO11| defined ISO11 \|\| defined ISO23|g' \
-e 's|ifdef ISO11|if defined ISO11 \|\| defined ISO23|g' \
-e 's|ifndef ISO11|if !defined ISO11 \&\& !defined ISO23|g'
This hopefully provides additional information about why the
test failed, in case the fix in commit 62db87ab24f9ca483f97f
("timezone: Fix tst-bz28707 Makefile rule") turns out to be
insufficient.
Jakub Jelinek [Wed, 9 Apr 2025 16:24:11 +0000 (18:24 +0200)]
math: Fix up THREEp96 constant in expf128 [BZ #32411]
As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.
The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.
DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.
elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)
From the bug report [1], multiple programs still require to dlopen
shared libraries with either missing PT_GNU_STACK or with the executable
bit set. Although, in some cases, it seems to be a hard-craft assembly
source without the required .note.GNU-stack marking (so the static linker
is forced to set the stack executable if the ABI requires it), other
cases seem that the library uses trampolines [2].
Unfortunately, READ_IMPLIES_EXEC is not an option since on some ABIs
(x86_64), the kernel clears the bit, making it unsupported. To avoid
reinstating the broken code that changes stack permission on dlopen
(0ca8785a28), this patch extends the glibc.rtld.execstack tunable to
allow an option to force an executable stack at the program startup.
The tunable is a security issue because it defeats the PT_GNU_STACK
hardening. It has the slight advantage of making it explicit by the
caller, and, as for other tunables, this is disabled for setuid binaries.
A tunable also allows us to eventually remove it, but from previous
experiences, it would require some time.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32653
[2] https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/143 Reviewed-by: Sam James <sam@gentoo.org>
stdio-common: In tst-setvbuf2, close helper thread descriptor only if opened
The helper thread may get canceled before the open system
call succeds. Then ThreadData.fd remains zero, and eventually
the xclose call in end_reader_thread fails because descriptor 0
is not open.
Instead, initialize the fd member to -1 (not a valid descriptor)
and close the descriptor only if valid. Do this in a new end_thread
helper routine.
Scan xstate IDs up to the maximum supported xstate ID. Remove the
separate AMX xstate calculation. Instead, exclude the AMX space from
the start of TILECFG to the end of TILEDATA in xsave_state_size.
Completed validation on SKL/SKX/SPR/SDE and compared xsave state size
with "ld.so --list-diagnostics" option, no regression.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
stdlib: Fix qsort memory leak if callback throws (BZ 32058)
If the input buffer exceeds the stack auxiliary buffer, qsort will
malloc a temporary one to call mergesort. Since C++ standard does
allow the callback comparison function to throw [1], the glibc
implementation can potentially leak memory.
The fixes uses a pthread_cleanup_combined_push and
pthread_cleanup_combined_pop, so it can work with and without
exception enables. The qsort code path that calls malloc now
requires some extra setup and a call to __pthread_cleanup_push
anmd __pthread_cleanup_pop (which should be ok since they just
setup some buffer state).
Support older versions of GCC to build glibc 2.42:
1. Need to work around bugs in older versions of GCC.
2. Can't use the new features in newer versions of GCC, which may be
required for new features, like _Float16 which requires GCC 12.1 or
above, in glibc,
The main benefit of supporting older versions of GCC is easier backport
of bug fixes to the older releases of glibc, which can be mitigated by
avoiding incompatible features in newer versions of GCC for critical bug
fixes. Require GCC 12.1 or newer to build. Remove GCC version check for
PowerPC and s390x.
TEST_CC and TEST_CXX can be used to test the glibc build with the older
versions of GCC.
For glibc developers who are using Linux OSes which don't come with GCC
12.1 or newer, they should build and install GCC 12.1 or newer to work
on glibc.
This fixes BZ #32539.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
Collin Funk [Thu, 27 Mar 2025 06:36:30 +0000 (23:36 -0700)]
manual: Document functions adopted by POSIX.1-2024.
Here is a patch updating the documentation to mention GNU and BSD
extensions that were adopted by POSIX.1-2024.
* manual/llio.texi (Memory-mapped I/O): Add that MAP_ANON and
MAP_ANONYMOUS were added by POSIX.1-2024.
* manual/memory.texi (Changing Block Size): Mention that reallocarray
was added by POSIX.1-2024.
* manual/message.texi (Message Translation): Adjust wording to match
standardization.
(Translation with gettext): Mention the gettext family of functions were
added by POSIX.1-2024.
* manual/pattern.texi (Wildcard Matching): Mention that FNM_CASEFOLD was
added by POSIX.1-2024.
* manual/process.texi (Creating a Process): Mention that _Fork and
WCOREDUMP were added by POSIX.1-2024.
* manual/signal.texi (Miscellaneous Signals): Mention that SIGWINCH was
added by POSIX-1.2024.
* manual/startup.texi (Environment Access): Mention that secure_getenv
was added by POSIX.1-2024.
* manual/string.texi (Truncating Strings): Mention that strlcpy,
strlcat, wcslcpy, and wslcat were added by POSIX-1.2024.
(Search Functions): Document that memmem was added by POSIX-1.2024.
* manual/terminal.texi (Allocation): Mention that ptsname_r was added by
POSIX-1.2024.
* manual/threads.texi (Waiting with Explicit Clocks): Move node under
POSIX Threads. Mention pthread_cond_clockwait,
pthread_rwlock_clockrdlock, and pthread_rwlock_clockwrlock were added by
POSIX-1.2024.
(Joining Threads): New node under Non-POSIX Extensions.
aarch64: Fix _dl_tlsdesc_dynamic unwind for pac-ret (BZ 32612)
When libgcc is built with pac-ret, it requires to autenticate the
unwinding frame based on CFI information. The _dl_tlsdesc_dynamic
uses a custom calling convention, where it is responsible to save
and restore all registers it might use (even volatile).
The pac-ret support added by 1be3d6eb823d8b952fa54b7bbc90cbecb8981380
was added only on the slow-path, but the fast path also adds DWARF
Register Rule Instruction (cfi_adjust_cfa_offset) since it requires
to save/restore some auxiliary register. It seems that this is not
fully supported neither by libgcc nor AArch64 ABI [1].
Instead, move paciasp/autiasp to function prologue/epilogue to be
used on both fast and slow paths.
I also corrected the _dl_tlsdesc_dynamic comment description, it was
copied from i386 implementation without any adjustment.
Checked on aarch64-linux-gnu with a toolchain built with
--enable-standard-branch-protection on a system with pac-ret
support.
Florian Weimer [Fri, 28 Mar 2025 08:26:59 +0000 (09:26 +0100)]
x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)
Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size. However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.
This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so. As a result, struct cpu_features layout does not change,
helping a future backport of this change.