The new float and double implementation does not require an
extra function call, and error handling uses the math_err functions,
which results in better performance on i386 as well.
With gcc-14 on an AMD Ryzen 9 5900X, master shows:
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on the error cases since it helps
code generation on recent gcc.
* i386 and m68k require the template version, since
both provide __ieee754_ilogb implementations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for the float and float128 versions. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
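
The pattern described above (rare error cases marked as unlikely and handled by an out-of-line helper) can be sketched roughly as follows. This is a minimal illustration assuming GCC or Clang builtins; SKETCH_UNLIKELY, sketch_invalid_i and sketch_ilogbf are hypothetical names, and the return values chosen for zero, infinity and NaN are placeholders rather than glibc's actual FP_ILOGB0/FP_ILOGBNAN handling.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Stand-in for glibc's internal __glibc_unlikely (assumes GCC or Clang).  */
#define SKETCH_UNLIKELY(c) __builtin_expect (!!(c), 0)

/* Hypothetical out-of-line error helper: sets errno and hands back the
   caller-supplied result, so the hot path carries no error-handling code.  */
__attribute__ ((noinline)) static int
sketch_invalid_i (int result)
{
  errno = EDOM;
  return result;
}

/* Unbiased exponent of a float; zero, infinity and NaN take the cold path.  */
static int
sketch_ilogbf (float x)
{
  uint32_t ix;
  memcpy (&ix, &x, sizeof ix);
  uint32_t mant = ix & 0x007fffffu;
  int ex = (int) ((ix >> 23) & 0xff);

  if (SKETCH_UNLIKELY (ex == 0xff))
    /* Infinity or NaN: INT_MAX is an illustrative return value.  */
    return sketch_invalid_i (INT_MAX);
  if (SKETCH_UNLIKELY (ex == 0))
    {
      if (mant == 0)
        /* Zero: INT_MIN is an illustrative return value.  */
        return sketch_invalid_i (INT_MIN);
      /* Subnormal: locate the highest set mantissa bit.  */
      return (31 - __builtin_clz (mant)) - 149;
    }
  return ex - 127;
}
```

With this layout the common case is a mask, a shift and a subtraction; errno is only touched in the helper.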
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on the error cases since it helps
code generation on recent gcc.
* i386 and m68k require the template version, since
both provide __ieee754_ilogb implementations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for the float and float128 versions. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
The subnormal exponent calculation invokes UB by left shifting the
signed exponent to find the first leading bit. The implementation
also uses 32-bit operations, which generates suboptimal code on
64-bit architectures.
The patch reimplements ilogb using the math_config.h macros and
uses the new stdbit function to simplify the subnormal handling.
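
As an illustration of the subnormal handling, the sketch below derives the exponent of a subnormal double from a single leading-zero count on an unsigned 64-bit value, so no signed shift is involved. It uses the GCC/Clang builtin __builtin_clzll as a stand-in for the <stdbit.h> function, and sketch_mantissa and sketch_subnormal_exponent are hypothetical helpers, not the math_config.h macros the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the 52-bit mantissa field of a double.  */
static uint64_t
sketch_mantissa (double x)
{
  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);
  return ix & UINT64_C (0x000fffffffffffff);
}

/* Exponent of a nonzero subnormal double, given its mantissa field.
   A subnormal has the value mant * 2^-1074, so its exponent is the
   index of the highest set bit minus 1074.  */
static int
sketch_subnormal_exponent (uint64_t mant)
{
  assert (mant != 0 && mant < (UINT64_C (1) << 52));
  int top_bit = 63 - __builtin_clzll (mant);
  return top_bit - 1074;
}
```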
Arjun Shankar [Mon, 2 Jun 2025 08:41:02 +0000 (10:41 +0200)]
manual: Correct return value description of 'clock_nanosleep'
Commit 1a3d8f2201d4d613401ce5be9a283f4f28c43093 incorrectly described
'clock_nanosleep' as having the same return values as 'nanosleep'. Fix
this, clarifying that 'clock_nanosleep' returns a positive error number
upon failure instead of setting 'errno'. Also clarify that 'nanosleep'
returns '-1' upon error.
Fixes: 1a3d8f2201d4d613401ce5be9a283f4f28c43093
Reported-by: Mark Harris <mark.hsj@gmail.com>
Reviewed-by: Mark Harris <mark.hsj@gmail.com>
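
The difference in error reporting can be sketched as follows: both helpers pass an out-of-range tv_nsec, which is an EINVAL condition. The helper names are hypothetical, and the choice of CLOCK_MONOTONIC is only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <time.h>

/* clock_nanosleep reports failure through its return value, which is a
   positive error number; errno is not consulted here.  */
static int
sketch_clock_nanosleep_error (void)
{
  struct timespec bad = { .tv_sec = 0, .tv_nsec = 1000000000L };
  return clock_nanosleep (CLOCK_MONOTONIC, 0, &bad, NULL);
}

/* nanosleep returns -1 on failure and stores the error number in errno.
   Returns that error number, or 0 if the call unexpectedly succeeded.  */
static int
sketch_nanosleep_error (void)
{
  struct timespec bad = { .tv_sec = 0, .tv_nsec = 1000000000L };
  errno = 0;
  if (nanosleep (&bad, NULL) == -1)
    return errno;
  return 0;
}
```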
Arjun Shankar [Fri, 30 May 2025 00:09:50 +0000 (02:09 +0200)]
manual: Document clock_nanosleep
Make minor clarifications in the documentation for 'nanosleep' and add
an entry for 'clock_nanosleep' as a generalized variant of the former
function that allows clock selection.
Reviewed-by: Maciej W. Rozycki <macro@redhat.com>
manual: Fix invalid 'illegal' usage with 'nanosleep'
The GNU Coding Standards demand that 'illegal' only be used to refer to
activities prohibited by law. Replace it with 'invalid' accordingly in
the description of the EINVAL error condition for 'nanosleep'.
Add missing EAFNOSUPPORT, ESOCKTNOSUPPORT, EPROTOTYPE, EINVAL, EPERM,
and ENOMEM error codes, and adjust existing descriptions accordingly.
On Linux either ENOBUFS or ENOMEM is returned in the case of a memory
allocation failure, depending on the namespace requested, e.g. AF_INET
returns ENOMEM while AF_INET6 returns ENOBUFS, so document these codes
as alternatives.
Similarly EPERM is returned rather than EACCES on Linux, so document
these codes as alternatives as well. We might want to convert EPERM to
EACCES for POSIX compliance, but it is beyond the scope of this change,
and software has to expect either anyway, owing to the long-established
practice.
Finally ESOCKTNOSUPPORT is returned rather than EPROTONOSUPPORT for an
unsupported style except for the AF_QIPCRTR namespace where EPROTOTYPE
is used, so document these codes as alternatives too.
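
For callers, treating these codes as alternatives might look like the sketch below. The helper names are hypothetical, and grouping EPROTOTYPE with the unsupported-style codes is an assumption based only on the AF_QIPCRTR case described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Memory allocation failure: ENOBUFS or ENOMEM depending on the namespace.  */
static bool
sketch_is_out_of_memory (int err)
{
  return err == ENOBUFS || err == ENOMEM;
}

/* Insufficient privilege: POSIX says EACCES, Linux returns EPERM.  */
static bool
sketch_is_permission_denied (int err)
{
  return err == EACCES || err == EPERM;
}

/* Unsupported communication style.  ESOCKTNOSUPPORT is not in POSIX,
   so it is only checked where the platform defines it.  */
static bool
sketch_is_unsupported_style (int err)
{
  if (err == EPROTONOSUPPORT || err == EPROTOTYPE)
    return true;
#ifdef ESOCKTNOSUPPORT
  if (err == ESOCKTNOSUPPORT)
    return true;
#endif
  return false;
}
```

A caller would pass the errno value saved right after a failed socket call.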
stdio-common: Consistently use 'num_digits_len' in 'vfscanf'
Use the 'num_digits_len' enumeration constant in the only place where 10
is still referred to literally as a digit index in i18n handling for decimal
integers. No change in code produced.
Joseph Myers [Thu, 29 May 2025 19:21:46 +0000 (19:21 +0000)]
Update syscall lists for Linux 6.15
Linux 6.15 adds the new syscall open_tree_attr. Update
syscall-names.list and regenerate the arch-syscall.h headers with
build-many-glibcs.py update-syscalls.
Wilco Dijkstra [Thu, 29 May 2025 15:08:15 +0000 (15:08 +0000)]
AArch64: Improve enabling of SVE for libmvec
When using a -mcpu option in CFLAGS, GCC can report errors when building libmvec.
Fix this by overriding both -mcpu and -march with a generic variant with SVE added.
Also use a tune for a modern SVE core.
Andreas Schwab [Mon, 16 Oct 2023 11:13:17 +0000 (13:13 +0200)]
Update RISC-V relocations
Update the list of RISC-V relocations from the ELF psABI as of June 2024.
It removes binutils-internal only relocations that were never part of
actual object files. The GNU_VTINHERIT and GNU_VTENTRY relocations were
never used because the corresponding GCC option -fvtable-gc was never
supported on RISC-V.
Wilco Dijkstra [Tue, 27 May 2025 13:32:45 +0000 (13:32 +0000)]
malloc: Fix malloc init order
__ptmalloc_init was called too early in __libc_early_init: it uses
__libc_initial which is not set yet. Fix this by moving initialization
to the end of __libc_early_init.
Stefan Liebler [Wed, 14 May 2025 12:26:36 +0000 (14:26 +0200)]
S390: Use cfi_val_offset instead of cfi_escape. 31bit part
Due to raising the minimum binutils version to version >=2.28,
the used cfi_escape for cfi_val_offset can now be omitted.
The commit 0fc76d876261ee8253fef198ffec48c832edd4ff
has already adjusted it for the 64bit part of mcount.
This patch also adjusts it for the 31bit part of mcount.
Checked with "objdump -WF" / "objdump -Wf" that the previous
cfi_escape and the new cfi_val_offset are equal.
Mark Wielaard [Wed, 14 May 2025 21:11:15 +0000 (23:11 +0200)]
INSTALL: Regenerate with texinfo 7.2
This fixes make dist on systems with the latest texinfo installed.
GNU texinfo 7.2 changes @xrefs into proper plain text sentences instead
of pseudo info references.
Tested-By: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
Florian Weimer [Thu, 22 May 2025 12:36:37 +0000 (14:36 +0200)]
Fix error reporting (false negatives) in SGID tests
And simplify the interface of support_capture_subprogram_self_sgid.
Use the existing framework for temporary directories (now with
mode 0700) and directory/file deletion. Handle all execution
errors within support_capture_subprogram_self_sgid. In particular,
this includes test failures because the invoked program did not
exit with exit status zero. Existing tests that expect exit
status 42 are adjusted to use zero instead.
In addition, fix callers not to call exit (0) with test failures
pending (which may mask them, especially when running with --direct).
Carlos O'Donell [Tue, 20 May 2025 11:45:16 +0000 (07:45 -0400)]
posix: Use more inclusive language in test data.
Remove Changelog entries that use 'blacklist' or 'master' in the
test data. The test data still contains enough accented characters
to serve the purposes of the posix/tst-regex.c test.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Wilco Dijkstra [Wed, 14 May 2025 11:38:19 +0000 (11:38 +0000)]
AArch64: Cleanup SVE config and defines
Now that we finally support modern GCC and binutils, it's time for a cleanup.
Remove HAVE_AARCH64_SVE_ASM define and conditional compilation. Remove the
configure checks for SVE, ACLE and variant-PCS support.
Wilco Dijkstra [Wed, 14 May 2025 16:32:31 +0000 (16:32 +0000)]
AArch64: Cleanup PAC and BTI
Now that we finally support modern GCC and binutils, it's time for a cleanup.
Use PAC and BTI instructions unconditionally and use proper assembler syntax.
Remove the PR target/94791 strip_pac workarounds for buggy GCCs. Remove the
PAC/BTI configure checks - always emit GNU property notes on assembly files.
Change cfi_window_save to the correct cfi_negate_ra_state unwind directive.
Reviewed-by: Matthieu Longo <matthieu.longo@arm.com>
Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)]
Remove <libc-tsd.h>
Use __thread variables directly instead. The macros do not save any
typing. It seems unlikely that a future port will lack __thread
variable support.
Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names. Previously, <libc-tsd.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.
Florian Weimer [Fri, 16 May 2025 14:47:02 +0000 (16:47 +0200)]
manual: Clarifications for listing directories
Support for seeking is limited. Using the d_off and d_reclen members
of struct dirent is discouraged, especially with readdir. Concurrent
modification of directories during iteration may result in duplicate
or missing entries.
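
A plain forward iteration that follows this advice could look like the sketch below: it reads entries in order, uses only d_name, and never seeks. sketch_count_entries is a hypothetical helper written for this illustration.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Count the entries of PATH other than "." and "..".
   Returns -1 if the directory cannot be opened or read.  */
static long
sketch_count_entries (const char *path)
{
  DIR *dir = opendir (path);
  if (dir == NULL)
    return -1;

  long count = 0;
  for (;;)
    {
      /* readdir returns NULL both at end of directory and on error;
         clearing errno first lets the two cases be told apart.  */
      errno = 0;
      struct dirent *entry = readdir (dir);
      if (entry == NULL)
        {
          if (errno != 0)
            count = -1;
          break;
        }
      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;
      ++count;
    }
  closedir (dir);
  return count;
}
```

If the directory is modified concurrently, the count may include duplicates or miss entries, as noted above.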
Existing benchtests for the malloc infrastructure seem rather generic,
testing the performance of the malloc implementation as a whole. This new
benchtest focuses on reducing any non-tcache-related side effects, so that
the performance impact of tcache code changes can be predicted more
realistically.
The test was inspired by the bench-[cm]alloc-thread code, with severe
simplifications:
- Forces single-thread execution, reducing concurrency side effects
such as cache incoherence penalties due to simultaneous writes to the
same cache pages.
- Focuses on allocating and deallocating a single size for the whole
duration of the benchmark. Since all it does is allocate and
deallocate, it will measure the tcache hotpath without any
side effects.
- Allows the allocation size to be specified as an input argument.
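
The core of such a benchmark can be sketched as a single-threaded loop over one allocation size. This is only an illustration of the idea, not the benchtest itself; sketch_alloc_free_loop is a hypothetical name and the timing code a real benchmark wraps around the loop is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate and immediately free SIZE bytes ITERATIONS times on the
   calling thread.  Returns the number of successful allocations.  */
static size_t
sketch_alloc_free_loop (size_t size, size_t iterations)
{
  size_t succeeded = 0;
  for (size_t i = 0; i < iterations; ++i)
    {
      /* The volatile qualifier keeps the compiler from eliding the
         malloc/free pair.  */
      void *volatile chunk = malloc (size);
      if (chunk != NULL)
        ++succeeded;
      free (chunk);
    }
  return succeeded;
}
```

Because every iteration frees the chunk it just allocated, the same size class is hit repeatedly, which is the access pattern the description above aims for.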
Joseph Myers [Wed, 14 May 2025 10:51:46 +0000 (10:51 +0000)]
Implement C23 rootn.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the rootn functions, which compute the Yth root of X for
integer Y (with a domain error if Y is 0, even if X is a NaN). The
integer exponent has type long long int in C23; it was intmax_t in TS
18661-4, and as with other interfaces changed after their initial
appearance in the TS, I don't think we need to support the original
version of the interface.
As with pown and compoundn, I strongly encourage searching for worst
cases for ulps error for these implementations (necessarily
non-exhaustively, given the size of the input space). I also expect a
custom implementation for a given format could be much faster as well
as more accurate, although the implementation is simpler than those
for pown and compoundn.
This completes adding to glibc those TS 18661-4 functions (ignoring
DFP) that are included in C23. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23
mathematical functions (not just the TS 18661-4 ones) missing built-in
functions in GCC, where such functions might usefully be added.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Wilco Dijkstra [Fri, 9 May 2025 16:02:14 +0000 (16:02 +0000)]
malloc: Improve performance of __libc_calloc
Improve performance of __libc_calloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_calloc doesn't need
to set up a frame.
On Neoverse V2, bench-calloc-simple improves by 5.0% overall.
Bench-calloc-thread 1 improves by 24%.
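
The fast-path/slow-path split can be illustrated with a toy allocator. Everything here is hypothetical: a one-slot cache stands in for the tcache, and the names and the fixed cached size are invented for the sketch. Whether the final call really becomes a tail call depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHED_SIZE 64

/* One-slot cache standing in for a per-thread cache.  */
static void *sketch_cache_slot;

/* Slow path, kept out of line so the fast path stays small.  */
__attribute__ ((noinline)) static void *
sketch_zalloc_slow (size_t size)
{
  return calloc (1, size);
}

/* Fast path: serve the request from the cache when possible, otherwise
   hand over to the slow path with a call in tail position.  */
static void *
sketch_zalloc (size_t size)
{
  if (size == SKETCH_CACHED_SIZE && sketch_cache_slot != NULL)
    {
      void *chunk = sketch_cache_slot;
      sketch_cache_slot = NULL;
      return memset (chunk, 0, SKETCH_CACHED_SIZE);
    }
  return sketch_zalloc_slow (size);
}

/* Return CHUNK to the cache if the slot is free, else to the system.  */
static void
sketch_zfree (void *chunk, size_t size)
{
  assert (chunk != NULL);
  if (size == SKETCH_CACHED_SIZE && sketch_cache_slot == NULL)
    sketch_cache_slot = chunk;
  else
    free (chunk);
}
```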
Stefan Liebler [Tue, 13 May 2025 11:28:56 +0000 (13:28 +0200)]
powerpc64le: Remove configure check for objcopy >= 2.26.
Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore.
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
The current minimum binutils version 2.25 was released at the end of 2014. This
patch now raises the minimum binutils version to 2.39, which was released in 2022.
The hint for ARC is not needed anymore.
In sysdeps/[alpha|hppa|csky]/configure.ac, PIE is unsupported with this comment:
PIE builds fail on binutils 2.37 and earlier, see:
https://sourceware.org/bugzilla/show_bug.cgi?id=28672
This patch keeps PIE unsupported and lets the machine maintainers test and
enable it later.
In sysdeps/arm/configure.ac, there is a check whether TPOFF relocs with addends
are assembled correctly, which is known to be broken in binutils 2.24 and 2.25.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=18383
This patch keeps the check as is and lets the machine maintainers check if it
is still required.
According to Florian Weimer:
Having at least binutils 2.38 will allow us to assume that this linker
bug is fixed:
Bug 28743 - -z relro creates holes in the process image on GNU/Linux
<https://sourceware.org/bugzilla/show_bug.cgi?id=28743>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Yury Khrustalev [Thu, 8 May 2025 12:53:38 +0000 (13:53 +0100)]
aarch64: fix unwinding in longjmp
Previously, longjmp() on aarch64 was using CFI directives around the
call to __libc_arm_za_disable() after CFA was redefined at the start
of longjmp(). This may result in unwinding issues. Move the call and
surrounding CFI directives to the beginning of longjmp().
Wilco Dijkstra [Thu, 1 May 2025 19:58:38 +0000 (19:58 +0000)]
malloc: Improve malloc initialization
Move malloc initialization to __libc_early_init. Use a hidden __ptmalloc_init
for initialization and a weak call to avoid pulling in the system malloc in a
static binary. All previous initialization checks can now be removed.
Joseph Myers [Mon, 12 May 2025 14:56:07 +0000 (14:56 +0000)]
Document all CLOCK_* values
The manual documents CLOCK_REALTIME and CLOCK_MONOTONIC but not other
CLOCK_* values. Add documentation of the POSIX clocks
CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, along with a
reference to the Linux man pages for the semantics of the
Linux-specific clocks supported (as with some other functionality
coming direct from the Linux kernel where the man pages can be
considered the main documentation).
Note: CLOCK_MONOTONIC_RAW, CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE are also defined in the toplevel bits/time.h,
as used for Hurd. Nevertheless, I see no sign that the Hurd code in
glibc actually has any support for those clocks, so I think it is
correct to document them as Linux-specific (and to refer only to the
Linux man pages for their semantics).
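
Reading the two POSIX CPU-time clocks could look like the following minimal sketch; sketch_cpu_time_ns is a hypothetical helper introduced for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <time.h>

/* Read CLOCK and return its value in nanoseconds, or -1 on failure.  */
static long long
sketch_cpu_time_ns (clockid_t clock)
{
  struct timespec ts;
  if (clock_gettime (clock, &ts) != 0)
    return -1;
  return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Passing CLOCK_PROCESS_CPUTIME_ID yields CPU time consumed by the whole process, while CLOCK_THREAD_CPUTIME_ID covers only the calling thread.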
David Lau [Mon, 12 May 2025 11:42:17 +0000 (11:42 +0000)]
malloc: Improved double free detection in the tcache
The previous double free detection did not account for an attacker
using a terminating null byte that overflows from the previous
chunk to change the size of the memory chunk, and thus the tcache
bin it is sorted into. The check in 'tcache_double_free_verify'
would then pass even though it is a double free.
Solution:
Let 'tcache_double_free_verify' iterate over all tcache entries to
detect double frees.
This patch only protects from buffer overflows by one byte.
But I would argue that off by one errors are the most common
errors to be made.
Alternatives Considered:
Store the size of a memory chunk in big endian and thus
the chunk size would not get overwritten because entries in the
tcache are not that big.
Move the tcache_key before the actual memory chunk so that it
does not have to be checked at all, this would work better in general
but also it would increase the memory usage.
Signed-off-by: David Lau <david.lau@fau.de>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
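
The idea of scanning every bin rather than only the one selected by the (possibly corrupted) chunk size can be shown with a toy cache. This is a simplified model, not the malloc code: the structure, the bin count and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BINS 4

/* A cached chunk holds only the link to the next entry in its bin.  */
struct sketch_entry
{
  struct sketch_entry *next;
};

static struct sketch_entry *sketch_bins[SKETCH_BINS];

/* True if CHUNK is already cached in any bin, not just the bin that its
   current size would map to.  */
static bool
sketch_is_cached (const void *chunk)
{
  for (size_t bin = 0; bin < SKETCH_BINS; ++bin)
    for (const struct sketch_entry *e = sketch_bins[bin]; e != NULL;
         e = e->next)
      if (e == chunk)
        return true;
  return false;
}

/* Put CHUNK into BIN.  Returns false and leaves the cache unchanged when
   a double free is detected or BIN is out of range.  */
static bool
sketch_cache_put (void *chunk, size_t bin)
{
  if (bin >= SKETCH_BINS || sketch_is_cached (chunk))
    return false;
  struct sketch_entry *entry = chunk;
  entry->next = sketch_bins[bin];
  sketch_bins[bin] = entry;
  return true;
}
```

In this model a second free of the same chunk is rejected even when the caller names a different bin, which mirrors the size-corruption scenario described above.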
Joseph Myers [Fri, 9 May 2025 15:17:27 +0000 (15:17 +0000)]
Implement C23 compoundn
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1). The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.
Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).
As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space). I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.
As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).
Tested for x86_64 and x86, and with build-many-glibcs.py.
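
To make the definition concrete, here is a naive reference sketch that computes (1+X) to the power Y by repeated squaring. It is not the glibc implementation and makes no accuracy claims: rounding errors accumulate, and special cases other than X < -1 are not treated as the standard requires. sketch_compoundn is a hypothetical name.

```c
#include <errno.h>
#include <math.h>

/* Naive (1 + x)^n for integer n, by binary exponentiation.  */
static double
sketch_compoundn (double x, long long int n)
{
  if (x < -1.0)
    {
      /* Domain error: x must be at least -1.  */
      errno = EDOM;
      return NAN;
    }

  double base = 1.0 + x;
  /* Magnitude of n as an unsigned value, so that the most negative
     long long does not overflow when negated.  */
  unsigned long long magnitude
    = n < 0 ? 0ULL - (unsigned long long) n : (unsigned long long) n;

  double result = 1.0;
  while (magnitude != 0)
    {
      if (magnitude & 1)
        result *= base;
      base *= base;
      magnitude >>= 1;
    }
  return n < 0 ? 1.0 / result : result;
}
```

For instance, sketch_compoundn (0.5, 2) evaluates 1.5 squared.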
Collin Funk [Mon, 5 May 2025 02:31:34 +0000 (19:31 -0700)]
nss: remove undefined behavior and optimize getaddrinfo
On x86-64 and compiling with -O2 using stdc_leading_zeros compiles to
the bsr instruction. The fls function removed by this patch is inlined
but still loops while checking each bit individually.
* nss/getaddrinfo.c: Include <stdbit.h>.
(fls): Remove function. This function contains a left shift of 31 on an
'int' which is undefined.
(rfc3484_sort): Use stdc_leading_zeros instead of fls.
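
The contrast between a bit-by-bit loop and a single leading-zero count can be sketched as below. Neither function is the removed code: the loop here deliberately uses an unsigned mask so the shift by 31 is well defined, and the GCC/Clang builtin __builtin_clz stands in for stdc_leading_zeros. Both names are hypothetical.

```c
#include <stdint.h>

/* Count leading zero bits by testing one bit per iteration.  */
static int
sketch_leading_zeros_loop (uint32_t a)
{
  int n = 0;
  for (uint32_t mask = UINT32_C (1) << 31;
       mask != 0 && (a & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

/* Same result from one builtin call.  __builtin_clz is undefined for 0,
   so that input is handled separately.  */
static int
sketch_leading_zeros_clz (uint32_t a)
{
  return a == 0 ? 32 : __builtin_clz (a);
}
```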
These routines are not extensively used (gnulib documentation even
recommends using a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.
DJ Delorie [Sat, 3 May 2025 00:51:18 +0000 (20:51 -0400)]
manual: add more pthread functions
Add stubs and partial docs for many undocumented pthreads functions.
While neither exhaustive nor complete, this gives minimal usage docs
for many functions and expands the pthreads chapters, making it
easier to continue improving this section in the future.
Stefan Liebler [Tue, 29 Apr 2025 11:28:58 +0000 (13:28 +0200)]
S390: Add new s390 platform z17.
The glibc-hwcaps subdirectories are extended by "z17". Libraries are loaded if
the z17 facility bits are active:
- Miscellaneous-instruction-extensions facility 4
- Vector-enhancements-facility 3
- Vector-Packed-Decimal-Enhancement Facility 3
- CPU: Concurrent-Functions Facility
tst-glibc-hwcaps.c is extended in order to test z17 via new marker6.
In case of running on a z17 with a kernel not recognizing z17 yet,
AT_PLATFORM will be z900 but vector-bit in AT_HWCAP is set. This situation
is now recognized and this testcase does not fail.
A fatal glibc error is dumped if glibc was built with architecture
level set for z17, but run on an older machine (See dl-hwcap-check.h).
Note, you might get a SIGILL before this check if you don't use:
configure --with-rtld-early-cflags=-march=<older-machine>
ld.so --list-diagnostics now also dumps information about s390.cpu_features.
Independent from z17, the s390x kernel won't introduce new HWCAP-Bits if there
is no special handling needed in kernel itself. For z17, we don't have new
HWCAP flags, but have to check the facility bits retrieved by
stfle-instruction.
Instead of storing all the stfle-bits (currently four 64bit values) in the
cpu_features struct, we now only store those bits, which are needed within
glibc itself. Note that we have this list twice, one with original values and
the other one which can be filtered with GLIBC_TUNABLES=glibc.cpu.hwcaps.
Those new fields are stored in so far reserved space in cpu_features struct.
Thus processes started in the middle of a glibc package update, which e.g. have
a new ld.so and an old libc.so, won't crash. The glibc internal ifunc-resolvers
would merely not select the best optimized variant.
The users of stfle-bits are also updated:
- parsing of GLIBC_TUNABLES=glibc.cpu.hwcaps
- glibc internal ifunc-resolvers
- __libc_ifunc_impl_list
- sysconf
Joseph Myers [Thu, 1 May 2025 22:28:59 +0000 (22:28 +0000)]
Correct test descriptors in libm-test-pown.inc
While working on implementing compoundn, I noticed that
libm-test-pown.inc was wrongly using TEST_ff_f and AUTO_TESTS_ff_f
when the actual types involved meant fL_f should be used instead of
ff_f; fix to use the correct descriptor strings for pown. (These
strings affect how gen-libm-test.py generates a C file in some cases.
The structure type test_fL_f_data for expected results and the use of
RUN_TEST_LOOP_fL_f in the ALL_RM_TEST call were already correct.)
Tested for x86_64. The generated libm-test-pown.c was actually
unchanged, but the old descriptor strings were still logically
incorrect.
Inline tcache_try_malloc into calloc since it is the only caller. Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.