git.ipfire.org Git - thirdparty/glibc.git/log

Optimize __libc_tsd_* thread variable access

These variables are not exported, and libc.so TLS is initial-exec
anyway. Declare these variables as hidden and use the initial-exec
TLS model.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>

Remove <libc-tsd.h>

Use __thread variables directly instead.  The macros do not save any
typing.  It seems unlikely that a future port will lack __thread
variable support.

Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names.  Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>

manual: add sched_getcpu()

Reviewed-by: Collin Funk <collin.funk1@gmail.com>

manual: Clarifications for listing directories

Support for seeking is limited. Using the d_off and d_reclen members
of struct dirent is discouraged, especially with readdir. Concurrent
modification of directories during iteration may result in duplicate
or missing etnries.

manual: add remaining CPU_* macros

Adds remaining CPU_* macros, including the CPU_*_S macros
for dynamic-sized cpu sets.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>

powerpc: Remove check for -mabi=ibmlongdouble

The -mabi=ibmlongdouble option has been added in gcc 4.2, thus can be
assumed to always exist.

aarch64: update tests for SME

Add test that checks that ZA state is disabled after setjmp and sigsetjmp
Update existing SME test that uses setjmp

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

aarch64: Disable ZA state of SME in setjmp and sigsetjmp

Due to the nature of the ZA state, setjmp() should clear it in the
same manner as it is already done by longjmp.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

benchtest: malloc tcache hotpath benchtest

Existing benchtests for malloc infrastructure seem to be rather generic
to test global malloc implementation performance.  This new benchtest
focus on reducing any non tcache related side effects, allowing to more
realistically predict performance impacts of tcache code changes.
The test was inpired in bench-[cm]alloc-thread code, with severe
simplifications:
- forces single thread execution, reducing concurrency side-effects,
   like cache incoherence penalties due simultaneous writes to the same
   cache pages;
- Focus on allocating and deallocating a single size for all the
   duration of the benchmark. Since all it does is allocate and
   deallocate, it will measure the tcache hotpath without any
   side-effects.
- Allows to specify the allocation size as input argument.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Implement C23 rootn.

C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the rootn functions, which compute the Yth root of X for
integer Y (with a domain error if Y is 0, even if X is a NaN).  The
integer exponent has type long long int in C23; it was intmax_t in TS
18661-4, and as with other interfaces changed after their initial
appearance in the TS, I don't think we need to support the original
version of the interface.

As with pown and compoundn, I strongly encourage searching for worst
cases for ulps error for these implementations (necessarily
non-exhaustively, given the size of the input space).  I also expect a
custom implementation for a given format could be much faster as well
as more accurate, although the implementation is simpler than those
for pown and compoundn.

This completes adding to glibc those TS 18661-4 functions (ignoring
DFP) that are included in C23.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23
mathematical functions (not just the TS 18661-4 ones) missing built-in
functions in GCC, where such functions might usefully be added.

Tested for x86_64 and x86, and with build-many-glibcs.py.

malloc: Improve performance of __libc_calloc

Improve performance of __libc_calloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_calloc doesn't need
to setup a frame.

On Neoverse V2, bench-calloc-simple improves by 5.0% overall.
Bench-calloc-thread 1 improves by 24%.

Reviewed-by: DJ Delorie <dj@redhat.com>

S390: Use cfi_val_offset instead of cfi_escape.

Due to raising the minimum binutils version to version >=2.28,
the used cfi_escape for cfi_val_offset can now be ommitted.

Checked with "objdump -WF" / "objdump -Wf" that the previous
cfi_escape and the new cfi_val_offset are equal.

powerpc64le: Remove configure check for objcopy >= 2.26.

Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore.
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

Raise the minimum binutils version to 2.39

The recent commit 27b96e069aad17cefea9437542180bff448ac3a0 raises the minimum
GCC version to 12.1 which was released in 2022.

The current minimum bintuils version 2.25 was released end of 2014. This patch
now raises the minimum binutils version to 2.39 which was also released in 2022.

The hint for ARC is not needed anymore.

In sysdeps/[alpha|hppa|csky]/configure.ac, PIE is unsupported with this comment:
PIE builds fail on binutils 2.37 and earlier, see:
https://sourceware.org/bugzilla/show_bug.cgi?id=28672
This patch keeps PIE unsupported and let the machine maintainers test and
enable it later.

In sysdeps/arm/configure.ac, there is a check whether TPOFF relocs with addends
are assembled correctly, which is known to be broken in binutils 2.24 and 2.25.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=18383
This patch keeps the check as is and let the machine maintainers check if it
still required.

According to Florian Weimer:
Having at least binutils 2.38 will allow us to assume that this linker
bug is fixed:
Bug 28743 - -z relro creats holes in the process image on GNU/Linux
<https://sourceware.org/bugzilla/show_bug.cgi?id=28743>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

added benchtest inputs for log2l

added benchtest inputs for expl

aarch64: fix unwinding in longjmp

Previously, longjmp() on aarch64 was using CFI directives around the
call to __libc_arm_za_disable() after CFA was redefined at the start
of longjmp(). This may result in unwinding issues. Move the call and
surrounding CFI directives to the beginning of longjmp().

Suggested-by: Wilco Dijkstra <wilco.dijkstra@arm.com>

added benchtest inputs for powl

changes in v2:
* fixed the missing Makefile entry in the first version
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

added benchtest inputs for fmal

manual: fix typo for sched_[sg]etattr

Originally added in 41a90f3f5f which says it's adding sched_getattr
and sched_setattr.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>

malloc: Improve malloc initialization

Move malloc initialization to __libc_early_init. Use a hidden __ptmalloc_init
for initialization and a weak call to avoid pulling in the system malloc in a
static binary. All previous initialization checks can now be removed.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Document all CLOCK_* values

The manual documents CLOCK_REALTIME and CLOCK_MONOTONIC but not other
CLOCK_* values. Add documentation of the POSIX clocks
CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, along with a
reference to the Linux man pages for the semantics of the
Linux-specific clocks supported (as with some other functionality
coming direct from the Linux kernel where the man pages can be
considered the main documentation).

Note: CLOCK_MONOTONIC_RAW, CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE are also defined in the toplevel bits/time.h,
as used for Hurd. Nevertheless, I see no sign that the Hurd code in
glibc actually has any support for those clocks, so I think it is
correct to document them as Linux-specific (and to refer only to the
Linux man pages for their semantics).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

malloc: Improved double free detection in the tcache

The previous double free detection did not account for an attacker to
use a terminating null byte overflowing from the previous
chunk to change the size of a memory chunk is being sorted into.
So that the check in 'tcache_double_free_verify' would pass
even though it is a double free.

Solution:
Let 'tcache_double_free_verify' iterate over all tcache entries to
detect double frees.

This patch only protects from buffer overflows by one byte.
But I would argue that off by one errors are the most common
errors to be made.

Alternatives Considered:
  Store the size of a memory chunk in big endian and thus
  the chunk size would not get overwritten because entries in the
  tcache are not that big.

  Move the tcache_key before the actual memory chunk so that it
  does not have to be checked at all, this would work better in general
  but also it would increase the memory usage.

Signed-off-by: David Lau <david.lau@fau.de>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Correct spelling mistake in test file

There are some spelling mistakes in the test file. Fix them

Reviewed-by: guoce <guoce@kylinos.cn>
Signed-off-by: panzhe0328 <panzhe@kylinos.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

hurd: Make rename refuse trailing slashes [BZ #32570]

As tested by Gnulib's renameatu module.

Reported by Collin Funk on
https://sourceware.org/bugzilla/show_bug.cgi?id=32570

Implement C23 compoundn

C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1).  The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.

Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).

As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space).  I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.

As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).

Tested for x86_64 and x86, and with build-many-glibcs.py.

hurd: Fix tst-stack2 test build on Hurd

It requires $(shared-thread-library). Fixes 0c342594237.

Checked on a i686-gnu build.

nss: remove undefined behavior and optimize getaddrinfo

On x86-64 and compiling with -O2 using stdc_leading_zeros compiles to
the bsr instruction. The fls function removed by this patch is inlined
but still loops while checking each bit individually.

* nss/getaddrinfo.c: Include <stdbit.h>.
(fls): Remove function. This function contains a left shift of 31 on an
'int' which is undefined.
(rfc3484_sort): Use stdc_leading_zeros instead of fls.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

powerpc: Remove POWER7 strncasecmp optimization

These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.

[1] https://www.gnu.org/software/gnulib/manual/gnulib.html#C-strings

Checked with a build for some powerpc variations.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

manual: add more pthread functions

Add stubs and partial docs for many undocumented pthreads functions.
While neither exhaustive nor complete, gives minimal usage docs
for many functions and expands the pthreads chapters, making it
easier to continue improving this section in the future.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>

S390: Add new s390 platform z17.

The glibc-hwcaps subdirectories are extended by "z17".  Libraries are loaded if
the z17 facility bits are active:
- Miscellaneous-instruction-extensions facility 4
- Vector-enhancements-facility 3
- Vector-Packed-Decimal-Enhancement Facility 3
- CPU: Concurrent-Functions Facility

tst-glibc-hwcaps.c is extended in order to test z17 via new marker6.
In case of running on a z17 with a kernel not recognizing z17 yet,
AT_PLATFORM will be z900 but vector-bit in AT_HWCAP is set.  This situation
is now recognized and this testcase does not fail.

A fatal glibc error is dumped if glibc was build with architecture
level set for z17, but run on an older machine (See dl-hwcap-check.h).
Note, you might get an SIGILL before this check if you don't use:
configure --with-rtld-early-cflags=-march=<older-machine>

ld.so --list-diagnostics now also dumps information about s390.cpu_features.

Independent from z17, the s390x kernel won't introduce new HWCAP-Bits if there
is no special handling needed in kernel itself.  For z17, we don't have new
HWCAP flags, but have to check the facility bits retrieved by
stfle-instruction.

Instead of storing all the stfle-bits (currently four 64bit values) in the
cpu_features struct, we now only store those bits, which are needed within
glibc itself.  Note that we have this list twice, one with original values and
the other one which can be filtered with GLIBC_TUNABLES=glibc.cpu.hwcaps.
Those new fields are stored in so far reserved space in cpu_features struct.
Thus processes started in between the update of glibc package and we e.g. have
a new ld.so and an old libc.so, won't crash. The glibc internal ifunc-resolvers
would not select the best optimized variant.

The users of stfle-bits are also updated:
- parsing of GLIBC_TUNABLES=glibc.cpu.hwcaps
- glibc internal ifunc-resolvers
- __libc_ifunc_impl_list
- sysconf

Correct test descriptors in libm-test-pown.inc

While working on implementing compoundn, I noticed that
libm-test-pown.inc was wrongly using TEST_ff_f and AUTO_TESTS_ff_f
when the actual types involved meant fL_f should be used instead of
ff_f; fix to use the correct descriptor strings for pown. (These
strings affect how gen-libm-test.py generates a C file in some cases.
The structure type test_fL_f_data for expected results and the use of
RUN_TEST_LOOP_fL_f in the ALL_RM_TEST call were already correct.)

Tested for x86_64. The generated libm-test-pown.c was actually
unchanged, but the old descriptor strings were still logically
incorrect.

malloc: Inline tcache_try_malloc

Inline tcache_try_malloc into calloc since it is the only caller. Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Fix UB on sinpif (BZ 32925)

The left shift overflows for 'int', use uint32_t instead. It syncs
with CORE-MATH commit bbfabd99.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on erfcf (BZ 32924)

The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit d0a2be200cbc1344d800d9ef0ebee9ad67dd3ad8.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on cospif (BZ 32923)

The left shift overflows for 'int', use uint32_t instead. It syncs
with CORE-MATH commit bbfabd993a71b049c210b0febfd06d18369fadc1.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on cbrtf (BZ 32922)

The left shift overflows for 'int64_t', use unsigned instead. It syncs
with CORE-MATH commit f7c7408d1749ec2859ea249495af699359ae559b.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on sinhf (BZ 32921)

The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit bbfabd99.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on logf (BZ 32920)

The left shift overflows for 'int', use a literal instead. It syncs
with OPTIMIZED-ROUTINES commit 0f87f607b976820ef41fe64d004fe67dc7af8236.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on coshf (BZ 32919)

The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit 4d6192d2.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix UB on atanhf (BZ 32918)

The left shift overflows for 'int', use unsigned instead. It syncs
with CORE-MATH commit 4d6192d2.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

nptl: Fix pthread_getattr_np when modules with execstack are allowed (BZ 32897)

The BZ 32653 fix (12a497c716f0a06be5946cabb8c3ec22a079771e) kept the
stack pointer zeroing from make_main_stack_executable on
_dl_make_stack_executable. However, previously the 'stack_endp'
pointed to temporary variable created before the call of
_dl_map_object_from_fd; while now we use the __libc_stack_end
directly.

Since pthread_getattr_np relies on correct __libc_stack_end, if
_dl_make_stack_executable is called (for instance, when
glibc.rtld.execstack=2 is set) __libc_stack_end will be set to zero,
and the call will always fail.

The __libc_stack_end zero was used a mitigation hardening, but since
52a01100ad011293197637e42b5be1a479a2f4ae it is used solely on
pthread_getattr_np code. So there is no point in zeroing anymore.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Sam James <sam@gentoo.org>

RISC-V: Use builtin for ffs and ffsll while supported extension available

Hardware ctz instructions are available in the RISC-V Zbb and XTheadBb extension. With special `-march` flags defined, we can generate more simplified code compared to the generic implementation of `ffs`/`ffsll`.

Signed-off-by: Julian Zhu <julian.oerv@isrc.iscas.ac.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

stdio: Remove UB on printf_fp

The __printf_fp_buffer_1 issues count_leading_zeros with 0 argument,
which might leads to call __builtin_ctz depending of the ABI.
Replace with stdbit.h function instead.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>

benchtest: Correct shell script related to bench-malloc-thread

This patch changes the shell script that selects which arguments are used
for the execution of bench-malloc-thread.
The problem seems to have been introduced in commit:

  commit 2d6427a63cad8056ba6bcaaaa8df21977c8dde3d
  Author: Wangyang Guo <wangyang.guo@intel.com>
  Date:   Fri Nov 29 16:05:35 2024 +0800
  benchtests: Add calloc test

With current condition, the following error "/bin/sh: 3: [[: not found"
occurs when executing `make bench BENCHSET="malloc-thread"` and the else
path is taken, using incorrect arguments for bench test execution.

Error is reproducible in Debian based distros.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

linux/termio: remove <termio.h> and struct termio

The <termio.h> interface is absolutely ancient: it was obsoleted by
<termios.h> already in the first version of POSIX (1988) and thus
predates the very first version of Linux. Unfortunately, some constant
macros are used both by <termio.h> and <termios.h>; particularly
problematic is the baud rate constants since the termio interface
*requires* that the baud rate is set via an enumeration as part of
c_cflag.

In preparation of revamping the termios interface to support the
arbitrary baud rate capability that the Linux kernel has supported
since 2008, remove <termio.h> in the hope that no one still uses this
archaic interface.

Note that there is no actual code in glibc to support termio: it is
purely an unabstracted ioctl() interface.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

elf: tst-audit10: split AVX512F code into dedicated functions [BZ #32882]

"Recent" GCC versions (since commit fc62716fe8d1, backported to stable
branches) emit a vzeroupper instruction at the end of functions
containing AVX instructions. This causes the tst-audit10 test to fail
on CPUs lacking AVX instructions, despite the AVX512F check. The crash
occurs in the pltenter function of tst-auditmod10b.c.

Fix that by moving the code guarded by the check_avx512 function into
specific functions using the target ("avx512f") attribute. Note that
since commit 5359c3bc91cc ("x86-64: Remove compiler -mavx512f check") it
is safe to assume that the compiler has AVX512F support, thus the
__AVX512F__ checks can be dropped.

Tested on non-AVX, AVX2 and AVX512F machines.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Add NT_ARM_GCS and NT_RISCV_TAGGED_ADDR_CTRL from Linux 6.13 to elf.h

Linux 6.13 adds new ELF note types NT_ARM_GCS and
NT_RISCV_TAGGED_ADDR_CTRL. Add these to glibc's elf.h.

Tested for x86_64.

Add AT_* constants from Linux 6.12

Linux 6.12 adds AT_RENAME_* aliases for RENAME_* flags for renameat2,
and also AT_HANDLE_MNT_ID_UNIQUE. Add the first set of aliases to
stdio.h alongside the RENAME_* names, and AT_HANDLE_MNT_ID_UNIQUE to
bits/fcntl-linux.h.

Tested for x86_64.

hurd: Make symlink return EEXIST on existing target directory

8ef17919509e ("hurd: Fix EINVAL error on linking to a slash-trailing path
[BZ #32569]) made symlink return ENOTDIR, but the gnulib testsuite does
not recognize it for such a situation, and EEXIST is indeed more
comprehensible to users.

hurd: Clear FP exceptions before calling signal handler

This avoids SIGFPE handlers (or code longjmp-ed to) getting disturbed by the
exception that generated it.

Note: gcc's unwinding depends on the rpc_wait_trampoline/trampoline exact
code, so we here avoid breaking it.

hurd: Do not check for xstate level if it was not initialized

If __thread_get_state failed, there is no xstate level to check.
ok is 0 already and the memory exists, but better not read uninitialized
memory.

hurd: Do not restore xstate when it is not initialized

If the process has never used fp before getting a signal, xstate is set
(and thus the x87 state is not initialized) but xstate->initialized is still
0, and we should not restore anything.

hurd: Make *utime*s catch invalid times [BZ #32802, BZ #32803]

hurd: save xstate during signal handling

* hurd/Makefile: add new tests
* hurd/test-sig-rpc-interrupted.c: check xstate save and restore in
  the case where a signal is delivered to a thread which is waiting
  for an rpc. This test implements the rpc interruption protocol used
  by the hurd servers. It was so far passing on Debian thanks to the
  local-intr-msg-clobber.diff patch, which is now obsolete.
* hurd/test-sig-xstate.c: check xstate save and restore in the case
  where a signal is delivered to a running thread, making sure that
  the xstate is modified in the signal handler.
* hurd/test-xstate.h: add helpers to test xstate
* sysdeps/mach/hurd/i386/bits/sigcontext.h: add xstate to the
  sigcontext structure.
+ sysdeps/mach/hurd/i386/sigreturn.c: restore xstate from the saved
  context
* sysdeps/mach/hurd/x86/trampoline.c: save xstate if
  supported. Otherwise we fall back to the previous behaviour of
  ignoring xstate.
* sysdeps/mach/hurd/x86_64/bits/sigcontext.h: add xstate to the
  sigcontext structure.
* sysdeps/mach/hurd/x86_64/sigreturn.c: restore xstate from the saved
  context

Signed-off-by: Luca Dariz <luca@orpolo.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-ID: <20250319171118.142163-1-luca@orpolo.org>

hurd: Check return value of mach_port_mod_refs() in the dup routine of fcntl()

Message-ID: <20250310084409.24177-1-zhmingluo@163.com>

malloc: move tcache_init out of hot tcache paths

This patch moves any calls of tcache_init away after tcache hot paths.
Since there is no reason to initialize tcaches in the hot path and since
we need to be able to check tcache != NULL in any case, because of
tcache_thread_shutdown function, moving tcache_init away from hot path
can only be beneficial.
The patch also removes the initialization of tcaches within the
__libc_free call. It only makes sense to initialize tcaches for the
thread after it calls one of the allocation functions. Also the patch
removes the save/restore of errno from tcache_init code, as it is no
longer needed.

aarch64: Add back non-temporal load/stores from oryon-1's memset

I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memset for large sizes. It
was not recommended for their use with device memory and memset is already
not valid to be used with device memory.

This reverts commit e6590f0c86632c36c9a784cf96075f4be2e920d2.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

aarch64: Add back non-temporal load/stores from oryon-1's memcpy

I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memcpy for large sizes. It
was not recommended for their use with device memory and memcpy is already
not valid to be use with device memory.

This reverts commit eb5eeb47403e0a91de834868e501b4d62b8d2cb9.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Use tailcalls in __libc_free

Use tailcalls to avoid the overhead of a frame on the free fastpath.
Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail()
which can be tailcalled without forcing a frame like no-return functions.
Change tcache_double_free_verify() to retry via __libc_free() after clearing
the key.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Inline tcache_free

Inline tcache_free since it's only used by __libc_free. Add __glibc_likely
for the tcache checks.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Improve free checks

The checks on size can be merged and use __builtin_add_overflow. Since
tcache only handles small sizes (and rejects sizes < MINSIZE), delay this
check until after tcache.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Inline _int_free_check

Inline _int_free_check since it is only used by __libc_free.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Inline _int_free

Inline _int_free since it is a small function and only really used by
__libc_free.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Move mmap code out of __libc_free hotpath

Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it.  Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first.  Move saving of errno
to cases that actually need it.  Remove a safety check that fails on mmap
chunks and a check that mmap chunks cannot be added to tcache.

Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.

Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

manual/tunables: fix a trivial typo

Fixes: 12a497c716f0 ("elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)"
Reviewed-by: Florian Weimer <fweimer@redhat.com>

Fix spelling mistake "trucate" -> "truncate"

There is a spelling mistake in a test filename. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Fix spelling mistake "suports" -> "supports"

There are spelling mistakes in assert messages. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Fix spelling mistake "succsefully" -> "successfully"

There is a spelling mistake in a puts statement. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

manual: Mention POSIX-1.2024 requires time_t to be 64 bit or wider.

* manual/time.texi (Time Types): Mention POSIX-1.2024 requires 64 bit
time_t.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

manual: Update standardization of getline and getdelim [BZ #32830]

* manual/stdio.texi (Line Input): Document that getline and getdelim
where GNU extensions until standardized in POSIX.1-2008. Add restrict
to function prototypes.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

libio: Add test case for fflush

Since one path uses _IO_SYNC and the other _IO_OVERFLOW, the newly added
test cases verifies that `fflush (FILE)` and `fflush (NULL)` are
semantically equivalent from the FILE perspective.

Reviewed-by: Joseph Myers <josmyers@redhat.com>

libio: Synthesize ESPIPE error if lseek returns 0 after reading bytes

This is required so that fclose, when trying to seek to the right
position after filling the input buffer, does not fail with EINVAL.
This fclose code path only ignores ESPIPE errors.

Reported by Petr Pisar on
<https://bugzilla.redhat.com/show_bug.cgi?id=2358265>.

Fixes commit be6818be31e756398e45f70e2819d78be0961223 ("Make fclose
seek input file to right offset (bug 12724)").

Reviewed-by: Frédéric Bérat <fberat@redhat.com>

x86: Detect Intel Diamond Rapids

Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

x86: Handle unknown Intel processor with default tuning

Enable default tuning for unknown Intel processor.

Tested on x86, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

conform: Add initial support for C23.

Hi Joseph,

As we discussed, this patch just makes C23 include every check that is
performed by C11.

I tested the commit by adding the ISO23 Make and Python variables to be
the same as ISO11.  So the only difference was compiling with -DISO23
instead of -DISO11.  And changed the temporary directories to instead
use the format f'/tmp/glibc-{self.standard}-{self.header}'.  Then I used
a shell script to run 'cmp' on each file in the ISO11 and ISO23
directories for each header to make sure they were the same.

-- 8< --

Make C23 checks include every test that is performed by C11.  Done by
running the following command:

find conform -name '*.h-data' | xargs sed -i \
  -e 's| !defined ISO11| !defined ISO11 \&\& !defined ISO23|g' \
  -e 's| defined ISO11| defined ISO11 \|\| defined ISO23|g' \
  -e 's|ifdef ISO11|if defined ISO11 \|\| defined ISO23|g' \
  -e 's|ifndef ISO11|if !defined ISO11 \&\& !defined ISO23|g'

Signed-off-by: Collin Funk <collin.funk1@gmail.com>

x86: Add ARL/PTL/CWF model detection support

- Add ARROWLAKE model detection.
- Add PANTHERLAKE model detection.
- Add CLEARWATERFOREST model detection.

Intel® Architecture Instruction Set Extensions Programming Reference
https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

No regression, validated model detection on SDE.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

timezone: Enhance tst-bz28707 diagnostics

This hopefully provides additional information about why the
test failed, in case the fix in commit 62db87ab24f9ca483f97f
("timezone: Fix tst-bz28707 Makefile rule") turns out to be
insufficient.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>

powerpc: Remove relocation cache flush code for power64

This is only needed for -mno-secure-plt, and this linkage mode
is not supported with powerpc64 and powerp64le.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

math: Fix up THREEp96 constant in expf128 [BZ #32411]

As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.

DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

Reviewed-by: Joseph Myers <josmyers@redhat.com>

elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)

From the bug report [1], multiple programs still require to dlopen
shared libraries with either missing PT_GNU_STACK or with the executable
bit set.  Although, in some cases, it seems to be a hard-craft assembly
source without the required .note.GNU-stack marking (so the static linker
is forced to set the stack executable if the ABI requires it), other
cases seem that the library uses trampolines [2].

Unfortunately, READ_IMPLIES_EXEC is not an option since on some ABIs
(x86_64), the kernel clears the bit, making it unsupported.  To avoid
reinstating the broken code that changes stack permission on dlopen
(0ca8785a28), this patch extends the glibc.rtld.execstack tunable to
allow an option to force an executable stack at the program startup.

The tunable is a security issue because it defeats the PT_GNU_STACK
hardening.  It has the slight advantage of making it explicit by the
caller, and, as for other tunables, this is disabled for setuid binaries.
A tunable also allows us to eventually remove it, but from previous
experiences, it would require some time.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32653
[2] https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/143
Reviewed-by: Sam James <sam@gentoo.org>

stdlib: Implement C2Y uabs, ulabs, ullabs and uimaxabs

C2Y adds unsigned versions of the abs functions (see C2Y draft N3467 and
proposal N3349).

Tested for x86_64.

Signed-off-by: Lenard Mollenkopf <glibc@lenardmollenkopf.de>

stdio-common: In tst-setvbuf2, close helper thread descriptor only if opened

The helper thread may get canceled before the open system
call succeds. Then ThreadData.fd remains zero, and eventually
the xclose call in end_reader_thread fails because descriptor 0
is not open.

Instead, initialize the fd member to -1 (not a valid descriptor)
and close the descriptor only if valid. Do this in a new end_thread
helper routine.

Also add more error checking to close operations.

Fixes commit 95b780c1d0549678c0a244c6e2112ec97edf0839 ("stdio: Add
more setvbuf tests").

Remove duplicates from binaries-shared-tests when creating make rules

This avoids a warning from make because binaries-shared-tests contains
both $(tests-container) and $(tests-internal).

x86: Optimize xstate size calculation

Scan xstate IDs up to the maximum supported xstate ID. Remove the
separate AMX xstate calculation. Instead, exclude the AMX space from
the start of TILECFG to the end of TILEDATA in xsave_state_size.

Completed validation on SKL/SKX/SPR/SDE and compared xsave state size
with "ld.so --list-diagnostics" option, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

NEWS: update for GCC 12.1 requirement [BZ #32539]

Since 27b96e069aad17cefea9437542180bff448ac3a0, the minimum GCC required
to build glibc is GCC 12.1.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

stdio: fix hurd link for tst-setvbuf2

stdlib: Fix qsort memory leak if callback throws (BZ 32058)

If the input buffer exceeds the stack auxiliary buffer, qsort will
malloc a temporary one to call mergesort. Since C++ standard does
allow the callback comparison function to throw [1], the glibc
implementation can potentially leak memory.

The fixes uses a pthread_cleanup_combined_push and
pthread_cleanup_combined_pop, so it can work with and without
exception enables. The qsort code path that calls malloc now
requires some extra setup and a call to __pthread_cleanup_push
anmd __pthread_cleanup_pop (which should be ok since they just
setup some buffer state).

Checked on x86_64-linux-gnu.

[1] https://timsong-cpp.github.io/cppwp/n4950/alg.c.library#4

Reviewed-by: DJ Delorie <dj@redhat.com>

sysdeps: powerpc: restore -mlong-double-128 check

We mistakenly dropped the check in 27b96e069aad17cefea9437542180bff448ac3a0;
there's some other checks which we *can* drop, but let's worry about that
later.

Fixes the build on ppc64le where GCC is configured with --with-long-double-format=ieee.

Reviewed-by: Andreas Schwab <schwab@suse.de>

stdio: Add more setvbuf tests

add ptmx support to test-container

Update syscall lists for Linux 6.14

Linux 6.14 has no new syscalls. Update the version number in
syscall-names.list to reflect that it is still current for 6.14.

Tested with build-many-glibcs.py.

x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread

This fixes a test build failure on Hurd.

Fixes commit 145097dff170507fe73190e8e41194f5b5f7e6bf ("x86: Use separate
variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Fix tst-origin build when toolchain defaults to --as-needed (BZ 32823)

Checked on aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>

Raise the minimum GCC version to 12.1 [BZ #32539]

For all Linux distros with glibc 2.40 which I can find, GCC 14.2 is used
to compile glibc 2.40:

OS                    GCC      URL
AOSC                  14.2.0   https://aosc.io/
Arch Linux            14.2.0   https://archlinux.org/
ArchPOWER             14.2.0   https://archlinuxpower.org/
Artix                 14.2.0   https://artixlinux.org/
Debian                14.2.0   https://www.debian.org/
Devuan                14.2.0   https://www.devuan.org/
Exherbo               14.2.0   https://www.exherbolinux.org/
Fedora                14.2.1   https://fedoraproject.org/
Gentoo                14.2.1   https://gentoo.org/
Kali Linux            14.2.0   https://www.kali.org/
KaOS                  14.2.0   https://kaosx.us/
LiGurOS               14.2.0   https://liguros.gitlab.io/
Mageia                14.2.0   https://www.mageia.org/en/
Manjaro               14.2.0   https://manjaro.org/
NixOS                 14.2.0   https://nixos.org/
openmamba             14.2.0   https://openmamba.org/
OpenMandriva          14.2.0   https://openmandriva.org/
openSUSE              14.2.0   https://www.opensuse.org/
Parabola              14.2.0   https://www.parabola.nu/
PLD Linux             14.2.0   https://pld-linux.org/
PureOS                14.2.0   https://pureos.net/
Raspbian              14.2.0   http://raspbian.org/
Slackware             14.2.0   http://www.slackware.com/
Solus                 14.2.0   https://getsol.us/
T2 SDE                14.2.0   http://t2sde.org/
Ubuntu                14.2.0   https://www.ubuntu.com/
Wikidata              14.2.0   https://wikidata.org/

Support older versions of GCC to build glibc 2.42:

1. Need to work around bugs in older versions of GCC.
2. Can't use the new features in newer versions of GCC, which may be
required for new features, like _Float16 which requires GCC 12.1 or
above, in glibc,

The main benefit of supporting older versions of GCC is easier backport
of bug fixes to the older releases of glibc, which can be mitigated by
avoiding incompatible features in newer versions of GCC for critical bug
fixes.  Require GCC 12.1 or newer to build.  Remove GCC version check for
PowerPC and s390x.

TEST_CC and TEST_CXX can be used to test the glibc build with the older
versions of GCC.

For glibc developers who are using Linux OSes which don't come with GCC
12.1 or newer, they should build and install GCC 12.1 or newer to work
on glibc.

This fixes BZ #32539.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>

Fix typo in comment

manual: tidy the longopt.c example

- Change longopt.c's backticks to single quotes
- puts() does not use format specifiers

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

manual: Document functions adopted by POSIX.1-2024.

Here is a patch updating the documentation to mention GNU and BSD
extensions that were adopted by POSIX.1-2024.

* manual/llio.texi (Memory-mapped I/O): Add that MAP_ANON and
MAP_ANONYMOUS were added by POSIX.1-2024.
* manual/memory.texi (Changing Block Size): Mention that reallocarray
was added by POSIX.1-2024.
* manual/message.texi (Message Translation): Adjust wording to match
standardization.
(Translation with gettext): Mention the gettext family of functions were
added by POSIX.1-2024.
* manual/pattern.texi (Wildcard Matching): Mention that FNM_CASEFOLD was
added by POSIX.1-2024.
* manual/process.texi (Creating a Process): Mention that _Fork and
WCOREDUMP were added by POSIX.1-2024.
* manual/signal.texi (Miscellaneous Signals): Mention that SIGWINCH was
added by POSIX-1.2024.
* manual/startup.texi (Environment Access): Mention that secure_getenv
was added by POSIX.1-2024.
* manual/string.texi (Truncating Strings): Mention that strlcpy,
strlcat, wcslcpy, and wslcat were added by POSIX-1.2024.
(Search Functions): Document that memmem was added by POSIX-1.2024.
* manual/terminal.texi (Allocation): Mention that ptsname_r was added by
POSIX-1.2024.
* manual/threads.texi (Waiting with Explicit Clocks): Move node under
POSIX Threads. Mention pthread_cond_clockwait,
pthread_rwlock_clockrdlock, and pthread_rwlock_clockwrlock were added by
POSIX-1.2024.
(Joining Threads): New node under Non-POSIX Extensions.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Collin Funk <collin.funk1@gmail.com>

aarch64: Fix _dl_tlsdesc_dynamic unwind for pac-ret (BZ 32612)

When libgcc is built with pac-ret, it requires to autenticate the
unwinding frame based on CFI information.  The _dl_tlsdesc_dynamic
uses a custom calling convention, where it is responsible to save
and restore all registers it might use (even volatile).

The pac-ret support added by 1be3d6eb823d8b952fa54b7bbc90cbecb8981380
was added only on the slow-path, but the fast path also adds DWARF
Register Rule Instruction (cfi_adjust_cfa_offset) since it requires
to save/restore some auxiliary register.  It seems that this is not
fully supported neither by libgcc nor AArch64 ABI [1].

Instead, move paciasp/autiasp to function prologue/epilogue to be
used on both fast and slow paths.

I also corrected the _dl_tlsdesc_dynamic comment description, it was
copied from i386 implementation without any adjustment.

Checked on aarch64-linux-gnu with a toolchain built with
--enable-standard-branch-protection on a system with pac-ret
support.

[1]  https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)

Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size. However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.

This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so. As a result, struct cpu_features layout does not change,
helping a future backport of this change.

Fixes commit 9b7091415af47082664717210ac49d51551456ab ("x86-64:
Update _dl_tlsdesc_dynamic to preserve AMX registers").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>