Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)] 
Optimize __libc_tsd_* thread variable access

These variables are not exported, and libc.so TLS is initial-exec
anyway.  Declare these variables as hidden and use the initial-exec
TLS model.
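
A rough illustration of the change (the attribute syntax is GCC's; the
variable name below is hypothetical, not one of the actual __libc_tsd_*
variables):

/* Hypothetical variable name, for illustration only.  */
extern __thread void *__libc_tsd_EXAMPLE
  __attribute__ ((visibility ("hidden"), tls_model ("initial-exec")));

void *
get_example (void)
{
  /* With hidden visibility and the initial-exec model, this access
     becomes a direct TP-relative load instead of a __tls_get_addr call.  */
  return __libc_tsd_EXAMPLE;
}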

Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit a894f04d877653bea1639fc9a4adf73bd9347bf4)

H.J. Lu [Mon, 18 Aug 2025 16:06:48 +0000 (09:06 -0700)] 
i386: Also add GLIBC_ABI_GNU2_TLS version [BZ #33129]

Since the GNU2 TLS run-time bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=31372

affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386
to indicate the working GNU2 TLS run-time.  For x86-64, the additional
GNU2 TLS run-time bug fix is needed for

https://sourceware.org/bugzilla/show_bug.cgi?id=31501

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit bd4628f3f18ac312408782eea450429c6f044860)

H.J. Lu [Mon, 28 Jul 2025 19:16:11 +0000 (12:16 -0700)] 
i386: Add GLIBC_ABI_GNU_TLS version [BZ #33221]

On i386, programs and shared libraries with __thread usage may fail
silently at run-time against glibc without the TLS run-time fix for:

https://sourceware.org/bugzilla/show_bug.cgi?id=32996

Add GLIBC_ABI_GNU_TLS version to indicate that glibc has the working
GNU TLS run-time.  The linker can add the GLIBC_ABI_GNU_TLS version to
binaries which depend on the working TLS run-time, so that such programs
and shared libraries will fail to load and run at run-time against a
libc.so without the GLIBC_ABI_GNU_TLS version, instead of failing silently
at random.

This fixes BZ #33221.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit ed1b7a5a489ab555a27fad9c101ebe2e1c1ba881)

Florian Weimer [Tue, 26 Nov 2024 18:26:13 +0000 (19:26 +0100)] 
debug: Fix tst-longjmp_chk3 build failure on Hurd

Explicitly include <unistd.h> for _exit and getpid.

(cherry picked from commit 4836a9af89f1b4d482e6c72ff67e36226d36434c)

Florian Weimer [Mon, 25 Nov 2024 16:32:54 +0000 (17:32 +0100)] 
debug: Wire up tst-longjmp_chk3

The test was added in commit ac8cc9e300a002228eb7e660df3e7b333d9a7414
without all the required Makefile scaffolding.  Tweak the test
so that it actually builds (including with dynamic SIGSTKSZ).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4b7cfcc3fbfab55a1bbb32a2da69c048060739d6)

H.J. Lu [Sun, 8 Jun 2025 21:22:10 +0000 (05:22 +0800)] 
i386: Update ___tls_get_addr to preserve vector registers

Compiler generates the following instruction sequence for dynamic TLS
access:

leal tls_var@tlsgd(,%ebx,1), %eax
call ___tls_get_addr@PLT

The CALL instruction is transparent to the compiler, which assumes all
registers, except for EFLAGS, AX, CX, and DX, are unchanged after CALL.  But
___tls_get_addr is a normal function which doesn't preserve any vector
registers.
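
For reference, this is the kind of source-level access that produces the
sequence above (illustrative only; tls_var stands for any dynamic TLS
variable defined in another shared object):

extern __thread int tls_var;

int
get_tls_var (void)
{
  /* With -fPIC -mtls-dialect=gnu on i386, this compiles to the
     leal tls_var@tlsgd / call ___tls_get_addr@PLT sequence above.  */
  return tls_var;
}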

1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:

_dl_tlsdesc_dynamic:
/* Like all TLS resolvers, preserve call-clobbered registers.
   We need two scratch regs anyway.  */
subl $32, %esp
cfi_adjust_cfa_offset (32)

It is wrong to use

movl %ebx, -28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl -28(%esp), %ebx

to preserve EBX on stack.  Fix it with:

movl %ebx, 28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl 28(%esp), %ebx

4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.

This fixes BZ #32996.

Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 848f0e46f03f22404ed9a8aabf3fd5ce8809a1be)

mengqinggang [Fri, 3 Jan 2025 07:49:43 +0000 (15:49 +0800)] 
LoongArch: Regenerate preconfigure. [bug 32521]

Add "mtls_descriptor=desc" to preconfigure.ac and regenerate preconfigure.
Fix failure: elf/tst-gnu2-tls2.

Reported-by: Joseph S. Myers <josmyers@redhat.com>
Reported-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit d4cdb601df0a125550341f85d7011314e4746308)

Florian Weimer [Mon, 18 Aug 2025 11:52:02 +0000 (13:52 +0200)] 
elf: Preserve _rtld_global layout for the release branch

Backporting commit 7d649d1c990441181c903bd2f99251cbf1552180
("elf: Introduce _dl_debug_change_state") removed the
_ns_debug member.  Keep it to preserve the struct layout.

Florian Weimer [Mon, 28 Jul 2025 12:16:52 +0000 (14:16 +0200)] 
elf: Compile _dl_debug_state separately (bug 33224)

This ensures that the compiler will not inline it, so that
debuggers which do not use the Systemtap probes can reliably
set a breakpoint on it.

Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 620f0730f311635cd0e175a3ae4d0fc700c76366)

Florian Weimer [Fri, 4 Jul 2025 19:46:30 +0000 (21:46 +0200)] 
elf: Restore support for _r_debug interpositions and copy relocations

The changes in commit a93d9e03a31ec14405cb3a09aa95413b67067380
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool.  It brings its
own definition of _r_debug (rather than a declaration).

Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC.  To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play.  Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
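
A minimal sketch of such direct access from PIC application code, using the
_r_debug declaration from <link.h>:

#include <link.h>
#include <stdio.h>

int
main (void)
{
  /* Walk the link map list of the base namespace via _r_debug.  */
  for (struct link_map *m = _r_debug.r_map; m != NULL; m = m->l_next)
    printf ("%s\n", m->l_name[0] != '\0' ? m->l_name : "(main program)");
  return 0;
}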

Only perform the copy relocation initialization if libc has been
loaded.  Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit ea85e7d55087075376a29261e722e4fae14ecbe7)

Florian Weimer [Fri, 4 Jul 2025 19:46:16 +0000 (21:46 +0200)] 
elf: Introduce _dl_debug_change_state

It combines updating r_state with the debugger notification.

The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 8329939a37f483a16013dd8af8303cbcb86d92cb)

Florian Weimer [Fri, 4 Jul 2025 19:46:05 +0000 (21:46 +0200)] 
elf: Introduce separate _r_debug_array variable

It replaces the ns_debug member of the namespaces.  Previously,
the base namespace had an unused ns_debug member.

This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed.  (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7278d11f3a0cd528188c719bab75575b0aea2c6e)

Florian Weimer [Tue, 11 Mar 2025 14:30:52 +0000 (15:30 +0100)] 
elf: Test dlopen (NULL, RTLD_LAZY) from an ELF constructor

This call must not complete initialization of all shared objects
in the global scope because the ELF constructor which makes the call
likely has not finished initialization.  Calling more constructors
at this point would expose those to a partially constructed
dependency.
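
The scenario under test looks roughly like this (a sketch of a constructor
in a shared object; the surrounding test harness is omitted):

#include <dlfcn.h>
#include <stdlib.h>

static void __attribute__ ((constructor))
init (void)
{
  /* Must not run constructors of dependencies that have not been
     initialized yet.  */
  void *handle = dlopen (NULL, RTLD_LAZY);
  if (handle == NULL)
    abort ();
  dlclose (handle);
}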

This completes the revert of commit 9897ced8e78db5d813166a7ccccfd5a
("elf: Run constructors on cyclic recursive dlopen (bug 31986)").

(cherry picked from commit d604f9c500570e80febfcc6a52b63a002b466f35)

Florian Weimer [Tue, 7 Jan 2025 08:18:07 +0000 (09:18 +0100)] 
elf: Second ld.so relocation only if libc.so has been loaded

Commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f (“elf:
rtld_multiple_ref is always true”) removed some code that happened
to enable compatibility with programs that do not link against
libc.so.  Such programs cannot call dlopen or any dynamic linker
functions (except __tls_get_addr), so this is not really useful.
Still ld.so should not crash with a null-pointer dereference
or undefined symbol reference in these cases.

In the main relocation loop, call _dl_relocate_object unconditionally
because it already checks if the object has been relocated.

If libc.so was loaded, self-relocate ld.so against it and call
__rtld_mutex_init and __rtld_malloc_init_real to activate the full
implementations.  Those are available only if libc.so is there,
so skip these initialization steps if libc.so is absent.  Without
libc.so, the global scope can be completely empty.  This can cause
ld.so self-relocation to fail if it uses symbol-based
relocations, which is why the second ld.so self-relocation is not
performed if libc.so is missing.

The previous concern regarding GOT updates through self-relocation
no longer applies because function pointers are updated
explicitly through __rtld_mutex_init and __rtld_malloc_init_real,
and not through relocation.  However, the second ld.so self-relocation
is still delayed, in case there are other symbols being used.

Fixes commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f (“elf:
rtld_multiple_ref is always true”).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 706209867f1ba89c458033408d419e92d8055f58)

Florian Weimer [Fri, 7 Mar 2025 16:37:50 +0000 (17:37 +0100)] 
elf: Fix handling of symbol versions which hash to zero (bug 29190)

This was found through code inspection.  No application impact is
known.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 46d31980943d8be2f421c1e3276b265c7552636e)

Florian Weimer [Tue, 3 Sep 2024 15:57:46 +0000 (17:57 +0200)] 
elf: Reorder audit events in dlclose to match _dl_fini (bug 32066)

This was discovered after extending elf/tst-audit23 to cover
dlclose of the dlmopen namespace.

Auditors already experience the new order during process
shutdown (_dl_fini), so no LAV_CURRENT bump or backwards
compatibility code seems necessary.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 495b96e064da605630a23092d1e484ade4bdc093)

Florian Weimer [Fri, 9 Aug 2024 14:06:40 +0000 (16:06 +0200)] 
elf: Call la_objclose for proxy link maps in _dl_fini (bug 32065)

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit c4b160744cb39eca20dc36b39c7fa6e10352706c)

Florian Weimer [Fri, 9 Aug 2024 13:31:18 +0000 (15:31 +0200)] 
elf: Signal la_objopen for the proxy link map in dlmopen (bug 31985)

Previously, the ld.so link map was silently added to the namespace.
This change produces an auditing event for it.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 8f36b1469677afe37168f9af1b77402d7a70c673)

Florian Weimer [Fri, 29 Nov 2024 14:36:40 +0000 (15:36 +0100)] 
elf: Add the endswith function to <endswith.h>

And include <stdbool.h> for a definition of bool.
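
A minimal sketch of what such a helper typically looks like (the exact
glibc-internal signature in <endswith.h> may differ):

#include <stdbool.h>
#include <string.h>

static inline bool
endswith (const char *str, const char *suffix)
{
  size_t str_len = strlen (str);
  size_t suffix_len = strlen (suffix);
  return str_len >= suffix_len
         && memcmp (str + str_len - suffix_len, suffix, suffix_len) == 0;
}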

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit a20bc2f6233a726c7df8eaa332b6e498bd59321f)

Florian Weimer [Tue, 3 Sep 2024 15:52:47 +0000 (17:52 +0200)] 
elf: Update DSO list, write audit log to elf/tst-audit23.out

After commit 1d5024f4f052c12e404d42d3b5bfe9c3e9fd27c4
("support: Build with exceptions and asynchronous unwind tables
[BZ #30587]"), libgcc_s is expected to show up in the DSO
list on 32-bit Arm.  Do not update max_objs because vdso is not
tracked (which is the reason why the test currently passes
even with libgcc_s present).

Also write the log output from the auditor to standard output,
for easier test debugging.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4a50fdf8b2c1106b50cd9056b4c6f3a72cdeed5f)

Florian Weimer [Wed, 6 Nov 2024 09:33:44 +0000 (10:33 +0100)] 
elf: Switch to main malloc after final ld.so self-relocation

Before commit ee1ada1bdb8074de6e1bdc956ab19aef7b6a7872
("elf: Rework exception handling in the dynamic loader
[BZ #25486]"), the previous order called the main calloc
to allocate a shadow GOT/PLT array for auditing support.
This happened before libc.so.6 ELF constructors were run, so
a user malloc could run without libc.so.6 having been
initialized fully.  One observable effect was that
environ was NULL at this point.

It does not seem to be possible at present to trigger such
an allocation, but it seems more robust to delay switching
to the main malloc until ld.so self-relocation is complete.
The elf/tst-rtld-no-malloc-audit test case fails with a
2.34-era glibc that does not have this fix.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit c1560f3f75c0e892b5522c16f91b4e303f677094)

Florian Weimer [Wed, 6 Nov 2024 09:33:44 +0000 (10:33 +0100)] 
elf: Introduce _dl_relocate_object_no_relro

And make _dl_protect_relro apply RELRO conditionally.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit f2326c2ec0a0a8db7bc7f4db8cce3002768fc3b6)

Florian Weimer [Wed, 6 Nov 2024 09:33:44 +0000 (10:33 +0100)] 
elf: Do not define consider_profiling, consider_symbind as macros

This avoids surprises when refactoring the code if these identifiers
are re-used later in the file.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit a79642204537dec8a1e1c58d1e0a074b3c624f46)

Florian Weimer [Wed, 6 Nov 2024 09:33:44 +0000 (10:33 +0100)] 
elf: rtld_multiple_ref is always true

For a long time, libc.so.6 has depended on ld.so, which
means that there is a reference to ld.so in all processes,
and rtld_multiple_ref is always true.  In fact, if
rtld_multiple_ref were false, some of the ld.so setup code
would not run.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f)

Florian Weimer [Mon, 28 Oct 2024 13:45:30 +0000 (14:45 +0100)] 
Revert "elf: Run constructors on cyclic recursive dlopen (bug 31986)"

This reverts commit 9897ced8e78db5d813166a7ccccfd5a42c69ef20.

Adjust the test expectations in elf/tst-dlopen-auditdup-auditmod.c
accordingly.

(cherry picked from commit 95129e6b8fabdaa8cd8a4a5cc20be0f4cb0ba59f)

Florian Weimer [Fri, 25 Oct 2024 15:41:53 +0000 (17:41 +0200)] 
elf: Fix map_complete Systemtap probe in dl_open_worker

The refactoring did not take the change of variable into account.
Fixes commit 43db5e2c0672cae7edea7c9685b22317eae25471
("elf: Signal RT_CONSISTENT after relocation processing in dlopen
(bug 31986)").

(cherry picked from commit ac73067cb7a328bf106ecd041c020fc61be7e087)

Florian Weimer [Fri, 25 Oct 2024 14:50:10 +0000 (16:50 +0200)] 
elf: Signal RT_CONSISTENT after relocation processing in dlopen (bug 31986)

Previously, a la_activity audit event was generated before
relocation processing completed.  This did not match what
happened during initial startup in elf/rtld.c (towards the end
of dl_main).  It also caused various problems if an auditor
tried to open the same shared object again using dlmopen:
If it was the directly loaded object, it had a search scope
associated with it, so the early exit in dl_open_worker_begin
was taken even though the object was unrelocated.  This caused
the r_state == RT_CONSISTENT assert to fail.  Avoidance of the
assert also depends on reversing the order of r_state update
and auditor event (already implemented in a previous commit).

At the later point, args->map can be NULL due to failure,
so use the assigned namespace ID instead if that is available.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 43db5e2c0672cae7edea7c9685b22317eae25471)

Florian Weimer [Fri, 25 Oct 2024 14:50:10 +0000 (16:50 +0200)] 
elf: Signal LA_ACT_CONSISTENT to auditors after RT_CONSISTENT switch

Auditors can call into the dynamic loader again when
LA_ACT_CONSISTENT is signaled, and those recursive calls could observe
r_state != RT_CONSISTENT.

We should consider failing dlopen/dlmopen/dlclose if
r_state != RT_CONSISTENT.  The dynamic linker is probably not
in a state in which it can handle reentrant calls.  This
needs further investigation.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit e096b7a1896886eb7dd2732ccbf1184b0eec9a63)

Florian Weimer [Fri, 25 Oct 2024 14:50:10 +0000 (16:50 +0200)] 
elf: Run constructors on cyclic recursive dlopen (bug 31986)

This is conceptually similar to the reported bug, but does not
depend on auditing.  The fix is simple: just complete execution
of the constructors.  This exposed the fact that the link map
for statically linked executables does not have l_init_called
set, even though constructors have run.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9897ced8e78db5d813166a7ccccfd5a42c69ef20)

H.J. Lu [Thu, 14 Aug 2025 14:03:20 +0000 (07:03 -0700)] 
x86-64: Add GLIBC_ABI_DT_X86_64_PLT [BZ #33212]

When the linker -z mark-plt option is used to add DT_X86_64_PLT,
DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the
R_X86_64_JUMP_SLOT relocation stores the offset of the indirect
branch instruction.  However, glibc versions without the commit:

commit f8587a61892cbafd98ce599131bf4f103466f084
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 20 19:21:48 2022 -0700

    x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT

    According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
    and R_X86_64_JUMP_SLOT.  Since linkers always set their r_addends to 0, we
    can ignore their r_addends.

Reviewed-by: Fangrui Song <maskray@google.com>
won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation.
Such programs and shared libraries will fail at run-time randomly.

Add GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is compatible
with DT_X86_64_PLT.

The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency
whenever -z mark-plt is passed to the linker.  The resulting programs and
shared libraries will fail to load at run-time against libc.so without the
GLIBC_ABI_DT_X86_64_PLT version, instead of failing randomly.

This fixes BZ #33212.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 399384e0c8193e31aea014220ccfa24300ae5938)

H.J. Lu [Mon, 28 Jul 2025 19:18:22 +0000 (12:18 -0700)] 
x86-64: Add GLIBC_ABI_GNU2_TLS version [BZ #33129]

Programs and shared libraries compiled with -mtls-dialect=gnu2 may fail
silently at run-time against glibc without the GNU2 TLS run-time fix
for:

https://sourceware.org/bugzilla/show_bug.cgi?id=31372

Add GLIBC_ABI_GNU2_TLS version to indicate that glibc has the working
GNU2 TLS run-time.  The linker can add the GLIBC_ABI_GNU2_TLS version to
binaries which depend on the working GNU2 TLS run-time:

https://sourceware.org/bugzilla/show_bug.cgi?id=33130

so that such programs and shared libraries will fail to load and run at
run-time against libc.so without the GLIBC_ABI_GNU2_TLS version, instead
of failing silently at random.

This fixes BZ #33129.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 9df8fa397d515dc86ff5565f6c45625e672d539e)

Joe Ramsay [Fri, 3 Jan 2025 19:13:36 +0000 (19:13 +0000)] 
math: Remove no-mathvec flag

More routines are to follow, some of which hit many failures in the
current testsuite due to wrong sign of zero (mathvec routines are not
required to get this right). Instead of disabling a large number of
tests, change the failure condition such that, for vector routines,
tests pass as long as computed == expected == 0.0, regardless of sign.
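
A sketch of the relaxed pass condition for vector routines (illustrative;
not the testsuite's exact code):

#include <stdbool.h>

static bool
vector_result_ok (double computed, double expected)
{
  /* Accept a zero of either sign when both results are zero;
     -0.0 == 0.0 is true in C, so the sign of zero is ignored here.  */
  if (computed == 0.0 && expected == 0.0)
    return true;
  return computed == expected;
}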

Affected tests (vector tests for expm1, log1p, sin, tan and tanh) all
still pass.

(cherry picked from commit 939e770e0196ebd763cacc602421b76d62df0798)

Jens Remus [Fri, 25 Jul 2025 13:40:03 +0000 (15:40 +0200)] 
Use TLS initial-exec model for __libc_tsd_CTYPE_* thread variables [BZ #33234]

Commit 10a66a8e421b ("Remove <libc-tsd.h>") removed the TLS initial-exec
(IE) model attribute from the __libc_tsd_CTYPE_* thread variable declarations
and definitions.  Commit a894f04d8776 ("Optimize __libc_tsd_* thread
variable access") restored it on declarations.

Restore the TLS initial-exec model attribute on __libc_tsd_CTYPE_* thread
variable definitions.

This resolves test tst-locale1 failure on s390 32-bit, when using a
GNU linker without the fix from GNU binutils commit aefebe82dc89
("IBM zSystems: Fix offset relative to static TLS").

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit e5363e6f460c2d58809bf10fc96d70fd1ef8b5b2)

Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)] 
ctype: Fallback initialization of TLS using relocations (bug 19341, bug 32483)

This ensures that the ctype data pointers in TLS are valid
in secondary namespaces even without initialization via
__ctype_init.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 2745db8dd3ec31045acd761b612516490085bc20)

Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)] 
Use proper extern declaration for _nl_C_LC_CTYPE_{class,toupper,tolower}

The existing initializers already contain explicit casts.  Keep them
due to int/uint32_t mismatch.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit e0c0f856f58ceb68800a964c36c15c606e7a8c4c)

Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)] 
Remove <libc-tsd.h>

Use __thread variables directly instead.  The macros do not save any
typing.  It seems unlikely that a future port will lack __thread
variable support.

Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names.  Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 10a66a8e421b09682b774c795ef1da402235dddc)

Florian Weimer [Fri, 1 Aug 2025 10:19:49 +0000 (12:19 +0200)] 
elf: Handle ld.so with LOAD segment gaps in _dl_find_object (bug 31943)

Detect if ld.so is not contiguous and handle that case in _dl_find_object.
Set l_find_object_processed even for initially loaded link maps,
otherwise dlopen of an initially loaded object adds it to
_dlfo_loaded_mappings (where maps are expected to be contiguous),
in addition to _dlfo_nodelete_mappings.
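
For context, the public interface being exercised can be used like this
(a minimal sketch; requires _GNU_SOURCE and glibc 2.35 or later):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

int
main (void)
{
  struct dl_find_object dlfo;
  /* Look up the mapping that contains an arbitrary code address.  */
  if (_dl_find_object ((void *) &printf, &dlfo) == 0)
    printf ("found in %s\n", dlfo.dlfo_link_map->l_name);
  return 0;
}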

Test elf/tst-link-map-contiguous-ldso iterates over the loader
image, reading every word to make sure memory is actually mapped.
It only does that if the l_contiguous flag is set for the link map.
Otherwise, it finds gaps with mmap and checks that _dl_find_object
does not return the ld.so mapping for them.

The test elf/tst-link-map-contiguous-main does the same thing for
the libc.so shared object.  This only works if the kernel loaded
the main program because the glibc dynamic loader may fill
the gaps with PROT_NONE mappings in some cases, making it contiguous,
but accesses to individual words may still fault.

Test elf/tst-link-map-contiguous-libc is again slightly different
because the dynamic loader always fills the gaps with PROT_NONE
mappings, so a different form of probing has to be used.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 20681be149b9eb1b6c1f4246bf4bd801221c86cd)

Florian Weimer [Fri, 1 Aug 2025 08:20:23 +0000 (10:20 +0200)] 
elf: Extract rtld_setup_phdr function from dl_main

Remove historic binutils reference from comment and update
how this data is used by applications.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2cac9559e06044ba520e785c151fbbd25011865f)

Florian Weimer [Sat, 1 Feb 2025 11:37:58 +0000 (12:37 +0100)] 
elf: Do not add a copy of _dl_find_object to libc.so

This reduces code size and dependencies on ld.so internals from
libc.so.

Fixes commit f4c142bb9fe6b02c0af8cfca8a920091e2dba44b
("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 96429bcc91a14f71b177ddc5e716de3069060f2c)

Luna Lamb [Wed, 18 Jun 2025 16:12:19 +0000 (16:12 +0000)] 
AArch64: Improve codegen SVE log1p helper

Improve codegen by packing coefficients.
4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh
and atanh respectively.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6849c5b791edd216f2ec3fdbe4d138bc69b9b333)

Dylan Fleming [Wed, 18 Jun 2025 16:19:22 +0000 (16:19 +0000)] 
AArch64: Optimise SVE FP64 Hyperbolics

Rework SVE FP64 hyperbolics to use the SVE FEXPA
instruction.

Also update the special-case handling for large
inputs to be entirely vectorised.

Performance improvements on Neoverse V1:

cosh_sve: 19% for |x| < 709, 5x otherwise
sinh_sve: 24% for |x| < 709, 5.9x otherwise
tanh_sve: 12% for |x| < 19,  9x otherwise

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit dee22d2a81ab59afc165fb6dcb45d723f13582a0)

Dylan Fleming [Wed, 18 Jun 2025 16:17:12 +0000 (16:17 +0000)] 
AArch64: Optimize SVE exp functions

Improve performance of SVE exps by making better use
of the SVE FEXPA instruction.

Performance improvement on Neoverse V1:
exp2_sve:   21%
exp2f_sve:  24%
exp10f_sve: 23%
expm1_sve:  25%

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e3d1ddf977ecd653de8d0d10eb083d80ac21cf3)

Luna Lamb [Thu, 29 May 2025 15:22:51 +0000 (15:22 +0000)] 
AArch64: Improve codegen in SVE log1p

Improves memory access, reformat evaluation scheme to pack coefficients.
5% improvement in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit da196e6134ede64728006518352d75b6c3902fec)

Dylan Fleming [Mon, 19 May 2025 11:36:51 +0000 (11:36 +0000)] 
AArch64: Optimize inverse trig functions

Improve performance of Inverse trig functions by altering how coefficients are
loaded.

Performance improvement on Neoverse V1:
SVE     acos   14%
AdvSIMD acos   6%

AdvSIMD asin   6%
SVE     asin   5%
AdvSIMD asinf  2%

AdvSIMD atanf  22%
SVE     atanf  20%
SVE     atan   11%
AdvSIMD atan   5%

SVE     atan2  7%
SVE     atan2f 4%
AdvSIMD atan2f 3%
AdvSIMD atan2  2%

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e84509e0041c0a83997aba602a585bb3b8285f0)

Pierre Blanchard [Tue, 18 Mar 2025 17:07:31 +0000 (17:07 +0000)] 
AArch64: Optimize algorithm in users of SVE expf helper

The polynomial order was unnecessarily high; reducing it unlocked
multiple optimizations.
Max error for new SVE expf is 0.88 +0.5ULP.
Max error for new SVE coshf is 2.56 +0.5ULP.
Performance improvement on Neoverse V1: expf (30%), coshf (26%).

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit cf56eb28fa277d9dbb301654682ca89f71c30a48)

Wilco Dijkstra [Fri, 27 Jun 2025 14:10:55 +0000 (14:10 +0000)] 
AArch64: Avoid memset ifunc in cpu-features.c [BZ #33112]

During early startup memcpy or memset must not be called since many targets
use ifuncs for them which won't be initialized yet.  Security hardening may
use -ftrivial-auto-var-init=zero which inserts calls to memset.  Redirect
memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c.
This fixes BZ #33112.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 681a24ae4d0cb8ed92de98b4da660308840b09ba)

Florian Weimer [Mon, 21 Jul 2025 19:43:49 +0000 (21:43 +0200)] 
posix: Fix double-free after allocation failure in regcomp (bug 33185)

If a memory allocation failure occurs during bracket expression
parsing in regcomp, a double-free error may result.
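
An illustrative caller that exercises bracket expression parsing (not the
reproducer from the bug report); with the fix, an allocation failure during
regcomp is reported as REG_ESPACE instead of corrupting the heap:

#include <regex.h>

int
main (void)
{
  regex_t re;
  /* Bracket expressions such as this one go through the affected
     parsing code.  */
  int rc = regcomp (&re, "[[:alpha:]0-9_-]+", REG_EXTENDED);
  if (rc == 0)
    regfree (&re);
  return 0;
}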

Reported-by: Anastasia Belova <abelova@astralinux.ru>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7ea06e994093fa0bcca0d0ee2c1db271d8d7885d)

Florian Weimer [Thu, 22 May 2025 12:36:37 +0000 (14:36 +0200)] 
Fix error reporting (false negatives) in SGID tests

And simplify the interface of support_capture_subprogram_self_sgid.

Use the existing framework for temporary directories (now with
mode 0700) and directory/file deletion.  Handle all execution
errors within support_capture_subprogram_self_sgid.  In particular,
this includes test failures because the invoked program did not
exit with exit status zero.  Existing tests that expect exit
status 42 are adjusted to use zero instead.

In addition, fix callers not to call exit (0) with test failures
pending (which may mask them, especially when running with --direct).

Fixes commit 35fc356fa3b4f485bd3ba3114c9f774e5df7d3c2
("elf: Fix subprocess status handling for tst-dlopen-sgid (bug 32987)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 3a3fb2ed83f79100c116c824454095ecfb335ad7)

Florian Weimer [Wed, 21 May 2025 14:47:34 +0000 (16:47 +0200)] 
support: Pick group in support_capture_subprogram_self_sgid if UID == 0

When running as root, it is likely that we can run under any group.
Pick a harmless group from /etc/group in this case.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 2f769cec448d84a62b7dd0d4ff56978fe22c0cd6)

Carlos O'Donell [Mon, 16 Jun 2025 17:09:57 +0000 (13:09 -0400)] 
ppc64le: Revert "powerpc: Optimized strcmp for power10" (CVE-2025-5702)

This reverts commit 3367d8e180848030d1646f088759f02b8dfe0d6f

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le without regression.

(cherry picked from commit 15808c77b35319e67ee0dc8f984a9a1a434701bc)

Carlos O'Donell [Wed, 11 Jun 2025 13:33:45 +0000 (09:33 -0400)] 
ppc64le: Revert "powerpc : Add optimized memchr for POWER10" (Bug 33059)

This reverts commit b9182c793caa05df5d697427c0538936e6396d4b

Reason for revert: Power10 memchr clobbers v20 vector register
(Bug 33059)

This is not a security issue, unlike CVE-2025-5745 and
CVE-2025-5702.

Tested on ppc64le without regression.

(cherry picked from commit a7877bb6685300f159fa095c9f50b22b112cddb8)

Carlos O'Donell [Wed, 11 Jun 2025 13:43:50 +0000 (09:43 -0400)] 
ppc64le: Revert "powerpc: Fix performance issues of strcmp power10" (CVE-2025-5702)

This reverts commit 90bcc8721ef82b7378d2b080141228660e862d56

This change is in the chain of the final revert that fixes the CVE
i.e. 3367d8e180848030d1646f088759f02b8dfe0d6f

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le with no regressions.

(cherry picked from commit c22de63588df7a8a0edceea9bb02534064c9d201)

Carlos O'Donell [Wed, 11 Jun 2025 13:19:17 +0000 (09:19 -0400)] 
ppc64le: Revert "powerpc: Optimized strncmp for power10" (CVE-2025-5745)

This reverts commit 23f0d81608d0ca6379894ef81670cf30af7fd081

Reason for revert: Power10 strncmp clobbers non-volatile vector
registers (Bug 33060)

Tested on ppc64le with no regressions.

(cherry picked from commit 63c60101ce7c5eac42be90f698ba02099b41b965)

Florian Weimer [Wed, 21 May 2025 06:43:32 +0000 (08:43 +0200)] 
elf: Fix subprocess status handling for tst-dlopen-sgid (bug 32987)

This should really move into support_capture_subprogram_self_sgid.

Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 35fc356fa3b4f485bd3ba3114c9f774e5df7d3c2)

Sunil K Pandey [Tue, 20 May 2025 17:07:27 +0000 (10:07 -0700)] 
x86_64: Fix typo in ifunc-impl-list.c.

Fix wcsncpy and wcpncpy typo in ifunc-impl-list.c.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit f2aeb6ff941dccc4c777b5621e77addea6cc076c)

Florian Weimer [Tue, 20 May 2025 17:45:06 +0000 (19:45 +0200)] 
elf: Test case for bug 32976 (CVE-2025-4802)

Check that LD_LIBRARY_PATH is ignored for AT_SECURE statically
linked binaries, using support_capture_subprogram_self_sgid.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit d8f7a79335b0d861c12c42aec94c04cd5bb181e2)

Florian Weimer [Mon, 23 Dec 2024 12:57:55 +0000 (13:57 +0100)] 
support: Add support_record_failure_barrier

This can be used to stop execution after a TEST_COMPARE_BLOB
failure, for example.

(cherry picked from commit d0b8aa6de4529231fadfe604ac2c434e559c2d9e)

Florian Weimer [Tue, 20 May 2025 17:36:02 +0000 (19:36 +0200)] 
support: Use const char * argument in support_capture_subprogram_self_sgid

The function does not modify the passed-in string, so make this clear
via the prototype.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f0c09fe61678df6f7f18fe1ebff074e62fa5ca7a)

Florian Weimer [Thu, 13 Feb 2025 20:56:52 +0000 (21:56 +0100)] 
elf: Keep using minimal malloc after early DTV resize (bug 32412)

If an auditor loads many TLS-using modules during startup, it is
possible to trigger DTV resizing.  Previously, the DTV was marked
as allocated by the main malloc afterwards, even if the minimal
malloc was still in use.  With this change, _dl_resize_dtv marks
the resized DTV as allocated with the minimal malloc.

The new test reuses TLS-using modules from other auditing tests.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit aa3d7bd5299b33bffc118aa618b59bfa66059bcb)

Arjun Shankar [Fri, 25 Oct 2024 07:33:45 +0000 (09:33 +0200)] 
libio: Correctly link tst-popen-fork against libpthread

tst-popen-fork failed to build for Hurd due to not being linked with
libpthread.  This commit fixes that.

Tested with build-many-glibcs.py for i686-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 6a290b2895b77be839fcb7c44a6a9879560097ad)

Arjun Shankar [Fri, 18 Oct 2024 14:03:25 +0000 (16:03 +0200)] 
libio: Fix a deadlock after fork in popen

popen modifies its file handler book-keeping under a lock that wasn't
being taken during fork.  This meant that a concurrent popen and fork
could end up copying the lock in a "locked" state into the fork child,
where subsequently calling popen would lead to a deadlock due to the
already (spuriously) held lock.

This commit fixes the deadlock by appropriately taking the lock before
fork, and releasing/resetting it in the parent/child after the fork.
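
The race looks roughly like this (a condensed sketch of the kind of
scenario the new test exercises, not the test itself; build with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void *
popen_loop (void *ignored)
{
  for (int i = 0; i < 1000; ++i)
    {
      FILE *f = popen ("true", "r");
      if (f != NULL)
        pclose (f);
    }
  return NULL;
}

int
main (void)
{
  pthread_t thr;
  pthread_create (&thr, NULL, popen_loop, NULL);
  for (int i = 0; i < 100; ++i)
    {
      pid_t pid = fork ();
      if (pid == 0)
        {
          /* Before the fix, the child could inherit the popen
             bookkeeping lock in a locked state and deadlock here.  */
          FILE *f = popen ("true", "r");
          if (f != NULL)
            pclose (f);
          _exit (0);
        }
      waitpid (pid, NULL, 0);
    }
  pthread_join (thr, NULL);
  return 0;
}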

A new test for concurrent popen and fork is also added.  It consistently
hangs (and therefore fails via timeout) without the fix applied.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 9f0d2c0ee6c728643fcf9a4879e9f20f5e45ce5f)

Florian Weimer [Thu, 13 Mar 2025 05:07:07 +0000 (06:07 +0100)] 
nptl: PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions (bug 32786)

The new initializer and struct layout does not initialize the
__g_signals field in the old struct layout before the change in
commit c36fc50781995e6758cae2b6927839d0157f213c ("nptl: Remove
g_refs from condition variables").  Bring back fields at the end
of struct __pthread_cond_s, so that they are again zero-initialized.

Reviewed-by: Sam James <sam@gentoo.org>

Malte Skarupke [Wed, 4 Dec 2024 13:05:40 +0000 (08:05 -0500)] 
nptl: Use all of g1_start and g_signals

The LSB of g_signals was unused. The LSB of g1_start was used to indicate
which group is G2. This was used to always go to sleep in pthread_cond_wait
if a waiter is in G2. A comment earlier in the file says that this is not
correct to do:

 "Waiters cannot determine whether they are currently in G2 or G1 -- but they
  do not have to because all they are interested in is whether there are
  available signals"

I either would have had to update the comment, or get rid of the check. I
chose to get rid of the check. In fact I don't quite know why it was there.
There will never be available signals for group G2, so we didn't need the
special case. Even if there were, this would just be a spurious wake. This
might have caught some cases where the count has wrapped around, but it
wouldn't reliably do that, (and even if it did, why would you want to force a
sleep in that case?) and we don't support that many concurrent waiters
anyway. Getting rid of it allows us to use one more bit, making us more
robust to wraparound.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 91bb902f58264a2fd50fbce8f39a9a290dd23706)

Malte Skarupke [Wed, 4 Dec 2024 13:04:54 +0000 (08:04 -0500)] 
nptl: rename __condvar_quiesce_and_switch_g1

This function no longer waits for threads to leave g1, so rename it to
__condvar_switch_g1

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4b79e27a5073c02f6bff9aa8f4791230a0ab1867)

Malte Skarupke [Wed, 4 Dec 2024 13:04:10 +0000 (08:04 -0500)] 
nptl: Fix indentation

In my previous change I turned a nested loop into a simple loop. I'm doing
the resulting indentation changes in a separate commit to make the diff on
the previous commit easier to review.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ee6c14ed59d480720721aaacc5fb03213dc153da)

Malte Skarupke [Wed, 4 Dec 2024 13:03:44 +0000 (08:03 -0500)] 
nptl: Use a single loop in pthread_cond_wait instead of a nested loop

The loop was a little more complicated than necessary. There was only one
break statement out of the inner loop, and the outer loop was nearly empty.
So just remove the outer loop, moving its code to the one break statement in
the inner loop. This allows us to replace all gotos with break statements.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 929a4764ac90382616b6a21f099192b2475da674)

Malte Skarupke [Wed, 4 Dec 2024 12:56:38 +0000 (07:56 -0500)] 
nptl: Remove g_refs from condition variables

This variable used to be needed to wait in group switching until all sleepers
have confirmed that they have woken. This is no longer needed. Nothing waits
on this variable so there is no need to track how many threads are currently
asleep in each group.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit c36fc50781995e6758cae2b6927839d0157f213c)

Malte Skarupke [Wed, 4 Dec 2024 12:56:13 +0000 (07:56 -0500)] 
nptl: Remove unnecessary quadruple check in pthread_cond_wait

pthread_cond_wait was checking whether it was in a closed group no less than
four times. Checking once is enough. Here are the four checks:

1. While spin-waiting. This was dead code: maxspin is set to 0 and has been
   for years.
2. Before deciding to go to sleep, and before incrementing grefs: I kept this
3. After incrementing grefs. There is no reason to think that the group would
   close while we do an atomic increment. Obviously it could close at any
   point, but that doesn't mean we have to recheck after every step. This
   check was equally good as check 2, except it has to do more work.
4. When we find ourselves in a group that has a signal. We only get here after
   we check that we're not in a closed group. There is no need to check again.
   The check would only have helped in cases where the compare_exchange in the
   next line would also have failed. Relying on the compare_exchange is fine.

Removing the duplicate checks clarifies the code.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4f7b051f8ee3feff1b53b27a906f245afaa9cee1)

Malte Skarupke [Wed, 4 Dec 2024 12:55:50 +0000 (07:55 -0500)] 
nptl: Remove unnecessary catch-all-wake in condvar group switch

This wake is unnecessary. We only switch groups after every sleeper in a group
has been woken. Sure, they may take a while to actually wake up and may still
hold a reference, but waking them a second time doesn't speed that up. Instead
this just makes the code more complicated and may hide problems.

In particular this safety wake wouldn't even have helped with the bug that was
fixed by Barrus' patch: The bug there was that pthread_cond_signal would not
switch g1 when it should, so we wouldn't even have entered this code path.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit b42cc6af11062c260c7dfa91f1c89891366fed3e)

Malte Skarupke [Wed, 4 Dec 2024 12:55:22 +0000 (07:55 -0500)] 
nptl: Update comments and indentation for new condvar implementation

Some comments were wrong after the most recent commit. This fixes that.

Also fixing indentation where it was using spaces instead of tabs.

Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 0cc973160c23bb67f895bc887dd6942d29f8fee3)

Frank Barrus [Wed, 4 Dec 2024 12:55:02 +0000 (07:55 -0500)] 
pthreads NPTL: lost wakeup fix 2

This fixes the lost wakeup (from a bug in signal stealing) with a change
in the usage of g_signals[] in the condition variable internal state.
It also completely eliminates the concept and handling of signal stealing,
as well as the need for signalers to block to wait for waiters to wake
up every time there is a G1/G2 switch.  This greatly reduces the average
and maximum latency for pthread_cond_signal.

The g_signals[] field now contains a signal count that is relative to
the current g1_start value.  Since it is a 32-bit field, and the LSB is
still reserved (though not currently used anymore), it has a 31-bit value
that corresponds to the low 31 bits of the sequence number in g1_start.
(since g1_start also has an LSB flag, this means bits 31:1 in g_signals
correspond to bits 31:1 in g1_start, plus the current signal count)
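
In other words (an illustrative sketch of the arithmetic described above;
the names below are not glibc's internal identifiers):

#include <stdint.h>

/* Low 31 bits of the g1_start sequence (its LSB flag masked off), plus
   the signal count shifted past the reserved LSB of g_signals.  */
static inline uint32_t
g_signals_value (uint64_t g1_start, uint32_t signal_count)
{
  return ((uint32_t) g1_start & ~1u) + (signal_count << 1);
}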

By making the signal count relative to g1_start, there is no longer
any ambiguity or A/B/A issue, and thus any checks before blocking,
including the futex call itself, are guaranteed not to block if the G1/G2
switch occurs, even if the signal count remains the same.  This allows
initially safely blocking in G2 until the switch to G1 occurs, and
then transitioning from G1 to a new G1 or G2, and always being able to
distinguish the state change.  This removes the race condition and A/B/A
problems that otherwise occurred if a late (pre-empted) waiter were to
resume just as the futex call attempted to block on g_signal since
otherwise there was no last opportunity to re-check things like whether
the current G1 group was already closed.

By fixing these issues, the signal stealing code can be eliminated,
since there is no concept of signal stealing anymore.  The code to block
for all waiters to exit g_refs can also be removed, since any waiters
that are still in the g_refs region can be guaranteed to safely wake
up and exit.  If there are still any left at this time, they are all
sent one final futex wakeup to ensure that they are not blocked any
longer, but there is no need for the signaller to block and wait for
them to wake up and exit the g_refs region.

The signal count is then effectively "zeroed" but since it is now
relative to g1_start, this is done by advancing it to a new value that
can be observed by any pending blocking waiters.  Any late waiters can
always tell the difference, and can thus just cleanly exit if they are
in a stale G1 or G2.  They can never steal a signal from the current
G1 if they are not in the current G1, since the signal value that has
to match in the cmpxchg has the low 31 bits of the g1_start value
contained in it, and that's first checked, and then it won't match if
there's a G1/G2 change.

Note: the 31-bit sequence number used in g_signals is designed to
handle wrap-around when checking the signal count, but if the entire
31-bit wraparound (2 billion signals) occurs while there is still a
late waiter that has not yet resumed, and it happens to then match
the current g1_start low bits, and the pre-emption occurs after the
normal "closed group" checks (which are 64-bit) but then hits the
futex syscall and signal consuming code, then an A/B/A issue could
still result and cause an incorrect assumption about whether it
should block.  This particular scenario seems unlikely in practice.
Note that once awake from the futex, the waiter would notice the
closed group before consuming the signal (since that's still a 64-bit
check that would not be aliased in the wrap-around in g_signals),
so the biggest impact would be blocking on the futex until the next
full wakeup from a G1/G2 switch.

Signed-off-by: Frank Barrus <frankbarrus_sw@shaggy.cc>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1db84775f831a1494993ce9c118deaf9537cc50a)

H.J. Lu [Sat, 12 Apr 2025 15:37:29 +0000 (08:37 -0700)] 
x86: Detect Intel Diamond Rapids

Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit de14f1959ee5f9b845a7cae43bee03068b8136f0)

Sunil K Pandey [Fri, 11 Apr 2025 15:52:52 +0000 (08:52 -0700)] 
x86: Handle unknown Intel processor with default tuning

Enable default tuning for unknown Intel processor.

Tested on x86, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 9f0deff558d1d6b08c425c157f50de85013ada9c)

Sunil K Pandey [Fri, 4 Apr 2025 01:14:20 +0000 (18:14 -0700)] 
x86: Add ARL/PTL/CWF model detection support

- Add ARROWLAKE model detection.
- Add PANTHERLAKE model detection.
- Add CLEARWATERFOREST model detection.

Intel® Architecture Instruction Set Extensions Programming Reference
https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

No regression, validated model detection on SDE.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit e53eb952b970ac94c97d74fb447418fb327ca096)

Sunil K Pandey [Thu, 3 Apr 2025 20:00:45 +0000 (13:00 -0700)] 
x86: Optimize xstate size calculation

Scan xstate IDs up to the maximum supported xstate ID.  Remove the
separate AMX xstate calculation.  Instead, exclude the AMX space from
the start of TILECFG to the end of TILEDATA in xsave_state_size.

Completed validation on SKL/SKX/SPR/SDE and compared xsave state size
with "ld.so --list-diagnostics" option, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit 70b648855185e967e54668b101d24704c3fb869d)

Noah Goldstein [Wed, 14 Aug 2024 06:37:30 +0000 (14:37 +0800)] 
x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path

This is just a refactor and there should be no behavioral change from
this commit.

The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob
for controlling whether we use non-temporal memset rather than having
extra logic based on vendor.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit b93dddfaf440aa12f45d7c356f6ffe9f27d35577)

Florian Weimer [Mon, 31 Mar 2025 19:33:18 +0000 (21:33 +0200)] 
x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread

This fixes a test build failure on Hurd.

Fixes commit 145097dff170507fe73190e8e41194f5b5f7e6bf ("x86: Use separate
variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit c6e2895695118ab59c7b17feb0fcb75a53e3478c)

Florian Weimer [Fri, 28 Mar 2025 08:26:59 +0000 (09:26 +0100)] 
x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)

Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size.  However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.

This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so.  As a result, struct cpu_features layout does not change,
helping a future backport of this change.

Fixes commit 9b7091415af47082664717210ac49d51551456ab ("x86-64:
Update _dl_tlsdesc_dynamic to preserve AMX registers").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 145097dff170507fe73190e8e41194f5b5f7e6bf)

Florian Weimer [Fri, 28 Mar 2025 08:26:06 +0000 (09:26 +0100)] 
x86: Skip XSAVE state size reset if ISA level requires XSAVE

If we have to use XSAVE or XSAVEC trampolines, do not adjust the size
information they need.  Technically, it is an operator error to try to
run with -XSAVE,-XSAVEC on such builds, but this change here disables
some unnecessary code with higher ISA levels and simplifies testing.

Related to commit befe2d3c4dec8be2cdd01a47132e47bdb7020922
("x86-64: Don't use SSE resolvers for ISA level 3 or above").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 59585ddaa2d44f22af04bb4b8bd4ad1e302c4c02)

Sunil K Pandey [Thu, 6 Mar 2025 00:13:38 +0000 (16:13 -0800)] 
x86_64: Add atanh with FMA

On SPR, it improves atanh bench performance by:

                        Before    After     Improvement
reciprocal-throughput   15.1715   14.8628   2%
latency                 57.1941   56.1883   2%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c7c4a5906f326f1290b1c2413a83c530564ec4b8)

Sunil K Pandey [Sat, 8 Mar 2025 16:51:10 +0000 (08:51 -0800)] 
x86_64: Add sinh with FMA

On SPR, it improves sinh bench performance by:

                        Before    After     Improvement
reciprocal-throughput   14.2017   11.815    17%
latency                 36.4917   35.2114   4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit dded0d20f67ba1925ccbcb9cf28f0c75febe0dbe)

Sunil K Pandey [Mon, 10 Mar 2025 17:24:07 +0000 (10:24 -0700)] 
x86_64: Add tanh with FMA

On Skylake, it improves tanh bench performance by:

       Before    After     Improvement
max    110.89    95.826    14%
min    20.966    20.157    4%
mean   30.9601   29.8431   4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c6352111c72a20b3588ae304dd99b63e25dd6d85)

Michael Jeanson [Fri, 14 Feb 2025 18:54:22 +0000 (13:54 -0500)] 
nptl: clear the whole rseq area before registration

Due to the extensible nature of the rseq area we can't explicitly
initialize fields that are not part of the ABI yet. It was agreed with
upstream that all new fields will be documented as zero initialized by
userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will
validate the content of all fields during registration.

Replace the explicit field initialization with a memset of the whole
rseq area which will cover fields as they are added to future kernels.
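
A sketch of the resulting initialization pattern (illustrative; the actual
glibc code and names differ):

#include <stddef.h>
#include <string.h>

static void
clear_rseq_area (void *rseq_area, size_t rseq_size)
{
  /* Zero everything, including fields added by future kernels, so that
     CONFIG_DEBUG_RSEQ validation sees zero-initialized unknown fields.  */
  memset (rseq_area, 0, rseq_size);
}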

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 689a62a4217fae78b9ce0db781dc2a421f2b1ab4)

8 months agomath: Improve layout of exp/exp10 data
Wilco Dijkstra [Fri, 13 Dec 2024 15:43:07 +0000 (15:43 +0000)] 
math: Improve layout of exp/exp10 data

GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps.  As a result on targets that support them, more load-pair
instructions are used in exp.  Exp10 is improved by moving invlog10_2N later
so that neglog10_2hiN and neglog10_2loN can be loaded using load-pair.

On Neoverse V2, the exp benchmark improves by 2.5%, the "144bits" variant
by 7.2%, and the "768bits" variant by 12.7%.  Exp10 improves by 1.5%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 5afaf99edb326fd9f36eb306a828d129a3a1d7f7)
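
The layout idea can be illustrated with a small stand-in struct (field
order and types here are only a guess at the technique, not the actual
glibc exp_data definition): fields that are consumed together sit next
to each other with no padding, so the compiler can fetch them with a
single load-pair (LDP) instruction.

  struct exp_data_sketch
  {
    double invln2N;
    double shift;
    double neglog10_2hiN;   /* adjacent pair at a 16-byte offset, so a */
    double neglog10_2loN;   /* single LDP can load both values         */
    double invlog10_2N;     /* moved later: not needed in that pair    */
  };

  /* Documents the "no gaps" property the layout change relies on.  */
  _Static_assert (sizeof (struct exp_data_sketch) == 5 * sizeof (double),
                  "exp_data_sketch must be densely packed");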

8 months agoAArch64: Use prefer_sve_ifuncs for SVE memset
Wilco Dijkstra [Thu, 27 Feb 2025 16:28:52 +0000 (16:28 +0000)] 
AArch64: Use prefer_sve_ifuncs for SVE memset

Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 0f044be1dae5169d0e57f8d487b427863aeadab4)

8 months agoAArch64: Add SVE memset
Wilco Dijkstra [Tue, 24 Dec 2024 18:01:59 +0000 (18:01 +0000)] 
AArch64: Add SVE memset

Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 bytes are improved by ~20% on average by using aligned
stores for the last 64 bytes.  Performance of random memset benchmark improves
by ~2% on Neoverse V1.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 163b1bbb76caba4d9673c07940c5930a1afa7548)
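
A sketch of the small-size path with ACLE SVE intrinsics (requires an
AArch64 compiler with +sve); the real memset is hand-written assembly,
and this only illustrates how one predicated store covers every length
below 16 without per-size branching:

  #include <arm_sve.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  void *
  sve_memset_sketch (void *s, int c, size_t n)
  {
    uint8_t *p = s;
    svuint8_t v = svdup_n_u8 ((uint8_t) c);

    if (n < 16)
      {
        /* Predicate covering exactly n bytes; lanes past n are
           inactive, so a single store handles lengths 0..15.  */
        svbool_t pg = svwhilelt_b8_u64 (0, n);
        svst1_u8 (pg, p, v);
        return s;
      }

    /* Larger sizes: the real code uses unpredicated stores, with
       aligned stores for the final 64 bytes; plain memset stands in
       here to keep the sketch correct.  */
    memset (p, c, n);
    return s;
  }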

8 months agomath: Improve layout of expf data
Wilco Dijkstra [Wed, 24 Jul 2024 14:17:47 +0000 (15:17 +0100)] 
math: Improve layout of expf data

GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp2f_data struct slightly so that the fields are better aligned.
As a result on targets that support them, load-pair instructions accessing
poly_scaled and invln2_scaled are now 16-byte aligned.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 44fa9c1080fe6a9539f0d2345b9d2ae37b8ee57a)

8 months agoAArch64: Remove zva_128 from memset
Wilco Dijkstra [Mon, 25 Nov 2024 18:43:08 +0000 (18:43 +0000)] 
AArch64: Remove zva_128 from memset

Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]).  Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit a08d9a52f967531a77e1824c23b5368c6434a72d)
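
A toy illustration of the underflow hazard (plain C, not the removed
assembly): a ZVA-128 path aligns the destination up to a 128-byte
boundary and subtracts the skipped bytes from the remaining count, so
once count >= 256 is no longer guaranteed the subtraction can wrap.

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    uintptr_t dst = 0x1008;                             /* misaligned */
    uintptr_t aligned = (dst + 127) & ~(uintptr_t) 127; /* 0x1080 */
    uint64_t skipped = aligned - dst;                   /* 120 bytes */

    uint64_t count = 100;        /* small count, no longer excluded */
    count -= skipped;            /* wraps to a huge value ...       */
    printf ("%llu\n", (unsigned long long) count);  /* ... 2^64 - 20 */
    return 0;
  }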

8 months agoAArch64: Optimize memset
Wilco Dijkstra [Mon, 9 Sep 2024 14:26:47 +0000 (15:26 +0100)] 
AArch64: Optimize memset

Improve small memsets by avoiding branches and use overlapping stores.
Use DC ZVA for copies over 128 bytes.  Remove unnecessary code for ZVA sizes
other than 64 and 128.  Performance of random memset benchmark improves by 24%
on Neoverse N1.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit cec3aef32412779e207f825db0d057ebb4628ae8)
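
The small-size trick is easy to show in C (a sketch only; the real
routine is AArch64 assembly and additionally uses DC ZVA for large
sizes): two possibly-overlapping stores cover every length in a range,
so no per-length branching is needed.

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  static void store8 (uint8_t *p, uint64_t v) { memcpy (p, &v, 8); }
  static void store4 (uint8_t *p, uint32_t v) { memcpy (p, &v, 4); }

  /* Handles n <= 16; larger sizes would fall through to the bulk path.  */
  void *
  memset_small_sketch (void *s, int c, size_t n)
  {
    uint8_t *p = s;
    uint64_t v8 = 0x0101010101010101ull * (uint8_t) c;

    if (n >= 8)
      {
        store8 (p, v8);             /* first 8 bytes */
        store8 (p + n - 8, v8);     /* last 8 bytes, may overlap */
      }
    else if (n >= 4)
      {
        store4 (p, (uint32_t) v8);
        store4 (p + n - 4, (uint32_t) v8);
      }
    else if (n > 0)
      {
        p[0] = (uint8_t) c;         /* covers 1..3 bytes */
        p[n / 2] = (uint8_t) c;
        p[n - 1] = (uint8_t) c;
      }
    return s;
  }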

8 months agoAArch64: Improve generic strlen
Wilco Dijkstra [Wed, 7 Aug 2024 13:43:47 +0000 (14:43 +0100)] 
AArch64: Improve generic strlen

Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates.  Change final
size computation to avoid increasing latency.  On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3dc426b642dcafdbc11a99f2767e081d086f5fc7)

8 months agoRevert "AArch64: Add vector logp1 alias for log1p"
Wilco Dijkstra [Thu, 27 Feb 2025 20:34:34 +0000 (20:34 +0000)] 
Revert "AArch64: Add vector logp1 alias for log1p"

This reverts commit a991a0fc7c051d7ef2ea7778e0a699f22d4e53d7.

8 months agoAArch64: Improve codegen for SVE powf
Yat Long Poon [Thu, 13 Feb 2025 18:03:04 +0000 (18:03 +0000)] 
AArch64: Improve codegen for SVE powf

Improve memory access with indexed/unpredicated instructions.
Eliminate register spills.  Speedup on Neoverse V1: 3%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 95e807209b680257a9afe81a507754f1565dbb4d)

8 months agoAArch64: Improve codegen for SVE pow
Yat Long Poon [Thu, 13 Feb 2025 18:02:01 +0000 (18:02 +0000)] 
AArch64: Improve codegen for SVE pow

Move constants to struct.  Improve memory access with indexed/unpredicated
instructions.  Eliminate register spills.  Speedup on Neoverse V1: 24%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 0b195651db3ae793187c7dd6d78b5a7a8da9d5e6)
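
The codegen techniques named in these SVE commits can be illustrated
with ACLE intrinsics (hypothetical constants and function, not the
actual pow kernel; requires +sve): pack constants so one LD1RQ load
feeds lanewise (indexed) FMLAs, and use an all-true predicate so
multiplies lower to unpredicated FMUL without MOVPRFX.

  #include <arm_sve.h>

  /* Hypothetical coefficient block, loaded with a single LD1RQ.  */
  static const double poly_consts[2] = { 0x1.5555555555555p-2, 0x1p-1 };

  svfloat64_t
  poly_step_sketch (svfloat64_t r)
  {
    svbool_t ptrue = svptrue_b64 ();

    /* Both constants arrive in one register: no per-constant LD1RD.  */
    svfloat64_t c01 = svld1rq_f64 (ptrue, poly_consts);

    /* r2 = r * r with an all-true predicate -> unpredicated FMUL.  */
    svfloat64_t r2 = svmul_f64_x (ptrue, r, r);

    /* acc = c0 + r * c1 via indexed FMLA; the accumulator is updated
       in place, so no MOVPRFX is required.  */
    svfloat64_t acc = svdup_lane_f64 (c01, 0);
    acc = svmla_lane_f64 (acc, r, c01, 1);

    return svmul_f64_x (ptrue, acc, r2);
  }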

8 months agoAArch64: Improve codegen for SVE erfcf
Yat Long Poon [Thu, 13 Feb 2025 18:00:50 +0000 (18:00 +0000)] 
AArch64: Improve codegen for SVE erfcf

Reduce number of MOV/MOVPRFXs and use unpredicated FMUL.
Replace MUL with LSL.  Speedup on Neoverse V1: 6%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit f5ff34cb3c75ec1061c75bb9188b3c1176426947)

8 months agoAArch64: Improve codegen in SVE exp and users, and update expf_inline
Luna Lamb [Thu, 13 Feb 2025 17:54:46 +0000 (17:54 +0000)] 
AArch64: Improve codegen in SVE exp and users, and update expf_inline

Use unpredicated muls and improve memory access.
Throughput microbenchmark improvements on Neoverse V1: 7% for exp, 3% for
exp2, and 1% for cosh.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit c0ff447edf19bd4630fe79adf5e8b896405b059f)

8 months agoAArch64: Improve codegen in SVE asinh
Luna Lamb [Thu, 13 Feb 2025 17:52:09 +0000 (17:52 +0000)] 
AArch64: Improve codegen in SVE asinh

Use unpredicated muls, use lanewise MLAs, and improve memory access.
1% regression in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 8f0e7fe61e0a2ad5ed777933703ce09053810ec4)

8 months agoAArch64: Improve codegen in SVE expm1f and users
Luna Lamb [Fri, 3 Jan 2025 20:15:17 +0000 (20:15 +0000)] 
AArch64: Improve codegen in SVE expm1f and users

Use unpredicated muls, use absolute compare and improve memory access.
Expm1f, sinhf and tanhf show 7%, 5% and 1% improvement in throughput
microbenchmark on Neoverse V1.

(cherry picked from commit f86b4cf87581cf1e45702b07880679ffa0b1f47a)

8 months agoAArch64: Improve codegen for SVE log1pf users
Yat Long Poon [Fri, 3 Jan 2025 19:09:05 +0000 (19:09 +0000)] 
AArch64: Improve codegen for SVE log1pf users

Reduce memory access by using lanewise MLA and reduce the number of MOVPRFXs.
Move log1pf implementation to inline helper function.
Speedup on Neoverse V1 for log1pf (10%), acoshf (-1%), atanhf (2%), asinhf (2%).

(cherry picked from commit 91c1fadba338752bf514cd4cca057b27b1b10eed)

8 months agoAArch64: Improve codegen for SVE logs
Yat Long Poon [Fri, 3 Jan 2025 19:07:30 +0000 (19:07 +0000)] 
AArch64: Improve codegen for SVE logs

Reduce memory access by using lanewise MLA and moving constants to a struct,
and reduce the number of MOVPRFXs.
Update maximum ULP error for double log_sve from 1 to 2.
Speedup on Neoverse V1 for log (3%), log2 (5%), and log10 (4%).

(cherry picked from commit 32d193a372feb28f9da247bb7283d404b84429c6)

8 months agoAArch64: Improve codegen in SVE tans
Luna Lamb [Fri, 3 Jan 2025 19:02:52 +0000 (19:02 +0000)] 
AArch64: Improve codegen in SVE tans

Improves memory access.
Tan: MOVPRFX 7 -> 2, LD1RD 12 -> 5, move MOV away from return.
Tanf: MOV 2 -> 1, MOVPRFX 6 -> 3, LD1RW 5 -> 4, move MOV away from return.

(cherry picked from commit aa6609feb20ebf8653db639dabe2a6afc77b02cc)