Jiamei Xie [Tue, 14 Oct 2025 12:14:11 +0000 (20:14 +0800)]
x86: fix wmemset ifunc stray '!' (bug 33542)
The ifunc selector for wmemset had a stray '!' in the
X86_ISA_CPU_FEATURES_ARCH_P(...) check:
if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
&& X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
AVX_Fast_Unaligned_Load, !))
This effectively negated the predicate and caused the AVX2/AVX512
paths to be skipped, making the dispatcher fall back to the SSE2
implementation even on CPUs where AVX2/AVX512 are available. The
regression leads to noticeable throughput loss for wmemset.
Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is
tested as intended and the correct AVX2/EVEX variants are selected.
Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly
falls back to SSE2; perf now shows __wmemset_evex/avx2 variants.
Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirm the selected symbol is no longer SSE2.
Signed-off-by: xiejiamei <xiejiamei@hygon.com> Signed-off-by: Li jing <lijing@hygon.cn> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d86b6cdd8132e0410347e07262239750f86dfb4)
nss: Group merge does not react to ERANGE during merge (bug 33361)
The break statement in CHECK_MERGE is expected to exit the surrounding
while loop, not the do-while loop with in the macro. Remove the
do-while loop from the macro. It is not needed to turn the macro
expansion into a single statement due to the way CHECK_MERGE is used
(and the statement expression would cover this anyway).
Pierre Blanchard [Wed, 20 Aug 2025 17:41:50 +0000 (17:41 +0000)]
AArch64: Fix SVE powf routine [BZ #33299]
Fix a bug in predicate logic introduced in last change.
A slight performance improvement from relying on all true
predicates during conversion from single to double.
This fixes BZ #33299.
Add GLIBC_ABI_GNU_TLS version to indicate that glibc has the working
GNU TLS run-time. Linker can add the GLIBC_ABI_GNU_TLS version to
binaries which depend on the working TLS run-time so that such programs
and shared libraries will fail to load and run at run-time against
libc.so without the GLIBC_ABI_GNU_TLS version, instead of fail silently
at random.
affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386
to indicate the working GNU2 TLS run-time. For x86-64, the additional
GNU2 TLS run-time bug fix is needed for
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But
___tls_get_addr is a normal function which doesn't preserve any vector
registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
_dl_tlsdesc_dynamic:
/* Like all TLS resolvers, preserve call-clobbered registers.
We need two scratch regs anyway. */
subl $32, %esp
cfi_adjust_cfa_offset (32)
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996.
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 848f0e46f03f22404ed9a8aabf3fd5ce8809a1be)
Florian Weimer [Tue, 11 Mar 2025 14:30:52 +0000 (15:30 +0100)]
elf: Test dlopen (NULL, RTLD_LAZY) from an ELF constructor
This call must not complete initialization of all shared objects
in the global scope because the ELF constructor which makes the call
likely has not finished initialization. Calling more constructors
at this point would expose those to a partially constructed
dependency.
This completes the revert of commit 9897ced8e78db5d813166a7ccccfd5a
("elf: Run constructors on cyclic recursive dlopen (bug 31986)").
H.J. Lu [Thu, 14 Aug 2025 14:03:20 +0000 (07:03 -0700)]
x86-64: Add GLIBC_ABI_DT_X86_64_PLT [BZ #33212]
When the linker -z mark-plt option is used to add DT_X86_64_PLT,
DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the
R_X86_64_JUMP_SLOT relocation stores the offset of the indirect
branch instruction. However, glibc versions without the commit:
x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT
According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we
can ignore their r_addends.
Reviewed-by: Fangrui Song <maskray@google.com>
won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation.
Such programs and shared libraries will fail at run-time randomly.
Add GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is compatible
with DT_X86_64_PLT.
The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency
whenever -z mark-plt is passed to the linker. The resulting programs and
shared libraries will fail to load at run-time against libc.so without the
GLIBC_ABI_DT_X86_64_PLT version, instead of fail randomly.
Add GLIBC_ABI_GNU2_TLS version to indicate that glibc has the working
GNU2 TLS run-time. Linker can add the GLIBC_ABI_GNU2_TLS version to
binaries which depend on the working GNU2 TLS run-time:
so that such programs and shared libraries will fail to load and run at
run-time against libc.so without the GLIBC_ABI_GNU2_TLS version, instead
of fail silently at random.
This ensures that the compiler will not inline it, so that
debuggers which do not use the Systemtap probes can reliably
set a breakpoint on it.
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org> Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 620f0730f311635cd0e175a3ae4d0fc700c76366)
elf: Restore support for _r_debug interpositions and copy relocations
The changes in commit a93d9e03a31ec14405cb3a09aa95413b67067380
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool. It brings its
own definition of _r_debug (rather than a declaration).
Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC. To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play. Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
Only perform the copy relocation initialization if libc has been
loaded. Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol mail fail.
It replaces the ns_debug member of the namespaces. Previously,
the base namespace had an unused ns_debug member.
This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed. (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
Use TLS initial-exec model for __libc_tsd_CTYPE_* thread variables [BZ #33234]
Commit 10a66a8e421b ("Remove <libc-tsd.h>") removed the TLS initial-exec
(IE) model attribute from the __libc_tsd_CTYPE_* thread variable declarations
and definitions. Commit a894f04d8776 ("Optimize __libc_tsd_* thread
variable access") restored it on declarations.
Restore the TLS initial-exec model attribute on __libc_tsd_CTYPE_* thread
variable definitions.
This resolves test tst-locale1 failure on s390 32-bit, when using a
GNU linker without the fix from GNU binutils commit aefebe82dc89
("IBM zSystems: Fix offset relative to static TLS").
Florian Weimer [Fri, 16 May 2025 17:53:09 +0000 (19:53 +0200)]
Remove <libc-tsd.h>
Use __thread variables directly instead. The macros do not save any
typing. It seems unlikely that a future port will lack __thread
variable support.
Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names. Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.
Pierre Blanchard [Tue, 18 Mar 2025 17:07:31 +0000 (17:07 +0000)]
AArch64: Optimize algorithm in users of SVE expf helper
Polynomial order was unnecessarily high, unlocking multiple
optimizations.
Max error for new SVE expf is 0.88 +0.5ULP.
Max error for new SVE coshf is 2.56 +0.5ULP.
Performance improvement on Neoverse V1: expf (30%), coshf (26%).
Wilco Dijkstra [Fri, 27 Jun 2025 14:10:55 +0000 (14:10 +0000)]
AArch64: Avoid memset ifunc in cpu-features.c [BZ #33112]
During early startup memcpy or memset must not be called since many targets
use ifuncs for them which won't be initialized yet. Security hardening may
use -ftrivial-auto-var-init=zero which inserts calls to memset. Redirect
memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c.
This fixes BZ #33112.
nptl: Fix SYSCALL_CANCEL for return values larger than INT_MAX (BZ 33245)
The SYSCALL_CANCEL calls __syscall_cancel, which in turn
calls __internal_syscall_cancel with an 'int' return instead of the
expected 'long int'. This causes issues with syscalls that return
values larger than INT_MAX, such as copy_file_range [1].
Florian Weimer [Fri, 1 Aug 2025 10:19:49 +0000 (12:19 +0200)]
elf: Handle ld.so with LOAD segment gaps in _dl_find_object (bug 31943)
Detect if ld.so not contiguous and handle that case in _dl_find_object.
Set l_find_object_processed even for initially loaded link maps,
otherwise dlopen of an initially loaded object adds it to
_dlfo_loaded_mappings (where maps are expected to be contiguous),
in addition to _dlfo_nodelete_mappings.
Test elf/tst-link-map-contiguous-ldso iterates over the loader
image, reading every word to make sure memory is actually mapped.
It only does that if the l_contiguous flag is set for the link map.
Otherwise, it finds gaps with mmap and checks that _dl_find_object
does not return the ld.so mapping for them.
The test elf/tst-link-map-contiguous-main does the same thing for
the libc.so shared object. This only works if the kernel loaded
the main program because the glibc dynamic loader may fill
the gaps with PROT_NONE mappings in some cases, making it contiguous,
but accesses to individual words may still fault.
Test elf/tst-link-map-contiguous-libc is again slightly different
because the dynamic loader always fills the gaps with PROT_NONE
mappings, so a different form of probing has to be used.
stdlib: resolve a double lock init issue after fork [BZ #32994]
The __abort_fork_reset_child (introduced in d40ac01cbbc66e6d9dbd8e3485605c63b2178251) call resets the lock after the
fork. This causes a DRD regression in valgrind
(https://bugs.kde.org/show_bug.cgi?id=503668), as it's effectively a
double initialization, despite it being actually ok in this case. As
suggested in https://sourceware.org/bugzilla/show_bug.cgi?id=32994#c2
we replace it here with a memcpy of another initialized lock instead,
which makes valgrind happy.
posix: Fix double-free after allocation failure in regcomp (bug 33185)
If a memory allocation failure occurs during bracket expression
parsing in regcomp, a double-free error may result.
Reported-by: Anastasia Belova <abelova@astralinux.ru> Co-authored-by: Paul Eggert <eggert@cs.ucla.edu> Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7ea06e994093fa0bcca0d0ee2c1db271d8d7885d)
Florian Weimer [Thu, 22 May 2025 12:36:37 +0000 (14:36 +0200)]
Fix error reporting (false negatives) in SGID tests
And simplify the interface of support_capture_subprogram_self_sgid.
Use the existing framework for temporary directories (now with
mode 0700) and directory/file deletion. Handle all execution
errors within support_capture_subprogram_self_sgid. In particular,
this includes test failures because the invoked program did not
exit with exit status zero. Existing tests that expect exit
status 42 are adjusted to use zero instead.
In addition, fix callers not to call exit (0) with test failures
pending (which may mask them, especially when running with --direct).
sparc: Fix sparc32 Fix argument passing to __libc_start_main (BZ 32981)
Commit 404526ee2e58f3c075253943ddc9988f4bd6b80c changed _start to write
the last argument to __libc_start_main without taking into consideration
that the function did not create a full stack frame, which leads to
overwriting the argv[0].
sparc: Fix argument passing to __libc_start_main (BZ 32981)
sparc start.S does not provide the final argument for
__libc_start_main, which is the highest stack address used to
update the __libc_stack_end.A
This fixes elf/tst-execstack-prog-static-tunable on sparc64.
On sparcv9 this does not happen because the kernel puts an
auxv value, which turns to point to a value in the stack itself.
Florian Weimer [Thu, 13 Feb 2025 20:56:52 +0000 (21:56 +0100)]
elf: Keep using minimal malloc after early DTV resize (bug 32412)
If an auditor loads many TLS-using modules during startup, it is
possible to trigger DTV resizing. Previously, the DTV was marked
as allocated by the main malloc afterwards, even if the minimal
malloc was still in use. With this change, _dl_resize_dtv marks
the resized DTV as allocated with the minimal malloc.
The new test reuses TLS-using modules from other auditing tests.
nptl: Fix pthread_getattr_np when modules with execstack are allowed (BZ 32897)
The BZ 32653 fix (12a497c716f0a06be5946cabb8c3ec22a079771e) kept the
stack pointer zeroing from make_main_stack_executable on
_dl_make_stack_executable. However, previously the 'stack_endp'
pointed to temporary variable created before the call of
_dl_map_object_from_fd; while now we use the __libc_stack_end
directly.
Since pthread_getattr_np relies on correct __libc_stack_end, if
_dl_make_stack_executable is called (for instance, when
glibc.rtld.execstack=2 is set) __libc_stack_end will be set to zero,
and the call will always fail.
The __libc_stack_end zero was used a mitigation hardening, but since 52a01100ad011293197637e42b5be1a479a2f4ae it is used solely on
pthread_getattr_np code. So there is no point in zeroing anymore.
elf: tst-audit10: split AVX512F code into dedicated functions [BZ #32882]
"Recent" GCC versions (since commit fc62716fe8d1, backported to stable
branches) emit a vzeroupper instruction at the end of functions
containing AVX instructions. This causes the tst-audit10 test to fail
on CPUs lacking AVX instructions, despite the AVX512F check. The crash
occurs in the pltenter function of tst-auditmod10b.c.
Fix that by moving the code guarded by the check_avx512 function into
specific functions using the target ("avx512f") attribute. Note that
since commit 5359c3bc91cc ("x86-64: Remove compiler -mavx512f check") it
is safe to assume that the compiler has AVX512F support, thus the
__AVX512F__ checks can be dropped.
H.J. Lu [Sat, 12 Apr 2025 15:37:29 +0000 (08:37 -0700)]
x86: Detect Intel Diamond Rapids
Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit de14f1959ee5f9b845a7cae43bee03068b8136f0)
Scan xstate IDs up to the maximum supported xstate ID. Remove the
separate AMX xstate calculation. Instead, exclude the AMX space from
the start of TILECFG to the end of TILEDATA in xsave_state_size.
Completed validation on SKL/SKX/SPR/SDE and compared xsave state size
with "ld.so --list-diagnostics" option, no regression.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit 70b648855185e967e54668b101d24704c3fb869d)
elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)
From the bug report [1], multiple programs still require to dlopen
shared libraries with either missing PT_GNU_STACK or with the executable
bit set. Although, in some cases, it seems to be a hard-craft assembly
source without the required .note.GNU-stack marking (so the static linker
is forced to set the stack executable if the ABI requires it), other
cases seem that the library uses trampolines [2].
Unfortunately, READ_IMPLIES_EXEC is not an option since on some ABIs
(x86_64), the kernel clears the bit, making it unsupported. To avoid
reinstating the broken code that changes stack permission on dlopen
(0ca8785a28), this patch extends the glibc.rtld.execstack tunable to
allow an option to force an executable stack at the program startup.
The tunable is a security issue because it defeats the PT_GNU_STACK
hardening. It has the slight advantage of making it explicit by the
caller, and, as for other tunables, this is disabled for setuid binaries.
A tunable also allows us to eventually remove it, but from previous
experiences, it would require some time.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32653
[2] https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/143 Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 12a497c716f0a06be5946cabb8c3ec22a079771e)
Florian Weimer [Fri, 28 Mar 2025 08:26:59 +0000 (09:26 +0100)]
x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)
Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size. However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.
This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so. As a result, struct cpu_features layout does not change,
helping a future backport of this change.
Florian Weimer [Fri, 28 Mar 2025 08:26:06 +0000 (09:26 +0100)]
x86: Skip XSAVE state size reset if ISA level requires XSAVE
If we have to use XSAVE or XSAVEC trampolines, do not adjust the size
information they need. Technically, it is an operator error to try to
run with -XSAVE,-XSAVEC on such builds, but this change here disables
some unnecessary code with higher ISA levels and simplifies testing.
nptl: Check if thread is already terminated in sigcancel_handler (BZ 32782)
The SIGCANCEL signal handler should not issue __syscall_do_cancel,
which calls __do_cancel and __pthread_unwind, if the cancellation
is already in proces (and libgcc unwind is not reentrant). Any
cancellation signal received after is ignored.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Florian Weimer [Thu, 13 Mar 2025 05:07:07 +0000 (06:07 +0100)]
nptl: PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions (bug 32786)
The new initializer and struct layout does not initialize the
__g_signals field in the old struct layout before the change in
commit c36fc50781995e6758cae2b6927839d0157f213c ("nptl: Remove
g_refs from condition variables"). Bring back fields at the end
of struct __pthread_cond_s, so that they are again zero-initialized.
Michael Jeanson [Fri, 14 Feb 2025 18:54:22 +0000 (13:54 -0500)]
nptl: clear the whole rseq area before registration
Due to the extensible nature of the rseq area we can't explictly
initialize fields that are not part of the ABI yet. It was agreed with
upstream that all new fields will be documented as zero initialized by
userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will
validate the content of all fields during registration.
Replace the explicit field initialization with a memset of the whole
rseq area which will cover fields as they are added to future kernels.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 689a62a4217fae78b9ce0db781dc2a421f2b1ab4)
Aurelien Jarno [Thu, 6 Mar 2025 18:34:15 +0000 (19:34 +0100)]
math: Remove an extra semicolon in math function declarations
Commit 6bc301672bfbd ("math: Remove __XXX math functions from installed
math.h [BZ #32418]") left an extra semicolon after macro expansion. For
instance the ceil declaration after expansion is:
H.J. Lu [Fri, 7 Mar 2025 00:58:47 +0000 (08:58 +0800)]
elf: Check if __attribute__ ((aligned (65536))) is supported
The BZ #32763 tests fail to build for MicroBlaze (which defines
MAX_OFILE_ALIGNMENT to (32768*8) in GCC, so __attribute__ ((aligned
(65536))) is unsupported). Add a configure-time check to enable BZ #32763
tests only if __attribute__ ((aligned (65536))) is supported.
Sam James [Tue, 18 Feb 2025 18:49:09 +0000 (18:49 +0000)]
Pass -Wl,--no-error-execstack for tests where -Wl,-z,execstack is used [PR32717]
When GNU Binutils is configured with --enable-error-execstack=yes, a handful
of our tests which rely on -Wl,-z,execstack fail. Pass --Wl,--no-error-execstack
to override the behaviour and get a warning instead.
Wilco Dijkstra [Tue, 24 Dec 2024 18:01:59 +0000 (18:01 +0000)]
AArch64: Add SVE memset
Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 are improved by ~20% on average by using aligned
stores for the last 64 bytes. Performance of random memset benchmark improves
by ~2% on Neoverse V1.
Wilco Dijkstra [Fri, 13 Dec 2024 15:43:07 +0000 (15:43 +0000)]
math: Improve layout of exp/exp10 data
GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps. As a result on targets that support them, more load-pair
instructions are used in exp. Exp10 is improved by moving invlog10_2N later
so that neglog10_2hiN and neglog10_2loN can be loaded using load-pair.
The exp benchmark improves 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2. Exp10 improves by 1.5%.
Yury Khrustalev [Tue, 21 Jan 2025 13:33:20 +0000 (13:33 +0000)]
aarch64: Add tests for Guarded Control Stack
These tests validate that GCS tunable works as expected depending
on the GCS markings in the test binaries.
Tests validate both static and dynamically linked binaries.
These new tests are AArch64 specific. Moreover, they are included only
if linker supports the "-z gcs=<value>" option. If built, these tests
will run on systems with and without HWCAP_GCS. In the latter case the
tests will be reported as UNSUPPORTED.
Luna Lamb [Thu, 13 Feb 2025 17:54:46 +0000 (17:54 +0000)]
Aarch64: Improve codegen in SVE exp and users, and update expf_inline
Use unpredicted muls, and improve memory access.
7%, 3% and 1% improvement in throughput microbenchmark on Neoverse V1,
for exp, exp2 and cosh respectively.
Yangyu Chen [Mon, 24 Feb 2025 17:12:19 +0000 (01:12 +0800)]
RISC-V: Fix IFUNC resolver cannot access gp pointer
In some cases, an IFUNC resolver may need to access the gp pointer to
access global variables. Such an object may have l_relocated == 0 at
this time. In this case, an IFUNC resolver will fail to access a global
variable and cause a SIGSEGV.
This patch fixes this issue by relaxing the check of l_relocated in
elf_machine_runtime_setup, but added a check for SHARED case to avoid
using this code in static-linked executables. Such object have already
set up the gp pointer in load_gp function and l->l_scope will be NULL if
it is a pie object. So if we use these code to set up the gp pointer
again for static-pie, it will causing a SIGSEGV in glibc as original bug
on BZ #31317.
I have also reproduced and checked BZ #31317 using the mold commit bed5b1731b ("illumos: Treat absolute symbols specially"), this patch can
fix the issue.
Also, we used the wrong gp pointer previously because ref->st_value is
not the relocated address but just the offset from the base address of
ELF. An edge case may happen if we reference gp pointer in a IFUNC
resolver in a PIE object, but it will not happen in compiler-generated
codes since -pie will disable relax to gp. In this case, the GP will be
initialized incorrectly since the ref->st_value is not the address after
relocation. This patch fixes this issue by adding the l->l_addr to
ref->st_value to get the relocated address for the gp pointer. We don't
use SYMBOL_ADDRESS macro here because __global_pointer$ is a special
symbol that has SHN_ABS type, but it will use PC-relative addressing in
the load_gp function using lla.
Closes: BZ #32269 Fixes: 96d1b9ac23 ("RISC-V: Fix the static-PIE non-relocated object check") Co-authored-by: Vivian Wang <dramforever@live.com> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
(cherry picked from commit 3fd2ff7685e3ee85c8cd2896f28ad62f67d7c483)
math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP. Eventually I found that the value of
a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:
if (TININESS_AFTER_ROUNDING)
{
w.d = a1 + u.d;
if (w.ieee.exponent == 109)
return w.d * 0x1p-108;
}
This caused the exponent value to be wrong and the wrong return path
to be used.
Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
koraynilay [Sat, 22 Feb 2025 14:55:59 +0000 (15:55 +0100)]
math: Fix `unknown type name '__float128'` for clang 3.4 to 3.8.1 (bug 32694)
When compiling a program that includes <bits/floatn.h> using a clang version
between 3.4 (included) and 3.8.1 (included), clang will fail with `unknown type
name '__float128'; did you mean '__cfloat128'?`. This changes fixes the clang
prerequirements macro call in floatn.h to check for clang 3.9 instead of 3.4,
since support for __float128 was actually enabled in 3.9 by:
H.J. Lu [Wed, 19 Feb 2025 23:08:26 +0000 (07:08 +0800)]
x86 (__HAVE_FLOAT128): Defined to 0 for Intel SYCL compiler [BZ #32723]
Intel compiler always defines __INTEL_LLVM_COMPILER. When SYCL is
enabled by -fsycl, it also defines SYCL_LANGUAGE_VERSION. Since Intel
SYCL compiler doesn't support _Float128:
https://github.com/intel/llvm/issues/16903
define __HAVE_FLOAT128 to 0 for Intel SYCL compiler.
Aurelien Jarno [Sat, 15 Feb 2025 10:08:33 +0000 (11:08 +0100)]
Fix tst-aarch64-pkey to handle ENOSPC as not supported
The syscall pkey_alloc can return ENOSPC to indicate either that all
keys are in use or that the system runs in a mode in which memory
protection keys are disabled. In such case the test should not fail and
just return unsupported.
This matches the behaviour of the generic tst-pkey.
nptl: Correct stack size attribute when stack grows up [BZ #32574]
Set stack size attribute to the size of the mmap'd region only
when the size of the remaining stack space is less than the size
of the mmap'd region.
This was reversed. As a result, the initial stack size was only
135168 bytes. On architectures where the stack grows down, the
initial stack size is approximately 8384512 bytes with the default
rlimit settings. The small main stack size on hppa broke
applications like ruby that check for stack overflows.
Signed-off-by: John David Anglin <dave.anglin@bell.net>