git.ipfire.org Git - thirdparty/glibc.git/log

nptl: clear the whole rseq area before registration

Due to the extensible nature of the rseq area we can't explictly
initialize fields that are not part of the ABI yet. It was agreed with
upstream that all new fields will be documented as zero initialized by
userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will
validate the content of all fields during registration.

Replace the explicit field initialization with a memset of the whole
rseq area which will cover fields as they are added to future kernels.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 689a62a4217fae78b9ce0db781dc2a421f2b1ab4)

math: Improve layout of exp/exp10 data

GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps.  As a result on targets that support them, more load-pair
instructions are used in exp.  Exp10 is improved by moving invlog10_2N later
so that neglog10_2hiN and neglog10_2loN can be loaded using load-pair.

The exp benchmark improves 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2.  Exp10 improves by 1.5%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 5afaf99edb326fd9f36eb306a828d129a3a1d7f7)

AArch64: Use prefer_sve_ifuncs for SVE memset

Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 0f044be1dae5169d0e57f8d487b427863aeadab4)

AArch64: Add SVE memset

Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 are improved by ~20% on average by using aligned
stores for the last 64 bytes. Performance of random memset benchmark improves
by ~2% on Neoverse V1.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 163b1bbb76caba4d9673c07940c5930a1afa7548)

math: Improve layout of expf data

GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch
changes the exp2f_data struct slightly so that the fields are better aligned.
As a result on targets that support them, load-pair instructions accessing
poly_scaled and invln2_scaled are now 16-byte aligned.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 44fa9c1080fe6a9539f0d2345b9d2ae37b8ee57a)

AArch64: Remove zva_128 from memset

Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit a08d9a52f967531a77e1824c23b5368c6434a72d)

AArch64: Optimize memset

Improve small memsets by avoiding branches and use overlapping stores.
Use DC ZVA for copies over 128 bytes. Remove unnecessary code for ZVA sizes
other than 64 and 128. Performance of random memset benchmark improves by 24%
on Neoverse N1.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit cec3aef32412779e207f825db0d057ebb4628ae8)

AArch64: Improve generic strlen

Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates. Change final
size computation to avoid increasing latency. On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3dc426b642dcafdbc11a99f2767e081d086f5fc7)

Revert "AArch64: Add vector logp1 alias for log1p"

This reverts commit a991a0fc7c051d7ef2ea7778e0a699f22d4e53d7.

AArch64: Improve codegen for SVE powf

Improve memory access with indexed/unpredicated instructions.
Eliminate register spills. Speedup on Neoverse V1: 3%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 95e807209b680257a9afe81a507754f1565dbb4d)

AArch64: Improve codegen for SVE pow

Move constants to struct. Improve memory access with indexed/unpredicated
instructions. Eliminate register spills. Speedup on Neoverse V1: 24%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 0b195651db3ae793187c7dd6d78b5a7a8da9d5e6)

AArch64: Improve codegen for SVE erfcf

Reduce number of MOV/MOVPRFXs and use unpredicated FMUL.
Replace MUL with LSL. Speedup on Neoverse V1: 6%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit f5ff34cb3c75ec1061c75bb9188b3c1176426947)

Aarch64: Improve codegen in SVE exp and users, and update expf_inline

Use unpredicted muls, and improve memory access.
7%, 3% and 1% improvement in throughput microbenchmark on Neoverse V1,
for exp, exp2 and cosh respectively.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit c0ff447edf19bd4630fe79adf5e8b896405b059f)

Aarch64: Improve codegen in SVE asinh

Use unpredicated muls, use lanewise mla's and improve memory access.
1% regression in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 8f0e7fe61e0a2ad5ed777933703ce09053810ec4)

AArch64: Improve codegen in SVE expm1f and users

Use unpredicated muls, use absolute compare and improve memory access.
Expm1f, sinhf and tanhf show 7%, 5% and 1% improvement in throughput
microbenchmark on Neoverse V1.

(cherry picked from commit f86b4cf87581cf1e45702b07880679ffa0b1f47a)

AArch64: Improve codegen for SVE log1pf users

Reduce memory access by using lanewise MLA and reduce number of MOVPRFXs.
Move log1pf implementation to inline helper function.
Speedup on Neoverse V1 for log1pf (10%), acoshf (-1%), atanhf (2%), asinhf (2%).

(cherry picked from commit 91c1fadba338752bf514cd4cca057b27b1b10eed)

AArch64: Improve codegen for SVE logs

Reduce memory access by using lanewise MLA and moving constants to struct
and reduce number of MOVPRFXs.
Update maximum ULP error for double log_sve from 1 to 2.
Speedup on Neoverse V1 for log (3%), log2 (5%), and log10 (4%).

(cherry picked from commit 32d193a372feb28f9da247bb7283d404b84429c6)

AArch64: Improve codegen in SVE tans

Improves memory access.
Tan: MOVPRFX 7 -> 2, LD1RD 12 -> 5, move MOV away from return.
Tanf: MOV 2 -> 1, MOVPRFX 6 -> 3, LD1RW 5 -> 4, move mov away from return.

(cherry picked from commit aa6609feb20ebf8653db639dabe2a6afc77b02cc)

AArch64: Improve codegen in AdvSIMD asinh

Improves memory access and removes spills.
Load the polynomial evaluation coefficients into 2 vectors and use lanewise
MLAs. Reduces MOVs 6->3 , LDR 11->5, STR/STP 2->0, ADRP 3->2.

(cherry picked from commit 140b985e5a2071000122b3cb63ebfe88cf21dd29)

AArch64: Improve codegen of AdvSIMD expf family

Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
Also use intrinsics instead of native operations.
expf: 3% improvement in throughput microbenchmark on Neoverse V1, exp2f: 5%,
exp10f: 13%, coshf: 14%.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit cff9648d0b50d19cdaf685f6767add040d4e1a8e)

AArch64: Improve codegen of AdvSIMD atan(2)(f)

Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6914774b9d3460876d9ad4482782213ec01a752e)

AArch64: Improve codegen of AdvSIMD logf function family

Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1 for log2 and log,
and 2% for log10.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit d6e034f5b222a9ed1aeb5de0c0c7d0dda8b63da3)

AArch64: Improve codegen in users of ADVSIMD log1p helper

Add inline helper for log1p and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for log1p (3.5%), acosh (7.5%) and atanh (10%).

(cherry picked from commit ca0c0d0f26fbf75b9cacc65122b457e8fdec40b8)

AArch64: Improve codegen in AdvSIMD logs

Remove spurious ADRP and a few MOVs.
Reduce memory access by using more indexed MLAs in polynomial.
Align notation so that algorithms are easier to compare.
Speedup on Neoverse V1 for log10 (8%), log (8.5%), and log2 (10%).
Update error threshold in AdvSIMD log (now matches SVE log).

(cherry picked from commit 8eb5ad2ebc94cc5bedbac57c226c02ec254479c7)

AArch64: Improve codegen in AdvSIMD pow

Remove spurious ADRP. Improve memory access by shuffling constants and
using more indexed MLAs.

A few more optimisation with no impact on accuracy
- force fmas contraction
- switch from shift-aided rint to rint instruction

Between 1 and 5% throughput improvement on Neoverse
V1 depending on benchmark.

(cherry picked from commit 569cfaaf4984ae70b23c61ee28a609b5aef93fea)

AArch64: Remove SVE erf and erfc tables

By using a combination of mask-and-add instead of the shift-based
index calculation the routines can share the same table as other
variants with no performance degradation.

The tables change name because of other changes in downstream AOR.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 2d82d781a539ce8e82178fc1fa2c99ae1884e7fe)

AArch64: Small optimisation in AdvSIMD erf and erfc

In both routines, reduce register pressure such that GCC 14 emits no
spills for erf and fewer spills for erfc. Also use more efficient
comparison for the special-case in erf.

Benchtests show erf improves by 6.4%, erfc by 1.0%.

(cherry picked from commit 1cf29fbc5be23db775d1dfa6b332ded6e6554252)

AArch64: Simplify rounding-multiply pattern in several AdvSIMD routines

This operation can be simplified to use simpler multiply-round-convert
sequence, which uses fewer instructions and constants.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 16a59571e4e9fd019d3fc23a2e7d73c1df8bb5cb)

AArch64: Improve codegen in users of ADVSIMD expm1f helper

Rearrange operations so MOV is not necessary in reduction or around
the special-case handler. Reduce memory access by using more indexed
MLAs in polynomial.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 7900ac490db32f6bccff812733f00280dde34e27)

AArch64: Improve codegen in users of AdvSIMD log1pf helper

log1pf is quite register-intensive - use fewer registers for the
polynomial, and make various changes to shorten dependency chains in
parent routines.  There is now no spilling with GCC 14.  Accuracy moves
around a little - comments adjusted accordingly but does not require
regen-ulps.

Use the helper in log1pf as well, instead of having separate
implementations.  The more accurate polynomial means special-casing can
be simplified, and the shorter dependency chain avoids the usual dance
around v0, which is otherwise difficult.

There is a small duplication of vectors containing 1.0f (or 0x3f800000) -
GCC is not currently able to efficiently handle values which fit in FMOV
but not MOVI, and are reinterpreted to integer.  There may be potential
for more optimisation if this is fixed.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 5bc100bd4b7e00db3009ae93d25d303341545d23)

AArch64: Improve codegen in SVE F32 logs

Reduce MOVPRFXs by using unpredicated (non-destructive) instructions
where possible. Similar to the recent change to AdvSIMD F32 logs,
adjust special-case arguments and bounds to allow for more optimal
register usage. For all 3 routines one MOVPRFX remains in the
reduction, which cannot be avoided as immediate AND and ASR are both
destructive.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit a15b1394b5eba98ffe28a02a392b587e4fe13c0d)

AArch64: Improve codegen in SVE expf & related routines

Reduce MOV and MOVPRFX by improving special-case handling.  Use inline
helper to duplicate the entire computation between the special- and
non-special case branches, removing the contention for z0 between x
and the return value.

Also rearrange some MLAs and MLSs - by making the multiplicand the
destination we can avoid a MOVPRFX in several cases.  Also change which
constants go in the vector used for lanewise ops - the last lane is no
longer wasted.

Spotted that shift was incorrect in exp2f and exp10f, w.r.t. to the
comment that explains it.  Fixed - worst-case ULP for exp2f moves
around but it doesn't change significantly for either routine.

Worst-case error for coshf increases due to passing x to exp rather
than abs(x) - updated the comment, but does not require regen-ulps.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 7b8c134b5460ed933d610fa92ed1227372b68fdc)

AArch64: Add vector logp1 alias for log1p

This enables vectorisation of C23 logp1, which is an alias for log1p.
There are no new tests or ulp entries because the new symbols are simply
aliases.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 751a5502bea1d13551c62c47bb9bd25bff870cda)

aarch64: Avoid redundant MOVs in AdvSIMD F32 logs

Since the last operation is destructive, the first argument to the FMA
also has to be the first argument to the special-case in order to
avoid unnecessary MOVs. Reorder arguments and adjust special-case
bounds to facilitate this.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 8b09af572b208bfde4d31c6abbae047dcc217675)

math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]

A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP.  Eventually I found that the value of
a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:

    if (TININESS_AFTER_ROUNDING)
      {
        w.d = a1 + u.d;
        if (w.ieee.exponent == 109)
          return w.d * 0x1p-108;
      }

This caused the exponent value to be wrong and the wrong return path
to be used.

Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.

Signed-off-by: John David Anglin <dave.anglin@bell.net>

assert: Add test for CVE-2025-0395

Use the __progname symbol to override the program name to induce the
failure that CVE-2025-0395 describes.

This is related to BZ #32582

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit cdb9ba84191ce72e86346fb8b1d906e7cd930ea2)

nptl: Correct stack size attribute when stack grows up [BZ #32574]

Set stack size attribute to the size of the mmap'd region only
when the size of the remaining stack space is less than the size
of the mmap'd region.

This was reversed.  As a result, the initial stack size was only
135168 bytes.  On architectures where the stack grows down, the
initial stack size is approximately 8384512 bytes with the default
rlimit settings.  The small main stack size on hppa broke
applications like ruby that check for stack overflows.

Signed-off-by: John David Anglin <dave.anglin@bell.net>

stdlib: Test using setenv with updated environ [BZ #32588]

Add a test for setenv with updated environ. Verify that BZ #32588 is
fixed.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 8ab34497de14e35aff09b607222fe1309ef156da)

malloc: obscure calloc use in tst-calloc

Similar to a9944a52c967ce76a5894c30d0274b824df43c7a and
f9493a15ea9cfb63a815c00c23142369ec09d8ce, we need to hide calloc use from
the compiler to accommodate GCC's r15-6566-g804e9d55d9e54c change.

First, include tst-malloc-aux.h, but then use `volatile` variables
for size.

The test passes without the tst-malloc-aux.h change but IMO we want
it there for consistency and to avoid future problems (possibly silent).

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c3d1dac96bdd10250aa37bb367d5ef8334a093a1)

Hide all malloc functions from compiler [BZ #32366]

Since -1 isn't a power of two, compiler may reject it, hide memalign from
Clang 19 which issues an error:

tst-memalign.c:86:31: error: requested alignment is not a power of 2 [-Werror,-Wnon-power-of-two-alignment]
   86 |   p = memalign (-1, pagesize);
      |                 ^~
tst-memalign.c:86:31: error: requested alignment must be 4294967296 bytes or smaller; maximum alignment assumed [-Werror,-Wbuiltin-assume-aligned-alignment]
   86 |   p = memalign (-1, pagesize);
      |                 ^~

Update tst-malloc-aux.h to hide all malloc functions and include it in
all malloc tests to prevent compiler from optimizing out any malloc
functions.

Tested with Clang 19.1.5 and GCC 15 20241206 for BZ #32366.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit f9493a15ea9cfb63a815c00c23142369ec09d8ce)

Fix underallocation of abort_msg_s struct (CVE-2025-0395)

Include the space needed to store the length of the message itself, in
addition to the message string. This resolves BZ #32582.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 68ee0f704cb81e9ad0a78c644a83e1e9cd2ee578)

Fix missing randomness in __gen_tempname (bug 32214)

Make sure to update the random value also if getrandom fails.

Fixes: 686d542025 ("posix: Sync tempname with gnulib")
(cherry picked from commit 5f62cf88c4530c11904482775b7582bd7f6d80d2)

hppa: Simplify handling of sanity check errors in clone.S.

This simplifies the handling of sanity check errors in clone.S.
Adjusted a couple of comments to reflect current code.

Signed-off-by: John David Anglin <dave.anglin@bell.net>

hppa: Fix strace detach-vfork test

This change implements vfork.S for direct support of the vfork
syscall.  clone.S is revised to correct child support for the
vfork case.

The main bug was creating a frame prior to the clone syscall.
This was done to allow the rp and r4 registers to be saved and
restored from the stack frame.  r4 was used to save and restore
the PIC register, r19, across the system call and the call to
set errno.  But in the vfork case, it is undefined behavior
for the child to return from the function in which vfork was
called.  It is surprising that this usually worked.

Syscalls on hppa save and restore rp and r19, so we don't need
to create a frame prior to the clone syscall.  We only need a
frame when __syscall_error is called.  We also don't need to
save and restore r19 around the call to $$dyncall as r19 is not
used in the code after $$dyncall.

This considerably simplifies clone.S.

Signed-off-by: John David Anglin <dave.anglin@bell.net>

x86: Avoid integer truncation with large cache sizes (bug 32470)

Some hypervisors report 1 TiB L3 cache size. This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.

(cherry picked from commit 61c3450db96dce96ad2b24b4f0b548e6a46d68e5)

linux: Fix tst-syscall-restart.c on old gcc (BZ 32283)

To avoid a parameter name omitted error.

(cherry picked from commit ab564362d0470d10947c24155ec048c4e14a009d)

math: Exclude internal math symbols for tests [BZ #32414]

Since internal tests don't have access to internal symbols in libm,
exclude them for internal tests. Also make tst-strtod5 and tst-strtod5i
depend on $(libm) to support older versions of GCC which can't inline
copysign family functions. This fixes BZ #32414.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit 5df09b444835fca6e64b3d4b4a5beb19b3b2ba21)

sparc: Fix restartable syscalls (BZ 32173)

The commit 'sparc: Use Linux kABI for syscall return'
(86c5d2cf0ce046279baddc7faa27da71f1a89fde) did not take into account
a subtle sparc syscall kABI constraint.  For syscalls that might block
indefinitely, on an interrupt (like SIGCONT) the kernel will set the
instruction pointer to just before the syscall:

arch/sparc/kernel/signal_64.c
476 static void do_signal(struct pt_regs *regs, unsigned long orig_i0)
477 {
[...]
525                 if (restart_syscall) {
526                         switch (regs->u_regs[UREG_I0]) {
527                         case ERESTARTNOHAND:
528                         case ERESTARTSYS:
529                         case ERESTARTNOINTR:
530                                 /* replay the system call when we are done */
531                                 regs->u_regs[UREG_I0] = orig_i0;
532                                 regs->tpc -= 4;
533                                 regs->tnpc -= 4;
534                                 pt_regs_clear_syscall(regs);
535                                 fallthrough;
536                         case ERESTART_RESTARTBLOCK:
537                                 regs->u_regs[UREG_G1] = __NR_restart_syscall;
538                                 regs->tpc -= 4;
539                                 regs->tnpc -= 4;
540                                 pt_regs_clear_syscall(regs);
541                         }

However, on a SIGCONT it seems that 'g1' register is being clobbered after the
syscall returns.  Before 86c5d2cf0ce046279, the 'g1' was always placed jus
before the 'ta' instruction which then reloads the syscall number and restarts
the syscall.

On master, where 'g1' might be placed before 'ta':

  $ cat test.c
  #include <unistd.h>

  int main ()
  {
    pause ();
  }
  $ gcc test.c -o test
  $ strace -f ./t
  [...]
  ppoll(NULL, 0, NULL, NULL, 0

On another terminal

  $ kill -STOP 2262828

  $ strace -f ./t
  [...]
  --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  --- stopped by SIGSTOP ---

And then

  $ kill -CONT 2262828

Results in:

  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  restart_syscall(<... resuming interrupted ppoll ...>) = -1 EINTR (Interrupted system call)

Where the expected behaviour would be:

  $ strace -f ./t
  [...]
  ppoll(NULL, 0, NULL, NULL, 0)           = ? ERESTARTNOHAND (To be restarted if no handler)
  --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  --- stopped by SIGSTOP ---
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  ppoll(NULL, 0, NULL, NULL, 0

Just moving the 'g1' setting near the syscall asm is not suffice,
the compiler might optimize it away (as I saw on cancellation.c by
trying this fix).  Instead, I have change the inline asm to put the
'g1' setup in ithe asm block.  This would require to change the asm
constraint for INTERNAL_SYSCALL_NCS, since the syscall number is not
constant.

Checked on sparc64-linux-gnu.

Reported-by: René Rebe <rene@exactcode.de>
Tested-by: Sam James <sam@gentoo.org>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 2c1903cbbac0022153a67776f474c221250ad6ed)
(cherry picked from commit 1cd7e13289b91e1495a1865c1f678196d1bb7be4)

support: Make support_process_state_wait return the found state

So caller can check which state was found if multiple ones are
asked.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 38316352e0f742f3a2b5816a61a4b603cb5573f8)
(cherry picked from commit 2e38c5a090b3a54040b6e508d42e5a76e492c6e8)

malloc: add indirection for malloc(-like) functions in tests [BZ #32366]

GCC 15 introduces allocation dead code removal (DCE) for PR117370 in
r15-5255-g7828dc070510f8. This breaks various glibc tests which want
to assert various properties of the allocator without doing anything
obviously useful with the allocated memory.

Alexander Monakov rightly pointed out that we can and should do better
than passing -fno-malloc-dce to paper over the problem. Not least because
GCC 14 already does such DCE where there's no testing of malloc's return
value against NULL, and LLVM has such optimisations too.

Handle this by providing malloc (and friends) wrappers with a volatile
function pointer to obscure that we're calling malloc (et. al) from the
compiler.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
(cherry picked from commit a9944a52c967ce76a5894c30d0274b824df43c7a)

nptl: initialize cpu_id_start prior to rseq registration

When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.

While current kernels don't validate the content of this field on
registration, future ones could.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
(cherry picked from commit d9f40387d3305d97e30a8cf8724218c42a63680a)

nptl: initialize rseq area prior to registration

Per the rseq syscall documentation, 3 fields are required to be
initialized by userspace prior to registration, they are 'cpu_id',
'rseq_cs' and 'flags'. Since we have no guarantee that 'struct pthread'
is cleared on all architectures, explicitly set those 3 fields prior to
registration.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 97f60abd25628425971f07e9b0e7f8eec0741235)

elf: handle addition overflow in _dl_find_object_update_1 [BZ #32245]

The remaining_to_add variable can be 0 if (current_used + count) wraps,
This is caught by GCC 14+ on hppa, which determines from there that
target_seg could be be NULL when remaining_to_add is zero, which in
turns causes a -Wstringop-overflow warning:

In file included from ../include/atomic.h:49,
                  from dl-find_object.c:20:
In function '_dlfo_update_init_seg',
     inlined from '_dl_find_object_update_1' at dl-find_object.c:689:30,
     inlined from '_dl_find_object_update' at dl-find_object.c:805:13:
../sysdeps/unix/sysv/linux/hppa/atomic-machine.h:44:4: error: '__atomic_store_4' writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
    44 |    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);                        \
       |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dl-find_object.c:644:3: note: in expansion of macro 'atomic_store_relaxed'
   644 |   atomic_store_relaxed (&seg->size, new_seg_size);
       |   ^~~~~~~~~~~~~~~~~~~~
In function '_dl_find_object_update':
cc1: note: destination object is likely at address zero

In practice, this is not possible as it represent counts of link maps.
Link maps have sizes larger than 1 byte, so the sum of any two link map
counts will always fit within a size_t without wrapping around.

This patch therefore adds a check on remaining_to_add == 0 and tell GCC
that this can not happen using __builtin_unreachable.

Thanks to Andreas Schwab for the investigation.

Closes: BZ #32245
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 6c915c73d08028987232f6dc718f218c61113240)

linux: sparc: Fix clone for LEON/sparcv8 (BZ 31394)

The sparc clone mitigation (faeaa3bc9f76030) added the use of
flushw, which is not support by LEON/sparcv8. As discussed on
the libc-alpha, 'ta 3' is a working alternative [1].

[1] https://sourceware.org/pipermail/libc-alpha/2024-August/158905.html

Checked with a build for sparcv8-linux-gnu targetting leon.

Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
(cherry picked from commit 5e8cfc5d625e6dd000a0371d21d792836ea7951a)

Mitigation for "clone on sparc might fail with -EFAULT for no valid reason" (bz 31394)

It seems the kernel can not deal with uncommitted stack space in the area intended
for the register window when executing the clone() system call. So create a nested
frame (proxy for the kernel frame) and flush it from the processor to memory to
force committing pages to the stack before invoking the system call.

Bug: https://www.mail-archive.com/debian-glibc@lists.debian.org/msg62592.html
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
See-also: https://lore.kernel.org/sparclinux/62f9be9d-a086-4134-9a9f-5df8822708af@mkarcher.dialup.fu-berlin.de/
Signed-off-by: Michael Karcher <sourceware-bugzilla@mkarcher.dialup.fu-berlin.de>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit faeaa3bc9f76030b9882ccfdee232fc0ca6dcb06)

elf: Change ldconfig auxcache magic number (bug 32231)

In commit c628c2296392ed3bf2cb8d8470668e64fe53389f (elf: Remove
ldconfig kernel version check), the layout of auxcache entries
changed because the osversion field was removed from
struct aux_cache_file_entry. However, AUX_CACHEMAGIC was not
changed, so existing files are still used, potentially leading
to unintended ldconfig behavior. This commit changes AUX_CACHEMAGIC,
so that the file is regenerated.

Reported-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 0a536f6e2f76e3ef581b3fd9af1e5cf4ddc7a5a2)

Make tst-strtod-underflow type-generic

The test tst-strtod-underflow covers various edge cases close to the
underflow threshold for strtod (especially cases where underflow on
architectures with after-rounding tininess detection depends on the
rounding mode). Make it use the type-generic machinery, with
corresponding test inputs for each supported floating-point format, so
that other functions in the strtod family are tested for underflow
edge cases as well.

Tested for x86_64.

(cherry picked from commit 94ca2c0894f0e1b62625c369cc598a2b9236622c)

libio: Set _vtable_offset before calling _IO_link_in [BZ #32148]

Since _IO_vtable_offset is used to detect the old binaries, set it
in _IO_old_file_init_internal before calling _IO_link_in which checks
_IO_vtable_offset. Add a glibc 2.0 test with copy relocation on
_IO_stderr_@GLIBC_2.0 to verify that fopen won't cause memory corruption.
This fixes BZ #32148.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 9dfea3de7f690bff70e3c6eb346b9ad082bb2e35)

Add tests of more strtod special cases

There is very little test coverage of inputs to strtod-family
functions that don't contain anything that can be parsed as a number
(one test of ".y" in tst-strtod2), and none that I can see of skipping
initial whitespace. Add some tests of these things to tst-strtod2.

Tested for x86_64.

(cherry picked from commit 378039ca578c2ea93095a1e710d96f58c68a3997)

Add more tests of strtod end pointer

Although there are some tests in tst-strtod2 and tst-strtod3 for the
end pointer provided by strtod when it doesn't parse the whole string,
they aren't very thorough. Add tests of more such cases to
tst-strtod2.

Tested for x86_64.

(cherry picked from commit b5d3737b305525315e0c7c93ca49eadc868eabd5)

Make tst-strtod2 and tst-strtod5 type-generic

Some of the strtod tests use type-generic machinery in tst-strtod.h to
test the strto* functions for all floating types, while others only
test double even when the tests are in fact meaningful for all
floating types.

Convert tst-strtod2 and tst-strtod5 to use the type-generic machinery
so they test all floating types. I haven't tried to convert them to
use newer test interfaces in other ways, just made the changes
necessary to use the type-generic machinery.

Tested for x86_64.

(cherry picked from commit 8de031bcb9adfa736c0caed2c79d10947b8d8f48)

powerpc64le: Build new strtod tests with long double ABI flags (bug 32145)

This fixes several test failures:

=====FAIL: stdlib/tst-strtod1i.out=====
Locale tests
all OK
Locale tests
all OK
Locale tests
strtold("1,5") returns -6,38643e+367 and not 1,5
strtold("1.5") returns 1,5 and not 1
strtold("1.500") returns 1 and not 1500
strtold("36.893.488.147.419.103.232") returns 1500 and not 3,68935e+19
Locale tests
all OK

=====FAIL: stdlib/tst-strtod3.out=====
0: got wrong results -2.5937e+4826, expected 0

=====FAIL: stdlib/tst-strtod4.out=====
0: got wrong results -6,38643e+367, expected 0
1: got wrong results 0, expected 1e+06
2: got wrong results 1e+06, expected 10

=====FAIL: stdlib/tst-strtod5i.out=====
0: got wrong results -6,38643e+367, expected 0
2: got wrong results 0, expected -0
4: got wrong results -0, expected 0
5: got wrong results 0, expected -0
6: got wrong results -0, expected 0
7: got wrong results 0, expected -0
8: got wrong results -0, expected 0
9: got wrong results 0, expected -0
10: got wrong results -0, expected 0
11: got wrong results 0, expected -0
12: got wrong results -0, expected 0
13: got wrong results 0, expected -0
14: got wrong results -0, expected 0
15: got wrong results 0, expected -0
16: got wrong results -0, expected 0
17: got wrong results 0, expected -0
18: got wrong results -0, expected 0
20: got wrong results 0, expected -0
22: got wrong results -0, expected 0
23: got wrong results 0, expected -0
24: got wrong results -0, expected 0
25: got wrong results 0, expected -0
26: got wrong results -0, expected 0
27: got wrong results 0, expected -0

Fixes commit 3fc063dee01da4f80920a14b7db637c8501d6fd4
("Make __strtod_internal tests type-generic").

Suggested-by: Joseph Myers <josmyers@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit cc3e743fc09ee6fca45767629df9cbcbe1feba82)

Do not set errno for overflowing NaN payload in strtod/nan (bug 32045)

As reported in bug 32045, it's incorrect for strtod/nan functions to
set errno based on overflowing payload (strtod should only set errno
for overflow / underflow of its actual result, and potentially if
nothing in the string can be parsed as a number at all; nan should be
a pure function that never sets it). Save and restore errno around
the internal strtoull call and add associated test coverage.

Tested for x86_64.

(cherry picked from commit 64f62c47e9c350f353336f2df6714e1d48ec50d8)

Improve NaN payload testing

There are two separate sets of tests of NaN payloads in glibc:

* libm-test-{get,set}payload* verify that getpayload, setpayload,
  setpayloadsig and __builtin_nan functions are consistent in their
  payload handling.

* test-nan-payload verifies that strtod-family functions and the
  not-built-in nan functions are consistent in their payload handling.

Nothing, however, connects the two sets of functions (i.e., verifies
that strtod / nan are consistent with getpayload / setpayload /
__builtin_nan).

Improve test-nan-payload to check actual payload value with getpayload
rather than just verifying that the strtod and nan functions produce
the same NaN.  Also check that the NaNs produced aren't signaling and
extend the tests to cover _FloatN / _FloatNx.

Tested for x86_64.

(cherry picked from commit be77d5ae417236883c02d3d67c0716e3f669fa41)

Make __strtod_internal tests type-generic

Some of the strtod tests use type-generic machinery in tst-strtod.h to
test the strto* functions for all floating types, while others only
test double even when the tests are in fact meaningful for all
floating types.

Convert the tests of the internal __strtod_internal interface to cover
all floating types. I haven't tried to convert them to use newer test
interfaces in other ways, just made the changes necessary to use the
type-generic machinery. As an internal interface, there are no
aliases for different types with the same ABI (however,
__strtold_internal is defined even if long double has the same ABI as
double), so macros used by the type-generic testing code are redefined
as needed to avoid expecting such aliases to be present.

Tested for x86_64.

(cherry picked from commit 3fc063dee01da4f80920a14b7db637c8501d6fd4)

Fix strtod subnormal rounding (bug 30220)

As reported in bug 30220, the implementation of strtod-family
functions has a bug in the following case: the input string would,
with infinite exponent range, take one more bit to represent than is
available in the normal precision of the return type; the value
represented is in the subnormal range; and there are no nonzero bits
in the value, below those that can be represented in subnormal
precision, other than the least significant bit and possibly the
0.5ulp bit. In this case, round_and_return ends up discarding the
least significant bit.

Fix by saving that bit to merge into more_bits (it can't be merged in
at the time it's computed, because more_bits mustn't include this bit
in the case of after-rounding tininess detection checking if the
result is still subnormal when rounded to normal precision, so merging
this bit into more_bits needs to take place after that check).

Tested for x86_64.

(cherry picked from commit 457622c2fa8f9f7435822d5287a437bc8be8090d)

More thoroughly test underflow / errno in tst-strtod-round

Add tests of underflow in tst-strtod-round, and thus also test for
errno being unchanged when there is neither overflow nor underflow.
The errno setting before the function call to test for being unchanged
is adjusted to set errno to 12345 instead of 0, so that any bugs where
strtod sets errno to 0 would be detected.

This doesn't add any new test inputs for tst-strtod-round, and in
particular doesn't cover the edge cases of underflow the way
tst-strtod-underflow does (none of the existing test inputs for
tst-strtod-round actually exercise cases that have underflow with
before-rounding tininess detection but not with after-rounding
tininess detection), but at least it provides some coverage (as per
the recent discussions) that ordinary non-overflowing non-underflowing
inputs to these functions do not set errno.

Tested for x86_64.

(cherry picked from commit d73ed2601b7c3c93c3529149a3d7f7b6177900a8)

Test errno setting on strtod overflow in tst-strtod-round

We have no tests that errno is set to ERANGE on overflow of
strtod-family functions (we do have some tests for underflow, in
tst-strtod-underflow). Add such tests to tst-strtod-round.

Tested for x86_64.

(cherry picked from commit 207d64feb26279e152c50744e3c37e68491aca99)

Add tests of fread

There seem to be no glibc tests specifically for the fread function.
Add basic tests of that function.

Tested for x86_64.

(cherry picked from commit d14c977c65aac7db35bb59380ef99d6582c4f930)

stdio-common: Add new test for fdopen

This commit adds fdopen test with all modes.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 1d72fa3cfa046f7293421a7e58f2a272474ea901)

libio: Attempt wide backup free only for non-legacy code

_wide_data and _mode are not available in legacy code, so do not attempt
to free the wide backup buffer in legacy code.

Resolves: BZ #32137 and BZ #27821

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit ae4d44b1d501421ad9a3af95279b8f4d1546f1ce)

debug: Fix read error handling in pcprofiledump

The reading loops did not check for read failures. Addresses
a static analysis report.

Manually tested by compiling a program with the GCC's
-finstrument-functions option, running it with
“LD_PRELOAD=debug/libpcprofile.so PCPROFILE_OUTPUT=output-file”,
and reviewing the output of “debug/pcprofiledump output-file”.

(cherry picked from commit 89b088bf70c651c231bf27e644270d093b8f144a)

elf: Fix tst-dlopen-tlsreinit1.out test dependency

Fixes commit 5097cd344fd243fb8deb6dec96e8073753f962f9
("elf: Avoid re-initializing already allocated TLS in dlopen
(bug 31717)").

Reported-by: Patsy Griffin <patsy@redhat.com>
Reviewed-by: Patsy Griffin <patsy@redhat.com>
(cherry picked from commit e82a7cb1622bff08d8e3a144d7c5516a088f1cbc)

elf: Avoid re-initializing already allocated TLS in dlopen (bug 31717)

The old code used l_init_called as an indicator for whether TLS
initialization was complete.  However, it is possible that
TLS for an object is initialized, written to, and then dlopen
for this object is called again, and l_init_called is not true at
this point.  Previously, this resulted in TLS being initialized
twice, discarding any interim writes (technically introducing a
use-after-free bug even).

This commit introduces an explicit per-object flag, l_tls_in_slotinfo.
It indicates whether _dl_add_to_slotinfo has been called for this
object.  This flag is used to avoid double-initialization of TLS.
In update_tls_slotinfo, the first_static_tls micro-optimization
is removed because preserving the initalization flag for subsequent
use by the second loop for static TLS is a bit complicated, and
another per-object flag does not seem to be worth it.  Furthermore,
the l_init_called flag is dropped from the second loop (for static
TLS initialization) because l_need_tls_init on its own prevents
double-initialization.

The remaining l_init_called usage in resize_scopes and update_scopes
is just an optimization due to the use of scope_has_map, so it is
not changed in this commit.

The isupper check ensures that libc.so.6 is TLS is not reverted.
Such a revert happens if l_need_tls_init is not cleared in
_dl_allocate_tls_init for the main_thread case, now that
l_init_called is not checked anymore in update_tls_slotinfo
in elf/dl-open.c.

Reported-by: Jonathon Anderson <janderson@rice.edu>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 5097cd344fd243fb8deb6dec96e8073753f962f9)

elf: Clarify and invert second argument of _dl_allocate_tls_init

Also remove an outdated comment: _dl_allocate_tls_init is
called as part of pthread_create.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit fe06fb313bddf7e4530056897d4a706606e49377)

nptl: Use <support/check.h> facilities in tst-setuid3

Remove local FAIL macro in favor to FAIL_EXIT1 from <support/check.h>,
which provides equivalent reporting, with the name of the file and the
line number within of the failure site additionally included. Remove
FAIL_ERR altogether and include ": %m" explicitly with the format string
supplied to FAIL_EXIT1 as there seems little value to have a separate
macro just for this.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8c98195af6e6f1ce21743fc26c723e0f7e45bcf2)

posix: Use <support/check.h> facilities in tst-truncate and tst-truncate64

Remove local FAIL macro in favor to FAIL_RET from <support/check.h>,
which provides equivalent reporting, with the name of the file of the
failure site additionally included, for the tst-truncate-common core
shared between the tst-truncate and tst-truncate64 tests.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit fe47595504a55e7bb992f8928533df154b510383)

ungetc: Fix backup buffer leak on program exit [BZ #27821]

If a file descriptor is left unclosed and is cleaned up by _IO_cleanup
on exit, its backup buffer remains unfreed, registering as a leak in
valgrind.  This is not strictly an issue since (1) the program should
ideally be closing the stream once it's not in use and (2) the program
is about to exit anyway, so keeping the backup buffer around a wee bit
longer isn't a real problem.  Free it anyway to keep valgrind happy
when the streams in question are the standard ones, i.e. stdout, stdin
or stderr.

Also, the _IO_have_backup macro checks for _IO_save_base,
which is a roundabout way to check for a backup buffer instead of
directly looking for _IO_backup_base.  The roundabout check breaks when
the main get area has not been used and user pushes a char into the
backup buffer with ungetc.  Fix this to use the _IO_backup_base
directly.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 3e1d8d1d1dca24ae90df2ea826a8916896fc7e77)

ungetc: Fix uninitialized read when putting into unused streams [BZ #27821]

When ungetc is called on an unused stream, the backup buffer is
allocated without the main get area being present. This results in
every subsequent ungetc (as the stream remains in the backup area)
checking uninitialized memory in the backup buffer when trying to put a
character back into the stream.

Avoid comparing the input character with buffer contents when in backup
to avoid this uninitialized read. The uninitialized read is harmless in
this context since the location is promptly overwritten with the input
character, thus fulfilling ungetc functionality.

Also adjust wording in the manual to drop the paragraph that says glibc
cannot do multiple ungetc back to back since with this change, ungetc
can actually do this.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit cdf0f88f97b0aaceb894cc02b21159d148d7065c)

Make tst-ungetc use libsupport

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 3f7df7e757f4efec38e45d4068e5492efcac4856)

stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]

Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer
than INT_MAX (bug 27650)") and add a test case for the issue, inspired
by the reproducer provided with the bug report.

This has been verified to succeed as from the commit referred and fail
beforehand.

As the test requires 2GiB of data to be passed around its performance
has been evaluated using a choice of systems and the execution time
determined to be respectively in the range of 9s for POWER9@2.166GHz,
24s for FU740@1.2GHz, and 40s for 74Kf@950MHz. As this is on the verge
of and beyond the default timeout it has been increased by the factor of
8. Regardless, following recent practice the test has been added to the
standard rather than extended set.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 89cddc8a7096f3d9225868304d2bc0a1aaf07d63)

support: Add FAIL test failure helper

Add a FAIL test failure helper analogous to FAIL_RET, that does not
cause the current function to return, providing a standardized way to
report a test failure with a message supplied while permitting the
caller to continue executing, for further reporting, cleaning up, etc.

Update existing test cases that provide a conflicting definition of FAIL
by removing the local FAIL definition and then as follows:

- tst-fortify-syslog: provide a meaningful message in addition to the
  file name already added by <support/check.h>; 'support_record_failure'
  is already called by 'support_print_failure_impl' invoked by the new
  FAIL test failure helper.

- tst-ctype: no update to FAIL calls required, with the name of the file
  and the line number within of the failure site additionally included
  by the new FAIL test failure helper, and error counting plus count
  reporting upon test program termination also already provided by
  'support_record_failure' and 'support_report_failure' respectively,
  called by 'support_print_failure_impl' and 'adjust_exit_status' also
  respectively.  However in a number of places 'printf' is called and
  the error count adjusted by hand, so update these places to make use
  of FAIL instead.  And last but not least adjust the final summary just
  to report completion, with any error count following as reported by
  the test driver.

- test-tgmath2: no update to FAIL calls required, with the name of the
  file of the failure site additionally included by the new FAIL test
  failure helper.  Also there is no need to track the return status by
  hand as any call to FAIL will eventually cause the test case to return
  an unsuccesful exit status regardless of the return status from the
  test function, via a call to 'adjust_exit_status' made by the test
  driver.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 1b97a9f23bf605ca608162089c94187573fb2a9e)

string: strerror, strsignal cannot use buffer after dlmopen (bug 32026)

Secondary namespaces have a different malloc.  Allocating the
buffer in one namespace and freeing it another results in
heap corruption.  Fix this by using a static string (potentially
translated) in secondary namespaces.  It would also be possible
to use the malloc from the initial namespace to manage the
buffer, but these functions would still not be safe to use in
auditors etc. because a call to strerror could still free a
buffer while it is used by the application.  Another approach
could use proper initial-exec TLS, duplicated in secondary
namespaces, but that would need a callback interface for freeing
libc resources in namespaces on thread exit, which does not exist
today.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 25a5eb4010df94b412c67db9e346029de316d06b)

Define __libc_initial for the static libc

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit eb0e50e9a1cf80a2ba6f33f990a08ef37a3267fb)

x86: Fix bug in strchrnul-evex512 [BZ #32078]

Issue was we were expecting not matches with CHAR before the start of
the string in the page cross case.

The check code in the page cross case:
```
    and    $0xffffffffffffffc0,%rax
    vmovdqa64 (%rax),%zmm17
    vpcmpneqb %zmm17,%zmm16,%k1
    vptestmb %zmm17,%zmm17,%k0{%k1}
    kmovq  %k0,%rax
    inc    %rax
    shr    %cl,%rax
    je     L(continue)
```

expects that all characters that neither match null nor CHAR will be
1s in `rax` prior to the `inc`. Then the `inc` will overflow all of
the 1s where no relevant match was found.

This is incorrect in the page-cross case, as the
`vmovdqa64 (%rax),%zmm17` loads from before the start of the input
string.

If there are matches with CHAR before the start of the string, `rax`
won't properly overflow.

The fix is quite simple. Just replace:

```
    inc    %rax
    shr    %cl,%rax
```
With:
```
    sar    %cl,%rax
    inc    %rax
```

The arithmetic shift will clear any matches prior to the start of the
string while maintaining the signbit so the 1s can properly overflow
to zero in the case of no matches.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7da08862471dfec6fdae731c2a5f351ad485c71f)

x32/cet: Support shadow stack during startup for Linux 6.10

Use RXX_LP in RTLD_START_ENABLE_X86_FEATURES.  Support shadow stack during
startup for Linux 6.10:

commit 2883f01ec37dd8668e7222dfdb5980c86fdfe277
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 15 07:04:33 2024 -0700

    x86/shstk: Enable shadow stacks for x32

    1. Add shadow stack support to x32 signal.
    2. Use the 64-bit map_shadow_stack syscall for x32.
    3. Set up shadow stack for x32.

Add the map_shadow_stack system call to <fixup-asm-unistd.h> and regenerate
arch-syscall.h.  Tested on Intel Tiger Lake with CET enabled x32.  There
are no regressions with CET enabled x86-64.  There are no changes in CET
enabled x86-64 _dl_start_user.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 8344c1f5514b1b5b1c8c6e48f4b802653bd23b71)

x86-64: Remove sysdeps/x86_64/x32/dl-machine.h

Remove sysdeps/x86_64/x32/dl-machine.h by folding x32 ARCH_LA_PLTENTER,
ARCH_LA_PLTEXIT and RTLD_START into sysdeps/x86_64/dl-machine.h.  There
are no regressions on x86-64 nor x32.  There are no changes in x86-64
_dl_start_user.  On x32, _dl_start_user changes are

<_dl_start_user>:
mov    %eax,%r12d
+ mov    %esp,%r13d
mov    (%rsp),%edx
mov    %edx,%esi
- mov    %esp,%r13d
and    $0xfffffff0,%esp
mov    0x0(%rip),%edi        # <_dl_start_user+0x14>
lea    0x8(%r13,%rdx,4),%ecx

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 652c6cf26927352fc0e37e4e60c6fc98ddf6d3b4)

support: Add options list terminator to the test driver

This avoids crashes if a test is passed unknown options.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit c2a474f4617ede7a8bf56b7257acb37dc757b2d1)

manual/stdio: Further clarify putc, putwc, getc, and getwc

This is a follow-up to 10de4a47ef3f481592e3c62eb07bcda23e9fde4d that
reworded the manual entries for putc and putwc and removed any
performance claims.

This commit further clarifies these entries and brings getc and getwc in
line with the descriptions of putc and putwc, removing any performance
claims from them as well.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 942670c81dc8071dd75d6213e771daa5d2084cb6)

Fix name space violation in fortify wrappers (bug 32052)

Rename the identifier sz to __sz everywhere.

Fixes: a643f60c53 ("Make sure that the fortified function conditionals are constant")
(cherry picked from commit 39ca997ab378990d5ac1aadbaa52aaf1db6d526f)

x86: Tunables may incorrectly set Prefer_PMINUB_for_stringop (bug 32047)

Fixes commit 5bcf6265f215326d14dfacdce8532792c2c7f8f8 ("x86:
Disable non-temporal memset on Skylake Server").

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 7a630f7d3392ca391a399486ce2846f9e4b4ee63)

resolv: Fix tst-resolv-short-response for older GCC (bug 32042)

Previous GCC versions do not support the C23 change that
allows labels on declarations.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit ec119972cb2598c04ec7d4219e20506006836f64)

Add mremap tests

Add tests for MREMAP_MAYMOVE and MREMAP_FIXED. On Linux, also test
MREMAP_DONTUNMAP.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit ff0320bec2810192d453c579623482fab87bfa01)

mremap: Update manual entry

Update mremap manual entry:

1. Change mremap to variadic.
2. Document MREMAP_FIXED and MREMAP_DONTUNMAP.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit cb2dee4eccf46642eef588bee64f9c875c408f1c)

linux: Update the mremap C implementation [BZ #31968]

Update the mremap C implementation to support the optional argument for
MREMAP_DONTUNMAP added in Linux 5.7 since it may not always be correct
to implement a variadic function as a non-variadic function on all Linux
targets. Return MAP_FAILED and set errno to EINVAL for unknown flag bits.
This fixes BZ #31968.

Note: A test must be added when a new flag bit is introduced.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 6c40cb0e9f893d49dc7caee580a055de53562206)

Enhanced test coverage for strncmp, wcsncmp

Add string/test-strncmp-nonarray and
wcsmbs/test-wcsncmp-nonarray.

This is the test that uncovered bug 31934. Test run time
is more than one minute on a fairly current system, so turn
these into xtests that do not run automatically.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 54252394c25ddf0062e288d4a6ab7a885f8ae009)

Enhance test coverage for strnlen, wcsnlen

This commit adds string/test-strnlen-nonarray and
wcsmbs/test-wcsnlen-nonarray.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 783d4c0b81889c39a9ddf13b60d0fde4040fb1c0)

manual: make setrlimit() description less ambiguous

The existing description for setrlimit() has some ambiguity. It could be
understood to have the semantics of getrlimit(), i.e., the limits from the
process are stored in the provided rlp pointer.

Make the description more explicit that rlp are the input values, and that
the limits of the process is changed with this function.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit aedbf08891069fc029ed021e4dba933eb877b394)

manual/stdio: Clarify putc and putwc

The manual entry for `putc' described what "most systems" do instead of
describing the glibc implementation and its guarantees. This commit
fixes that by warning that putc may be implemented as a macro that
double-evaluates `stream', and removing the performance claim.

Even though the current `putc' implementation does not double-evaluate
`stream', offering this obscure guarantee as an extension to what
POSIX allows does not seem very useful.

The entry for `putwc' is also edited to bring it in line with `putc'.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 10de4a47ef3f481592e3c62eb07bcda23e9fde4d)

malloc: add multi-threaded tests for aligned_alloc/calloc/malloc

Improve aligned_alloc/calloc/malloc test coverage by adding
multi-threaded tests with random memory allocations and with/without
cross-thread memory deallocations.

Perform a number of memory allocation calls with random sizes limited
to 0xffff.

Use the existing DSO ('malloc/tst-aligned_alloc-lib.c') to randomize
allocator selection.

The multi-threaded allocation/deallocation is staged as described below:

- Stage 1: Half of the threads will be allocating memory and the
  other half will be waiting for them to finish the allocation.
- Stage 2: Half of the threads will be allocating memory and the
  other half will be deallocating memory.
- Stage 3: Half of the threads will be deallocating memory and the
  second half waiting on them to finish.

Add 'malloc/tst-aligned-alloc-random-thread.c' where each thread will
deallocate only the memory that was previously allocated by itself.

Add 'malloc/tst-aligned-alloc-random-thread-cross.c' where each thread
will deallocate memory that was previously allocated by another thread.

The intention is to be able to utilize existing malloc testing to ensure
that similar allocation APIs are also exposed to the same rigors.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
(cherry picked from commit b0fbcb7d0051a68baf26b2aed51a8a31c34d68e5)