git.ipfire.org Git - thirdparty/glibc.git/log

elf: don't clobber ld.so.conf in tst-glibc-hwcaps-prepend-cache [BZ #34210]

dbe5065f2166be20e57a24f246a40d50e001a05d and ae589cb84df10825fc545a45c7007a5f79409bf1
cater for setups where ld.so.conf{,.d} is required to find runtime support
libraries, but tst-glibc-hwcaps-prepend-cache clobbers the created ld.so.conf
with its own entry.

Fix it to instead use the ld.so.conf.d created in ae589cb84df10825fc545a45c7007a5f79409bf1
to co-exist with existing entries.

Bug: https://bugs.gentoo.org/976773
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31901
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=34210
Tested-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reported-by: Eli Schwartz <eschwartz@gentoo.org>
Reviewed-by: Andreas K. Hüttel <dilfridge@gentoo.org>

elf: Remove inhibit_stack_protector from __ifunc_resolver

With the 01964c3ec8e fix, ifunc resolvers can be fully instrumented with
stack protector.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu
built with --enable-stack-protector=all.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Set up TLS slotinfo for dlopen'd modules before relocation (BZ 34170)

An IFUNC resolver in a DSO that is being loaded by dlopen is allowed
to read its own TLS storage during the resolver call.  After
af34b1376a3 ("elf: Initialize static TLS before relocation processing
(BZ 34164)") that works for the initial-exec model on every supported
architecture.

However, it does not work for the dynamic-TLS path (in either -mtls-dialect mode,
if the ABI supports both).  Both lookup paths index the calling thread's
DTV by the new module's l_tls_modid and, on miss, walk
GL(dl_tls_dtv_slotinfo_list) to discover the module and lazily allocate
its TLS block.  The just-loaded DSO is however not yet in that list when
its resolver fires, so the lookup faults inside dlopen.  This is the
direct dlopen analog of BZ 34164.

The solution is to reorder dl_open_worker_begin so the slotinfo install
happens before the relocation pass.  The new order is:

  1. resize_scopes, resize_tls_slotinfo, add_to_global_resize
     (unchanged, still recoverable).
  2. update_tls_slotinfo: register the new modules in slotinfo, bump
     dl_tls_generation, initialise their static TLS images.
  3. Relocate the new objects.  IFUNC resolvers can now read their
     own DSO's __thread storage via any TLS model.
  4. Demarcation point.
  5. update_scopes, _dl_find_object_update.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

malloc: Simplify _int_free_chunk

Simplify _int_free_chunk() and always lock if needed. Use
_int_free_merge_chunk() for cases that assume the arena has been locked
instead. Move the errno save/restore to _int_free_maybe_trim().

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

riscv: Add RVV strrchr for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of strrchr for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__strrchr_vector) is added
alongside the generic fallback (__strrchr_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling strrchr().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

riscv: Add RVV strchr for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of strchr for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__strchr_vector) is added
alongside the generic fallback (__strchr_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling strchr().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

riscv: Add RVV memchr for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of memchr for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__memchr_vector) is added
alongside the generic fallback (__memchr_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling memchr().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

riscv: Add RVV memccpy for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of memccpy for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__memccpy_vector) is added
alongside the generic fallback (__memccpy_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling memccpy().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

riscv: Add RVV memcmp for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of memcmp for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__memcmp_vector) is added
alongside the generic fallback (__memcmp_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling memcmp().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

malloc: Reduce maximum arenas

The default maximum arenas is 8 times the number of cores in a 64-bit system.
Since modern CPUs have many cores and big servers have 256 cores, this results
in an excessive number of arenas, which wastes memory. Limit the number of arenas
to max (8, ncores) which is less extreme. In the future the limit should be
lowered further for large systems.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

resolv: reset _u._ext.nscount in __res_iclose [BZ #34154]

__res_iclose, when called with FREE_ADDR=true, frees and NULLs every
statp->_u._ext.nsaddrs[ns] but does not reset statp->_u._ext.nscount.
This breaks the invariant relied on by __res_context_send's cache
validation loop in resolv/res_send.c:293-312: when _u._ext.nscount
is non-zero, every _u._ext.nsaddrs[ns] corresponding to
statp->nsaddr_list[ns].sin_family != 0 is expected to be non-NULL.

res_init unconditionally calls __res_iclose(&_res, true) before
__res_vinit, so any __res_vinit failure (for example,
fopen("/etc/resolv.conf") returning EMFILE under file-descriptor
exhaustion, or any allocation failure in __resolv_conf_load,
__resolv_conf_allocate, or __resolv_conf_attach before
update_from_conf runs) leaves _u._ext.nscount non-zero with all
_u._ext.nsaddrs[] NULL.  The next name lookup walks the validation
loop and dereferences NULL in sock_eq.  A DNS lookup may fail after
a failed res_init, but it should not segfault.

Reset _u._ext.nscount = 0 alongside the existing __resolv_conf_detach
call.  The next __res_context_send call then re-enters its init
block (res_send.c:316-335) and repopulates _u._ext.nsaddrs[] from
statp->nsaddr_list[], which __res_iclose leaves untouched.  This
also lets support/resolv_test.c drop its now-redundant manual reset
after __res_iclose(&_res, true).

Add a regression test that drops RLIMIT_NOFILE so fopen of
/etc/resolv.conf fails with EMFILE inside __res_vinit, then verifies
the subsequent gethostbyname does not crash.

Signed-off-by: Adam Yi <ayi@janestreet.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

sysdeps/ieee754/ldbl-128ibm-compat/test-printf-chk-ldbl-compat.c: Fix typo

Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Improve documentation of malloc tunables

Update default for tcache_count tunable. Remove existing documentation and
mention removal of fastbins in mxfast tunable. Improve wording of hugetlb
tunable, including default for AArch64.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Minor cleanups

Merge request2size into checked_request2size. Improve interface of
clear_memory. Fix incorrect use of __glibc_unlikely in __libc_calloc.
Fix missing tabs.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

support: Also run malloc hugetlb=1 tests when transparent hugepage is 'always'

The glibc.malloc.hugetlb=1 tunable is redundant when the kernel is in 'always'
mode, but the madvise does work, and the tunable should not fail.

Suggested-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

arm: Fix main-in-dso with non-BFD linkers (BZ 34098)

Commit c2d6afb4a010 changed the PIC && !SHARED path of start.S (used by
crt1.o and rcrt1.o) to load main via a R_ARM_REL32 data relocation. When
main is provided by a shared object, BFD ld synthesizes a canonical PLT
entry for it, but other linkers (e.g. mold) do not, producing a broken
binary that crashes at startup.

Follow the aarch64 approach and reference main through a local __wrap_main
that tail-calls it with a branch relocation, which every linker turns into
a PLT entry.

Checked on armv7a-linux-gnueabihf with and without --enable-default-pie.

elf: Fix tst-ifunc-tls-init with --disable-default-pie

The test failed with --disable-default-pie because its primary check read
the resolver's diagnostic side effect before the resolver had run.

In a non-PIE executable the references to the IFUNC (fptr and ifunc_tls)
are satisfied through a canonical IPLT entry in the executable itself.
Under the default lazy binding that IPLT is resolved on first use, not
during startup relocation, so the resolver had not yet run when
'check_sentinel' inspected that value. With a PIE executable
(or LD_BIND_NOW=1) the resolver runs eagerly at startup and the check
passed. The dlopen path was unaffected because dlopen resolves the
data relocation eagerly.

This is a test ordering issue: the resolver always reads the initial-exec
TLS correctly whenever it runs, so the BZ 34164 fix is not involved.
Reorder test_tls_ifunc so that fptr/ifunc_tls force the IFUNC to be
resolved before the last_seen_sentinel value is inspected.

Checked on x86_64-linux-gnu and i686-linux-gnu with --disable-default-pie.

Reported-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

powerpc64le: Add optimized __memcmpeq for POWER10

__memcmpeq (added in glibc 2.35) was previously an alias to memcmp on
POWER10 via strong_alias. However, in the multiarch IFUNC path, this
caused __memcmpeq to resolve to the generic C memcmp.c implementation
rather than the optimized POWER10 memcmp.S, leaving a significant
performance gap.

Unlike memcmp, __memcmpeq only needs to return zero or nonzero with
no requirement on the sign or magnitude for unequal inputs, allowing
a simpler and faster implementation.

Performance on POWER10:

  1) __memcmpeq (generic) -> __memcmpeq_power10
     The primary motivation - __memcmpeq was resolving to generic C
     in the multiarch path.

  - Small data (< 8B to < 512B) : ~52% - 82% improvement.
  - Bulk  (< 16MB to < 256MB)   : ~25% - 32% improvement.
  - Large (1GB) : ~33% improvement

  2) memcmp_power10 (optimized .S) -> __memcmpeq_power10:
     Comparing dedicated __memcmpeq against the optimized memcmp
     it previously aliased to.

  - Small data (< 8B to < 256B) : No improvement observed.
    Real-world workloads predominantly operate on larger buffers
  - >= 512B : ~9%  improvement.
  - 16MB - 128MB : ~25% - 32% improvement.
  - 256MB : ~3%  improvement.
  - Large (1GB) : On par.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

arm: Redirect memcpy to __memcpy_arm for loader/static init code

After "elf: Initialize TCB and stack-protector before static IFUNC
resolvers (BZ 20680, BZ 27582, BZ 28817)", early loader and static
initialization code may run before the IFUNC resolvers are processed.

armv7-a supports memcpy, so add a dl-symbol-redir-ifunc.h to
redirect memcpy to the __memcpy_arm implementation.

Checked on armv7a-linux-gnueabihf.

misc: Fix typos in comments

hurd: define SO_TIMESTAMP in socket.h

* sysdeps/mach/hurd/bits/socket.h: add SO_TIMESTAMP enum entry
and define corresponding preprocessor macro.
Message-ID: <20260531195342.1633-1-dnietoc@gmail.com>

malloc: aarch64: Remove broken memory tagging

Remove the --enable-memory-tagging configure option along with
all associated variables and macros.

Remove the glibc.mem.tagging tunable.

Remove the memory-tagging makefile variable.

Remove the USE_MTAG macro definition and code that is conditionally
compiled when this macro is defined.

As a result, we change 'mtag_mmap_flags' to 'extra_mmap_prot', which
is now always defined. The name changes because the value is used
as part of the PROT options in mmap syscalls rather than as part of the flags.

Remove 'mtag_enabled' that would become compile-time false. Also
remove any code that would never be compiled when 'mtag_enabled'
is false.

Remove AArch64-specific code pertinent to memory tagging, which is
currently broken, from the core malloc implementation. We keep
the assembly code to preserve Git history, since we are going to
need it in the future.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

hesiod: use booleans in parser macro calls

The swallow argument in the INT_FIELD and STRING_FIELD macros is used as a
boolean, change all callers to use false and true instead of 0 and 1.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

hesiod: fix swapped arguments in service parser

The port number in the service file is a decimal number followed by a
single slash.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

math: Fix non-narrowing test build with arg-format conditions

auto-libm-test-in shares inputs between narrowing and non-narrowing
functions, and some carry arg-format conditions (e.g. "arg-ibm128").
When auto-libm-test-out-fma is regenerated, gen-libm-test.py turns these
into TEST_COND_arg_ibm128, which expands via ARG_MANT_DIG to ARG_PREFIX.
ARG_PREFIX is only defined for TEST_NARROW, so the non-narrowing fma test
failed to build with "ARG_PREFIX_MANT_DIG undeclared".

Define TEST_COND_arg_ibm128 to 0 when ARG_FLOAT is not defined, mirroring
the existing guard for TEST_COND_ibm128_libgcc; in the non-narrowing case
there is no separate argument format, so the condition is always false.

Regenerate auto-libm-test-out-fma accordingly.

aarch64: Use build attributes for asm feature marking

When the compiler defines __ARM_BUILDATTR64_FV, emit AArch64
feature-and-bits build attributes for BTI, PAC, and GCS from sysdep.h
instead of a GNU property note. Keep the GNU property note as the
fallback for older toolchains.

Mirror the same marking logic in elf/tst-asm-helper.h so custom test
DSOs and assembly tests that cannot include sysdep.h get consistent
feature marking.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

misc: add a new test for gethostname

Add a simple test that runs in a UTS namespace: it checks gethostname
after various sethostname values (including empty and HOST_NAME_MAX),
verifies ENAMETOOLONG when the buffer is too small for a maximal name,
and verifies EINVAL when sethostname is given a name longer than HOST_NAME_MAX.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

test: Fix and stabilize tst-wcsmbs-clone-overflow test

The test tst-wcsmbs-clone-overflow was initially added to tests-static.
However, this causes the test to be unstable because gconv modules
dynamically load libc.so. Any discrepancy between the statically linked
version and the dynamically loaded one can lead to a crash.

By removing the test from tests-static, it relies on dynamic linking,
safely bypassing the dlopen crash. Since the test is now dynamically
linked, it cannot use the internal thread-local symbol
_NL_CURRENT_DATA(LC_CTYPE) because _nl_current_LC_CTYPE is hidden in
libc.so, leading to undefined references. Thus, the test now uses
newlocale and uselocale, safely extracting the locale data from the
returned locale_t object.

Furthermore, using newlocale requires the gconv-modules configuration to
be built and available so that the ISO8859-1.so module can be
dynamically loaded. Otherwise, glibc falls back to the built-in C locale
conversions, leaving __shlib_handle as NULL and silently bypassing the
reference counter increment.
A new Makefile fragment, gen-gconv-modules.mk, is introduced to ensure
the gconv-modules are built before the test runs, and an explicit check
for __shlib_handle != NULL is added to the test.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

math: Fix fma alignment when exponent difference is exactly 64 (BZ 34183)

When d (the exponent difference between z and x*y) is exactly 64,
the alignment path shifts z left by 64 bits via (zhi = nz.m, zlo = 0)
and decrements d to 0, then takes the inner 'if (d < 64)' branch
which evaluates 'rhi << (64 - d)' with d == 0. A shift by 64 of a
64-bit value is UB in C.

Add the explicit 'if (d == 0)' empty branch (present in the
original musl implementation).

Checked on x86_64-linux-gnu with --disable-multi-arch and
arm-linux-gnueabihf.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

support: Skip malloc hugetlb={1,2} variants when kernel cannot honor them

The malloc test variants run with GLIBC_TUNABLES=glibc.malloc.hugetlb=1
exercise transparent huge pages via MADV_HUGEPAGE, which is only
meaningful when /sys/kernel/mm/transparent_hugepage/enabled is set to
'madvise' ('always' makes the madvise redundant and 'never' makes it
ineffective).  The hugetlb=2 variants rely on MAP_HUGETLB, which
requires a positive /proc/sys/vm/nr_hugepages.  On systems that do not
satisfy these prerequisites - including any non-Linux target - those
runs only consume CPU time.

Add support/support_check_hugetlb.{c,h} exposing:

  - support_thp_is_madvise: true iff THP is in 'madvise' mode;
  - support_hugepages_reserved: true iff nr_hugepages > 0;
  - support_check_malloc_hugetlb: inspects GLIBC_TUNABLES and calls
    FAIL_UNSUPPORTED when the requested hugetlb mode cannot be honored.

Gate the check at compile time so only the variant binaries pay for
it.

Reviewed-by: DJ Delorie <dj@redhat.com>

elf: Re-initialise static TLS after .tdata relocation (BZ 34164)

Commit af34b1376a37fa27e1de9d869ed9493fc569bfa6 (BZ 34164) changed the
TLS setup from:

  relocation loop (applies relocations to .tdata in DSO memory)
  _dl_allocate_tls_init copies relocated .tdata -> main thread TLS

to a new order:

  _dl_allocate_tls_init copies unrelocated .tdata -> main thread TLS
  relocation loop (relocates .tdata in DSO memory, but the TLS block
  has stale copies)

This broke file-scope thread-local variables initialised with the address
of a function (for instance the cache structs in libmpfr).

Fix it by splitting ELF_DYNAMIC_RELOCATE inside
_dl_relocate_object_no_relro into the non-IRELATIVE and IRELATIVE
sub-passes (similar to what is done on static-pie startup by b75ad99d45b)
and call _dl_init_static_tls between them.  By the time the IFUNC
pass fires, .tdata is fully relocated.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

AArch64: Add support for SVE2 ifuncs

Add support for SVE2 in cpu-features. Minor cleanup of init-arch.h.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

hurd: adjtime: on error return -1 and set errno

* sysdeps/mach/hurd/adjtime.c: use __hurd_fail to return
errors back to the caller.
Message-ID: <20260527014103.10791-1-dnietoc@gmail.com>

sunrpc/Makefile: Split and sort tests

This commit splits and sorts the tests in sunrpc/Makefile.

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Hurd: restore some SIOC ioctls

We do not define struct ifreq_short and ifreq_int, but we do
define _IOT_ifreq_short and _IOT_ifreq_int, and need these for
sysdeps/gnu/ifaddrs.c and sysdeps/mach/hurd/if_index.c

Hurd: comment ioctls which cannot currently compile

We don't currently have struct ifreq_short, ifreq_int and ifaliasreq, so
don't let applications even try to compile these.

Hurd: comment PF_ROUTE/AF_ROUTE defines

Comment out the PF_ROUTE and AF_ROUTE defines, since they would be used for
PF_ROUTE setsockopts, which are not available on Hurd.

Hurd: comment PF_LINK/AF_LINK defines

Comment out the PF_LINK and AF_LINK defines, since they are usually associated
with struct sockaddr_dl, which is not available on Hurd.

s390: Enabling lint-makefiles

The s390 specific Makefiles were adjusted to match the required format
for scripts/lint-makefiles.sh / scripts/sort-makefile-lines.py.

Afterwards the lines were sorted by those scripts,
and the lint-makefiles testcase is passing.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

hurd: let the root user raise its priority

    Check for task_max_priority RPC

      * config.h.in: add #undef for HAVE_MACH_TASK_MAX_PRIORITY.
      * sysdeps/mach/configure.ac: use mach_RPC_CHECK to check for
        task_max_priority RPC in mach_host.defs.
      * sysdeps/mach/configure: regenerate file.

    Use task_max_priority when setpriority is called by root

      * sysdeps/mach/hurd/setpriority.c: clamp the prio argument
        to the range [-NZERO, NZERO-1] and use task_max_priority
        when called by root.
Message-ID: <8e806c83d8d7b59b2894b4944e4ad82477ecc3cd.1779316637.git.dnietoc@gmail.com>

hurd: add validations in msync

POSIX specifies that msync shall fail with EINVAL when the flags are
invalid, and with ENOMEM when the address range lies outside the address
space or when one or more pages are not mapped.

Signed-off-by: Etienne Brateau <etienne.brateau@gmail.com>
Message-ID: <20260525211142.131508-1-etienne.brateau@gmail.com>

hurd: clamp the setpriority prio argument to the range [-NZERO, NZERO-1]

The Open Group Base Specifications Issue 8
getpriority ( https://pubs.opengroup.org/onlinepubs/9799919799/ )

<< The nice value is in the range [0,2*{NZERO} -1], while the
return value for getpriority() and the third parameter for
setpriority() are in the range [-{NZERO},{NZERO} -1]. >>

So given that NZERO is defined to 20, we shall use it to clamp to the range
specified by POSIX.

That range is then mapped to something similar to [0, 2*{NZERO}-1], as
specified by POSIX, through the usage of the macro NICE_TO_MACH_PRIORITY
and MACH_PRIORITY_TO_NICE. (i.e. [5, 45] )

intl: Fix tst-gettext under inherited LC_* environment

The final block of tst-gettext unsets LC_ALL plus LC_MESSAGES, LC_CTYPE,
LC_TIME and LC_NUMERIC, sets LANG=existing-locale, and then expects
setlocale (LC_ALL, "") to resolve every category through LANG. Any
other LC_* category inherited from the invoking shell (e.g. LC_PAPER,
LC_MONETARY) still takes precedence over LANG, and if it points to a
locale that is not under the test's LOCPATH the setlocale call fails
with ENOENT.

Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

nptl: Skip tst-pthread-gdb-attach{, -static} on env mismatches

The test previously failed with confusing diagnostics in two situations
that are properties of the runtime environment rather than of glibc:

1. find_gdb only checked access(X_OK), which is true for directories
   too.  A 'gdb' directory ahead of /usr/bin in PATH (e.g. one holding
   gdb python helpers) was therefore returned as the gdb executable,
   and the subsequent execl failed with errno != ENOENT, causing the
   test to fail with `numeric comparison failure ... status 256'.

2. The in-tree libthread_db.so.1 is built with -z mark-plt and
   therefore carries a versioned dependency on
   GLIBC_ABI_DT_X86_64_PLT in libc.so (see BZ #33212).  When the
   system gdb is linked against a libc older than 2.41, that version
   is not provided and gdb's dlopen of the in-tree libthread_db.so.1
   fails.  Thread debugging is then disabled, the gdb script's
   `thread 1' / `thread 2' commands fail, gdb exits non-zero, and the
   test reports a generic status mismatch.

Two changes:

* Require S_ISREG in find_gdb so a directory named 'gdb' on PATH is
  skipped, falling through to the next candidate.

* Before running the real gdb scenario, run a minimal probe script
  that triggers libthread_db loading (set debug libthread-db 1; set
  libthread-db-search-path; file /proc/self/exe; start).  If the
  probe output contains `dlopen failed', mark the test UNSUPPORTED
  with a clear message instead of letting the real run fail.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

alpha: fix setrlimit compat symbol for negative rlim values besides -1

Old alpha glibc defined rlim_t as signed long, making RLIM_INFINITY
equal to LONG_MAX (0x7ffffffffffffffful). The compat symbol
__old_setrlimit64 (setrlimit@GLIBC_2.0 and setrlimit64@GLIBC_2.1) was
introduced in 0d0bc784ca [BZ #22648] to translate this old RLIM_INFINITY
to the kernel's RLIM64_INFINITY (ULONG_MAX) before the prlimit64
syscall, using an exact equality check.

Because old rlim_t was signed, any value a caller treats as negative
(e.g. -2 = 0xfffffffffffffffe unsigned) is also an "infinity or beyond"
value in the old ABI. Such values are >= OLD_RLIM64_INFINITY and
should be translated to RLIM64_INFINITY; passing them through unchanged
causes prlimit64 to treat them as large finite limits, resulting in
unexpected failures (EPERM or silent truncation).

Change the equality check to >= OLD_RLIM64_INFINITY in
__old_setrlimit64 so that all values the old signed-rlim_t ABI would
interpret as infinity-or-more are correctly mapped to RLIM64_INFINITY.

No change is made to __old_getrlimit64: prlimit64 returns only exact
RLIM64_INFINITY for unlimited resources, so the existing equality
check against RLIM64_INFINITY is correct and mirrors the kernel's own
rlim64_is_infinity() logic.

Fixes: 0d0bc784ca ("Alpha: Add wrappers to get/setrlimit64 to fix
RLIM64_INFINITY constant [BZ #22648]")
Fixes: https://sourceware.org/bugzilla/show_bug.cgi?id=30992
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

sh: reload r3 after arg evaluation in INTERNAL_SYSCALL [BZ #34167]

r3 is caller-saved. When a function call appears in the args list
(e.g. INTERNAL_SYSCALL_CALL(tgkill, __getpid(), tid, sig)),
SUBSTITUTE_ARGS evaluates __getpid() before r3 is reloaded, leaving
r3=20 (__NR_getpid) instead of __NR_tgkill=270. The trapa then
dispatches the wrong syscall and the signal is silently dropped.

Fix: declare r3 uninitialised, expand SUBSTITUTE_ARGS (all function
calls happen here), then assign the syscall number to r3 with no
intervening calls before the trapa. Applies to both INTERNAL_SYSCALL
and INTERNAL_SYSCALL_NCS.

signal/tst-raise is a regression test for this bug: raise() calls
tgkill(__getpid(), tid, sig), which triggers the clobber.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Initialize TCB and stack-protector before static IFUNC resolvers (BZ 20680, BZ 27582, BZ 28817)

In static linking the IFUNC IPLT (apply_irel for non-PIE, the IRELATIVE
phase inside _dl_relocate_static_pie for static-pie) ran before
__libc_setup_tls and before _dl_setup_stack_chk_guard.  When a resolver
is compiled with -fstack-protector(-all) its prologue loads the canary
from the TCB (TCB-canary ABIs: x86_64, i386, powerpc, s390) or from
__stack_chk_guard (global-var ABIs).  On the former the resolver
crashed reading an unmapped TCB; on the latter it loaded a zero canary
(no crash, but the check is ineffective).  The same applies to a
resolver that reads any thread-local: it crashes on TCB-canary ABIs and
observes a zero-filled slot on the others (BZ 20680).  The pointer
guard has the same problem (e.g. resolvers that register an atexit
handler).

Reorder csu/libc-start.c so that ARCH_SETUP_TLS, the stack-protector
canary and the pointer guard are set up before any IFUNC resolver
runs.  For static-pie this requires splitting the existing
_dl_relocate_static_pie into two phases so the TCB/canary setup can be
interleaved between the non-IRELATIVE and IRELATIVE passes.

The historical ARCH_SETUP_IREL / ARCH_APPLY_IREL split (introduced for
powerpc so its IFUNC resolvers could read TCB fields like hwcap and
at_platform) is no longer required: TLS is now set up before either
macro runs.  ARCH_APPLY_IREL is removed, ARCH_SETUP_IREL does the work
uniformly on every arch, and the powerpc-specific libc-start.h becomes
redundant.

__libc_setup_tls reaches memcpy / mempcpy via _dl_allocate_tls_init in
elf/dl-tls.c, so it requires updating the ABI-specific
dl-symbol-redir-ifunc.h with memcpy/memmove.

Tests added (each fails pre-fix on TCB-canary ABIs with SIGSEGV; the
static-protector variants additionally fail on global-var ABIs with a
"resolver_canary != main_canary" diagnostic):

  elf/tst-ifunc-bz28817                            static-pie + TLS in
                                                   resolver (BZ 28817)
  elf/tst-ifunc-resolver-protector                 dynamic
  elf/tst-ifunc-resolver-protector-static          static-pie
  elf/tst-ifunc-resolver-protector-static-non-pie  non-PIE static

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, x86_64-linux-gnu,
and i686-linux-gnu.

I also ran the ELF tests on qemu system for loongarch64-linux-gnuf64,
powerpc-linux-gnu, powerpc-linux-gnu-power4, powerpc-linux-gnu-soft,
powerpc64-linux-gnu, powerpc64le-linux-gnu, riscv64-linux-gnu, and
s390x-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Initialize static TLS before relocation processing (BZ 34164)

An IFUNC resolver firing during dynamic linker relocation reads its
DSO's __thread storage from a zero-filled slot: init_tls() allocates
the static TLS block zero-filled, but .tdata is not copied in until
the trailing _dl_allocate_tls_init at the end of dl_main, long after
the per-object phase 2 resolvers from commit 63b31c05a8a901 have run.
A resolver that *writes* TLS is even worse off -- the write is
clobbered by that same trailing copy.

dl_main (elf/rtld.c): populate the DTV slotinfo, bump
dl_tls_generation, and call _dl_allocate_tls_init right after
init_tls(), before the relocation loop.

_dl_try_allocate_static_tls (elf/dl-reloc.c): drop the
"defer-if-not-relocated" branch and always run _dl_init_static_tls
inline, so a CHECK_STATIC_TLS allocation triggered mid-relocation
initialises the slot before the same object's phase 2 fires.

The new tests check the following scenarios:

  elf/tst-ifunc-tls-init         resolver reads its DSO's IE TLS.
  elf/tst-ifunc-tls-init-dlopen  same, via dlopen.
  elf/tst-ifunc-tls-write        resolver write to TLS must survive
                                 to main.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

x86: Revert "x86: Lower non-temporal copy threshold for Hygon"

This version was superseded by 213ffdfbbae6d4cb4e8dd4a9e3e57c69127620c4.

string: Improve test-memchr page cross checks

The tests for memchr attempt to check for accidental overreads that cross a
page. However, the checks were not done at the end of a page and did not cover
the case where the match is right at the end. Add buf1_size/buf2_size to make
finding the end of the buffer easier.
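
As an illustration of the kind of check described above, here is a
minimal sketch that places a buffer so its last byte is the last byte of
a readable page, with an inaccessible page right after it.  This is not
the code from test-memchr; the function name and layout are assumptions
made for the example, and it relies on POSIX mmap/mprotect.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if memchr finds a byte stored in the
   very last position of a readable page that is followed by a PROT_NONE
   page, 0 on a wrong result, and -1 if the mapping could not be set up.
   An overread past the page end would fault instead of returning.  */
static int
memchr_matches_at_page_end (size_t len)
{
  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0 || len == 0 || len > (size_t) page)
    return -1;

  /* Two adjacent pages: the first is usable, the second is a guard.  */
  unsigned char *base = mmap (NULL, 2 * (size_t) page,
                              PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return -1;
  if (mprotect (base + page, (size_t) page, PROT_NONE) != 0)
    {
      munmap (base, 2 * (size_t) page);
      return -1;
    }

  /* The buffer ends exactly where the guard page begins.  */
  unsigned char *buf = base + page - len;
  memset (buf, 'a', len);
  buf[len - 1] = 'x';

  void *hit = memchr (buf, 'x', len);
  int ok = hit == buf + len - 1;

  munmap (base, 2 * (size_t) page);
  return ok;
}
```

Calling the helper with several lengths (1, a vector width, a full page)
covers both very short inputs and inputs that start page-aligned.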

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

Vectorise special cases for SVE log1p(f)

This patch adds vectorised special cases for the SVE log functions
log1p and log1pf.

When built with GCC-15 and executed on a Neoverse V2 platform, the
following benchmarking throughput uplifts were measured:

log1pf -> 285% speed-up (4.85 ns/element to 1.26 ns/element)
log1p -> 117% speed-up (8.25 ns/element to 3.80 ns/element)

Note that the numbers here are for the special case path only and that
the fast path performance has been maintained. These changes have also
maintained the same level of accuracy as before.

stdio-common: Optimize scanf %ms series array expansion

* stdio-common/vfscanf-internal.c: If the user explicitly sets the maximum
size of the string, respect it when reading characters. Instead of
always expanding exponentially, try to expand the array to the exact size
the user requested when `user_size < current_size * 2`.
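
The growth rule can be sketched as below.  This is only a model of the
policy, not the code in vfscanf-internal.c: next_capacity is a
hypothetical name, a user_size of 0 stands for "no width given", and the
extra `user_size > current_size` guard is an assumption added so the
sketch never shrinks the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the next capacity for a growing %ms buffer.
   Overflow of the doubling is ignored in this sketch.  */
static size_t
next_capacity (size_t current_size, size_t user_size)
{
  assert (current_size > 0);
  size_t doubled = current_size * 2;
  /* With an explicit width below the doubled size, grow to exactly that
     width instead of overshooting.  */
  if (user_size != 0 && user_size > current_size && user_size < doubled)
    return user_size;
  /* Otherwise keep the exponential growth.  */
  return doubled;
}
```

With a current size of 100, a requested width of 150 yields 150, while
no width or a width of 500 yields 200.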

Signed-off-by: Rocket Ma <marocketbd@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Vectorise special cases for SVE inverse hyperbolics

This patch adds vectorised special cases for the SVE inverse hyperbolic
functions atanh, acosh and asinh for single precision floats. It also
moves the commonly used inf and nan bit values into the sv_log1pf_inline
data struct for reuse.

When built with GCC-15 and executed on a Neoverse V2 platform, the
following benchmarking throughput uplifts were measured:

atanh -> 215% speed-up (5.51 ns/element to 1.75 ns/element)
acosh -> 152% speed-up (4.63 ns/element to 1.84 ns/element)
asinh -> 51% speed-up (5.00 ns/element to 3.31 ns/element)

Note that the numbers here are for the special case path only and that
the fast path performance has been maintained.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

x86: Fix non-temporal memset unreachable on AMD Zen 3/4/5

On AMD Zen 3/4/5 with ERMS, the non-temporal memset path is unreachable
because rep_stosb_threshold is set to SIZE_MAX (vectorized loop is faster
than ERMS on these CPUs), but the non-temporal code path is nested inside
the rep_stosb branch.

The existing rescue logic at the Avoid_STOSB check only covers the case
where the CPU lacks ERMS hardware support. It does not cover AMD Zen 3+
where ERMS is supported but deliberately unused for performance reasons.

Extend the condition to also lower rep_stosb_threshold when:
- The user has not explicitly set x86_rep_stosb_threshold (respect tunables)
- rep_stosb_threshold is higher than memset_non_temporal_threshold (NT gated)

This makes the non-temporal path reachable for large memset operations,
providing ~2x speedup on pre-faulted buffers larger than L3 cache.
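
The extended condition can be modelled roughly as follows.  The function
and parameter names are hypothetical, and the value the threshold is
lowered to (the non-temporal threshold itself) is an assumption of this
sketch; the message above only states when the lowering happens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the check; returns the rep_stosb_threshold that
   would be used.  */
static size_t
effective_rep_stosb_threshold (size_t rep_stosb_threshold,
                               size_t memset_non_temporal_threshold,
                               bool user_set_rep_stosb_threshold)
{
  /* Respect an explicit x86_rep_stosb_threshold tunable.  */
  if (user_set_rep_stosb_threshold)
    return rep_stosb_threshold;
  /* The non-temporal path is nested inside the rep-stosb branch, so it
     is gated whenever the stosb threshold is above the NT threshold.  */
  if (rep_stosb_threshold > memset_non_temporal_threshold)
    return memset_non_temporal_threshold;  /* Assumed new value.  */
  return rep_stosb_threshold;
}
```

A threshold of SIZE_MAX without a tunable is lowered, the same value set
by the user is kept, and a threshold already below the NT threshold is
left alone.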

Tested on AMD Ryzen 7 8745HS (Zen 4):
- Pre-faulted 64MB memset: 2.02 ms -> 0.94 ms (2.15x faster)
- First-touch 64MB memset: 19.3 ms -> 21.3 ms (11% regression, expected:
kernel clear_page cache warming bypassed by NT stores)

* sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Extend
rep_stosb_threshold lowering condition to cover AMD Zen 3/4/5
where ERMS is supported but stosb is disabled via threshold.

Signed-off-by: zombie12138 <zombie12139@gmail.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=34129
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

libio: Ignore doallocate for open_memstream and open_wmemstream [BZ #34019]

setvbuf (stream, NULL, _IOFBF, 0) takes a special path in
_IO_setvbuf: if the byte-oriented buffer base is NULL, it calls
_IO_DOALLOCATE and returns without invoking the stream setbuf hook.

For open_wmemstream, the byte-oriented buffer base is NULL although
the wide result buffer has already been initialized in _wide_data.
As a result, this path calls _IO_wdefault_doallocate, which may
replace the wide buffer managed by open_wmemstream.

Install an open_wmemstream-specific doallocate hook that leaves
the growable result buffer unchanged. Add a regression test for this
path.

Install a narrow memstream doallocate hook as well. It keeps both
memstream vtables consistent (generic stdio allocation must not
replace the growable result buffer).

Signed-off-by: Xiang Gao <gaoxiang@kylinos.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

libio: Ignore setbuf for open_memstream and open_wmemstream [BZ #34019]

open_memstream and open_wmemstream manage an internal growable buffer.
The default setbuf hook can reset that buffer, breaking the assumptions
used by the string stream overflow paths.

Install setbuf hooks that leave the internal buffer unchanged, and add
regression test cases for the narrow and wide cases, based on the
reproducer in BZ #34019.

Checked on x86_64 with no regression in the libio tests.

Reported-by: Rocket Ma <marocketbd@gmail.com>
Signed-off-by: Xiang Gao <gaoxiang@kylinos.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

x86: Lower non-temporal copy threshold for Hygon

Benchmarks on Hygon processors show that the default non-temporal
threshold is higher than ideal for large copy workloads. As a result,
memcpy and memmove may continue to use the temporal copy path for
longer than is beneficial, increasing cache pollution and reducing
throughput for large copies.

Lower the copy non-temporal threshold to 3/8 of the shared cache size
per thread on Hygon. This allows the non-temporal copy path to be
selected earlier while leaving the memset non-temporal threshold
unchanged.

Signed-off-by: xiejiamei <xiejiamei@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Don't crash in dlsym when tail-called from a constructor [BZ #34156]

If a shared library's constructor calls dlsym and discards the result,
the compiler is free to lower the call to a tail jump. The dynamic
linker then resolves the apparent caller to ld.so's own link map, which
has no l_scope, and crashes in _dl_lookup_symbol_x dereferencing the
NULL scope pointer.

Tail-call optimization is a legal C transformation and there is no way
for the dynamic linker to recover the real caller from the elided frame.
Detect the situation by its observable effect -- a link map with no
l_scope -- and fall back to the main program's link map, the same
treatment used when the caller's address is otherwise unrecognized.

The check is written against l->l_scope rather than against _dl_rtld_map
directly because dl-sym-post.h is also compiled into libc.so, where
_dl_rtld_map is not visible (it lives only in ld.so).
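
A minimal sketch of the fallback, using a stand-in structure rather than
the loader's real struct link_map (map_sketch and effective_caller_map
are names invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the loader's link map; only the field discussed above is
   modelled.  */
struct map_sketch
{
  const char *name;
  void **l_scope;   /* NULL for ld.so's own map in this model.  */
};

/* Pick the map used for the lookup: an unrecognized caller (NULL) and a
   caller map without a scope both fall back to the main program.  */
static struct map_sketch *
effective_caller_map (struct map_sketch *found, struct map_sketch *main_map)
{
  assert (main_map != NULL);
  if (found == NULL || found->l_scope == NULL)
    return main_map;
  return found;
}
```

Testing the observable property (no scope) keeps the check independent
of any symbol that exists only in ld.so.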

Add dlfcn/tst-dlsym-ctor exercising the pattern. Without the fix the
test SIGSEGVs during dlopen; with the fix dlopen returns cleanly.

Signed-off-by: Daan De Meyer <daan@amutable.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

SHARED-FILES: Update gettext sync record

Update the gettext section to reflect the 2026 sync with GNU
gettext 1.0 (through commit 2ebbdd0e2). Add intl/eval-plural.h
which was missing from the shared files list.

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Fix undefined pointer behaviour

In _nl_find_msg (dcigettext.c), outbuf was computed as
freemem + sizeof(size_t) before checking whether freemem_size is
large enough.  When freemem is NULL (initial state), this is
undefined behaviour, i.e. arithmetic on a null pointer.  Move the
outbuf assignment after the size check where freemem is guaranteed
to be a valid allocation.

In read_alias_file (localealias.c), after realloc the old
string_space pointer is dangling.  The expression
new_pool - string_space subtracts a dangling pointer from a valid
one, which is undefined behaviour per ISO C23.
Rewrite as new_pool + (map[i].alias - string_space) so both
operands of the subtraction point into the same (old) object
before string_space is reassigned.
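
For illustration, here is a stricter variant of the same idea: offsets
are recorded before realloc is called, so no old pointer value is used
afterwards.  This is not the expression used in localealias.c, and
pool_sketch and pool_grow are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical string pool: entries point into one growable block.  */
struct pool_sketch
{
  char *space;       /* Start of the block.  */
  size_t size;       /* Allocated bytes.  */
  char **entries;    /* Pointers into space.  */
  size_t nentries;
};

/* Grow the pool to new_size bytes and rebase every entry.  Returns 0 on
   success and -1 on failure, in which case the pool is unchanged.  */
static int
pool_grow (struct pool_sketch *p, size_t new_size)
{
  assert (new_size > 0 && new_size >= p->size);
  /* Record offsets while the old block is still valid.  */
  size_t count = p->nentries != 0 ? p->nentries : 1;
  size_t *offsets = malloc (count * sizeof *offsets);
  if (offsets == NULL)
    return -1;
  for (size_t i = 0; i < p->nentries; i++)
    offsets[i] = (size_t) (p->entries[i] - p->space);

  char *new_space = realloc (p->space, new_size);
  if (new_space == NULL)
    {
      free (offsets);
      return -1;
    }
  /* Rebuild each pointer from the new base only.  */
  for (size_t i = 0; i < p->nentries; i++)
    p->entries[i] = new_space + offsets[i];
  p->space = new_space;
  p->size = new_size;
  free (offsets);
  return 0;
}
```

The cost is one temporary array per growth, which keeps every pointer
subtraction inside a single live object.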

Based on GNU gettext commits 695429040 and 2ebbdd0e2.
Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Fix hurd bootstrap after 4c6f92daead

Hurd bootstrap with build-many-glibcs.py fails for i686-gnu since
4c6f92daead7aa989ae1b7c67760f81a3550f044:

In file included from zic.c:16:
private.h:849:1: error: static declaration of ‘mempcpy’ follows non-static declaration
  849 | mempcpy(void *restrict s1, void const *restrict s2, size_t n)
      | ^~~~~~~
In file included from ../include/string.h:60,
                 from private.h:222:
../string/string.h:432:14: note: previous declaration of ‘mempcpy’ with type [...]
  432 | extern void *mempcpy (void *__restrict __dest,
      |              ^~~~~~~

libc-symbols.h already defines some HAVE_* macros, but timezone files
are built with -D_ISOMAC.  Remove its usage and only define the _ and N_
macros if not already defined.

Checked on x86_64-linux-gnu and with a build-many-glibcs.py build for
i686-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>

elf: Use dl_scratch_buffer for LD_LIBRARY_PATH copy in _dl_init_paths

_dl_init_paths used strdupa to make a mutable copy of LD_LIBRARY_PATH
for fillin_rpath to tokenize.  The env block is attacker-controllable
and Linux allows individual variables up to MAX_ARG_STRLEN (32 *
PAGE_SIZE = 128 KB), so the strdupa can push tens of KB onto the
loader's startup stack on top of the env block that already sits on
the initial stack.  With a reduced RLIMIT_STACK the doubled copy
overflows before main () is reached.

Replace the strdupa with a dl_scratch_buffer: short paths stay in
the 256-byte inline area, longer ones spill to anonymous mmap (malloc
is not yet available during _dl_init_paths).  Two follow-on changes
make the new scratch lifetime safe against _dl_signal_error:

  * Count entries directly off the const LD_LIBRARY_PATH and allocate
    __rtld_env_path_list.dirs *before* the scratch is live.  That way
    the larger of the two heap allocations the loader controls signals
    its OOM with no scratch to leak.

  * Convert fillin_rpath to return bool instead of calling
    _dl_signal_error internally on per-entry malloc failure.  Its
    only caller in the LD_LIBRARY_PATH path now frees the scratch first and then
    signals the error from a clean state.  decompose_rpath, the other
    caller, is updated symmetrically.  This also fixes a pre-existing
    leak in fillin_rpath's OOM path, where the to_free heap copy from
    expand_dynamic_string_token was not released before the
    _dl_signal_error.
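
Counting entries without a mutable copy can be sketched as below.  The
helper name is hypothetical, and treating both ':' and ';' as
separators is an assumption of this example rather than something the
message above states.

```c
#include <assert.h>
#include <stddef.h>

/* Count search-path entries in a const string without copying it.  The
   result is an upper bound usable to size the directory array before
   any scratch buffer is live.  */
static size_t
count_path_entries (const char *path)
{
  assert (path != NULL);
  size_t n = 1;   /* Even an empty string yields one (empty) entry.  */
  for (const char *cp = path; *cp != '\0'; cp++)
    if (*cp == ':' || *cp == ';')
      n++;
  return n;
}
```

Because the count needs only read access, the array allocation can fail
and signal its error while nothing else has to be released.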

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Use dl_scratch_buffer for DST expansion in _dl_map_object_deps

The expand_dst macro in _dl_map_object_deps performs an unbounded
alloca via DL_DST_REQUIRED, which scales with the link map's
l_origin length plus the count of dynamic-string tokens in the
input string. When a DT_NEEDED entry carries several DSTs and the
link map sits in a deep directory, the resulting allocation grows
to several kilobytes -- enough to overflow a PTHREAD_STACK_MIN
thread that calls dlopen.

Convert the macro to a static function that draws from a caller-
owned dl_scratch_buffer, so oversized expansions land on the heap
(or anonymous mmap during early startup) instead of the stack.
The scratch buffer is reused across DT_NEEDED, DT_AUXILIARY, and
DT_FILTER entries of the same map and freed once dependency
expansion completes.

A new regression test, tst-dst-needed-minstack, builds a wrapper
library that inherits a five-DST SONAME from a leaf module,
deploys it under a deep temporary directory, and dlopens it from
a PTHREAD_STACK_MIN thread. Without the fix the dlopen overflows
the thread stack and crashes; with the fix the dlopen returns
cleanly (with or without a successful load).

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Replace alloca with dl_scratch_buffer in _dl_load_cache_lookup

The alloca added by commit ccdb048d ("Fix recursive dlopen") to
snapshot the matched cache entry before __strdup runs through
interposable malloc is sized by best_len, which can reach PATH_MAX.
On PTHREAD_STACK_MIN threads that's enough to overflow the stack
mid-dlopen.

Use dl_scratch_buffer with DL_SCRATCH_NO_MALLOC: short entries stay
in the 256-byte inline area, longer ones spill to anonymous mmap
rather than to interposable malloc. The recursive-dlopen invariant
is preserved.

New container test elf/tst-dl-cache-long-path constructs a ~3.4 KB
deep directory, populates ld.so.cache with that entry, and dlopens
from a PTHREAD_STACK_MIN thread under deliberate stack pressure;
reliably SIGSEGVs against the alloca-based code and passes with the
fix.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Replace alloca/VLA with dl_scratch_buffer in dl-load.c

is_trusted_path_normalize, print_search_path, and open_path used
alloca or a VLA to hold a path scratch buffer sized by user-controlled
inputs (an RPATH directory length, or
max_dirnamelen + max_capstrlen + namelen). In the worst case that
consumes up to PATH_MAX bytes of stack per call, which can overflow a
PTHREAD_STACK_MIN-sized stack mid-dlopen when combined with the
loader's other on-stack scratch (struct filebuf, etc.).

Replace those allocations with dl_scratch_buffer. As a small cleanup,
print_search_path now takes the scratch buffer from its caller
(open_path's buffer is already large enough --
max_dirnamelen + max_capstrlen + namelen with namelen >= 1 covers the
max_dirnamelen + max_capstrlen + 1 print_search_path requires), so
LD_DEBUG=libs no longer pays for an extra allocation per open_path
invocation.

A new test elf/tst-dl-path-buf exercises the relevant paths -- dlopen
via DT_RPATH, open_path failure cleanup, dlopen with an over-long
name, dlopen from a PTHREAD_STACK_MIN thread.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

support: Add use_stack_min option to support_small_thread_stack_size

The option makes the function return PTHREAD_STACK_MIN if it is defined.

Checked on x86_64-linux-gnu and with a build for i686-gnu.

Suggested-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Add dl_scratch_buffer, a loader-side scratch buffer

Several loader code paths need a short-lived scratch buffer sized
by attacker-influenced inputs (RPATH entries, ld.so.cache strings,
etc.).  The available primitives are all unsuitable:

  - alloca is unbounded and can overflow PTHREAD_STACK_MIN stacks.

  - <scratch_buffer.h> is unaware of __minimal_malloc: a malloc'd
    spill freed during early loader startup silently leaks because
    __minimal_free only releases the most-recent allocation.

  - A few paths cannot route through the interposable malloc at
    all -- ld.so.cache lookup in particular, because an interposed
    user malloc may recursively call dlopen and __munmap the cache
    mapping mid-copy (commit ccdb048d, "Fix recursive dlopen").

Add a loader-side analogue of <scratch_buffer.h>: a 256-byte inline
area for the common case, with spill to malloc by default or to
anonymous mmap when __minimal_malloc is active or the caller passes
DL_SCRATCH_NO_MALLOC.  Mmap spills are tagged " glibc: loader
scratch" via __set_vma_name for /proc/self/maps visibility.  On OOM
dl_scratch_buffer_allocate raises a loader error via _dl_signal_error
and does not return.  The one-shot contract (no second allocate
without an intervening free) is enforced by an assertion in
_dl_scratch_buffer_allocate.
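
The shape of such a buffer can be sketched in plain C as follows.  This
is not the loader's implementation: all names are hypothetical, the
spill always goes to malloc (the mmap path and the VMA naming are left
out), and OOM is reported by returning NULL instead of raising a loader
error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SCRATCH_INLINE_SIZE 256

struct scratch_sketch
{
  void *data;      /* NULL while no allocation is live.  */
  bool spilled;    /* True if data came from malloc.  */
  union
  {
    char bytes[SCRATCH_INLINE_SIZE];
    max_align_t align;   /* Keep the inline area suitably aligned.  */
  } inline_area;
};

static void
scratch_init (struct scratch_sketch *s)
{
  s->data = NULL;
  s->spilled = false;
}

/* Return a buffer of at least size bytes, or NULL on OOM.  */
static void *
scratch_allocate (struct scratch_sketch *s, size_t size)
{
  assert (s->data == NULL);   /* One-shot contract.  */
  if (size <= SCRATCH_INLINE_SIZE)
    s->data = s->inline_area.bytes;
  else
    {
      s->data = malloc (size);
      s->spilled = s->data != NULL;
    }
  return s->data;
}

static void
scratch_free (struct scratch_sketch *s)
{
  if (s->spilled)
    free (s->data);
  scratch_init (s);
}
```

Small requests never touch the allocator, and the assertion catches a
second allocate without an intervening free.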

No functional change in this commit; consumers are added separately.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

malloc: Small fix for code readability

A couple of small fixes for code readability, no functional change.

- Add missing comments for #endif statements.
- Move inclusion of string.h from malloc.c to calloc-clear-memory.h
where it is actually used.
- Re-order alias definitions for malloc functions.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

s390: Adjust configure check for static-pie support.

With the previous approach, the configure check fails for lld versions >= 19.
While binutils and lld 18 place the R_390_IRELATIVE relocation in .rela.plt
and emit a DT_JMPREL pointing to it, newer lld versions put the R_390_IRELATIVE
relocation in .rela.dyn, so there is no DT_JMPREL entry and the configure
check claims that lld does not support static-pie.

The R_390_IRELATIVE relocation is also processed fine in .rela.dyn, so the
configure check is adjusted. It now checks that an R_390_IRELATIVE relocation
exists at all. If the R_390_IRELATIVE relocation lands in .rela.plt, the check
ensures that there is a DT_JMPREL pointing to it. Otherwise there should be a
.rela.dyn section.

nss_files: use booleans in parser macro calls

The swallow argument in the INT_FIELD and STRING_FIELD macros is used as a
boolean; change all callers to use false and true instead of 0 and 1.

nss_files: fix swapped arguments in service parser

The port number in the service file is a decimal number followed by a
single slash.

Regenerate 'configure'

* configure: Regenerate from configure.ac,
which was updated in the recent commit
'Simplify tzdb-related configuration'.

AArch64: Optimize memcmp for Kunpeng 950 with SVE

Key optimizations:
- Use SVE predication for branch-free handling of short inputs and tails
- Use 4-way loop unrolling to maximize pipeline utilization
- Optimize mismatch detection with early exit logic

Benchmark (bench-memcmp, generic -> this patch):
- Small (0-128B): 15% - 50% speedup
- Medium (129-1024B): 21% - 50% speedup
- Large (2048-4096B): 28% - 50% speedup

Note: regressions may be observed in edge cases where offsets
are near 4K boundaries. These instances are rare and the overall
performance gain remains significantly positive.

Also add IFUNC support for memcmp and correct the first-line
comment in memcpy_kunpeng950.S.

Simplify tzdb-related configuration

tzdb 2026b no longer needs -Wno-discarded-qualifiers or
-Wno-unused-variable. From a suggestion by Joseph Myers in:
https://sourceware.org/pipermail/libc-alpha/2026-May/177312.html
* configure.ac (config-cflags-wno-discarded-qualifiers): Remove.
* timezone/Makefile (CFLAGS-zic.c): Remove -Wno-unused-variable,
$(config-cflags-wno-discarded-qualifiers).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

timezone: sync to tzdb 2026b

Sync tzselect, zdump, zic to tzdb 2026b.

This fixes some buffer and integer overflows in zic,
adds new zic options -D, -m and -u inspired by FreeBSD,
and raises zic’s maximum number of abbreviation bytes
per timezone from 50 to 256.

This patch incorporates the following tzdb source code changes:

f9d30685 Output a minimal time zone designation table
37a4d178 Fix zic overflow bug with too-large offsets
4392f2dc zic now checks for signals more often
99a08a66 Fix zic buffer overflow when computing TZ
d63b9287 zic: keep needed last transition to new type
d005045d Pacify clang -Wunterminated-string-initialization
e67b08d3 Port to C23 strchr macro
3d4b4e46 Add zic.c overflow commentary
d9101b88 zic now a bit safer for overflows near 2**63
b23fa8e0 zic now allows more than 50 leap seconds
4ff518d2 Increase TZ_MAX_CHARS from 50 to 256
75d3b73b New -DTZ_RUNTIME_LEAPS=0 build-time option
87343c6e TZ_MAX_TIMES must be at least 310 now
fc8f1b68 Simplify int_fast32_t definition on C89 platforms
24581465 Remove TZDEFRULES ("posixrules") from localtime.c
fc708427 zic now warns about -p
b09a3f23 Port TWOS_COMPLEMENT to signed-magnitude hosts
56b7a24a Make sure 2**31 - 1 is signed
9068ab78 zic no longer generates utoff == -2**31
cb6f9b3b Omit unnecessary L suffixes
c37fbc32 Clarify when ‘__attribute__((pure))’ is a hack
859690a7 Fix some unsequenced/reproducible commentary
9c772ca7 Port to POSIX.1-2001 fflush
10f93018 Omit no-op transitions when Rule+Zone cancel
a0b09b52 Fix unlikely backslash bug in scripts
2cbd3a71 Allow builder to override GRANDPARENTED
c7257626 not used at → used outside
faed4bd3 Clarify <sys/auxv.h> vs getauxval
df08e6a1 Port mode_t (and gid_t, uid_t) to MS-Windows
6127d375 New zic option -u, inspired by FreeBSD
813c9ee0 New zic option -m, inspired by FreeBSD
987ea89c New zic option -D, inspired by FreeBSD
cc377b07 Simplify mkdir situation
cd994a90 Simplify !HAVE_POSIX_DECLS situation
052ddf76 Minor gettext macro improvements
d9018f1c Refactor duplicate duplicate-option code
8d65db97 Prefer fdopen to umask in zic
d7edca6e Omit “'”s from zic usage message
a09ba7a5 getopt returns -1 (not EOF) on failure
e22d410c zic now uses is_digit
f57cadda Always invoke umask at start
242a8338 Fix mode_t issues on MS-Windows
2fecd606 MKDIR_UMASK → MKDIR_PERMS refactoring
90ef088a Move static_assert to top level
41576478 Port better to platforms lacking mempcpy
90a08d3e * private.h: Include stddef.h early enough
aa8b35fe Simplify port to NetBSD struct __state
cd2fddf7 Port to -DHAVE_SYS_STAT_H=0 -DHAVE_POSIX_DECLS=0
8470e759 Pacify GCC 15 -Wunterminated-string-initialization
8817d42f Prefer mempcpy to doing it by hand
87abb113 Tighten security checks on TZ values
c87f0918 Use strnlen
07f7f31a Fix preprocessor indenting
3adf4123 Add offtime_r à la FreeBSD and NetBSD
b807a31e Don’t depend on ‘true’ for tzselect
ddffc800 * zic.c: Fix misspelled comment (thanks to Jonathan Wakely).
7063d08c Fix bug with -d RELATIVE -t ABSOLUTE
e8920e76 Rename emalloc to xmalloc.
e8e1a3d2 NetBSD defines STD_INSPIRED functions
3411494c Define _CRT_DECLARE_NONSTDC_NAMES for MS-Windows
7c909166 Define NOMINMAX for MS-Windows
24a4d97f 'zdump -' now reads from stdin
e6d6bc3e Pacify gcc -Wsuggest-attribute=format sans snprintf in zdump
99557862 TZNAME_MAXIMUM defaults to 254, not 255
fe5be99d Be more consistent about macro true/false vs 1/0
31f483a1 Remove dependency of asctime on strftime
7ef7ed06 Simplify timeoff redefinition
1bd67a4b Move MKTIME_MIGHT_OVERFLOW definition
67f7e8ab Pacify GCC 15ish -Wzero-as-null-pointer-constant
535a4e8b Pacify GCC 15ish -Wleading-whitespace=blanks
0706ef0b Move iinntt definition
ea814e99 strftime %s no longer is limited to time_t range
41e5344e Fix bug near the year 2**31 - 1 - 1900
4e1de249 Pacify gcc -Wsuggest-attribute=const
ebd2ed92 Don’t define _FILE_OFFSET_BITS if _TIME_BITS
26a649a1 Improve zdump overflow checking
9c8221d7 * private.h: Fix timeoff comment.
9db906a0 Switch from RFC 8536 to 9636 for documentation
af54a9e8 Port better to glibc when used internally there

Checked on x86_64-linux-gnu.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

x86: Lower non-temporal copy threshold for Hygon

Benchmarks on Hygon processors show that the default non-temporal
threshold is higher than ideal for large copy workloads. As a result,
memcpy and memmove may continue to use the temporal copy path for
longer than is beneficial, increasing cache pollution and reducing
throughput for large copies.

Lower the copy non-temporal threshold to 3/8 of the shared cache size
per thread on Hygon. This allows the non-temporal copy path to be
selected earlier while leaving the memset non-temporal threshold
unchanged.

Signed-off-by: xiejiamei <xiejiamei@hygon.cn>

libio: Fix fmemopen_write on appending mode (BZ 34006)

* libio/fmemopen.c: Reference the variable pos instead of c->pos.
In the edge case, one byte should be written at the end of the buffer
instead of returning an error.

Signed-off-by: Rocket Ma <marocketbd@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Defer all IRELATIVE relocations until after PLT setup

When a shared library is built with -z lazy and its IFUNC resolver calls
a PLT function, the dynamic linker can crash.  The resolver runs while
the PLT stubs still hold their raw ELF virtual addresses — l_addr has
not yet been added — so the call branches to an unmapped address.

The old code deferred IRELATIVE entries only to the end of the relocation
range currently being processed (via the r2/end2 scan-ahead mechanism in
elf_dynamic_do_Rel).  This was sufficient only when both IRELATIVE and the
JMP_SLOT entries for the PLT functions it needs are in the same section.
On x86-64, aarch64, arm, i386 and most other targets, a file-scope
initialiser of the form

  int (*fptr)(void) = some_ifunc;

causes the linker to place R_*_IRELATIVE in .rela.dyn, while JMP_SLOT
entries for any PLT calls made by the resolver live in .rela.plt.
Processing .rela.dyn before .rela.plt means the resolver fires before the
PLT is usable, regardless of where within .rela.dyn IRELATIVE appears.

Fix this by splitting IRELATIVE processing into a separate, explicitly
deferred pass.  In elf/do-rel.h:

- Remove the r2/end2 variables and the post-loop IRELATIVE re-scan from
   elf_dynamic_do_Rel.  IRELATIVE entries are now always skipped in the
   non-bootstrap path.

- Add a new elf_dynamic_do_Rel_irelative function that scans a
   relocation range and calls elf_machine_rel/elf_machine_lazy_rel for
   IRELATIVE and ifunc relocations.

In elf/dynamic-link.h, update _ELF_DYNAMIC_DO_RELOC to use a two-phase
approach for non-bootstrap builds unconditionally (regardless of whether
ranges[1].size is zero):

Phase 1+2: elf_dynamic_do_Rel over .rela.dyn then .rela.plt — processes
            everything except IRELATIVE/STT_GNU_IFUNC.
Phase 3+4: elf_dynamic_do_Rel_irelative over .rela.dyn then .rela.plt —
            processes only IRELATIVE, by which point all PLT stubs are
            valid.

This guarantees that IRELATIVE resolvers can call PLT stubs safely
regardless of which section the linker placed R_*_IRELATIVE in.
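
The ordering can be modelled with a small stand-alone sketch.  The
types and the function below are invented for the example and do not
mirror elf/do-rel.h; the sketch only records the order in which a
two-pass scheme would apply the entries of two ranges.

```c
#include <stddef.h>

enum reloc_kind { RELOC_OTHER, RELOC_JMP_SLOT, RELOC_IRELATIVE };

/* One relocation range, standing in for .rela.dyn or .rela.plt.  */
struct reloc_range
{
  const enum reloc_kind *relocs;
  size_t count;
};

/* Write to order[] the kinds in the sequence a two-pass scheme applies
   them; returns the number of entries written.  */
static size_t
apply_two_phase (const struct reloc_range ranges[2], enum reloc_kind *order)
{
  size_t n = 0;
  /* First pass over both ranges: everything except IRELATIVE.  */
  for (int r = 0; r < 2; r++)
    for (size_t i = 0; i < ranges[r].count; i++)
      if (ranges[r].relocs[i] != RELOC_IRELATIVE)
        order[n++] = ranges[r].relocs[i];
  /* Second pass over both ranges: only IRELATIVE, after every JMP_SLOT
     entry has been handled.  */
  for (int r = 0; r < 2; r++)
    for (size_t i = 0; i < ranges[r].count; i++)
      if (ranges[r].relocs[i] == RELOC_IRELATIVE)
        order[n++] = ranges[r].relocs[i];
  return n;
}
```

Even when an IRELATIVE entry is the first one in the first range, it is
applied last, after the JMP_SLOT entries of the second range.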

Add ELF_MACHINE_IRELATIVE to the architectures that were missing it so
the new skip logic in elf_dynamic_do_Rel is compiled for all targets.

This patch addresses the binutils BZ 13302 [1] from the glibc side, and
also fixes the mold-reported issue [2], which shows that IFUNC relocation
placement and processing can work differently across ABIs.

I checked on all ABIs that support IFUNC (x86_64, i686, aarch64, arm,
loongarch, powerpc, riscv, s390, and sparc), some via qemu-system.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=13302
[2] https://github.com/rui314/mold/issues/1550

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

malloc: Remove dynamic mmap/trim threshold [BZ #30769]

v2: Update documentation

Whenever a large mmap is released the mmap and trim thresholds are updated.
As a result these thresholds grow ever larger which means huge allocations
are always served by arenas rather than mmap. The thresholds can end up as
large as an arena, which completely stops all trimming of the top block.
Remove the code completely - the default thresholds seem way too low for
modern 64-bit targets, but they can be increased separately.
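
Independently of this change, a program that wants different fixed
thresholds can request them through the existing mallopt interface.  A
minimal sketch, assuming a glibc target; the wrapper name is
hypothetical and the values used with it are picked only for
illustration:

```c
#include <malloc.h>

/* Hypothetical wrapper: returns 1 if both thresholds were accepted by
   mallopt, 0 otherwise.  */
static int
set_fixed_thresholds (int mmap_threshold, int trim_threshold)
{
  return mallopt (M_MMAP_THRESHOLD, mmap_threshold) == 1
         && mallopt (M_TRIM_THRESHOLD, trim_threshold) == 1;
}
```

For example, set_fixed_thresholds (256 * 1024, 1024 * 1024) asks for a
256 KiB mmap threshold and a 1 MiB trim threshold.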

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Fix memory leak in _nl_find_domain on allocation failure

When _nl_explode_name() returns -1 (out of memory) and the locale was
resolved through an alias, _nl_find_domain() returns immediately
without freeing the locale copy allocated earlier.  Similarly,
when _nl_make_l10nflist() returns NULL, the 'goto out' skips the
alias_value free.

Fix by nesting the _nl_make_l10nflist() call and its result handling
inside 'if (mask != -1)' instead of returning early.  Move the
normalized_codeset free inside the same block.  Both failure paths
now fall through to the unconditional alias_value free at the end.

Imported from GNU gettext commit 10eafd9e5.
Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Remove pre-C99 fallbacks from plural-exp.c

glibc has required C11 since 2022, making pre-C99 compatibility
paths in plural-exp.c dead code:

- init_germanic_plural(): With C99+, GERMANIC_PLURAL is
initialized at compile time and this function is never called.
Remove the function and the INIT_GERMANIC_PLURAL macro.

- HAVE_STRTOUL guard: Protected strtoul() usage with a manual
digit-parsing fallback. strtoul is in C89 <stdlib.h> and glibc
provides it. Remove the guard and the fallback loop.

Imported from GNU gettext commits ab5990532 and c1d84d656.
Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Remove PRI_MACROS_BROKEN from loadmsgcat.c

PRI_MACROS_BROKEN was a workaround for AIX 4, where inttypes.h
did not properly define the PRI* format macros (PRId8, PRIu32, etc.).
glibc has never supported AIX, and the macro was always hardcoded to 0
under _LIBC, making it dead code.

GNU gettext removed this in commit 267f61670 ("Drop portability to
AIX 4"), since no supported system has broken PRI macros post-C99.

Based on GNU gettext commit 267f61670.
Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Remove IN_LIBGLOCALE dead code

Remove all IN_LIBGLOCALE conditional blocks from intl/. libglocale
was a proposed API from 2005 that was never completed or shipped.
The macro is never defined in glibc or in current GNU gettext, making
every #ifdef IN_LIBGLOCALE block dead code.

GNU gettext removed these in commits starting from 2023. Removing
them from glibc reduces noise and eases future syncs with gettext.

Imported from GNU gettext commit d6a6801c1.
Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

AArch64: simplify __libc_arm_za_disable failure path

The failure tail of __libc_arm_za_disable only leads to
__libc_fatal, so it does not need to preserve call frame state.

Remove the PAC prologue, frame setup, saved cntd value, stack
stores, and associated CFI directives from the fatal path, leaving
only the required SME state shutdown and fatal call.

Add tst-sme-za-disable-fail to exercise the abort path by providing
a TPIDR2 block with non-zero reserved bytes and checking that the
process terminates with SIGABRT and the expected fatal message.

stdio-common: Silence clang -Wfortify-source warning in tst-vfscanf-bz34008

clang does not recognize the 'm' scanf specifier and incorrectly warns
that the buf argument may overflow. Suppress the warning with the
clang-specific DIAG_* macros.

elf: Batch program-header reads in _dl_map_segments (oversight fix)

The fix for BZ 26577 ("Fix stack overflow in _dl_map_object_from_fd
with large e_phnum") removed the alloca for the program-header table
and introduced a streaming iterator (dl_pt_load_iterator) so segments
could be walked without staging the entire table on the stack.

That patch batched reads correctly in _dl_map_object_scan_phdrs (the
first walk, which collects PT_DYNAMIC/PT_TLS/PT_GNU_* metadata), but
overlooked the second walk in _dl_map_segments:
_dl_pt_load_iterator_next issued one pread64 per program header to
find the next PT_LOAD entry.  For an object with N program headers
this added N redundant per-phdr syscalls on every dlopen / loader
startup -- regardless of whether the table had already been read by
open_verify into struct filebuf.

Unify both walks behind a single batched helper,
_dl_pt_load_iterator_phdr_at:

  - When the program header table fits in the bytes already read by
    open_verify into fbp->buf (the common case for nearly all shared
    objects), all phdr accesses are served from that buffer with no
    syscall at all.

  - Otherwise, up to FILEBUF_SIZE / sizeof(ElfW(Phdr)) program headers
    are read into fbp->buf with a single pread64; subsequent indices
    in the same window hit the buffer.

Both _dl_map_object_scan_phdrs and _dl_pt_load_iterator_next now go
through this helper, eliminating the separate batching logic in
_dl_map_object_scan_phdrs.  struct filebuf moves from dl-load.c to
dl-load.h so the inline iterator in dl-map-segments.h can reach
fbp->buf.

The filebuf size is also bumped to ensure the cached fast path
triggers for all observed binaries.  A survey of an Ubuntu 24.04
installation (scanning /usr) shows:

    Candidate files       : 465834
    ELF files inspected   : 11624
    glibc-linked binaries : 10164
    Minimum e_phnum       : 5
    Maximum e_phnum       : 14
    Average e_phnum       : 11.37
    Median  e_phnum       : 11.0

That is, e_phnum is capped at 14 (for instance gcc's cc1, lto1, perl,
and gdb).  The previous FILEBUF_SIZE of 832 on 64-bit fit only 13
program headers after the ELF header (64 + 13*56 = 792), so 64-bit
binaries with 14 phdrs missed the cached path.  FILEBUF_SIZE is
bumped from 512/832 to 640/1024 (32-bit / 64-bit) -- enough for at
least 16 program headers on either ABI, leaving headroom over the
observed maximum.

For a typical shared library where open_verify's initial read covers
the program header table, this reduces _dl_map_segments from N
preads to 0.  For a worst-case e_phnum that does not fit in fbp->buf,
reads drop from N to ceil(N / phdrs_per_buf) -- the same cost
_dl_map_object_scan_phdrs already pays.
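
The buffer arithmetic above can be checked with a small model.  This is
only a sketch: it assumes the program header table starts right after
the ELF header (e_phoff equal to the header size) and that the slow path
reads whole windows of FILEBUF_SIZE / sizeof(ElfW(Phdr)) entries.  The
function names are hypothetical and do not exist in glibc.

```python
import math

# Sizes from the ELF specification: (ELF header bytes, program header bytes).
ELF_SIZES = {"32-bit": (52, 32), "64-bit": (64, 56)}

def phdrs_in_initial_read(filebuf_size, ehdr_size, phdr_size):
    """Program headers that fit after the ELF header in one initial read.

    Assumes the table starts right after the ELF header.
    """
    return (filebuf_size - ehdr_size) // phdr_size

def preads_needed(e_phnum, filebuf_size, ehdr_size, phdr_size):
    """Modelled pread64 count for one walk of the program header table."""
    if e_phnum <= phdrs_in_initial_read(filebuf_size, ehdr_size, phdr_size):
        return 0  # served from the bytes open_verify already read
    phdrs_per_buf = filebuf_size // phdr_size
    return math.ceil(e_phnum / phdrs_per_buf)

for abi, old, new in (("32-bit", 512, 640), ("64-bit", 832, 1024)):
    ehdr, phdr = ELF_SIZES[abi]
    print(abi,
          "old fits", phdrs_in_initial_read(old, ehdr, phdr),
          "new fits", phdrs_in_initial_read(new, ehdr, phdr),
          "preads for e_phnum=14:",
          preads_needed(14, old, ehdr, phdr), "->",
          preads_needed(14, new, ehdr, phdr))
```

Under these assumptions a 64-bit object with 14 program headers goes
from one extra pread per walk to none, which matches the motivation for
the size bump.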

No functional change.  Tested on x86_64-linux-gnu, aarch64-linux-gnu,
and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Fix 785a028ab70 for PTHREAD_STACK_MIN platforms

Due to a missing ';'.

elf: Fix elf/tst-bz26577-minstack.c on hurd

Hurd does not define PTHREAD_STACK_MIN; use support_small_thread_stack_size
instead.

Checked on an x86_64-gnu build.

libio: Fix race in _IO_new_file_init_internal initialization order [BZ #33785]

_IO_new_file_init_internal linked the new stream into _IO_list_all
before setting fp->_fileno to -1. A concurrent thread that walks
_IO_list_all (for example via fflush (NULL)) could observe the stream
with an uninitialized _fileno before initialization completed.

Set _fileno = -1 before _IO_link_in so the stream is fully
initialized when it becomes visible in the global list.

This is the residual concurrency defect noted at the end of commit
b657f72fa3 ("libio: Fix deadlock between freopen, fflush (NULL) and
fclose (bug 24963)").

Add libio/tst-file-init-race exercising concurrent fopen/fclose and
fflush (NULL) to detect regressions.
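
The ordering issue can be illustrated with a deterministic toy model.
This is a sketch only: Stream and StreamList are hypothetical stand-ins
for the FILE object and _IO_list_all, and the "observer" is simulated by
recording what is visible at the moment of linking rather than by a real
second thread.

```python
class Stream:
    def __init__(self):
        self.fileno = None  # stands in for an uninitialized _fileno

class StreamList:
    """Toy stand-in for _IO_list_all; records what a walker could see at link time."""
    def __init__(self):
        self.streams = []
        self.seen_at_link = []

    def link_in(self, stream):
        self.streams.append(stream)
        # A concurrent walker may look at the stream as soon as it is linked.
        self.seen_at_link.append(stream.fileno)

def init_link_first(lst):
    fp = Stream()
    lst.link_in(fp)   # published before initialization (old order)
    fp.fileno = -1
    return fp

def init_fileno_first(lst):
    fp = Stream()
    fp.fileno = -1    # initialized first (new order)
    lst.link_in(fp)
    return fp

old_order, new_order = StreamList(), StreamList()
init_link_first(old_order)
init_fileno_first(new_order)
print(old_order.seen_at_link, new_order.seen_at_link)
```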

Signed-off-by: Shamil Abdulaev <ashamil435@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

test: Add gconv refcount leak test for swscanf

Add a new internal test, `tst-wcsmbs-clone-overflow`, to verify correct
gconv module reference counting. The Makefile is updated to include this
test in the `tests-internal` list and ensure it runs with generated locales.

This test specifically checks that the `__counter` for `gconv_fcts->towc`
does not leak references when `swscanf` is used with a stack-allocated
wide character stream. It ensures that `_IO_wstrfile_fclose_stack`
properly decrements the module reference counter, preventing a module
from staying loaded indefinitely due to unreleased references.

Assisted-by: LLM
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

libio: Fix gconv module reference counter overflow in swscanf

The swscanf family of functions creates a wide-oriented FILE stream
on the stack. Initialization of this stream invokes `_IO_fwide`, which
clones the global locale's gconv transformation steps via
`__wcsmbs_clone_conv`. This increments the reference counter (`__counter`)
of the gconv module.

Because the FILE stream is stack-allocated, `fclose` cannot be called,
and so `__gconv_release_step` is never invoked. The counter leaks,
eventually hitting the 32-bit integer overflow limit and aborting the
process.

To resolve this, we introduce `_IO_wstrfile_fclose_stack`, a dedicated
cleanup function for stack-allocated FILE streams. This function invokes
`_IO_FINISH` and correctly releases the gconv steps via
`__gconv_release_step` without attempting to `free` the FILE pointer.
This cleanup function is then hooked into all variants of swscanf right
before they return.
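
The leak and its fix can be modelled with a plain counter.  This is a
sketch under stated assumptions: GconvStep and the two scan functions
are hypothetical, the initial count of 1 is an assumption, and the real
code paths in libio are considerably more involved.

```python
class GconvStep:
    """Toy model of a gconv step with a reference counter (__counter in the text)."""
    def __init__(self):
        self.counter = 1  # reference held by the locale itself (assumption)

    def clone(self):
        self.counter += 1

    def release(self):
        self.counter -= 1

def scan_without_cleanup(step):
    step.clone()        # stream set-up clones the conversion step
    # ... parse input; the stack stream is simply abandoned ...

def scan_with_cleanup(step):
    step.clone()
    try:
        pass            # ... parse input ...
    finally:
        step.release()  # the role played by _IO_wstrfile_fclose_stack

leaky, fixed = GconvStep(), GconvStep()
for _ in range(1000):
    scan_without_cleanup(leaky)
    scan_with_cleanup(fixed)
print(leaky.counter, fixed.counter)
```

In the model the unreleased counter grows by one per call, so a long
running process eventually reaches the integer limit; the balanced
variant stays constant.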

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Eliminate alloca for program-header table in the ELF loader

The ELF loader allocates the program-header table on the stack with
alloca(e_phnum * sizeof(ElfW(Phdr))) in two places: once in
open_verify to call elf_machine_reject_phdr_p, and again in
_dl_map_object_from_fd to scan segment types.  Both fall back to
alloca only when the table does not fit in the initial fbp->buf read;
for a crafted ELF with e_phnum == 0x7FFF this means up to ~1.8 MB
(32767 × 56 bytes on a 64-bit host) on the stack in each call, with
no guard against the combination exhausting the available stack space.

A latent variant of this problem exists even for ordinary shared
libraries when dlopen is called from a thread running with a
PTHREAD_STACK_MIN stack (16 KB on Linux).  The nptl/tst-minstack-exit
test demonstrates that glibc code paths must operate correctly under
minimum-stack conditions; loading a shared library with even a modest
number of program headers can overflow the remaining stack through the
alloca-based phdr table.

This patch eliminates both allocas by replacing them with a single
_dl_map_object_scan_phdrs function that reads program headers in
fixed-size chunks into the existing fbp->buf scratch buffer (512 B on
32-bit, 832 B on 64-bit) using pread.  When all headers fit within
the bytes already captured by open_verify's initial read() call (the
common case), no extra syscall is needed.  This should be the case for
most ELF objects, which therefore should not require additional syscalls.

The slow path issues as many pread calls as necessary without any stack
growth proportional to e_phnum.  The elf_machine_reject_phdr_p interface
is redesigned around a new struct dl_machine_phdr_info and on MIPS this
captures the PT_MIPS_ABIFLAGS entry in-flight, so the compatibility check
in elf_machine_reject_phdr_p no longer needs to re-scan the program-header
table.
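
The chunked scan can be sketched as follows.  This is an illustration
under stated assumptions, not the glibc implementation: scan_phdrs is a
hypothetical name, the layout is a little-endian Elf64_Phdr, the input
is a synthetic file rather than a real ELF object, and short reads are
not handled.

```python
import os
import struct
import tempfile

PHDR = struct.Struct("<IIQQQQQQ")   # little-endian Elf64_Phdr, 56 bytes
PT_NULL, PT_LOAD = 0, 1

def scan_phdrs(fd, e_phoff, e_phnum, buf_size=832):
    """Count PT_LOAD entries, reading at most buf_size bytes per pread."""
    per_chunk = buf_size // PHDR.size
    loads = reads = 0
    for first in range(0, e_phnum, per_chunk):
        count = min(per_chunk, e_phnum - first)
        # Memory use is bounded by buf_size, independent of e_phnum.
        data = os.pread(fd, count * PHDR.size, e_phoff + first * PHDR.size)
        reads += 1
        for (p_type, *_rest) in PHDR.iter_unpack(data):
            loads += p_type == PT_LOAD
    return loads, reads

# Synthetic table: 64 placeholder bytes, one PT_LOAD, then PT_NULL entries.
e_phnum = 100
table = PHDR.pack(PT_LOAD, 5, 0, 0, 0, 64, 64, 0x1000)
table += PHDR.pack(PT_NULL, 0, 0, 0, 0, 0, 0, 0) * (e_phnum - 1)
with tempfile.TemporaryFile() as f:
    f.write(b"\0" * 64 + table)
    f.flush()
    loads, reads = scan_phdrs(f.fileno(), 64, e_phnum)
print(loads, reads)
```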

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

NB: this patch depends on https://sourceware.org/pipermail/libc-alpha/2026-May/177239.html
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

elf: Fix stack overflow in _dl_map_object_from_fd with large e_phnum (BZ 26577)

_dl_map_object_from_fd uses a VLA (loadcmds[l->l_phnum]) whose size
is proportional to e_phnum.  A crafted ELF with e_phnum == 0x7FFF
allocates ~1.5 MB (32767 × 48 bytes on a 64-bit machine) on the stack,
which adds to the previous ~1.75 MB alloca for the phdr table that
precedes it.

This patch follows Florian's suggestion [1] to use a two-pass approach
(collect-then-map) with a single-pass struct dl_pt_load_iterator that
precomputes the metadata needed by _dl_map_segments (p_align_max,
has_holes, first/last segment bounds, nloadcmds) and then yields one
struct loadcmd at a time through _dl_pt_load_iterator_next, holding at
most one loadcmd on the stack at a time.  The same iterator is
threaded through _dl_map_segments in dl-map-segments.h.

The most complex part is the test, which adds a Python-generated crafted
ET_DYN object that has e_phnum == 0x7FFF: one PT_LOAD covering the ELF header
so the loader exercises the full iterator path, and the remaining
headers PT_NULL.  The test runs two subtests under a reduced stack limit
(phdr alloca + 1 MB headroom ≈ 2.75 MB, well below the 3.25 MB the
unfixed VLA code requires).
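
A crafted object with this layout, and the stack figures quoted above,
can be reproduced with a short script.  This is a sketch and not the
generator used by the test: crafted_et_dyn is a hypothetical name, the
machine type (x86_64) and the PT_LOAD field values are assumptions, and
the 48-byte loadcmd size is taken from the text.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes
ET_DYN, EM_X86_64, PT_NULL, PT_LOAD, PF_R = 3, 62, 0, 1, 4

def crafted_et_dyn(e_phnum=0x7FFF):
    """Return the bytes of an ET_DYN image with one PT_LOAD and PT_NULL padding."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + b"\0" * 9   # ELFCLASS64, LSB, EV_CURRENT
    ehdr = EHDR.pack(ident, ET_DYN, EM_X86_64, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, e_phnum, 0, 0, 0)
    # One PT_LOAD that maps the ELF header itself; everything else is PT_NULL.
    load = PHDR.pack(PT_LOAD, PF_R, 0, 0, 0, EHDR.size, EHDR.size, 0x1000)
    null = PHDR.pack(PT_NULL, 0, 0, 0, 0, 0, 0, 0)
    return ehdr + load + null * (e_phnum - 1)

image = crafted_et_dyn()
MiB = 1 << 20
phdr_table = 0x7FFF * PHDR.size   # bytes the old phdr alloca needed
loadcmds = 0x7FFF * 48            # bytes the old VLA needed (48 per entry, per the text)
print(len(image), round(phdr_table / MiB, 2), round(loadcmds / MiB, 2))
```

The two sizes add up to roughly 3.25 MiB, while the reduced limit of
about 2.75 MiB leaves room for the phdr table plus 1 MiB of headroom
only.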

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

[1] https://sourceware.org/pipermail/libc-alpha/2026-February/175136.html
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

intl: Add tests for plural expression hardening

The first test checks for stack overflow.  It uses a plural expression
nested 5000 levels deep using the !(1-(...)) pattern.  The parser
accepts it (below YYMAXDEPTH=10000), but evaluation exceeds
EVAL_MAXDEPTH=100 and falls back to index 0 instead of crashing with
SIGSEGV.

The second test checks for division by zero in a plural expression.  The
expression (n!=1)+1/(n!=1729) triggers 1/0 for n=1729.  msgfmt only
validates 0 <= n <= 1000, so the .mo file is accepted.  Evaluation
returns PE_INTDIV and falls back instead of raising SIGFPE.
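
The two inputs can be sketched in a few lines.  This is an illustration
only: nested_plural and divisor are hypothetical helpers, and using a
bare n as the innermost operand is an assumption about the generated
expression.

```python
def nested_plural(depth, inner="n"):
    """Wrap inner `depth` times in the !(1-(...)) pattern."""
    return "!(1-(" * depth + inner + "))" * depth

def divisor(n):
    """Divisor of 1/(n!=1729) in the second test's expression."""
    return int(n != 1729)

expr = nested_plural(5000)
checked_by_msgfmt = range(0, 1001)   # 0 <= n <= 1000, per the text
print(len(expr), all(divisor(n) for n in checked_by_msgfmt), divisor(1729))
```

No value in the range msgfmt checks produces a zero divisor, which is
why the catalog is accepted even though n=1729 divides by zero.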

Adaptations from gettext to glibc:

- gettext's plural-3 embeds the nested expression as a literal string.
This test uses an AWK script (plural-depth.awk) to generate the same
expression.

- gettext uses LANGUAGE= (empty) with LC_ALL=ll and its own locale
setup.  glibc requires a real locale for setlocale() or else the "C"
locale override in dcigettext.c ignores LANGUAGE entirely.

The tests are derived from GNU gettext's plural-3 (commit 021348871a22)
and plural-4 (commit 429ba6c6b835), adapted to glibc's test framework.

Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

intl: Import plural expression hardening from GNU gettext

The plural expression evaluator plural_eval() in eval-plural.h uses
unbounded recursion, which can cause a stack overflow crash with
deeply nested expressions in malicious .mo files.  This is
particularly dangerous on threads with small stacks (musl libc
default: 128 KB, AIX 7 default: 96 KB, glibc after ulimit -s 260:
~3919 recursions max).

Additionally, division by zero in plural expressions triggers
raise(SIGFPE), which is not multithread-safe: catching SIGFPE
requires per-process signal handlers that race with other threads.

Fix both by importing the hardening from GNU gettext:

- Replace unbounded plural_eval() with depth-limited
  plural_eval_recurse() (EVAL_MAXDEPTH=100), returning a
  struct eval_result with status instead of a bare unsigned long.

- Return PE_INTDIV status on division by zero instead of raising
  SIGFPE.  Remove the architecture-specific INTDIV0_RAISES_SIGFPE
  macro and the conditional #include <signal.h>.

- Update plural_lookup() in dcigettext.c to handle the new return
  type, falling back to index 0 on any evaluation failure.
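
The shape of the hardened evaluator can be sketched as below.  This is a
model under stated assumptions, not the imported code: the tuple
encoding of expressions, EvalResult, and the PE_OK and PE_STACKOVF
status names are assumptions for illustration (only PE_INTDIV and
EVAL_MAXDEPTH=100 appear in the text), and Python integers do not wrap
the way unsigned long does in C.

```python
from dataclasses import dataclass

EVAL_MAXDEPTH = 100
PE_OK, PE_INTDIV, PE_STACKOVF = "PE_OK", "PE_INTDIV", "PE_STACKOVF"

@dataclass
class EvalResult:
    status: str
    value: int = 0

def eval_recurse(node, n, depth):
    """Evaluate a tuple-encoded expression; depth is the remaining nesting budget."""
    if depth == 0:
        return EvalResult(PE_STACKOVF)
    op = node[0]
    if op == "n":
        return EvalResult(PE_OK, n)
    if op == "num":
        return EvalResult(PE_OK, node[1])
    args = []
    for child in node[1:]:
        res = eval_recurse(child, n, depth - 1)
        if res.status != PE_OK:
            return res                      # propagate the first failure
        args.append(res.value)
    if op == "!":
        return EvalResult(PE_OK, int(not args[0]))
    if op == "!=":
        return EvalResult(PE_OK, int(args[0] != args[1]))
    if op == "+":
        return EvalResult(PE_OK, args[0] + args[1])
    if op == "-":
        return EvalResult(PE_OK, args[0] - args[1])
    if op == "/":
        if args[1] == 0:
            return EvalResult(PE_INTDIV)    # status instead of SIGFPE
        return EvalResult(PE_OK, args[0] // args[1])
    raise ValueError(f"unknown operator {op!r}")

def plural_lookup(expr, n):
    """Fall back to index 0 on any evaluation failure."""
    res = eval_recurse(expr, n, EVAL_MAXDEPTH)
    return res.value if res.status == PE_OK else 0

def ne(k):
    return ("!=", ("n",), ("num", k))

# (n!=1)+1/(n!=1729)
expr = ("+", ne(1), ("/", ("num", 1), ne(1729)))

# !(1-(...)) nested 5000 times around n
deep = ("n",)
for _ in range(5000):
    deep = ("!", ("-", ("num", 1), deep))
```

With this model, plural_lookup(expr, 1729) and plural_lookup(deep, 1)
both return 0 instead of raising a signal or exhausting the stack.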

Based on GNU gettext commits ef37a1540 and 726bfb1d1.
Discussed on: https://sourceware.org/pipermail/libc-alpha/2023-October/152010.html

Original author: Bruno Haible <bruno@clisp.org>

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

arm: Enable static-pie support (BZ 34098)

It requires proper gcc support [1], and without proper compiler support
the arm configure disables static-pie support.

The start.S requires some adjustment to avoid loading main from
the GOT.

Checked on arm-linux-gnueabihf with and without the gcc patch applied.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Sam James <sam@gentoo.org>

riscv: redirect strlen in early startup

__tunables_init calls strlen before ifunc relocations have been set up;
redirect it to __strlen_generic.

resolv, rt: Change some extern inline functions to static inline

The following functions:

__aio_create_helper_thread
__aio_start_notify_thread
__gai_create_helper_thread
__gai_start_notify_thread

are declared as extern inline, but no translation unit provides their
real definitions. This can lead to a link failure if the functions are
not inlined. Fix it by declaring them as static inline instead.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

riscv: Add RVV strncmp for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of strncmp for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__strncmp_vector) is added
alongside the generic fallback (__strncmp_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling strncmp().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

riscv: Add RVV strcmp for both multiarch and non-multiarch builds

This patch adds an RVV-optimized implementation of strcmp for RISC-V and
enables it for both multiarch (IFUNC) and non-multiarch builds.

The implementation integrates Hau Hsu's 2023 RVV work under a unified
ifunc-based framework. A vectorized version (__strcmp_vector) is added
alongside the generic fallback (__strcmp_generic). The runtime resolver
selects the RVV variant when RISCV_HWPROBE_KEY_IMA_EXT_0 reports vector
support (RVV).

Currently, the resolver still selects the RVV variant even when the RVV
extension is disabled via prctl(). As a consequence, any process that
has RVV disabled via prctl() will receive SIGILL when calling strcmp().

Co-authored-by: Hau Hsu <hau.hsu@sifive.com>
Co-authored-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>