git.ipfire.org Git - thirdparty/glibc.git/log

riscv: Treat clang separately in RVV compiler checks

Detect clang explicitly and apply compiler-specific version checks for
RVV support.

Signed-off-by: Zihong Yao <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>

math: Fix spurious overflow and missing errno for lgammaf

It syncs with CORE-MATH 9a75500ba1831 and 20d51f2ee.

Checked on aarch64-linux-gnu.

misc: Fix a few typos in comments

math: Sync lgammaf with CORE-MATH

It removes some unnecessary corner-case checks and uses a slightly
different binary algorithm for the hard-case database binary search.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

math: Sync tgammaf with CORE-MATH

It adds a minor optimization on fast path.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Makefile: add allow-list for failures

Enable adding known failures to allowed-failures.txt and ignore failures
in case they are in the list. In case the allowed-failures.txt does not
exist, all failures lead to a failed status as before.

When the file is present, failures of listed tests are ignored and reported
on stdout. If tests not in the allowed list fail, summarize-tests exits with
status 1 and reports the failing tests.

The expected format of allowed-failures.txt file is:
<test_name> # <comment>

Reviewed-by: Florian Weimer <fweimer@redhat.com>

string: Add fallback implementation for ctz/clz

The libgcc implementations of __builtin_clzl/__builtin_ctzl may require
access to additional data that is not marked as hidden, which could
introduce additional GOT indirection and necessitate RELATIVE relocs.
And the RELATIVE reloc is an issue if the code is used during static-pie
startup before self-relocation (for instance, during an assert).

For this case, the ABI can add a string-bitops.h header that defines
HAVE_BITOPTS_WORKING to 0. A configure check for this issue is tricky
because it requires linking against the standard libraries, which
create many RELATIVE relocations and complicate filtering those that
might be created by the builtins.

The fallback is disabled by default, so no target is affected.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

AArch64: Remove prefer_sve_ifuncs

Remove the prefer_sve_ifuncs CPU feature since it was intended for older
kernels. Current distros all use modern Linux kernels with improved support
for SVE save/restore, making this check redundant.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

This reverts commit 6e8f32d39a57aa1f31bf15375810aab79a0f5f4b.

First off, apologies for my misunderstanding on how madvise(MADV_HUGEPAGE)
works. I had the misconception that doing madvise(p, 1, MADV_HUGEPAGE) will set
VM_HUGEPAGE on the entire VMA - it does not, it will align the size to
PAGE_SIZE (4k) and then *split* the VMA. Only the first page-length of the
virtual space will VM_HUGEPAGE'd, the rest of it will stay the same.

The above is the semantics for all madvise() calls - which makes sense from a
UABI perspective. madvise() should do the proposed thing to only the length
(page-aligned) which it was asked to do, doing any more than that is not
something the user is expecting.

Commit 6e8f32d39a57 tries to optimize around the madvise() call by determining
whether the VMA got madvise'd before. This will work for most cases except
the following: if check_may_shrink_heap() is true, shrink_heap() re-maps the
shrunk portion, giving us a new VMA altogether. That VMA won't have the
VM_HUGEPAGE flag.

Reverting this commit, we will again mark the new VMA with VM_HUGEPAGE, and
the kernel will merge the two into a single VMA marked with VM_HUGEPAGE.

This may be the only case where we lose VM_HUGEPAGE, and we could micro-optimize
by extending the current if-condition with !check_may_shrink_heap. But let us
not do this - this is very difficult to reason about, and I am soon going
to propose mmap(MAP_HUGEPAGE) in Linux to do away with all these workarounds.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: factor out ld.conf parsing

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

x86: Fix tanh ifunc selection

The inclusion of generic tanh implementation without undefining the
libm_alias_double (to provide the __tanh_sse2 implementation) makes
the exported tanh symbol pointing to SSE2 variant.

Reviewed-by: DJ Delorie <dj@redhat.com>

x86_64: Add cosh with FMA

The cosh shows an improvement of about ~35% when building for
x86_64-v3.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Consolidated common definition/data for cosh/sinh/tanh

Common data definitions are moved to e_coshsinh_data, cosh only
data is moved to e_cosh_data, sinh to e_sinh_data, and tanh to
e_tanh_data.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use tanh from CORE-MATH

The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -10]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-10, -10]
* FE_TONEAREST
     0:       4059325526  94.51%
     1:        231023238   5.38%
     2:          4618531   0.11%
* FE_UPWARD
     0:       2106654900  49.05%
     1:       2145413180  49.95%
     2:         40847554   0.95%
     3:          2051661   0.05%
* FE_DOWNWARD
     0:       2106618401  49.05%
     1:       2145409958  49.95%
     2:         40880992   0.95%
     3:          2057944   0.05%
* FE_TOWARDZERO
     0:       4061659952  94.57%
     1:        221006985   5.15%
     2:         12285512   0.29%
     3:            14846   0.00%

* Range [10, DBL_MAX]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Performance-wise, it shows:

latency                      master        patched        improvement
x86_64                     109.7420       184.5950            -68.21%
x86_64v2                   109.1230       187.1890            -71.54%
x86_64v3                    99.4471        49.1104             50.62%
aarch64                     43.0474        32.2933             24.98%
armhf-vpfv4                 41.0954        35.8473             12.77%
powerpc64le                 27.3282        22.7134             16.89%

reciprocal-throughput        master        patched        improvement
x86_64                      42.5562       158.1820           -271.70%
x86_64v2                    42.5734       159.2560           -274.07%
x86_64v3                    35.9899        24.2877             32.52%
aarch64                     24.7660        22.8466              7.75%
armhf-vpfv4                 27.0251        25.8150              4.48%
powerpc64le                 11.7350        11.2504              4.13%

* x86_64:        gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64:       gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4:  gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le:   gcc version 15.2.1 20260128, POWER10

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Remove the SVID error handling from sinh

It improves throughput from 8 to 18% and latency from 1 to 10%,
dependending of the ABI.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use sinh from CORE-MATH

The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -10]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-10, -10]
* FE_TONEAREST
     0:       3169388892  73.79%
     1:       1125270674  26.20%
     2:           307729   0.01%
* FE_UPWARD
     0:       1450068660  33.76%
     1:       2146926394  49.99%
     2:        697404986  16.24%
     3:           567255   0.01%
* FE_DOWNWARD
     0:       1449727976  33.75%
     1:       2146957381  49.99%
     2:        697719649  16.25%
     3:           562289   0.01%
* FE_TOWARDZERO
     0:       2519351889  58.66%
     1:       1773434502  41.29%
     2:          2180904   0.05%

* Range [10, DBL_MAX]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Performance-wise, it shows:

latency                      master        patched        improvement
x86_64                     101.0710       129.4710            -28.10%
x86_64v2                   101.1810       127.6370            -26.15%
x86_64v3                    96.0685        48.5911             49.42%
aarch64                     41.4229        22.3971             45.93%
armhf-vpfv4                 42.8620        25.6011             40.27%
powerpc64le                 29.2630        13.1450             55.08%

reciprocal-throughput        master        patched        improvement
x86_64                      42.6895       105.7150           -147.64%
x86_64v2                    42.7255       104.7480           -145.17%
x86_64v3                    39.6949        25.9087             34.73%
aarch64                     26.0104        19.2236             26.09%
armhf-vpfv4                 29.4362        23.6350             19.71%
powerpc64le                 12.9170        8.34582             35.39%

* x86_64:        gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64:       gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4:  gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le:   gcc version 15.2.1 20260128, POWER10

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Remove the SVID error handling from cosh

It improves throughout from 3.5% to 9%.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use cosh from CORE-MATH

The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -10]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-10, -10]
* FE_TONEAREST
     0:       3291614060  76.64%
     1:       1003353235  23.36%
* FE_UPWARD
     0:       2295272497  53.44%
     1:       1999675198  46.56%
     2:            19600   0.00%
* FE_DOWNWARD
     0:       2294966533  53.43%
     1:       1999981461  46.57%
     2:            19301   0.00%
* FE_TOWARDZERO
     0:       2306015780  53.69%
     1:       1988942093  46.31%
     2:             9422   0.00%

* Range [10, DBL_MAX]
* FE_TONEAREST
     0:      10000000000 100.00%
* FE_UPWARD
     0:      10000000000 100.00%
* FE_DOWNWARD
     0:      10000000000 100.00%
* FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Performance-wise, it shows:

latency                      master        patched     improvement
x86_64                      52.1066       126.4120        -142.60%
x86_64v2                    49.5781       119.8520        -141.74%
x86_64v3                    45.0811        50.5758         -12.19%
aarch64                     19.9977        21.7814          -8.92%
armhf-vpfv4                 20.5969        27.0479         -31.32%
powerpc64le                 12.6405        13.6768          -8.20%

reciprocal-throughput        master        patched     improvement
x86_64                      18.4833        102.9120       -456.78%
x86_64v2                    17.5409        99.5179        -467.35%
x86_64v3                    18.9187        25.3662         -34.08%
aarch64                     10.9045        18.8217         -72.60%
armhf-vpfv4                 15.7430        24.0822         -52.97%
powerpc64le                  5.4275         8.1269         -49.73%

* x86_64:        gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64:       gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4:  gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le:   gcc version 15.2.1 20260128, POWER10

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>

nptl/htl: Add missing AC_PROVIDES

nptl/htl: Fix confusion over PTHREAD_IN_LIBC and __PTHREAD_NPTL/HTL

The last uses of PTHREAD_IN_LIBC is where it should have been
__PTHREAD_NPTL/HTL. The latter was not conveniently available everywhere.
Defining it from config.h makes things simpler.

nptl: Drop comment about PTHREAD_IN_LIBC

nptl is now always in libc.

elf: directly call dl_init_static_tls

htl can now have it directly in ld.so

resolv: Move libanl symbols to libc on hurd too

elf: Drop librt.so from localplt-built-dso

It's always empty now.

rt: Move librt symbols to libc on hurd too

htl: Use pthread_rwlock for libc_rwlock

Like nptl does, so we really get rwlock behavior.

mach: Add __mach_rwlock_*

We cannot use pthread_rwlock for these until we have reimplemented
pthread_rwlock with gsync, so fork __libc_rwlock off for now.

configure: Remove extra ')' from b4c110022c

configure: Fix bootstrap build after 570c46d36b (BZ 33985)

The 570c46d36b make libgcc_s to be defined for have-cc-with-libunwind=noi
(default for gcc builds) without taking into consideration that the compiler
can link against -lgcc_s (defined by have-libgcc_s).

Checked with a build-many-glibc.py for x86_64-linux-gnu.

linux: Fix aliasing violations and assert address in __check_pf (bug #33927)

The Linux implementation of __check_pf retrieves interface data via
make_request, which queries the kernel via netlink. The IFA_ADDRESS
received from the kernel's RTM_NEWADDR netlink message is (a)
type-punned via pointer-casting leading to strict aliasing violations,
and (b) dereferenced assuming that it is non-NULL.

This commit removes the strict-aliasing violations using memcpy, and
adds an assert that the address is indeed non-NULL before dereferencing
it.

Reported-by: Siteshwar Vashisht <svashisht@redhat.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

x86: Don't left shift negative values

GCC warns about this with -Wshift-negative-value:

    In file included from ../sysdeps/x86/cpu-features.c:24:
    ../sysdeps/x86/dl-cacheinfo.h: In function ‘get_common_cache_info’:
    ../sysdeps/x86/dl-cacheinfo.h:913:45: warning: left shift of negative value [-Wshift-negative-value]
      913 |                           count_mask = ~(-1 << (count_mask + 1));
          |                                             ^~
    ../sysdeps/x86/dl-cacheinfo.h:930:45: warning: left shift of negative value [-Wshift-negative-value]
      930 |                           count_mask = ~(-1 << (count_mask + 1));
          |                                             ^~

This is because C23 § 6.5.8 specifies that this is undefined behavior.
We can cast it to unsigned which would be equivelent to UINT_MAX.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Support loading libunwind instead of libgcc_s

The 'unwind-link' facility allows glibc to support thread cancellation
and exit (pthread_cancel, pthread_exiti, backtrace) by dynamically
loading the  unwind library at runtime, preventing a hard dependency on
libgcc_s within libc.so.

When building with libunwind (for clang/LLVM toolchains [1]), two
assumptions in the existing code break:

  1. The runtime library is libunwind.so instead of libgcc_s.so.

  2. libgcc relies on __gcc_personality_v0 to handle unwinding mechanics.
     libunwind exposes the standard '_Unwind_*' accessors directly.

This patch adapts `unwind-link` to handle both environments based on
the HAVE_CC_WITH_LIBUNWIND configuration:

  * The UNWIND_SONAME macro now selects between LIBGCC_S_SO and
    LIBUNWIND_SO.

  * For libgcc, it continues to resolve `__gcc_personality_v0`.

  * For libunwind, it instead resolves the standard
    _Unwind_GetLanguageSpecificData, _Unwind_SetGR, _Unwind_SetIP,
     and _Unwind_GetRegionStart helpers.

   * unwind-resume.c is updated to implement wrappers for these
     accessors that forward calls to the dynamically loaded function
     pointers, effectively shimming the unwinder.

Tests and Makefiles are updated to link against `$(libunwind)` where
appropriate.

Reviewed-by: Sam James <sam@gentoo.org>
[1] https://github.com/libunwind/libunwind

configure: Repurpose have-cc-with-libunwind for clang support

The `have-cc-with-libunwind` check (and its corresponding macro
HAVE_CC_WITH_LIBUNWIND) was historically specific to IA64, intended
to supplement libgcc with libunwind.  Since this logic is unused in
current GCC configurations, this patch repurposes it to support
clang-based toolchains that utilize LLVM's libunwind instead of
libgcc_s.

The configure script now detects if the compiler natively supports
unwinding via `-lunwind`.

Additionally, when this mode is enabled, `-lclang_rt.builtins` is
explicitly added to the `libgcc_eh` definition.  This is necessary
because `links-dso-program` otherwise fails to link due to a missing
`__gcc_personality_v0` symbol.  It appears that clang does not
automatically link the builtins providing this personality routine
when `rlink-path` is actively used during the build.

Reviewed-by: Sam James <sam@gentoo.org>

configure: Parametrize runtime libraries to support compiler-rt

Historically, the build system has hardcoded references to `-lgcc` and
`-lgcc_eh`, explicitly assuming the use of the GCC runtime.  This
prevents building glibc with alternative toolchains, specifically clang
configured with `--rtlib=compiler-rt`, where these libraries are
replaced by `libclang_rt.builtins`.

This patch introduces a mechanism to dynamically detect the compiler's
underlying runtime library.

The logic works as follows:

1. It queries the compiler using `-print-libgcc-file-name`.
2. It parses the output path to determine if `libgcc` or `compiler-rt`
   is in use.
3. Based on this detection, it parametrizes the build variables for
   the static runtime and exception handling libraries (replacing
   hardcoded `-lgcc` and `-lgcc_eh`).

This ensures that the build system correctly links against the active
compiler runtime—whether it is the traditional libgcc or LLVM's
compiler-rt—without requiring manual overrides.

Reviewed-by: Sam James <sam@gentoo.org>

malloc: Remove lingering DIAG_POP_NEEDS_COMMENT

From 0ea9ebe48ad624919d579dbe651293975fb6a699.

malloc: Cleanup warnings

Cleanup warnings - malloc builds with -Os and -Og without needing any
complex warning avoidance defines.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Document CVE-2026-3904

All branches already have a fix, so this is mainly for distributions
that may have cherry-picked the SSE2 memcmp implementation.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>

LoongArch: Optimize float environment functions

In LoongArch, fcsr1 is the alias of enables field in fcsr0, fscr3 is the
alias of RM field in fscr0. This patch use fcsr1 and fcsr3 register to
optimize fedisableexcept, feenableexcept, fegetexcept, fegetround,
fesetround, get_rounding_mode functions, which could reduce the
additional andi instruction.

nptl: Only issues __libc_unwind_link_get for SHARED

The compiler already optimizes it away for static builds.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>

x86_64: Conditionally define __sfp_handle_exceptions for compiler-rt

The LLVM compiler-rt builtins library does not currently provide an
implementation for __sfp_handle_exceptions. On x86_64, this causes
unresolved symbol errors when building glibc in environments that
exclude libgcc.

This patch implements __sfp_handle_exceptions specifically for x86_64,
bridging the gap for non-GNU compiler runtimes.

The implementation is used conditionally, only if the compiler does
not already provide the symbol.

NB: the implementation is based on libgcc and raises bosh SSE and i387
exceptions (different that the one from 460ee50de054396cc9791ff4)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

test-assert-c++-variadic.cc: Disable assert_works for GCC 14.2 and 14.1

PR118629 [1] resolved issue with usage of __PRETTY_FUNCTION__
(to which assert expands) inside unevaluated context for GCC 14.3.
This affects only versions 14.1 and 14.2, as -std=c++26 option is
supported since 14.1.

clang supports above snippet for all version that supports --std=c++26
flag (since 17.0.1).

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118629

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

libio: Properly link in function _IO_wfile_doallocate in static binaries

This patch addresses Bug 33935 - _IO_wfile_doallocate not linked correctly
when linking glibc statically.
https://sourceware.org/bugzilla/show_bug.cgi?id=33935

The function _IO_wfile_doallocate has been added with pragma weak in vtable.c,
while it is the only one symbol contained in wfiledoalloc.c,
and has not been directly called in libio.

In static binaries the true function symbol _IO_wfile_doallocate may not
be correctly linked when linking glibc with cases contains wchar functions,
but the weak symbol in vtable is linked instead,
and cause segmentation fault when running.

This patch fixes this with similar way to symbol _IO_file_doallocate,
that add libio_static_fn_required(_IO_wfile_doallocate) in wgenops.c
to make _IO_wfile_doallocate always link in static binaries.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Improve memalign alignment

Use generic stdc_bit_width to safely adapt to input types. Move rounding up of
alignments that are not powers of 2 to __libc_memalign. Simplify alignment
handling of aligned_alloc and __posix_memalign. Add a testcase for non-power
of 2 memalign and fix malloc-debug.

Reviewed-by: DJ Delorie <dj@redhat.com>

feat(rtld): Allow LD_DEBUG category exclusion

Adds support for excluding specific categories from `LD_DEBUG` output.

The `LD_DEBUG` environment variable now accepts category names prefixed
with a dash (`-`) to disable their debugging output. This allows users
to enable broad categories (e.g., `all`) while suppressing verbose or
irrelevant information from specific sub-categories (e.g., `-tls`).

The `process_dl_debug` function in `rtld.c` has been updated to parse
these exclusion options and unset the corresponding bits in
`GLRO(dl_debug_mask)`. The `LD_DEBUG=help` output has also been updated
to document this new functionality. A new test `tst-dl-debug-exclude.sh`
is added to verify the correct behavior of category exclusion.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf(tls): Add debug logging for TLS operations

This commit introduces extensive debug logging for thread-local storage
(TLS) operations within the dynamic linker. When `LD_DEBUG=tls` is
enabled, messages are printed for:
- TLS module assignment and release.
- DTV (Dynamic Thread Vector) resizing events.
- TLS block allocations and deallocations.
- `__tls_get_addr` slow path events (DTV updates, lazy allocations, and
static TLS usage).

The log format is standardized to use a "tls: " prefix and identifies
modules using the "modid %lu" convention. To aid in debugging
multithreaded applications, thread-specific logs include the Thread
Control Block (TCB) address to identify the context of the operation.

A new test module `tst-tls-debug-mod.c` and a corresponding shell script
`tst-tls-debug-recursive.sh` have been added. Additionally, the existing
`tst-dl-debug-tid` NPTL test has been updated to verify these TLS debug
messages in a multithreaded context.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: should check result of openat with -1 not 1

Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>

htl: Fix pthread_once memory ordering

We need to tie the fast-path read with the store, to make sure that when
fast-reading 1, we see all the effects performed by the init routine.

(and we don't need a full barrier, only an acquire/release pair is
needed)

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

htl: Make sure the exit path of last thread sees all thread cleanups

In case e.g. some atexit() handlers expect all threads to have finished
their side effects.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

hurd: Check for _hurdsig_preempted_set with _hurd_siglock held

Without taking _hurd_siglock, we could be missing the addition of a global
preemptor.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

htl: Call thread-specific destructors for last thread too

As required by posix.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

htl: Fix checking for mutex not being recoverable

pthread_mutex_unlock sets __owner_id to NOTRECOVERABLE_ID

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

benchtests: Adapt tanh

Random values in the range of [-4,4].

benchtests: Adapt sinh

Random values in the range of [-10,10].

benchtests: Adapt cosh

Random values in the range of [-10,10].

Fix Makefile alphabetical ordering

hurd; Fix return value for sigwait

It is supposed to return an error code, not just -1.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

hurd: Fix cleaning on sigtimedwait timing out

sigtimedwait also needs to clean up preemptors and the blocked mask before
returning EAGAIN.

Also add some sigtimedwait testing.

Linux: Only define OPEN_TREE_* macros in <sys/mount.h> if undefined (bug 33921)

There is a conditional inclusion of <linux/mount.h> earlier in the file.
If that defines the macros, do not redefine them. This addresses build
problems as the token sequence used by the UAPI macro definitions
changes between Linux versions.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

malloc: Avoid accessing /sys/kernel/mm files

On AArch64 malloc always checks /sys/kernel/mm/transparent_hugepage/enabled to
set the THP mode. However this check is quite expensive and the file may not
be accessible in containers. If DEFAULT_THP_PAGESIZE is non-zero, use
malloc_thp_mode_madvise so that we take advantage of THP in all cases. Since
madvise is a fast systemcall, it adds only a small overhead compared to the
cost of mmap and populating the pages.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

misc: Fix a few typos in comments

htl: Fix race between timedrd/wrlock and unlock

In case the rwlock is unlocked right before we time out, we will have been
given ownership, so we shouldn't time out.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

hurd: Take cancel_lock in critical section

read/write etc. shall be signal-safe, and take cancel_lock, so we have to
defer signal delivery while holding cancel_lock.

Reported-by: Brent Baccala <cosine@freesoft.org> 's Claude assistant

resolv: Avoid duplicate query if search list contains '.' (bug 33804)

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

support: no_override_resolv_conf_search flag for resolver test framework

It is required to test "search ." in /etc/resolv.conf files. The
default is to override the search path isolate from unexpected
settings in the test execution environment.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

AArch64: Improve memset when len is 64

Change the mask to 48 to support len==64. The second memory store now accesses
offset 32, whereas the third one accesses offset 16. As a result performance
for len==64 almost doubles.

malloc: Add asserts for malloc assumptions

Currently malloc has various assumptions, some documented, some implicit.
Add a few asserts to check the most fundamental assumptions using verify().
Remove some odd #define void.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>

tests: posix: use cpu clock for sleep

On some emulated targets sleep may result in inconsistent wait
times which will lead to the failure of the tst-chmod test.

To account for this we use the CLOCK_PROCESS_CPUTIME_ID clock ID
while also consuming CPU time by repeatedly calling clock_gettime.

Reviewed-by: DJ Delorie <dj@redhat.com>

assert: Support assert as variadic macro for C++26 [PR27276]

C++26 changes assert into a variadic macro to support using
assignment-expressions that would be interpreted as multiple macro
arguments, in particular one containing:
* template parameter lists: func<int, float>()
* calls to overloaded operator[] that accepts multiple arguments: arr[1, 2]
  this is C++23 feature, see libstdc++ PR/119855 [1]
* lambdas with explicit captures: [x, y] { ... }

The new expansion in form:
  (__VA_ARGS__) ? void (1 ? 1 : bool (__VA_ARGS__))
                : __assert_fail (...)
Has the following properties:
* Use of (__VA_ARGS__) ? ... : ..., requires that __VA_ARGS__
  is contextually convertible to bool. This means that enumerators
  of scoped enumeration are no longer accepted (they are only
  explicitly convertible). Thus this patch address the glibc PR/27276 [2].
* Nested ternary 1 ? 1 : bool (__VA_ARGS__) guarantees that
  expression expanded from __VA_ARGS__ is not evaluated twice.
  This is used instead of unevaluated context (like sizeof...)
  to support C++ expressions that are not allowed in unevaluated
  context (lambdas until C++20, co_await, co_yield).
* bool (__VA_ARGS__) is ill-formed if __VA_ARGS__ expands to
  multiple arguments: assert(1, 2)
* bool (__VA_ARGS__) also triggers warnings when __VA_ARGS__
  expands to x = 1: assert(x = 1)

To guarantee that the code snippets from assert/test-assert-c++-variadic.cc,
are actually checked for validity, we need to compile this test in C++26
(-std=c++26) mode. To achieve that, this patch compiles the file with
test-config-cxxflags-stdcxx26 variable as additional flag, that is set to
-std=c++26 if $(TEST_CXX) executable supports that flag, and empty otherwise.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119855
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=27276

Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

math: Sync atanh with CORE-MATH

It speeds the fast-path for |x|<0.25. The CORE-MATH muldd is the
same as muldd2 from glibc ddcoremath.h.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

math: Sync log10p1f with CORE-MATH

The new code shows a small performance increase in x86_64:

latency                      patched        sync   improvement
x86_64                       41.8873     40.2864         3.82%
x86_64v2                     40.5859     39.2079         3.40%
x86_64v3                     34.6393     33.5018         3.28%
aarch64                      15.2731     14.5953         4.44%
armhf-vpfv4                  17.0373     17.0186         0.11%
powerpc64le                   8.3341      8.3298         0.05%

reciprocal-throughput        patched        sync   improvement
x86_64                       15.6516     13.6373        12.87%
x86_64v2                     15.0551     13.2769        11.81%
x86_64v3                     12.8994     11.0628        14.24%
aarch64                       8.8306      9.1898        -4.07%
armhf-vpfv4                   9.5855     10.0199        -4.53%
powerpc64le                   4.0074      4.4466       -10.96%

x86_64 / i686      gcc version 15.2.1 20260112. Ryzen 5900X
aarch64:           gcc version 15.2.1 20251105, Neoverse-N1
armv7a-vpfv4:      gcc version 15.2.1 20251105, Neoverse-N1
powerpc64le:       gcc version 14.2.1 20241230, POWER10

The code size also show slight improvement, the s_log10pf.os
'size' output shows:

size                patched     sync   improvement
x86_64                 2345     2243         4.35%
x86_64v2               2345     2243         4.35%
x86_64v3               2226     2162         2.88%
aarch64                2104     2112        -0.38%
armhf-vpfv4            2016     2012         0.20%
powerpc64le            2324     2340        -0.69%

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

math: Sync log10f with CORE-MATH

The performance is similar:

latency                        master        sync  improvement
x86_64                        34.6851     32.7977        5.44%
x86_64v2                      34.0921     32.4295        4.88%
x86_64v3                      27.8292     27.6070        0.80%
aarch64                       11.7246     11.1351        5.03%
armhf-vpfv4                   13.3748     12.9055        3.51%
powerpc64le                    6.4036      6.5825       -2.79%

reciprocal-throughput          master        sync  improvement
x86_64                        10.2653     10.0437        2.16%
x86_64v2                      10.8432     10.7040        1.28%
x86_64v3                      10.9006     11.0765       -1.61%
aarch64                        6.6447      6.2743        5.57%
armhf-vpfv4                    6.8916      6.7538        2.00%
powerpc64le                    2.9494      2.7661        6.21%

x86_64 / i686      gcc version 15.2.1 20260112. Ryzen 5900X
aarch64:           gcc version 15.2.1 20251105, Neoverse-N1
armv7a-vpfv4:      gcc version 15.2.1 20251105, Neoverse-N1
powerpc64le:       gcc version 14.2.1 20241230, POWER10

The code size is also similar.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

math: Sync log2p1f with CORE-MATH

The new code shows better performance overall:

latency                      patched        sync   improvement
x86_64                       48.5909     33.3368        31.39%
x86_64v2                     49.1357     33.9981        30.81%
x86_64v3                     39.2397     28.0957        28.40%
aarch64                      16.5372     12.8133        22.52%
armhf-vpfv4                  18.1434     14.5273        19.93%
powerpc64le                   9.0999     7.49235        17.67%

reciprocal-throughput        patched        sync   improvement
x86_64                       14.5197     10.9726        24.43%
x86_64v2                     14.7640     11.1358        24.57%
x86_64v3                     11.5523     9.83253        14.89%
aarch64                       8.2854      7.8479         5.28%
armhf-vpfv4                   8.8586      8.5245         3.77%
powerpc64le                   3.8995      4.0069        -2.75%

x86_64 / i686      gcc version 15.2.1 20260112. Ryzen 5900X
aarch64:           gcc version 15.2.1 20251105, Neoverse-N1
armv7a-vpfv4:      gcc version 15.2.1 20251105, Neoverse-N1
powerpc64le:       gcc version 14.2.1 20241230, POWER10

The sync also improves the internal table size, the s_log1pf.os
'size' output shows:

size                         master        sync   improvement
x86_64                         3417        2089        38.86%
x86_64v2                       3417        2089        38.86%
x86_64v3                       3228        2001        38.01%
i686                           3490        2151        38.37%
aarch64                        3200        1888        41.00%
armhf-vpfv4                    3080        1804        41.43%
powerpc64le                    3408        2148        36.97%

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

math: Sync log1pf with CORE-MATH

The performance is similar, with some minor regression:

latency                      master        sync   improvement
x86_64                      38.2841     38.1375         0.38%
x86_64v2                    37.7338     37.4292         0.81%
x86_64v3                    31.3500     32.3576        -3.21%
aarch64                     13.7384     13.9030        -1.20%
armhf-vpfv4                 15.5730     16.5105        -6.02%
powerpc64le                  7.6038      7.5757         0.37%

reciprocal-throughput        master        sync   improvement
x86_64                      12.4910     11.9683         4.18%
x86_64v2                    12.2935     11.7614         4.33%
x86_64v3                    11.5444     10.6369         7.86%
aarch64                      7.7262      7.8954        -2.19%
armhf-vpfv4                  8.3502      8.8741        -6.27%
powerpc64le                  3.5883      3.5259         1.74%

x86_64 / i686      gcc version 15.2.1 20260112. Ryzen 5900X
aarch64:           gcc version 15.2.1 20251105, Neoverse-N1
armv7a-vpfv4:      gcc version 15.2.1 20251105, Neoverse-N1
powerpc64le:       gcc version 14.2.1 20241230, POWER10

The sync also improves the internal table size, the s_log1pf.os
'size' output shows:

size                         master       sync    improvement
x86_64                         2078       1641         21.03%
x86_64v2                       2078       1641         21.03%
x86_64v3                       1975       1514         23.34%
aarch64                        1808       1336         26.11%
armhf-vpfv4                    1716       1284         25.17%
powerpc64le                    2132       1616         24.20%

Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

htl: Fix mt-safeness of libio

Since d2e04918833 ("Single threaded stdio optimization")
we are supposed to call _IO_enable_locks when creating the first thread,
but that commit missed doing it for htl.

hurd: Sign-extend the sigcode passed to msg_sig_post

The negative sigcodes are reserved for SI_* values, we need to keep them
negative when extending to exc_subcode's long type.

This fixes the comparison in HURD_PREEMPT_SIGNAL_P for the signal preemptor
for setitimer, thus fixing considering it.it_interval.

Reported-by: David Yang <mmyangfl@gmail.com>

Vectorise inverse hyperbolic special cases

Vectorise SVE and AdvSIMD special-case handling for inverse
hyperbolic functions (acosh, asinh, atanh).

General-case improvements yield an average 11% speedup, with
peak gains of up to 80%.

For benchmarking I used Neoverse V2 with GCC@15.

AArch64: Single and Double precision entire exp family, SVE and AdvSIMD optimisations

This patch vectorises remaining special cases and optimises some
fast path performance for single and double precision exp, SVE and
AdvSIMD.

Moves most special case functions to header files to minimise code size.

Uses NOINLINE in main case where half width alias is used to minimise
codegen.

Special case vectorisation performance increase of average 8x to
greatest 9.5x.

Special case improvements performance increase average 15% speed
improvement to greatest 40%.

Some fast path gains during rework of files. Fastest notable increase
in exp2m1 AdvSIMD double precision of 26% improvement. Most fast
paths improved by 5-10%. 8 unchanged. No regressions.

Benchmarked on Neoverse V2 with GCC@15

manual: Document that EOPNOTSUPP and ENOTSUP are equal, not distinct (BZ 2363)

The section 2.1 of the glibc manual says that EWOULDBLOCK == EAGAIN,
but forgets to mention that ENOTSUP == EOPNOTSUPP.

https://sourceware.org/legacy-ml/libc-alpha/2019-08/msg00629.html
https://sourceware.org/bugzilla/show_bug.cgi?id=2363
https://bugs.debian.org/337013

Signed-off-by: Nicolas Boulenguez <nicolas@debian.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

AArch64: Vectorise SVE log/log2/log10 single and double precision special cases.

This patch vectorises scalar fallbacks for SVE logB(f) functions.

Special case performance increase of around 3x speedup for double
and 5x speedup for single precision.

Some fast path gains during rework of files. Most fast paths improved
by 2-3%. 2 unchanged. No regressions.

Benchmarked on Neoverse V2 with GCC@15.

AArch64: Single and Double precision hyperbolics, SVE and AdvSIMD optimisations

This patch vectorises special cases and optimises some fast path
performance for single and double precision hyperbolics, sve and
advsimd.

Special case performance increase of average 4x to greatest 10x.

Some fast path gains during rework of files. Fastest notable increase
in sinh advsimd double precision of 2x. Most fast paths improved by
5-10%.

Benchmarked on Neoverse V2 with GCC@15.

malloc: alignment might change in future versions

This follows up on a comment by Wilco Dijkstra; see:
https://sourceware.org/pipermail/libc-alpha/2026-February/174934.html
* NEWS: Mention this.
* manual/memory.texi (Malloc Examples):
Say that alignment guarantee might change for small allocations.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

Say malloc (0) != NULL is now common; resection

* manual/memory.texi (Portable Allocation):
New section, split off from Malloc Examples.
Say that almost every system follows glibc's example
in having successful malloc (0) return non-null;
AIX is the only exception nowadays.
Document fundamental alignment portability.
Have examples match the new text, and use NULL rather than 0.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Document malloc alignment

* manual/memory.texi (Malloc Examples, Changing Block Size)
(Allocating Cleared Space):
Document the alignment of the returned value.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Document max_align_t

* manual/lang.texi (Important Data Types): Mention max_align_t.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

manual: Fix typo in documentation of iconv character set options

Reported-by: Andreas Schwab <schwab@suse.de>

debug: Fix build with --enable-fortify-source=1 (BZ 33904)

The libio/bits/stdio2-decl.h only defined the prototypes for
__vasprintf_chk and __vfprintf_chk for __USE_FORTIFY_LEVEL > 1.
Also defined them for the internal header regardless.

Checked with a build with --enable-fortify-source=1 and
--enable-fortify-source=2 for all afftected ABIs.

linux/mips: handle wait status 0x7f specially for WIFSIGNALED and WIFSTOPPED

MIPS Linux has SIGRTMAX=127, thus the wait status 0x7f means the program
is terminated by SIGRTMAX, not stopped.

This cannot happen on other ports so make a special version of
waitstatus.h for MIPS to avoid adding redundant calculation to others.
I cannot find a way to use status only once in the expression, so use
inline functions instead of macros to avoid double-evaluating status.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

elf: Fix tst-link-map-contiguous-ldso build for Hurd

Checked with a make check for i686-gnu.

elf: parse /proc/self/maps as the last resort to find the gap for tst-link-map-contiguous-ldso

The initialization process of libc.so calls mmap() several times and the
kernel may lay the maps into the gap.  If all pages in the gap are
occupied, the test would not be able to find the gap with mmap() and the
test would fail.

The failure reproduces most frequently on LoongArch because with the
commonly used page size (16 KiB) the gap only contains 4 pages and the
probability they are all occupied is not near to zero.

With the changes in the patch, a test run may output:

    info: ld.so link map is not contiguous
    info: object "/dev/zero" found at 0x7ffff1fe0000 - 0x7ffff1fe4000
    info: anonymous mapping found at 0x7ffff1fe4000 - 0x7ffff1fec000

Also take the chance to fix a mistake in the "object found at" message
which has puzzled me during the initial debug session.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

tests: fix tst-rseq with Linux 7.0

A sub-test of tst-rseq is to validate the return code and errno of the
rseq syscall when attempting to register the exact same rseq area as was
done in the dynamic loader.

This involves finding the rseq area address by adding the
'__rseq_offset' to the thread pointer and calculating the area size from
the AT_RSEQ_FEATURE_SIZE auxiliary vector. However the test currently
calculates the size of the rseq area allocation in the TLS block which
must be a multiple of AT_RSEQ_ALIGN.

Up until now that happened to be the same value since the feature size
and alignment exposed by the kernel were below the minimum ABI size of
32. Starting with Linux 7.0 the feature size has reached 33 while the
alignment is now 64.

This results in the test trying to re-register the rseq area with a
different size and thus not getting the expected errno value.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

libio: Fix deadlock between freopen, fflush (NULL) and fclose (bug 24963)

The canonical lock ordering for stream list processing is
locking list_all_lock first, then individual streams as needed.
The fclose implementation reversed that, and the freopen
implementation performed list operations under the reverse order,
too.

Unlinking in fclose was already unconditional, and the early unlinking
looks unnecessary: _IO_file_close_it would call it even for
!_IO_IS_FILEBUF streams.

There is still a remaining concurrency defect because
_IO_new_file_init_internal links in the stream before it is
fully initialized, and it is not locked at this point.

Reviewed-by: Arjun Shankar <arjun@redhat.com>

hurd: Define _POSIX_TIMERS to 200809L

We now have monotonic vs realtime clocks, and high-precision.

elf: Use dl-symbol-redir-ifunc.h instead _dl_strlen

Also replace the loop with strlen And remove
-fno-tree-loop-distribute-patterns usage.

It requires redirect the strlen to the baseline implementation
for x86_64, aarch64, and loongarch64.

Checked on x86_64-linux-gnu{-v2,v3} and aarch64-linux-gnu with
both gcc-15 and clang-21.

Reviewed-by: DJ Delorie <dj@redhat.com>

posix: execvpe: skip $PATH components that are too long [BZ #33626]

* posix/execvpe.c (__execvpe_common): Rather than error out
with ENAMETOOLONG, just ignore and try the next path.
Note we know the FILE length is <= NAME_MAX, so the ENAMETOOLONG
almost certainly pertains to the current $PATH entry.
* posix/tst-execvpe7.c: A new test based on tst-execvp3.c.
* posix/Makefile: Reference the new test.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Rename __unused fields to __glibc_reserved.

__unused is often defined to __attribute__((unused)) in BSD
sources and furthermore libstdc++ testsuite uses it as a macro
to verify libstdc++ headers don't use __unused identifiers.
In ~2012 glibc headers have been cleaned up, but some new
uses of __unused have reappeared (s390 fenv.h already many
years ago, the rest last November).

Reviewed-by: Florian Weimer <fweimer@redhat.com>

aarch64: Tests for locking GCS

Check that GCS is locked properly based on the value of the
glibc.cpu.aarch64_gcs tunable.

Test tst-gcs-execv checks that a child process can be spawned correctly
when GCS is locked for the parent process.

Test tst-gcs-fork checks that if GCS is not locked for the parent
process, the forked child can disable GCS.

Tests tst-gcs-lock and tst-gcs-lock-static check that GCS is locked
for dynamic and static executables when run with aarch64_gcs=1.

Tests tst-gcs-unlock and tst-gcs-unlock-static check that GCS is not
locked for dynamic and static executables when run with aarch64_gcs=0.

Test tst-gcs-lock-ptrace checks via ptrace that when GCS is locked,
all GCS features are locked.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

aarch64: Lock GCS status at startup

If GCS is enabled (see tunable glibc.cpu.aarch64_gcs), we lock all GCS
operations (including status, write on shadow stack, and push to shadow
stack) unless OPTIONAL policy is used.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

tests: aarch64: fix makefile dependencies for dlopen tests for BTI

Some BTI tests in the sysdeps/unix/sysv/linux/aarch64 directory use
test shared objects via dlopen. Due to lack of direct makefile level
dependencies on these modules these tests could be run before the
required .so files would be created. This could lead to flaky test
results when running make check with -j flag. This commit fixes it.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>