git.ipfire.org Git - thirdparty/glibc.git/log

elf: Implement force_first handling in _dl_sort_maps_dfs (bug 28937)

The implementation in _dl_close_worker requires that the first
element of l_initfini is always this very map (“We are always the
zeroth entry, and since we don't include ourselves in the
dependency analysis start at 1.”). Rather than fixing that
assumption, this commit adds an implementation of the force_first
argument to the new dependency sorting algorithm. This also means
that the directly dlopen'ed shared object is always initialized last,
which is the least surprising behavior in the presence of cycles.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 1df71d32fe5f5905ffd5d100e5e9ca8ad6210891)

elf: Rename _dl_sort_maps parameter from skip to force_first

The new implementation will not be able to skip an arbitrary number
of objects.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit dbb75513f5cf9285c77c9e55777c5c35b653f890)

scripts/dso-ordering-test.py: Generate program run-time dependencies

The main program needs to depend on all shared objects, even objects
that have link-time dependencies among shared objects. Filtering
out shared objects that already have an link-time dependencies is not
necessary here; make will do this automatically.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 183d99737298bb3200f0610fdcd1c7549c8ed560)

elf: Fix hwcaps string size overestimation

Commit dad90d528259b669342757c37dedefa8577e2636 added glibc-hwcaps
support for LD_LIBRARY_PATH and, for this, it adjusted the total
string size required in _dl_important_hwcaps. However, in doing so
it inadvertently altered the calculation of the size required for
the power set strings, as the computation of the power set string
size depended on the first value assigned to the total variable,
which is later shifted, resulting in overallocation of string
space. Fix this now by using a different variable to hold the
string size required for glibc-hwcaps.

Signed-off-by: Javier Pello <devel@otheo.eu>
(cherry picked from commit a23820f6052a740246fdc7dcd9c43ce8eed0c45a)

elf: Run tst-audit-tlsdesc, tst-audit-tlsdesc-dlopen everywhere

The test is valid for all TLS models, but we want to make a reasonable
effort to test the GNU2 model specifically. For example, aarch64
defaults to GNU2, but does not have -mtls-dialect=gnu2, and the test
was not run there.

Suggested-by: Martin Coufal <mcoufal@redhat.com>
(cherry picked from commit dd2315a866a4ac2b838ea1cb10c5ea1c35d51a2f)

Fixes early backport commit 924e4f3eaa502ce82fccf8537f021a796d158771
("elf: Call __libc_early_init for reused namespaces (bug 29528)");
it had a wrong conflict resolution.

NEWS: Note bug 12154 and bug 29305 as fixed

resolv: Fix building tst-resolv-invalid-cname for earlier C standards

This fixes this compiler error:

tst-resolv-invalid-cname.c: In function ‘test_mode_to_string’:
tst-resolv-invalid-cname.c:164:10: error: label at end of compound statement
case test_mode_num:
^~~~~~~~~~~~~

Fixes commit 9caf782276ecea4bc86fc94fbb52779736f3106d
("resolv: Add new tst-resolv-invalid-cname").

(cherry picked from commit d09aa4a17229bcaa2ec7642006b12612498582e7)

nss_dns: Rewrite _nss_dns_gethostbyname4_r using current interfaces

Introduce struct alloc_buffer to this function, and use it and
struct ns_rr_cursor in gaih_getanswer_slice. Adjust gaih_getanswer
and gaih_getanswer_noaaaa accordingly.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 1d495912a746e2a1ffb780c9a81fd234ec2464e8)

resolv: Add new tst-resolv-invalid-cname

This test checks resolution through CNAME chains that do not contain
host names (bug 12154).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9caf782276ecea4bc86fc94fbb52779736f3106d)

nss_dns: In gaih_getanswer_slice, skip strange aliases (bug 12154)

If the name is not a host name, skip adding it to the result, instead
of reporting query failure. This fixes bug 12154 for getaddrinfo.

This commit still keeps the old parsing code, and only adjusts when
a host name is copied.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 32b599ac8c21c4c332cc3900a792a1395bca79c7)

nss_dns: Rewrite getanswer_r to match getanswer_ptr (bug 12154, bug 29305)

Allocate the pointer arrays only at the end, when their sizes
are known. This addresses bug 29305.

Skip over invalid names instead of failing lookups. This partially
fixes bug 12154 (for gethostbyname, fixing getaddrinfo requires
different changes).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit d101d836e7e4bd1d4e4972b0e0bd0a55c9b650fa)

nss_dns: Remove remnants of IPv6 address mapping

res_use_inet6 always returns false since commit 3f8b44be0a658266adff5
("resolv: Remove support for RES_USE_INET6 and the inet6 option").

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit a7fc30b522a0cd7c8c5e7e285b9531b704e02f04)

nss_dns: Rewrite _nss_dns_gethostbyaddr2_r and getanswer_ptr

The simplification takes advantage of the split from getanswer_r.
It fixes various aliases issues, and optimizes NSS buffer usage.
The new DNS packet parsing helpers are used, too.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit e32547d661a43da63368e488b6cfa9c53b4dcf92)

nss_dns: Split getanswer_ptr from getanswer_r

And expand the use of name_ok and qtype in getanswer_ptr (the
former also in getanswer_r).

After further cleanups, not much code will be shared between the
two functions.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 0dcc43e9981005540bf39dc7bf33fbab62cf9e84)

resolv: Add DNS packet parsing helpers geared towards wire format

The public parser functions around the ns_rr record type produce
textual domain names, but usually, this is not what we need while
parsing DNS packets within glibc. This commit adds two new helper
functions, __ns_rr_cursor_init and __ns_rr_cursor_next, for writing
packet parsers, and struct ns_rr_cursor, struct ns_rr_wire as
supporting types.

In theory, it is possible to avoid copying the owner name
into the rname field in __ns_rr_cursor_next, but this would need
more functions that work on compressed names.

Eventually, __res_context_send could be enhanced to preserve the
result of the packet parsing that is necessary for matching the
incoming UDP packets, so that this works does not have to be done
twice.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 857c890d9b42c50c8a94b76d47d4a61ab6d2f49c)

resolv: Add internal __ns_name_length_uncompressed function

This function is useful for checking that the question name is
uncompressed (as it should be).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 78b1a4f0e49064e5dfb686c7cd87bd4df2640b29)

resolv: Add the __ns_samebinaryname function

During packet parsing, only the binary name is available. If the name
equality check is performed before conversion to text, we can sometimes
skip the last step.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 394085a34d25a51513019a4dc411acd3527fbd33)

resolv: Add internal __res_binary_hnok function

During package parsing, only the binary representation is available,
and it is convenient to check that directly for conformance with host
name requirements.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit c79327bf00a4be6d60259227acc78ef80ead3622)

resolv: Add tst-resolv-aliases

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 87aa98aa80627553a66bdcad2701fd6307723645)

resolv: Add tst-resolv-byaddr for testing reverse lookup

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 0b99828d54e5d1fc8f5ad3edf5ba262ad2e9c5b0)

nscd: Fix netlink cache invalidation if epoll is used [BZ #29415]

Processes cache network interface information such as whether IPv4 or IPv6
are enabled. This is only checked again if the "netlink timestamp" provided
by nscd changed, which is triggered by netlink socket activity.

However, in the epoll handler for the netlink socket, it was missed to
assign the new timestamp to the nscd database. The handler for plain poll
did that properly, copy that over.

This bug caused that e.g. processes which started before network
configuration got unusuable addresses from getaddrinfo, like IPv6 only even
though only IPv4 is available:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1041

It's a bit hard to reproduce, so I verified this by checking the timestamp
on calls to __check_pf manually. Without this patch it's stuck at 1, now
it's increasing on network changes as expected.

Signed-off-by: Fabian Vogt <fvogt@suse.de>
(cherry picked from commit 02ca25fef2785974011e9c5beecc99b900b69fd7)

Add NEWS entry for CVE-2022-39046

(cherry picked from commit 76fe56020e7ef354685b2284580ac1630c078a2b)

syslog: Remove extra whitespace between timestamp and message (BZ#29544)

The rfc3164 clear states that a single space character must follow
the timestamp field.

Checked on x86_64-linux-gnu.

elf: Restore how vDSO dependency is printed with LD_TRACE_LOADED_OBJECTS (BZ #29539)

The d7703d3176d225d5743b21811d888619eba39e82 changed how vDSO like
dependencies are printed, instead of just the name and address it
follows other libraries mode and prints 'name => path'.

Unfortunately, this broke some ldd consumer that uses the output to
filter out the program's dependencies. For instance CMake
bundleutilities module [1], where GetPrequirite uses the regex to filter
out 'name => path' [2].

This patch restore the previous way to print just the name and the
mapping address.

Checked on x86_64-linux-gnu.

[1] https://github.com/Kitware/CMake/tree/master/Tests/BundleUtilities
[2] https://github.com/Kitware/CMake/blob/master/Modules/GetPrerequisites.cmake#L733

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 1e903124cec4492463d075c6c061a2a772db77bf)

Apply asm redirections in wchar.h before first use

Similar to d0fa09a770, but for wchar.h. Fixes [BZ #27087] by applying
all long double related asm redirections before using functions in
bits/wchar2.h.
Moves the function declarations from wcsmbs/bits/wchar2.h to a new file
wcsmbs/bits/wchar2-decl.h that will be included first in wcsmbs/wchar.h.

Tested with build-many-glibcs.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit c7509d49c4e8fa494120c5ead21338559dad16f5)

elf: Call __libc_early_init for reused namespaces (bug 29528)

libc_map is never reset to NULL, neither during dlclose nor on a
dlopen call which reuses the namespace structure. As a result, if a
namespace is reused, its libc is not initialized properly. The most
visible result is a crash in the <ctype.h> functions.

To prevent similar bugs on namespace reuse from surfacing,
unconditionally initialize the chosen namespace to zero using memset.

(cherry picked from commit d0e357ff45a75553dee3b17ed7d303bfa544f6fe)

syslog: Fix large messages (BZ#29536)

The a583b6add407c17cd change did not handle large messages that
would require a heap allocation correctly, where the message itself
is not take in consideration.

This patch fixes it and extend the tst-syslog to check for large
messages as well.

Checked on x86_64-linux-gnu.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 52a5be0df411ef3ff45c10c7c308cb92993d15b1)

Linux: Fix enum fsconfig_command detection in <sys/mount.h>

The #ifdef FSOPEN_CLOEXEC check did not work because the macro
was always defined in this header prior to the check, so that
the <linux/mount.h> contents did not matter.

Fixes commit 774058d72942249f71d74e7f2b639f77184160a6
("linux: Fix sys/mount.h usage with kernel headers").

(cherry picked from commit 2955ef4b7c9b56fcd7abfeddef7ee83c60abff98)

linux: Fix sys/mount.h usage with kernel headers

Now that kernel exports linux/mount.h and includes it on linux/fs.h,
its definitions might clash with glibc exports sys/mount.h.  To avoid
the need to rearrange the Linux header to be always after glibc one,
the glibc sys/mount.h is changed to:

  1. Undefine the macros also used as enum constants.  This covers prior
     inclusion of <linux/mount.h> (for instance MS_RDONLY).

  2. Include <linux/mount.h> based on the usual __has_include check
     (needs to use __has_include ("linux/mount.h") to paper over GCC
     bugs.

  3. Define enum fsconfig_command only if FSOPEN_CLOEXEC is not defined.
     (FSOPEN_CLOEXEC should be a very close proxy.)

  4. Define struct mount_attr if MOUNT_ATTR_SIZE_VER0 is not defined.
     (Added in the same commit on the Linux side.)

This patch also adds some tests to check if including linux/fs.h and
linux/mount.h after and before sys/mount.h does work.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 774058d72942249f71d74e7f2b639f77184160a6)

linux: Use compile_c_snippet to check linux/mount.h availability

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit e1226cdc6b209539a92d32d5b620ba53fd35abf3)

linux: Mimic kernel defition for BLOCK_SIZE

To avoid possible warnings if the kernel header is included before
sys/mount.h.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit c68b6044bc7945716431f1adc091b17c39b80a06)

linux: Use compile_c_snippet to check linux/pidfd.h availability

Instead of tying to a specific kernel version.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 1542019b69b7ec7b2cd34357af035e406d153631)

glibcextract.py: Add compile_c_snippet

It might be used on tests to check if a snippet build with the provided
compiler and flags.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 841afa116e32b3c7195475769c26bf46fd870d32)

NEWS: Add entry for bug 28846

socket: Check lengths before advancing pointer in CMSG_NXTHDR

The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space. Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check. This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.

The Linux, Hurd, and generic headers are all changed.

Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.

[BZ #28846]

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9c443ac4559a47ed99859bd80d14dc4b6dd220a1)

alpha: Fix generic brk system call emulation in __brk_call (bug 29490)

The kernel special-cases the zero argument for alpha brk, and we can
use that to restore the generic Linux error handling behavior.

Fixes commit b57ab258c1140bc45464b4b9908713e3e0ee35aa ("Linux:
Introduce __brk_call for invoking the brk system call").

(cherry picked from commit e7ad26ee3cb74e61d0637c888f24dd478d77af58)

Linux: Terminate subprocess on late failure in tst-pidfd (bug 29485)

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f82e05ebb295cadd35f7372f652c72264da810ad)

elf: Replace `strcpy` call with `memcpy` [BZ #29454]

GCC normally does this optimization for us in
strlen_pass::handle_builtin_strcpy but only for optimized
build. To avoid needing to include strcpy.S in the rtld build to
support the debug build, just do the optimization by hand.

(cherry picked from commit 483cfe1a6a33d6335b1901581b41040d2d412511)

Update syscall lists for Linux 5.19

Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V. Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.

(cherry picked from commit fccadcdf5bed7ee67a6cef4714e0b477d6c8472c)

dlfcn: Pass caller pointer to static dlopen implementation (bug 29446)

Fixes commit 0c1c3a771eceec46e66ce1183cf988e2303bd373 ("dlfcn: Move
dlopen into libc").

(cherry picked from commit ed0185e4129130cbe081c221efb758fb400623ce)

wcsmbs: Add missing test-c8rtomb/test-mbrtoc8 dependency

Make test-c8rtomb.out and test-mbrtoc8.out depend on $(gen-locales) for

xsetlocale (LC_ALL, "de_DE.UTF-8");
xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit e03f5ccd6cc8f829416156eac75acee501626c1f)

stdlib: Suppress gcc diagnostic that char8_t is a keyword in C++20 in uchar.h.

gcc 13 issues the following diagnostic for the uchar.h header when the
-Wc++20-compat option is enabled in C++ modes that do not enable char8_t
as a builtin type (C++17 and earlier by default; subject to _GNU_SOURCE
and the gcc -f[no-]char8_t option).
  warning: identifier ‘char8_t’ is a keyword in C++20 [-Wc++20-compat]
This change modifies the uchar.h header to suppress the diagnostic through
the use of '#pragma GCC diagnostic' directives for gcc 10 and later (the
-Wc++20-compat option was added in gcc version 10).  Unfortunately, a bug
in gcc currently prevents those directives from having the intended effect
as reported at https://gcc.gnu.org/PR106423.  A patch for that issue has
been submitted and is available in the email thread archive linked below.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598736.html

(cherry picked from commit 825f84f133bd840347dc49229b6d831f07d04775)

Create ChangeLog.old/ChangeLog.25.

Prepare for glibc 2.36 release.

Update version.h, and include/features.h.

Update install.texi, and regenerate INSTALL.

Update NEWS bug list.

Update libc.pot for 2.36 release.

tst-pidfd.c: UNSUPPORTED if we get EPERM on valid pidfd_getfd call

pidfd_getfd can fail for a valid pidfd with errno EPERM for various
reasons in a restricted environment. Use FAIL_UNSUPPORTED in that case.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

stdlib: Tuned down tst-arc4random-thread internal parameters

With new arc4random implementation, the internal parameters might
require a lot of runtime and/or trigger some contention on older
kernels (which might trigger spurious timeout failures).

Also, since we are now testing getrandom entropy instead of an
userspace RNG, there is no much need to extensive testing.

With this change the tst-arc4random-thread goes from about 1m to
5s on a Ryzen 9 with 5.15.0-41-generic.

Checked on x86_64-linux-gnu.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

LoongArch: Add greg_t and gregset_t.

LoongArch: Fix VDSO_HASH and VDSO_NAME.

riscv: Update rv64 libm test ulps

Generated on a Microsemi Polarfire Icicle Kit running Linux version
5.15.32. Same ULPs were also produced on QEMU 5.2.0 running Linux
5.18.0.

riscv: Update nofpu libm test ulps

arc4random: simplify design for better safety

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

LoongArch: Update NEWS and README for the LoongArch port.

LoongArch: Update build-many-glibcs.py for the LoongArch Port.

LoongArch: Hard Float Support

LoongArch: Build Infrastructure

LoongArch: Add ABI Lists

LoongArch: Linux ABI

LoongArch: Linux Syscall Interface

LoongArch: Atomic and Locking Routines

LoongArch: Generic <math.h> and soft-fp Routines

LoongArch: Thread-Local Storage Support

LoongArch: ABI Implementation

LoongArch: Add relocations and ELF flags to elf.h and scripts/glibcelf.py

LoongArch: Add LoongArch entries to config.h.in

struct stat is not posix conformant on microblaze with __USE_FILE_OFFSET64

Commit a06b40cdf5ba0d2ab4f9b4c77d21e45ff284fac7 updated stat.h to use
__USE_XOPEN2K8 instead of __USE_MISC to add the st_atim, st_mtim and
st_ctim members to struct stat. However, for microblaze, there are two
definitions of struct stat, depending on the __USE_FILE_OFFSET64 macro.
The second one was not updated.

Change __USE_MISC to __USE_XOPEN2K8 in the __USE_FILE_OFFSET64 version
of struct stat for microblaze.

Linux: dirent/tst-readdir64-compat needs to use TEST_COMPAT (bug 27654)

The hppa port starts libc at GLIBC_2.2, but has earlier symbol
versions in other shared objects. This means that the compat
symbol for readdir64 is not actually present in libc even though
have-GLIBC_2.1.3 is defined as yes at the make level.

Fixes commit 15e50e6c966fa0f26612602a95f0129543d9f9d5 ("Linux:
dirent/tst-readdir64-compat can be a regular test") by mostly
reverting it.

manual: Add documentation for arc4random functions

s390x: Add optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-s390x.S.  The final state register clearing is
omitted.

On a z15 it shows the following improvements (using formatted
bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               198.92
arc4random_buf(16) [single-thread]       244.49
arc4random_buf(32) [single-thread]       282.73
arc4random_buf(48) [single-thread]       286.64
arc4random_buf(64) [single-thread]       320.06
arc4random_buf(80) [single-thread]       297.43
arc4random_buf(96) [single-thread]       310.96
arc4random_buf(112) [single-thread]      308.10
arc4random_buf(128) [single-thread]      309.90
-----------------------------------------------

VX.                                        MB/s
-----------------------------------------------
arc4random [single-thread]               430.26
arc4random_buf(16) [single-thread]       735.14
arc4random_buf(32) [single-thread]      1029.99
arc4random_buf(48) [single-thread]      1206.76
arc4random_buf(64) [single-thread]      1311.92
arc4random_buf(80) [single-thread]      1378.74
arc4random_buf(96) [single-thread]      1445.06
arc4random_buf(112) [single-thread]     1484.32
arc4random_buf(128) [single-thread]     1517.30
-----------------------------------------------

Checked on s390x-linux-gnu.

powerpc64: Add optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c.  It targets POWER8 and it is used on default
for LE.

On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):

POWER8

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               138.77
arc4random_buf(16) [single-thread]       174.36
arc4random_buf(32) [single-thread]       228.11
arc4random_buf(48) [single-thread]       252.31
arc4random_buf(64) [single-thread]       270.11
arc4random_buf(80) [single-thread]       278.97
arc4random_buf(96) [single-thread]       287.78
arc4random_buf(112) [single-thread]      291.92
arc4random_buf(128) [single-thread]      295.25

POWER8                                     MB/s
-----------------------------------------------
arc4random [single-thread]               198.06
arc4random_buf(16) [single-thread]       278.79
arc4random_buf(32) [single-thread]       448.89
arc4random_buf(48) [single-thread]       551.09
arc4random_buf(64) [single-thread]       646.12
arc4random_buf(80) [single-thread]       698.04
arc4random_buf(96) [single-thread]       756.06
arc4random_buf(112) [single-thread]      784.12
arc4random_buf(128) [single-thread]      808.04
-----------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>

x86: Add AVX2 optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

AVX2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               922.61
arc4random_buf(16) [single-thread]      1478.70
arc4random_buf(32) [single-thread]      2241.80
arc4random_buf(48) [single-thread]      2681.28
arc4random_buf(64) [single-thread]      2913.43
arc4random_buf(80) [single-thread]      3009.73
arc4random_buf(96) [single-thread]      3141.16
arc4random_buf(112) [single-thread]     3254.46
arc4random_buf(128) [single-thread]     3305.02
-----------------------------------------------

Checked on x86_64-linux-gnu.

x86: Add SSE2 optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S.  It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.

As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.

aarch64: Add optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used as default and only
little-endian is supported (BE uses generic code).

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               380.89
arc4random_buf(16) [single-thread]       500.73
arc4random_buf(32) [single-thread]       552.61
arc4random_buf(48) [single-thread]       566.82
arc4random_buf(64) [single-thread]       574.01
arc4random_buf(80) [single-thread]       581.02
arc4random_buf(96) [single-thread]       591.19
arc4random_buf(112) [single-thread]      592.29
arc4random_buf(128) [single-thread]      596.43
-----------------------------------------------

OPTIMIZED                                  MB/s
-----------------------------------------------
arc4random [single-thread]               569.60
arc4random_buf(16) [single-thread]       825.78
arc4random_buf(32) [single-thread]       987.03
arc4random_buf(48) [single-thread]      1042.39
arc4random_buf(64) [single-thread]      1075.50
arc4random_buf(80) [single-thread]      1094.68
arc4random_buf(96) [single-thread]      1130.16
arc4random_buf(112) [single-thread]     1129.58
arc4random_buf(128) [single-thread]     1137.91
-----------------------------------------------

Checked on aarch64-linux-gnu.

benchtests: Add arc4random benchtest

It shows both throughput (total bytes obtained in the test duration)
and latecy for both arc4random and arc4random_buf with different
sizes.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

stdlib: Add arc4random tests

The basic tst-arc4random-chacha20.c checks if the output of ChaCha20
implementation matches the reference test vectors from RFC8439.

The tst-arc4random-fork.c check if subprocesses generate distinct
streams of randomness (if fork handling is done correctly).

The tst-arc4random-stats.c is a statistical test to the randomness of
arc4random, arc4random_buf, and arc4random_uniform.

The tst-arc4random-thread.c check if threads generate distinct streams
of randomness (if function are thread-safe).

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.

stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)

The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.

To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.  This strategy is similar to what OpenBSD arc4random
does.

The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.

The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit.  It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte.  Depending of the upper bound requested,
it might lead to better CPU utilization.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf

locale: Optimize tst-localedef-path-norm

The locale generation are issues in parallel to try speed locale
generation. The maximum number of jobs are limited to the online
CPU (in hope to not overcommit on environments with lower cores
than tests).

On a Ryzen 9, the test execution improves from ~6.7s to ~1.4s.

Tested-by: Mark Wielaard <mark@klomp.org>

malloc: Simplify implementation of __malloc_assert

It is prudent not to run too much code after detecting heap
corruption, and __fxprintf is really complex. The line number
and file name do not carry much information, so it is not included
in the error message. (__libc_message only supports %s formatting.)
The function name and assertion should provide some context.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Update scripts/config.* files from upstream GNU config version

This patch updates various miscellaneous files from their upstream
sources (version 2022-05-25).

It is required for loongarch support.

Checked on aarch64-linux-gnu.

linux: return UNSUPPORTED from tst-mount if entering mount namespace fails

Before this the test fails if run in a chroot by a non-root user:

warning: could not become root outside namespace (Operation not permitted)
../sysdeps/unix/sysv/linux/tst-mount.c:36: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 19 (0x13); from: ENODEV
error: ../sysdeps/unix/sysv/linux/tst-mount.c:39: not true: fd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:46: not true: r != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:48: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:52: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 9 (0x9); from: EBADF
error: ../sysdeps/unix/sysv/linux/tst-mount.c:55: not true: mfd != -1
../sysdeps/unix/sysv/linux/tst-mount.c:58: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:61: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:65: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:68: not true: pfd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:75: not true: fd_tree != -1
../sysdeps/unix/sysv/linux/tst-mount.c:88: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 38 (0x26); from: ENOSYS
error: 12 test failures

Checking that the test can enter a new mount namespace is more correct
than just checking the return value of support_become_root() as the test
code changes the mount namespace it runs in so running it as root on a
system that does not support mount namespaces should still skip.

Also change the test to remove the unnecessary fork.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level

1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcpy-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcpy-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.

x86: Add support to build wcscpy with explicit ISA level

1. Add ISA level build guards to different implementations.
    - wcscpy-ssse3.S is used as ISA level 2/3/4.
    - wcscpy-generic.c is only used at ISA level 1 and will
      only build if compiled with ISA level == 1. Otherwise
      there is no reason to include it as we will always use
      wcscpy-ssse3.S

2. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.

x86: Add support to build strcmp/strlen/strchr with explicit ISA level

1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcmp-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcmp-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.

elf: Fix wrong fscanf usage on tst-pldd

The fix done b2cd93fce666fdc8c9a5c64af2741a8a6940ac99 does not really
work since macro strification does not expand the sizeof nor the
arithmetic operation.

Checked on x86_64-linux-gnu.

Apply asm redirections in stdio.h before first use [BZ #27087]

Compilers may not be able to apply asm redirections to functions after
these functions are used for the first time, e.g. clang 13.
Fix [BZ #27087] by applying all long double-related asm redirections
before using functions in bits/stdio.h.
However, as these asm redirections depend on the declarations provided
by libio/bits/stdio2.h, this header was split in 2:

- libio/bits/stdio2-decl.h contains all function declarations;
- libio/bits/stdio2.h remains with the remaining contents, including
redirections.

This also adds the access attribute to __vsnprintf_chk that was missing.

Tested with build-many-glibcs.py.

Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>

S390: Define SINGLE_THREAD_BY_GLOBAL only on s390x

Starting with commit e070501d12b47e88c1ff8c313f887976fb578938
"Replace __libc_multiple_threads with __libc_single_threaded"
the testcases nptl/tst-cancel-self and
nptl/tst-cancel-self-cancelstate are failing.

This is fixed by only defining SINGLE_THREAD_BY_GLOBAL on s390x,
but not on s390.

Starting with commit 09c76a74099826f4c6e1c4c431d7659f78112862
"Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition",
SINGLE_THREAD_BY_GLOBAL was defined in
sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h.

Lateron the commit 9a973da617772eff1f351989f8995f4305a2e63c
"s390: Consolidate Linux syscall definition" consolidates the sysdep.h files
from s390-32/s390-64 subdirectories. Unfortunately the macro is now always
defined instead of only on s390-64.

As information:
TLS_MULTIPLE_THREADS_IN_TCB is also only defined for s390.
See: sysdeps/s390/nptl/tls.h

x86: Add missing rtm tests for strcmp family

Add new tests for:
    strcasecmp
    strncasecmp
    strcmp
    wcscmp

These functions all have avx2_rtm implementations so should be tested.

x86: Remove unneeded rtld-wmemcmp

wmemcmp isn't used by the dynamic loader so their no need to add an
RTLD stub for it.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.

x86: Move wcslen SSE2 implementation to multiarch/wcslen-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move wcschr SSE2 implementation to multiarch/wcschr-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strcat SSE2 implementation to multiarch/strcat-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strchr SSE2 implementation to multiarch/strchr-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strrchr SSE2 implementation to multiarch/strrchr-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move memrchr SSE2 implementation to multiarch/memrchr-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strcpy SSE2 implementation to multiarch/strcpy-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strlen SSE2 implementation to multiarch/strlen-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move strcmp SSE42 implementation to multiarch/strcmp-sse4_2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.

x86: Move wcscmp SSE2 implementation to multiarch/wcscmp-sse2.S

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.