git.ipfire.org Git - thirdparty/glibc.git/log

aio: Fix freeing memory

The content of the pool array is initialized only until pool_size,
pointers between pool_size and pool_max_size were not initialized by the
realloc call in get_elem so they should not be freed.

This fixes aio tests crashing at their termination on GNU/Hurd.

(cherry picked from commit 0cee4aa92f5b9b213856c8ba1ab84c34d73c943b)

misc: Add support for Linux uio.h RWF_NOAPPEND flag

In Linux 6.9 a new flag is added to allow for Per-io operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.

This caused two test failures as these tests expected the flag 0x00000020
to be unused.  Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).

  FAIL: misc/tst-preadvwritev2
  FAIL: misc/tst-preadvwritev64v2

This patch adds the flag, adjusts the test and adds details to
documentation.

Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3db9d208dd5f30b12900989c6d2214782b8e2011)

Linux: Make __rseq_size useful for feature detection (bug 31965)

The __rseq_size value is now the active area of struct rseq
(so 20 initially), not the full struct size including padding
at the end (32 initially).

Update misc/tst-rseq to print some additional diagnostics.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
(cherry picked from commit 2e456ccf0c34a056e3ccafac4a0c7effef14d918)

elf: Make dl-rseq-symbols Linux only

And avoid a Hurd build failures.

Checked on x86_64-linux-gnu.

(cherry picked from commit 9fc639f654dc004736836613be703e6bed0c36a8)

nptl: fix potential merge of __rseq_* relro symbols

While working on a patch to add support for the extensible rseq ABI, we
came across an issue where a new 'const' variable would be merged with
the existing '__rseq_size' variable. We tracked this to the use of
'-fmerge-all-constants' which allows the compiler to merge identical
constant variables. This means that all 'const' variables in a compile
unit that are of the same size and are initialized to the same value can
be merged.

In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t'
are both 4 bytes and initialized to 0 which should trigger the merge.
However for reasons we haven't delved into when the attribute 'section
(".data.rel.ro")' is added to the mix, only variables of the same exact
types are merged. As far as we know this behavior is not specified
anywhere and could change with a new compiler version, hence this patch.

Move the definitions of these variables into an assembler file and add
hidden writable aliases for internal use. This has the added bonus of
removing the asm workaround to set the values on rseq registration.

Tested on Debian 12 with GCC 12.2.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 2b92982e2369d292560793bee8e730f695f48ff3)

Add AT_RSEQ_* from Linux 6.3 to elf.h

Linux 6.3 adds constants AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN; add
them to glibc's elf.h. (Recall that, although elf.h is a
system-independent header, so far we've put AT_* constants there even
if Linux-specific, as discussed in bug 15794. So rather than making
any attempt to fix that issue, the new constants are just added there
alongside the existing ones.)

Tested for x86_64.

(cherry picked from commit 8754a4133e154ca853e6765a3fe5c7a904c77626)

s390x: Fix segfault in wcsncmp [BZ #31934]

The z13/vector-optimized wcsncmp implementation segfaults if n=1
and there is only one character (equal on both strings) before
the page end. Then it loads and compares one character and misses
to check n again. The following load fails.

This patch removes the extra load and compare of the first character
and just start with the loop which uses vector-load-to-block-boundary.
This code-path also checks n.

With this patch both tests are passing:
- the simplified one mentioned in the bugzilla 31934
- the full one in Florian Weimer's patch:
"manual: Document a GNU extension for strncmp/wcsncmp"
(https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/):
On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9b7651410375ec8848a1944992d663d514db4ba7)

Force DT_RPATH for --enable-hardcoded-path-in-tests

On Fedora 40/x86-64, linker enables --enable-new-dtags by default which
generates DT_RUNPATH instead of DT_RPATH.  Unlike DT_RPATH, DT_RUNPATH
only applies to DT_NEEDED entries in the executable and doesn't applies
to DT_NEEDED entries in shared libraries which are loaded via DT_NEEDED
entries in the executable.  Some glibc tests have libstdc++.so.6 in
DT_NEEDED, which has libm.so.6 in DT_NEEDED.  When DT_RUNPATH is generated,
/lib64/libm.so.6 is loaded for such tests.  If the newly built glibc is
older than glibc 2.36, these tests fail with

assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_2.36' not found (required by /lib64/libm.so.6)
assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by /lib64/libm.so.6)

Pass -Wl,--disable-new-dtags to linker when building glibc tests with
--enable-hardcoded-path-in-tests.  This fixes BZ #31719.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 2dcaf70643710e22f92a351e36e3cff8b48c60dc)

elf: Disable some subtests of ifuncmain1, ifuncmain5 for !PIE

(cherry picked from commit 9cc9d61ee12f2f8620d8e0ea3c42af02bf07fe1e)

nscd: Use time_t for return type of addgetnetgrentX

Using int may give false results for future dates (timeouts after the
year 2028).

Fixes commit 04a21e050d64a1193a6daab872bca2528bda44b ("CVE-2024-33601,
CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX
(bug 31680)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4bbca1a44691a6e9adcee5c6798a707b626bc331)

login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)

These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9abdae94c7454c45e02e97e4ed1eb1b1915d13d8)

login: Check default sizes of structs utmp, utmpx, lastlog

The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)

sparc: Remove 64 bit check on sparc32 wordsize (BZ 27574)

The sparc32 is always 32 bits.

Checked on sparcv9-linux-gnu.

(cherry picked from commit dd57f5e7b652772499cb220d78157c1038d24f06)

elf: Also compile dl-misc.os with $(rtld-early-cflags)

Also compile dl-misc.os with $(rtld-early-cflags) to avoid

Program received signal SIGILL, Illegal instruction.
0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
    endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
156   bool positive = true;
(gdb) bt
#0  0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
    endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
#1  0x00007ffff7fdb1a9 in tunable_initialize (
    cur=cur@entry=0x7ffff7ffbc00 <tunable_list+2176>,
    strval=strval@entry=0x7fffffffe2c9 "2", len=len@entry=1)
    at dl-tunables.c:131
#2  0x00007ffff7fdb3a2 in parse_tunables (valstring=<optimized out>)
    at dl-tunables.c:258
#3  0x00007ffff7fdb5d9 in __GI___tunables_init (envp=0x7fffffffdd58)
    at dl-tunables.c:288
#4  0x00007ffff7fe44c3 in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7fffffffdcb0,
    dl_main=dl_main@entry=0x7ffff7fe5f80 <dl_main>)
    at ../sysdeps/unix/sysv/linux/dl-sysdep.c:110
#5  0x00007ffff7fe5cae in _dl_start_final (arg=0x7fffffffdcb0) at rtld.c:494
#6  _dl_start (arg=0x7fffffffdcb0) at rtld.c:581
#7  0x00007ffff7fe4b38 in _start ()
(gdb)

when setting GLIBC_TUNABLES in glibc compiled with APX.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 049b7684c912dd32b67b1b15b0f43bf07d5f512e)

CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)

This avoids potential memory corruption when the underlying NSS
callback function does not use the buffer space to store all strings
(e.g., for constant strings).

Instead of custom buffer management, two scratch buffers are used.
This increases stack usage somewhat.

Scratch buffer allocation failure is handled by return -1
(an invalid timeout value) instead of terminating the process.
This fixes bug 31679.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit c04a21e050d64a1193a6daab872bca2528bda44b)

CVE-2024-33600: nscd: Avoid null pointer crashes after notfound response (bug 31678)

The addgetnetgrentX call in addinnetgrX may have failed to produce
a result, so the result variable in addinnetgrX can be NULL.
Use db->negtimeout as the fallback value if there is no result data;
the timeout is also overwritten below.

Also avoid sending a second not-found response. (The client
disconnects after receiving the first response, so the data stream did
not go out of sync even without this fix.) It is still beneficial to
add the negative response to the mapping, so that the client can get
it from there in the future, instead of going through the socket.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit b048a482f088e53144d26a61c390bed0210f49f2)

CVE-2024-33600: nscd: Do not send missing not-found response in addgetnetgrentX (bug 31678)

If we failed to add a not-found response to the cache, the dataset
point can be null, resulting in a null pointer dereference.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 7835b00dbce53c3c87bbbb1754a95fb5e58187aa)

CVE-2024-33599: nscd: Stack-based buffer overflow in netgroup cache (bug 31677)

Using alloca matches what other caches do. The request length is
bounded by MAXKEYLEN.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 87801a8fd06db1d654eea3e4f7626ff476a9bdaa)

iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961)

ISO-2022-CN-EXT uses escape sequences to indicate character set changes
(as specified by RFC 1922). While the SOdesignation has the expected
bounds checks, neither SS2designation nor SS3designation have its;
allowing a write overflow of 1, 2, or 3 bytes with fixed values:
'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.

Checked on aarch64-linux-gnu.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)

powerpc: Fix ld.so address determination for PCREL mode (bug 31640)

This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
(cherry picked from commit 14e56bd4ce15ac2d1cc43f762eb2e6b83fec1afe)

AArch64: Check kernel version for SVE ifuncs

Old Linux kernels disable SVE after every system call.  Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy.  This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched.  Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.

Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions.  If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.

Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)

aarch64: fix check for SVE support in assembler

Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.

The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 73c26018ed0ecd9c807bb363cc2c2ab4aca66a82)

aarch64: correct CFI in rawmemchr (bug 31113)

The .cfi_return_column directive changes the return column for the whole
FDE range. But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.

(cherry picked from commit 3f798427884fa57770e8e2291cf58d5918254bb5)

AArch64: Remove Falkor memcpy

The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.

[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2f5524cc5381eb75fef55f7901bb907bd5628333)

AArch64: Add memset_zva64

Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3d7090f14b13312320e425b27dcf0fe72de026fd)

AArch64: Cleanup emag memset

Cleanup emag memset - merge the memset_base64.S file, remove
the unused ZVA code (since it is disabled on emag).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9627ab99b50d250c6dd3001a3355aa03692f7fe5)

AArch64: Cleanup ifuncs

Cleanup ifuncs.  Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation.  Rename
strlen_mte to strlen_generic.  Remove rtld-memset.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 9fd3409842b3e2d31cff5dbd6f96066c430f0aa2)

AArch64: Add support for MOPS memcpy/memmove/memset

Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2bd00179885928fd95fcabfafc50e7b5c6e660d2)

Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.h

Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to
glibc's bits/hwcap.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.

(cherry picked from commit ff5d2abd18629e0efac41e31699cdff3be0e08fa)

AArch64: Improve SVE memcpy and memmove

Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d2d3f3720ce627a4fe154d8dd14db716a32bcc6e)

AArch64: Improve strrchr

Use shrn for narrowing the mask which simplifies code and speeds up small
strings. Unroll the first search loop to improve performance on large
strings.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 55599d480437dcf129b41b95be32b48f2a9e5da9)

AArch64: Optimize strnlen

Optimize strnlen using the shrn instruction and improve the main loop.
Small strings are around 10% faster, large strings are 40% faster on
modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ad098893ba3c3344a5f2f6ab1627c47204afdb47)

AArch64: Optimize strlen

Optimize strlen by unrolling the main loop. Large strings are 64% faster on
modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 03c8ce5000198947a4dd7b2c14e5131738fda62b)

AArch64: Optimize strcpy

Unroll the main loop. Large strings are around 20% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 349e48c01e85bd96006860084e76d322e6ca02f1)

AArch64: Improve strchrnul

Unroll the main loop, which improves performance slightly.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 09ebd8549b2ce5a3a6c0c7c5f3e62227faf50a99)

AArch64: Optimize strchr

Simplify calculation of the mask using shrn. Unroll the main loop.
Small strings are 20% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 51541a229740801882490177fa178e49264b13fb)

AArch64: Improve strlen_asimd

Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
Performance improves slightly as a result.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1bbb1a2022e126f21810d3d0ebe0a975d5243e43)

AArch64: Optimize memrchr

Optimize the main loop - large strings are 43% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 00776241776e67fc666b896c1e85770f4f3ec1e1)

AArch64: Optimize memchr

Optimize the main loop - large strings are 40% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ce758d4f063820c2bc743e12797d7454c66be718)

aarch64: Use memcpy_simd as the default memcpy

Since __memcpy_simd is the fastest memcpy on almost all cores, replace
the generic memcpy with it. If SVE is available, a SVE memcpy will be
used by default (including for Neoverse N2).

(cherry picked from commit e6f3fe362f1aab78b1448d69ecdbd9e3872636d3)

aarch64: Cleanup memset ifunc

Cleanup memset ifunc selectors. The A64FX memset relies on a ZVA size of
256, so add an explicit check.

(cherry picked from commit a8e72913fea0c6e2832c50523c60907ffa3b753b)

AArch64: Fix typo in sve configure check (BZ# 29394)

Fix a typo in the SVE configure check. This fixes [BZ# 29394].

(cherry picked from commit 12182ba18dabda791a4f63a11ee2e9d828f40f9b)

linux: Use rseq area unconditionally in sched_getcpu (bug 31479)

Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e6b4d07303af25709051c3e96288f2d ("nptl:
Unconditionally use a 32-byte rseq area").  After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition.  This commit always checks
the rseq area for CPU number information before using the other
approaches.

This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call.  Most architectures that have vDSO
acceleration for getcpu also have rseq support.

Fixes: 2c6b4b272e6b4d07303af25709051c3e96288f2d
Fixes: 1d350aa06091211863e41169729cee1bca39f72f
Reviewed-by: Arjun Shankar <arjun@redhat.com>
(cherry picked from commit 7a76f218677d149d8b7875b336722108239f7ee9)
Fixes: 54e812100df2a6f1d75fbef4e3b45c076599842f

Include sys/rseq.h in tst-rseq-disable.c

Starting with commit 2c6b4b272e6b4d07303af25709051c3e96288f2d
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.

The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 637aac2ae3980de31a6baab236a9255fe853cc76)

nptl: Unconditionally use a 32-byte rseq area

If the kernel headers provide a larger struct rseq, we used that
size as the argument to the rseq system call. As a result,
rseq registration would fail on older kernels which only accept
size 32.

(cherry picked from commit 2c6b4b272e6b4d07303af25709051c3e96288f2d)

make ‘struct pthread’ a complete type

* nptl/descr.h (struct pthread): Remove end_padding member, which
made this type incomplete.
(PTHREAD_STRUCT_END_PADDING): Stop using end_padding.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 3edc4ff2ceff4a59587ebecb94148d3bcfa1df62)

support: use 64-bit time_t (bug 30111)

Ensure to use 64-bit time_t in the test infrastructure.

(cherry picked from commit 3bfdc4e2bceb601b90c81a9baa73c1904db58b2f)

malloc: Use __get_nprocs on arena_get2 (BZ 30945)

This restore the 2.33 semantic for arena_get2.  It was changed by
11a02b035b46 to avoid arena_get2 call malloc (back when __get_nproc
was refactored to use an scratch_buffer - 903bc7dcc2acafc).  The
__get_nproc was refactored over then and now it also avoid to call
malloc.

The 11a02b035b46 did not take in consideration any performance
implication, which should have been discussed properly.  The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs is not acessible.

Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 472894d2cfee5751b44c0aaa71ed87df81c8e62e)

arm: Remove wrong ldr from _dl_start_user (BZ 31339)

The commit 49d877a80b29d3002887b084eec6676d9f5fec18 (arm: Remove
_dl_skip_args usage) removed the _SKIP_ARGS literal, which was
previously loader to r4 on loader _start. However, the cleanup did not
remove the following 'ldr r4, [sl, r4]' on _dl_start_user, used to check
to skip the arguments after ld self-relocations.

In my testing, the kernel initially set r4 to 0, which makes the
ldr instruction just read the _GLOBAL_OFFSET_TABLE_. However, since r4
is a callee-saved register; a different runtime might not zero
initialize it and thus trigger an invalid memory access.

Checked on arm-linux-gnu.

Reported-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1e25112dc0cb2515d27d8d178b1ecce778a9d37a)

sparc: Remove unwind information from signal return stubs [BZ #31244]

The functions were previously written in C, but were not compiled
with unwind information. The ENTRY/END macros includes .cfi_startproc
and .cfi_endproc which adds unwind information. This caused the
tests cleanup-8 and cleanup-10 in the GCC testsuite to fail.
This patch adds a version of the ENTRY/END macros without the
CFI instructions that can be used instead.

sigaction registers a restorer address that is located two instructions
before the stub function. This patch adds a two instruction padding to
avoid that the unwinder accesses the unwind information from the function
that the linker has placed right before it in memory. This fixes an issue
with pthread_cancel that caused tst-mutex8-static (and other tests) to fail.

Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 7bd06985c0a143cdcba2762bfe020e53514a53de)

sparc: Fix sparc64 memmove length comparison (BZ 31266)

The small counts copy bytes comparsion should be unsigned (as the
memmove size argument). It fixes string/tst-memmove-overflow on
sparcv9, where the input size triggers an invalid code path.

Checked on sparc64-linux-gnu and sparcv9-linux-gnu.

(cherry picked from commit 926a4bdbb5fc8955570208b5571b2d04c6ffbd1d)

sparc64: Remove unwind information from signal return stubs [BZ#31244]

Similar to sparc32 fix, remove the unwind information on the signal
return stubs. This fixes the regressions:

FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-mutexpi9

On sparc64-linux-gnu.

(cherry picked from commit 369efd817780276dbe0ecf8be6e1f354bdbc9857)

sparc: Fix broken memset for sparc32 [BZ #31068]

Fixes commit a61933fe27df ("sparc: Remove bzero optimization") that
after moving code jumped to the wrong label 4.

Verfied by successfully running string/test-memset on sparc32.

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 578190b7e43305141512dee777e4a3b3e8159393)

x86_64: Optimize ffsll function code size.

Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory.  Ffsll function code size is 17 bytes.  Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory.  When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line.  When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.

Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.

This patch fixes ffsll function random performance regression.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9d94997b5f9445afd4f2bccc5fa60ff7c4361ec1)

syslog: Fix integer overflow in __vsyslog_internal (CVE-2023-6780)

__vsyslog_internal calculated a buffer size by adding two integers, but
did not first check if the addition would overflow. This commit fixes
that.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ddf542da94caf97ff43cc2875c88749880b7259b)

syslog: Fix heap buffer overflow in __vsyslog_internal (CVE-2023-6779)

__vsyslog_internal used the return value of snprintf/vsnprintf to
calculate buffer sizes for memory allocation. If these functions (for
any reason) failed and returned -1, the resulting buffer would be too
small to hold output. This commit fixes that.

All snprintf/vsnprintf calls are checked for negative return values and
the function silently returns upon encountering them.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 7e5a0c286da33159d47d0122007aac016f3e02cd)

syslog: Fix heap buffer overflow in __vsyslog_internal (CVE-2023-6246)

__vsyslog_internal did not handle a case where printing a SYSLOG_HEADER
containing a long program name failed to update the required buffer
size, leading to the allocation and overflow of a too-small buffer on
the heap. This commit fixes that. It also adds a new regression test
that uses glibc.malloc.check.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 6bd0e4efcc78f3c0115e5ea9739a1642807450da)

NEWS: Mention bug fixes for 29039/30745/30843

x86-64: Fix the tcb field load for x32 [BZ #31185]

_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:

_dl_tlsdesc_undefweak:
        _CET_ENDBR
        movq    8(%rax), %rax
        subq    %fs:0, %rax
        ret

_dl_tlsdesc_dynamic:
...
        subq    %fs:0, %rax
        movq    -8(%rsp), %rdi
        ret

Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead.  Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64.  This
fixes BZ #31185.

(cherry picked from commit 81be2a61dafc168327c1639e97b6dae128c7ccf3)

x86-64: Fix the dtv field load for x32 [BZ #31184]

On x32, I got

FAIL: elf/tst-tlsgap

$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so

Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108 movq (%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)

This is caused by

_dl_tlsdesc_dynamic:
        _CET_ENDBR
        /* Preserve call-clobbered registers that we modify.
           We need two scratch regs anyway.  */
        movq    %rsi, -16(%rsp)
        movq    %fs:DTV_OFFSET, %rsi

Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit.  Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.

(cherry picked from commit 3502440397bbb840e2f7223734aa5cc2cc0e29b6)

elf: Fix TLS modid reuse generation assignment (BZ 29039)

_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.

Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.

Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.

Fixes bug 29039.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)

Revert "elf: Move l_init_called_next to old place of l_text_end in link map"

This reverts commit f441cb9a70fa3f55e9bbd615924879d692d21a6c.

Reason for revert: Preserve internal ABI.

Revert "elf: Always call destructors in reverse constructor order (bug 30785)"

This reverts commit 5d83a52a4905405792418d6a0f3afa3601a98c37.

Reason for revert: Incompatibility with existing applications.

Revert "elf: Remove unused l_text_end field from struct link_map"

This reverts commit 9f0d3bb2e325dc30ae20e347cccbe10fa0b4ce9b.

Reason for revert: Preserve ABI after revert of commit 5d83a52a4.

tunables: Terminate if end of input is reached (CVE-2023-4911)

The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.

Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.

This also fixes up tst-env-setuid-tunables to actually handle failures
correct and add new tests to validate the fix for this CVE.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa)

Document CVE-2023-4806 and CVE-2023-5156 in NEWS

These are tracked in BZ #30884 and BZ #30843.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit fd134feba35fa839018965733b34d28a09a075dd)

Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843]

This patch fixes a very recently added leak in getaddrinfo.

This was assigned CVE-2023-5156.

Resolves: BZ #30884
Related: BZ #30842

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ec6b95c3303c700eb89eebeda2d7264cc184a796)

getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)

When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.

The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query.  For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup.  In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name.  This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.

Fix this by copying h_name over and freeing it at the end.  This
resolves BZ #30843, which is assigned CVE-2023-4806.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 973fe93a5675c42798b2161c6f29c01b0e243994)

io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64

Commit 5f828ff824e3b7cd1 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.

Fix that by also checking that define, restoring the pre
4d0fe291aed3a476a commit values:

Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7

With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14

At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.

Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.

Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 434bf72a94de68f0cc7fbf3c44bf38c1911b70cb)

CVE-2023-4527: Stack read overflow with large TCP responses in no-aaaa mode

Without passing alt_dns_packet_buffer, __res_context_search can only
store 2048 bytes (what fits into dns_packet_buffer). However,
the function returns the total packet size, and the subsequent
DNS parsing code in _nss_dns_gethostbyname4_r reads beyond the end
of the stack-allocated buffer.

Fixes commit f282cdbe7f436c75864e5640a4 ("resolv: Implement no-aaaa
stub resolver option") and bug 30842.

(cherry picked from commit bd77dd7e73e3530203be1c52c8a29d08270cb25d)

elf: Move l_init_called_next to old place of l_text_end in link map

This preserves all member offsets and the GLIBC_PRIVATE ABI
for backporting.

elf: Remove unused l_text_end field from struct link_map

It is a left-over from commit 52a01100ad011293197637e42b5be1a479a2
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").

When backporting commmit 6985865bc3ad5b23147ee73466583dd7fdf65892
("elf: Always call destructors in reverse constructor order
(bug 30785)"), we can move the l_init_called_next field to this
place, so that the internal GLIBC_PRIVATE ABI does not change.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 53df2ce6885da3d0e89e87dca7b095622296014f)

elf: Always call destructors in reverse constructor order (bug 30785)

The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order.  These destructors
can call dlopen.  Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe5f5905ffd5d100e5e9ca8ad62 ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 6985865bc3ad5b23147ee73466583dd7fdf65892)

elf: Do not run constructors for proxy objects

Otherwise, the ld.so constructor runs for each audit namespace
and each dlmopen namespace.

(cherry picked from commit f6c8204fd7fabf0cf4162eaf10ccf23258e4d10e)

elf: Introduce to _dl_call_fini

This consolidates the destructor invocations from _dl_fini and
dlclose. Remove the micro-optimization that avoids
calling _dl_call_fini if they are no destructors (as dlclose is quite
expensive anyway). The debug log message is now printed
unconditionally.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745]

The:

```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
```

Code was accidentally moved to inside the else scope.  This doesn't
match how it was previously (before af992e7abd).

This patch fixes that by putting the division after the `else` block.

(cherry picked from commit 084fb31bc2c5f95ae0b9e6df4d3cf0ff43471ede)

x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.

On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8b9a0af8ca012217bf90d1dc0694f85b49ae09da)

x86: Fix slight bug in `shared_per_thread` cache size calculation.

After:
```
    commit af992e7abdc9049714da76cae1e5e18bc4838fb8
    Author: Noah Goldstein <goldstein.w.n@gmail.com>
    Date:   Wed Jun 7 13:18:01 2023 -0500

        x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
```

Split `shared` (cumulative cache size) from `shared_per_thread` (cache
size per socket), the `shared_per_thread` *can* be slightly off from
the previous calculation.

Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.

This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 47f747217811db35854ea06741be3685e8bbd44d)

x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`

Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 /
ncores_per_socket'. This patch updates that value to roughly
'sizeof_L3 / 4`

The original value (specifically dividing the `ncores_per_socket`) was
done to limit the amount of other threads' data a `memcpy`/`memset`
could evict.

Dividing by 'ncores_per_socket', however leads to exceedingly low
non-temporal thresholds and leads to using non-temporal stores in
cases where REP MOVSB is multiple times faster.

Furthermore, non-temporal stores are written directly to main memory
so using it at a size much smaller than L3 can place soon to be
accessed data much further away than it otherwise could be. As well,
modern machines are able to detect streaming patterns (especially if
REP MOVSB is used) and provide LRU hints to the memory subsystem. This
in affect caps the total amount of eviction at 1/cache_associativity,
far below meaningfully thrashing the entire cache.

As best I can tell, the benchmarks that lead this small threshold
where done comparing non-temporal stores versus standard cacheable
stores. A better comparison (linked below) is to be REP MOVSB which,
on the measure systems, is nearly 2x faster than non-temporal stores
at the low-end of the previous threshold, and within 10% for over
100MB copies (well past even the current threshold). In cases with a
low number of threads competing for bandwidth, REP MOVSB is ~2x faster
up to `sizeof_L3`.

The divisor of `4` is a somewhat arbitrary value. From benchmarks it
seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs
such as Broadwell prefer something closer to `8`. This patch is meant
to be followed up by another one to make the divisor cpu-specific, but
in the meantime (and for easier backporting), this patch settles on
`4` as a middle-ground.

Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable
stores where done using:
https://github.com/goldsteinn/memcpy-nt-benchmarks

Sheets results (also available in pdf on the github):
https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit af992e7abdc9049714da76cae1e5e18bc4838fb8)

elf: _dl_find_object may return 1 during early startup (bug 30515)

Success is reported with a 0 return value, and failure is -1.
Enhance the kitchen sink test elf/tst-audit28 to cover
_dl_find_object as well.

Fixes commit 5d28a8962dcb ("elf: Add _dl_find_object function")
and bug 30515.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1bcfe0f732066ae5336b252295591ebe7e51c301)

io: Fix F_GETLK, F_SETLK, and F_SETLKW for powerpc64

Different than other 64 bit architectures, powerpc64 defines the
LFS POSIX lock constants with values similar to 32 ABI, which
are meant to be used with fcntl64 syscall. Since powerpc64 kABI
does not have fcntl, the constants are adjusted with the
FCNTL_ADJUST_CMD macro.

The 4d0fe291aed3a476a changed the logic of generic constants
LFS value are equal to the default values; which is now wrong
for powerpc64.

Fix the value by explicit define the previous glibc constants
(powerpc64 does not need to use the 32 kABI value, but it simplifies
the FCNTL_ADJUST_CMD which should be kept as compatibility).

Checked on powerpc64-linux-gnu and powerpc-linux-gnu.

(cherry picked from commit 5f828ff824e3b7cd133ef905b8ae25ab8a8f3d66)

io: Fix record locking contants on 32 bit arch with 64 bit default time_t (BZ#30477)

For architecture with default 64 bit time_t support, the kernel
does not provide LFS and non-LFS values for F_GETLK, F_GETLK, and
F_GETLK (the default value used for 64 bit architecture are used).

This is might be considered an ABI break, but the currenct exported
values is bogus anyway.

The POSIX lockf is not affected since it is aliased to lockf64,
which already uses the LFS values.

Checked on i686-linux-gnu and the new tests on a riscv32.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4d0fe291aed3a476a3b59c4ecfae9d35ac0f15e8)

Document BZ #20975 fix

__check_pf: Add a cancellation cleanup handler [BZ #20975]

There are reports for hang in __check_pf:

https://github.com/JoeDog/siege/issues/4

It is reproducible only under specific configurations:

1. Large number of cores (>= 64) and large number of threads (> 3X of
the number of cores) with long lived socket connection.
2. Low power (frequency) mode.
3. Power management is enabled.

While holding lock, __check_pf calls make_request which calls __sendto
and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
lock held by __check_pf won't be released and can cause deadlock when
thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
cleanup handler for __check_pf to unlock the lock when cancelled by
another thread.  This fixes BZ #20975 and the siege hang issue.

(cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)

gmon: Revert addition of tunables to the manual

These tunables had to be removed over GLIBC_PRIVATE ABI stability
concerns.

gmon: Revert addition of tunables to preserve GLIBC_PRIVATE ABI

Otherwise, processes are likely to crash during concurrent updates
to a new glibc version on the stable release branch.

The test gmon/tst-mcount-overflow depends on those tunables, so
it has to be removed as well.

gmon: fix memory corruption issues [BZ# 30101]

V2 of this patch fixes an issue in V1, where the state was changed to ON not
OFF at end of _mcleanup. I hadn't noticed that (counterintuitively) ON=0 and
OFF=3, hence zeroing the buffer turned it back on. So set the state to OFF
after the memset.

1. Prevent double free, and reads from unallocated memory, when
   _mcleanup is (incorrectly) called two or more times in a row,
   without an intervening call to __monstartup; with this patch, the
   second and subsequent calls effectively become no-ops instead.
   While setting tos=NULL is minimal fix, safest action is to zero the
   whole gmonparam buffer.

2. Prevent memory leak when __monstartup is (incorrectly) called two
   or more times in a row, without an intervening call to _mcleanup;
   with this patch, the second and subsequent calls effectively become
   no-ops instead.

3. After _mcleanup, treat __moncontrol(1) as __moncontrol(0) instead.
   With zeroing of gmonparam buffer in _mcleanup, this stops the
   state incorrectly being changed to GMON_PROF_ON despite profiling
   actually being off. If we'd just done the minimal fix to _mcleanup
   of setting tos=NULL, there is risk of far worse memory corruption:
   kcount would point to deallocated memory, and the __profil syscall
   would make the kernel write profiling data into that memory,
   which could have since been reallocated to something unrelated.

4. Ensure __moncontrol(0) still turns off profiling even in error
   state. Otherwise, if mcount overflows and sets state to
   GMON_PROF_ERROR, when _mcleanup calls __moncontrol(0), the __profil
   syscall to disable profiling will not be invoked. _mcleanup will
   free the buffer, but the kernel will still be writing profiling
   data into it, potentially corrupted arbitrary memory.

Also adds a test case for (1). Issues (2)-(4) are not feasible to test.

Signed-off-by: Simon Kissane <skissane@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit bde121872001d8f3224eeafa5b7effb871c3fbca)

gmon: improve mcount overflow handling [BZ# 27576]

When mcount overflows, no gmon.out file is generated, but no message is printed
to the user, leaving the user with no idea why, and thinking maybe there is
some bug - which is how BZ 27576 ended up being logged. Print a message to
stderr in this case so the user knows what is going on.

As a comment in sys/gmon.h acknowledges, the hardcoded MAXARCS value is too
small for some large applications, including the test case in that BZ. Rather
than increase it, add tunables to enable MINARCS and MAXARCS to be overridden
at runtime (glibc.gmon.minarcs and glibc.gmon.maxarcs). So if a user gets the
mcount overflow error, they can try increasing maxarcs (they might need to
increase minarcs too if the heuristic is wrong in their case.)

Note setting minarcs/maxarcs too large can cause monstartup to fail with an
out of memory error. If you set them large enough, it can cause an integer
overflow in calculating the buffer size. I haven't done anything to defend
against that - it would not generally be a security vulnerability, since these
tunables will be ignored in suid/sgid programs (due to the SXID_ERASE default),
and if you can set GLIBC_TUNABLES in the environment of a process, you can take
it over anyway (LD_PRELOAD, LD_LIBRARY_PATH, etc). I thought about modifying
the code of monstartup to defend against integer overflows, but doing so is
complicated, and I realise the existing code is susceptible to them even prior
to this change (e.g. try passing a pathologically large highpc argument to
monstartup), so I decided just to leave that possibility in-place.

Add a test case which demonstrates mcount overflow and the tunables.

Document the new tunables in the manual.

Signed-off-by: Simon Kissane <skissane@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 31be941e4367c001b2009308839db5c67bf9dcbc)

gmon: Fix allocated buffer overflow (bug 29444)

The `__monstartup()` allocates a buffer used to store all the data
accumulated by the monitor.

The size of this buffer depends on the size of the internal structures
used and the address range for which the monitor is activated, as well
as on the maximum density of call instructions and/or callable functions
that could be potentially on a segment of executable code.

In particular a hash table of arcs is placed at the end of this buffer.
The size of this hash table is calculated in bytes as
p->fromssize = p->textsize / HASHFRACTION;

but actually should be
p->fromssize = ROUNDUP(p->textsize / HASHFRACTION, sizeof(*p->froms));

This results in writing beyond the end of the allocated buffer when an
added arc corresponds to a call near from the end of the monitored
address range, since `_mcount()` check the incoming caller address for
monitored range but not the intermediate result hash-like index that
uses to write into the table.

It should be noted that when the results are output to `gmon.out`, the
table is read to the last element calculated from the allocated size in
bytes, so the arcs stored outside the buffer boundary did not fall into
`gprof` for analysis. Thus this "feature" help me to found this bug
during working with https://sourceware.org/bugzilla/show_bug.cgi?id=29438

Just in case, I will explicitly note that the problem breaks the
`make test t=gmon/tst-gmon-dso` added for Bug 29438.
There, the arc of the `f3()` call disappears from the output, since in
the DSO case, the call to `f3` is located close to the end of the
monitored range.

Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
Another minor error seems a related typo in the calculation of
`kcountsize`, but since kcounts are smaller than froms, this is
actually to align the p->froms data.

Co-authored-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 801af9fafd4689337ebf27260aa115335a0cb2bc)

posix: Fix system blocks SIGCHLD erroneously [BZ #30163]

Fix bug that SIGCHLD is erroneously blocked forever in the following
scenario:

1. Thread A calls system but hasn't returned yet
2. Thread B calls another system but returns

SIGCHLD would be blocked forever in thread B after its system() returns,
even after the system() in thread A returns.

Although POSIX does not require, glibc system implementation aims to be
thread and cancellation safe. This bug was introduced in
5fb7fc96350575c9adb1316833e48ca11553be49 when we moved reverting signal
mask to happen when the last concurrently running system returns,
despite that signal mask is per thread. This commit reverts this logic
and adds a test.

Signed-off-by: Adam Yi <ayi@janestreet.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 436a604b7dc741fc76b5a6704c6cd8bb178518e7)

x86_64: Fix asm constraints in feraiseexcept (bug 30305)

The divss instruction clobbers its first argument, and the constraints
need to reflect that. Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.

Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 5d1ccdda7b0c625751661d50977f3dfbc73f8eae)

gshadow: Matching sgetsgent, sgetsgent_r ERANGE handling (bug 30151)

Before this change, sgetsgent_r did not set errno to ERANGE, but
sgetsgent only check errno, not the return value from sgetsgent_r.
Consequently, sgetsgent did not detect any error, and reported
success to the caller, without initializing the struct sgrp object
whose address was returned.

This commit changes sgetsgent_r to set errno as well. This avoids
similar issues in applications which only change errno.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 969e9733c7d17edf1e239a73fa172f357561f440)

x86: Check minimum/maximum of non_temporal_threshold [BZ #29953]

The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may
be set to less than the minimum value when the shared cache size isn't
available (e.g., in an emulator) or by the tunable. Add checks for
minimum and maximum of non_temporal_threshold.

This fixes BZ #29953.

(cherry picked from commit 48b74865c63840b288bd85b4d8743533b73b339b)

stdlib: Undo post review change to 16adc58e73f3 [BZ #27749]

Post review removal of "goto restart" from
https://sourceware.org/pipermail/libc-alpha/2021-April/125470.html
introduced a bug when some atexit handers skipped.

Signed-off-by: Vitaly Buka <vitalybuka@google.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit fd78cfa72ea2bab30fdb4e1e0672b34471426c05)

elf: Smoke-test ldconfig -p against system /etc/ld.so.cache

The test is sufficient to detect the ldconfig bug fixed in
commit 9fe6f6363886aae6b2b210cae3ed1f5921299083 ("elf: Fix 64 time_t
support for installed statically binaries").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9fd63e35371b9939e9153907c6a753e6960b68ad)

Use 64-bit time_t interfaces in strftime and strptime (bug 30053)

Both functions use time_t only internally, so the ABI is not affected.

(cherry picked from commit 41349f6f67c83e7bafe49f985b56493d2c4c9c77)

elf: Fix GL(dl_phdr) and GL(dl_phnum) for static builds [BZ #29864]

The 73fc4e28b9464f0e refactor did not add the GL(dl_phdr) and
GL(dl_phnum) for static build, relying on the __ehdr_start symbol,
which is always added by the static linker, to get the correct values.

This is problematic in some ways:

  - The segment may see its in-memory size differ from its in-file
    size (or the binary may have holes).  The Linux has fixed is to
    provide concise values for both AT_PHDR and AT_PHNUM (commit
    0da1d5002745c - "fs/binfmt_elf: Fix AT_PHDR for unusual ELF files")

  - Some archs (alpha for instance) the hidden weak reference is not
    correctly pulled by the static linker and  __ehdr_start address
    end up being 0, which makes GL(dl_phdr) and GL(dl_phnum) have both
    invalid values (and triggering a segfault later on libc.so while
    accessing TLS variables).

The safer fix is to just restore the previous behavior to setup
GL(dl_phdr) and GL(dl_phnum) for static based on kernel auxv.  The
__ehdr_start fallback can also be simplified by not assuming weak
linkage (as for PIE).

The libc-static.c auxv init logic is moved to dl-support.c, since
the later is build without SHARED and then GLRO macro is defined
to access the variables directly.

The _dl_phdr is also assumed to be always non NULL, since an invalid
NULL values does not trigger TLS initialization (which is used in
various libc systems).

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 7e31d166510ac4adbf53d5e8144c709a37dd8c7a)

cdefs: Limit definition of fortification macros

Define the __glibc_fortify and other macros only when __FORTIFY_LEVEL >
0. This has the effect of not defining these macros on older C90
compilers that do not have support for variable length argument lists.

Also trim off the trailing backslashes from the definition of
__glibc_fortify and __glibc_fortify_n macros.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 2337e04e21ba6040926ec871e403533f77043c40)

x86: Prevent SIGSEGV in memcmp-sse2 when data is concurrently modified [BZ #29863]

In the case of INCORRECT usage of `memcmp(a, b, N)` where `a` and `b`
are concurrently modified as `memcmp` runs, there can be a SIGSEGV
in `L(ret_nonzero_vec_end_0)` because the sequential logic
assumes that `(rdx - 32 + rax)` is a positive 32-bit integer.

To be clear, this change does not mean the usage of `memcmp` is
supported.  The program behaviour is undefined (UB) in the
presence of data races, and `memcmp` is incorrect when the values
of `a` and/or `b` are modified concurrently (data race). This UB
may manifest itself as a SIGSEGV. That being said, if we can
allow the idiomatic use cases, like those in yottadb with
opportunistic concurrency control (OCC), to execute without a
SIGSEGV, at no cost to regular use cases, then we can aim to
minimize harm to those existing users.

The fix replaces a 32-bit `addl %edx, %eax` with the 64-bit variant
`addq %rdx, %rax`. The 1-extra byte of code size from using the
64-bit instruction doesn't contribute to overall code size as the
next target is aligned and has multiple bytes of `nop` padding
before it. As well all the logic between the add and `ret` still
fits in the same fetch block, so the cost of this change is
basically zero.

The relevant sequential logic can be seen in the following
pseudo-code:
```
    /*
     * rsi = a
     * rdi = b
     * rdx = len - 32
     */
    /* cmp a[0:15] and b[0:15]. Since length is known to be [17, 32]
    in this case, this check is also assumed to cover a[0:(31 - len)]
    and b[0:(31 - len)].  */
    movups  (%rsi), %xmm0
    movups  (%rdi), %xmm1
    PCMPEQ  %xmm0, %xmm1
    pmovmskb %xmm1, %eax
    subl    %ecx, %eax
    jnz L(END_NEQ)

    /* cmp a[len-16:len-1] and b[len-16:len-1].  */
    movups  16(%rsi, %rdx), %xmm0
    movups  16(%rdi, %rdx), %xmm1
    PCMPEQ  %xmm0, %xmm1
    pmovmskb %xmm1, %eax
    subl    %ecx, %eax
    jnz L(END_NEQ2)
    ret

L(END2):
    /* Position first mismatch.  */
    bsfl    %eax, %eax

    /* The sequential version is able to assume this value is a
    positive 32-bit value because the first check included bytes in
    range a[0:(31 - len)] and b[0:(31 - len)] so `eax` must be
    greater than `31 - len` so the minimum value of `edx` + `eax` is
    `(len - 32) + (32 - len) >= 0`. In the concurrent case, however,
    `a` or `b` could have been changed so a mismatch in `eax` less or
    equal than `(31 - len)` is possible (the new low bound is `(16 -
    len)`. This can result in a negative 32-bit signed integer, which
    when zero extended to 64-bits is a random large value this out
    out of bounds. */
    addl %edx, %eax

    /* Crash here because 32-bit negative number in `eax` zero
    extends to out of bounds 64-bit offset.  */
    movzbl  16(%rdi, %rax), %ecx
    movzbl  16(%rsi, %rax), %eax
```

This fix is quite simple, just make the `addl %edx, %eax` 64 bit (i.e
`addq %rdx, %rax`). This prevents the 32-bit zero extension
and since `eax` is still a low bound of `16 - len` the `rdx + rax`
is bound by `(len - 32) - (16 - len) >= -16`. Since we have a
fixed offset of `16` in the memory access this must be in bounds.

(cherry picked from commit b712be52645282c706a5faa038242504feb06db5)

time: Set daylight to 1 for matching DST/offset change (bug 29951)

The daylight variable is supposed to be set to 1 if DST is ever in
use for the current time zone. But __tzfile_read used to do this:

__daylight = rule_stdoff != rule_dstoff;

This check can fail to set __daylight to 1 if the DST and non-DST
offsets happen to be the same.

(cherry picked from commit 35141f304e319109c322f797ae71c0b9420ccb05)