posix: Fix double-free after allocation failure in regcomp (bug 33185)
If a memory allocation failure occurs during bracket expression
parsing in regcomp, a double-free error may result.
Reported-by: Anastasia Belova <abelova@astralinux.ru>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7ea06e994093fa0bcca0d0ee2c1db271d8d7885d)
Florian Weimer [Thu, 22 May 2025 12:36:37 +0000 (14:36 +0200)]
Fix error reporting (false negatives) in SGID tests
And simplify the interface of support_capture_subprogram_self_sgid.
Use the existing framework for temporary directories (now with
mode 0700) and directory/file deletion. Handle all execution
errors within support_capture_subprogram_self_sgid. In particular,
this includes test failures because the invoked program did not
exit with exit status zero. Existing tests that expect exit
status 42 are adjusted to use zero instead.
In addition, fix callers not to call exit (0) with test failures
pending (which may mask them, especially when running with --direct).
support: Don't fail on fchown when spawning sgid processes
In some cases (e.g. when podman creates user containers), the only other
group assigned to the executing user is nobody, and fchown fails with it
because the group is not mapped. Do not fail the test in this case;
instead exit as unsupported.
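A hedged sketch of that pattern, using the helpers from glibc's
<support/check.h>; the exact errno values treated as unsupported are an
assumption here, not the actual test code:
```
/* Sketch only: treat an fchown failure caused by an unmapped group
   as an unsupported environment rather than a test failure.  */
#include <errno.h>
#include <unistd.h>
#include <support/check.h>

static void
set_test_group (int fd, gid_t gid)
{
  if (fchown (fd, getuid (), gid) != 0)
    {
      if (errno == EPERM || errno == EINVAL)   /* assumed errno values */
        FAIL_UNSUPPORTED ("fchown (%d, ...): group not usable: %m", fd);
      FAIL_EXIT1 ("fchown (%d, ...): %m", fd);
    }
}
```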
Wilco Dijkstra [Fri, 13 Dec 2024 15:43:07 +0000 (15:43 +0000)]
math: Improve layout of exp/exp10 data
GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps. As a result on targets that support them, more load-pair
instructions are used in exp.
The exp benchmark improves 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2.
Wilco Dijkstra [Tue, 24 Dec 2024 18:01:59 +0000 (18:01 +0000)]
AArch64: Add SVE memset
Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 bytes are improved by ~20% on average by using aligned
stores for the last 64 bytes. Performance of random memset benchmark improves
by ~2% on Neoverse V1.
GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch
changes the exp2f_data struct slightly so that the fields are better aligned.
As a result on targets that support them, load-pair instructions accessing
poly_scaled and invln2_scaled are now 16-byte aligned.
Wilco Dijkstra [Mon, 25 Nov 2024 18:43:08 +0000 (18:43 +0000)]
AArch64: Remove zva_128 from memset
Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.
Improve small memsets by avoiding branches and use overlapping stores.
Use DC ZVA for copies over 128 bytes. Remove unnecessary code for ZVA sizes
other than 64 and 128. Performance of random memset benchmark improves by 24%
on Neoverse N1.
Wilco Dijkstra [Wed, 7 Aug 2024 13:43:47 +0000 (14:43 +0100)]
AArch64: Improve generic strlen
Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates. Change final
size computation to avoid increasing latency. On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.
misc: Add support for Linux uio.h RWF_NOAPPEND flag
In Linux 6.9 a new flag is added to allow per-I/O operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.
This caused two test failures as these tests expected the flag 0x00000020
to be unused. Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).
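As an illustration, a minimal sketch of the flag in use (assumes a 6.9+
kernel and glibc headers that define RWF_NOAPPEND):
```
/* Minimal sketch: pwritev2 with RWF_NOAPPEND writes at an explicit
   offset even though the file was opened with O_APPEND.  */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

int
main (void)
{
  int fd = open ("logfile", O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0)
    return 1;
  char buf[] = "header\n";
  struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf - 1 };
  /* Despite O_APPEND, this writes at offset 0, not at end-of-file.  */
  ssize_t ret = pwritev2 (fd, &iov, 1, 0, RWF_NOAPPEND);
  close (fd);
  return ret < 0;
}
```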
elf: Support recursive use of dynamic TLS in interposed malloc
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS). The previous workaround from
commit afe42e935b3ee97bac9a7064157587777259c60e ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases, unfortunately.
This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update. This
is possible because an interposed malloc has a fixed module ID and
TLS slot. (It cannot be unloaded.) If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc. It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.
The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.
All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.
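An illustrative sketch of the recursion guard described above; all names
are hypothetical stand-ins, not glibc's actual internals:
```
/* Hypothetical sketch: a global counter marks "TLS update in
   progress"; a re-entry from the interposed malloc then uses the
   outdated DTV instead of calling malloc again.  */
struct tls_index { unsigned long module; unsigned long offset; };

void *lookup_dtv (struct tls_index *);          /* hypothetical */
void *lookup_dtv_stale (struct tls_index *);    /* hypothetical */
void update_dtv_to_global_generation (void);    /* may call malloc */
int module_is_initially_loaded (unsigned long);

static unsigned int tls_update_nesting;  /* global, not per-thread */

void *
get_tls_addr (struct tls_index *ti)
{
  if (tls_update_nesting > 0
      && module_is_initially_loaded (ti->module))
    /* Recursive use from malloc during an update: the module ID and
       TLS slot are fixed, so the outdated DTV is still usable.  */
    return lookup_dtv_stale (ti);

  ++tls_update_nesting;
  update_dtv_to_global_generation ();
  --tls_update_nesting;
  return lookup_dtv (ti);
}
```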
Florian Weimer [Mon, 3 Jun 2024 08:49:40 +0000 (10:49 +0200)]
elf: Avoid some free (NULL) calls in _dl_update_slotinfo
This has been confirmed to work around some interposed mallocs. Here
is a discussion of the impact on the test ust/libc-wrapper/test_libc-wrapper
in lttng-tools:
New TLS usage in libgcc_s.so.1, compatibility impact
<https://inbox.sourceware.org/libc-alpha/8734v1ieke.fsf@oldenburg.str.redhat.com/>
Reportedly, this patch also papers over a similar issue when tcmalloc
2.9.1 is not compiled with -ftls-model=initial-exec. Of course the
goal really should be to compile mallocs with the initial-exec TLS
model, but this commit appears to be a useful interim workaround.
x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212]
The loop should be aligned to 32-bytes so that it can ideally run out
the DSB. This is particularly important on Skylake-Server where
deficiencies in its DSB implementation make it prone to not being
able to run loops out of the DSB.
This results in a 25% performance degradation for the non-aligned
version.
The fix is to just ensure the code layout is such that the loop is
aligned. (Which was previously the case but was accidentally dropped
in 84e7c46df).
NB: The fix was actually 64-byte alignment. This is because 64-byte
alignment generally produces more stable performance than 32-byte
aligned code (cache line crosses can affect perf), so if we are going
past 16-byte alignment, might as well go to 64. 64-byte alignment
also matches most other functions we over-align, so it creates a
common point of optimization.
Times are reported as the ratio Time_With_Patch /
Time_Without_Patch; lower is better.
The value reported is the geometric mean of the ratio across
all tests in bench-strcmp and bench-strncmp.
Note this patch is only attempting to improve the Skylake-Server
strcmp for long strings. The rest of the numbers are only to test for
regressions.
The 2.6% regression on TGL-strcmp is due to slowdowns caused by
changes in alignment of code handling small sizes (most on the
page-cross logic). These should be safe to ignore because 1) We
previously only 16-byte aligned the function so this behavior is not
new and was essentially up to chance before this patch and 2) this
type of alignment related regression on small sizes really only comes
up in tight micro-benchmark loops and is unlikely to have any effect
on real-world performance.
Noah Goldstein [Fri, 24 May 2024 17:38:50 +0000 (12:38 -0500)]
x86: Improve large memset perf with non-temporal stores [RHEL-29312]
Previously we used `rep stosb` for all medium/large memsets. This is
notably worse than non-temporal stores for large (above a
few MBs) memsets.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
For data using different strategies for large memset on ICX and SKX.
Using non-temporal stores can be up to 3x faster on ICX and 2x faster
on SKX. Historically, these numbers would not have been so good
because of the zero-over-zero writeback optimization that `rep stosb`
is able to do. But, the zero-over-zero writeback optimization has been
removed as a potential side-channel attack, so there is no longer any
good reason to only rely on `rep stosb` for large memsets. On the flip
side, non-temporal writes avoid fetching the destination into the cache
via RFO requests, saving memory bandwidth.
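To illustrate the technique (not glibc's actual implementation), a
simplified non-temporal fill using SSE2 intrinsics:
```
/* Simplified illustration: fill with streaming (non-temporal) stores,
   which bypass the cache and issue no RFO requests.  Assumes dst is
   16-byte aligned and n is a multiple of 16; a real implementation
   handles the unaligned head and tail separately.  */
#include <stddef.h>
#include <emmintrin.h>

void
memset_nontemporal (void *dst, int c, size_t n)
{
  __m128i v = _mm_set1_epi8 ((char) c);
  char *p = dst;
  for (size_t i = 0; i < n; i += 16)
    _mm_stream_si128 ((__m128i *) (p + i), v);
  _mm_sfence ();  /* order streaming stores before later accesses */
}
```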
All of the other changes to the file are to re-organize the
code-blocks to maintain "good" alignment given the new code added in
the `L(stosb_local)` case.
The results from running the GLIBC memset benchmarks on TGL-client for
N=20 runs:
Geometric Mean across the suite New / Old EVEX256: 0.979
Geometric Mean across the suite New / Old EVEX512: 0.979
Geometric Mean across the suite New / Old AVX2   : 0.986
Geometric Mean across the suite New / Old SSE2   : 0.979
Most of the cases are essentially unchanged; this is mostly to show
that adding the non-temporal case didn't add any regressions to the
other cases.
The results on the memset-large benchmark suite on TGL-client for N=20
runs:
Geometric Mean across the suite New / Old EVEX256: 0.926
Geometric Mean across the suite New / Old EVEX512: 0.925
Geometric Mean across the suite New / Old AVX2   : 0.928
Geometric Mean across the suite New / Old SSE2   : 0.924
So roughly a 7.5% speedup. This is lower than what we see on servers
(likely because clients typically have faster single-core bandwidth so
saving bandwidth on RFOs is less impactful), but still advantageous.
Full test-suite passes on x86_64 w/ and w/o multiarch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 5bf0ab80573d66e4ae5d94b094659094336da90f)
Gabi Falk [Tue, 7 May 2024 18:25:00 +0000 (18:25 +0000)]
x86_64: Fix missing wcsncat function definition without multiarch (x86-64-v4)
This code expects the WCSCAT preprocessor macro to be predefined in case
the evex implementation of the function should be defined with a name
different from __wcsncat_evex. However, when glibc is built for
x86-64-v4 without multiarch support, sysdeps/x86_64/wcsncat.S defines
WCSNCAT variable instead of WCSCAT to build it as wcsncat. Rename the
variable to WCSNCAT, as it is actually a better naming choice for the
variable in this case.
Reported-by: Kenton Groombridge
Link: https://bugs.gentoo.org/921945
Fixes: 64b8b6516b ("x86: Add evex optimized functions for the wchar_t strcpy family")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit dd5f891c1ad9f1b43b9db93afe2a55cbb7a6194e)
Noah Goldstein [Wed, 1 Nov 2023 20:30:26 +0000 (15:30 -0500)]
x86: Only align destination to 1x VEC_SIZE in memset 4x loop
Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
performance other than potentially resulting in an additional
iteration of the loop.
1x maintains aligned stores (the only reason to align in this case)
and doesn't incur any unnecessary loop iterations.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit 9469261cf1924d350feeec64d2c80cafbbdcdd4d)
Szabolcs Nagy [Tue, 16 Feb 2021 12:55:13 +0000 (12:55 +0000)]
elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter, and if
the current dtv is older, _dl_update_slotinfo updates the dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.
It may be possible to detect up-to-date dtv faster. But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.
This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once. The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.
This patch uses acquire/release synchronization when accessing the
generation counter.
Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.
I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.
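A conceptual sketch of that synchronization in C11 atomics (glibc uses
its internal atomic macros; names here are illustrative):
```
#include <stdatomic.h>

static _Atomic unsigned long global_generation;

/* Writer (dlopen path): publish the new slotinfo entries first, then
   bump the counter with release semantics so that a reader observing
   the new generation also observes the slotinfo updates.  */
void
publish_generation (unsigned long g)
{
  atomic_store_explicit (&global_generation, g, memory_order_release);
}

/* Reader (__tls_get_addr slow path): the acquire load pairs with the
   release store above.  */
unsigned long
read_generation (void)
{
  return atomic_load_explicit (&global_generation, memory_order_acquire);
}
```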
H.J. Lu [Mon, 28 Aug 2023 19:08:14 +0000 (12:08 -0700)]
x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1, which indicated the number of rounds of
CPUID leaf 2 needed to retrieve the complete cache information. The
newer Intel manual has been changed to state that it should always
return 1 and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all. This addresses BZ #30643.
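A hedged sketch of the check using GCC's <cpuid.h>:
```
/* Sketch of the leaf-2 sanity check described above.  */
#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid (2, &eax, &ebx, &ecx, &edx))
    {
      if ((eax & 0xff) == 1)
        puts ("CPUID leaf 2 is usable for cache information");
      else
        puts ("ignore leaf 2; query CPUID leaf 4 instead");
    }
  return 0;
}
```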
H.J. Lu [Thu, 17 Aug 2023 16:42:29 +0000 (09:42 -0700)]
x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:
        Before    After     Improvement
max     63.349    58.347     8%
min      4.448     5.651   -30%
mean    12.0674   10.336    14%
The minimum code path is
if (hx < 0x3FDA827A)                          /* x < 0.41422  */
  {
    if (__glibc_unlikely (ax >= 0x3ff00000))  /* x <= -1.0 */
      {
        ...
      }
    if (__glibc_unlikely (ax < 0x3e200000))   /* |x| < 2**-29 */
      {
        math_force_eval (two54 + x);          /* raise inexact */
        if (ax < 0x3c900000)                  /* |x| < 2**-54 */
          {
            ...
          }
        else
          return x - x * x * 0.5;
FMA and non-FMA code sequences look similar. Non-FMA version is slightly
faster. Since log1p is called by asinh and atanh, it improves asinh
performance by:
        Before    After     Improvement
max     75.645    63.135    16%
min     10.074    10.071     0%
mean    15.9483   14.9089    6%
and improves atanh performance by:
        Before    After     Improvement
max     91.768    75.081    18%
min     15.548    13.883    10%
mean    18.3713   16.8011    8%
H.J. Lu [Fri, 11 Aug 2023 15:04:08 +0000 (08:04 -0700)]
x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:
        Before    After     Improvement
max     70.204    68.054     3%
min     20.709    16.2      22%
mean    22.1221   16.7367   24%
NB: Add
extern long double __expm1l (long double);
extern long double __expm1f128 (long double);
for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
Florian Weimer [Tue, 17 Dec 2024 17:12:03 +0000 (18:12 +0100)]
x86: Avoid integer truncation with large cache sizes (bug 32470)
Some hypervisors report 1 TiB L3 cache size. This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.
Michael Jeanson [Wed, 20 Nov 2024 19:15:42 +0000 (14:15 -0500)]
nptl: initialize cpu_id_start prior to rseq registration
When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.
While current kernels don't validate the content of this field on
registration, future ones could.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
(cherry picked from commit d9f40387d3305d97e30a8cf8724218c42a63680a)
Michael Jeanson [Thu, 7 Nov 2024 21:23:49 +0000 (22:23 +0100)]
nptl: initialize rseq area prior to registration
Per the rseq syscall documentation, 3 fields are required to be
initialized by userspace prior to registration: 'cpu_id', 'rseq_cs'
and 'flags'. Since we have no guarantee that 'struct pthread'
is cleared on all architectures, explicitly set those 3 fields prior to
registration.
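An illustrative registration sketch using the Linux UAPI header
directly (assumes a recent <linux/rseq.h> where rseq_cs is a plain
__u64; note glibc normally registers rseq itself, so a second
registration fails with EBUSY):
```
#define _GNU_SOURCE
#include <linux/rseq.h>
#include <sys/syscall.h>
#include <unistd.h>

/* struct rseq carries its own 32-byte alignment attribute.  */
static __thread struct rseq rseq_area;

static long
register_rseq (unsigned int sig)
{
  /* The three fields the kernel documents as user-initialized,
     plus cpu_id_start.  */
  rseq_area.cpu_id_start = 0;
  rseq_area.cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
  rseq_area.rseq_cs = 0;
  rseq_area.flags = 0;
  return syscall (SYS_rseq, &rseq_area, 32, 0, sig);
}
```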
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 97f60abd25628425971f07e9b0e7f8eec0741235)
Florian Weimer [Mon, 28 Oct 2024 13:45:30 +0000 (14:45 +0100)]
elf: Change ldconfig auxcache magic number (bug 32231)
In commit c628c2296392ed3bf2cb8d8470668e64fe53389f (elf: Remove
ldconfig kernel version check), the layout of auxcache entries
changed because the osversion field was removed from
struct aux_cache_file_entry. However, AUX_CACHEMAGIC was not
changed, so existing files are still used, potentially leading
to unintended ldconfig behavior. This commit changes AUX_CACHEMAGIC,
so that the file is regenerated.
Reported-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 0a536f6e2f76e3ef581b3fd9af1e5cf4ddc7a5a2)
nptl: Use <support/check.h> facilities in tst-setuid3
Remove the local FAIL macro in favor of FAIL_EXIT1 from
<support/check.h>, which provides equivalent reporting, with the name
of the file and the line number of the failure site additionally
included. Remove FAIL_ERR altogether and include ": %m" explicitly
with the format string supplied to FAIL_EXIT1, as there seems to be
little value in having a separate macro just for this.
posix: Use <support/check.h> facilities in tst-truncate and tst-truncate64
Remove the local FAIL macro in favor of FAIL_RET from
<support/check.h>, which provides equivalent reporting, with the name
of the file of the failure site additionally included, for the
tst-truncate-common core shared between the tst-truncate and
tst-truncate64 tests.
ungetc: Fix backup buffer leak on program exit [BZ #27821]
If a file descriptor is left unclosed and is cleaned up by _IO_cleanup
on exit, its backup buffer remains unfreed, registering as a leak in
valgrind. This is not strictly an issue since (1) the program should
ideally be closing the stream once it's not in use and (2) the program
is about to exit anyway, so keeping the backup buffer around a wee bit
longer isn't a real problem. Free it anyway to keep valgrind happy
when the streams in question are the standard ones, i.e. stdout, stdin
or stderr.
Also, the _IO_have_backup macro checks for _IO_save_base,
which is a roundabout way to check for a backup buffer instead of
directly looking for _IO_backup_base. The roundabout check breaks when
the main get area has not been used and the user pushes a char into the
backup buffer with ungetc. Fix this to use the _IO_backup_base
directly.
ungetc: Fix uninitialized read when putting into unused streams [BZ #27821]
When ungetc is called on an unused stream, the backup buffer is
allocated without the main get area being present. This results in
every subsequent ungetc (as the stream remains in the backup area)
checking uninitialized memory in the backup buffer when trying to put a
character back into the stream.
Avoid comparing the input character with buffer contents when in backup
to avoid this uninitialized read. The uninitialized read is harmless in
this context since the location is promptly overwritten with the input
character, thus fulfilling ungetc functionality.
Also adjust wording in the manual to drop the paragraph that says glibc
cannot do multiple ungetc back to back since with this change, ungetc
can actually do this.
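A small standalone program demonstrating the scenario, pushing back
into a stream before any read:
```
#include <stdio.h>

int
main (void)
{
  FILE *fp = tmpfile ();
  if (fp == NULL)
    return 1;
  fputs ("abc", fp);
  rewind (fp);
  /* No read has happened yet, so only the backup buffer exists.  */
  if (ungetc ('x', fp) == EOF)
    return 1;
  int c1 = getc (fp);   /* 'x' from the backup buffer */
  int c2 = getc (fp);   /* 'a' from the main get area */
  printf ("%c%c\n", c1, c2);
  fclose (fp);
  return 0;
}
```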
stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]
Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer
than INT_MAX (bug 27650)") and add a test case for the issue, inspired
by the reproducer provided with the bug report.
This has been verified to succeed as from the commit referred and fail
beforehand.
As the test requires 2GiB of data to be passed around, its performance
has been evaluated using a choice of systems and the execution time
determined to be respectively in the range of 9s for POWER9@2.166GHz,
24s for FU740@1.2GHz, and 40s for 74Kf@950MHz. As this is on the verge
of and beyond the default timeout, it has been increased by a factor of
8. Regardless, following recent practice the test has been added to the
standard rather than the extended set.
Add a FAIL test failure helper analogous to FAIL_RET, which does not
cause the current function to return, providing a standardized way to
report a test failure with a message supplied while permitting the
caller to continue executing, for further reporting, cleaning up, etc.;
see the usage sketch after the list below.
Update existing test cases that provide a conflicting definition of FAIL
by removing the local FAIL definition and then as follows:
- tst-fortify-syslog: provide a meaningful message in addition to the
file name already added by <support/check.h>; 'support_record_failure'
is already called by 'support_print_failure_impl' invoked by the new
FAIL test failure helper.
- tst-ctype: no update to FAIL calls required, with the name of the file
and the line number of the failure site additionally included
by the new FAIL test failure helper, and error counting plus count
reporting upon test program termination also already provided by
'support_record_failure' and 'support_report_failure' respectively,
called by 'support_print_failure_impl' and 'adjust_exit_status' also
respectively. However in a number of places 'printf' is called and
the error count adjusted by hand, so update these places to make use
of FAIL instead. And last but not least adjust the final summary just
to report completion, with any error count following as reported by
the test driver.
- test-tgmath2: no update to FAIL calls required, with the name of the
file of the failure site additionally included by the new FAIL test
failure helper. Also there is no need to track the return status by
hand as any call to FAIL will eventually cause the test case to return
an unsuccessful exit status regardless of the return status from the
test function, via a call to 'adjust_exit_status' made by the test
driver.
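A hedged usage sketch in the style of a glibc test (the computation is
a placeholder):
```
#include <support/check.h>

static int
do_test (void)
{
  int value = 41;                          /* placeholder computation */
  if (value != 42)
    FAIL ("unexpected value: %d", value);  /* records failure, continues */
  /* Further checks and cleanup still run; the test driver turns the
     recorded failure into an unsuccessful exit status.  */
  return 0;
}

#include <support/test-driver.c>
```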
Noah Goldstein [Tue, 13 Aug 2024 15:29:14 +0000 (23:29 +0800)]
x86: Fix bug in strchrnul-evex512 [BZ #32078]
The issue was that we were not expecting matches with CHAR before the
start of the string in the page-cross case.
The check code in the page cross case:
```
and $0xffffffffffffffc0,%rax
vmovdqa64 (%rax),%zmm17
vpcmpneqb %zmm17,%zmm16,%k1
vptestmb %zmm17,%zmm17,%k0{%k1}
kmovq %k0,%rax
inc %rax
shr %cl,%rax
je L(continue)
```
expects that all characters that neither match null nor CHAR will be
1s in `rax` prior to the `inc`. Then the `inc` will overflow all of
the 1s where no relevant match was found.
This is incorrect in the page-cross case, as the
`vmovdqa64 (%rax),%zmm17` loads from before the start of the input
string.
If there are matches with CHAR before the start of the string, `rax`
won't properly overflow.
The arithmetic shift will clear any matches prior to the start of the
string while maintaining the signbit so the 1s can properly overflow
to zero in the case of no matches.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7da08862471dfec6fdae731c2a5f351ad485c71f)
Andreas Schwab [Mon, 5 Aug 2024 08:55:51 +0000 (10:55 +0200)]
Fix name space violation in fortify wrappers (bug 32052)
Rename the identifier sz to __sz everywhere.
Fixes: a643f60c53 ("Make sure that the fortified function conditionals are constant")
(cherry picked from commit 39ca997ab378990d5ac1aadbaa52aaf1db6d526f)
(redone from scratch because of many conflicts)
H.J. Lu [Wed, 24 Jul 2024 21:05:13 +0000 (14:05 -0700)]
linux: Update the mremap C implementation [BZ #31968]
Update the mremap C implementation to support the optional argument for
MREMAP_DONTUNMAP added in Linux 5.7 since it may not always be correct
to implement a variadic function as a non-variadic function on all Linux
targets. Return MAP_FAILED and set errno to EINVAL for unknown flag bits.
This fixes BZ #31968.
Note: A test must be added when a new flag bit is introduced.
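A hedged usage sketch of the optional fifth argument (MREMAP_DONTUNMAP
requires MREMAP_MAYMOVE, Linux >= 5.7, and a private anonymous source
mapping):
```
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

int
main (void)
{
  size_t len = 4096;
  void *src = mmap (NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  void *dst = mmap (NULL, len, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (src == MAP_FAILED || dst == MAP_FAILED)
    return 1;
  /* Move the pages to dst but keep the source mapping in place.  */
  void *res = mremap (src, len, len,
                      MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP,
                      dst);
  return res == MAP_FAILED;
}
```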
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 6c40cb0e9f893d49dc7caee580a055de53562206)
Frédéric Bérat [Wed, 14 Jun 2023 08:52:07 +0000 (10:52 +0200)]
tests: replace system by xsystem
With fortification enabled, the result returned by calls to system
needs to be checked, as it has the __wur (warn unused result) macro
applied.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 8022fc7d5119a22e9e0ac72798f649385b0e167a)
resolv: Do not wait for non-existing second DNS response after error (bug 30081)
In single-request mode, there is no second response after an error
because the second query has not been sent yet. Waiting for it
introduces an unnecessary timeout.
Michael Jeanson [Wed, 3 Jul 2024 16:35:34 +0000 (12:35 -0400)]
nptl: fix potential merge of __rseq_* relro symbols
While working on a patch to add support for the extensible rseq ABI, we
came across an issue where a new 'const' variable would be merged with
the existing '__rseq_size' variable. We tracked this to the use of
'-fmerge-all-constants' which allows the compiler to merge identical
constant variables. This means that all 'const' variables in a compile
unit that are of the same size and are initialized to the same value can
be merged.
In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t'
are both 4 bytes and initialized to 0 which should trigger the merge.
However, for reasons we haven't delved into, when the attribute
'section (".data.rel.ro")' is added to the mix, only variables of the
exact same type are merged. As far as we know this behavior is not specified
anywhere and could change with a new compiler version, hence this patch.
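A minimal illustration of the hazard (stand-in variables, not the
actual __rseq_* symbols):
```
/* Compile with: gcc -O2 -fmerge-all-constants -c merge.c
   With merging, the two objects below may be assigned the same
   address, breaking code that relies on their symbol identity.  */
const unsigned int size_like = 0;    /* stand-in for __rseq_size */
const unsigned int other_zero = 0;   /* a new, unrelated constant */
```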
Move the definitions of these variables into an assembler file and add
hidden writable aliases for internal use. This has the added bonus of
removing the asm workaround to set the values on rseq registration.
Tested on Debian 12 with GCC 12.2.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 2b92982e2369d292560793bee8e730f695f48ff3)
Joseph Myers [Fri, 26 May 2023 15:03:31 +0000 (15:03 +0000)]
Add AT_RSEQ_* from Linux 6.3 to elf.h
Linux 6.3 adds constants AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN; add
them to glibc's elf.h. (Recall that, although elf.h is a
system-independent header, so far we've put AT_* constants there even
if Linux-specific, as discussed in bug 15794. So rather than making
any attempt to fix that issue, the new constants are just added there
alongside the existing ones.)
Stefan Liebler [Thu, 11 Jul 2024 09:28:53 +0000 (11:28 +0200)]
s390x: Fix segfault in wcsncmp [BZ #31934]
The z13/vector-optimized wcsncmp implementation segfaults if n=1
and there is only one character (equal on both strings) before
the page end. Then it loads and compares one character and fails to
check n again. The following load faults.
This patch removes the extra load and compare of the first character
and just starts with the loop, which uses vector-load-to-block-boundary.
This code path also checks n.
With this patch both tests are passing:
- the simplified one mentioned in the bugzilla 31934
- the full one in Florian Weimer's patch:
"manual: Document a GNU extension for strncmp/wcsncmp"
(https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/):
On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9b7651410375ec8848a1944992d663d514db4ba7)
H.J. Lu [Fri, 10 May 2024 03:07:01 +0000 (20:07 -0700)]
Force DT_RPATH for --enable-hardcoded-path-in-tests
On Fedora 40/x86-64, the linker enables --enable-new-dtags by default,
which generates DT_RUNPATH instead of DT_RPATH. Unlike DT_RPATH,
DT_RUNPATH only applies to DT_NEEDED entries in the executable and
doesn't apply to DT_NEEDED entries in shared libraries which are loaded
via DT_NEEDED entries in the executable. Some glibc tests have
libstdc++.so.6 in DT_NEEDED, which has libm.so.6 in DT_NEEDED. When
DT_RUNPATH is generated,
/lib64/libm.so.6 is loaded for such tests. If the newly built glibc is
older than glibc 2.36, these tests fail with
assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_2.36' not found (required by /lib64/libm.so.6)
assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by /lib64/libm.so.6)
Pass -Wl,--disable-new-dtags to the linker when building glibc tests
with --enable-hardcoded-path-in-tests. This fixes BZ #31719.
Adam Yi [Wed, 8 Mar 2023 08:11:47 +0000 (03:11 -0500)]
hurd: fix build of tst-system.c
We made tst-system.c depend on pthread, but that requires linking with
$(shared-thread-library). It does not fail under Linux because the
variable expands to nothing under Linux, but it fails for Hurd.
I verified via cross-compiling that "make check" now works
for Hurd.
Signed-off-by: Adam Yi <ayi@janestreet.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit d03094649d39949a30513bf3ffb03a28fecbccd8)
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
H.J. Lu [Thu, 25 Apr 2024 15:06:52 +0000 (08:06 -0700)]
elf: Also compile dl-misc.os with $(rtld-early-cflags)
Also compile dl-misc.os with $(rtld-early-cflags) to avoid
Program received signal SIGILL, Illegal instruction.
0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
156 bool positive = true;
(gdb) bt
#0 0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
#1 0x00007ffff7fdb1a9 in tunable_initialize (
cur=cur@entry=0x7ffff7ffbc00 <tunable_list+2176>,
strval=strval@entry=0x7fffffffe2c9 "2", len=len@entry=1)
at dl-tunables.c:131
#2 0x00007ffff7fdb3a2 in parse_tunables (valstring=<optimized out>)
at dl-tunables.c:258
#3 0x00007ffff7fdb5d9 in __GI___tunables_init (envp=0x7fffffffdd58)
at dl-tunables.c:288
#4 0x00007ffff7fe44c3 in _dl_sysdep_start (
start_argptr=start_argptr@entry=0x7fffffffdcb0,
dl_main=dl_main@entry=0x7ffff7fe5f80 <dl_main>)
at ../sysdeps/unix/sysv/linux/dl-sysdep.c:110
#5 0x00007ffff7fe5cae in _dl_start_final (arg=0x7fffffffdcb0) at rtld.c:494
#6 _dl_start (arg=0x7fffffffdcb0) at rtld.c:581
#7 0x00007ffff7fe4b38 in _start ()
(gdb)
when setting GLIBC_TUNABLES in glibc compiled with APX.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 049b7684c912dd32b67b1b15b0f43bf07d5f512e)
CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)
This avoids potential memory corruption when the underlying NSS
callback function does not use the buffer space to store all strings
(e.g., for constant strings).
Instead of custom buffer management, two scratch buffers are used.
This increases stack usage somewhat.
Scratch buffer allocation failure is handled by returning -1
(an invalid timeout value) instead of terminating the process.
This fixes bug 31679.
The addgetnetgrentX call in addinnetgrX may have failed to produce
a result, so the result variable in addinnetgrX can be NULL.
Use db->negtimeout as the fallback value if there is no result data;
the timeout is also overwritten below.
Also avoid sending a second not-found response. (The client
disconnects after receiving the first response, so the data stream did
not go out of sync even without this fix.) It is still beneficial to
add the negative response to the mapping, so that the client can get
it from there in the future, instead of going through the socket.
Charles Fol [Thu, 28 Mar 2024 15:25:38 +0000 (12:25 -0300)]
iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961)
ISO-2022-CN-EXT uses escape sequences to indicate character set changes
(as specified by RFC 1922). While the SOdesignation has the expected
bounds checks, neither SS2designation nor SS3designation has them,
allowing a write overflow of 1, 2, or 3 bytes with fixed values:
'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)
powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Wilco Dijkstra [Thu, 21 Mar 2024 16:48:33 +0000 (16:48 +0000)]
AArch64: Check kernel version for SVE ifuncs
Old Linux kernels disable SVE after every system call. Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy. This is true for kernels from 4.15.0 up to, but not
including, 6.2.0, except for 5.14.0, which was patched. Avoid this by
checking the kernel version and selecting the SVE ifunc on modern kernels.
Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions. If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.
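A hedged standalone sketch of the version parsing (the ifunc resolver
itself avoids library calls; this uses uname() for clarity):
```
#include <sys/utsname.h>

/* Turn e.g. "6.2.0-generic" into 0x060200; assume modern on failure.  */
static unsigned int
kernel_version_24bit (void)
{
  struct utsname buf;
  if (uname (&buf) < 0)
    return 0xffffff;
  unsigned int parts[3] = { 0, 0, 0 };
  unsigned int n = 0;
  for (const char *p = buf.release; *p != '\0' && n < 3; p++)
    {
      if (*p >= '0' && *p <= '9')
        parts[n] = parts[n] * 10 + (unsigned int) (*p - '0');
      else if (*p == '.')
        n++;
      else
        break;
    }
  if (parts[0] == 0)
    return 0xffffff;            /* unrecognized format: assume modern */
  return (parts[0] << 16) | (parts[1] << 8) | parts[2];
}
```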
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)
Szabolcs Nagy [Wed, 13 Mar 2024 14:34:14 +0000 (14:34 +0000)]
aarch64: fix check for SVE support in assembler
Due to GCC bug 110901, -mcpu can override the -march setting when
compiling asm code, and thus a compiler targeting a specific CPU can
fail the configure check even when binutils gas supports SVE.
The workaround is that an explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses, the
configure check should use it too, even if the GCC issue is fixed
independently.
Andreas Schwab [Thu, 23 Nov 2023 17:23:46 +0000 (18:23 +0100)]
aarch64: correct CFI in rawmemchr (bug 31113)
The .cfi_return_column directive changes the return column for the whole
FDE range. But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.
Wilco Dijkstra [Thu, 26 Oct 2023 16:30:36 +0000 (17:30 +0100)]
AArch64: Remove Falkor memcpy
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
Wilco Dijkstra [Thu, 26 Oct 2023 16:07:21 +0000 (17:07 +0100)]
AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Wilco Dijkstra [Tue, 24 Oct 2023 12:51:07 +0000 (13:51 +0100)]
AArch64: Cleanup ifuncs
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Wilco Dijkstra [Tue, 17 Oct 2023 15:54:21 +0000 (16:54 +0100)]
AArch64: Add support for MOPS memcpy/memmove/memset
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Wilco Dijkstra [Wed, 1 Feb 2023 18:45:19 +0000 (18:45 +0000)]
AArch64: Improve SVE memcpy and memmove
Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.
Florian Weimer [Fri, 15 Mar 2024 18:08:24 +0000 (19:08 +0100)]
linux: Use rseq area unconditionally in sched_getcpu (bug 31479)
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e6b4d07303af25709051c3e96288f2d ("nptl:
Unconditionally use a 32-byte rseq area"). After that, it was no longer
ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation. This commit always checks
the rseq area for CPU number information before using the other
approaches.
This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call. Most architectures that have vDSO
acceleration for getcpu also have rseq support.
Stefan Liebler [Tue, 25 Jul 2023 09:34:30 +0000 (11:34 +0200)]
Include sys/rseq.h in tst-rseq-disable.c
Starting with commit 2c6b4b272e6b4d07303af25709051c3e96288f2d
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.
The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 637aac2ae3980de31a6baab236a9255fe853cc76)
If the kernel headers provide a larger struct rseq, we used that
size as the argument to the rseq system call. As a result,
rseq registration would fail on older kernels which only accept
size 32.
Stefan Liebler [Thu, 22 Feb 2024 14:03:27 +0000 (15:03 +0100)]
S390: Do not clobber r7 in clone [BZ #31402]
Starting with commit e57d8fc97b90127de4ed3e3a9cdf663667580935
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in the error case
(function or stack is NULL).
This patch restores the saved registers also in the error case.
Furthermore, the existing test misc/tst-clone is extended to check
all error cases and verify that clone does not clobber registers in
this error case.
This restores the 2.33 semantics for arena_get2. It was changed by
commit 11a02b035b46 to avoid having arena_get2 call malloc (back when
__get_nprocs was refactored to use a scratch_buffer - 903bc7dcc2acafc).
__get_nprocs has since been refactored again and now also avoids
calling malloc.
Commit 11a02b035b46 did not take any performance implications into
consideration, which should have been discussed properly. The
__get_nprocs_sched function is still used as a fallback mechanism if
procfs and sysfs are not accessible.
arm: Remove wrong ldr from _dl_start_user (BZ 31339)
The commit 49d877a80b29d3002887b084eec6676d9f5fec18 (arm: Remove
_dl_skip_args usage) removed the _SKIP_ARGS literal, which was
previously loaded into r4 at loader _start. However, the cleanup did
not remove the following 'ldr r4, [sl, r4]' in _dl_start_user, used to
check whether to skip the arguments after the loader self-relocation.
In my testing, the kernel initially set r4 to 0, which makes the
ldr instruction just read the _GLOBAL_OFFSET_TABLE_. However, since r4
is a callee-saved register, a different runtime might not zero
initialize it and thus trigger an invalid memory access.
Checked on arm-linux-gnu.
Reported-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1e25112dc0cb2515d27d8d178b1ecce778a9d37a)
Daniel Cederman [Tue, 16 Jan 2024 08:31:41 +0000 (09:31 +0100)]
sparc: Remove unwind information from signal return stubs [BZ #31244]
The functions were previously written in C, but were not compiled
with unwind information. The ENTRY/END macros include .cfi_startproc
and .cfi_endproc, which add unwind information. This caused the
tests cleanup-8 and cleanup-10 in the GCC testsuite to fail.
This patch adds a version of the ENTRY/END macros without the
CFI instructions that can be used instead.
sigaction registers a restorer address that is located two instructions
before the stub function. This patch adds two instructions of padding
to avoid the unwinder accessing the unwind information of the function
that the linker has placed right before the stub in memory. This fixes
an issue with pthread_cancel that caused tst-mutex8-static (and other
tests) to fail.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 7bd06985c0a143cdcba2762bfe020e53514a53de)
The byte comparison in the small-counts copy path should be unsigned
(as the memmove size argument is). It fixes string/tst-memmove-overflow
on sparcv9, where the input size triggers an invalid code path.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
The ffsll function randomly regresses by ~20%, depending on how the
code gets aligned in memory. The ffsll function code size is 17 bytes.
Since the default function alignment is 16 bytes, the code can start at
a 16-, 32-, 48- or 64-byte aligned address. When it starts at a 16-,
32- or 64-byte aligned address, the entire code fits in a single
64-byte cache line. When it starts at a 48-byte aligned address, the
code splits across two cache lines, hence the random regression.
Reducing the ffsll function size from 17 bytes to 12 bytes ensures that
it always fits in a single 64-byte cache line.
This patch fixes the random performance regression of ffsll.