git.ipfire.org Git - thirdparty/glibc.git/log

Joseph Myers [Wed, 11 Dec 2024 21:51:49 +0000 (21:51 +0000)]

Implement C23 atanpi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the atanpi functions (atan(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.

Peter Bergner [Wed, 11 Dec 2024 20:15:13 +0000 (23:15 +0300)]

powerpc64: Fix dl-trampoline.S big-endian / non-ROP build failure

Fix a big-endian / non-ROP build failure caused by commit 4d9a4c02 when
building dl-trampoline.S.

Reported-by: Joseph Myers <josmyers@redhat.com>

Florian Weimer [Tue, 10 Dec 2024 15:17:06 +0000 (16:17 +0100)]

powerpc: Use correct procedure call standard for getrandom vDSO call (bug 32440)

A plain indirect function call does not work on POWER because
success and failure are signaled through a flag register, and
not via the usual Linux negative return value convention.

This has potential security impact, in two ways: the return value
could be out of bounds (EAGAIN is 11 on powerpc64le), and no
random bytes have been written despite the non-error return value.
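
As a rough illustration of the two conventions (not the actual fix), the sketch below models a POWER-style result as a value plus a separate error flag and converts it to the usual Linux negative-errno form. The struct and function names are hypothetical, and on real hardware the flag lives in a condition register rather than in a C object.

```c
#include <stdbool.h>

/* Hypothetical model of what a POWER vDSO call hands back: a
   non-negative value plus a separate error flag.  */
struct sketch_power_result
{
  long value;      /* Byte count on success, positive errno on failure.  */
  bool error_flag; /* True when the call failed.  */
};

/* Convert to the usual Linux convention: negative errno on failure.  */
static long
sketch_to_linux_convention (struct sketch_power_result r)
{
  return r.error_flag ? -r.value : r.value;
}
```

A caller that ignores the flag would read a failed call returning 11 (EAGAIN) as 11 bytes written, which is the confusion described above.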

Fixes commit 461cab1de747f3842f27a5d24977d78d561d45f9 ("linux: Add
support for getrandom vDSO").

Reported-by: Ján Stanček <jstancek@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

H.J. Lu [Thu, 5 Dec 2024 20:44:05 +0000 (04:44 +0800)]

Add TEST_CC and TEST_CXX support

Support testing glibc build with a different C compiler or a different
C++ compiler with

$ ../glibc-VERSION/configure TEST_CC="gcc-6.4.1" TEST_CXX="g++-6.4.1"

1. Add LIBC_TRY_CC_AND_TEST_CC_OPTION, LIBC_TRY_CC_AND_TEST_CC_COMMAND
and LIBC_TRY_CC_AND_TEST_LINK to test both CC and TEST_CC.
2. Add check and xcheck targets to Makefile.in and override build compiler
options with ones from TEST_CC and TEST_CXX.

Tested on Fedora 41/x86-64:

1. Building with GCC 14.2.1 and testing with GCC 6.4.1 and GCC 11.2.1.
2. Building with GCC 15 and testing with GCC 6.4.1.

Support for GCC versions older than GCC 6.2 may need to change the test
sources. Other targets may need to update configure.ac under sysdeps and
modify Makefile.in to override target build compiler options.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>

Peter Bergner [Tue, 10 Dec 2024 03:41:08 +0000 (22:41 -0500)]

powerpc64le: ROP changes for the dl-trampoline functions

Add ROP protection for the _dl_runtime_resolve and _dl_profile_resolve
functions.

Wangyang Guo [Wed, 4 Dec 2024 11:16:22 +0000 (19:16 +0800)]

malloc: Add tcache path for calloc

This commit adds tcache support to calloc(), which can substantially improve
the performance of small allocations, especially in multi-threaded
scenarios. tcache_available() and tcache_try_malloc() are split out as
helper functions for better code reuse.

Also fix the tst-safe-linking failure after enabling tcache. Previously,
calloc() was used as a way to bypass tcache in memory allocation and
trigger the safe-linking check in the fastbins path. With tcache enabled, it
needs extra workarounds to bypass tcache.
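
The general idea of a per-thread cache in front of calloc can be sketched as follows. This is a toy with a single size class, not glibc's tcache: every name here (sketch_calloc_small, sketch_free_small, the size and count limits) is hypothetical, and requests larger than the chunk size are simply rejected.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CHUNK_SIZE 64 /* Single size class, for illustration.  */
#define SKETCH_CACHE_MAX 7   /* Arbitrary per-thread limit.  */

struct sketch_entry
{
  struct sketch_entry *next;
};

static _Thread_local struct sketch_entry *sketch_cache;
static _Thread_local unsigned int sketch_cached;

/* Return a chunk obtained from sketch_calloc_small to the per-thread
   cache, or to the system allocator when the cache is full.  */
static void
sketch_free_small (void *p)
{
  if (p == NULL)
    return;
  if (sketch_cached < SKETCH_CACHE_MAX)
    {
      struct sketch_entry *e = p;
      e->next = sketch_cache;
      sketch_cache = e;
      sketch_cached++;
      return;
    }
  free (p);
}

/* calloc for requests of at most SKETCH_CHUNK_SIZE bytes.  Returns NULL
   for anything larger (a real allocator would fall back instead).  */
static void *
sketch_calloc_small (size_t n, size_t size)
{
  if (size != 0 && n > SKETCH_CHUNK_SIZE / size)
    return NULL;
  if (sketch_cache != NULL)
    {
      /* Fast path: pop a cached chunk without taking any lock, then
         clear it, since it still holds stale data.  */
      struct sketch_entry *e = sketch_cache;
      sketch_cache = e->next;
      sketch_cached--;
      return memset (e, 0, SKETCH_CHUNK_SIZE);
    }
  /* Slow path: ask the system allocator for a full-size chunk.  */
  return calloc (1, SKETCH_CHUNK_SIZE);
}
```

The fast path touches only thread-local state, which is why this kind of cache tends to help most when several threads allocate at once.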

Result of bench-calloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.656
4 threads  | 0.470

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Joseph Myers [Tue, 10 Dec 2024 20:42:20 +0000 (20:42 +0000)]

Implement C23 asinpi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the asinpi functions (asin(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.

Sam James [Mon, 9 Dec 2024 23:11:25 +0000 (23:11 +0000)]

malloc: add indirection for malloc(-like) functions in tests [BZ #32366]

GCC 15 introduces allocation dead code removal (DCE) for PR117370 in
r15-5255-g7828dc070510f8. This breaks various glibc tests which want
to assert various properties of the allocator without doing anything
obviously useful with the allocated memory.

Alexander Monakov rightly pointed out that we can and should do better
than passing -fno-malloc-dce to paper over the problem. Not least because
GCC 14 already does such DCE where there's no testing of malloc's return
value against NULL, and LLVM has such optimisations too.

Handle this by providing malloc (and friends) wrappers with a volatile
function pointer to obscure that we're calling malloc (et al.) from the
compiler.
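
A minimal sketch of the technique, with hypothetical names (the identifiers used in the glibc test headers may differ): because the pointer is volatile, the compiler must reload it on every call and cannot prove that the callee is malloc, so it cannot remove the allocation.

```c
#include <stdlib.h>

/* Volatile function pointer: the compiler may not assume it still
   holds its initial value, so calls through it stay opaque.  */
static void *(*volatile sketch_malloc_indirect) (size_t) = malloc;

/* Test-side wrapper; use it instead of calling malloc directly.  */
static void *
sketch_malloc (size_t size)
{
  return sketch_malloc_indirect (size);
}
```

Memory obtained this way is still released with free as usual.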

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>

Joseph Myers [Mon, 9 Dec 2024 23:01:29 +0000 (23:01 +0000)]

Implement C23 acospi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the acospi functions (acos(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.

Sachin Monga [Mon, 9 Dec 2024 21:47:40 +0000 (16:47 -0500)]

powerpc64le: ROP changes for the *context and setjmp functions

Add ROP protection for the getcontext, setcontext, makecontext, swapcontext
and __sigsetjmp_symbol functions.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

Michael Jeanson [Mon, 9 Dec 2024 20:24:26 +0000 (20:24 +0000)]

nptl: Add <thread_pointer.h> for m68k

This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Compile tested with build-many-glibcs.py but I don't have access to any
hardware to run the tests.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>

Michael Jeanson [Wed, 31 Jul 2024 15:20:36 +0000 (11:20 -0400)]

nptl: Add <thread_pointer.h> for RISC-V

This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Both code paths tested on a Visionfive 2 with Debian sid.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

Michael Jeanson [Wed, 31 Jul 2024 17:18:18 +0000 (13:18 -0400)]

nptl: add RSEQ_SIG for RISC-V

Enable RSEQ for RISC-V; support was added in Linux 5.18.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

Pierre Blanchard [Mon, 9 Dec 2024 15:58:47 +0000 (15:58 +0000)]

AArch64: Improve codegen in users of ADVSIMD expm1 helper

Add inline helper for expm1 and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for expm1 (19%), sinh (8.5%), and tanh (7.5%).

Pierre Blanchard [Mon, 9 Dec 2024 15:55:39 +0000 (15:55 +0000)]

AArch64: Improve codegen in users of ADVSIMD log1p helper

Add inline helper for log1p and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for log1p (3.5%), acosh (7.5%) and atanh (10%).

Pierre Blanchard [Mon, 9 Dec 2024 15:54:34 +0000 (15:54 +0000)]

AArch64: Improve codegen in AdvSIMD logs

Remove spurious ADRP and a few MOVs.
Reduce memory access by using more indexed MLAs in polynomial.
Align notation so that algorithms are easier to compare.
Speedup on Neoverse V1 for log10 (8%), log (8.5%), and log2 (10%).
Update error threshold in AdvSIMD log (now matches SVE log).

Pierre Blanchard [Mon, 9 Dec 2024 15:53:04 +0000 (15:53 +0000)]

AArch64: Improve codegen in AdvSIMD pow

Remove spurious ADRP. Improve memory access by shuffling constants and
using more indexed MLAs.

A few more optimisations with no impact on accuracy:
- force fmas contraction
- switch from shift-aided rint to rint instruction

Between 1 and 5% throughput improvement on Neoverse
V1 depending on benchmark.
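
For readers unfamiliar with the "shift-aided rint" being replaced, the scalar sketch below shows the general trick: adding and then subtracting a large constant forces the fraction bits to be rounded away. This is not the AdvSIMD code from the commit; the function name is hypothetical, and the result is only valid in the default rounding mode for |x| < 2^51.

```c
/* Round to the nearest integer (ties to even) without a rounding
   instruction.  Valid in round-to-nearest mode for |x| < 2^51.  */
static double
sketch_rint_by_shift (double x)
{
  const double shift = 0x1.8p52; /* 1.5 * 2^52: spacing of doubles here is 1.  */
  volatile double t = x + shift; /* volatile keeps the sum in double precision.  */
  return t - shift;
}
```

A dedicated rint instruction gives the same result in one step and without the range restriction.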

Stefan Liebler [Mon, 9 Dec 2024 09:25:24 +0000 (10:25 +0100)]

s390x: Regenerated ULPs.

Needed after:
"Implement C23 cospi"
commit 0ae0af68d8fa3bf6cbe1e4f1de5929ff71de67b3
and
"Implement C23 sinpi"
commit 776938e8b8dcf2b59998979e91cc0f9db7d771a8
and
"Implement C23 tanpi"

gfleury [Tue, 26 Nov 2024 20:53:29 +0000 (22:53 +0200)]

htl: move pthread_condattr_setpshared into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-8-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:28 +0000 (22:53 +0200)]

htl: move pthread_condattr_setclock into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-7-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:27 +0000 (22:53 +0200)]

htl: move pthread_condattr_init into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-6-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:26 +0000 (22:53 +0200)]

htl: move pthread_condattr_getpshared into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-5-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:25 +0000 (22:53 +0200)]

htl: move pthread_condattr_getclock into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-4-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:24 +0000 (22:53 +0200)]

htl: move __pthread_default_condattr into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-3-gfleury@disroot.org>

gfleury [Tue, 26 Nov 2024 20:53:23 +0000 (22:53 +0200)]

htl: move pthread_condattr_destroy into libc.

Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20241126205329.2215295-2-gfleury@disroot.org>

Andreas K. Hüttel [Sun, 8 Dec 2024 21:01:51 +0000 (22:01 +0100)]

math: Add sinpi,cospi,tanpi sparc64 ulps

Linux catbus 6.1.112 #1 SMP Sun Oct 13 10:52:08 PDT 2024 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux

gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

Andreas K. Hüttel [Sun, 8 Dec 2024 17:25:05 +0000 (18:25 +0100)]

math: Add tanpi aarch64 ulps

Linux dola 5.15.169-gentoo-dist #1 SMP Wed Oct 23 06:25:30 -00 2024 aarch64 GNU/Linux
Vendor ID: ARM
Model name: Neoverse-N1

gcc (Gentoo Hardened 13.3.1_p20241025 p1) 13.3.1 20241024

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

H.J. Lu [Thu, 5 Dec 2024 00:39:44 +0000 (08:39 +0800)]

math: Exclude internal math symbols for tests [BZ #32414]

Since internal tests don't have access to internal symbols in libm,
exclude them for internal tests. Also make tst-strtod5 and tst-strtod5i
depend on $(libm) to support older versions of GCC which can't inline
copysign family functions. This fixes BZ #32414.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

H.J. Lu [Fri, 6 Dec 2024 05:42:25 +0000 (13:42 +0800)]

Remove AC_SUBST(libc_cv_mtls_descriptor)

Remove

AC_SUBST(libc_cv_mtls_descriptor)

since there is no @libc_cv_mtls_descriptor@ and there is

LIBC_CONFIG_VAR([have-mtls-descriptor], [$libc_cv_mtls_descriptor])

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>

Joseph Myers [Thu, 5 Dec 2024 21:42:10 +0000 (21:42 +0000)]

Implement C23 tanpi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the tanpi functions (tan(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.

Joseph Myers [Thu, 5 Dec 2024 21:40:57 +0000 (21:40 +0000)]

Fix typo in elf/Makefile:postclean-generated

The postclean-generated setting in elf/Makefile lists
$(objpfx)/dso-sort-tests-2.generated-makefile twice and
$(objpfx)/dso-sort-tests-1.generated-makefile not at all, which looks
like a typo; fix it to list each once.

Tested for x86_64.

Adhemerval Zanella [Thu, 5 Dec 2024 16:44:18 +0000 (16:44 +0000)]

math: xfail some sinpi tests for ibm128-libgcc

On powerpc math/test-ibm128-sinpi shows:

testing long double (without inline functions)
Failure: sinpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: sinpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: sinpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: sinpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is:         qNaN
should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
Failure: Test: sinpi_downward (0x3.fffffffffffffffcp+108)
Result:
is:          2.97479253223185882765417834495004e-15   0x1.acb679186c7b49a36c9ec63e110p-49
should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
difference:  2.97479253223185882765417834495004e-15   0x1.acb679186c7b49a36c9ec63e110p-49
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  4.0000
Failure: Test: sinpi_downward (0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          2.63250110604328276654475674742669e-15   0x1.7b6225fa8503a5a8c514f5c0208p-49
should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
difference:  2.63250110604328276654475674742669e-15   0x1.7b6225fa8503a5a8c514f5c0208p-49
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  4.0000
Failure: Test: sinpi_towardzero (-0x3.fffffffffffffffcp+108)
Result:
is:         -1.71856472474338625450766636956702e-14  -0x1.3596cf230d8f69346d93d8c3100p-46
should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
difference:  1.71856472474338625450766636956702e-14   0x1.3596cf230d8f69346d93d8c3100p-46
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  3.0000
Failure: Test: sinpi_towardzero (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is:         -9.73792846364428462525599942305655e-15  -0x1.5ed8897ea140e96a31453d6e580p-47
should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
difference:  9.73792846364428462525599942305655e-15   0x1.5ed8897ea140e96a31453d6e580p-47
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  3.0000
Failure: Test: sinpi_towardzero (0x3.fffffffffffffffcp+108)
Result:
is:          1.71856472474338625450766636956702e-14   0x1.3596cf230d8f69346d93d8c3100p-46
should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
difference:  1.71856472474338625450766636956702e-14   0x1.3596cf230d8f69346d93d8c3100p-46
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  3.0000
Failure: Test: sinpi_towardzero (0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          9.73792846364428462525599942305655e-15   0x1.5ed8897ea140e96a31453d6e580p-47
should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
difference:  9.73792846364428462525599942305655e-15   0x1.5ed8897ea140e96a31453d6e580p-47
ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp   :  3.0000
Failure: Test: sinpi_upward (-0x3.fffffffffffffffcp+108)
Result:
is:         -1.71856472474338625450766636956709e-14  -0x1.3596cf230d8f69346d93d8c3110p-46
should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
difference:  1.71856472474338625450766636956710e-14   0x1.3596cf230d8f69346d93d8c3110p-46
ulp       :  inf
max.ulp   :  4.0000
Failure: Test: sinpi_upward (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is:         -9.73792846364428462525599942305708e-15  -0x1.5ed8897ea140e96a31453d6e598p-47
should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
difference:  9.73792846364428462525599942305709e-15   0x1.5ed8897ea140e96a31453d6e598p-47
ulp       :  inf
max.ulp   :  4.0000
Failure: sinpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: sinpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: sinpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: sinpi_upward (0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is:         qNaN
should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0

Adhemerval Zanella [Thu, 5 Dec 2024 16:36:21 +0000 (16:36 +0000)]

math: xfail some cospi tests for ibm128-libgcc

On powerpc math/test-ibm128-cospi shows:

testing long double (without inline functions)
Failure: cospi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: cospi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: cospi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: cospi_downward (-0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is:         qNaN
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
Failure: Test: cospi_downward (0x3.fffffffffffffffcp+108)
Result:
is:          9.99999999999999999999999999995574e-01   0x1.ffffffffffffffffffffffff4c8p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  4.42501664022411309598141492088312e-30   0x1.670000000000000000000000000p-98
ulp       :  179.5000
max.ulp   :  4.0000
Failure: Test: cospi_downward (0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          9.99999999999999999999999999996524e-01   0x1.ffffffffffffffffffffffff730p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  3.47591836363008326759542899077727e-30   0x1.1a0000000000000000000000000p-98
ulp       :  141.0000
max.ulp   :  4.0000
Failure: Test: cospi_towardzero (-0x3.fffffffffffffffcp+108)
Result:
is:          9.99999999999999999999999999852310e-01   0x1.ffffffffffffffffffffffe8990p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  1.47689552599346303944427057331536e-28   0x1.767000000000000000000000000p-93
ulp       :  5991.0000
max.ulp   :  4.0000
Failure: Test: cospi_towardzero (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          9.99999999999999999999999999952569e-01   0x1.fffffffffffffffffffffff87c0p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  4.74302619264133348003801799876275e-29   0x1.e10000000000000000000000000p-95
ulp       :  1924.0000
max.ulp   :  4.0000
Failure: Test: cospi_towardzero (0x3.fffffffffffffffcp+108)
Result:
is:          9.99999999999999999999999999852310e-01   0x1.ffffffffffffffffffffffe8990p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  1.47689552599346303944427057331536e-28   0x1.767000000000000000000000000p-93
ulp       :  5991.0000
max.ulp   :  4.0000
Failure: Test: cospi_towardzero (0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          9.99999999999999999999999999952569e-01   0x1.fffffffffffffffffffffff87c0p-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  4.74302619264133348003801799876275e-29   0x1.e10000000000000000000000000p-95
ulp       :  1924.0000
max.ulp   :  4.0000
Failure: Test: cospi_upward (-0x3.fffffffffffffffcp+108)
Result:
is:          9.99999999999999999999999999852323e-01   0x1.ffffffffffffffffffffffe899bp-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  1.47673235656615530277812119019587e-28   0x1.766568e20369c00000000000000p-93
ulp       :  5990.3382
max.ulp   :  4.0000
Failure: Test: cospi_upward (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is:          9.99999999999999999999999999952583e-01   0x1.fffffffffffffffffffffff87cbp-1
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0
difference:  4.74136253815267677203679334037676e-29   0x1.e0d4cf1e9076600000000000000p-95
ulp       :  1923.3252
max.ulp   :  4.0000
Failure: cospi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: cospi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: cospi_upward (0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: cospi_upward (0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is:         qNaN
should be:   1.00000000000000000000000000000000e+00   0x1.000000000000000000000000000p+0

Adhemerval Zanella [Thu, 5 Dec 2024 16:22:04 +0000 (13:22 -0300)]

powerpc: Update ulps

From 'Implement C23 cospi' (0ae0af68d8fa3bf6cbe1e4f1de5929ff71de67b3)
and 'Implement C23 sinpi' (776938e8b8dcf2b59998979e91cc0f9db7d771a8).

Wilco Dijkstra [Thu, 5 Dec 2024 16:18:02 +0000 (16:18 +0000)]

AArch64: Update libm-test-ulps

Add sinpi/cospi.

H.J. Lu [Thu, 5 Dec 2024 01:25:45 +0000 (09:25 +0800)]

i686: Update libm-test-ulps

Update i686 libm-test-ulps to fix

FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-sinpi

when building glibc with GCC 7.4.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

H.J. Lu [Thu, 5 Dec 2024 00:57:53 +0000 (08:57 +0800)]

x86-64: Update libm-test-ulps

Update x86-64 libm-test-ulps to fix

FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-exp2m1
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-exp2m1
FAIL: math/test-ldouble-sinpi

when building glibc with GCC 7.4.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Joseph Myers [Thu, 5 Dec 2024 10:12:09 +0000 (10:12 +0000)]

Use M_LIT in place of M_MLIT for literals

This should fix the reported issue building cospi and sinpi with GCC 6.

Tested for x86_64 (not with GCC 6).

Joseph Myers [Thu, 5 Dec 2024 09:53:47 +0000 (09:53 +0000)]

Add further test of TLS

Add an additional test of TLS variables, with different alignment,
accessed from different modules.  The idea of the alignment test is
similar to tst-tlsalign and the same code is shared for setting up
test variables, but unlike the tst-tlsalign code, there are multiple
threads and variables are accessed from multiple objects to verify
that they get a consistent notion of the address of an object within a
thread.  Threads are repeatedly created and shut down to verify proper
initialization in each new thread.  The test is also repeated with TLS
descriptors when supported.  (However, only initial-exec TLS is
covered in this test.)

Tested for x86_64.

Sergey Bugaev [Wed, 4 Dec 2024 11:29:15 +0000 (14:29 +0300)]

hurd: Protect against servers returning bogus read/write lengths

There already was a branch checking for this case in _hurd_fd_read ()
when the data is returned out-of-line. Do the same for inline data, as
well as for _hurd_fd_write (). It's also not possible for the length to
be negative, since it's stored in an unsigned integer.

Not verifying the returned length can confuse the callers who assume
the returned length is always reasonable. This manifested as libzstd
test suite failing on writes to /dev/zero, even though the write () call
appeared to succeed. In fact, the zero store backing /dev/zero was
returning a larger written length than the size actually submitted to
it, which is a separate bug to be fixed on the Hurd side. With this
patch, EGRATUITOUS is now propagated to the caller.
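
The check itself amounts to refusing any reported length larger than what was requested. A minimal sketch, assuming a hypothetical helper and a stand-in error constant (the real EGRATUITOUS is a Hurd errno value that is not available on other systems):

```c
#include <stddef.h>

/* Hypothetical stand-in for the Hurd's EGRATUITOUS; the value is arbitrary.  */
#define SKETCH_EGRATUITOUS 1

/* Validate the length a server claims to have transferred.  Stores it
   in *OUT and returns 0 when plausible, or returns an error when the
   server reports more than was requested.  */
static int
sketch_check_length (size_t requested, size_t reported, size_t *out)
{
  if (reported > requested)
    return SKETCH_EGRATUITOUS;
  *out = reported;
  return 0;
}
```

No check for negative values is needed because both lengths are unsigned.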

Reported-by: Diego Nieto Cid <dnietoc@gmail.com>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20241204112915.540032-1-bugaevc@gmail.com>

H.J. Lu [Thu, 5 Dec 2024 07:16:43 +0000 (15:16 +0800)]

Fix and sort variables in Makefiles

Fix variables in Makefiles:

1. There is a tab, not a space, between "variable" and =, +=, :=.
2. The last entry doesn't have a trailing \.

and sort them.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Joseph Myers [Wed, 4 Dec 2024 20:04:04 +0000 (20:04 +0000)]

Implement C23 sinpi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the sinpi functions (sin(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.

Joseph Myers [Wed, 4 Dec 2024 10:20:44 +0000 (10:20 +0000)]

Implement C23 cospi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the cospi functions (cos(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.

H.J. Lu [Tue, 26 Nov 2024 08:15:25 +0000 (16:15 +0800)]

malloc: Optimize small memory clearing for calloc

Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes
on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead
of up to 3 branches. On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.
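
How a small clear can be done with a run of stores is sketched below. This is only one possible arrangement, using switch fall-through so that a single computed jump lands in the middle of the store sequence. It is not a copy of calloc-clear-memory.h. The function name is hypothetical, and the limit of 9 words is inferred from the sizes quoted above (36 = 9 * 4 and 72 = 9 * 8).

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_WORDS 9

/* Zero NWORDS words starting at D.  Small counts use repeated stores;
   larger ones defer to memset.  */
static void
sketch_clear_words (size_t *d, size_t nwords)
{
  if (nwords > SKETCH_MAX_WORDS)
    {
      memset (d, 0, nwords * sizeof (size_t));
      return;
    }
  /* Jump into the run of stores; every case falls through.  */
  switch (nwords)
    {
    case 9: d[8] = 0; /* Fall through.  */
    case 8: d[7] = 0; /* Fall through.  */
    case 7: d[6] = 0; /* Fall through.  */
    case 6: d[5] = 0; /* Fall through.  */
    case 5: d[4] = 0; /* Fall through.  */
    case 4: d[3] = 0; /* Fall through.  */
    case 3: d[2] = 0; /* Fall through.  */
    case 2: d[1] = 0; /* Fall through.  */
    case 1: d[0] = 0; /* Fall through.  */
    case 0: break;
    }
}
```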

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Joseph Myers [Tue, 3 Dec 2024 03:11:22 +0000 (03:11 +0000)]

Use Linux 6.12 in build-many-glibcs.py

Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).

Carmen Bianca BAKKER [Wed, 30 Oct 2024 12:26:48 +0000 (13:26 +0100)]

locale: More strictly implement ISO 8601 for Esperanto locale

Esperanto, as an international language and a bit of a non-locale,
usually defaults to international consensus. In this commit, I make the
Esperanto locale more in line with ISO 8601 by setting the first day as
Monday, and the first week as containing January 4.

Closes: BZ #32323
Signed-off-by: Carmen Bianca BAKKER <carmen@carmenbianca.eu>
Reviewed-by: Mike FABIAN <mfabian@redhat.com>

Adhemerval Zanella [Thu, 28 Nov 2024 17:36:42 +0000 (14:36 -0300)]

elf: Consolidate stackinfo.h

And use a sane default for the generic implementation.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Florian Weimer [Mon, 5 Aug 2024 14:01:12 +0000 (16:01 +0200)]

manual: Describe struct link_map, support link maps with dlinfo

This does not describe how to use RTLD_DI_ORIGIN and l_name
to reconstruct a full path for an object. The reason
is that I think we should not recommend further use of
RTLD_DI_ORIGIN due to its buffer overflow potential (bug 24298).
This should be covered by another dlinfo extension. It would
also obsolete the need for the dladdr approach to obtain
the file name for the main executable.

Obtaining the lowest address from load segments in program
headers is quite clumsy and should be provided directly
via dlinfo.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Joseph Myers [Fri, 29 Nov 2024 20:25:04 +0000 (20:25 +0000)]

Add threaded test of sem_trywait

All the existing glibc tests of sem_trywait are single-threaded. Add
one that calls sem_trywait and sem_post in separate threads.

Tested for x86_64.

Joseph Myers [Fri, 29 Nov 2024 16:43:56 +0000 (16:43 +0000)]

Add test of ELF hash collisions

Add tests that the dynamic linker works correctly with symbol names
involving hash collisions, for both choices of hash style (and
--hash-style=both as well). I note that there weren't actually any
previous tests using --hash-style (so tests would only cover the
default linker configuration in that regard). Also test symbol
versions involving hash collisions.
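
For reference, the two hash functions behind the two hash styles can be written as below. The function names are hypothetical, and the colliding names mentioned afterwards are constructed by hand for illustration; they are not taken from the new tests.

```c
#include <stdint.h>

/* Classic System V ELF hash (used with --hash-style=sysv).  */
static uint32_t
sketch_sysv_hash (const char *name)
{
  uint32_t h = 0;
  for (const unsigned char *p = (const unsigned char *) name; *p != '\0'; p++)
    {
      h = (h << 4) + *p;
      uint32_t g = h & 0xf0000000u;
      if (g != 0)
        h ^= g >> 24;
      h &= ~g;
    }
  return h;
}

/* GNU hash (used with --hash-style=gnu): h = h * 33 + c from 5381.  */
static uint32_t
sketch_gnu_hash (const char *name)
{
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *) name; *p != '\0'; p++)
    h = h * 33 + *p;
  return h;
}
```

With these definitions "az" and "bj" collide under the System V hash, and "az" and "bY" collide under the GNU hash, so a lookup has to compare the full names to tell them apart.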

Tested for x86_64.

Sergey Kolosov [Wed, 6 Nov 2024 14:24:06 +0000 (15:24 +0100)]

nptl: Add new test for pthread_spin_trylock

Add a threaded test for pthread_spin_trylock attempting to lock an already
acquired spin lock and checking for correct return code.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

k4lizen [Fri, 29 Nov 2024 13:25:29 +0000 (13:25 +0000)]

malloc: send freed small chunks to smallbin

Large chunks get added to the unsorted bin since
sorting them takes time. For small chunks the
benefit of adding them to the unsorted bin is
non-existent, and doing so actually hurts performance.

Splitting and malloc_consolidate still add small
chunks to unsorted, but we can hint to the compiler
that this is a relatively rare occurrence.
Benchmarking shows this to be consistently good.

Authored-by: k4lizen <k4lizen@proton.me>
Signed-off-by: Aleksa Siriški <sir@tmina.org>

Wilco Dijkstra [Mon, 25 Nov 2024 18:43:08 +0000 (18:43 +0000)]

AArch64: Remove zva_128 from memset

Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>

Wangyang Guo [Fri, 29 Nov 2024 08:05:35 +0000 (16:05 +0800)]

benchtests: Add calloc test

Add two new benchmarks related to calloc:
- bench-calloc-simple
- bench-calloc-thread

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Siddhesh Poyarekar [Thu, 28 Nov 2024 11:30:40 +0000 (06:30 -0500)]

pthread_getcpuclockid: Add descriptive comment to smoke test

Add a descriptive comment to the tst-pthread-cpuclockid-invalid test and
also drop pthread_getcpuclockid from the TODO-testing list since it now
has full coverage.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Adhemerval Zanella [Tue, 26 Nov 2024 19:34:00 +0000 (16:34 -0300)]

Remove nios2-linux-gnu

GCC 15 (e876acab6cdd84bb2b32c98fc69fb0ba29c81153) and binutils
(e7a16d9fd65098045ef5959bf98d990f12314111) both removed all Nios II
support, and the architecture has been EOL'ed by the vendor. The
kernel still has support, but without a proper compiler there
is not much sense in keeping it in glibc.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Siddhesh Poyarekar [Thu, 28 Nov 2024 13:27:24 +0000 (08:27 -0500)]

libio: make _IO_least_marker static

Trivial cleanup to limit _IO_least_marker so that it's clear that it is
unused outside of genops.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

commit | commitdiff | tree

Wangyang Guo [Tue, 26 Nov 2024 07:33:38 +0000 (15:33 +0800)]

malloc: Avoid func call for tcache quick path in free()

Tcache is an important optimization to accelerate free(), so the code
on this path should be kept as simple as possible. This commit removes
the function call made when free() takes the tcache code path by
inlining _int_free().

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.879
4 threads  | 0.874

The performance data shows that this improves the bench-malloc-thread
benchmark by ~12% in both single-thread and multi-thread scenarios.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

commit | commitdiff | tree

Florian Weimer [Tue, 26 Nov 2024 18:26:13 +0000 (19:26 +0100)]

debug: Fix tst-longjmp_chk3 build failure on Hurd

Explicitly include <unistd.h> for _exit and getpid.

commit | commitdiff | tree

Adhemerval Zanella [Mon, 11 Nov 2024 20:38:44 +0000 (17:38 -0300)]

math: Add internal roundeven_finite

Some CORE-MATH routines use roundeven, and most ISAs do not have
a specific instruction for the operation.  In this case, the call
is routed to the generic implementation.

However, if the ISA does support round() and ctz() there is a better
alternative (as used by CORE-MATH).

This patch adds such optimization and also enables it on powerpc.
On a power10 it shows the following improvement:

expm1f                      master      patched       improvement
latency                     9.8574       7.0139            28.85%
reciprocal-throughput       4.3742       2.6592            39.21%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Julian Zhu [Wed, 11 Sep 2024 07:13:19 +0000 (15:13 +0800)]

RISC-V: Use builtin for fma and fmaf

The built-in functions `__builtin_{fma, fmaf}` are sufficient to generate correct `fmadd.d`/`fmadd.s` instructions on RISC-V.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Julian Zhu [Wed, 11 Sep 2024 08:05:12 +0000 (16:05 +0800)]

RISC-V: Use builtin for copysign and copysignf

The built-in functions `__builtin_{copysign, copysignf}` are sufficient to generate correct `fsgnj.d`/`fsgnj.s` instructions on RISC-V.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Alejandro Colomar [Sat, 16 Nov 2024 15:51:31 +0000 (16:51 +0100)]

Silence most -Wzero-as-null-pointer-constant diagnostics

Replace 0 by NULL and {0} by {}.

Omit a few cases that aren't so trivial to fix.

Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
Link: <https://software.codidact.com/posts/292718/292759#answer-292759>
Signed-off-by: Alejandro Colomar <alx@kernel.org>

commit | commitdiff | tree

Yannick Le Pennec [Mon, 25 Nov 2024 13:12:05 +0000 (14:12 +0100)]

sysdeps: linux: Fix output of LD_SHOW_AUXV=1 for AT_RSEQ_*

The constants themselves were added to elf.h back in 8754a4133e but the
array in _dl_show_auxv wasn't modified accordingly, resulting in the
following output when running LD_SHOW_AUXV=1 /bin/true on recent Linux:

    AT_??? (0x1b): 0x1c
    AT_??? (0x1c): 0x20

With this patch:

    AT_RSEQ_FEATURE_SIZE: 28
    AT_RSEQ_ALIGN:        32

Tested on Linux 6.11 x86_64
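
The lookup that the fix restores can be sketched as a small table from auxv type to label. This is only an illustration, not the code in _dl_show_auxv: the AUX_ prefix, struct aux_label and format_aux are hypothetical names, the two type values (0x1b, 0x1c) are taken from the output above, and the column alignment of the real output is not reproduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Type values as seen in the "AT_???" output above.  The AUX_ prefix is
   hypothetical and only avoids clashing with <elf.h>.  */
#define AUX_RSEQ_FEATURE_SIZE 0x1bUL
#define AUX_RSEQ_ALIGN        0x1cUL

struct aux_label
{
  unsigned long type;
  const char *name;
};

/* Entries that were missing from the table before the patch.  */
static const struct aux_label aux_labels[] =
{
  { AUX_RSEQ_FEATURE_SIZE, "AT_RSEQ_FEATURE_SIZE" },
  { AUX_RSEQ_ALIGN,        "AT_RSEQ_ALIGN" },
};

/* Format one auxv entry into BUF.  Known types are printed by name with
   a decimal value; unknown types fall back to the "AT_???" form.  */
static void
format_aux (char *buf, size_t size, unsigned long type, unsigned long value)
{
  assert (buf != NULL && size > 0);
  for (size_t i = 0; i < sizeof aux_labels / sizeof aux_labels[0]; i++)
    if (aux_labels[i].type == type)
      {
        snprintf (buf, size, "%s: %lu", aux_labels[i].name, value);
        return;
      }
  snprintf (buf, size, "AT_??? (0x%lx): 0x%lx", type, value);
}
```

With the two table entries removed, format_aux would produce the "AT_???" lines shown in the first output block.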

Signed-off-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Florian Weimer [Mon, 25 Nov 2024 16:32:54 +0000 (17:32 +0100)]

debug: Wire up tst-longjmp_chk3

The test was added in commit ac8cc9e300a002228eb7e660df3e7b333d9a7414
without all the required Makefile scaffolding. Tweak the test
so that it actually builds (including with dynamic SIGSTKSZ).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Michael Jeanson [Wed, 20 Nov 2024 19:15:42 +0000 (14:15 -0500)]

nptl: initialize cpu_id_start prior to rseq registration

When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.

While current kernels don't validate the content of this field on
registration, future ones could.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 25 Nov 2024 16:37:50 +0000 (13:37 -0300)]

math: Fix branch hint for 68d7128942

commit | commitdiff | tree

Sachin Monga [Mon, 25 Nov 2024 15:17:30 +0000 (10:17 -0500)]

powerpc64le: ROP Changes for strncpy/ppc-mount

Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

commit | commitdiff | tree

Vincent Lefevre [Fri, 22 Nov 2024 16:54:53 +0000 (13:54 -0300)]

math: Fix non-portability in the computation of signgam in lgammaf

The k>>31 in signgam = 1 - (((k&(k>>31))&1)<<1); is not portable:

* The ISO C standard says "If E1 has a signed type and a negative
  value, the resulting value is implementation-defined." (this is
  still in C23).
* If the int type is larger than 32 bits (e.g. a 64-bit type),
  then k = INT_MAX; line 144 will make k>>31 put 1 in bit 0
  (thus signgam will be -1) while 0 is expected.

Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0
is probably better for 2 reasons:

The signgam expression has more or less a condition on the sign
of fx (the goal of k>>31, which can be dropped with this new
condition). Since fx ≥ 0 should be the most common case, one can
get signgam directly in this case (value 1). And this simplifies
the expression for the other case (fx < 0).

This new condition may be easier/faster to test on the processor
(e.g. by avoiding a load of a constant from the memory).
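
As an illustration of the portability point, the sign selection can be written without shifting a negative value at all. The helper below is a hypothetical sketch, not the CORE-MATH code: it returns what 1 - (((k&(k>>31))&1)<<1) yields on a 32-bit int with arithmetic right shift, namely -1 for negative odd k and 1 otherwise.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (the name is not from the patch).  Branching on
   the sign replaces the k>>31 mask, so no negative value is shifted and
   the width of int does not matter.  */
static int
sign_from_k (int k)
{
  if (k >= 0)
    return 1;
  /* For negative k, k % 2 is either 0 or -1 in C99 and later.  */
  return (k % 2 != 0) ? -1 : 1;
}
```

For example, assert (sign_from_k (-3) == -1) and assert (sign_from_k (INT_MAX) == 1) both hold, regardless of how the implementation shifts negative values.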

This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH.

Checked on x86_64-linux-gnu.

commit | commitdiff | tree

Wangyang Guo [Thu, 29 Aug 2024 06:27:28 +0000 (14:27 +0800)]

malloc: Split _int_free() into 3 sub functions

Split _int_free() into 3 smaller functions for flexible combination:
* _int_free_check -- sanity check for free
* tcache_free -- free memory to tcache (quick path)
* _int_free_chunk -- free memory chunk (slow path)
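
How three such pieces compose can be sketched with a toy model. Nothing below is glibc code: toy_free, the fixed-size cache and the counters are hypothetical stand-ins for the check, quick path and slow path named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 4           /* hypothetical cache size */

static void *cache[CACHE_SLOTS];
static size_t cache_used;
static size_t slow_frees;

/* Stand-in for the sanity check step.  */
static bool
toy_free_check (void *p)
{
  return p != NULL;
}

/* Stand-in for the quick path: succeeds while the cache has room.  */
static bool
toy_cache_free (void *p)
{
  if (cache_used == CACHE_SLOTS)
    return false;
  cache[cache_used++] = p;
  return true;
}

/* Stand-in for the slow path; here it only counts calls.  */
static void
toy_chunk_free (void *p)
{
  (void) p;
  slow_frees++;
}

/* The caller combines the three pieces: check, then quick path, and
   only then the slow path.  */
static void
toy_free (void *p)
{
  if (!toy_free_check (p))
    return;
  if (toy_cache_free (p))
    return;
  toy_chunk_free (p);
}
```

Because the pieces are separate functions, a caller can run the check and the quick path itself and fall back to the slow path only when needed.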

commit | commitdiff | tree

Samuel Thibault [Sun, 24 Nov 2024 23:54:26 +0000 (00:54 +0100)]

hurd: Add MAP_NORESERVE mmap flag

This is already the current default behavior, which will change once
overcommit support is added.

commit | commitdiff | tree

Siddhesh Poyarekar [Thu, 21 Nov 2024 22:13:33 +0000 (17:13 -0500)]

nptl: Add smoke test for pthread_getcpuclockid failure

Exercise the case where an exited thread will cause
pthread_getcpuclockid to fail.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

commit | commitdiff | tree

Joseph Myers [Fri, 22 Nov 2024 16:58:51 +0000 (16:58 +0000)]

Add multithreaded test of sem_getvalue

Test coverage of sem_getvalue is fairly limited. Add a test that runs
it on threads on each CPU. For this purpose I adapted
tst-skeleton-thread-affinity.c; it didn't seem very suitable to use
as-is or include directly in a different test doing things per-CPU,
but did seem a suitable starting point (thus sharing
tst-skeleton-affinity.c) for such testing.

Tested for x86_64.

commit | commitdiff | tree

Adhemerval Zanella [Fri, 8 Nov 2024 16:24:28 +0000 (13:24 -0300)]

math: Use tanf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Wed, 30 Oct 2024 14:50:03 +0000 (11:50 -0300)]

math: Use lgammaf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic lgammaf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow usage, and to adapt to make it reentrant.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Tue, 29 Oct 2024 13:02:20 +0000 (10:02 -0300)]

math: Use erfcf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic erfcf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 28 Oct 2024 20:58:18 +0000 (17:58 -0300)]

math: Use erff from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic erff.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 28 Oct 2024 20:02:01 +0000 (17:02 -0300)]

math: Split s_erfF into erff and erfcf

So we can eventually replace each implementation.

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 28 Oct 2024 15:38:50 +0000 (12:38 -0300)]

math: Use cbrtf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic cbrtf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master        patched       improvement
x86_64                       68.6348        36.8908            46.25%
x86_64v2                     67.3418        36.6968            45.51%
x86_64v3                     63.4981        32.7859            48.37%
aarch64                      29.3172        12.1496            58.56%
power10                      18.0845         8.8893            50.85%
powerpc                      18.0859        8.79527            51.37%

reciprocal-throughput         master        patched       improvement
x86_64                       36.4369        13.3565            63.34%
x86_64v2                     37.3611        13.1149            64.90%
x86_64v3                     31.6024        11.2102            64.53%
aarch64                      18.6866        7.3474             60.68%
power10                       9.4758        3.6329             61.66%
powerpc                      9.58896        3.90439            59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Adhemerval Zanella [Fri, 8 Nov 2024 09:13:50 +0000 (09:13 +0000)]

benchtests: Add tanf benchmark

Random inputs in [-pi, pi].

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Tue, 29 Oct 2024 16:40:29 +0000 (13:40 -0300)]

benchtests: Add lgammaf benchmark

Random inputs in the range [-20.0,20.0].

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Tue, 29 Oct 2024 12:28:01 +0000 (09:28 -0300)]

benchtests: Add erfcf benchmark

It is based on binary64 erfc-inputs, with random inputs in
[0,b=0x1.41bbf6p+3] where b is the smallest number such that
erfcf(b) rounds to 0 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 28 Oct 2024 18:53:30 +0000 (15:53 -0300)]

benchtests: Add erff benchmark

It is based on binary64 erf-inputs, with random inputs in [0,b=0x1.f5a888p+1]
where b is the smallest number such that erff(b) rounds to 1 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>

commit | commitdiff | tree

Adhemerval Zanella [Mon, 28 Oct 2024 13:00:49 +0000 (10:00 -0300)]

benchtests: Add cbrtf benchmark

Based on binary64 benchtests, with random inputs in [1,8].

commit | commitdiff | tree

H.J. Lu [Mon, 28 Oct 2024 22:01:14 +0000 (06:01 +0800)]

elf: Handle static PIE with non-zero load address [BZ #31799]

For a static PIE with non-zero load address, its PT_DYNAMIC segment
entries contain the relocated values for the load address in static PIE.
Since static PIE usually doesn't have PT_PHDR segment, use p_vaddr of
the PT_LOAD segment with offset == 0 as the load address in static PIE
and adjust the entries of PT_DYNAMIC segment in static PIE by properly
setting the l_addr field for static PIE. This fixes BZ #31799.
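
The segment lookup described above can be sketched as follows. This is a simplified model under stated assumptions, not the dynamic loader's code: struct toy_phdr and find_load_vaddr are hypothetical names, only the fields needed here are modelled, and how l_addr is then derived from the result is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PT_LOAD 1           /* PT_LOAD has value 1 in the ELF format */

/* Reduced, hypothetical view of a program header.  */
struct toy_phdr
{
  uint32_t type;                /* p_type */
  uint64_t offset;              /* p_offset */
  uint64_t vaddr;               /* p_vaddr */
};

/* Find the PT_LOAD segment with file offset 0 and store its p_vaddr in
   *OUT.  Returns 1 on success and 0 if no such segment exists.  */
static int
find_load_vaddr (const struct toy_phdr *ph, size_t n, uint64_t *out)
{
  assert (out != NULL);
  for (size_t i = 0; i < n; i++)
    if (ph[i].type == TOY_PT_LOAD && ph[i].offset == 0)
      {
        *out = ph[i].vaddr;
        return 1;
      }
  return 0;
}
```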

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>

commit | commitdiff | tree

Siddhesh Poyarekar [Thu, 21 Nov 2024 22:05:11 +0000 (17:05 -0500)]

x86/string: Use `movsl` instead of `movsd` in strncat [BZ #32344]

The previous patch missed strncat, so fix that.

Resolves: BZ #32344

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

commit | commitdiff | tree

Florian Weimer [Thu, 21 Nov 2024 20:10:52 +0000 (21:10 +0100)]

stdlib: Make getenv thread-safe in more cases

Async-signal-safety is preserved, too.  In fact, getenv is fully
reentrant and can be called from the malloc call in setenv
(if a replacement malloc uses getenv during its initialization).

This is relatively easy to implement because even before this change,
setenv, unsetenv, clearenv, putenv do not deallocate the environment
strings themselves as they are removed from the environment.

The main changes are:

* Use release stores for environment array updates, following
  the usual pattern for safely publishing immutable data
  (in this case, the environment strings).

* Do not deallocate the environment array.  Instead, keep older
  versions around and adopt an exponential resizing policy.  This
  results in an amortized constant space leak per active environment
  variable, but there already is such a leak for the variable itself
  (and that is even length-dependent, and includes no-longer used
  values).

* Add a seqlock-like mechanism to retry getenv if a concurrent
  unsetenv is observed.  Without that, it is possible that
  getenv returns NULL for a variable that is never unset.  This
  is visible on some AArch64 implementations with the newly
  added stdlib/tst-getenv-unsetenv test case.  The mechanism
  is not a pure seqlock because it tolerates one write from
  unsetenv.  This avoids the need for a second copy of the
  environ array that getenv can read from a signal handler
  that happens to interrupt an unsetenv call.
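
The first two points can be sketched with C11 atomics. This is a minimal model under stated assumptions, not the glibc implementation: env_slot, env_append and env_lookup are hypothetical names, writers are assumed to be serialized by a lock that is not shown, and the seqlock-like retry for unsetenv is omitted entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef _Atomic (char *) env_slot;

static _Atomic (env_slot *) env_array;  /* published, NULL-terminated */
static size_t env_count;                /* writer-only state */
static size_t env_capacity;             /* writer-only state */

/* Writer side.  Callers are assumed to hold a lock (not shown).  */
static int
env_append (char *entry)
{
  env_slot *cur = atomic_load_explicit (&env_array, memory_order_relaxed);
  if (env_count + 2 > env_capacity)
    {
      /* Exponential resizing: double the capacity.  */
      size_t new_cap = env_capacity ? env_capacity * 2 : 4;
      env_slot *bigger = malloc (new_cap * sizeof *bigger);
      if (bigger == NULL)
        return -1;
      for (size_t i = 0; i < new_cap; i++)
        atomic_init (&bigger[i],
                     i < env_count
                     ? atomic_load_explicit (&cur[i], memory_order_relaxed)
                     : NULL);
      /* Release store: a reader that sees the new array also sees the
         slots initialized above.  */
      atomic_store_explicit (&env_array, bigger, memory_order_release);
      /* The old array is deliberately not freed; a concurrent reader
         may still be walking it.  */
      cur = bigger;
      env_capacity = new_cap;
    }
  /* Publish the (immutable) string with a release store.  The slot
     after it is already NULL, so the array stays terminated.  */
  atomic_store_explicit (&cur[env_count], entry, memory_order_release);
  env_count++;
  return 0;
}

/* Reader side: no lock, only acquire loads.  */
static char *
env_lookup (const char *name)
{
  assert (name != NULL);
  size_t len = strlen (name);
  env_slot *arr = atomic_load_explicit (&env_array, memory_order_acquire);
  if (arr == NULL)
    return NULL;
  for (size_t i = 0;; i++)
    {
      char *e = atomic_load_explicit (&arr[i], memory_order_acquire);
      if (e == NULL)
        return NULL;
      if (strncmp (e, name, len) == 0 && e[len] == '=')
        return e + len + 1;
    }
}
```

In this sketch the arrays left behind after resizing have sizes 4, 8, 16 and so on, so their total stays below the size of the current array, which is one way to read the "amortized constant space leak" above.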

No manual updates are included with this patch because environ
usage with execve, posix_spawn, system is still not thread-safe
relative to unsetenv.  The new process may end up with an environment
that misses entries that were never unset.  This is the same issue
described above for getenv.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Andrew Pinski [Fri, 15 Nov 2024 03:03:20 +0000 (19:03 -0800)]

aarch64: Remove non-temporal load/stores from oryon-1's memset

The hardware architects have a new recommendation not to use
non-temporal load/stores for memset. This patch removes this path.
I found there was no difference in the memset speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Andrew Pinski [Fri, 15 Nov 2024 03:03:19 +0000 (19:03 -0800)]

aarch64: Remove non-temporal load/stores from oryon-1's memcpy

The hardware architects have a new recommendation not to use
non-temporal load/stores for memcpy. This patch removes this path.
I found there was no difference in the memcpy speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Sachin Monga [Wed, 20 Nov 2024 21:50:00 +0000 (16:50 -0500)]

powerpc64le: _init/_fini file changes for ROP

The ROP instructions were added in ISA 3.1 (i.e., Power10); however, they
were defined so that they behave as nops when executed on older CPUs.
This allows us to emit them for older CPUs, where they are simply
ignored, while on a Power10 the binary is ROP protected.

Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.

Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

commit | commitdiff | tree

Samuel Thibault [Wed, 20 Nov 2024 18:51:08 +0000 (19:51 +0100)]

mman.h: Fix MAP_HASSEMPHORE typo

BSD's MAP_HASSEMAPHORE is with an A. MAP_HASSEMPHORE is not used in any
Debian software for instance.

commit | commitdiff | tree

Andreas Schwab [Wed, 20 Nov 2024 12:15:44 +0000 (13:15 +0100)]

misc: remove extra va_end in error_tail (bug 32233)

This is an addendum to commit b7b52b9dec ("error, error_at_line: Add
missing va_end calls"), which added the va_end calls in the callers where
they belong.
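
The pairing rule behind this fix can be shown with a small example: ISO C requires each va_start to be matched by a va_end in the same function, so a helper that receives a va_list should leave va_end to its caller. The function names below are hypothetical and unrelated to error_tail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Helper that consumes the list.  It does not call va_end, because it
   did not call va_start.  */
static int
format_tail (char *buf, size_t size, const char *fmt, va_list ap)
{
  return vsnprintf (buf, size, fmt, ap);
}

/* Variadic entry point: va_start and va_end are paired here.  */
static int
format_message (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = format_tail (buf, size, fmt, ap);
  va_end (ap);
  return n;
}
```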

commit | commitdiff | tree

Andreas Schwab [Wed, 20 Nov 2024 09:01:29 +0000 (10:01 +0100)]

intl: avoid alloca for arbitrary sizes (bug 32380)

Use malloc for the copy of the domain name and the category value, which
can both be of arbitrary size.

commit | commitdiff | tree

Yury Khrustalev [Wed, 20 Nov 2024 11:20:33 +0000 (11:20 +0000)]

manual: Add description of AArch64-specific pkey flags

Describe the AArch64-specific flags PKEY_DISABLE_READ and
PKEY_DISABLE_EXECUTE, which are available on AArch64 systems with the
Stage 1 permission overlays feature (FEAT_S1POE, introduced in
Armv8.9 / 9.4) enabled.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Yury Khrustalev [Wed, 20 Nov 2024 11:16:36 +0000 (11:16 +0000)]

AArch64: Add support for memory protection keys

This patch adds support for memory protection keys on AArch64 systems with
the Stage 1 permission overlays feature (FEAT_S1POE, introduced in
Armv8.9 / 9.4) enabled [1]. It includes:

1. Internal functions "pkey_read" and "pkey_write" to access data
    associated with memory protection keys.
2. Implementation of API functions "pkey_get" and "pkey_set" for
    the AArch64 target.
3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
4. New target-specific test that checks behaviour of pkeys on
    AArch64 targets.
5. This patch also extends existing generic test for pkeys.
6. HWCAP constant for Permission Overlay Extension feature.

To support more accurate mapping of underlying permissions to the
PKEY flags, we introduce additional AArch64-specific flags. The full
list of flags is:

- PKEY_UNRESTRICTED: 0x0 (for completeness)
- PKEY_DISABLE_ACCESS: 0x1 (existing flag)
- PKEY_DISABLE_WRITE: 0x2 (existing flag)
- PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
- PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific)

The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as
it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ.
For this reason mapping between permission bits RWX and "restrictions"
bits awxr (a for disable access, etc) becomes complicated:

- PKEY_DISABLE_ACCESS disables both R and W
- PKEY_DISABLE_{WRITE,READ} disables W and R respectively
- PKEY_DISABLE_EXECUTE disables X

Combinations like the one below are accepted although they are redundant:

- PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE

Reverse mapping tries to retain backward compatibility and ORs
PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and
PKEY_DISABLE_WRITE would be present.

This will break code that compares pkey_get output with == instead
of using bitwise operations. The latter is more correct since PKEY_*
constants are essentially bit flags.

It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.
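
The difference between comparing with == and testing bits can be sketched as below. The flag values are the ones listed above; the X_ prefix, normalize_rights and the *_disabled helpers are hypothetical names used to keep the example independent of <sys/mman.h>, and normalize_rights only models the reverse mapping as described, not the actual pkey_get code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as listed above, under hypothetical X_ names.  */
enum
{
  X_PKEY_UNRESTRICTED    = 0x0,
  X_PKEY_DISABLE_ACCESS  = 0x1,
  X_PKEY_DISABLE_WRITE   = 0x2,
  X_PKEY_DISABLE_EXECUTE = 0x4,
  X_PKEY_DISABLE_READ    = 0x8,
};

/* Model of the described reverse mapping: OR in DISABLE_ACCESS when
   both DISABLE_READ and DISABLE_WRITE are present.  */
static int
normalize_rights (int rights)
{
  const int rw = X_PKEY_DISABLE_READ | X_PKEY_DISABLE_WRITE;
  if ((rights & rw) == rw)
    rights |= X_PKEY_DISABLE_ACCESS;
  return rights;
}

/* Bitwise queries that stay correct when extra flags are set.  */
static bool
write_disabled (int rights)
{
  return (rights & (X_PKEY_DISABLE_ACCESS | X_PKEY_DISABLE_WRITE)) != 0;
}

static bool
read_disabled (int rights)
{
  return (rights & (X_PKEY_DISABLE_ACCESS | X_PKEY_DISABLE_READ)) != 0;
}

static bool
execute_disabled (int rights)
{
  return (rights & X_PKEY_DISABLE_EXECUTE) != 0;
}
```

In this model, normalize_rights (X_PKEY_DISABLE_READ | X_PKEY_DISABLE_WRITE) yields 0xb, so a check written as rights == PKEY_DISABLE_ACCESS fails even though access is disabled, while the bitwise helpers report the expected result.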

[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

commit | commitdiff | tree

Andrew Pinski [Wed, 20 Nov 2024 11:08:53 +0000 (11:08 +0000)]

AArch64: Remove thunderx{,2} memcpy

ThunderX1 and ThunderX2 have been retired for a few years now,
so remove the thunderx{,2}-specific versions of memcpy.
Their performance gain was for medium and large sizes, which
the generic (aarch64) memcpy handles only slightly worse.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

commit | commitdiff | tree

Joseph Myers [Tue, 19 Nov 2024 22:25:39 +0000 (22:25 +0000)]

Fix femode_t conditionals for arc and or1k

Two of the architecture bits/fenv.h headers define femode_t if
__GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition
__GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit
0175c9e9be5f0b2000859666b6e1ef3696f1123b, but were probably first
developed before it and then not updated to take account of its
changes). This results in failures of the installed headers check for
fenv.h when building with GCC 15 (defaults to -std=gnu23 - we don't
yet have an installed-headers test specifically for C23 mode and don't
yet require a compiler with such a mode for building glibc) together
with a combination of options leaving C23 features enabled, since the
declarations of functions using femode_t use the correct conditions;
see
<https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>.
Fix the conditionals to get <fenv.h> to work correctly in C23 mode
again.

Tested with build-many-glibcs.py (arc-linux-gnu, arc-linux-gnuhf,
or1k-linux-gnu-hard, or1k-linux-gnu-soft).

commit | commitdiff | tree

Mahesh Bodapati [Tue, 19 Nov 2024 20:57:35 +0000 (15:57 -0500)]

powerpc64le: Optimized strcat for POWER10

This patch adds an optimized strcat which makes use of the default
strcat function which calls the Power10 strcpy and strlen routines.

commit | commitdiff | tree

Peter Bergner [Tue, 5 Nov 2024 22:05:53 +0000 (16:05 -0600)]

powerpc: Improve the inline asm for syscall wrappers

Update the inline asm syscall wrappers to match the newer register constraint
usage in INTERNAL_VSYSCALL_CALL_TYPE. Use the faster mfocrf instruction when
available, rather than the slower mfcr microcoded instruction.

commit | commitdiff | tree

gfleury [Mon, 18 Nov 2024 11:21:45 +0000 (13:21 +0200)]

htl: move pthread_attr_init into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

A mirror of the official glibc repository