math: Use erf from CORE-MATH
The current implementation shows the following accuracy, measured on
three ranges ([-DBL_MAX, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with 10e9
uniformly distributed random inputs for each range and rounding mode
(the first column is the error in ULP, where 0 means correctly
rounded; the second is the number of samples with that error).  A
sketch of such a check follows the listing:
* Range [-DBL_MAX, -4.2]
  * FE_TONEAREST
    0: 10000000000 100.00%
  * FE_UPWARD
    0: 10000000000 100.00%
  * FE_DOWNWARD
    0: 10000000000 100.00%
  * FE_TOWARDZERO
    0: 10000000000 100.00%
* Range [-4.2, 4.2]
  * FE_TONEAREST
    0: 9764404513 97.64%
    1:  235595487  2.36%
  * FE_UPWARD
    0: 9468013928 94.68%
    1:  531986072  5.32%
  * FE_DOWNWARD
    0: 9493787693 94.94%
    1:  506212307  5.06%
  * FE_TOWARDZERO
    0: 9585271351 95.85%
    1:  414728649  4.15%
* Range [4.2, DBL_MAX]
  * FE_TONEAREST
    0: 10000000000 100.00%
  * FE_UPWARD
    0: 10000000000 100.00%
  * FE_DOWNWARD
    0: 10000000000 100.00%
  * FE_TOWARDZERO
    0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
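As an illustration of how such an accuracy check can be done (a hedged
sketch, not the harness used for the numbers above; the sample count,
range, mode mapping, and output format are all illustrative), erf can
be compared against MPFR, which is correctly rounded at its working
precision, in each rounding mode.  Build with -frounding-math and link
with -lmpfr -lgmp -lm:

  #include <fenv.h>
  #include <math.h>
  #include <mpfr.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Map a fenv rounding mode to the matching MPFR mode.  */
  static mpfr_rnd_t
  to_mpfr_mode (int mode)
  {
    switch (mode)
      {
      case FE_TONEAREST: return MPFR_RNDN;
      case FE_UPWARD:    return MPFR_RNDU;
      case FE_DOWNWARD:  return MPFR_RNDD;
      default:           return MPFR_RNDZ;  /* FE_TOWARDZERO */
      }
  }

  /* ULP distance counted as nextafter steps, capped at 8.  */
  static unsigned int
  ulp_distance (double got, double want)
  {
    unsigned int n = 0;
    while (got != want && n < 8)
      {
        got = nextafter (got, want);
        n++;
      }
    return n;
  }

  int
  main (void)
  {
    static const int modes[] =
      { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
    static const char *const names[] =
      { "FE_TONEAREST", "FE_UPWARD", "FE_DOWNWARD", "FE_TOWARDZERO" };
    mpfr_t ref;
    /* At 53 bits MPFR yields the correctly rounded double directly.  */
    mpfr_init2 (ref, 53);

    for (int m = 0; m < 4; m++)
      {
        unsigned long hist[9] = { 0 };
        fesetround (modes[m]);
        for (long i = 0; i < 10000000; i++)
          {
            double x = -4.2 + 8.4 * ((double) rand () / RAND_MAX);
            double got = erf (x);
            mpfr_set_d (ref, x, MPFR_RNDN);   /* exact, x has 53 bits */
            mpfr_erf (ref, ref, to_mpfr_mode (modes[m]));
            double want = mpfr_get_d (ref, to_mpfr_mode (modes[m]));
            hist[ulp_distance (got, want)]++;
          }
        printf ("%s:", names[m]);
        for (int u = 0; u < 9; u++)
          if (hist[u] != 0)
            printf ("  %d ulp: %lu", u, hist[u]);
        putchar ('\n');
      }
    mpfr_clear (ref);
    return 0;
  }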
The code was adapted to the glibc style and to use the definitions
from math_config.h (to handle errno, overflow, and underflow).
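For reference, the observable effect of that error handling can be
checked with a small standalone program like the one below (an
illustration of the expected behavior for a subnormal argument, not
code from the patch; whether errno is set also depends on
math_errhandling for the toolchain in use).  Build with -lm:

  #include <errno.h>
  #include <fenv.h>
  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* For a subnormal x, erf (x) ~= 2*x/sqrt(pi) is itself subnormal
       and inexact, so the underflow exception (and, with POSIX error
       reporting, errno == ERANGE) is the expected outcome.  */
    double x = 0x1p-1050;
    errno = 0;
    feclearexcept (FE_ALL_EXCEPT);
    double y = erf (x);
    printf ("erf (%a) = %a\n", x, y);
    printf ("FE_UNDERFLOW raised: %s\n",
            fetestexcept (FE_UNDERFLOW) ? "yes" : "no");
    printf ("errno == ERANGE:     %s\n", errno == ERANGE ? "yes" : "no");
    return 0;
  }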
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput      master     patched  improvement
x86_64                    38.2754     78.0311     -103.87%
x86_64v2                  38.3325     75.7555      -97.63%
x86_64v3                  34.6604     28.3182       18.30%
aarch64                   23.1499     21.4307        7.43%
power10                   12.3051      9.3766       23.80%

latency                    master     patched  improvement
x86_64                    84.3062    121.3580      -43.95%
x86_64v2                  84.1817    117.4250      -39.49%
x86_64v3                  81.0933     70.6458       12.88%
aarch64                   35.012      29.5012       15.74%
power10                   21.7205     18.4589       15.02%
For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.
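As an illustration of that point (not part of the patch): in a trivial
translation unit like the one below, compiled for the x86_64 or
x86_64-v2 baseline, the fma is an out-of-line libm call whose
implementation is selected at run time through an ifunc, while with
-mfma (or -march=x86-64-v3) GCC can emit a single vfmadd instruction,
removing the call overhead on the polynomial-evaluation path:

  #include <math.h>

  /* One Horner step: acc * x + c with a single rounding.  Baseline
     x86_64: a call into libm (ifunc-dispatched).  With FMA enabled at
     compile time: a single vfmadd instruction, no call.  */
  double
  poly_step (double acc, double x, double c)
  {
    return fma (acc, x, c);
  }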
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>