author     Adhemerval Zanella <adhemerval.zanella@linaro.org>
           Mon, 16 Jun 2025 13:17:37 +0000 (10:17 -0300)
committer  Adhemerval Zanella <adhemerval.zanella@linaro.org>
           Fri, 11 Jul 2025 16:01:31 +0000 (13:01 -0300)
commit     c055c54e960579619304c7fb998e6bc12e82c5bd
tree       c4db98a12d980896de92f1645478f883093721da
parent     3d3572f59059e2b19b8541ea648a6172136ec42e
x86_64: Optimize modf/modff for x86_64-v2

SSE4.1 provides a direct instruction for trunc (roundsd/roundss with the
round-toward-zero immediate), which improves modf/modff performance while
reducing text size.  On Ryzen 9 (zen3) with gcc 14.2.1:

x86_64-v2
reciprocal-throughput        master        patch       difference
workload-0_1                 7.9610       7.7914            2.13%
workload-1_maxint            9.4323       7.8021           17.28%
workload-maxint_maxfloat     8.7379       7.8049           10.68%
workload-integral            7.9492       7.7991            1.89%

latency                      master        patch       difference
workload-0_1                 7.9511      10.8910          -36.97%
workload-1_maxint           15.8278      10.9048           31.10%
workload-maxint_maxfloat    11.3495      10.9139            3.84%
workload-integral           11.5938      10.9071            5.92%

x86_64-v3
reciprocal-throughput        master        patch       difference
workload-0_1                 8.7522       7.9781            8.84%
workload-1_maxint            9.6690       7.9872           17.39%
workload-maxint_maxfloat     8.7634       7.9857            8.87%
workload-integral            8.7397       7.9893            8.59%

latency                      master        patch       difference
workload-0_1                 8.7447       9.5589           -9.31%
workload-1_maxint           13.7480       9.5690           30.40%
workload-maxint_maxfloat    10.0092       9.5680            4.41%
workload-integral            9.7518       9.5743            1.82%

For x86_64-v1 the optimization is selected through a new ifunc resolver.
The AVX variant follows other SSE4.1 optimizations (such as trunc) so
that the ifunc can be avoided for x86_64-v3 builds.
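The core of the change can be sketched in C as follows.  This is a
minimal illustration, not the glibc implementation: modf_via_trunc is
an invented name, and glibc's actual sources wire this up through
math-use-builtins-trunc.h rather than an open-coded function.

```c
#include <assert.h>

/* Sketch: build modf on top of trunc.  When SSE4.1 is available
   (x86_64-v2 and above, or -msse4.1), GCC lowers __builtin_trunc to a
   single roundsd instruction, replacing the generic integer
   bit-manipulation path.  */
static double
modf_via_trunc (double x, double *iptr)
{
  if (__builtin_isinf (x))
    {
      /* modf (+-Inf) stores +-Inf and returns +-0; the generic
         x - i expression below would yield Inf - Inf = NaN.  */
      *iptr = x;
      return __builtin_copysign (0.0, x);
    }
  double i = __builtin_trunc (x);  /* integral part, rounded toward zero */
  *iptr = i;
  return x - i;                    /* fractional part, same sign as x */
}
```

Both the subtraction and the truncation are exact in binary floating
point, so no extra rounding error is introduced relative to the generic
path.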
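The x86_64-v1 dispatch can be sketched as below, with a plain function
pointer standing in for glibc's IFUNC machinery; modf_generic,
modf_sse41, and pick_modf are illustrative names, not glibc symbols.

```c
#include <assert.h>

/* Generic fallback: cast through long long.  Valid for the magnitudes
   exercised here; glibc's real generic path handles the full double
   range via exponent manipulation.  */
static double
modf_generic (double x, double *iptr)
{
  double i = (double) (long long) x;
  *iptr = i;
  return x - i;
}

/* SSE4.1 variant: with -msse4.1 the builtin compiles to roundsd.  */
static double
modf_sse41 (double x, double *iptr)
{
  double i = __builtin_trunc (x);
  *iptr = i;
  return x - i;
}

/* Resolver, queried once.  A real IFUNC resolver runs at relocation
   time; here we just return the chosen function pointer.  */
static double (*pick_modf (void)) (double, double *)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1") ? modf_sse41 : modf_generic;
}
```

For x86_64-v3, SSE4.1 is guaranteed at compile time, so the AVX-built
variant is selected statically and no resolver is needed.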

Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
sysdeps/x86_64/fpu/math-use-builtins-trunc.h [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/Makefile
sysdeps/x86_64/fpu/multiarch/s_modf-avx.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modf-c.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modf-sse4_1.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modf.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modff-avx.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modff-c.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modff-sse4_1.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_modff.c [new file with mode: 0644]