]> git.ipfire.org Git - thirdparty/glibc.git/commit
x86_64: Optimize large size copy in memmove-ssse3
authorMayShao-oc <MayShao-oc@zhaoxin.com>
Sat, 29 Jun 2024 03:58:27 +0000 (11:58 +0800)
committerH.J. Lu <hjl.tools@gmail.com>
Sun, 30 Jun 2024 13:26:43 +0000 (06:26 -0700)
commitc19457aec67da28a3f78badef53556cd55640a6e
tree41d5357bb9426f975e15ad88e30196c35e0cfc49
parent44d757eb9f4484dbc3aa32042ab64cdf9374e093
x86_64: Optimize large size copy in memmove-ssse3

This patch optimizes large size copy using normal store when src > dst
and overlap.  Make it the same as the logic in memmove-vec-unaligned-erms.S.

Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non-
temporal threshold, this patch updates that value to
'__x86_shared_non_temporal_threshold'.  Currently, the
__x86_shared_non_temporal_threshold is cpu-specific, and different CPUs
will have different values based on the related nt-benchmark results.
However, in memmove-ssse3, the nontemporal threshold uses
'__x86_shared_cache_size_half', which sounds unreasonable.

The performance is not changed drastically although shows overall
improvements without any major regressions or gains.

Results on Zhaoxin KX-7000:
bench-memcpy geometric_mean(N=20) New / Original: 0.999

bench-memcpy-random geometric_mean(N=20) New / Original: 0.999

bench-memcpy-large geometric_mean(N=20) New / Original: 0.978

bench-memmove geometric_mean(N=20) New / Original: 1.000

bench-memmmove-large geometric_mean(N=20) New / Original: 0.962

Results on Intel Core i5-6600K:
bench-memcpy geometric_mean(N=20) New / Original: 1.001

bench-memcpy-random geometric_mean(N=20) New / Original: 0.999

bench-memcpy-large geometric_mean(N=20) New / Original: 1.001

bench-memmove geometric_mean(N=20) New / Original: 0.995

bench-memmmove-large geometric_mean(N=20) New / Original: 0.936
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
sysdeps/x86_64/multiarch/memmove-ssse3.S