git.ipfire.org Git - thirdparty/glibc.git / commit
x86: Improve vec generation in memset-vec-unaligned-erms.S
author     Noah Goldstein <goldstein.w.n@gmail.com>
           Sun, 6 Feb 2022 06:54:18 +0000 (00:54 -0600)
committer  Sunil K Pandey <skpgkp2@gmail.com>
           Thu, 5 May 2022 16:19:16 +0000 (09:19 -0700)
commit     ef264d262b0cee60bf1b85fb898b4ab5d0ae8288
tree       afeac136bfd288d59f09d0b5a95611fff963951a
parent     53f4ff9ff4af14f77ac9b28ba6e15bf811043793

No bug.

Split vec generation into multiple steps. This allows the AVX2
broadcast to use 'xmm' registers for the L(less_vec) case, which
saves an expensive lane-crossing instruction and removes the need
for 'vzeroupper'.
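A portable C model of the split, not the glibc assembly: the fill byte is first broadcast into a 16-byte value (a stand-in for an 'xmm' register), and only the path for sizes of at least one 32-byte vector widens it (a stand-in for the 'ymm' broadcast, which on AVX2 crosses 128-bit lanes). The small-size branch plays the role of L(less_vec); its overlapping head/tail stores and the names set_bytes, broadcast16 and widen32 are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-array stand-ins for a 16-byte xmm and a 32-byte ymm register. */
typedef struct { uint8_t b[16]; } vec16;
typedef struct { uint8_t b[32]; } vec32;

/* Step 1: broadcast the fill byte into a 16-byte value only. */
static vec16 broadcast16(int c) {
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)c;
    return v;
}

/* Step 2 (large sizes only): widen to 32 bytes, standing in for the
   lane-crossing 32-byte broadcast that the small path no longer needs. */
static vec32 widen32(vec16 x) {
    vec32 y;
    memcpy(y.b, x.b, 16);
    memcpy(y.b + 16, x.b, 16);
    return y;
}

/* Hypothetical memset model with a 32-byte VEC_SIZE. */
static void *set_bytes(void *dst, int c, size_t n) {
    unsigned char *d = dst;
    vec16 x = broadcast16(c);

    if (n >= 32) {
        vec32 y = widen32(x);
        for (size_t i = 0; i + 32 <= n; i += 32)
            memcpy(d + i, y.b, 32);
        memcpy(d + n - 32, y.b, 32); /* overlapping tail store */
        return dst;
    }

    /* Stand-in for L(less_vec): every store is at most 16 bytes wide,
       so only the 16-byte value is ever used. */
    if (n >= 16) {
        memcpy(d, x.b, 16);
        memcpy(d + n - 16, x.b, 16);
    } else if (n >= 8) {
        memcpy(d, x.b, 8);
        memcpy(d + n - 8, x.b, 8);
    } else if (n >= 4) {
        memcpy(d, x.b, 4);
        memcpy(d + n - 4, x.b, 4);
    } else if (n >= 2) {
        memcpy(d, x.b, 2);
        memcpy(d + n - 2, x.b, 2);
    } else if (n == 1) {
        d[0] = x.b[0];
    }
    return dst;
}

/* Compare set_bytes() with memset() inside guard bytes; 1 on match. */
static int matches_memset(int c, size_t n) {
    unsigned char a[80], b[80];
    assert(n <= 64);
    memset(a, 0x5a, sizeof a);
    memset(b, 0x5a, sizeof b);
    set_bytes(a + 8, c, n);
    memset(b + 8, c, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

Because the small branch never builds a 32-byte value, code with this shape has no reason to touch the upper 'ymm' halves for sizes below one vector, which is the property the 'vzeroupper' removal relies on.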

For SSE2, replace the two 'punpck' instructions with a zero-idiom
'pxor' for the byte broadcast.
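A small C model of a 16-byte register can check that both broadcast styles fill every lane with the low byte. The unpack path assumes the old sequence was two unpacks followed by 'pshufd $0'; the new path assumes the 'pxor' zero register is used as an all-zero 'pshufb' shuffle control, so every lane selects byte 0. Both sequences are assumptions: the commit message names only 'punpck' and 'pxor', and 'pshufb' is an SSSE3 instruction. The helper names mirror the mnemonics but are hypothetical C functions, not intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-array stand-in for a 16-byte xmm register (little endian). */
typedef struct { uint8_t b[16]; } xmm;

/* movd r32 -> xmm: low 4 bytes from the integer, the rest zeroed. */
static xmm movd(uint32_t v) {
    xmm r;
    memset(r.b, 0, sizeof r.b);
    for (int i = 0; i < 4; i++)
        r.b[i] = (uint8_t)(v >> (8 * i));
    return r;
}

/* punpcklbw: interleave the low 8 bytes of d and s. */
static xmm punpcklbw(xmm d, xmm s) {
    xmm r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i] = d.b[i];
        r.b[2 * i + 1] = s.b[i];
    }
    return r;
}

/* punpcklwd: interleave the low 4 words of d and s. */
static xmm punpcklwd(xmm d, xmm s) {
    xmm r;
    for (int i = 0; i < 4; i++) {
        memcpy(&r.b[4 * i], &d.b[2 * i], 2);
        memcpy(&r.b[4 * i + 2], &s.b[2 * i], 2);
    }
    return r;
}

/* pshufd $0: copy dword 0 into all four dwords. */
static xmm pshufd0(xmm s) {
    xmm r;
    for (int i = 0; i < 4; i++)
        memcpy(&r.b[4 * i], &s.b[0], 4);
    return r;
}

/* pxor: bytewise XOR; a register XORed with itself is the zero idiom. */
static xmm pxor(xmm a, xmm b) {
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = a.b[i] ^ b.b[i];
    return r;
}

/* pshufb: lane i takes d[ctl[i] & 15], or 0 if bit 7 of ctl[i] is set. */
static xmm pshufb(xmm d, xmm ctl) {
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (ctl.b[i] & 0x80) ? 0 : d.b[ctl.b[i] & 0x0f];
    return r;
}

/* Assumed old style: two unpacks widen the byte to a dword, then splat. */
static xmm broadcast_unpack(uint32_t c) {
    xmm v = movd(c);
    v = punpcklbw(v, v);
    v = punpcklwd(v, v);
    return pshufd0(v);
}

/* Assumed new style: an all-zero control makes every lane pick byte 0. */
static xmm broadcast_zero_shuffle(uint32_t c) {
    xmm v = movd(c);
    xmm zero = pxor(v, v);
    return pshufb(v, zero);
}

/* 1 when both models put the low byte of c in all 16 lanes.
   Example: assert(broadcasts_agree(0x12345678u)); */
static int broadcasts_agree(uint32_t c) {
    xmm a = broadcast_unpack(c);
    xmm z = broadcast_zero_shuffle(c);
    for (int i = 0; i < 16; i++)
        if (a.b[i] != (uint8_t)c || z.b[i] != (uint8_t)c)
            return 0;
    return 1;
}
```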

Results for memset-avx2 small (geomean of N = 20 benchset runs).

size, New Time, Old Time, New / Old
   0,    4.100,    3.831,     0.934
   1,    5.074,    4.399,     0.867
   2,    4.433,    4.411,     0.995
   4,    4.487,    4.415,     0.984
   8,    4.454,    4.396,     0.987
  16,    4.502,    4.443,     0.987
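For reference, the geometric mean over the N = 20 runs, assuming each t_i is one benchset run's time for a given size, together with the standard identity that the geometric mean of per-run ratios equals the ratio of the geometric means:

```latex
\bar{t}_{\mathrm{geo}}
  = \Bigl(\prod_{i=1}^{N} t_i\Bigr)^{1/N}
  = \exp\Bigl(\frac{1}{N}\sum_{i=1}^{N} \ln t_i\Bigr),
  \qquad N = 20

\Bigl(\prod_{i=1}^{N} \frac{a_i}{b_i}\Bigr)^{1/N}
  = \frac{\bigl(\prod_{i=1}^{N} a_i\bigr)^{1/N}}
         {\bigl(\prod_{i=1}^{N} b_i\bigr)^{1/N}}
```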

All relevant string/wcsmbs tests are passing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit b62ace2740a106222e124cc86956448fa07abf4d)
sysdeps/x86_64/memset.S
sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S