git.ipfire.org Git - thirdparty/gcc.git/commit
i386: Use Shuffles instead of shifts for Reduction in AMD znver4/5
author Pranav Gorantla <Pranav.Gorantla@amd.com>
Thu, 29 May 2025 13:02:24 +0000 (15:02 +0200)
committer Jan Hubicka <hubicka@ucw.cz>
Thu, 29 May 2025 13:03:23 +0000 (15:03 +0200)
commit 5080d98a383de244a7b78ae50456fd41881268c2
tree 8e5f1088673eb346ac6643dbd4f27d162427e25f
parent 6df697847773d21ad8276de38131413aa5c5e3b0

On AMD znver4 and znver5 targets, vpshufd has a latency of 1 and
vpsrldq a latency of 2; their throughputs are 4 (2 on znver4) and 2,
respectively. It is therefore better to generate shuffles instead of
shifts wherever possible. During horizontal vector reduction, this
patch tries to generate an appropriate shuffle instruction to copy the
upper half of the vector to the lower half instead of a simple right
shift.
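
To illustrate the idea, here is a minimal sketch in C using SSE2
intrinsics. It is not the code from this patch: the function names and
the shuffle selectors are assumptions chosen for illustration, and the
selectors GCC actually emits may differ. Both functions sum the four
32-bit lanes of a vector. The first moves the upper half down with a
byte shift (psrldq), the second with a shuffle (pshufd). Only lane 0 of
the final vector is read, so the contents of the other lanes do not
matter.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */

/* Shift-based reduction: each step maps to a psrldq-style byte shift. */
static int reduce_add_shift(__m128i v)
{
    __m128i hi = _mm_srli_si128(v, 8);   /* lanes 2,3 -> lanes 0,1 (zero fill) */
    __m128i s  = _mm_add_epi32(v, hi);   /* lane 0 = v0+v2, lane 1 = v1+v3 */
    hi = _mm_srli_si128(s, 4);           /* lane 1 -> lane 0 */
    s  = _mm_add_epi32(s, hi);           /* lane 0 = v0+v1+v2+v3 */
    return _mm_cvtsi128_si32(s);         /* extract lane 0 */
}

/* Shuffle-based reduction: each step maps to a pshufd-style shuffle.
   The selectors are illustrative assumptions, not taken from the patch. */
static int reduce_add_shuffle(__m128i v)
{
    /* Result lanes 0,1 take source lanes 2,3; lanes 2,3 are don't-care. */
    __m128i hi = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    __m128i s  = _mm_add_epi32(v, hi);   /* lane 0 = v0+v2, lane 1 = v1+v3 */
    /* Result lane 0 takes source lane 1; the other lanes are don't-care. */
    hi = _mm_shuffle_epi32(s, _MM_SHUFFLE(3, 2, 1, 1));
    s  = _mm_add_epi32(s, hi);           /* lane 0 = v0+v1+v2+v3 */
    return _mm_cvtsi128_si32(s);         /* extract lane 0 */
}
```

For a vector built with _mm_set_epi32(4, 3, 2, 1), both functions
return the same sum. The difference lies only in which instruction
moves the upper elements down.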

gcc/ChangeLog:

* config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to
generate reduc half for V4SI and similar modes.
* config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF):
New tuning.

gcc/testsuite/ChangeLog:

* gcc.target/i386/reduc-pshuf.c: New test.
gcc/config/i386/i386-expand.cc
gcc/config/i386/i386.h
gcc/config/i386/x86-tune.def
gcc/testsuite/gcc.target/i386/reduc-pshuf.c [new file with mode: 0644]