git.ipfire.org Git - thirdparty/gcc.git/commit

author	Kyrylo Tkachov <ktkachov@nvidia.com>
	Mon, 5 Aug 2024 18:29:44 +0000 (11:29 -0700)
committer	Kyrylo Tkachov <ktkachov@nvidia.com>
	Mon, 12 Aug 2024 09:41:04 +0000 (11:41 +0200)
commit	fcc766c82cf8e0473ba54f1660c8282a7ce3231c
tree	47efffe04c8e7d64e763367d485c190e4956c95f
parent	8d8db21eb726b785782f4a41ad85a0d4be63068a

aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for Advanced SIMD

On many cores, including Neoverse V2, the throughput of vector ADD
instructions is higher than vector shifts like SHL.  We can lean on that
to emit code like:
  add     v0.4s, v0.4s, v0.4s
instead of:
  shl     v0.4s, v0.4s, 1

LLVM already does this trick.
In RTL the code gets canonicalised from (plus x x) to (ashift x 1) so I
opted to instead do this at the final assembly printing stage, similar
to how we emit CMLT instead of SSHR elsewhere in the backend.
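
The substitution is safe because a lane-wise left shift by one and a
lane-wise self-addition produce identical results, including on
wraparound.  A minimal sketch of that equivalence in portable C follows;
it is not taken from the patch or its test, and the names LANES, shl1,
add_self and lanes_agree are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* hypothetical: models a .4s vector of four 32-bit lanes */

/* Lane-wise left shift by one: the (ashift x 1) form. */
static void shl1(uint32_t dst[LANES], const uint32_t src[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = src[i] << 1;
}

/* Lane-wise self-addition: the (plus x x) form. */
static void add_self(uint32_t dst[LANES], const uint32_t src[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = src[i] + src[i];
}

/* Returns 1 if both forms agree on every lane, 0 otherwise. */
static int lanes_agree(const uint32_t src[LANES])
{
    uint32_t a[LANES], b[LANES];

    shl1(a, src);
    add_self(b, src);
    for (int i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Exercises small values plus the two wraparound cases. */
static void self_check(void)
{
    const uint32_t v[LANES] = { 1u, 2u, 0x80000000u, 0xFFFFFFFFu };
    assert(lanes_agree(v));
}
```

Both forms discard the top bit of each lane, so the unsigned arithmetic
wraps the same way in either case.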

I'd like to also do this for SVE shifts, but those will have to be
separate patches.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_simd_imm_shl<mode><vczle><vczbe>): Rewrite to new
syntax.  Add =w,w,vs1 alternative.
* config/aarch64/constraints.md (vs1): New constraint.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd_shl_add.c: New test.

gcc/config/aarch64/aarch64-simd.md
gcc/config/aarch64/constraints.md
gcc/testsuite/gcc.target/aarch64/advsimd_shl_add.c	[new file with mode: 0644]