git.ipfire.org Git - thirdparty/gcc.git/commit
aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions.
author Soumya AR <soumyaa@nvidia.com>
Tue, 10 Sep 2024 08:48:44 +0000 (14:18 +0530)
committer Kyrylo Tkachov <ktkachov@nvidia.com>
Mon, 16 Sep 2024 14:53:45 +0000 (16:53 +0200)
commit 4af196b2ebd662c5183f1998b0184985e85479b2
tree e478bf17bff134be70537e9c69516d852d4790a4
parent f6e629a7134c6b83be4542b8cd26b7c4483d17f4

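On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions such as SHL (spelled LSL in SVE assembly) have a throughput of 2.
Because a left shift by 1 doubles each element, just as adding the element to
itself does, we can exploit this to emit code like: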
 add z31.b, z31.b, z31.b
instead of:
 lsl z31.b, z31.b, #1
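
The substitution is safe because, per lane, a left shift by 1 and an add of the
lane to itself give the same result modulo the lane width. A minimal scalar
sketch of that equivalence for 8-bit lanes follows; the function names are
hypothetical and model a single lane only, not the SVE instructions themselves:

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 8-bit lane of "lsl z.b, z.b, #1": shift left by one and
   keep the low 8 bits.  */
uint8_t lane_lsl1(uint8_t x)
{
    return (uint8_t)(x << 1);
}

/* Model of one 8-bit lane of "add z.b, z.b, z.b": add the lane to itself,
   wrapping modulo 2^8.  */
uint8_t lane_add_self(uint8_t x)
{
    return (uint8_t)(x + x);
}

/* Exhaustively check that both models agree on every 8-bit input,
   including the values whose top bit is shifted or carried out.  */
void check_all_bytes(void)
{
    for (unsigned v = 0; v <= UINT8_MAX; v++)
        assert(lane_lsl1((uint8_t)v) == lane_add_self((uint8_t)v));
}
```

The same argument holds for 16-, 32- and 64-bit lanes, since both operations
discard the bit that leaves the top of the lane.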

The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.

Here, the machine description pattern is split in two to handle left and right
shifts separately, so that an add can be emitted specifically for left shifts
by 1.
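
The effect of the split can be sketched in plain C as a choice of output
template. This is only an illustration of the decision being made, not GCC
machine description syntax; the enum, the function name and its parameters are
all hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for illustration; none of these come from GCC.  */
enum shift_kind { SHIFT_LEFT, SHIFT_RIGHT_LOGICAL, SHIFT_RIGHT_ARITH };

/* Write the assembly text for an unpredicated SVE shift-by-immediate of
   register SRC into DST, with element suffix SUFFIX ("b", "h", "s", "d").
   A left shift by 1 is written as an ADD of SRC to itself; every other
   case keeps its shift mnemonic.  Returns the snprintf result.  */
int format_sve_shift(char *buf, size_t size, enum shift_kind kind,
                     const char *dst, const char *src,
                     const char *suffix, unsigned amount)
{
    assert(buf != NULL && size > 0);

    /* Only left shifts need the extra case, which is why the left and
       right shift patterns are handled separately.  */
    if (kind == SHIFT_LEFT && amount == 1)
        return snprintf(buf, size, "add\t%s.%s, %s.%s, %s.%s",
                        dst, suffix, src, suffix, src, suffix);

    const char *mnemonic = kind == SHIFT_LEFT ? "lsl"
                         : kind == SHIFT_RIGHT_LOGICAL ? "lsr" : "asr";
    return snprintf(buf, size, "%s\t%s.%s, %s.%s, #%u",
                    mnemonic, dst, suffix, src, suffix, amount);
}
```

Right shifts by 1 have no equivalent add, so they always fall through to the
shift mnemonic.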

The patch was bootstrapped and regtested on aarch64-linux-gnu with no
regressions.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split pattern
to accommodate left and right shifts separately.
(*post_ra_v_ashl<mode>3): Matches left shifts with additional
constraint to check for shifts by 1.
(*post_ra_v_<optab><mode>3): Matches right shifts.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1
with corresponding add.
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_6.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
* gcc.target/aarch64/sve/shift_2.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/sve_shl_add.c: New test.
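
For illustration, a test of this kind might look roughly like the sketch below.
It is not the contents of sve_shl_add.c: the function name, the options and the
expected assembly patterns are all assumptions, and the scan patterns may need
adjusting to what the vectorizer actually produces.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -march=armv8.2-a+sve" } */

#include <stdint.h>

/* Shift every byte left by one; a simple candidate for SVE
   auto-vectorization.  */
void shl1_u8(uint8_t *restrict dst, const uint8_t *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] << 1);
}

/* Assumed expectations, not copied from the real test: the vector shift
   by 1 should appear as an add of a register to itself, not as an lsl.  */
/* { dg-final { scan-assembler {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b} } } */
/* { dg-final { scan-assembler-not {\tlsl\tz[0-9]+\.b, z[0-9]+\.b, #1} } } */
```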
35 files changed:
gcc/config/aarch64/aarch64-sve.md
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s16.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s32.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s64.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s8.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u16.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u32.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u64.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u8.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
gcc/testsuite/gcc.target/aarch64/sve/adr_1.c
gcc/testsuite/gcc.target/aarch64/sve/adr_6.c
gcc/testsuite/gcc.target/aarch64/sve/cond_mla_7.c
gcc/testsuite/gcc.target/aarch64/sve/cond_mla_8.c
gcc/testsuite/gcc.target/aarch64/sve/shift_2.c
gcc/testsuite/gcc.target/aarch64/sve/sve_shl_add.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_s16.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_s32.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_s64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_s8.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_u16.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_u32.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_u64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rshl_u8.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c