aarch64: Use SVE ASRD instruction with Neon modes.
author Soumya AR <soumyaa@nvidia.com>
Wed, 11 Dec 2024 04:15:09 +0000 (09:45 +0530)
committer Soumya AR <soumyaa@nvidia.com>
Wed, 11 Dec 2024 04:20:02 +0000 (09:50 +0530)
commit e5569a20cf3791553ac324269001a7c7c0e56242
tree a45d51879a5c0cf6d700730a1cbb056cec97f3bc
parent 65b7c8db9c61bcdfd07a3404047dd2d2beac4bbb

The SVE ASRD instruction performs an arithmetic shift right by an immediate
for division, i.e. a signed division by a power of two that rounds toward zero.

This patch enables the use of ASRD with Neon modes.

For example:

int in[N], out[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    out[i] = in[i] / 4;
}

currently compiles to:

ldr q31, [x1, x0]
cmlt v30.16b, v31.16b, #0
and z30.b, z30.b, 3
add v30.16b, v30.16b, v31.16b
sshr v30.16b, v30.16b, 2
str q30, [x0, x2]
add x0, x0, 16
cmp x0, 1024

but can just be:

ldp q30, q31, [x0], 32
asrd z31.b, p7/m, z31.b, #2
asrd z30.b, p7/m, z30.b, #2
stp q30, q31, [x1], 32
cmp x0, x2
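
As a rough illustration of why the two sequences are interchangeable, the
sketch below models the longer one (cmlt, and, add, sshr) on a single signed
8-bit lane and compares it with C's truncating division, which is the result
ASRD is meant to produce in one step. The function names are hypothetical and
the 8-bit lane width is an assumption taken from the .b element size in the
listings above; this is a scalar model, not the code GCC generates.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right written portably: right-shifting a negative
   value is implementation-defined in C, so go through the complement.  */
int asr_model (int v, unsigned shift)
{
  return v < 0 ? ~(~v >> shift) : v >> shift;
}

/* Reference result: C signed division truncates toward zero.  */
int8_t div_pow2_ref (int8_t x, unsigned shift)
{
  assert (shift >= 1 && shift <= 7);
  return (int8_t) (x / (1 << shift));
}

/* Scalar model of the bias-then-shift sequence on one lane.  */
int8_t div_pow2_bias_shift (int8_t x, unsigned shift)
{
  assert (shift >= 1 && shift <= 7);
  int mask = x < 0 ? -1 : 0;              /* cmlt #0: all-ones if negative */
  int bias = mask & ((1 << shift) - 1);   /* and: 2^shift - 1, or 0 */
  int sum = x + bias;                     /* add: cannot overflow 8 bits */
  return (int8_t) asr_model (sum, shift); /* sshr */
}

/* Returns 1 if both functions agree for every 8-bit input and shift.  */
int sequences_agree (void)
{
  for (int x = -128; x <= 127; x++)
    for (unsigned shift = 1; shift <= 7; shift++)
      if (div_pow2_ref ((int8_t) x, shift)
          != div_pow2_bias_shift ((int8_t) x, shift))
        return 0;
  return 1;
}
```

The bias matters only for negative inputs: without it, an arithmetic shift
alone would round toward negative infinity instead of toward zero.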

This patch also adds the following overload:
aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)
Depending on the data mode, the function returns a predicate with the
appropriate bits set.
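
The commit message does not show the body of the new overload. Purely as a
mental model, and assuming that an SVE predicate holds one bit per vector byte
and that a Neon data mode occupies the low 8 or 16 bytes of the register, the
"appropriate bits" might be pictured as below. The name model_ptrue_bits and
its parameter are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: return a mask with one bit set for
   each byte covered by a 64-bit or 128-bit Neon data mode, starting
   from the lowest predicate bit.  */
uint32_t model_ptrue_bits (unsigned data_mode_bytes)
{
  assert (data_mode_bytes == 8 || data_mode_bytes == 16);
  return (UINT32_C (1) << data_mode_bytes) - 1;
}
```

Under these assumptions a 128-bit mode yields 0xffff and a 64-bit mode yields
0xff, so a predicated instruction such as ASRD would leave bytes beyond the
Neon vector untouched.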

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload.
* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise.
* config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3
and *sdiv_pow2<mode>3 to support Neon modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/sve-asrd.c: New test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/aarch64.cc
gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c [new file with mode: 0644]