AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.
author Jennifer Schmitz <jschmitz@nvidia.com>
Tue, 11 Mar 2025 09:18:46 +0000 (02:18 -0700)
committer Jennifer Schmitz <jschmitz@nvidia.com>
Fri, 9 May 2025 07:14:01 +0000 (09:14 +0200)
commit 3d7e67ac0d9acc43927c2fb7c358924c84d90f37
tree 9249d33fc6a2a82f5c4bf297f77231589f57717b
parent 86a7642ef5979ff1cf28f4b3eda73dae8f0e0ef2

SVE loads and stores where the predicate is all-true can be optimized to
unpredicated instructions. For example,
svuint8_t foo (uint8_t *x)
{
  return svld1 (svptrue_b8 (), x);
}
was compiled to:
foo:
	ptrue	p3.b, all
	ld1b	z0.b, p3/z, [x0]
	ret
but can be compiled to:
foo:
	ldr	z0, [x0]
	ret
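
The reason the two sequences are interchangeable can be illustrated with a small scalar model. This is only a sketch, not GCC code, and it uses no SVE intrinsics so that it builds on any host. The names `VL_BYTES`, `model_ld1b_z`, `model_ldr` and `ptrue_load_matches_ldr` are hypothetical, and the fixed 16-byte vector length is an assumption made for the demonstration; it covers byte lanes only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed vector length for this model; real SVE lengths vary by hardware. */
#define VL_BYTES 16

/* Model of a zeroing predicated byte load (ld1b z, p/z, [x]):
   active lanes are loaded, inactive lanes are set to zero. */
static void model_ld1b_z(uint8_t dst[VL_BYTES], const bool pred[VL_BYTES],
                         const uint8_t *src)
{
    for (int i = 0; i < VL_BYTES; i++)
        dst[i] = pred[i] ? src[i] : 0;
}

/* Model of an unpredicated whole-register load (ldr z, [x]). */
static void model_ldr(uint8_t dst[VL_BYTES], const uint8_t *src)
{
    memcpy(dst, src, VL_BYTES);
}

/* Returns 1 when both models produce the same register contents
   under an all-true predicate, 0 otherwise. */
static int ptrue_load_matches_ldr(const uint8_t *src)
{
    bool ptrue[VL_BYTES];
    uint8_t a[VL_BYTES], b[VL_BYTES];

    for (int i = 0; i < VL_BYTES; i++)
        ptrue[i] = true;

    model_ld1b_z(a, ptrue, src);
    model_ldr(b, src);
    return memcmp(a, b, VL_BYTES) == 0;
}
```

With every lane active, the predicate never masks anything, so the predicated model degenerates to a plain copy of one vector's worth of bytes.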

The late_combine2 pass had already been trying to do this, but no insn
pattern matched the instruction it formed:
(set (reg/i:VNx16QI 32 v0)
    (unspec:VNx16QI [
            (const_vector:VNx16BI repeat [
                    (const_int 1 [0x1])
                ])
            (mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
      [0 MEM <svuint8_t> [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
        ] UNSPEC_PRED_X))

This patch adds a new define_insn_and_split that matches the missing
instruction and splits it to an unpredicated load/store. Because LDR
offers fewer addressing modes than LD1[BHWD], the pattern is
guarded by reload_completed so that the transform is applied only once
the addressing modes have been chosen during register allocation (RA).
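
The addressing-mode constraint can be sketched as a simple check. The model below is hypothetical and is not how GCC represents addresses: `sve_addr`, `ld1_addr_ok`, `ldr_addr_ok` and `can_fold_to_ldr` are illustrative names. The immediate ranges (a signed 4-bit multiple of the vector length for LD1, a signed 9-bit multiple for LDR, and a register-offset form for LD1 only) are stated as assumptions and should be checked against the Arm architecture documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical address forms for this sketch; not GCC's representation. */
enum addr_kind {
    ADDR_BASE_IMM_VL,  /* [Xn, #imm, MUL VL] */
    ADDR_BASE_REG      /* [Xn, Xm], register offset */
};

struct sve_addr {
    enum addr_kind kind;
    int imm_vl;  /* offset in multiples of VL; ignored for ADDR_BASE_REG */
};

/* Assumption: LD1/ST1 accept a register offset or an immediate in [-8, 7]. */
static bool ld1_addr_ok(struct sve_addr a)
{
    if (a.kind == ADDR_BASE_REG)
        return true;
    return a.imm_vl >= -8 && a.imm_vl <= 7;
}

/* Assumption: LDR/STR accept only an immediate in [-256, 255]. */
static bool ldr_addr_ok(struct sve_addr a)
{
    return a.kind == ADDR_BASE_IMM_VL
           && a.imm_vl >= -256 && a.imm_vl <= 255;
}

/* The fold is only possible when the address already chosen for the
   predicated instruction is also encodable by the unpredicated one. */
static bool can_fold_to_ldr(struct sve_addr a)
{
    return ld1_addr_ok(a) && ldr_addr_ok(a);
}
```

Under this model, a load that ended up with a register-offset address must stay in its predicated form, which is why the decision is deferred until the address has been fixed.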

The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue<mode>_ldr_str):
Add define_insn_and_split to fold predicated SVE loads/stores with
ptrue predicates to unpredicated instructions.

gcc/testsuite/
* gcc.target/aarch64/sve/ptrue_ldr_str.c: New test.
* gcc.target/aarch64/sve/acle/general/attributes_6.c: Adjust
expected outcome.
* gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_1.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_3.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.
25 files changed:
gcc/config/aarch64/aarch64-sve.md
gcc/testsuite/gcc.target/aarch64/sve/acle/general/attributes_6.c
gcc/testsuite/gcc.target/aarch64/sve/cost_model_14.c
gcc/testsuite/gcc.target/aarch64/sve/cost_model_4.c
gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
gcc/testsuite/gcc.target/aarch64/sve/cost_model_6.c
gcc/testsuite/gcc.target/aarch64/sve/cost_model_7.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_f16.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_f32.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_f64.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_mf8.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_s16.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_s32.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_s64.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_s8.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_u16.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_u32.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_u64.c
gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_u8.c
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c
gcc/testsuite/gcc.target/aarch64/sve/ptrue_ldr_str.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/single_1.c
gcc/testsuite/gcc.target/aarch64/sve/single_2.c
gcc/testsuite/gcc.target/aarch64/sve/single_3.c
gcc/testsuite/gcc.target/aarch64/sve/single_4.c