AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

SVE loads and stores where the predicate is all-true can be optimized to
unpredicated instructions. For example,
svuint8_t foo (uint8_t *x)
{
  return svld1 (svptrue_b8 (), x);
}
was compiled to:
foo:
	ptrue	p3.b, all
	ld1b	z0.b, p3/z, [x0]
	ret
but can be compiled to:
foo:
	ldr	z0, [x0]
	ret
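
The same fold is meant to apply to stores. As a minimal sketch of the store
counterpart (bar is a hypothetical name chosen for illustration, not a test
case taken from the patch; the __ARM_FEATURE_SVE guard is only there so that
the snippet also compiles on targets without SVE):

```c
#include <stdint.h>

#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>

/* Hypothetical store counterpart of foo: the governing predicate is
   all-true, so every byte of the vector is written to x.  */
void bar (uint8_t *x, svuint8_t data)
{
  svst1 (svptrue_b8 (), x, data);
}
#endif
```

With the new pattern one would expect this to become a single
"str z0, [x0]" rather than a ptrue followed by st1b. That output is an
assumption and was not copied from the patch or its tests.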

The late_combine2 pass had already been trying to do this, but failed
because no pattern matched the resulting instruction:
(set (reg/i:VNx16QI 32 v0)
    (unspec:VNx16QI [
            (const_vector:VNx16BI repeat [
                    (const_int 1 [0x1])
                ])
            (mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
  [0 MEM <svuint8_t> [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
        ] UNSPEC_PRED_X))

This patch adds a new define_insn_and_split that matches the missing
instruction and splits it into an unpredicated load/store. Because LDR
offers fewer addressing modes than LD1[BHWD], the pattern is guarded by
reload_completed so that the transform is only applied once the
addressing modes have been chosen during register allocation.

The patch was bootstrapped and tested on aarch64-linux-gnu with no
regressions.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
	* config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue<mode>_ldr_str):
	Add define_insn_and_split to fold predicated SVE loads/stores with
	ptrue predicates to unpredicated instructions.