aarch64: Use VNx16BI for more permutations [PR121294]
author Richard Sandiford <richard.sandiford@arm.com>
Thu, 14 Aug 2025 16:56:50 +0000 (17:56 +0100)
committer Richard Sandiford <richard.sandiford@arm.com>
Thu, 14 Aug 2025 16:56:50 +0000 (17:56 +0100)
commit 4cf9d4ebdd68a724eb41044cd8f2a4d466d81c7f
tree 65d9e000f75b02c82ea983089791ce486e9c7d49
parent d755aa03db0ad5b71ee7f39b09c92870789f2f00

The patterns for the predicate forms of svtrn1/2, svuzp1/2,
and svzip1/2 are shared with aarch64_vectorize_vec_perm_const.
The .B form operates on VNx16BI, while the .H, .S, and .D forms
operate on VNx8BI, VNx4BI, and VNx2BI respectively.  Thus, for all
four element widths, there is one significant bit per element, for
both the inputs and the output.
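
As a rough illustration of "one significant bit per element", the
sketch below computes which predicate bits count as significant for a
given element size.  It assumes a 128-bit vector, so that a predicate
register holds 16 bits (one per vector byte).  The helper name
significant_mask is hypothetical and does not appear in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper, not part of GCC.  Return the mask of significant
   predicate bits in a 16-bit predicate (128-bit vector assumed) when the
   predicate is viewed as having one significant bit per element of
   ELT_BYTES bytes.  */
static uint16_t significant_mask(unsigned elt_bytes)
{
    assert(elt_bytes == 1 || elt_bytes == 2
           || elt_bytes == 4 || elt_bytes == 8);

    uint16_t mask = 0;
    /* Only the lowest bit of each ELT_BYTES-bit group is significant.  */
    for (unsigned bit = 0; bit < 16; bit += elt_bytes)
        mask |= (uint16_t)(1u << bit);
    return mask;
}
```

Under these assumptions, the .B view (VNx16BI) treats all 16 bits as
significant, while the .H, .S, and .D views (VNx8BI, VNx4BI, VNx2BI)
treat only every second, fourth, and eighth bit as significant.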

That's appropriate for aarch64_vectorize_vec_perm_const but not
for the ACLE intrinsics, where every bit of the output is
significant, and where every bit of the selected input elements
is therefore also significant.  The current expansion can lead
the optimisers to simplify inputs by changing the upper bits
of the input elements (since the current patterns claim that
those bits don't matter), which in turn leads to wrong code.
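
The difference can be sketched with a small software model of a
predicate TRN1.  This is only a toy model under stated assumptions: a
128-bit vector (16 predicate bits), with every bit of each selected
predicate element copied to the result.  The names get_elt, set_elt,
and model_trn1 are hypothetical and are not GCC or ACLE functions.

```c
#include <assert.h>
#include <stdint.h>

/* Read predicate element IDX, where each element is BITS bits wide.  */
static unsigned get_elt(uint16_t p, unsigned idx, unsigned bits)
{
    return (p >> (idx * bits)) & ((1u << bits) - 1u);
}

/* Return P with element IDX (BITS bits wide) replaced by VAL.  */
static uint16_t set_elt(uint16_t p, unsigned idx, unsigned bits, unsigned val)
{
    unsigned shift = idx * bits;
    unsigned mask = ((1u << bits) - 1u) << shift;
    return (uint16_t)((p & ~mask) | ((val << shift) & mask));
}

/* Toy model of a predicate TRN1 on a 16-bit predicate.  BITS is the
   number of predicate bits per element: 1 for .B, 2 for .H, 4 for .S,
   8 for .D.  Even-numbered elements of PN and PM are interleaved, and
   all BITS bits of each selected element are copied.  */
static uint16_t model_trn1(uint16_t pn, uint16_t pm, unsigned bits)
{
    assert(bits == 1 || bits == 2 || bits == 4 || bits == 8);

    unsigned nelts = 16 / bits;
    uint16_t result = 0;
    for (unsigned pair = 0; pair < nelts / 2; pair++) {
        result = set_elt(result, 2 * pair, bits, get_elt(pn, 2 * pair, bits));
        result = set_elt(result, 2 * pair + 1, bits, get_elt(pm, 2 * pair, bits));
    }
    return result;
}
```

In this model, with .H elements (bits == 2), the inputs 0x0000 and
0x0002 differ only in the upper bit of element 0, yet they produce
different results.  An optimisation that rewrites that upper bit
because it is assumed not to matter would therefore change the value
that the intrinsic returns.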

The ACLE expansion should operate on VNx16BI instead, for all
element widths.

There was already a pattern for a VNx16BI-only form of TRN1, for
constructing certain predicate constants.  The patch generalises it to
handle the other five permutations as well.  For the reasons given in
the comments, this is done by making the permutation unspec an operand
to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing
unspecs, and rather than adding a new unspec for each permutation.

gcc/
PR target/121294
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
(UNSPEC_PERMUTE_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
Replace with...
(@aarch64_sve_<perm_insn><mode>_acle)
(*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
* config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
Update accordingly.
* config/aarch64/aarch64-sve-builtins-functions.h
(binary_permute::expand): Use the new _acle patterns for
predicate operations.

gcc/testsuite/
PR target/121294
* gcc.target/aarch64/sve/acle/general/perm_2.c: New test.
* gcc.target/aarch64/sve/acle/general/perm_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general/perm_4.c: Likewise.
* gcc.target/aarch64/sve/acle/general/perm_5.c: Likewise.
* gcc.target/aarch64/sve/acle/general/perm_6.c: Likewise.
* gcc.target/aarch64/sve/acle/general/perm_7.c: Likewise.
gcc/config/aarch64/aarch64-sve-builtins-functions.h
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/aarch64.cc
gcc/config/aarch64/iterators.md
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_4.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_7.c [new file with mode: 0644]