git.ipfire.org Git - thirdparty/gcc.git/commit
match.pd: Fold VEC_PERM_EXPR chains implementing concat-and-extract
author: Pengfei Li <Pengfei.Li2@arm.com>
Wed, 22 Oct 2025 11:17:07 +0000 (11:17 +0000)
committer: Pengfei Li <Pengfei.Li2@arm.com>
Fri, 24 Oct 2025 08:09:04 +0000 (08:09 +0000)
commit: a46dffee33a3a4bf52eb2d493aace3e77068318f
tree: 62cd0fc507f342e4aee2ecd278249ae8a82eab49
parent: 70f66ae8ae833192812c91598348c971e5b21c29

When compiling the following code with SIMDe on AArch64:

__m128i lo = _mm_srli_si128(a, 12);
__m128i hi = _mm_slli_si128(b, 4);
__m128i res = _mm_blend_epi16(hi, lo, 3);

current GCC produces:

mov     v31.4s, 0
ext     v30.16b, v0.16b, v31.16b, #12
ext     v0.16b, v31.16b, v1.16b, #12
ins     v0.s[0], v30.s[0]

instead of the more efficient:

ext     v0.16b, v0.16b, v1.16b, #12

GCC builds three VEC_PERM_EXPRs for the intrinsic calls. The first two
implement the vector shifts and the last one implements the blend, but
they operate on different vector modes. Forward propagation fails to
combine them because the VIEW_CONVERT_EXPRs between the permutes block
the folding.

This patch adds a match.pd pattern that recognizes the concat-and-extract
idiom and folds the VEC_PERM_EXPR chain, even when VIEW_CONVERT_EXPRs
split the chain.

Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Fold VEC_PERM_EXPR chains implementing vector
concat-and-extract.

gcc/testsuite/ChangeLog:

* gcc.dg/fold-vecperm-1.c: New test.
gcc/match.pd
gcc/testsuite/gcc.dg/fold-vecperm-1.c [new file with mode: 0644]