AArch64: Implement four and eight chunk VLA concats [PR118272]
The following testcase
#pragma GCC target ("+sve")
extern char __attribute__ ((simd, const)) fn3 (int, short);
void test_fn3 (float *a, float *b, double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = fn3 (b[i], c[i]);
}
ICEs at -Ofast because my previous patch only added support for combining 2
partial SVE vectors into a bigger vector.  However, a vector can also be built
from 4 or 8 subvectors.
This patch fixes this by implementing the missing expansions.
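As a rough illustration of why 4 and 8 piece concats reduce to the 2 piece
case, the sketch below models the idea on plain byte arrays.  It is a
conceptual model only and not the code in aarch64.cc: the names concat2,
concat_chunks and MAX_BYTES are hypothetical, and fixed-size buffers stand in
for the variable-length SVE modes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the size of a combined vector, in bytes.  */
#define MAX_BYTES 64

/* Hypothetical two-input primitive: place LO then HI (LEN bytes each)
   into DST.  This stands in for the existing 2 piece concat.  */
static void
concat2 (char *dst, const char *lo, const char *hi, size_t len)
{
  memcpy (dst, lo, len);
  memcpy (dst + len, hi, len);
}

/* Concatenate N chunks of LEN bytes each into DST.  N must be a power of
   two.  Each half is built recursively, so only concat2 is ever needed.  */
static void
concat_chunks (char *dst, const char *chunks[], size_t n, size_t len)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  assert (n * len <= MAX_BYTES);

  if (n == 1)
    {
      memcpy (dst, chunks[0], len);
      return;
    }

  char lo[MAX_BYTES], hi[MAX_BYTES];
  concat_chunks (lo, chunks, n / 2, len);
  concat_chunks (hi, chunks + n / 2, n / 2, len);
  concat2 (dst, lo, hi, (n / 2) * len);
}
```

In this model, four 2-byte chunks "ab", "cd", "ef" and "gh" are first paired
into "abcd" and "efgh" and then joined, which is the same shape of reduction
an 8 piece input goes through with one extra level.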
gcc/ChangeLog:
PR target/96342
PR target/118272
* config/aarch64/aarch64-sve.md (vec_init<mode><Vquad>,
vec_initvnx16qivnx2qi): New.
* config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_subvector):
Rewrite to support any arbitrary combinations.
* config/aarch64/iterators.md (SVE_NO2E): Update to use SVE_NO4E.
(SVE_NO4E, Vquad): New.
gcc/testsuite/ChangeLog:
PR target/96342
PR target/118272
* gcc.target/aarch64/vect-simd-clone-3.c: New test.