In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always
to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0:
/* Ensure that the stepped sequence always selects from the same
input pattern. */
unsigned arg_npatterns
= ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
: VECTOR_CST_NPATTERNS (arg1);
resulting in wrong code-gen issues.
The patch fixes this by changing the condition to (q1 & 1) == 0.
gcc/ChangeLog:
PR tree-optimization/111048
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns
correctly.
(fold_vec_perm_cst): Remove workaround and again call
valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors.
(test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.