The following makes sure we do not lower single-element interleaving
schemes in a way that defeats load vectorization later, and instead
leaves them alone so the VMAT_ELEMENTWISE fallback can be used.
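
As a rough illustration of the condition the patch adds, here is a
standalone sketch with plain integers. The LoadInfo struct, its field
names and keep_unlowered are hypothetical and not part of GCC; the
sketch also assumes a constant vector length, so the poly_int
comparison maybe_gt reduces to a plain '>'.

```cpp
// Hypothetical, simplified model of the state the new check looks at.
// None of these names exist in GCC; they only mirror the patch's tests.
struct LoadInfo {
  unsigned ld_lanes_lanes;   // 0 when load-lanes is not being used
  unsigned slp_lanes;        // lanes in the SLP load node
  bool has_next_element;     // true if the group has a second member
  unsigned group_lanes;      // lanes covered by the interleaving group
  unsigned vector_subparts;  // elements per vector (assumed constant)
};

// Returns true when lowering should be skipped: a single-element
// interleaving group that covers more lanes than one vector holds.
constexpr bool keep_unlowered(const LoadInfo &l) {
  return l.ld_lanes_lanes == 0
         && l.slp_lanes == 1
         && !l.has_next_element
         && l.group_lanes > l.vector_subparts;
}

// One element out of a group of 8 with 4-element vectors: left alone.
static_assert(keep_unlowered(LoadInfo{0, 1, false, 8, 4}), "skip lowering");
// The same group fits in a single 8-element vector: lowering proceeds.
static_assert(!keep_unlowered(LoadInfo{0, 1, false, 8, 8}), "lower");
```

Under these assumptions, only groups that span more than one vector
are exempted from lowering, matching the "spans multiple vectors"
wording in the ChangeLog entry.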
PR tree-optimization/120457
* tree-vect-slp.cc (vect_lower_load_permutations): Implement
the same heuristics as load vectorization for single-element
interleaving that spans multiple vectors.
if (!SLP_TREE_CHILDREN (load).is_empty ())
continue;
+ /* For single-element interleaving spanning multiple vectors avoid
+ lowering, we want to use VMAT_ELEMENTWISE later. */
+ if (ld_lanes_lanes == 0
+ && SLP_TREE_LANES (load) == 1
+ && !DR_GROUP_NEXT_ELEMENT (first)
+ && maybe_gt (group_lanes,
+ TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (load))))
+ return;
+
/* We want to pattern-match special cases here and keep those
alone. Candidates are splats and load-lane. */