vect: Reduce group size of consecutive strided accesses.
Consecutive load permutations like {0, 1, 2, 3} or {4, 5, 6, 7} in a
group of 8 only read a part of the group, leaving a gap.
For strided accesses we can elide the permutation and, instead of
accessing the whole group, use the number of SLP lanes. This
effectively increases the vector size as we don't load gaps. On top we
do not need to emit the permutes at all.
gcc/ChangeLog:
* tree-vect-slp.cc (vect_load_perm_consecutive_p): New function.
(vect_lower_load_permutations): Use.
(vect_optimize_slp_pass::remove_redundant_permutations): Use.
* tree-vect-stmts.cc (has_consecutive_load_permutation): New
function that uses vect_load_perm_consecutive_p.
(get_load_store_type): Use.
(vectorizable_load): Reduce group size.
* tree-vectorizer.h (struct vect_load_store_data): Add
subchain_p.
(vect_load_perm_consecutive_p): Declare.