Uncounted loops are categorized as
LOOP_VINFO_EARLY_BREAKS_VECT_PEELED, which disables prolog peeling by
default. This rests on the assumption that the early-break exits
follow the IV-counting main exit. For loops of that shape, prolog
peeling is indeed problematic.

To enable prolog peeling in uncounted loops it is sufficient, when
duplicating the loop for the prolog, to convert the prolog loop into a
counted loop, inserting a counting IV exit at the end, thus resulting
in the kind of early-break loop already supported by the compiler.
The pre-existing exits will continue to point to the exit node, while
the new exit will point to the vectorized loop, directing control flow
there once the number of iterations required for alignment is
completed.
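
As a source-level illustration only, the following sketch models the
shape the prolog takes after the transformation. It is not compiler
code: `find_zero', `find_zero_peeled', `align_bytes' and
`prolog_niters' are hypothetical names, the 16-byte alignment target is
an assumption, and the trailing scalar loop merely stands in for the
vectorized loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original uncounted loop: the trip count is unknown on entry.
std::size_t find_zero(const int *p)
{
  std::size_t i = 0;
  while (p[i] != 0)
    ++i;
  return i;
}

// Hand-written model of the peeled form (illustrative assumption).
std::size_t find_zero_peeled(const int *p, std::size_t align_bytes = 16)
{
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const std::size_t misalign = addr % align_bytes;
  // The sketch assumes p is at least int-aligned.
  assert(misalign % sizeof(int) == 0);
  const std::size_t prolog_niters
    = misalign == 0 ? 0 : (align_bytes - misalign) / sizeof(int);

  std::size_t i = 0;
  if (prolog_niters > 0)
    {
      std::size_t iv = 0;               // prolog counting IV
      for (;;)
        {
          if (p[i] == 0)                // pre-existing exit: still leaves
            return i;                   // through the original exit
          ++i;
          if (++iv >= prolog_niters)    // new counting exit: alignment
            break;                      // reached, go to the main loop
        }
    }

  // Stand-in for the vectorized loop, which now starts aligned.
  while (p[i] != 0)
    ++i;
  return i;
}
```

For any int-aligned pointer into a zero-terminated buffer, both
functions return the same index; only the path taken through the
prolog differs.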
To achieve this, we note that `vect_set_loop_condition' replaces the
condition in the main exit of a counted loop, while also inserting
the prolog IV and its update statement. The design strategy is thus:
- Have `slpeel_tree_duplicate_loop_to_edge_cfg' add a dummy main
exit on the non-exiting branch of the loop's original "main" exit,
between the condition-containing BB and the latch BB. For the
original exit, edge->dest remains unchanged when the exit condition
is true. The dummy exit replicates this control flow: the exiting
branch of its if statement initially leads to the same exit BB as
the preceding exit.
- As this new basic block will contain the IV-counting exit
condition, its exit edge carries the control flow once alignment is
achieved, and so we mark it as the new `new_exit'. This exit is then
used in `redirect_edge_and_branch_force (new_exit, preheader)' and
its basic block is passed to `vect_set_loop_condition', where its
condition is replaced accordingly, completing the setup of the
prolog loop.
- Controlling this new functionality in
`slpeel_tree_duplicate_loop_to_edge_cfg' requires a new parameter,
`bool duplicate_main_e'. It is set to true when we have an uncounted
loop and are generating its prolog. It defaults to false, so
existing calls to the function remain unchanged.
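
The three steps above can be pictured with a toy CFG model. This is a
sketch under stated assumptions, not the GCC implementation: `Block',
`Copy', `duplicate_loop' and `redirect_new_exit' are hypothetical
stand-ins, and blocks are identified by name rather than by real
basic_block or edge objects. Only the `duplicate_main_e' flag and its
false default mirror the description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: one optional exit edge and one in-loop successor.
struct Block {
  std::string name;
  std::string exit_dest;  // empty when the block has no exit edge
  std::string next;       // successor on the non-exiting branch
};

struct Copy {
  std::vector<Block> blocks;
  std::size_t exit_block;  // block whose exit edge plays `new_exit'
};

// Hypothetical stand-in for the duplication step.  With the default
// duplicate_main_e == false the copy is unchanged, so existing
// callers keep their behaviour.
Copy duplicate_loop(const std::vector<Block> &loop, std::size_t main_exit,
                    bool duplicate_main_e = false)
{
  Copy copy{loop, main_exit};
  if (!duplicate_main_e)
    return copy;

  // Dummy exit between the condition block and the latch; it
  // initially leads to the same exit block as the preceding exit.
  Block dummy;
  dummy.name = "iv_exit";
  dummy.exit_dest = copy.blocks[main_exit].exit_dest;
  dummy.next = copy.blocks[main_exit].next;
  copy.blocks[main_exit].next = dummy.name;

  const auto pos
    = copy.blocks.begin() + static_cast<std::ptrdiff_t>(main_exit) + 1;
  copy.blocks.insert(pos, dummy);
  copy.exit_block = main_exit + 1;
  return copy;
}

// Models redirecting `new_exit' to the vectorized loop's preheader.
void redirect_new_exit(Copy &copy, const std::string &preheader)
{
  copy.blocks[copy.exit_block].exit_dest = preheader;
}
```

In this model, calling `duplicate_loop (loop, 0, true)' on a two-block
loop {cond, latch} yields {cond, iv_exit, latch}; after
`redirect_new_exit' only the exit of `iv_exit' points at the
preheader, while the exit of `cond' still reaches the original exit
block.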