In vect_analyze_slp_reduction, the early bail-out "if (*limit == 0) return
false;" blocked all SLP discovery, including the single-lane fallback path.
However, single-lane SLP trees (group_size == 1) do not consume the
discovery limit, as they cannot cause exponential tree growth.
This caused vectorization failures in loops with many independent
conditional reductions: multi-lane grouping attempts exhausted the limit,
and then the single-lane fallback, which would have succeeded, was
incorrectly rejected.
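For illustration, a loop with several independent conditional reductions might look like the following: each accumulator is updated only under its own condition. The function name cond_reductions is hypothetical; this is not code from 731.astcenc_r and has not been checked as a reproducer of the failure.

```cpp
#include <cassert>

// Hypothetical example, not taken from 731.astcenc_r: four independent
// accumulators, each updated only when its own condition holds.
static void cond_reductions(const int *a, const int *b, int n, int out[4])
{
  assert(n >= 0);
  int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  for (int i = 0; i < n; ++i)
    {
      int x = a[i], y = b[i];
      if (x > y)
        s0 += x;          // conditional reduction 1
      if (x < y)
        s1 += y;          // conditional reduction 2
      if (x > 0)
        s2 += x * y;      // conditional reduction 3
      if (y > 0)
        s3 += y - x;      // conditional reduction 4
    }
  out[0] = s0;
  out[1] = s1;
  out[2] = s2;
  out[3] = s3;
}
```

Whether the vectorizer attempts to group such reductions into multi-lane SLP trees depends on the target and options; the sketch only shows the loop shape.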
The fix moves the limit check so that it guards only the chain analysis
(which builds multi-lane trees and does consume the limit), allowing the
single-lane fallback to always proceed.
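A minimal model of the control flow before and after the change, assuming (hypothetically) that every multi-lane chain attempt fails and spends one unit of budget while the single-lane fallback spends none. All names below are illustrative and are not GCC internals; the real budget accounting inside the SLP builder is more involved.

```cpp
#include <cassert>

// Hypothetical model, NOT GCC code: `limit` stands in for the SLP
// discovery budget passed to vect_analyze_slp_reduction.

// Multi-lane chain attempt: assumed here to always fail and to spend
// one unit of budget (the worst case the commit message describes).
static bool try_multi_lane_chain(unsigned *limit)
{
  assert(limit != nullptr && *limit > 0);
  --*limit;
  return false;
}

// Single-lane fallback (group_size == 1): never touches the budget.
static bool try_single_lane() { return true; }

// Before the patch: an exhausted budget rejects everything.
static bool analyze_before(unsigned *limit, bool force_single_lane)
{
  if (*limit == 0)
    return false;
  if (!force_single_lane && try_multi_lane_chain(limit))
    return true;
  return try_single_lane();
}

// After the patch: only the chain attempt is guarded by the budget.
static bool analyze_after(unsigned *limit, bool force_single_lane)
{
  if (!force_single_lane && *limit != 0 && try_multi_lane_chain(limit))
    return true;
  return try_single_lane();
}

// Run `n` independent reductions against one shared budget and count
// how many end up with an SLP instance.
static int count_successes(bool patched, unsigned budget, int n)
{
  int ok = 0;
  for (int i = 0; i < n; ++i)
    ok += patched ? analyze_after(&budget, false)
                  : analyze_before(&budget, false);
  return ok;
}
```

With a budget of 2 and eight reductions, count_successes (false, 2, 8) yields 2 while count_successes (true, 2, 8) yields 8: once the budget is gone, the old control flow rejects every remaining reduction, whereas the new one still builds single-lane instances.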
This improves 731.astcenc_r (-Ofast) by 3.8% on EMR and 1.4% on Znver5
(single-copy runs).
gcc/ChangeLog:
	* tree-vect-slp.cc (vect_analyze_slp_reduction): Don't bail out
	early when the SLP discovery limit is exhausted; only guard the
	chain analysis, which may build multi-lane trees.  The single-lane
	fallback does not consume the limit and should always be attempted.
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
{
slp_instance_kind kind = slp_inst_kind_reduc_group;
- /* If there's no budget left bail out early. */
- if (*limit == 0)
- return false;
-
- /* Try to gather a reduction chain. */
+ /* Try to gather a reduction chain. Only attempt if there's budget left
+ since chain analysis may build multi-lane trees that consume limit. */
if (! force_single_lane
+ && *limit != 0
&& STMT_VINFO_DEF_TYPE (scalar_stmt) == vect_reduction_def
&& vect_analyze_slp_reduc_chain (vinfo, bst_map, scalar_stmt,
max_tree_size, limit))