From: Richard Biener <rguenther@suse.de>
Date: Tue, 28 Apr 2026 09:00:38 +0000 (+0200)
Subject: [x86] override vector_costs::better_main_loop_than_p
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=efeeb755192cb03e5b24b44cb4fb563c11626a8e;p=thirdparty%2Fgcc.git

[x86] override vector_costs::better_main_loop_than_p

This overrides vector_costs::better_main_loop_than_p to avoid
regressing gcc.target/i386/vect-partial-vectors-2.c with
--param ix86-vect-compare-costs=1.  When the user (or a tuning model)
asks for masked epilogs, the vectorizer also considers masking the
main loop, since with a known small number of iterations the masked
main loop effectively acts as a standalone vector epilog.  While the
generic cost comparison rightly figures that a masked AVX loop is more
expensive than an unmasked SSE loop, it does not consider the cost of
the epilogs the unmasked loop still needs.
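To make the ignored epilog cost concrete, here is a minimal C++ sketch
under an assumed model, not the vectorizer's actual logic: an unmasked
main loop with a power-of-two VF, followed by unmasked vector epilogs
whose VF halves each time, then scalar iterations.  The struct
Decomposition and the function decompose are hypothetical names used
only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of how a known iteration count is split between an
// unmasked main loop, a chain of unmasked vector epilogs and scalar code.
// Assumes each epilog halves the previous VF.
struct Decomposition
{
  std::vector<uint64_t> vector_vfs;  // VF of each vector loop that runs
  uint64_t scalar_iters;             // iterations left for scalar code
};

Decomposition
decompose (uint64_t niters, uint64_t main_vf)
{
  Decomposition d{{}, 0};
  uint64_t remaining = niters;
  for (uint64_t vf = main_vf; vf >= 2; vf /= 2)
    if (remaining >= vf)
      {
        // Consume all whole multiples of vf.  Afterwards fewer than vf
        // iterations remain, so each halved epilog runs at most once.
        d.vector_vfs.push_back (vf);
        remaining %= vf;
      }
  d.scalar_iters = remaining;
  return d;
}
```

For example, decompose (31, 16) yields vector loop bodies with VFs 16,
8, 4 and 2 plus one scalar iteration, i.e. a main loop and three vector
epilogs, whereas a masked loop with VF 32 would cover the same 31
iterations in a single pass.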

This compensates with an x86-specific heuristic that prefers the
masked loop if the loop cannot be vectorized with a non-masked
main loop plus at most a single vector epilog and a single scalar
epilog iteration.  This is a reasonable heuristic for x86 with
a small number of iterations, since icache footprint matters here,
so accepting up to 3 vector epilogs plus 1 scalar iteration does
not look profitable, unless testcases prove otherwise.
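The decision rule can be modelled outside GCC as below.  This is a
sketch under stated assumptions: plain integers stand in for poly_int
VFs, the boolean vector_epilogues_enabled stands in for
param_vect_epilogues_nomask != 0, the masked candidate is assumed to be
using partial vectors, and the name prefer_masked_main_loop is
hypothetical.  Returning true corresponds to
ix86_vector_costs::better_main_loop_than_p returning false for the
unmasked candidate; returning false means the generic comparison decides.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical standalone model of the condition added to
// ix86_vector_costs::better_main_loop_than_p.  Plain integers stand in for
// GCC's poly_int VFs.  Returns true if the masked candidate should be
// chosen outright; false means "fall back to the generic cost comparison".
bool
prefer_masked_main_loop (uint64_t known_niters, uint64_t masked_vf,
                         bool vector_epilogues_enabled)
{
  // Only relevant when the masked loop covers all iterations in one pass.
  if (masked_vf <= known_niters)
    return false;

  // Clearing bit 0 tolerates one leftover scalar iteration.  Each remaining
  // set bit stands for one power-of-two vector chunk, assuming every
  // power-of-two VF is used at most once.
  auto chunks = std::bitset<64> (known_niters & ~uint64_t (1)).count ();

  // Allow one vector chunk if vector epilogs are enabled, none otherwise.
  return chunks > (vector_epilogues_enabled ? 1u : 0u);
}
```

With epilogs enabled, 7 known iterations and a masked VF of 8 select the
masked loop (7 & ~1 = 6 = 4 + 2 needs two vector chunks), while 5
iterations (5 & ~1 = 4, one chunk) fall back to the generic cost
comparison.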

I'm not sure whether it makes sense to preserve --param ix86-vect-compare-costs=0
in the end; if people think so, I'll duplicate the testcase with
both modes explicitly specified.

	* tree-vectorizer.h (vector_costs::vinfo): New accessor.
	* config/i386/i386.cc (ix86_vector_costs::better_main_loop_than_p):
	Prefer a masked main loop if we can elide enough of (vector)
	epilog loop iterations.
---

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index cfec6845c16..b92338bc6dd 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26128,6 +26128,7 @@ public:
 			      tree vectype, int misalign,
 			      vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+  bool better_main_loop_than_p (const vector_costs *) const override;
 
 private:
 
@@ -26987,6 +26988,28 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
   vector_costs::finish_cost (scalar_costs);
 }
 
+/* Return true if THIS should be preferred over OTHER as main vector loop.  */
+
+bool
+ix86_vector_costs::better_main_loop_than_p (const vector_costs *other) const
+{
+  loop_vec_info this_loop_vinfo = as_a<loop_vec_info> (this->vinfo ());
+  loop_vec_info other_loop_vinfo = as_a<loop_vec_info> (other->vinfo ());
+
+  /* If the other loop is masked it does not need an epilog.  Prefer that
+     if the current loop cannot be vectorized fully using vector
+     epilogs with at most one scalar iteration left.  */
+  if (LOOP_VINFO_NITERS_KNOWN_P (this_loop_vinfo)
+      && LOOP_VINFO_USING_PARTIAL_VECTORS_P (other_loop_vinfo)
+      && known_gt (LOOP_VINFO_VECT_FACTOR (other_loop_vinfo),
+		   LOOP_VINFO_INT_NITERS (this_loop_vinfo))
+      && (popcount_hwi (LOOP_VINFO_INT_NITERS (this_loop_vinfo) & ~1)
+	  > (param_vect_epilogues_nomask != 0)))
+    return false;
+
+  return vector_costs::better_main_loop_than_p (other);
+}
+
 /* Validate target specific memory model bits in VAL. */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index c01b17b3ee6..de50ed3277c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1804,8 +1804,11 @@ public:
   unsigned int epilogue_cost () const;
   unsigned int outside_cost () const;
   unsigned int total_cost () const;
+
   unsigned int suggested_unroll_factor () const;
   machine_mode suggested_epilogue_mode (int &masked) const;
+
+  vec_info *vinfo () const { return m_vinfo; }
   bool costing_for_scalar () const { return m_costing_for_scalar; }
 
 protected: