middle-end: Add new parameter to scale scalar loop costing in vectorizer

author Tamar Christina <tamar.christina@arm.com>

Mon, 9 Jun 2025 06:03:27 +0000 (07:03 +0100)

committer Tamar Christina <tamar.christina@arm.com>

Mon, 9 Jun 2025 06:05:04 +0000 (07:05 +0100)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 189a52b9b9590c18188208e612a09f17383147c3..17929b3cf15031574b7ec22d9866db493669e35d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17325,6 +17325,10 @@ this parameter.  The default value of this parameter is 50.
  @item vect-induction-float
  Enable loop vectorization of floating point inductions.
  
+@item vect-scalar-cost-multiplier
+Apply the given multiplier % to scalar loop costing during vectorization.
+Increasing the cost multiplier will make vector loops more profitable.
+
  @item vrp-block-limit
  Maximum number of basic blocks before VRP switches to a lower memory algorithm.
  
diff --git a/gcc/params.opt b/gcc/params.opt
index 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
  Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
  Enable loop vectorization of floating point inductions.
  
+-param=vect-scalar-cost-multiplier=
+Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) IntegerRange(0, 10000) Param Optimization
+The scaling multiplier as a percentage to apply to all scalar loop costing when performing vectorization profitability analysis.  The default value is 100.
+
  -param=vrp-block-limit=
  Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
  Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
new file mode 100644
index 0000000..bfe49ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8-a+sve --param vect-scalar-cost-multiplier=1000 -fdump-tree-vect-details" } */
+
+void
+foo (char *restrict a, int *restrict b, int *restrict c,
+     int *restrict d, int stride)
+{
+    if (stride <= 1)
+        return;
+
+    for (int i = 0; i < 3; i++)
+        {
+            int res = c[i];
+            int t = b[i * stride];
+            if (a[i] != 0)
+                res = t * d[i];
+            c[i] = res;
+        }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8c5761d3c55d52c95829f98e4215ae80cebe85d0..9ac4d7e5f7a099a7039cd4186666cf64328b8ee6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
       TODO: Consider assigning different costs to different scalar
       statements.  */
  
-  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
+  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
+                            * param_vect_scalar_cost_multiplier) / 100;
  
    /* Add additional cost for the peeled instructions in prologue and epilogue
       loop.  (For fully-masked loops there will be no peeling.)
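
Note on the scaling arithmetic: the tree-vect-loop.cc hunk multiplies the scalar per-iteration cost by the parameter and then divides by 100, so the parameter acts as a percentage and the result is truncated by integer division. The sketch below illustrates that expression in isolation. The helper name scale_scalar_cost and the use of unsigned int are assumptions made for illustration only; they are not taken from the GCC sources, and the real cost type may differ.

```c
/* Hypothetical helper, not part of GCC: mirrors the patched expression
   (total_cost * param_vect_scalar_cost_multiplier) / 100.
   The unsigned int types are an assumption made for this sketch.  */
static unsigned int
scale_scalar_cost (unsigned int total_cost, unsigned int multiplier_pct)
{
  /* Multiply first, then divide, as the patch does.  Integer division
     truncates, so small costs combined with multipliers below 100 can
     round down to zero.  This sketch does not guard against overflow
     for very large costs.  */
  return (total_cost * multiplier_pct) / 100;
}
```

Under these assumptions, a scalar cost of 7 stays 7 at the default multiplier of 100, becomes 70 with the multiplier of 1000 used in cost_model_16.c, and truncates to 3 with a multiplier of 50. A larger scalar cost makes the vector loop look comparatively cheaper, which is why the test expects the loop to be vectorized.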