git.ipfire.org Git - thirdparty/gcc.git/commit

aarch64: Take into account when VF is higher than known scalar iters

Consider low overhead loops like:

void
foo (char *restrict a, int *restrict b, int *restrict c, int n)
{
  for (int i = 0; i < 9; i++)
    {
      int res = c[i];
      int t = b[i];
      if (a[i] != 0)
        res = t;
      c[i] = res;
    }
}

For such loops we use latency-only costing, since the loop bound is known and
small.
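As a rough illustration (not code from aarch64.cc), the "Vector loop iterates at most 1 times" line in the dumps follows directly from the known trip count. This assumes a fully masked (predicated) SVE loop, where the final partial vector is handled by the governing predicate, so the iteration count is the trip count rounded up to a whole number of vectors. The helper name max_vector_iterations is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not the GCC implementation: upper bound on the
   number of times a fully masked vector loop executes when the scalar
   trip count NITERS is known at compile time.  The last, partial vector
   is covered by the predicate, so the division rounds up.  */
static unsigned
max_vector_iterations (unsigned niters, unsigned vf)
{
  assert (vf > 0);
  return (niters + vf - 1) / vf;   /* CEIL (niters, vf) */
}
```

For the example loop, 9 scalar iterations with a VF of 32 give a single vector iteration, which is why only the latency of that one iteration matters.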

However, the current costing does not consider the case where niters < VF.

So when comparing the scalar and vector costs, it does not take into account
that the scalar code never executes VF iterations.  This makes it overestimate
the cost of the scalar loop, and we incorrectly vectorize.

This patch takes the minimum of the VF and niters in such cases.
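A minimal sketch of the capping, under assumed names (cost_vf and scalar_cycles_per_vector_iter are hypothetical and are not the identifiers used in adjust_body_cost): when niters is known and smaller than VF, the scalar per-iteration estimate is multiplied by niters instead of VF. With the 1.0 scalar cycles per iteration reported in the dumps, this yields 32 cycles uncapped and 9 cycles capped, matching the "for VF 32" and "for VF 9" lines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; names do not match aarch64.cc.  If the scalar
   trip count is known and below the VF, the scalar loop can never run
   VF iterations, so cap the VF used for the comparison at NITERS.  */
static unsigned
cost_vf (unsigned vf, unsigned niters, bool niters_known)
{
  assert (vf > 0);
  if (niters_known && niters < vf)
    return niters;
  return vf;
}

/* Estimated scalar cycles needed to do the work of one vector
   iteration, using the (possibly capped) VF.  */
static double
scalar_cycles_per_vector_iter (double scalar_cycles_per_iter, unsigned vf,
                               unsigned niters, bool niters_known)
{
  return scalar_cycles_per_iter * cost_vf (vf, niters, niters_known);
}
```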
Before the patch, we generate:

note:  Original vector body cost = 46
note:  Vector loop iterates at most 1 times
note:  Scalar issue estimate:
note:    load operations = 2
note:    store operations = 1
note:    general operations = 1
note:    reduction latency = 0
note:    estimated min cycles per iteration = 1.000000
note:    estimated cycles per vector iteration (for VF 32) = 32.000000
note:  SVE issue estimate:
note:    load operations = 5
note:    store operations = 4
note:    general operations = 11
note:    predicate operations = 12
note:    reduction latency = 0
note:    estimated min cycles per iteration without predication = 5.500000
note:    estimated min cycles per iteration for predication = 12.000000
note:    estimated min cycles per iteration = 12.000000
note:  Low iteration count, so using pure latency costs
note:  Cost model analysis:

vs after:

note:  Original vector body cost = 46
note:  Known loop bounds, capping VF to 9 for analysis
note:  Vector loop iterates at most 1 times
note:  Scalar issue estimate:
note:    load operations = 2
note:    store operations = 1
note:    general operations = 1
note:    reduction latency = 0
note:    estimated min cycles per iteration = 1.000000
note:    estimated cycles per vector iteration (for VF 9) = 9.000000
note:  SVE issue estimate:
note:    load operations = 5
note:    store operations = 4
note:    general operations = 11
note:    predicate operations = 12
note:    reduction latency = 0
note:    estimated min cycles per iteration without predication = 5.500000
note:    estimated min cycles per iteration for predication = 12.000000
note:    estimated min cycles per iteration = 12.000000
note:  Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
note:  Low iteration count, so using pure latency costs
note:  Cost model analysis:
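The extra "Increasing body cost to 1472" line can be reproduced with a simplified model. This is a hypothetical sketch, not the aarch64.cc code: it assumes the vector body cost is raised when the scalar cycles per vector iteration fall below the cycles imposed by SVE predicate operations, and it takes the scaling factor as a parameter because the dump only shows that 1472 equals 46 × 32, not how the factor is derived. The function name adjust_vector_body_cost is illustrative, and the real condition may differ in detail.

```c
#include <assert.h>

/* Hypothetical, simplified model of the "scalar code could issue within
   the limit imposed by predicate operations" check.  If the scalar loop
   needs fewer cycles than the SVE loop spends on predicate operations,
   the vector body cost is scaled up to make vectorization unattractive.
   SCALE is an input here; it is not derived the way GCC derives it.  */
static unsigned
adjust_vector_body_cost (unsigned body_cost,
                         double scalar_cycles_per_vector_iter,
                         double sve_pred_cycles_per_iter,
                         unsigned scale)
{
  assert (scale > 0);
  if (scalar_cycles_per_vector_iter < sve_pred_cycles_per_iter)
    {
      unsigned min_cost = body_cost * scale;
      if (body_cost < min_cost)
        return min_cost;   /* penalize the vector loop */
    }
  return body_cost;
}
```

With the dump's numbers, the uncapped scalar estimate (32 cycles) is above the 12 predicate cycles, so the body cost stays at 46; the capped estimate (9 cycles) is below it, giving 46 × 32 = 1472.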

gcc/ChangeLog:

* config/aarch64/aarch64.cc (adjust_body_cost):
Cap VF for low iteration loops.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/asrdiv_4.c: Update bounds.
* gcc.target/aarch64/sve/cond_asrd_2.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
* gcc.target/aarch64/sve/spill_6.c: Likewise.
* gcc.target/aarch64/sve/sve_iters_low_1.c: New test.
* gcc.target/aarch64/sve/sve_iters_low_2.c: New test.

author	Tamar Christina <tamar.christina@arm.com>
	Sun, 22 Sep 2024 12:34:10 +0000 (13:34 +0100)
committer	Tamar Christina <tamar.christina@arm.com>
	Sun, 22 Sep 2024 12:34:10 +0000 (13:34 +0100)
commit	e84e5d034124c6733d3b36d8623c56090d4d17f7
tree	68b0498b773f821d9c9a630bab3d4717dc9e4715
parent	673822455ed8b68559e13aef149163294490c69e

gcc/config/aarch64/aarch64.cc
gcc/testsuite/gcc.target/aarch64/sve/asrdiv_4.c
gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_2.c
gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_6.c
gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_7.c
gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_8.c
gcc/testsuite/gcc.target/aarch64/sve/miniloop_1.c
gcc/testsuite/gcc.target/aarch64/sve/spill_6.c
gcc/testsuite/gcc.target/aarch64/sve/sve_iters_low_1.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/sve_iters_low_2.c	[new file with mode: 0644]