middle-end: Apply loop->unroll directly in vectorizer
author Tamar Christina <tamar.christina@arm.com>
Tue, 24 Jun 2025 06:14:27 +0000 (07:14 +0100)
committer Tamar Christina <tamar.christina@arm.com>
Tue, 24 Jun 2025 06:14:27 +0000 (07:14 +0100)
commit 7f87bfa4a7302ce663db51fb073a40045052cc11
tree 4c894cc3f19ae5aa16cc7b4a52fda708fadaf508
parent 309dbcea2cabb31bde1a65cdfd30bb7f87b170a2

Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

Today this loop is vectorized and then unrolled 3x by the RTL unroller because
of the pragma.  This is unfortunate because the pragma was intended for the
scalar loop, but we end up with an unrolled vector loop and a longer path to
reach an entry whose VF requirement is low enough to be taken.

This patch instead seeds the suggested_unroll_factor with the value the user
requested and uses it to maintain the total VF that the user wanted the scalar
loop to have.
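
As a rough illustration of "maintaining the total VF", the relationship between
the pragma value and the vector unroll factor can be sketched as below.  The
helper name and its rounding behaviour are assumptions made for this example
and are not taken from the patch; only the two data points (unroll 4 and
unroll 8 with V4SI) come from the examples in this message.

```c
/* Hypothetical helper, not GCC's actual code: given the count from
   "#pragma GCC unroll N" and the vectorization factor (VF) of the chosen
   vector mode, return how many copies of the vector loop body are needed
   so that one loop iteration covers about N scalar iterations.  */
static unsigned
vector_unroll_for_pragma (unsigned user_unroll, unsigned vf)
{
  /* Assumption: a request no larger than the VF is already satisfied by a
     single vector iteration, so no extra unrolling is applied.  */
  if (vf == 0 || user_unroll <= vf)
    return 1;

  /* Assumption: round down when the request is not a multiple of the VF.  */
  return user_unroll / vf;
}
```

With V4SI (VF 4), a request of 4 gives a factor of 1 and a request of 8 gives a
factor of 2, which matches the one and two vector adds in the loops shown in
this message.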

In effect it applies the unrolling inside the vector loop itself.  This has
benefits for things like reductions, as it allows us to split the accumulator,
which makes the unrolled loop more efficient.  For early-break it allows the
cbranch to be shared between the unrolled elements, giving more effective
unrolling because the cbranch, which can be expensive, does not need to be
repeated.
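
To illustrate what splitting the accumulator means, here is a scalar sketch of
a sum reduction unrolled by two.  It is only an analogy written for this
explanation (the function name is made up); the vectorizer does the equivalent
with vector accumulators.

```c
/* Scalar illustration only: a sum reduction unrolled by two, with one
   accumulator per unrolled copy so the two add chains are independent.  */
static int
sum_split (const int *a, int n)
{
  int acc0 = 0, acc1 = 0;
  int i = 0;

  for (; i + 1 < n; i += 2)
    {
      acc0 += a[i];
      acc1 += a[i + 1];
    }

  /* Leftover element when n is odd.  */
  if (i < n)
    acc0 += a[i];

  /* The accumulators are combined once, after the loop.  */
  return acc0 + acc1;
}
```

With a single accumulator every add would depend on the previous one; with two,
the adds in each unrolled copy can proceed independently.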

The target can then choose to create multiple epilogues to deal with the "rest".

The example above now generates:

.L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. pragma unroll 8 generates:

.L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4
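
The source for the unroll 8 variant is not shown above.  Presumably it is the
same loop with only the pragma value changed, along the lines of the sketch
below; the name f2 is chosen here for illustration and does not come from the
patch.

```c
/* Same loop as f1, but requesting 8 scalar iterations per loop iteration.
   The function name f2 is an assumption made for this example.  */
void
f2 (int *restrict a, int n)
{
#pragma GCC unroll 8
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}
```

With V4SI this corresponds to two vector operations per iteration, as in the
ldp/stp loop above.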

gcc/ChangeLog:

* doc/extend.texi: Document pragma unroll interaction with vectorizer.
* tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_analyze_loop_1): Set
suggested_unroll_factor and retry.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/unroll-vect.c: New test.
gcc/doc/extend.texi
gcc/testsuite/gcc.target/aarch64/unroll-vect.c [new file with mode: 0644]
gcc/tree-vect-loop.cc
gcc/tree-vectorizer.h