middle-end: Fix incorrect codegen with PFA and VLS [PR119351]
author Tamar Christina <tamar.christina@arm.com>
Wed, 16 Apr 2025 12:09:05 +0000 (13:09 +0100)
committer Tamar Christina <tamar.christina@arm.com>
Wed, 16 Apr 2025 12:09:05 +0000 (13:09 +0100)
commit 46ccce1de686c1b437eff43431dc20d20d4687c0
tree 3d6bd3f717931c1669431f7effb17e0cc6ea87d7
parent 473dde525248a694c0f4e62b31a7fc24b238c5b0

The following example:

#define N 512
#define START 2
#define END 505

int x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  for (signed int i = START; i < END; ++i)
    {
      if (x[i] == 0)
        return i;
    }
  return -1;
}

generates incorrect code with fixed-length SVE.  For early break we need to
know which value to start the scalar loop with if we take an early exit.

Historically this meant taking the first element of every induction.  This
relies on an assumption that, even with masked loops, the masks come from a
whilel* instruction and so always have their leading elements active.

As such we reduce using a BIT_FIELD_REF <, 0>.

When PFA (peeling for alignment) was added, this assumption held for non-masked
loops.  However, we assumed that PFA for VLA wouldn't work for now, and disabled
it using the alignment requirement checks.  We also expected VLS to do PFA using
scalar loops.

However, as this PR shows, for VLS the vectorizer can, and in some
circumstances does, choose to peel using masks, by masking the first iteration
of the loop with an additional alignment mask.

When this is done, the first elements of the predicate can be inactive.  In this
example element 1 is inactive based on the calculated misalignment, hence the
-1 value in the first vector IV element.

When we reduce using BIT_FIELD_REF we get the wrong value.

This patch fixes that by creating a new scalar PHI that tracks whether we are
in the first iteration of the loop (with the additional masking) or have
already taken a loop iteration.

The generated sequence:

pre-header:
  bb1:
    i_1 = <number of leading inactive elements>

header:
  bb2:
    i_2 = PHI <i_1(bb1), 0(latch)>
    …

early-exit:
  bb3:
    i_3 = iv_step * i_2 + PHI<vector-iv>

This eliminates the need to do an expensive mask-based reduction.
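
As a rough illustration only, the adjustment can be modelled in plain C.  This
is a scalar simulation, not the GIMPLE the vectorizer emits: VF (8 int lanes),
scalar_ref and vector_model are hypothetical names introduced here, and the
induction step is assumed to be 1.  The use_fix flag switches between taking
lane 0 of the vector IV unadjusted (the old BIT_FIELD_REF behaviour) and adding
iv_step * i_2 as in the sequence above.

```c
#include <assert.h>

#define N 512
#define START 2
#define END 505
#define VF 8   /* Hypothetical vectorization factor: 8 int lanes.  */

/* Scalar reference: index of the first zero in x[START, END), or -1.  */
int
scalar_ref (const int *x)
{
  for (int i = START; i < END; ++i)
    if (x[i] == 0)
      return i;
  return -1;
}

/* Simulated vector loop that peels for alignment by masking.  The first
   vector iteration starts at the aligned index START - skip with the
   leading SKIP lanes inactive.  On an early exit the scalar code restarts
   from the value derived from lane 0 of the vector IV.  */
int
vector_model (const int *x, int use_fix)
{
  const int iv_step = 1;      /* Assumed step of the induction.  */
  int skip = START % VF;      /* i_1: number of leading inactive lanes.  */
  assert (skip < VF);
  int i_2 = skip;             /* i_2 = PHI <i_1 (pre-header), 0 (latch)>.  */

  for (int base = START - skip; base < END; base += VF)
    {
      int hit = 0;
      for (int lane = 0; lane < VF; ++lane)
        {
          int i = base + lane;
          /* A lane is active if it is past the alignment mask and in range.  */
          int active = lane >= i_2 && i < END;
          if (active && x[i] == 0)
            hit = 1;
        }

      if (hit)
        {
          int lane0 = base;   /* Lane 0 of the vector IV.  */
          /* i_3 = iv_step * i_2 + lane0 with the fix, lane0 without.  */
          int restart = use_fix ? iv_step * i_2 + lane0 : lane0;
          /* Scalar code re-runs the loop from RESTART to find the exit.  */
          for (int i = restart; i < END; ++i)
            if (x[i] == 0)
              return i;
          return -1;
        }

      i_2 = 0;                /* Latch edge: later iterations are unmasked.  */
    }
  return -1;
}
```

With an array that holds a zero at index 0 (before START) and another at
index 5, the unadjusted variant restarts the scalar code at index 0 and returns
the out-of-range match, while the adjusted variant restarts at START and agrees
with the scalar reference.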

This fixes gromacs with one OpenMP thread, but there is still an issue with
more than one thread.

gcc/ChangeLog:

PR tree-optimization/119351
* tree-vectorizer.h (LOOP_VINFO_MASK_NITERS_PFA_OFFSET,
LOOP_VINFO_NON_LINEAR_IV): New.
(class _loop_vec_info): Add mask_skip_niters_pfa_offset and
nonlinear_iv.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_analyze_scalar_cycles_1): Record non-linear inductions.
(vectorizable_induction): If early break and PFA using masking create a
new phi which tracks where the scalar code needs to start...
(vectorizable_live_operation): ...and generate the adjustments here.
(vect_use_loop_mask_for_alignment_p): Reject non-linear inductions and
early break needing peeling.

gcc/testsuite/ChangeLog:

PR tree-optimization/119351
* gcc.target/aarch64/sve/peel_ind_10.c: New test.
* gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_5.c: New test.
* gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_6.c: New test.
* gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_7.c: New test.
* gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_8.c: New test.
* gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_9.c: New test.
* gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
14 files changed:
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_5_run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_6_run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_7_run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_8.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_8_run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9_run.c [new file with mode: 0644]
gcc/tree-vect-loop.cc
gcc/tree-vectorizer.h