git.ipfire.org Git - thirdparty/gcc.git/commit
tree-optimization/90579 - avoid STLF fail by better optimizing
author Richard Biener <rguenther@suse.de>
Wed, 12 Feb 2025 13:18:06 +0000 (14:18 +0100)
committer Richard Biener <rguenth@gcc.gnu.org>
Fri, 14 Feb 2025 07:28:50 +0000 (08:28 +0100)
commit 27653070db35216d5115cc25672fcc6a51203d26
tree 902b096ac125b500960271a4536e1705f91955f5
parent 8caf67eea7e1b29a4437f07d13c300d9fdb04827

The testcase in question uses a fold-left vectorized reduction in a
reverse-iterating loop.  Optimizing it needed two forwprop
invocations: the first to bypass the permute emitted for the
reverse-iterating loop, the second to decompose the vector load that
only feeds element extracts.  This change moves the first transform
to a match.pd pattern and folds the element extracts when the
vectorizer emits them, so a single forwprop pass can pick up the
vector load decomposition.  That avoids the store-to-load forwarding
(STLF) failure the vector load otherwise causes.
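
The loop shape described above can be sketched in C as follows.  This
is an illustrative reconstruction, not the committed pr90579.c: the
names sum_reversed and r and the trip count of 6 are assumptions.  The
first loop stores scalars into r[]; the second sums r[] back to front
in strict order, which is the kind of reduction the vectorizer would
handle as a fold-left reduction.

```c
/* Hypothetical scratch array; name and size are assumptions.  */
static double r[6];

/* Sketch of the pattern: scalar stores followed by an in-order
   reduction that reads the same array in reverse.  */
double
sum_reversed (const double *a, double x)
{
  double t = 0.0;

  /* Stores into r[]; these are the stores that a later wide vector
     load from r[] would have to be forwarded from.  */
  for (int i = 0; i < 6; i++)
    r[i] = x * a[i];

  /* Reverse-iterating, order-preserving sum.  When vectorized, the
     reversed access needs a permute before the per-element adds.  */
  for (int i = 0; i < 6; i++)
    t += r[5 - i];

  return t;
}
```

With a = {1, 2, 3, 4, 5, 6} and x = 2.0 the function returns 42.0,
independent of how the compiler chooses to vectorize it.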

Moving simplify_bitfield_ref also makes forwprop remove the dead
VEC_PERM_EXPR via the simple-dce it uses - this was also
previously missing.
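
The idea behind folding an element extract through a permute can be
modelled with plain arrays.  The sketch below is a scalar model under
stated assumptions, not the match.pd pattern itself: NELTS,
vec_perm_model, extract_after_perm and extract_folded are hypothetical
names, and the real pattern works on GIMPLE and has conditions this
model ignores.  It shows that reading lane i of a permute result is
the same as reading lane sel[i] of the concatenated inputs, so once
every extract is folded the permute has no remaining uses.

```c
#include <stddef.h>

#define NELTS 4  /* illustrative vector length, an assumption */

/* Model of a two-input permute: lane i of the result is lane sel[i]
   (taken modulo 2 * NELTS) of the concatenation {v0, v1}.  */
static void
vec_perm_model (const double *v0, const double *v1,
                const unsigned *sel, double *out)
{
  for (size_t i = 0; i < NELTS; i++)
    {
      unsigned s = sel[i] % (2 * NELTS);
      out[i] = s < NELTS ? v0[s] : v1[s - NELTS];
    }
}

/* Unfolded form: materialize the permute, then extract lane i.  */
double
extract_after_perm (const double *v0, const double *v1,
                    const unsigned *sel, size_t i)
{
  double tmp[NELTS];
  vec_perm_model (v0, v1, sel, tmp);
  return tmp[i];
}

/* Folded form: read the selected source lane directly.  The permute
   result is never computed, so the permute itself becomes dead.  */
double
extract_folded (const double *v0, const double *v1,
                const unsigned *sel, size_t i)
{
  unsigned s = sel[i] % (2 * NELTS);
  return s < NELTS ? v0[s] : v1[s - NELTS];
}
```

For a reversing selector {3, 2, 1, 0} both forms return v0[3] for
lane 0, which is the situation a reverse-iterating loop produces.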

PR tree-optimization/90579
* tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
match.pd.
(pass_forwprop::execute): Adjust.
* match.pd (bit_field_ref (vec_perm ...)): New pattern
modeled after simplify_bitfield_ref.
* tree-vect-loop.cc (vect_expand_fold_left): Fold the
element extract stmt, combining it with the vector def.

* gcc.target/i386/pr90579.c: New testcase.
gcc/match.pd
gcc/testsuite/gcc.target/i386/pr90579.c [new file with mode: 0644]
gcc/tree-ssa-forwprop.cc
gcc/tree-vect-loop.cc