The following swaps the loop splitting pass and the final value
replacement pass to avoid keeping the IV of the earlier loop
live when not necessary. The existing gcc.target/i386/pr87007-5.c
testcase shows that we otherwise fail to elide an empty loop
later. I don't see any good reason why loop splitting would need
final value replacement, all exit values honor the constraints
we place on loop header PHIs automatically.
* passes.def: Exchange loop splitting and final value
replacement passes.
* gcc.target/i386/pr87007-5.c: Make sure we split the loop
and eliminate both in the end.
form if possible. */
NEXT_PASS (pass_tree_loop_init);
NEXT_PASS (pass_tree_unswitch);
- NEXT_PASS (pass_scev_cprop);
NEXT_PASS (pass_loop_split);
+ NEXT_PASS (pass_scev_cprop);
NEXT_PASS (pass_loop_versioning);
NEXT_PASS (pass_loop_jam);
/* All unswitching, final value replacement and splitting can expose
/* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize" } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */
#include<math.h>
d1 = sqrt (d3);
}
+/* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
+/* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */