Avoid adding loop-carried ops to long chains
Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-carried ops in a
separate chain.
E.g.
x_1 = phi(x_0, x_2)
y_1 = phi(y_0, y_2)
a + b + c + d + e + x1 + y1
SSA1 = a + b;
SSA2 = c + d;
SSA3 = SSA1 + e;
SSA4 = SSA3 + SSA2;
SSA5 = x1 + y1;
SSA6 = SSA4 + SSA5;
With the patch applied, these test cases improved by 32%~100%.
S242:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}
Case 1:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
Case 2:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
The value is the execution time
A: original version
B: with FMA patch g:
e5405f065bace0685cb3b8878d1dfc7a6e7ef409(base on A)
C: with current patch(base on B)
A B C B/A C/A
s242 2.859 5.152 2.859 1.
802028681 1
case 1 5.489 5.488 3.511 0.999818 0.64
case 2 7.216 7.499 4.885 1.039218 0.68
gcc/ChangeLog:
PR tree-optimization/110148
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle loop-carried
ops in this function.