I've gone back and forth of these problems multiple times. We have two passes,
ext-dce and combine which eliminate extensions using totally different
mechanisms.
ext-dce looks for cases where the state of upper bits in an object aren't
observable and if they aren't observable, then eliminates extensions which set
those bits.
combine looks for cases where we know the state of the upper bits and can prove
an extension is just setting those bits to their prior value. Combine also
looks for cases where the precise extension isn't really important, just the
knowledge that the upper bits are zero or sign extended from a narrower mode
is needed.
Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. If the
actions of ext-dce (or any other pass for that matter) make
SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then
combine can end up generating incorrect code.
--
When ext-dce eliminates an extension and turns it into a subreg copy (without
any known SUBREG_PROMOTED_VAR state). Since we can no longer guarantee the
destination object has any known extension state, we scurry around and wipe
SUBREG_PROMOTED_VAR state for the destination object.
That's fine and dandy, but ultimately insufficient. Consider if the
destination of the optimized extension was used as a source in a simple copy
insn. Furthermore assume that the destination of that copy is used within a
SUBREG expression with SUBREG_PROMOTED_VAR set. ext-dce's actions have
clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy, albeit
indirectly.
This patch addresses this problem by taking the set of pseudos directly
impacted by ext-dce's actions and expands that set by building a transitive
closure for pseudos connected via copies. We then scurry around finding
SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of
pseudos. Voila, everything just works.
--
The other approach here would be to further expand the liveness sets inside
ext-dce. That's a simpler path forward, but ultimately regresses the quality
of codes we do care about.
One good piece of news is that with the transitive closure bits in place, we
can eliminate a bit of the live set expansion we had in place for
SUBREG_PROMOTED_VAR objects.
--
So let's take one case of the 5 that have been reported.
In ext-dce we have this insn:
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
> (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 162)
> (nil)))
There are reachable uses of (reg 134):
> (insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186)
> (symbol_ref:DI ("al") [flags 0x86] <var_decl 0x7ffff73c2da8 al>)) [2 al+0 S2 A16])
> (subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279 {*movhi_internal}
> (expr_list:REG_DEAD (reg/f:DI 186)
> (nil)))Obviously safe if we were to remove the extension.
> (insn 52 49 53 6 (set (reg:DI 176)
> (and:DI (reg:DI 134 [ al_lsm.9 ])
> (const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3}
> (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
> (nil)))
> (insn 53 52 56 6 (set (reg:SI 177 [ _8 ])
> (zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551 {*zero_extendhisi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 176)
> (nil))) Safe to remove the extension as we only read the low 16 bits from the destination register (reg 176) in insn 53.
> (insn 27 26 29 3 (set (reg:DI 162)
> (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ]) 0)
> (const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended}
> (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
> (nil)))
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
> (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 162)
> (nil)))
Again, not as obvious as the first case, but we only read the low 16 bits from
(reg 162) in insn 29. So those upper bits in (reg 134) don't matter.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
> (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
> (nil))
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
> (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
> (nil)))Also safe in isolation. But worth noting that if we remove the extension at insn 29, then the promoted status on (reg:DI 144) in insn 30 is no longer valid.
Setting aside the promoted state of (reg:DI 144) at insn 30 for a minute, let's
look into combine.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
> (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
> (nil)) [ ... ]
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
> (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
> (nil)))
> (jump_insn 31 30 32 3 (set (pc)
> (if_then_else (eq (reg:DI 135 [ al.2_3 ])
> (const_int 0 [0]))
> (label_ref:DI 41)
> (pc))) "j.c":4:55 371 {*branchdi}
> (int_list:REG_BR_PROB
536870913 (nil))
> -> 41)
Combine will do its thing on insns 30/31. Essentially the sign extension is
not necessary in this context, assuming the promoted subreg status in insn 30
-- the equality test doesn't really care about the kind of extension, just
knowing the value is extended is enough to safely elide the extension.
And now we've come to the crux the problem. That promotion state needs to be
adjusted. The new ext-dce code will see that copy at insn 26 and add (reg 144)
to the set of registers that need promotion state wiped. And everything is
happy after that.
The other cases are similar in nature.
--
This has been bootstrapped and regression tested on x86_64 and aarch64.
Variants have bootstrapped & regression tested on several other platforms and
it's survived testing on the crosses as well.
Pushing to the trunk...
PR rtl-optimization/120242
PR rtl-optimization/120627
PR rtl-optimization/120736
PR rtl-optimization/120813
gcc/
* ext-dce.cc (ext_dce_process_uses): Remove some cases where we
unnecessarily expanded live sets for promoted subregs.
(expand_changed_pseudos): New function.
(reset_subreg_promoted_p): Use it.
gcc/testsuite/
* gcc.dg/torture/pr120242.c: New test.
* gcc.dg/torture/pr120627.c: Likewise.
* gcc.dg/torture/pr120736.c: Likewise.
* gcc.dg/torture/pr120813.c: Likewise.
y = XEXP (y, 0);
else if (SUBREG_P (y) && SUBREG_BYTE (y).is_constant ())
{
- /* We really want to know the outer code here, ie do we
- have (ANY_EXTEND (SUBREG ...)) as we need to know if
- the extension matches the SUBREG_PROMOTED state. In
- that case optimizers can turn the extension into a
- simple copy. Which means that bits outside the
- SUBREG's mode are actually live.
-
- We don't want to mark those bits live unnecessarily
- as that inhibits extension elimination in important
- cases such as those in Coremark. So we need that
- outer code.
-
- But if !TRULY_NOOP_TRUNCATION_MODES_P, the mode
+ /* If !TRULY_NOOP_TRUNCATION_MODES_P, the mode
change performed by Y would normally need to be a
TRUNCATE rather than a SUBREG. It is probably the
guarantee provided by SUBREG_PROMOTED_VAR_P that
regardless of the outer code. See PR 120050. */
if (!REG_P (SUBREG_REG (y))
|| (SUBREG_PROMOTED_VAR_P (y)
- && ((GET_CODE (SET_SRC (x)) == SIGN_EXTEND
- && SUBREG_PROMOTED_SIGNED_P (y))
- || (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
- && SUBREG_PROMOTED_UNSIGNED_P (y))
- || !TRULY_NOOP_TRUNCATION_MODES_P (
- GET_MODE (y),
- GET_MODE (SUBREG_REG (y))))))
+ && (!TRULY_NOOP_TRUNCATION_MODES_P (
+ GET_MODE (y),
+ GET_MODE (SUBREG_REG (y))))))
break;
bit = subreg_lsb (y).to_constant ();
}
}
+/* Walk the IL and build the transitive closure of all the REGs tied
+ together by copies where either the source or destination is
+ marked in CHANGED_PSEUDOS. */
+
+static void
+expand_changed_pseudos (void)
+{
+ /* Build a vector of registers related by a copy. This is meant to
+ speed up the next step by avoiding full IL walks. */
+ struct copy_pair { rtx first; rtx second; };
+ auto_vec<copy_pair> pairs;
+ for (rtx_insn *insn = get_insns(); insn; insn = NEXT_INSN (insn))
+ {
+ if (!NONDEBUG_INSN_P (insn))
+ continue;
+
+ rtx pat = PATTERN (insn);
+
+ /* Simple copies to a REG from another REG or SUBREG of a REG. */
+ if (GET_CODE (pat) == SET
+ && REG_P (SET_DEST (pat))
+ && (REG_P (SET_SRC (pat))
+ || (SUBREG_P (SET_SRC (pat))
+ && REG_P (SUBREG_REG (SET_SRC (pat))))))
+ {
+ rtx src = (REG_P (SET_SRC (pat))
+ ? SET_SRC (pat)
+ : SUBREG_REG (SET_SRC (pat)));
+ pairs.safe_push ({ SET_DEST (pat), src });
+ }
+
+ /* Simple copies to a REG from another REG or SUBREG of a REG
+ held inside a PARALLEL. */
+ if (GET_CODE (pat) == PARALLEL)
+ {
+ for (int i = XVECLEN (pat, 0) - 1; i >= 0; i--)
+ {
+ rtx elem = XVECEXP (pat, 0, i);
+
+ if (GET_CODE (elem) == SET
+ && REG_P (SET_DEST (elem))
+ && (REG_P (SET_SRC (elem))
+ || (SUBREG_P (SET_SRC (elem))
+ && REG_P (SUBREG_REG (SET_SRC (elem))))))
+ {
+ rtx src = (REG_P (SET_SRC (elem))
+ ? SET_SRC (elem)
+ : SUBREG_REG (SET_SRC (elem)));
+ pairs.safe_push ({ SET_DEST (elem), src });
+ }
+ }
+ continue;
+ }
+ }
+
+ /* Now we have a vector with copy pairs. Iterate over that list
+ updating CHANGED_PSEUDOS as we go. Eliminate copies from the
+ list as we go as they don't need further processing. */
+ bool changed = true;
+ while (changed)
+ {
+ changed = false;
+ unsigned int i;
+ copy_pair *p;
+ FOR_EACH_VEC_ELT (pairs, i, p)
+ {
+ if (bitmap_bit_p (changed_pseudos, REGNO (p->second))
+ && bitmap_set_bit (changed_pseudos, REGNO (p->first)))
+ {
+ pairs.unordered_remove (i);
+ changed = true;
+ }
+ }
+ }
+}
/* We optimize away sign/zero extensions in this pass and replace
them with SUBREGs indicating certain bits are don't cares.
static void
reset_subreg_promoted_p (void)
{
+ /* This pass eliminates zero/sign extensions on pseudo regs found
+ in CHANGED_PSEUDOS. Elimination of those extensions changes if
+ the pseudos are known to hold values extended to wider modes
+ via SUBREG_PROMOTED_VAR. So we wipe the SUBREG_PROMOTED_VAR
+ state on all affected pseudos.
+
+ But that is insufficient. We might have a copy from one REG
+ to another (possibly with the source register wrapped with a
+ SUBREG). We need to wipe SUBREG_PROMOTED_VAR on the transitive
+ closure of the original CHANGED_PSEUDOS and registers they're
+ connected to via copies. So expand the set. */
+ expand_changed_pseudos ();
+
/* If we removed an extension, that changed the promoted state
of the destination of that extension. Thus we need to go
find any SUBREGs that reference that pseudo and adjust their
--- /dev/null
+/* { dg-do run } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+char f1(char a, char b) {
+ return b == 0 ? a : b;
+}
+int f2(int a, int b) {
+ return b ? a : 0;
+}
+struct l {
+ unsigned m;
+ int n;
+};
+struct l ae;
+char af = -2;
+unsigned ah = 4;
+int aj = 8;
+int *test = &aj;
+int main() {
+ao:
+ if (f2(f1(4, af++), *test) <= 0) {
+ for (; ae.n; ae.n++)
+ ;
+ if (ah)
+ goto ao;
+ }
+ if (af != 1)
+ __builtin_abort ();
+ __builtin_exit (0);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+unsigned char sub(unsigned char t, unsigned char u) { return t - u; }
+unsigned char mul(unsigned char t, unsigned char u) { return t * u; }
+int x(int aa, int ab) {
+ return ab >= 32 || aa > 18446744073709551615UL >> ab ? aa : aa << ab;
+}
+int ag;
+int ah = 249;
+char ap;
+static short ar[5][9];
+int *as = &ag;
+void bf(char cf) {
+ for (; ap <= 8; ap++) {
+ (ar[1][7] = mul(x(-1L, sub(cf, 247) / cf), ag <= 0)) || ar[1][4]++;
+ *as = ag;
+ }
+ return;
+}
+int main() {
+ bf(ah);
+ if (ar[1][7] != 255)
+ __builtin_abort ();
+ __builtin_exit (0);
+}
+
--- /dev/null
+/* { dg-do run } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+unsigned char aa (unsigned char ab, int o) { return ab > o ? ab : 0; }
+int p;
+int s;
+static unsigned char q = 255;
+int r;
+int *v = &s;
+int main() {
+ p = v != 0;
+ for (; r < 8; ++r) {
+ if (s)
+ break;
+ s = aa(p * q++, 6) <= 0;
+ }
+ if (q != 1)
+ __builtin_abort ();
+ __builtin_exit (0);
+}
+
--- /dev/null
+/* { dg-do run } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+short s (short t, short u) { return u == 0 ? 0 : t / u; }
+int x[6];
+int y;
+unsigned ak = 1;
+unsigned short al = 65527;
+unsigned *am = &ak;
+int main() {
+ for (int i = 0; i < 6; i++) {
+ x[i] = i;
+ }
+ for (;;) {
+ unsigned long ar = 2080554998UL;
+ char as = 4;
+ if (s(34, al++) < ar)
+ if (*am)
+ break;
+ }
+ y = x[al & 5];
+ if ((y ^ 5UL) != 4)
+ __builtin_abort ();
+ __builtin_exit (0);
+}
+
+