>>> Produces 2 regressions:
>>> |
>>> | regressions.sum:
>>> | Running gcc:gcc.target/aarch64/aarch64.exp ...
>>> | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g1
>>> | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g2
>> I've reproduced this locally. I haven't gotten into the debug cycle
>> yet, but wanted folks to know it's mine AFAICT to avoid duplicating
>> debugging efforts.
>
> I think the fix is just a testcase fix.
> Currently the testcase has:
> ** tbnz w[0-9]+, #?0, .L([0-9]+)
>
> But that should just can be:
> ** tbnz [wx][0-9]+, #?0, .L([0-9]+)
>
> to match both x0 and w0 there. But are valid in this case with bit 0.
> For both g1 and g2.
Yea, so I wanted to verify this was behaving per expectation. It is. We have
this in g1:
> (insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
> (zero_extend:SI (reg:QI 0 x0 [ x ]))) "j.c":16:1 149 {*zero_extendqisi2_aarch64}
> (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
> (nil)))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (set (reg:SI 102)
> (and:SI (reg/v:SI 101 [ x ])
> (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
> (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
> (nil)))
We can trivially see that the zero extension is pointless because the value is
masked with 0x1 immediately thereafter. So ext-dce replaces the zero_extend
with a subreg which then simplifies the sequence into:
> (insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
> (reg:SI 0 x0 [ x ])) "j.c":16:1 104 {*movsi_aarch64}
> (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
> (nil)))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (set (reg:SI 102)
> (and:SI (reg/v:SI 101 [ x ])
> (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
> (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
> (nil)))
Combine then turns it into:
> (insn 28 4 2 2 (set (reg:SI 103 [ x ])
> (reg:SI 0 x0 [ x ])) "j.c":16:1 -1
> (expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
> (nil)))
> (note 2 28 3 2 NOTE_INSN_DELETED)
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (note 6 3 7 2 NOTE_INSN_DELETED)
> (note 7 6 8 2 NOTE_INSN_DELETED)
> (jump_insn 8 7 9 2 (parallel [
> (set (pc)
> (if_then_else (eq (zero_extract:DI (subreg:DI (reg:SI 103 [ x ]) 0)
> (const_int 1 [0x1])
> (const_int 0 [0]))
> (const_int 0 [0]))
> (label_ref:DI 14)
> (pc)))
> (clobber (reg:CC 66 cc))
> ]) "j.c":17:6 discrim 1 76 {aarch64_tbzeqdidi}
> (expr_list:REG_UNUSED (reg:CC 66 cc)
> (expr_list:REG_DEAD (reg:SI 103 [ x ])
> (int_list:REG_BR_PROB
966367641 (nil))))
Without the recent ext-dce improvement we get this out of combine:
> (jump_insn 8 7 9 2 (parallel [
> (set (pc)
> (if_then_else (eq (zero_extract:SI (reg:SI 0 x0 [ x ])
> (const_int 1 [0x1])
> (const_int 0 [0]))
> (const_int 0 [0]))
> (label_ref:DI 14)
> (pc)))
> (clobber (reg:CC 66 cc))
> ]) "k.c":17:6 discrim 1 47 {aarch64_tbeqsisi}
> (expr_list:REG_UNUSED (reg:CC 66 cc)
> (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
> (int_list:REG_BR_PROB
966367644 (nil))))
Note the difference in zero_extract's mode within the bit-test insn. DI vs SI.
Hence the w vs x difference in the resulting assembly code.
gcc/testsuite
* gcc.target/aarch64/tbz_1.c: Update expected output.