resource.cc (mark_target_live_regs): Don't look past target insn, PR115182
The PR115182 regression is that a delay-slot for a conditional branch,
is no longer filled with an insn that has been "sunk" because of
r15-518-g99b1daae18c095, for cris-elf w. -O2 -march=v10.
There are still sufficient "nearby" dependency-less insns that the
delay-slot shouldn't be empty. In particular there's one candidate in
the loop, right after an off-ramp branch, off the loop: a move from
$r9 to $r3.
beq .L2
nop
move.d $r9,$r3
But, the resource.cc data-flow-analysis incorrectly says it collides
with registers "live" at that .L2 off-ramp. The off-ramp insns
(inlined from simple_rand) look like this (left-to-right direction):
.L2:
move.d $r12,[_seed.0]
move.d $r13,[_seed.0+4]
ret
movem [$sp+],$r8
So, a store of a long long to _seed, a return instruction and a
restoring multi-register-load of r0..r8 (all callee-saved registers)
in the delay-slot of the return insn. The return-value is kept in
$r10,$r11 so in total $r10..$r13 live plus the stack-pointer and
return-address registers. But, mark_target_live_regs says that
$r0..$r8 are also live because it *includes the registers live for the
return instruction*! While they "come alive" after the movem, they
certainly aren't live at the "off-ramp" .L2 label.
The problem is in mark_target_live_regs: it consults a hash-table
indexed by insn uid, where it tracks the currently live registers with
a "generation" count to handle when it moves around insn, filling
delay-slots. As a fall-back, it starts with registers live at the
start of each basic block, calculated by the comparatively modern df
machinery (except that it can fail finding out which basic block an
insn belongs to, at which times it includes all registers film at 11),
and tracks the semantics of insns up to each insn.
You'd think that's all that should be done, but then for some reason
it *also* looks at insns *after the target insn* up to a few branches,
and includes that in the set of live registers! This is the code in
mark_target_live_regs that starts with the call to
find_dead_or_set_registers. I couldn't make sense of it, so I looked
at its history, and I think I found the cause; it's a thinko or
possibly two thinkos. The original implementation, gcc-git-described
as
r0-97-g9c7e297806a27f, later moved from reorg.c to resource.c in
r0-20470-gca545bb569b756.
I believe the "extra" lookup was intended to counter flaws in the
reorg.c/resource.c register liveness analysis; to inspect insns along
the execution paths to exclude registers that, when looking at
subsequent insns, weren't live. That guess is backed by a sentence in
the updated (i.e. deleted) part of the function head comment for
mark_target_live_regs: "Next, scan forward from TARGET looking for
things set or clobbered before they are used. These are not live."
To me that sounds like flawed register-liveness data.
An epilogue expanded as RTX (i.e. not just assembly code emitted as
text) is introduced in basepoints/
gcc-0-1334-gbdac5f5848fb, so before
that time, nobody would notice that saved registers were included as
live registers in delay-slots in "next-to-last" basic blocks.
Then in
r0-24783-g96e9c98d59cc40, the intersection ("and") was changed
to a union ("or"), i.e. it added to the set of live registers instead
of thinning it out. In the gcc-patches archives, I see the patch
submission doesn't offer a C test-case and only has RTX snippets
(apparently for SPARC). The message does admit that the change goes
"against what the comments in the code say":
https://gcc.gnu.org/pipermail/gcc-patches/1999-November/021836.html
It looks like this was related to a bug with register liveness info
messed up when moving a "delay-slotted" insn from one slot to another.
But, I can't help but thinking it's just papering over a register
liveness bug elsewhere.
I think, with a reliable "DF_LR_IN", the whole thing *after* tracking
from start-of-bb up to the target insn should be removed; thus.
This patch also removes the now-unused find_dead_or_set_registers
function.
At r15-518, it fixes the issue for CRIS and improves coremark scores
at -O2 -march=v10 a tiny bit (about 0.05%).
PR rtl-optimization/115182
* resource.cc (mark_target_live_regs): Don't look for
unconditional branches after the target to improve on the
register liveness.
(find_dead_or_set_registers): Remove unused function.