arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089]

For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry. This saves
two bytes per pop, since the POP has a 16-bit encoding while the
post-indexed LDR needs a 32-bit one. This is a relatively niche case,
as it's rare to push a single low register onto the stack, but it is
still worth getting right.
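
As a rough illustration of the size difference, here is a minimal
sketch (not code from this patch). The helper names are hypothetical,
and it assumes that the 16-bit Thumb POP encoding accepts only r0-r7
and PC in its register list, with any other single register needing a
32-bit instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of arm.cc: true if REGNO can appear in
   the register list of the 16-bit Thumb POP encoding (r0-r7 or PC).  */
static bool
fits_16bit_pop (int regno)
{
  assert (regno >= 0 && regno <= 15);
  return regno <= 7 || regno == 15;
}

/* Hypothetical helper: size in bytes of the shortest Thumb-2
   instruction that pops the single register REGNO.  A 16-bit POP is
   used when the register allows it, otherwise a 32-bit encoding.  */
static int
single_pop_size (int regno)
{
  return fits_16bit_pop (regno) ? 2 : 4;
}
```

Under those assumptions single_pop_size (3) is 2, whereas
single_pop_size (8) is 4.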
Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.

Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.

Secondly, we iterate over just the non-zero bits in the reg mask
rather than visiting every bit and then checking whether there's work
to do for that bit.
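
The set-bit iteration can be sketched as follows. This is a
standalone illustration of the technique, not the code in
arm_emit_multi_reg_pop; the helper names lowest_set_bit and visit_regs
are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: index of the lowest set bit of a non-zero
   MASK.  */
static int
lowest_set_bit (unsigned long mask)
{
  int i = 0;

  assert (mask != 0);
  while ((mask & 1UL) == 0)
    {
      mask >>= 1;
      i++;
    }
  return i;
}

/* Store in OUT the number of each register present in MASK, lowest
   first, and return how many were stored.  The loop body runs once
   per set bit rather than once per bit position.  */
static int
visit_regs (unsigned long mask, int *out)
{
  int n = 0;

  while (mask != 0)
    {
      out[n++] = lowest_set_bit (mask);
      mask &= mask - 1;	/* Clear the lowest set bit.  */
    }
  return n;
}
```

For a mask with bits 4, 7 and 15 set, visit_regs stores 4, 7 and 15
and returns 3.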
Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note. The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.

gcc:
	PR target/118089
	* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
	Don't emit LDR on thumb2 when POP can be used for smaller code.
	Don't add a CFA adjust note when SP is popped off the stack.

gcc/testsuite:
	PR target/118089
	* gcc.target/arm/thumb2-pop-loreg.c: New test.