This is the first special case of clearing bits from Shreya's work. We can clear an
arbitrary number of high bits by shifting left by the number of bits to clear,
then logically shifting right to put everything in place. Similarly we can
clear an arbitrary number of low bits with a right logical shift followed by a
left shift. Naturally this only applies when the constant synthesis budget is
2+ insns.
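As a rough illustration of the identity being exploited (not the code in synthesize_and itself), the two shift pairs can be written in plain C. The helper names clear_high_bits and clear_low_bits are hypothetical, and the sketch assumes a 64-bit value and a shift count strictly between 0 and 64:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the top N bits of X.
   Shift left by N to push the unwanted bits out, then shift right
   (logical, since X is unsigned) to put the rest back in place.  */
static inline uint64_t
clear_high_bits (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);	/* Shifting by 64 or more is undefined in C.  */
  return (x << n) >> n;
}

/* Hypothetical helper: clear the bottom N bits of X.
   Logical shift right by N, then shift left to restore position.  */
static inline uint64_t
clear_low_bits (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);
  return (x >> n) << n;
}
```

Each helper gives the same result as an AND with the corresponding mask, (UINT64_MAX >> n) for the high-bit case and (UINT64_MAX << n) for the low-bit case, without needing the mask in a register.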
Even with mvconst_internal still enabled this does consistently show various
small code generation improvements.
I have seen a notable regression. The two-shift form to wipe out high bits
isn't handled well by ext-dce. Essentially it looks like we don't recognize
the sequence as wiping upper bits; instead it makes bits live, and as a result
we're unable to remove a prior zero extension. I've opened a bug for this
issue.
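To make the shape of the problem concrete, here is a small C sketch. It is an assumption about what such a case looks like at the source level, not a reduced testcase from the bug, and the function names are hypothetical. Clearing the upper 32 bits with a shift pair produces the same value as a zero extension of the low 32 bits, so an earlier zero extension feeding it is redundant:

```c
#include <stdint.h>

/* Hypothetical: clear the upper 32 bits with the two-shift form.  */
static inline uint64_t
mask_low32_via_shifts (uint64_t x)
{
  return (x << 32) >> 32;
}

/* Hypothetical: a prior zero extension followed by the shift pair.
   The cast contributes nothing, because the shift pair discards the
   upper 32 bits regardless of what they held.  */
static inline uint64_t
zext_then_mask (uint64_t x)
{
  uint64_t y = (uint32_t) x;	/* Prior zero extension.  */
  return mask_low32_via_shifts (y);
}
```

If the shift pair is treated as making the upper bits of its input live, the zero extension looks necessary even though the two functions always return the same value.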
The other case I've seen is CSE related. If we had a number of masking
operations with the same mask, we might have previously CSE'd the constant. In
that scenario each instance of masking would be a single AND using the CSE'd
register holding the constant, whereas with this patch it'll be a pair of
shifts. But on a good uarch design the pair of shifts would be fused into a
single op. Given this is relatively rare and on the margins from a performance
standpoint I'm not going to worry about it.
This has spun in my tester for riscv32-elf and riscv64-elf. Bootstrap and
regression testing is in flight and due in an hour or so. Waiting on the
upstream pre-commit tester and the bootstrap test before moving forward.
gcc/
* config/riscv/riscv.cc (synthesize_and): When profitable, use two
shift combinations to clear high or low bits rather than synthesizing
the constant.