The following patch to the x86_64 backend improves the code generated
for a decrement followed by a conditional move. The primary change is
to recognize that after subtracting one, checking the result is -1 (or
equivalently that the original value was zero) can be implemented using
the borrow/carry flag instead of requiring an explicit test instruction.
This is achieved by a new define_insn_and_split that lets combine
recognize the composite pattern, which is then split into a *subsi_3
and a *movsicc_noc.
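
As a rough illustration of why the borrow/carry flag is sufficient, the
sketch below models the idea in plain C. It is not the pattern from
i386.md: the names foo_ref, foo_borrow and check_dec_cmov are
hypothetical, and the "borrow" variable only imitates the carry flag
that a subtract of one would set.

```c
#include <assert.h>

/* Reference form of the source idiom: 16 if x is zero, else x - 1. */
int foo_ref(int x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

/* Hypothetical model of a subtract of one followed by a carry-based cmov.
   Subtracting 1 from an unsigned value borrows only when that value is 0,
   so the borrow doubles as the "original value was zero" test.  */
int foo_borrow(int x)
{
  unsigned int ux = (unsigned int) x;
  unsigned int dec = ux - 1u;   /* wraps to UINT_MAX (i.e. -1) when ux == 0 */
  int borrow = ux < 1u;         /* imitates the carry flag after the subtract */
  /* The cast back to int is implementation-defined for large values;
     GCC defines it as modular wrap-around.  */
  return borrow ? 16 : (int) dec;
}

/* Compare both forms over a small range (INT_MIN is avoided because
   x-- would overflow there in foo_ref).  */
void check_dec_cmov(void)
{
  for (int x = -1000; x <= 1000; x++)
    assert(foo_ref(x) == foo_borrow(x));
}
```

The point of the model is that no separate comparison of x against zero
is needed once the subtraction has been performed.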

The other change in this patch is a pair of peephole2 optimizations
that eliminate register-to-register moves generated during register
allocation. During reload, the compiler doesn't know that inverting
the condition of a conditional move can sometimes reduce register
pressure, but this is easy to tidy up during the peephole2 pass (where
swapping the order of the insn's operands performs the required
logic inversion).

Both improvements are demonstrated by the test case below:

int foo(int x) {
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

And the value of the peephole2 clean-up can be seen on its own in:

int bar(int x) {
  x--;
  if (x == 0)
    x = 16;
  return x;
}

Before:
bar:    movl    %edi, %eax
        movl    $16, %edx
        subl    $1, %eax
        cmove   %edx, %eax
        ret

After:
bar:    subl    $1, %edi
        movl    $16, %eax
        cmovne  %edi, %eax
        ret

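To make the condition inversion concrete, the two listings for bar can
be transcribed into C one instruction at a time. This is only a sketch:
the function names bar_before and bar_after are hypothetical, and the
local variables merely stand in for the registers named in the comments.

```c
/* Model of the "Before" sequence: copy first, then conditionally
   overwrite the decremented value with 16.  */
int bar_before(int x)
{
  int eax = x;        /* movl  %edi, %eax */
  int edx = 16;       /* movl  $16, %edx  */
  eax -= 1;           /* subl  $1, %eax   */
  if (eax == 0)       /* cmove %edx, %eax */
    eax = edx;
  return eax;
}

/* Model of the "After" sequence: decrement in place, load 16 into the
   result, then conditionally overwrite it using the inverted condition.
   Swapping which value is the default removes the initial copy.  */
int bar_after(int x)
{
  int edi = x - 1;    /* subl   $1, %edi   */
  int eax = 16;       /* movl   $16, %eax  */
  if (edi != 0)       /* cmovne %edi, %eax */
    eax = edi;
  return eax;
}
```

Both models select between the same two values; only the default and
the sense of the condition are exchanged, which is what saves the
register-to-register move.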
These idioms were inspired by the source code of NIST SciMark4's
Random_nextDouble function, where the tweaks above result in
a ~1% improvement in the MonteCarlo benchmark kernel.

2021-07-30  Roger Sayle  <roger@nextmovesoftware.com>
	    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	* config/i386/i386.md (*dec_cmov<mode>): New define_insn_and_split
	to generate a conditional move using the carry flag after sub $1.
	(peephole2): Eliminate a register-to-register move by inverting
	the condition of a conditional move.

gcc/testsuite/ChangeLog
	* gcc.target/i386/dec-cmov-1.c: New test.
	* gcc.target/i386/dec-cmov-2.c: New test.