Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.
A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }
on x86_64-pc-linux-gnu, gcc -O2 currently generates:
foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret
With this patch, we now generate the much improved:
foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
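
For readers less familiar with multi-word arithmetic, the addq/adcq pair
in the listings above can be sketched in portable C as follows.  This is
only an illustration of adding a double-word value by parts; the type
u128_words and the function add_by_words are hypothetical names invented
for this sketch and do not appear in GCC or in the patch.

```c
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit integer (illustration only). */
typedef struct {
    uint64_t lo;  /* least significant 64 bits */
    uint64_t hi;  /* most significant 64 bits */
} u128_words;

/* Add two 128-bit values one word at a time. */
static u128_words add_by_words(u128_words x, u128_words y)
{
    u128_words r;
    uint64_t carry;

    r.lo = x.lo + y.lo;          /* low words; wraps modulo 2^64 (cf. addq) */
    carry = (r.lo < x.lo);       /* 1 if the low-word addition wrapped */
    r.hi = x.hi + y.hi + carry;  /* high words plus carry (cf. adcq) */
    return r;
}
```

Each half of the result depends only on word-sized operands, which is
why keeping the multi-word pseudo in its incoming hard registers avoids
the extra moves.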
2023-05-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/43644
	* lower-subreg.cc (resolve_simple_move): Don't emit a clobber
	immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
	PR target/43644
	* gcc.target/i386/pr43644.c: New test case.
{
unsigned int i;
- if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
- emit_clobber (dest);
-
for (i = 0; i < words; ++i)
{
rtx t = simplify_gen_subreg_concatn (word_mode, dest,
--- /dev/null
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+ return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */