x86: Use previous scratch register in LCP stall peepholes
author H.J. Lu <hjl.tools@gmail.com>
Sat, 20 Jun 2026 23:13:38 +0000 (07:13 +0800)
committer H.J. Lu <hjl.tools@gmail.com>
Tue, 23 Jun 2026 07:01:47 +0000 (15:01 +0800)
commit 1f774d902f1ec9ae6a487e00ba49514d3b37057f
tree 06af10eb7801e4c01993bf9b6dc43ad9c5c97a31
parent 6d842965024cad9f6d3588d300c11c2c2e4a2dba

Since LCP stall peepholes are applied after register allocation, each
peephole may be given a different scratch register.  For input:

extern void bar (void);

void
foo (short *dst)
{
  dst[0] = 3;
  asm volatile ("" : : : "memory");
  dst[2] = 3;
  bar ();
  dst[1] = 3;
  asm volatile ("" : : : "memory");
  dst[4] = 3;
}

with LCP stall peepholes, GCC generates:

movl $3, %eax
pushq %rbx
movq %rdi, %rbx
movw %ax, (%rdi)
movl $3, %edx
movw %dx, 4(%rdi)
call bar
movl $3, %ecx
movw %cx, 2(%rbx)
movl $3, %esi
movw %si, 8(%rbx)
popq %rbx

using 4 different scratch registers, whereas without LCP stall peepholes
it generates:

pushq %rbx
movq %rdi, %rbx
movw $3, (%rdi)
movw $3, 4(%rdi)
call bar
movw $3, 2(%rbx)
movw $3, 8(%rbx)
popq %rbx

Add ix86_expand_lcp_stall_peephole to generate LCP stall peepholes with
the previous scratch register:

1. Scan backward for the previous scratch register definition with
the same immediate operand in the same basic block.
2. The previous scratch register is unusable if it is set between the
previous scratch register definition and the current instruction.
3. If a usable previous scratch register is found, ignore the allocated
scratch register and use the previous scratch register.  Otherwise, use
the allocated scratch register.

so that the same scratch register can be reused if possible:

movl $3, %eax
pushq %rbx
movq %rdi, %rbx
movw %ax, (%rdi)
movw %ax, 4(%rdi)
call bar
movl $3, %ecx
movw %cx, 2(%rbx)
movw %cx, 8(%rbx)
popq %rbx
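
The backward scan in steps 1 to 3 can be illustrated with a minimal,
self-contained sketch.  This is not the code in i386-expand.cc: the
instruction model (struct insn, enum kind), the register numbering and
the helper names pick_scratch, insn_sets_reg and call_clobbers are all
hypothetical, and the sketch assumes that a call counts as setting every
register except %rbx.

```c
#include <assert.h>

/* Hypothetical toy model of one basic block; not GCC's RTL.  */
enum reg { EAX, EDX, ECX, ESI, RBX };
enum kind { LOAD_IMM, STORE16, CALL };

struct insn
{
  enum kind kind;
  int reg;   /* Register written (LOAD_IMM) or read (STORE16); -1 for CALL.  */
  long imm;  /* Immediate loaded by LOAD_IMM; unused otherwise.  */
};

/* Assumption for this sketch: a call clobbers everything except RBX.  */
static int
call_clobbers (int reg)
{
  return reg != RBX;
}

/* Return nonzero if instruction I sets (or clobbers) REG.  */
static int
insn_sets_reg (const struct insn *i, int reg)
{
  switch (i->kind)
    {
    case LOAD_IMM:
      return i->reg == reg;
    case CALL:
      return call_clobbers (reg);
    default:
      return 0;
    }
}

/* Choose the scratch register for a 16-bit store of IMM that would be
   emitted at index CUR of basic block BB.  ALLOCATED is the register
   the register allocator handed to the peephole.  */
int
pick_scratch (const struct insn *bb, int cur, long imm, int allocated)
{
  assert (cur >= 0);

  /* Step 1: scan backward for the nearest load of the same immediate.  */
  for (int def = cur - 1; def >= 0; def--)
    {
      if (bb[def].kind != LOAD_IMM || bb[def].imm != imm)
        continue;

      /* Step 2: the register is unusable if anything between its
         definition and the current instruction sets it.  */
      for (int j = def + 1; j < cur; j++)
        if (insn_sets_reg (&bb[j], bb[def].reg))
          return allocated;

      /* Step 3: reuse the previous scratch register.  */
      return bb[def].reg;
    }

  /* No previous definition found: keep the allocated register.  */
  return allocated;
}
```

Replaying the example above in this model, the second store finds the
earlier load into %eax and reuses it, the store after the call falls
back to its allocated %ecx because the call sits between the %eax
definition and the store, and the last store reuses %ecx.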

I backported this patch to GCC 16:

1. When bootstrapping GCC 16 with only C and C++ enabled, this optimization
triggers 54 times.  No regressions.
2. When building glibc 2.44, this optimization triggers 33 times.  No
regressions.
3. When building Linux kernel 7.1.1, this optimization triggers 2099 times.
Kernel boots correctly.

Tested on Linux/x86-64 and Linux/i686.

gcc/

PR target/125893
* config/i386/i386-expand.cc (ix86_expand_lcp_stall_peephole): New.
* config/i386/i386-protos.h (ix86_expand_lcp_stall_peephole):
Likewise.
* config/i386/i386.md (TARGET_LCP_STALL peepholes): Call
ix86_expand_lcp_stall_peephole.

gcc/testsuite/

PR target/125893
* gcc.target/i386/pr125893-1.c: New test.
* gcc.target/i386/pr125893-2.c: Likewise.
* gcc.target/i386/pr125893-3.c: Likewise.
* gcc.target/i386/pr125893-4.c: Likewise.
* gcc.target/i386/pr125893-5.c: Likewise.
* gcc.target/i386/pr125893-6.c: Likewise.
* gcc.target/i386/pr125893-7.c: Likewise.
* gcc.target/i386/pr125893-8.c: Likewise.
* gcc.target/i386/pr125893-9.c: Likewise.
* gcc.target/i386/pr125893-10.c: Likewise.
* gcc.target/i386/pr125893-11.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
14 files changed:
gcc/config/i386/i386-expand.cc
gcc/config/i386/i386-protos.h
gcc/config/i386/i386.md
gcc/testsuite/gcc.target/i386/pr125893-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-10.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-11.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-4.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-8.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr125893-9.c [new file with mode: 0644]