i386: Improve reg pressure of double word right shift then truncate.
This patch improves register pressure during reload, inspired by PR 97756.
Normally, a double-word right-shift by a constant produces a double-word
result, the highpart of which is dead when followed by a truncation.
The dead code calculating the high part gets cleaned up post-reload, so
the issue isn't normally visible, except for the increased register
pressure during reload, sometimes leading to odd register assignments.
Providing a post-reload splitter, which clobbers a single wordmode
result register instead of a doubleword result register, helps (a bit).
An example demonstrating this effect is (MASK60 is (1UL << 60) - 1, matching
the movabsq constant in the generated code below):

#define MASK60 ((1UL << 60) - 1)

unsigned long foo (__uint128_t n)
{
  unsigned long a = n & MASK60;
  unsigned long b = (n >> 60);
  b = b & MASK60;
  unsigned long c = (n >> 120);
  return a + b + c;
}
which currently with -O2 generates (13 instructions):
foo:    movabsq $1152921504606846975, %rcx
        xchgq   %rdi, %rsi
        movq    %rsi, %rax
        shrdq   $60, %rdi, %rax
        movq    %rax, %rdx
        movq    %rsi, %rax
        movq    %rdi, %rsi
        andq    %rcx, %rax
        shrq    $56, %rsi
        andq    %rcx, %rdx
        addq    %rsi, %rax
        addq    %rdx, %rax
        ret
With this patch, we generate one fewer mov (12 instructions):
foo:    movabsq $1152921504606846975, %rcx
        xchgq   %rdi, %rsi
        movq    %rdi, %rdx
        movq    %rsi, %rax
        movq    %rdi, %rsi
        shrdq   $60, %rdi, %rdx
        andq    %rcx, %rax
        shrq    $56, %rsi
        addq    %rsi, %rax
        andq    %rcx, %rdx
        addq    %rdx, %rax
        ret
The significant difference is easier to see via diff:
< shrdq $60, %rdi, %rax
< movq %rax, %rdx
---
> shrdq $60, %rdi, %rdx
Admittedly, a single "mov" isn't much of a saving on modern architectures,
but as demonstrated by the PR, people still track the number of them.
2023-11-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
	define_insn_and_split to optimize register usage of doubleword
	right shifts followed by truncation.