Break false dependency chain on Zen5
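Some Zen5 variants have a false dependency on the destination register of
the tzcnt, blsi, blsr and blsmsk instructions. This can be tested with the
following benchmark. bmk.sh builds it twice, first with -DBREAK (an xor
clears b at the start of every iteration, breaking the chain) and then
without, so the first timing in each pair is the one with the dependency
broken.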
jh@shroud:~> cat ee.c
int
main()
{
	int a = 10;
	int b = 0;
	for (int i = 0; i < 1000000000; i++)
	{
#ifdef BREAK
		asm volatile ("xor %0, %0": "=r" (b));
#endif
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
		asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
	}
	return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt
real 0m0.886s
user 0m0.886s
sys 0m0.000s
real 0m0.886s
user 0m0.886s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsi
real 0m0.979s
user 0m0.979s
sys 0m0.000s
real 0m2.418s
user 0m2.418s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsr
real 0m0.986s
user 0m0.986s
sys 0m0.000s
real 0m2.422s
user 0m2.421s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsmsk
real 0m0.973s
user 0m0.973s
sys 0m0.000s
real 0m2.422s
user 0m2.422s
sys 0m0.000s
We already have a tunable that controls tzcnt together with lzcnt and
popcnt. Since, of these three, only tzcnt seems to be affected, I added a new
tunable to control tzcnt only. I also added splitters for blsi/blsr/blsmsk,
implemented analogously to the existing splitter for lzcnt.
The patch is neutral on SPEC. We produce blsi and blsr in some internal
loops, but they usually have the same destination as source. However, it is
good to break the dependency chain to avoid pathological cases, and doing so
is quite cheap overall, so I think we want to enable this for generic tuning.
I will send a follow-up patch for this.
Bootstrapped/regtested on x86_64-linux; will commit it shortly.
gcc/ChangeLog:
* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false
dependency.
(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
(*bmi_blsi_<mode>_falsedep): New pattern.
(*bmi_blsmsk_<mode>): Add splitter for false dependency.
(*bmi_blsmsk_<mode>_falsedep): New pattern.
(*bmi_blsr_<mode>): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.
gcc/testsuite/ChangeLog:
* gcc.target/i386/blsi.c: New test.
* gcc.target/i386/blsmsk.c: New test.
* gcc.target/i386/blsr.c: New test.