Tests have shown that on modern CPUs it's beneficial to spend a bit less
time in cpu_relax(). Until now the wait loop called it until 60
iterations remained, then switched to just barriers for the rest.
Raising that threshold to 90 remaining iterations improved the average
and max time to grab a write lock by a few percent (e.g. 10% at 1us,
20% at 256ns or lower). Higher values progressively lose that gain, so
let's stick to this one. This was measured on an EPYC 74F3, like the
earlier measurements that led to the initial value, and the best value
may depend on the mask applied to the loop counter.

This is plock commit 74ca0a7307fa6aec3139f27d3b7e534e1bdb748e.
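
For illustration only, the two-phase wait described above can be sketched
as a standalone function. This is not plock's code: relax_hint(),
compiler_barrier(), wait_loops() and struct wait_stats are hypothetical
names, the asm statements assume GCC or Clang, and the counters exist only
to make the split between the two phases visible.

```c
#include <assert.h>

/* Hypothetical stand-ins for a CPU relax hint and a compiler barrier.
 * Assumes GCC/Clang inline asm; these are not plock's actual macros.
 */
#if defined(__x86_64__) || defined(__i386__)
#define relax_hint() __asm__ volatile("pause" ::: "memory")
#else
#define relax_hint() __asm__ volatile("" ::: "memory")
#endif
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Counts how many iterations each phase executed (illustration only). */
struct wait_stats {
	unsigned relaxed;   /* iterations spent on the relax hint */
	unsigned barriers;  /* iterations spent on a plain barrier */
};

/* Burn <loops> iterations: use the relax hint while at least <threshold>
 * iterations remain, then finish with plain compiler barriers.
 * <threshold> must be at least 1, otherwise the first loop never ends.
 */
static struct wait_stats wait_loops(unsigned loops, unsigned threshold)
{
	struct wait_stats st = { 0, 0 };

	assert(threshold >= 1);

	for (; loops >= threshold; loops--) {
		relax_hint();
		st.relaxed++;
	}
	for (; loops >= 1; loops--) {
		compiler_barrier();
		st.barriers++;
	}
	return st;
}
```

With a 200-iteration wait, a threshold of 60 yields 141 relaxed iterations
and 59 barrier-only ones, while a threshold of 90 yields 111 and 89: the
change moves 30 iterations from the relax phase to the barrier phase.
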
loops -= 32768;
}
#endif
- for (; loops >= 60; loops --)
+ for (; loops >= 90; loops --)
pl_cpu_relax();
for (; loops >= 1; loops--)