Running a stick-table stress test with -dMglobal on 56 threads shows
extreme contention on the pool's free_list: the list has to be processed
in two phases, and until now the retry path did nothing more than a
cpu_relax() before reloading the pointer.

Let's at least implement exponential back-off there to limit the noise
inflicted on neighboring threads and to reduce the time needed to
successfully acquire the pointer. Contention is still visible, but this
change alone almost doubles the performance, from 1.1 to 2.1M req/s.
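
For readers unfamiliar with the technique, here is a minimal sketch of an
exponential back-off wait loop of the kind used on the retry path. It is
not the pl_wait_new_long() implementation: wait_new_value(),
cpu_relax_hint(), BUSY_SENTINEL and BACKOFF_MAX_MASK are hypothetical names
chosen for illustration, and the cap of 255 spins is an arbitrary assumption.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical marker meaning "another thread is working on the list". */
#define BUSY_SENTINEL    ((uintptr_t)1)
/* Assumed cap on the back-off mask; the value is arbitrary. */
#define BACKOFF_MAX_MASK 255u

/* Hint to the CPU that we are spinning; falls back to a compiler barrier. */
static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Spin until *word holds something other than <prev> and return that new
 * value. The number of pauses between two loads doubles on each round
 * (1, 2, 4, 8, ...) up to BACKOFF_MAX_MASK + 1, so waiters touch the
 * shared cache line less and less often while it stays contended.
 */
static uintptr_t wait_new_value(_Atomic uintptr_t *word, uintptr_t prev)
{
	unsigned int mask = 0;
	uintptr_t curr;

	do {
		unsigned int loops = mask + 1;

		if (mask < BACKOFF_MAX_MASK)
			mask = (mask << 1) + 1;

		do {
			cpu_relax_hint();
		} while (--loops);

		curr = atomic_load_explicit(word, memory_order_acquire);
	} while (curr == prev);

	return curr;
}
```

A caller that has just read BUSY_SENTINEL would replace its
relax-then-reload pair with a single call such as
ret = wait_new_value(&head, ret), which is the shape of the change made
in the hunks below.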
ret = _HA_ATOMIC_LOAD(&pool->free_list);
do {
while (unlikely(ret == POOL_BUSY)) {
- __ha_cpu_relax();
- ret = _HA_ATOMIC_LOAD(&pool->free_list);
+ ret = (void*)pl_wait_new_long((ulong*)&pool->free_list, (ulong)ret);
}
if (ret == NULL)
return;
free_list = _HA_ATOMIC_LOAD(&pool->free_list);
do {
while (unlikely(free_list == POOL_BUSY)) {
- __ha_cpu_relax();
- free_list = _HA_ATOMIC_LOAD(&pool->free_list);
+ free_list = (void*)pl_wait_new_long((ulong*)&pool->free_list, (ulong)free_list);
}
_HA_ATOMIC_STORE(&item->next, free_list);
__ha_barrier_atomic_store();