The code in srv_add_to_idle_list() has its roots in 2.0 with commit
9ea5d361ae ("MEDIUM: servers: Reorganize the way idle connections are
cleaned."). In that era we didn't yet have the current set of atomic
load/store operations, and we used to perform loads via volatile casts
after a barrier. It turns out that this function has kept this scheme
over the years, resulting in a big mfence that stalls the whole pipeline
in the function:
| static __inline void
| __ha_barrier_full(void)
| {
| __asm __volatile("mfence" ::: "memory");
27.08 | mfence
| if ((volatile void *)srv->idle_node.node.leaf_p == NULL) {
0.84 | cmpq $0x0,0x158(%r15)
0.74 | je 35f
| return 1;
Replacing these with a pair of atomic loads got rid of the mfence and
brought 0.5% to 3% extra performance depending on the test (the
variation comes from elsewhere), but the gain was never below 0.5%.
Note that the second load doesn't need to be atomic since it's
protected by the lock, but it's cleaner from an API and code-review
perspective; that's also why it's relaxed.
This was the last user of __ha_barrier_full(), let's try not to
reintroduce it now!
HA_SPIN_UNLOCK(IDLE_CONNS_LOCK, &idle_conns[tid].idle_conns_lock);
_HA_ATOMIC_INC(&srv->curr_idle_thr[tid]);
- __ha_barrier_full();
- if ((volatile void *)srv->idle_node.node.leaf_p == NULL) {
+ if (HA_ATOMIC_LOAD(&srv->idle_node.node.leaf_p) == NULL) {
HA_SPIN_LOCK(OTHER_LOCK, &idle_conn_srv_lock);
- if ((volatile void *)srv->idle_node.node.leaf_p == NULL) {
+ if (_HA_ATOMIC_LOAD(&srv->idle_node.node.leaf_p) == NULL) {
srv->idle_node.key = tick_add(srv->pool_purge_delay,
now_ms);
eb32_insert(&idle_conn_srv, &srv->idle_node);