There's no point looking for freshly attached readers if there are none;
taking this lock requires an atomic write to a shared area, something we
clearly want to avoid.
A general test with 213-byte messages on different thread counts shows
how the performance degrades across CCX and how this patch improves the
situation:
                Before        After
   3C6T/1CCX:   6.39 Mmsg/s   6.35 Mmsg/s
   6C12T/2CCX:  2.90 Mmsg/s   3.16 Mmsg/s
   12C24T/4CCX: 2.14 Mmsg/s   2.33 Mmsg/s
   24C48T/8CCX: 1.75 Mmsg/s   1.92 Mmsg/s
This tends to confirm that the queues will really be needed and that
they'll have to be per-CCX, hence per thread-group. They will amortize
the number of updates on head & tail (one per multiple messages).
 	HA_ATOMIC_STORE(lock_ptr, readers);

 	/* notify potential readers */
-	if (sent) {
+	if (sent && HA_ATOMIC_LOAD(&ring->readers_count)) {
 		HA_RWLOCK_RDLOCK(RING_LOCK, &ring->lock);
 		list_for_each_entry(appctx, &ring->waiters, wait_entry)
 			appctx_wakeup(appctx);