As reported by github user @JB0925 in issue #2427, there is a possible
crash in pool_flush(). The problem is that if the free_list is not empty
in the first test, and is empty at the moment the xchg() is performed,
for example because another thread called it in parallel, we place a
POOL_BUSY there that is never removed later, causing the next thread to
wait forever.
This was introduced in 2.5 with commit
2a4523f6f ("BUG/MAJOR: pools: fix
possible race with free() in the lockless variant"). It has probably
very rarely been detected, because:
- pool_flush() is only called when stopping is set
- the function does nothing if global pools are disabled, which is
the case on most modern systems with a fast memory allocator.
It's possible to reproduce it by modifying __task_free() to call
pool_flush() on 1% of the calls instead of only when stopping.
The fix is quite simple, it consists in moving the zeroing of the
entry in the break path after verifying that the entry was not already
busy.
This must be backported wherever commit
2a4523f6f is.
*/
for (bucket = 0; bucket < CONFIG_HAP_POOL_BUCKETS; bucket++) {
next = pool->buckets[bucket].free_list;
- do {
+ while (1) {
while (unlikely(next == POOL_BUSY))
next = (void*)pl_wait_new_long((ulong*)&pool->buckets[bucket].free_list, (ulong)next);
if (next == NULL)
break;
- } while (unlikely((next = _HA_ATOMIC_XCHG(&pool->buckets[bucket].free_list, POOL_BUSY)) == POOL_BUSY));
- if (next) {
- _HA_ATOMIC_STORE(&pool->buckets[bucket].free_list, NULL);
- __ha_barrier_atomic_store();
+ next = _HA_ATOMIC_XCHG(&pool->buckets[bucket].free_list, POOL_BUSY);
+ if (next != POOL_BUSY) {
+ HA_ATOMIC_STORE(&pool->buckets[bucket].free_list, NULL);
+ break;
+ }
}
while (next) {