From 78ef847e549482263c7f9dbfe5f93206a6bcd01c Mon Sep 17 00:00:00 2001 From: Nathan Monfils Date: Mon, 5 May 2025 10:47:43 +0200 Subject: [PATCH] manager.c: Invalid ref-counting when purging events We have a use-case where we generate a *lot* of events on the AMI, and then when doing `manager show eventq` we would see some events which would linger for hours or days in there. Obviously something was leaking. Testing allowed us to track down this logic bug in the ref-counting on the event purge. Reproducing the bug was not super trivial, we managed to do it in a production-like load testing environment with multiple AMI consumers. The race condition itself: 1. something allocates and links `session` 2. `purge_sessions` iterates over that `session` (takes ref) 3. `purge_session` correctly de-referencess that session 4. `purge_session` re-evaluates the while() loop, taking a reference 5. `purge_session` exits (`n_max > 0` is false) 6. whatever allocated the `session` deallocates it, but a reference is now lost since we exited the `while` loop before de-referencing. 7. since the destructor is never called, the session->last_ev->usecount is never decremented, leading to events lingering in the queue The impact of this bug does not seem major. The events are small and do not seem, from our testing, to be causing meaningful additional CPU usage. Mainly we wanted to fix this issue because we are internally adding prometheus metrics to the eventq and those leaked events were causing the metrics to show garbage data. (cherry picked from commit 81785db9663d64a9c997a484bc097852a123d86b) --- main/manager.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/main/manager.c b/main/manager.c index 9eecda634f..241cec2b26 100644 --- a/main/manager.c +++ b/main/manager.c @@ -7466,7 +7466,9 @@ static int purge_sessions(int n_max) } i = ao2_iterator_init(sessions, 0); ao2_ref(sessions, -1); - while ((session = ao2_iterator_next(&i)) && n_max > 0) { + + /* The order of operations is significant */ + while (n_max > 0 && (session = ao2_iterator_next(&i))) { ao2_lock(session); if (session->sessiontimeout && (now > session->sessiontimeout) && !session->inuse) { if (session->authenticated -- 2.47.2