From: Willy Tarreau
Date: Sat, 30 Jul 2022 07:53:22 +0000 (+0200)
Subject: BUG/MEDIUM: queue/threads: limit the number of entries dequeued at once
X-Git-Tag: v2.7-dev3~49
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ddab05b98a55abfd394ba3a734720a63f1e2e8ac;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: queue/threads: limit the number of entries dequeued at once

When testing strong queue contention on a 48-thread machine, crashes
would frequently happen because process_srv_queue() never left its
dequeuing loop and kept processing pending requests forever; a dump
showed more than 500000 loops at once. The problem is that other
threads see one thread already working on the queue, so they do
nothing and are free to process their own pending requests. Because
of this, the dequeuing thread can be kept busy forever and never gets
to process its own requests anymore (fortunately the watchdog
eventually stops it).

This patch adds a limit to the number of rounds: it is capped at
maxpollevents, which is reasonable because that tunable is also an
indicator of latency and batch size. There is a catch, however: if
all served requests have completed by the time the thread leaves the
loop, there might not be enough activity left to restart processing
the queue. Thus the thread is allowed to re-enter the loop and
process one request at a time when it has none served anymore. This
leaves more room for another thread to take over the task, while
guaranteeing the loop eventually ends. Doing this also improved
overall dequeuing performance by ~20% in highly contended situations
with 48 threads.

It should be backported at least to 2.4, and possibly to 2.2, since
issues were faced in the past on machines having many cores.
---

diff --git a/src/queue.c b/src/queue.c
index 6e0e13032b..73bd3c26ee 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -373,9 +373,15 @@ void process_srv_queue(struct server *s)
 	/* let's repeat that under the lock on each round. Threads competing
 	 * for the same server will give up, knowing that at least one of
-	 * them will check the conditions again before quitting.
+	 * them will check the conditions again before quitting. In order
+	 * to avoid the deadly situation where one thread spends its time
+	 * dequeueing for others, we limit the number of rounds it does.
+	 * However we still re-enter the loop for one pass if there's no
+	 * more served, otherwise we could end up with no other thread
+	 * trying to dequeue them.
 	 */
-	while (!stop && s->served < (maxconn = srv_dynamic_maxconn(s))) {
+	while (!stop && (done < global.tune.maxpollevents || !s->served) &&
+	       s->served < (maxconn = srv_dynamic_maxconn(s))) {
 		if (HA_SPIN_TRYLOCK(QUEUE_LOCK, &s->queue.lock) != 0)
 			break;
@@ -385,6 +391,8 @@ void process_srv_queue(struct server *s)
 			break;
 		_HA_ATOMIC_INC(&s->served);
 		done++;
+		if (done >= global.tune.maxpollevents)
+			break;
 	}
 	HA_SPIN_UNLOCK(QUEUE_LOCK, &s->queue.lock);
 }
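The bounded-dequeue pattern introduced by the patch can be sketched in isolation. The code below is a minimal single-threaded simulation, not haproxy code: `fake_server`, `bounded_dequeue_round` and `MAX_ROUNDS` are hypothetical names standing in for `struct server`, the dequeuing loop of `process_srv_queue()` and `global.tune.maxpollevents`; the locking and stream wakeup are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for global.tune.maxpollevents (hypothetical). */
#define MAX_ROUNDS 4

/* Hypothetical simplification of struct server. */
struct fake_server {
	int queued;   /* requests still waiting in the queue */
	int served;   /* requests currently being served */
	int maxconn;  /* stands in for srv_dynamic_maxconn() */
};

/* One round of dequeuing, bounded as in the patch: at most MAX_ROUNDS
 * dequeues in total, except that when nothing is served anymore we
 * tolerate one more dequeue so the queue cannot stall with no thread
 * left to resume it. 'done' carries the count across rounds and the
 * updated count is returned. */
static int bounded_dequeue_round(struct fake_server *s, int done)
{
	bool stop = false;

	while (!stop && (done < MAX_ROUNDS || !s->served) &&
	       s->served < s->maxconn) {
		if (s->queued == 0) {   /* nothing left to dequeue */
			stop = true;
			break;
		}
		s->queued--;            /* hand one request to a stream */
		s->served++;
		done++;
		if (done >= MAX_ROUNDS) /* cap reached: leave this round */
			break;
	}
	return done;
}
```

With `MAX_ROUNDS` dequeues already done and `served` dropped back to zero, a further call dequeues exactly one request and returns, which is the "one pass" tolerance the commit message describes; capping at maxpollevents reuses a tunable that already expresses an acceptable batch size per wakeup.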