From: Willy Tarreau
Date: Wed, 16 Oct 2024 16:08:39 +0000 (+0200)
Subject: BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
X-Git-Tag: v3.1-dev10~62
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=ca275d99ce02e72d707fc87da133d739cdda5146;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: queue: make sure never to queue when there's no more served conns

Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend
when redistributing the last server"), we've again received two reports
showing the theoretically impossible condition in pendconn_add(),
including a single-threaded one.

Thanks to the traces, the issue could be tracked down to the redispatch
part. In fact, with non-deterministic LB algorithms (RR, LC, FAS), we
don't perform the LB if there are pending connections in the backend,
since that indicates that previous attempts already failed, so we
directly return SRV_STATUS_FULL. And contrary to a previous belief, it
is possible to meet this condition with be->served==0 when
redispatching (and likely with maxconn not greater than the number of
threads).

The problem is that in this case the entry is queued, then
pendconn_must_try_again() checks whether any connections are currently
being served to detect whether we missed a race, and tries again. But
that situation is not caused by a concurrent thread and will never fix
itself, resulting in the loop.

All that part around pendconn_must_try_again() is still quite brittle,
and a safer approach would involve a sequence counter to detect new
arrivals and dequeues during the pendconn_add() call. But that is more
sensitive work, probably for a later fix.

This fix must be backported wherever the fix above was backported.
Thanks to Patrick Hemmer, as well as Damien Claisse and Basha
Mougamadou from Criteo, for their help on tracking this one!
---

diff --git a/src/backend.c b/src/backend.c
index b2f924c278..c8bb33e81c 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -681,7 +681,7 @@
 	/* if there's some queue on the backend, with certain algos we
 	 * know it's because all servers are full.
 	 */
-	if (s->be->queue.length && s->be->queue.length != s->be->beconn &&
+	if (s->be->queue.length && s->be->served && s->be->queue.length != s->be->beconn &&
 	    (((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_FAS)||   // first
 	     ((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_RR) ||   // roundrobin
 	     ((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_SRR))) { // static-rr
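
Side note (not part of the fix): the sequence-counter approach mentioned above
could look roughly like the sketch below. All names here (be_queue_sketch,
queue_seq, be_queue_activity, be_queue_changed) are hypothetical and only
illustrate the general idea of detecting enqueues/dequeues that race with
pendconn_add(); this is not HAProxy code.

/* Hypothetical sketch: a per-backend sequence counter bumped on every
 * enqueue/dequeue, so a caller can snapshot it before queuing and later
 * tell whether a concurrent thread touched the queue in the meantime.
 */
#include <stdatomic.h>

struct be_queue_sketch {
	atomic_uint queue_seq;   /* incremented on every enqueue/dequeue */
};

/* would be called by the (hypothetical) enqueue and dequeue paths */
static inline void be_queue_activity(struct be_queue_sketch *q)
{
	atomic_fetch_add_explicit(&q->queue_seq, 1, memory_order_relaxed);
}

/* returns non-zero if the queue changed since the snapshot <snap>,
 * i.e. a retry of the LB decision might be worth attempting; zero
 * means nothing happened concurrently and retrying cannot help.
 */
static inline int be_queue_changed(struct be_queue_sketch *q, unsigned int snap)
{
	return atomic_load_explicit(&q->queue_seq, memory_order_relaxed) != snap;
}

Compared with checking be->served after the fact, such a counter would make
the "did we miss a race" test independent of whether any connection happens
to be served, which is exactly the situation that caused the loop described
above.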