From: Willy Tarreau
Date: Mon, 19 May 2025 13:57:03 +0000 (+0200)
Subject: BUG/MAJOR: leastconn: never reuse the node after dropping the lock
X-Git-Tag: v3.2-dev17~27
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=99d6c889d07820686bab7631e751cb618e4d55e5;p=thirdparty%2Fhaproxy.git

BUG/MAJOR: leastconn: never reuse the node after dropping the lock

On ARM with 80 cores and a single server, it's sometimes possible to
see a segfault in fwlc_get_next_server() at around 600-700k RPS. It
also seldom happens on x86 with 128 threads and the same config at
around 1M RPS.

It turns out that in fwlc_get_next_server(), before calling
fwlc_srv_reposition(), we have to drop the lock, and that function
takes it back again. The problem is that anything can happen to our
node during this time, including being freed. Then when continuing
our work, we iterate over it and its next nodes to find one with an
acceptable key, and by doing so we can visit either uninitialized
memory or nodes that are no longer in the tree.

A first attempt at fixing this consisted in artificially incrementing
the elements count before dropping the lock, but that turned out to be
even worse because other threads could loop forever on such an element
looking for an entry that does not exist. Maintaining a separate
refcount didn't work well either, and it required dealing with the
memory release while dropping it, which is really not convenient.

Here we're taking a different approach consisting in simply not
trusting this node anymore and going back to the beginning of the
loop, as is already done at a few other places. This way we can
safely ignore the possibly released node, and the test runs reliably
on both the ARM and the x86 platforms mentioned above. No performance
regression was observed either, likely because this operation is
quite rare.

No backport is needed since this appeared with the leastconn rework
in 3.2.
---

diff --git a/src/lb_fwlc.c b/src/lb_fwlc.c
index 9023a7aaf..10ad147b5 100644
--- a/src/lb_fwlc.c
+++ b/src/lb_fwlc.c
@@ -786,12 +786,16 @@ redo:
 			 * The server has more requests than expected,
 			 * let's try to reposition it, to avoid too
 			 * many threads using the same server at the
-			 * same time.
+			 * same time. From the moment we release the
+			 * lock, we cannot trust the node nor tree_elt
+			 * anymore, so we need to loop back to the
+			 * beginning.
 			 */
 			if (i >= FWLC_LISTS_NB) {
 				HA_RWLOCK_RDUNLOCK(LBPRM_LOCK, &p->lbprm.lock);
 				fwlc_srv_reposition(s);
 				HA_RWLOCK_RDLOCK(LBPRM_LOCK, &p->lbprm.lock);
+				goto redo;
 			}
 			i++;
 			continue;
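
For illustration only (this is not the HAProxy code), below is a minimal
sketch of the "drop the lock, then distrust the node and restart" pattern
that the fix applies. All names here (struct entry, reposition(), head,
lock, walk()) are hypothetical, the retry bounding done by the real code's
counters is omitted, and only the locking pattern mirrors the fix above:

    #include <pthread.h>
    #include <stddef.h>

    struct server;                 /* stable object, outlives the walk */

    struct entry {                 /* NOT stable: once the lock is     */
            struct entry *next;    /* dropped, any thread may free or  */
            struct server *srv;    /* requeue this element             */
            int inflight;
    };

    static struct entry *head;     /* shared list, protected by <lock> */
    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

    /* hypothetical: requeues <srv>'s entry; takes <lock> for writing */
    extern void reposition(struct server *srv);

    void walk(int limit)
    {
            pthread_rwlock_rdlock(&lock);
    redo:
            for (struct entry *e = head; e; e = e->next) {
                    if (e->inflight <= limit)
                            continue;
                    /* save the stable pointer before dropping the lock */
                    struct server *srv = e->srv;
                    /* reposition() needs the write lock, so release
                     * ours first. From this point on <e> cannot be
                     * trusted anymore, so once the lock is re-taken,
                     * restart from the head instead of resuming the
                     * iteration from <e>.
                     */
                    pthread_rwlock_unlock(&lock);
                    reposition(srv);
                    pthread_rwlock_rdlock(&lock);
                    goto redo;
            }
            pthread_rwlock_unlock(&lock);
    }

The key point the sketch shares with the fix is that the server pointer
stays valid across the unlock while the list/tree element does not, which
is why restarting the walk is safe where resuming from the old node is not.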