From: Olivier Houchard <ohouchard@haproxy.com>
Date: Tue, 17 Dec 2024 14:39:21 +0000 (+0100)
Subject: BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
X-Git-Tag: v3.2-dev2~64
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().

pendconn_grab_from_px() was called when a server was brought back up, to
get some streams waiting in the proxy's queue and get them to run on the
newly available server. It is very similar to process_srv_queue(),
except it only goes through the proxy's queue, which can be a problem,
because there is a small race condition that could lead us to add more
streams to the server queue just as it's going down. If that happens,
the server would just be ignored when back up by new streams, as its
queue is not empty, and it would never try to process its queue.
The other problem with pendconn_grab_from_px() is that it is very
liberal with how it dequeues streams, and it is not very good at
enforcing maxconn, it could lead to having 3*maxconn connections.
For both those reasons, just get rid of pendconn_grab_from_px(), and
just use process_srv_queue().
Both problems are easy to reproduce, especially on a 64 threads machine,
set a maxconn to 100, inject in H2 with 1000 concurrent connections
containing up to 100 streams each, and after a few seconds/minutes the
max number of concurrent output streams will be much higher than
maxconn, and eventually the server will stop processing connections.

It may be related to github issue #2744. Note that it doesn't totally
fix the problem, we can occasionally see a few more connections than
maxconn, but the max that have been observed is 4 more connections, we
no longer get multiple times maxconn.

have more outgoing connections than maxconn,
This should be backported up to 2.6.
---

diff --git a/src/server.c b/src/server.c
index e18a07cbe0..b4f432c10c 100644
--- a/src/server.c
+++ b/src/server.c
@@ -5646,7 +5646,7 @@ static struct task *server_warmup(struct task *t, void *context, unsigned int st
 	server_recalc_eweight(s, 1);
 
 	/* probably that we can refill this server with a bit more connections */
-	pendconn_grab_from_px(s);
+	process_srv_queue(s);
 
 	HA_SPIN_UNLOCK(SERVER_LOCK, &s->lock);
 
@@ -6432,10 +6432,10 @@ static int _srv_update_status_op(struct server *s, enum srv_op_st_chg_cause caus
 		    !(s->flags & SRV_F_BACKUP) && s->next_eweight)
 			srv_shutdown_backup_streams(s->proxy, SF_ERR_UP);
 
-		/* check if we can handle some connections queued at the proxy. We
-		 * will take as many as we can handle.
+		/* check if we can handle some connections queued.
+		 * We will take as many as we can handle.
 		 */
-		xferred = pendconn_grab_from_px(s);
+		process_srv_queue(s);
 
 		tmptrash = alloc_trash_chunk();
 		if (tmptrash) {
@@ -6628,10 +6628,10 @@ static int _srv_update_status_adm(struct server *s, enum srv_adm_st_chg_cause ca
 		    !(s->flags & SRV_F_BACKUP) && s->next_eweight)
 			srv_shutdown_backup_streams(s->proxy, SF_ERR_UP);
 
-		/* check if we can handle some connections queued at the proxy. We
-		 * will take as many as we can handle.
+		/* check if we can handle some connections queued.
+		 * We will take as many as we can handle.
 		 */
-		xferred = pendconn_grab_from_px(s);
+		process_srv_queue(s);
 	}
 	else if (s->next_admin & SRV_ADMF_MAINT) {
 		/* remaining in maintenance mode, let's inform precisely about the