From: Aurelien DARRAGON
Date: Tue, 6 Aug 2024 12:29:56 +0000 (+0200)
Subject: BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
X-Git-Tag: v3.1-dev5~42
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=8f1fd96;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak

An issue has been introduced with cd99440 ("BUG/MAJOR: server/addr: fix
a race during server addr:svc_port updates"). Indeed, in the above
commit we implemented the atomic_sync task, which is responsible for
consuming pending server events to apply the changes atomically. For
now only server addr updates are concerned.

To prevent the task from causing contention, a budget was assigned to
it. It can be controlled with the global tunable
'tune.events.max-events-at-once': the task may not process more than
this number of events at once.

However, a bug was introduced with this budget logic: each time the
task has to be interrupted because it runs out of budget, we reschedule
the task to finish where it left off, but the current event, which was
already removed from the queue, has not been processed yet. This means
that one pending event is effectively lost every
'tune.events.max-events-at-once' events.

When the atomic_sync task deals with a large number of concurrent
events, this bug has 2 known consequences: first, a server's addr/port
update is lost every 'tune.events.max-events-at-once' events. This can
of course cause reliability issues, because if the event is not
republished periodically, the server could stay in a stale state for an
indefinite amount of time. This is the case when the DNS server flaps
for instance: some servers may not come back UP after the incident, as
described in GH #2666.

The other issue is that the lost event is never cleaned up, resulting
in a small memory leak. So in the end, the bug is likely to cause more
and more degradation over time until haproxy is restarted.

As a workaround, 'tune.events.max-events-at-once' may be set to the
maximum number of events expected per batch. Note however that this
value cannot exceed 10 000, otherwise it could cause the watchdog to
trigger due to the task being busy for too long and preventing other
threads from making any progress. Setting higher values may not be
optimal for common workloads, so it should only be used to mitigate the
bug while waiting for this fix.

Since tune.events.max-events-at-once defaults to 100, this bug only
affects configs that involve more than 100 servers whose addr:port
properties are likely to be updated at the same time (batched updates
from cli, lua, dns..).

To fix the bug, we move the budget check after the current event is
fully handled. For that we went from a basic 'while' loop to a
'do..while' loop, as we assume from the config that
'tune.events.max-events-at-once' cannot be 0. While at it, we now
perform the reschedule once thread isolation ends (it was not required
to perform it while under isolation) to hand control back faster to
waiting threads.

This patch should be backported up to 2.9 with cd99440. It should fix
GH #2666.
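As a rough sketch of the workaround described above: the tunable lives in
the global section, and the value below is only an illustrative placeholder
to be sized to the largest expected event batch (and, per the note above,
kept at or below 10 000):

    global
        tune.events.max-events-at-once 5000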
---

diff --git a/src/server.c b/src/server.c
index 05f72fffaf..2f27df8538 100644
--- a/src/server.c
+++ b/src/server.c
@@ -228,7 +228,11 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigne
 	struct event_hdl_async_event *event;
 
 	/* check for new server events that we care about */
-	while ((event = event_hdl_async_equeue_pop(&server_atomic_sync_queue))) {
+	do {
+		event = event_hdl_async_equeue_pop(&server_atomic_sync_queue);
+		if (!event)
+			break; /* no events in queue */
+
 		if (event_hdl_sub_type_equal(event->type, EVENT_HDL_SUB_END)) {
 			/* ending event: no more events to come */
 			event_hdl_async_free_event(event);
@@ -237,20 +241,6 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigne
 			break;
 		}
 
-		if (!remain) {
-			/* STOP: we've already spent all our budget here, and
-			 * considering we possibly are under isolation, we cannot
-			 * keep blocking other threads any longer.
-			 *
-			 * Reschedule the task to finish where we left off if
-			 * there are remaining events in the queue.
-			 */
-			if (!event_hdl_async_equeue_isempty(&server_atomic_sync_queue))
-				task_wakeup(task, TASK_WOKEN_OTHER);
-			break;
-		}
-		remain--;
-
 		/* new event to process */
 		if (event_hdl_sub_type_equal(event->type, EVENT_HDL_SUB_SERVER_INETADDR)) {
 			struct sockaddr_storage new_addr;
@@ -327,7 +317,7 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigne
 			srv_set_addr_desc(srv, 1);
 		}
 		event_hdl_async_free_event(event);
-	}
+	} while (--remain); // event_hdl_tune.max_events_at_once is expected to be > 0
 
 	/* some events possibly required thread_isolation:
 	 * now that we are done, we must leave thread isolation before
@@ -336,6 +326,18 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigne
 	if (thread_isolated())
 		thread_release();
 
+	if (!remain) {
+		/* we stopped because we've already spent all our budget here,
+		 * and considering we possibly were under isolation, we cannot
+		 * keep blocking other threads any longer.
+		 *
+		 * Reschedule the task to finish where we left off if
+		 * there are remaining events in the queue.
+		 */
+		if (!event_hdl_async_equeue_isempty(&server_atomic_sync_queue))
+			task_wakeup(task, TASK_WOKEN_OTHER);
+	}
+
 	return task;
 }
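For readers who want to see the failure mode outside of HAProxy, here is a
small self-contained C sketch. It is not HAProxy code: the array-backed
queue, the pop() helper, the process_batch_*() functions and the NB_EVENTS
and BUDGET constants are invented for illustration, with BUDGET standing in
for tune.events.max-events-at-once. The "buggy" variant mirrors the old
loop, which checked the budget after popping an event but before handling
it, so one popped event per exhausted budget is neither handled nor freed;
the "fixed" variant mirrors the do..while form used by the patch, where the
budget is only consumed once the current event has been fully handled.

#include <stdio.h>

#define NB_EVENTS 250
#define BUDGET    100   /* stands in for tune.events.max-events-at-once */

static int queue[NB_EVENTS];
static int head;

/* toy replacement for event_hdl_async_equeue_pop(): returns 0 when empty */
static int pop(int *out)
{
	if (head >= NB_EVENTS)
		return 0;
	*out = queue[head++];
	return 1;
}

/* old pattern: the budget check sits between the pop and the handling,
 * so the event popped on the (BUDGET+1)-th iteration is silently dropped
 */
static int process_batch_buggy(void)
{
	int handled = 0, remain = BUDGET, ev;

	while (pop(&ev)) {
		if (!remain)
			break;   /* 'ev' was popped but never handled: lost */
		remain--;
		(void)ev;        /* a real handler would apply the update carried by 'ev' */
		handled++;
	}
	return handled;
}

/* fixed pattern: handle the event first, then account for the budget */
static int process_batch_fixed(void)
{
	int handled = 0, remain = BUDGET, ev;

	do {
		if (!pop(&ev))
			break;   /* no events left in queue */
		(void)ev;        /* a real handler would apply the update carried by 'ev' */
		handled++;
	} while (--remain);      /* budget is expected to be > 0 */
	return handled;
}

int main(void)
{
	int total;

	for (int i = 0; i < NB_EVENTS; i++)
		queue[i] = i;

	head = 0;
	total = 0;
	while (head < NB_EVENTS)
		total += process_batch_buggy();
	printf("buggy loop handled %d/%d events\n", total, NB_EVENTS);

	head = 0;
	total = 0;
	while (head < NB_EVENTS)
		total += process_batch_fixed();
	printf("fixed loop handled %d/%d events\n", total, NB_EVENTS);

	return 0;
}

Built with any C99 compiler, this prints 248/250 for the buggy loop (one
event dropped each time the budget runs out) versus 250/250 for the fixed
one, which is the per-batch loss the commit message describes.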