From: Willy Tarreau <w@1wt.eu>
Date: Wed, 17 Apr 2024 14:25:20 +0000 (+0200)
Subject: BUG/MEDIUM: evports: do not clear returned events list on signal
X-Git-Tag: v3.0-dev8~33
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=36d92dcd9b62c5af0d7499c07479d6995565db9f;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: evports: do not clear returned events list on signal

Since 2.0 with commit 0ba4f483d2 ("MAJOR: polling: add event ports
support (Solaris)"), the polling system on Solaris suffers from a
signal handling problem. It turns out that this API is very bizarre:
reported events are automatically unregistered, and their count is
written back to the same variable that was used to pass the count on
input, making it difficult to handle certain error codes (how should
one handle ENOSYS for example?). To make matters worse, the API is
able to return both EINTR and an event if a signal is reported.

The code tries to deal with some of these cases (e.g. ETIME for a
timeout can also report an event); otherwise it defaults to clearing
the event counter upon error. As a result, EINTR discards the list of
returned events, and since the system has already removed them from
the set automatically, they are lost.

This is visible when using external checks, where the SIGCHLD of the
exiting child causes a wakeup that ruins the event counter and causes
endless loops, apparently because the inter-thread byte queued in the
pipe used to wake threads up never gets removed in this case.
Note that extcheck would also deserve deeper investigation because it
can immediately re-trigger a check in such a case, which is not normal.

Removing the wiping of the nevlist variable fixes the problem.

This can be backported to all versions, as the issue has been present
since 2.0.
---
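Note: the failure mode described above can be reproduced off Solaris
with a small simulation. This is only a sketch under stated assumptions:
fake_port_getn() and poll_once() are hypothetical names that do not exist
in HAProxy, and the stand-in merely imitates the behavior described in
the commit message (the count is written back even when the call fails
with EINTR). It is not the real port_getn() and not the patched code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the event ports call (assumption, not the
 * real API): it overwrites *nget with the number of events retrieved
 * even when it fails. Here it always reports one event plus EINTR.
 */
static int fake_port_getn(unsigned int max, unsigned int *nget)
{
	assert(max >= 1);
	*nget = 1;      /* one event was dequeued, hence unregistered */
	errno = EINTR;  /* ...and a signal was reported at the same time */
	return -1;
}

/* Returns the number of events the caller would go on to process.
 * clear_on_error != 0 imitates the old "nevlist = 0" default branch,
 * clear_on_error == 0 imitates the behavior after this patch.
 */
static unsigned int poll_once(int clear_on_error, int *interrupted)
{
	unsigned int nevlist = 1; /* on input: minimum number of events */
	int status = fake_port_getn(16, &nevlist);

	*interrupted = 0;
	if (status != 0) {
		switch (errno) {
		case ETIME:
			/* timeout: nevlist may still hold events */
			break;
		default:
			/* signal or anything else */
			if (clear_on_error)
				nevlist = 0; /* old code: event is lost */
			*interrupted = 1;
			break;
		}
	}
	return nevlist;
}
```

Calling poll_once(1, &intr) yields 0 events although one was dequeued,
while poll_once(0, &intr) yields 1; both calls set intr to 1.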

diff --git a/src/ev_evports.c b/src/ev_evports.c
index 07676e65a7..0acded2f81 100644
--- a/src/ev_evports.c
+++ b/src/ev_evports.c
@@ -194,6 +194,12 @@ static void _do_poll(struct poller *p, int exp, int wake)
 				   evports_evlist_max,
 				   &nevlist, /* updated to the number of events retrieved */
 				   &timeout_ts);
+
+		/* Be careful, nevlist here is always updated by the syscall
+		 * even on status == -1, so it must always be respected
+		 * otherwise events are lost. Awkward API BTW, I wonder how
+		 * they thought ENOSYS ought to be handled... -WT
+		 */
 		if (status != 0) {
 			int e = errno;
 			switch (e) {
@@ -206,7 +212,7 @@ static void _do_poll(struct poller *p, int exp, int wake)
 				/* nevlist >= 0 */
 				break;
 			default:
-				nevlist = 0;
+				/* signal or anything else */
 				interrupted = 1;
 				break;
 			}