From: Willy Tarreau Date: Tue, 14 Nov 2023 06:55:04 +0000 (+0100) Subject: BUG/MEDIUM: connection: report connection errors even when no mux is installed X-Git-Tag: v2.9-dev10~77 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=6a4591c3d09da3fbd949073a32285a4afaa6785d;p=thirdparty%2Fhaproxy.git BUG/MEDIUM: connection: report connection errors even when no mux is installed An annoying issue was met when testing the reverse-http mechanism, by which failed connection attempts would apparently not be attempted again when there was no connect timeout. It turned out to be more generalized than the rhttp system, and actually affects all outgoing connections relying on NPN or ALPN to choose the mux, on which no mux is installed and for which the subscriber (ssl_sock) must be notified instead. The problem appeared during 2.2-dev1 development. First, commit 062df2c23 ("MEDIUM: backend: move the connection finalization step to back_handle_st_con()") broke the error reporting by testing CO_FL_ERROR only under CO_FL_CONNECTED. While it still worked OK for cases where a mux was present, it did not for this specific situation because no single error path would be considered when no mux was present. Changing the CO_FL_CONNECTED test to also include CO_FL_ERROR did work, until a few commits later with 477902bd2 ("MEDIUM: connections: Get ride of the xprt_done callback.") which removed the xprt_done callback that was used to indicate success or failure of the transport layer setup, since, as the commit explains, we can report this via the mux. What this last commit says is true, except when there is no mux. For this, however, the sock_conn_iocb() function (formerly conn_fd_handler) is called for such errors, evaluates a number of conditions, none of which is matched in this error condition case, since sock_conn_check() instantly reports an error causing a jump to the leave label. There, the mux is notified if installed, and the function returns. In other error condition cases, readiness and activity are checked for both sides, the tasklets woken up and the corresponding subscriber flags removed. This means that a sane (and safe) approach would consist in just notifying the subscriber in case of error, if such a subscriber still exists: if still there, it means the event hasn't been caught earlier, then it's the right moment to report it. And since this is done after conn_notify_mux(), it still leaves all control to the mux once it's installed. This commit should be progressively backported as far as 2.2 since it's where the problem was introduced. It's important to clearly check the error path in each function to make sure the fix still does what it's supposed to. --- diff --git a/src/sock.c b/src/sock.c index fd9278383f..7ec4d46006 100644 --- a/src/sock.c +++ b/src/sock.c @@ -894,6 +894,14 @@ void sock_conn_iocb(int fd) if (unlikely(conn->flags & CO_FL_ERROR)) { if (conn_ctrl_ready(conn)) fd_stop_both(fd); + + if (conn->subs) { + t = conn->subs->tasklet; + conn->subs->events = 0; + if (!conn->subs->events) + conn->subs = NULL; + tasklet_wakeup(t); + } } }