From: Christopher Faulet
Date: Fri, 18 Nov 2022 13:52:08 +0000 (+0100)
Subject: BUG/MEDIUM: raw-sock: Don't report connection error if something was received
X-Git-Tag: v2.7-dev9~11
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=dfefebcd7a01faab9988d0aa0811d43bdfc7c665;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: raw-sock: Don't report connection error if something was received

In raw_sock_to_buf(), if a low-level error is reported, we no longer
immediately set an error on the connection if something was received. This
may happen when a RST is received with data. This way, the mux gets a chance
to process the received data first instead of immediately aborting.

This patch should fix some spurious health-check failures. The problem is
pretty hard to observe, but with a server that immediately returns the
response followed by a RST, without waiting for the request, it is possible
to get some health-check errors. For instance, with the following tcploop
server:

    tcploop 8000 L Q W N1 A S:"HTTP/1.0 200 OK\r\n\r\n" F K
    (Accept -> send response -> FIN -> Close)

we can have such strace output:

    15:11:21.433005 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 38
    15:11:21.433141 fcntl(38, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    15:11:21.433233 setsockopt(38, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    15:11:21.433359 setsockopt(38, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
    15:11:21.433457 connect(38, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    15:11:21.434215 epoll_ctl(4, EPOLL_CTL_ADD, 38, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
    15:11:21.434468 epoll_wait(4, [{events=EPOLLOUT, data={u32=38, u64=38}}], 200, 21) = 1
    15:11:21.434810 recvfrom(38, 0x7f32a83e5020, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
    15:11:21.435405 sendto(38, "OPTIONS / HTTP/1.0\r\ncontent-leng"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
    15:11:21.435833 epoll_ctl(4, EPOLL_CTL_MOD, 38, {events=EPOLLIN|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
    15:11:21.435907 epoll_wait(4, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=38, u64=38}}], 200, 17) = 1
    15:11:21.436024 recvfrom(38, "HTTP/1.0 200 OK\r\n\r\n", 16320, 0, NULL, NULL) = 19
    15:11:21.436189 close(38) = 0
    15:11:21.436402 write(2, "[WARNING] (163564) : Server bac"..., 184[WARNING] (163564) : Server back-http/www is DOWN, reason: Socket error, check duration: 5ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

The response was received, but it is ignored because an error was reported
too. The error handling must be refactored, but that is a titanic task.
Thus, for now, a good fix is to delay the error report when something was
received. The error will be reported on the next receive, if any.

This patch should fix the issue #1863, but it must be confirmed. At least it
fixes the above example. It must be backported to 2.6. For older versions,
it must be evaluated first.
---

diff --git a/src/raw_sock.c b/src/raw_sock.c
index e172b1d4e5..af95c82835 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -329,7 +329,7 @@ static size_t raw_sock_to_buf(struct connection *conn, void *xprt_ctx, struct bu
 	 * of recv()'s return value 0, so we have no way to tell there was
 	 * an error without checking.
 	 */
-	if (unlikely(fdtab[conn->handle.fd].state & FD_POLL_ERR))
+	if (unlikely(!done && fdtab[conn->handle.fd].state & FD_POLL_ERR))
 		conn->flags |= CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_SOCK_WR_SH;
 	goto leave;
 }
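
Illustration (not part of the patch): the deferred error report described in the commit message can be sketched with a small stand-alone model. This is only a simplified sketch. The names demo_conn, demo_recv and DEMO_FL_ERROR are hypothetical stand-ins and are not HAProxy's types or API. The fake socket has no real poller, and the only thing modelled is the patched condition "flag the error only if nothing was read during this call".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not HAProxy's real types. */
enum { DEMO_FL_ERROR = 0x1 };

struct demo_conn {
	const char *pending;  /* data still queued on the fake socket */
	int poll_err;         /* non-zero if the poller reported an error (e.g. RST) */
	unsigned int flags;   /* DEMO_FL_ERROR once the error has been reported */
};

/* Copy up to <count> queued bytes into <buf> and return the number copied. */
static size_t demo_recv(struct demo_conn *c, char *buf, size_t count)
{
	size_t done = strlen(c->pending);

	if (done > count)
		done = count;
	memcpy(buf, c->pending, done);
	c->pending += done;

	/* Same shape as the patched test: report the low-level error only
	 * when this call read nothing, so received data is handed over first.
	 */
	if (!done && c->poll_err)
		c->flags |= DEMO_FL_ERROR;
	return done;
}
```

With the 19-byte response from the strace above queued and poll_err set, the first demo_recv() call returns the data and leaves flags untouched. A second call reads nothing and only then sets DEMO_FL_ERROR, which corresponds to "the error will be reported on the next receive".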