From: Willy Tarreau Date: Thu, 18 Jul 2013 19:49:32 +0000 (+0200) Subject: BUG/MEDIUM: splicing: fix abnormal CPU usage with splicing X-Git-Tag: v1.5-dev20~332 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=61d39a0e2a047df78f7f3bfcf5584090913cdc65;p=thirdparty%2Fhaproxy.git BUG/MEDIUM: splicing: fix abnormal CPU usage with splicing Mark Janssen reported an issue in 1.5-dev19 which was introduced in 1.5-dev12 by commit 96199b10. From time to time, randomly, the CPU usage spikes to 100% for seconds to minutes. A deep analysis of the traces provided shows that it happens when waiting for the response to a second pipelined HTTP request, or when trying to handle the received shutdown advertised by epoll() after the last block of data. Each time, splice() was involved with data pending in the pipe. The cause of this was that such events could not be taken into account by splice nor by recv and were left pending : - the transfer of the last block of data, optionally with a shutdown was not handled by splice() because of the validation that to_forward is higher than MIN_SPLICE_FORWARD ; - the next recv() call was inhibited because of the test on presence of data in the pipe. This is also what prevented the recv() call from handling a response to a pipelined request until the client had ACKed the previous response. No less than 4 different methods were experimented to fix this, and the current one was finally chosen. The principle is that if an event is not caught by splice(), then it MUST be caught by recv(). So we remove the condition on the pipe's emptiness to perform an recv(), and in order to prevent recv() from being used in the middle of a transfer, we mark supposedly full pipes with CO_FL_WAIT_ROOM, which makes sense because the reason for stopping a splice()-based receive is that the pipe is supposed to be full. The net effect is that we don't wake up and sleep in loops during these transient states. This happened much more often than expected, sometimes for a few cycles at end of transfers, but rarely long enough to be noticed, unless a client timed out with data pending in the pipe. The effect on CPU usage is visible even when transfering 1MB objects in pipeline, where the CPU usage drops from 10 to 6% on a small machine at medium bandwidth. Some further improvements are needed : - the last chunk of a splice() transfer is never done using splice due to the test on to_forward. This is wrong and should be performed with splice if the pipe has not yet been emptied ; - si_chk_snd() should not be called when the write event is already being polled, otherwise we're almost certain to get EAGAIN. Many thanks to Mark for all the traces he cared to provide, they were essential for understanding this issue which was not reproducible without. Only 1.5-dev is affected, no backport is needed. --- diff --git a/src/raw_sock.c b/src/raw_sock.c index e030253591..b9bb8dcb1e 100644 --- a/src/raw_sock.c +++ b/src/raw_sock.c @@ -165,6 +165,7 @@ int raw_sock_to_pipe(struct connection *conn, struct pipe *pipe, unsigned int co /* We've read enough of it for this time, let's stop before * being asked to poll. */ + conn->flags |= CO_FL_WAIT_ROOM; break; } } /* while */ diff --git a/src/stream_interface.c b/src/stream_interface.c index 8a21d398a3..90e4044fb0 100644 --- a/src/stream_interface.c +++ b/src/stream_interface.c @@ -964,20 +964,29 @@ static void si_conn_recv_cb(struct connection *conn) if (conn->flags & CO_FL_ERROR) goto out_error; - if (conn->flags & CO_FL_WAIT_ROOM) /* most likely the pipe is full */ + if (conn->flags & CO_FL_WAIT_ROOM) { + /* the pipe is full or we have read enough data that it + * could soon be full. Let's stop before needing to poll. + */ si->flags |= SI_FL_WAIT_ROOM; + __conn_data_stop_recv(conn); + } /* splice not possible (anymore), let's go on on standard copy */ } abort_splice: - /* release the pipe if we can, which is almost always the case */ - if (chn->pipe && !chn->pipe->data) { + if (chn->pipe && unlikely(!chn->pipe->data)) { put_pipe(chn->pipe); chn->pipe = NULL; } - while (!chn->pipe && !(conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH | CO_FL_WAIT_RD | CO_FL_WAIT_ROOM | CO_FL_HANDSHAKE))) { + /* Important note : if we're called with POLL_IN|POLL_HUP, it means the read polling + * was enabled, which implies that the recv buffer was not full. So we have a guarantee + * that if such an event is not handled above in splice, it will be handled here by + * recv(). + */ + while (!(conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH | CO_FL_WAIT_RD | CO_FL_WAIT_ROOM | CO_FL_HANDSHAKE))) { max = bi_avail(chn); if (!max) {