Before muxes were used, we used to refrain from reading past the
buffer's reserve. But with muxes, which have their own buffer, this
rule was somewhat forgotten, resulting in an extraneous read being
performed just because the rx buffer cannot be entirely transferred
to the stream layer:
sendto(12, "GET /?s=16k HTTP/1.1\r\nhost: 127."..., 84, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 84
recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 16320, 0, NULL, NULL) = 16320
recvfrom(12, ".
123456789.12345", 16, 0, NULL, NULL) = 16
recvfrom(12, "6789.
123456789.
12345678\n.
1234567"..., 15244, 0, NULL, NULL) = 182
recvfrom(12, 0x1e5d5d6, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
Here the server sends 16kB of payload after a headers block, the mux reads
16320 bytes into the ibuf, and the stream layer consumes 15360 of them during
the first h1_rcv_buf() call, which leaves 960 bytes in the buffer and releases
a few indexes. The buffer cannot be realigned because of this remaining data,
and a subsequent read of 16 bytes is performed, then another one of 182 bytes.
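To make this accounting concrete, here is a small standalone sketch (not
HAProxy code; the 16384/1024 values are assumed defaults for
global.tune.bufsize and global.tune.maxrewrite) reproducing the arithmetic
seen in the trace:

  #include <stdio.h>

  int main(void)
  {
      /* assumed defaults: global.tune.bufsize=16384, global.tune.maxrewrite=1024 */
      const int bufsize    = 16384;
      const int maxrewrite = 1024;

      int first_read = 16320;                  /* pulled into the ibuf (from the trace above) */
      int max_xfer   = bufsize - maxrewrite;   /* 15360: most the stream layer takes, to keep the reserve */
      int leftover   = first_read - max_xfer;  /* 960 bytes stranded in the ibuf */

      printf("max_xfer=%d leftover=%d\n", max_xfer, leftover);

      /* With 960 bytes still in place, the ibuf can neither be released nor
       * realigned, so the following recvfrom() calls can only fill the small
       * gaps freed by the consumed headers: the 16- and 182-byte reads above.
       */
      return 0;
  }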
By not reading too much on the first call, we avoid needlessly
filling this buffer:
recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 15360, 0, NULL, NULL) = 15360
recvfrom(12, "456789.
123456789.
123456789.12345"..., 16220, 0, NULL, NULL) = 1158
recvfrom(12, 0x1d52a3a, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
This is much more efficient and uses less RAM since the first buffer that
was emptied can now be released.
Note that a further improvement (tested) consists in reading even less
(typically 1kB) so that most of the data is transferred in zero-copy and
is not read until process_stream() is scheduled. This patch doesn't do that
for now so that it can be backported without any obscure side effects.
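For illustration only, here is a rough standalone sketch of what that further
(unapplied) change could look like; the helper name, the 1kB hint and the
argument list are hypothetical, not actual mux-h1 code:

  #include <stdio.h>
  #include <stddef.h>

  #define FIRST_READ_HINT 1024  /* ~1kB, the value mentioned above (hypothetical) */

  /* Cap the very first read so that only about a header block is pulled in,
   * leaving the bulk of the payload in the kernel socket buffer so it can be
   * transferred in zero-copy once process_stream() has run.
   */
  static size_t cap_first_read(size_t room, size_t bufsize, size_t maxrewrite)
  {
      size_t cap = bufsize - maxrewrite;  /* what this patch already enforces */

      if (room > cap)
          room = cap;
      if (room > FIRST_READ_HINT)         /* the further, unapplied reduction */
          room = FIRST_READ_HINT;
      return room;
  }

  int main(void)
  {
      printf("%zu\n", cap_first_read(16320, 16384, 1024));  /* prints 1024 */
      return 0;
  }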
 	if (b_data(&h1c->ibuf) > 0 && b_data(&h1c->ibuf) < 128)
 		b_slow_realign_ofs(&h1c->ibuf, trash.area, sizeof(struct htx));
+	max = buf_room_for_htx_data(&h1c->ibuf);
+
 	/* avoid useless reads after first responses */
 	if (!h1c->h1s ||
 	    (!(h1c->flags & H1C_F_IS_BACK) && h1c->h1s->req.state == H1_MSG_RQBEFORE) ||
-	    ((h1c->flags & H1C_F_IS_BACK) && h1c->h1s->res.state == H1_MSG_RPBEFORE))
+	    ((h1c->flags & H1C_F_IS_BACK) && h1c->h1s->res.state == H1_MSG_RPBEFORE)) {
 		flags |= CO_RFL_READ_ONCE;
-	max = buf_room_for_htx_data(&h1c->ibuf);
+		/* we know that the first read will be constrained to a smaller
+		 * read by the stream layer in order to respect the reserve.
+		 * Reading too much will result in global.tune.maxrewrite being
+		 * left at the end of the buffer, and in a very small read
+		 * being performed again to complete them (typically 16 bytes
+		 * freed in the index after headers were consumed) before
+		 * another larger read. Instead, given that we know we're
+		 * waiting for a header and we'll be limited, let's perform a
+		 * shorter first read that the upper layer can retrieve by just
+		 * a pointer swap and the next read will be doable at once in
+		 * an empty buffer.
+		 */
+		if (max > global.tune.bufsize - global.tune.maxrewrite)
+			max = global.tune.bufsize - global.tune.maxrewrite;
+	}
+
 	if (max) {
 		if (h1c->flags & H1C_F_IN_FULL) {
 			h1c->flags &= ~H1C_F_IN_FULL;