When an H2 mux works with a slow downstream connection and without the
mux-mux mode, it is possible that a single stream will allocate all 32
buffers in the connection. This is not desirable at all because 1) it
brings no value, and 2) it allocates a lot of memory per connection,
which, in addition to using a lot of memory, tends to degrade performance
due to cache thrashing.
This patch improves the situation by refraining from sending data frames
over a connection when more mbufs than streams are allocated. On a test
featuring 10k connections each with a single stream reading from the
cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs,
and improves the bandwidth. This may even be backported later to recent
versions to improve memory usage. Note however that it is efficient only
when combined with
e16762f8a ("OPTIM: mux-h2: call h2_send() directly
from h2_snd_buf()"), and tends to slightly reduce the single-stream
performance without it, so in case of a backport, the two need to be
considered together.
mbuf = br_tail(h2c->mbuf);
retry:
+ if (br_count(h2c->mbuf) > h2c->nb_streams) {
+ /* more buffers than streams allocated, pointless
+ * to continue, we'd use more RAM for no reason.
+ */
+ h2s->flags |= H2_SF_BLK_MROOM;
+ TRACE_STATE("waiting for room in output buffer", H2_EV_TX_FRAME|H2_EV_TX_DATA|H2_EV_H2S_BLK, h2c->conn, h2s);
+ goto end;
+ }
+
if (!h2_get_buf(h2c, mbuf)) {
h2c->flags |= H2_CF_MUX_MALLOC;
h2s->flags |= H2_SF_BLK_MROOM;