MEDIUM: mux-h2: do not needlessly refrain from sending data early
The mux currently refrains from sending data before H2_CS_FRAME_H, i.e.
before the peer's SETTINGS frame was received. While it makes sense on
the frontend, it's causing harm on the backend because it forces the
first request to be sent in two halves over an extra RTT: first the
preface and settings, second the request once the settings are received.
This is totally contrary to the philosophy of the H2 protocol, consisting
in permitting the client to send as soon as possible.
Actually what happens is the following:
- process_stream() calls connect_server()
- connect_server() creates a connection, and if the proto/alpn is guessed
or known, the mux is instantiated for the current request.
- the H2 init code wakes the h2 tasklet up and returns
- process_stream() tries to send the request using h2_snd_buf(), but that
one sees that we're before H2_CS_FRAME_H, refrains from doing so and
returns.
- process_stream() subscribes and quits
- the h2 tasklet can now execute to send the preface and settings, which
leave as a first TCP segment. The connection is ready.
- the iocb is woken again once the server's SETTINGS frame is received,
turning the connection to the H2_CS_FRAME_H state, and the iocb wake
up process_stream().
- process_stream() executes again and can try to send again.
- h2_snd_buf() is called and finally sends the request as a second TCP
segment.
Not only this is inefficient, but it also renders 0-RTT and TFO impossible
on H2 connections. When 0-RTT is used, only the preface and settings leave
as early data (the very first data of that connection), which is totally
pointless.
In order to fix this, we have to go through a few steps:
- first we need to let data be sent to a server immediately after the
SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of
H2_CS_FRAME_H). However, some protocol extensions are advertised by
the server using SETTINGS (e.g. RFC8441) and some requests might need
to know the existence of such extensions. For this reason we're adding
a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some
operations were not done because a server's SETTINGS frame is needed.
This is set when trying to send a protocol upgrade or extended CONNECT
during H2_CS_SETTINGS1, indicating that it's needed to wait for
H2_CS_FRAME_H in this case. The flag is always set on frontend
connections. This is what is being done in this patch.
- second, we need to be able to push the preface opportunistically with
the first h2_snd_buf() so that it's not needed to wake the tasklet up
just to send that and wake process_stream() again. This will be in a
separate patch.
By doing the first step, we're at least saving one needless tasklet
wakeup per connection (~9%), which results in ~5% backend connection
rate increase.