Amaury Denoyelle [Tue, 24 May 2022 15:22:07 +0000 (17:22 +0200)]
MINOR: h3: abort read on unknown uni stream
As specified by the HTTP/3 draft, an unknown unidirectional stream can be
aborted. To do this, use a new flag QC_SF_READ_ABORTED. When the MUX
detects this flag, the QCS instance is automatically freed.
Previously, such streams were instead automatically drained. By aborting
them, we save some useless memcpy calls. On future data
reception, the QCS instance is not found in the tree and is considered as
already closed. The frame payload is thus discarded without copying it.
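The behaviour described above can be illustrated with a minimal sketch. It is not the MUX code: the fixed array stands in for the real stream tree, and every name (struct stream, stream_recv(), release_aborted(), the value of SF_READ_ABORTED) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SF_READ_ABORTED 0x1u   /* hypothetical flag value for this sketch */
#define MAX_STREAMS     4

struct stream {
    int used;                /* slot in use */
    unsigned long long id;   /* QUIC stream ID */
    unsigned int flags;
    char buf[64];            /* receive buffer */
    size_t len;              /* bytes stored in buf */
};

/* A flat array replaces the real lookup tree to keep the sketch short. */
static struct stream streams[MAX_STREAMS];

static struct stream *stream_find(unsigned long long id)
{
    size_t i;

    for (i = 0; i < MAX_STREAMS; i++)
        if (streams[i].used && streams[i].id == id)
            return &streams[i];
    return NULL;
}

/* Releases every stream flagged as read-aborted. */
static void release_aborted(void)
{
    size_t i;

    for (i = 0; i < MAX_STREAMS; i++)
        if (streams[i].used && (streams[i].flags & SF_READ_ABORTED))
            streams[i].used = 0;
}

/* Returns the number of bytes copied. Data for a stream which is no
 * longer known is dropped without any copy.
 */
static size_t stream_recv(unsigned long long id, const char *data, size_t len)
{
    struct stream *s = stream_find(id);

    if (!s)
        return 0;
    if (len > sizeof(s->buf) - s->len)
        len = sizeof(s->buf) - s->len;
    memcpy(s->buf + s->len, data, len);
    s->len += len;
    return len;
}
```

Once a stream has been flagged and released, later calls to stream_recv() for the same ID return 0 and never reach memcpy().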
Amaury Denoyelle [Tue, 24 May 2022 14:27:41 +0000 (16:27 +0200)]
CLEANUP: h3: remove h3 uni tasklet
Remove all unnecessary bits of code for H3 unidirectional streams. Most
notably, an individual tasklet is not required anymore for each stream.
This has been useless since the merge of RX/TX uni streams handling with
bidirectional streams code.
Amaury Denoyelle [Tue, 24 May 2022 13:26:07 +0000 (15:26 +0200)]
MEDIUM: quic: refactor uni streams RX
The whole QUIC stack is impacted by this change:
* at quic-conn level, a single function is now used to handle uni and
bidirectional streams. It uses the qcc_recv() function from the MUX.
* at MUX level, the qc_recv() io-handler function does not skip uni streams
* most changes are conducted at the app layer. Most notably, all received
data is handled by the decode_qcs operation.
Now that decode_qcs is the single app read function, the H3 layer can be
simplified. Uni streams parsing was extracted from h3_attach_ruqs() to
h3_decode_qcs().
h3_decode_qcs() is able to deal with all HTTP/3 frame types. It first
checks if the frame is valid for the H3 stream type. Most notably,
SETTINGS parsing was moved from h3_control_recv() into h3_decode_qcs().
This commit has some major benefits besides removing duplicated code.
Mainly, QUIC flow control is now enforced for uni streams as with bidi
streams. Also, an unknown frame received on control stream does not set
an error : it is now silently ignored as required by the specification.
Some cleaning in H3 code is already done with this patch:
h3_control_recv() and h3_attach_ruqs() are removed as they are now
unused. A final patch should clean up the unneeded remaining bits.
Amaury Denoyelle [Tue, 24 May 2022 13:25:19 +0000 (15:25 +0200)]
MINOR: h3: define non-h3 generic parsing function
Define a new function h3_parse_uni_stream_no_h3(). It can be used to
handle the payload of streams which do not convey H3 frames. This is
mainly useful for QPACK encoder/decoder streams. It can also be used for
a stream of unknown type which should be drained without parsing it.
This patch is useful to extract code in a dedicated function. It will be
simple to reuse it in h3_decode_qcs() when uni-streams reception is
unified with bidirectional streams, without using a dedicated stream tasklet.
Amaury Denoyelle [Tue, 24 May 2022 13:24:32 +0000 (15:24 +0200)]
MINOR: h3: check if frame is valid for stream type
Define a new function h3_is_frame_valid(). It returns whether a frame is
valid or not depending on the stream which received it.
For the moment, it is used in h3_decode_qcs() which only deals with
bidirectional streams. Soon, uni streams will use the same function,
rendering the frame type check useful.
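A possible shape for such a check is sketched below. It only follows the frame/stream table of RFC 9114 and is not the actual h3_is_frame_valid() code; the enum, the macro names and the function name are hypothetical, and only request and control streams are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stream kinds, for illustration only. */
enum stream_kind {
    STREAM_KIND_REQUEST,  /* bidirectional request stream */
    STREAM_KIND_CONTROL,  /* unidirectional control stream */
};

/* HTTP/3 frame type codes as listed in RFC 9114. */
#define FT_DATA         0x00
#define FT_HEADERS      0x01
#define FT_CANCEL_PUSH  0x03
#define FT_SETTINGS     0x04
#define FT_PUSH_PROMISE 0x05
#define FT_GOAWAY       0x07
#define FT_MAX_PUSH_ID  0x0d

/* Returns true if a frame of type <ftype> may appear on a stream of kind
 * <kind>. Unknown frame types are accepted so that the caller can skip
 * them instead of raising an error.
 */
static bool frame_valid_for_stream(enum stream_kind kind, uint64_t ftype)
{
    switch (ftype) {
    case FT_DATA:
    case FT_HEADERS:
    case FT_PUSH_PROMISE:
        return kind == STREAM_KIND_REQUEST;
    case FT_CANCEL_PUSH:
    case FT_SETTINGS:
    case FT_GOAWAY:
    case FT_MAX_PUSH_ID:
        return kind == STREAM_KIND_CONTROL;
    default:
        return true;
    }
}
```

A caller would typically report a connection error when this returns false. A real implementation would also have to account for the role of the endpoint, which this sketch ignores.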
Amaury Denoyelle [Tue, 24 May 2022 13:14:53 +0000 (15:14 +0200)]
MINOR: h3: refactor uni streams initialization
Define a new function h3_init_uni_stream(). This can be used to read the
stream type of a unidirectional stream. There is no functional change
with the previous code.
This patch will be useful to unify reception for uni streams with
bidirectional ones.
Amaury Denoyelle [Tue, 24 May 2022 13:24:03 +0000 (15:24 +0200)]
MINOR: h3: define stream type
Define a new enum h3s_t. This is used to differentiate between the
different stream types used in a HTTP/3 connection, including the QPACK
encoder/decoder streams.
For the moment, only the bidirectional stream type is set. This patch
will be useful to unify reception of uni streams with bidirectional
ones.
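As an illustration of what such an enum may look like, here is a minimal sketch. The member names are hypothetical and are not the members of h3s_t; the numeric stream type values on the wire are those defined by the HTTP/3 and QPACK specifications.

```c
#include <stdint.h>

/* Hypothetical stream types, including one for bidirectional streams
 * and one for unidirectional streams of unknown type.
 */
enum stream_type {
    ST_BIDI,           /* request stream, no type byte on the wire */
    ST_CONTROL,        /* uni stream type 0x00 */
    ST_PUSH,           /* uni stream type 0x01 */
    ST_QPACK_ENCODER,  /* uni stream type 0x02 */
    ST_QPACK_DECODER,  /* uni stream type 0x03 */
    ST_UNKNOWN,        /* any other uni stream type */
};

/* Maps the type value read at the start of a unidirectional stream to
 * the enum above.
 */
static enum stream_type classify_uni_stream(uint64_t wire_type)
{
    switch (wire_type) {
    case 0x00: return ST_CONTROL;
    case 0x01: return ST_PUSH;
    case 0x02: return ST_QPACK_ENCODER;
    case 0x03: return ST_QPACK_DECODER;
    default:   return ST_UNKNOWN;
    }
}
```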
Amaury Denoyelle [Mon, 23 May 2022 12:25:53 +0000 (14:25 +0200)]
MINOR: h3/qpack: use qcs as type in decode callbacks
Replace h3_uqs type by qcs in stream callbacks. This change is done in
the context of unification between bidi and uni-streams. h3_uqs type
will be unneeded when this is achieved.
Amaury Denoyelle [Mon, 23 May 2022 09:39:14 +0000 (11:39 +0200)]
BUG/MINOR: mux-quic: refactor uni streams TX/send H3 SETTINGS
Remove the unneeded skip over unidirectional streams in qc_send(). This
unifies sending for both uni and bidi streams.
In fact, the only local unidirectional stream in use for the moment is
the H3 Control stream responsible for SETTINGS emission. The frame was
already properly generated in qcs.tx.buf, but not sent due to the stream
skip in qc_send(). Now, there is no need to ignore uni streams so remove
this condition.
This fixes the emission of H3 SETTINGS, which is now properly performed.
Uni and bidi streams use the same set of functions for sending. One of
the most notable gains is that flow-control is now enforced for uni
streams.
Amaury Denoyelle [Mon, 23 May 2022 14:12:49 +0000 (16:12 +0200)]
MINOR: mux-quic: emit STREAM_STATE_ERROR in qcc_recv
Emit STREAM_STATE_ERROR connection error in two cases :
* if receiving data for send-only stream
* if receiving data on a locally initiated stream not open yet
For the moment the first case cannot be encountered as uni streams
reception does not use qcc_recv(). However, this will soon be
implemented with the unification between bidi and uni streams.
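The two cases can be derived from the two low bits of the QUIC stream ID, as sketched below from the server's point of view. This is an illustration and not the qcc_recv() code: the function names and the <next_local_bidi> counter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a QUIC stream ID (RFC 9000, section 2.1). */
#define STREAM_ID_SERVER_INIT 0x1  /* set when the server initiated the stream */
#define STREAM_ID_UNI         0x2  /* set for unidirectional streams */

/* On the server side, a stream is send-only when it is unidirectional
 * and locally initiated.
 */
static bool server_stream_is_send_only(uint64_t id)
{
    return (id & STREAM_ID_UNI) && (id & STREAM_ID_SERVER_INIT);
}

/* Returns true if receiving STREAM data on <id> must be reported as
 * STREAM_STATE_ERROR. <next_local_bidi> is a hypothetical counter holding
 * the ID of the next server-initiated bidirectional stream not opened yet.
 */
static bool recv_is_stream_state_error(uint64_t id, uint64_t next_local_bidi)
{
    if (server_stream_is_send_only(id))
        return true;
    /* locally initiated bidirectional stream which was never opened */
    if ((id & STREAM_ID_SERVER_INIT) && !(id & STREAM_ID_UNI) &&
        id >= next_local_bidi)
        return true;
    return false;
}
```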
Amaury Denoyelle [Tue, 24 May 2022 09:13:46 +0000 (11:13 +0200)]
MINOR: h3: reject too big frames
The whole frame payload must have been received to demux an H3 frame,
except for H3 DATA which can be fragmented into multiple HTX blocks.
If the frame is bigger than the buffer and is not a DATA frame, a
connection error is reported with error H3_EXCESSIVE_LOAD.
This should be completed in the future with the H3 settings to limit the
size of uncompressed header section.
This code is more generic: it can handle every H3 frame. This is done
in order to be able to use h3_decode_qcs() to demux both uni and bidir
streams.
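The size check described in this commit can be sketched as follows. The function name and its parameters are hypothetical; the frame type and error code values are those of RFC 9114.

```c
#include <stddef.h>
#include <stdint.h>

#define FT_DATA            0x00    /* HTTP/3 DATA frame type */
#define ERR_NO_ERROR       0x0100  /* H3_NO_ERROR */
#define ERR_EXCESSIVE_LOAD 0x0107  /* H3_EXCESSIVE_LOAD */

/* Returns ERR_EXCESSIVE_LOAD when a non-DATA frame of <flen> bytes can
 * never fit entirely in a buffer of <bufsize> bytes, ERR_NO_ERROR
 * otherwise. DATA frames are exempt as they can be processed by chunks.
 */
static unsigned int check_frame_size(uint64_t ftype, uint64_t flen, size_t bufsize)
{
    if (ftype != FT_DATA && flen > bufsize)
        return ERR_EXCESSIVE_LOAD;
    return ERR_NO_ERROR;
}
```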
Amaury Denoyelle [Tue, 24 May 2022 12:47:48 +0000 (14:47 +0200)]
MINOR: mux-quic: disable read on CONNECTION_CLOSE emission
Similar to sending, read operations are disabled when a CONNECTION_CLOSE
frame has been emitted.
Most notably, this prevents unneeded demuxing loops when the H3 layer has
issued an error and cannot process the buffer payload anymore.
Note that read is not prevented for unidirectional streams for the
moment. This will be supported soon with the unification of bidir and uni
streams treatment.
Amaury Denoyelle [Mon, 23 May 2022 14:12:15 +0000 (16:12 +0200)]
MINOR: quic: support CONNECTION_CLOSE_APP emission
Complete quic-conn API for error reporting. A new parameter <app> is
defined in the function quic_set_connection_close(). This will transform
the frame into a CONNECTION_CLOSE_APP type.
This type of frame will be generated by the application layer, h3 or
hq-interop for the moment. A new function qcc_emit_cc_app() is exported
by the MUX layer for them.
Amaury Denoyelle [Tue, 24 May 2022 13:06:10 +0000 (15:06 +0200)]
MINOR: h3: refactor h3_control_send()
The only change is that the H3_CF_SETTINGS_SENT flag if-condition is
replaced by a BUG_ON statement. This may help to catch multiple calls to
h3_control_send() instead of silently ignoring them.
Amaury Denoyelle [Tue, 24 May 2022 14:30:11 +0000 (16:30 +0200)]
BUG/MINOR: h3: prevent overflow when parsing SETTINGS
h3_parse_settings_frm() reads one byte past the frame payload. Fix the
parsing code. In most cases, this has no impact as we are inside an
allocated buffer but it could cause a segfault depending on the buffer
alignment.
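One way to keep such a parser within the frame payload is sketched below. It is not the code of h3_parse_settings_frm(): both functions are hypothetical helpers. The key point is that each read is bounded by the number of bytes remaining, so nothing is read when the payload is exhausted.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one QUIC variable-length integer (RFC 9000, section 16).
 * Returns the number of bytes consumed, or 0 if fewer than the needed
 * bytes remain in <len>. Nothing is read when <len> is 0.
 */
static size_t varint_decode(const unsigned char *buf, size_t len, uint64_t *out)
{
    size_t need, i;
    uint64_t val;

    if (len == 0)
        return 0;
    need = (size_t)1 << (buf[0] >> 6);   /* 1, 2, 4 or 8 bytes */
    if (len < need)
        return 0;
    val = buf[0] & 0x3f;
    for (i = 1; i < need; i++)
        val = (val << 8) | buf[i];
    *out = val;
    return need;
}

/* Walks the identifier/value pairs of a SETTINGS payload of <len> bytes.
 * Returns the number of pairs, or -1 if the payload is truncated.
 */
static int settings_count(const unsigned char *buf, size_t len)
{
    size_t pos = 0, ret;
    uint64_t id, value;
    int count = 0;

    while (pos < len) {
        ret = varint_decode(buf + pos, len - pos, &id);
        if (!ret)
            return -1;
        pos += ret;
        ret = varint_decode(buf + pos, len - pos, &value);
        if (!ret)
            return -1;
        pos += ret;
        count++;
    }
    return count;
}
```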
Amaury Denoyelle [Tue, 24 May 2022 12:55:43 +0000 (14:55 +0200)]
CLEANUP: h3: rename struct h3 -> h3c
struct h3 represents the whole HTTP/3 connection. A new type h3s was
recently introduced to represent a single HTTP/3 stream. To facilitate
the analogy with other haproxy code, most notably in the MUX, rename the
h3 type to h3c.
Amaury Denoyelle [Tue, 24 May 2022 14:53:56 +0000 (16:53 +0200)]
MINOR: mux-quic: delay cs_endpoint allocation
Do not allocate cs_endpoint for every QCS instance in qcs_new().
Instead, this is delayed to the qc_attach_cs() function.
In effect, with H3 as app protocol, cs_endpoint will be allocated on
HEADERS parsing. Thus, no cs_endpoint is allocated for H3 unidirectional
streams which do not convey any HTTP data.
Amaury Denoyelle [Tue, 24 May 2022 16:14:28 +0000 (18:14 +0200)]
MINOR: h3: mark ncbuf as const on h3_b_dup
h3_b_dup() is used to obtain a ncbuf representation into a struct
buffer. ncbuf can thus be marked as a const parameter. This will allow
functions which already manipulate a const ncbuf to use it.
BUG/MINOR: task: Don't defer tasks release when HAProxy is stopping
A running or queued task is not released when task_destroy() is called,
except if it is the current task. Its process function is set to NULL and we
let the scheduler release the task. However, when HAProxy is stopping, this
never happens and some tasks may leak. To fix the issue, we now also rely on
the global MODE_STOPPING flag. When this flag is set, the task is always
immediately released.
This patch should fix the issue #1714. It could be backported as far as 2.4
but it's not a real problem in practice because it only happens on
deinit. The leak exists on previous versions but not the MODE_STOPPING flag.
Emeric Brun [Wed, 25 May 2022 08:25:45 +0000 (10:25 +0200)]
BUG/MEDIUM: peers: prevent unitialized multiple listeners on peers section
The previous fix:
BUG/MEDIUM: peers: fix segfault using multiple bind on peers
prevents declaring multiple listeners in a peers section, but if the
peers protocol is extended to support this, the bug could resurface.
Indeed, after allocating a new listener and adding it to a list, the
code mistakenly re-configures the first element of the list instead
of the newly added one, and the last one finally remains uninitialized.
The previous fix ensures there is no more than one listener in this
list, but this could change in the future.
This patch ensures we configure and initialize the newly added
listener instead of the first one in the list.
This patch could be backported as far as version 2.0 to complete
BUG/MEDIUM: peers: fix segfault using multiple bind on peers
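The nature of the mistake can be shown with a minimal sketch. It uses a hypothetical singly linked list and a hypothetical <configured> field, not the actual listener structures or list macros.

```c
#include <stddef.h>

/* Hypothetical, much simplified listener. */
struct listener_sketch {
    int configured;
    struct listener_sketch *next;
};

/* Appends <l> at the tail of the list headed by <*head>. */
static void list_append(struct listener_sketch **head, struct listener_sketch *l)
{
    l->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = l;
}

/* Adds <l> to the list then configures it. The buggy variant configured
 * the first element of the list (*head) at this point, leaving <l>
 * uninitialized whenever the list already held an element.
 */
static void add_and_configure(struct listener_sketch **head, struct listener_sketch *l)
{
    list_append(head, l);
    l->configured = 1;
}
```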
Emeric Brun [Wed, 25 May 2022 08:12:07 +0000 (10:12 +0200)]
BUG/MEDIUM: peers: fix segfault using multiple bind on peers sections
If multiple "bind" lines were present in the "peers" section, multiple
listeners were added to a list but the code mistakenly re-configured
the first listener of the list instead of
the newly created one. The last one remained uninitialized, causing a null
dereference as soon as a connection was received.
In addition, the 'peers' sections and protocol are not currently designed to
handle multiple listeners.
This patch checks if there is already a listener configured in the 'peers'
section when we want to create a new one. An error is raised if
a listener is already present, showing the file and line in the error
message.
To keep the file and line number of the previous listener available
for the error message, the 'bind_conf_uniq_alloc' function was modified
to keep the file/line data from when the struct 'bind_conf' was first
allocated (previously it was updated each time the 'bind_conf' was
reused).
BUG/MEDIUM: resolvers: Don't defer resolutions release in deinit function
The resolvers_deinit() function is called on error, during the post-parsing
stage, or on deinit, when HAProxy is stopped. It releases all entities: resolvers,
resolutions and SRV requests. There is no reason to defer the resolutions
release by moving them in the death_row list because this function is
terminal. And it is in fact a bug. Resolutions must not be released at the
end of the function because resolvers were already freed. However some
resolutions may still be attached to a resolver. Thus, when we try to remove
it from the resolver's tree, in resolv_reset_resolution(), this resolver was
already released.
So now, resolutions are immediately released. It means there is no more
reason to track this function. Calls to
enter_resolver_code()/leave_resolver_code() have been removed.
This patch should fix the issue #1680 and may be related to #1485. It must
be backported as far as 2.2.
Willy Tarreau [Tue, 24 May 2022 13:34:26 +0000 (15:34 +0200)]
MEDIUM: h1: enlarge the scope of accepted version chars with accept-invalid-http-request
We used to support both RTSP and HTTP protocol version names with and
without accept-invalid-http-request, but since this is based on the
characters themselves, any protocol made of chars {0-9/.HPRST} was
possible and not others. Now that such non-standard protocols are
restricted to accept-invalid-http-request, there's no reason for not
allowing other letters. With this patch, characters {0-9./A-Z} are
permitted when the option is set.
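The relaxed character set can be expressed as a small predicate, as sketched below. The function names are hypothetical and this is not the H1 parser itself, only an illustration of the set {0-9./A-Z} applied when the option is set.

```c
#include <stdbool.h>

/* Returns true if <c> belongs to the relaxed set {0-9./A-Z}. */
static bool version_char_ok(unsigned char c)
{
    return (c >= '0' && c <= '9') || c == '.' || c == '/' ||
           (c >= 'A' && c <= 'Z');
}

/* Returns true if the non-empty string <s> is only made of characters
 * from the relaxed set.
 */
static bool version_token_ok(const char *s)
{
    if (!*s)
        return false;
    for (; *s; s++)
        if (!version_char_ok((unsigned char)*s))
            return false;
    return true;
}
```

With this predicate, names such as "RTSP/1.0" or "ICAP/1.0" pass, while a lowercase "http/1.1" does not.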
This patch hardens the verification of the HTTP/1.x version line
(i.e. the first line within an HTTP/1.x request) to verify that
the protocol name within the version actually reads "HTTP".
Previously protocols that superficially resembled the wire-format
of HTTP/1.x and having a 4-letter acronym as the protocol name, such
as RTSP would pass this check.
This patch fixes GitHub issue #540, it must be backported to all
supported versions. The legacy, non-HTX parser is affected as well,
a fix must be created for it as well.
Note that such protocols can still be used when option
accept-invalid-http-request is set.
Willy Tarreau [Tue, 24 May 2022 05:43:57 +0000 (07:43 +0200)]
CLEANUP: init: address a coverity warning about possible multiply overflow
In issue #1585 Coverity suspects a risk of multiply overflow when
calculating the SSL cache size, though in practice the cache is
limited to 2^32 anyway thus it cannot really happen. Nevertheless,
casting the operation should be sufficient to avoid marking it as a
false positive.
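The general pattern behind this kind of warning is sketched below with hypothetical names, unrelated to the actual SSL cache variables: when both operands are 32-bit, the product is computed on 32 bits and may wrap before being widened, whereas casting one operand first makes the multiply happen on 64 bits.

```c
#include <stdint.h>

/* Both operands are 32-bit: without the cast, the multiply would be
 * performed on 32 bits and only the truncated result would be widened.
 */
static uint64_t total_bytes(uint32_t blocks, uint32_t block_size)
{
    return (uint64_t)blocks * block_size;
}
```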
This patch was useful mainly for the docker image of QUIC interop to
have traces on stdout.
A better solution has been found by integrating this patch directly in
the qns repository which is used to build the docker image. Thus, this
hack is not required anymore in the main repository.
Amaury Denoyelle [Mon, 23 May 2022 06:52:58 +0000 (08:52 +0200)]
BUG/MEDIUM: mux-quic: adjust buggy proxy closing support
The wake handler detects if the frontend is closed. This can happen if
the proxy has been disabled individually or even on process soft-stop.
Before this patch, in this condition QCS instances were freed before
being detached from the cs_endpoint. This clearly violates the haproxy
connection architecture and causes a BUG_ON statement crash in cs_free().
To handle this properly, cs_endpoint is notified by setting RD_SH|WR_SH
on connection flags. The cs_endpoint will thus use the detach operation
which allows the QCS instance to be freed.
This code allows the soft-stop process to complete as soon as possible.
However, the client is not notified about the connection closing. It
should be done by emitting a H3 GOAWAY + CONNECTION_CLOSE. Sadly, this
is impossible at this stage because the listener sockets are closed so
the quic-conn cannot use it to emit new frames. At this stage the client
will most probably detect connection closing on its idle timeout
expiration.
Thus, to completely support proxy closing/soft-stop, important
architecture changes are required in QUIC socket management. This is
also linked with the reload feature.
Tim Duesterhus [Sun, 22 May 2022 10:40:58 +0000 (12:40 +0200)]
CLEANUP: tools: Clean up non-QUIC error message handling in str2sa_range()
If QUIC support is enabled, both branches of the ternary conditional are
identical, upsetting Coverity. Move the full conditional into the non-QUIC
preprocessor branch to make the code clearer.
Willy Tarreau [Fri, 20 May 2022 21:31:51 +0000 (23:31 +0200)]
[RELEASE] Released version 2.6-dev11
Released version 2.6-dev11 with the following main changes :
- CI: determine actual LibreSSL version dynamically
- BUG/MEDIUM: ncbuf: fix null buffer usage
- MINOR: ncbuf: fix warnings for testing build
- MEDIUM: http-ana: Add a proxy option to restrict chars in request header names
- MEDIUM: ssl: Delay random generator initialization after config parsing
- MINOR: ssl: Add 'ssl-propquery' global option
- MINOR: ssl: Add 'ssl-provider' global option
- CLEANUP: Add missing header to ssl_utils.c
- CLEANUP: Add missing header to hlua_fcn.c
- CLEANUP: Remove unused function hlua_get_top_error_string
- BUILD: fix build warning on solaris based systems with __maybe_unused.
- MINOR: tools: add get_exec_path implementation for solaris based systems.
- BUG/MINOR: ssl: Fix crash when no private key is found in pem
- CLEANUP: conn-stream: Remove cs_applet_shut declaration from header file
- MINOR: applet: Prepare appctx to own the session on frontend side
- MINOR: applet: Let the frontend appctx release the session
- MINOR: applet: Change return value for .init callback function
- MINOR: stream: Export stream_free()
- MINOR: applet: Add appctx_init() helper fnuction
- MINOR: applet: Add a function to finalize frontend appctx startup
- MINOR: applet: Add function to release appctx on error during init stage
- MEDIUM: dns: Refactor dns appctx creation
- MEDIUM: spoe: Refactor SPOE appctx creation
- MEDIUM: lua: Refactor cosocket appctx creation
- MEDIUM: httpclient: Refactor http-client appctx creation
- MINOR: sink: Add a ref to sink in the sink_forward_target structure
- MEDIUM: sink: Refactor sink forwarder appctx creation
- MINOR: peers: Add a ref to peers section in the peer structure
- MEDIUM: peers: Refactor peer appctx creation
- MINOR: applet: Add API to start applet on a thread subset
- MEDIUM: applet: Add support for async appctx startup on a thread subset
- MINOR: peers: Track number of applets run by thread
- MEDIUM: peers: Balance applets across threads
- MINOR: conn-stream/applet: Stop setting appctx as the endpoint context
- CLEANUP: proxy: Remove dead code when parsing "http-restrict-req-hdr-names" option
- REGTESTS: abortonclose: Fix some race conditions
- MINOR: ssl: Add 'ssl-provider-path' global option
- CLEANUP: http_ana: Make use of the return value of stream_generate_unique_id()
- BUG/MINOR: spoe: Fix error handling in spoe_init_appctx()
- CLEANUP: peers: Remove unreachable code in peer_session_create()
- CLEANUP: httpclient: Remove useless test on ss_dst in httpclient_applet_init()
- BUG/MEDIUM: quic: fix Rx buffering
- OPTIM: quic: realign empty Rx buffer
- BUG/MINOR: ncbuf: fix ncb_is_empty()
- MINOR: ncbuf: refactor ncb_advance()
- BUG/MINOR: mux-quic: update session's idle delay before stream creation
- MINOR: h3: do not wait a complete frame for demuxing
- MINOR: h3: flag demux as full on HTX full
- MEDIUM: mux-quic: implement recv on io-cb
- MINOR: mux-quic: remove qcc_decode_qcs() call in XPRT
- MINOR: mux-quic: reorganize flow-control frames emission
- MINOR: mux-quic: implement MAX_STREAM_DATA emission
- MINOR: mux-quic: implement MAX_DATA emission
- BUG/MINOR: mux-quic: support nul buffer with qc_free_ncbuf()
- MINOR: mux-quic: free RX buf if empty
- BUG/MEDIUM: config: Reset outline buffer size on realloc error in readcfgfile()
- BUG/MINOR: check: Reinit the buffer wait list at the end of a check
- MEDIUM: check: No longer shutdown the connection in .wake callback function
- REORG: check: Rename and export I/O callback function
- MEDIUM: check: Use the CS to handle subscriptions for read/write events
- BUG/MINOR: quic: break for error on sendto
- MINOR: quic: abort on unlisted errno on sendto()
- MINOR: quic: detect EBADF on sendto()
- BUG/MEDIUM: quic: fix initialization for local/remote TPs
- CLEANUP: quic: adjust comment/coding style for TPs init
- BUG/MINOR: cfgparse: abort earlier in case of allocation error
- MINOR: quic: Dump initial derived secrets
- MINOR: quic_tls: Add quic_tls_derive_retry_token_secret()
- MINOR: quic_tls: Add quic_tls_decrypt2() implementation
- MINOR: quic: Retry implementation
- MINOR: cfgparse: Update for "cluster-secret" keyword for QUIC Retry
- MINOR: quic: Move quic_lstnr_dgram_dispatch() out of xprt_quic.c
- BUILD: stats: Missing headers inclusions from stats.h
- MINOR: quic_stats: Add a new stats module for QUIC
- MINOR: quic: Attach proxy QUIC stats counters to the QUIC connection
- BUG/MINOR: quic: Fix potential memory leak during QUIC connection allocations
- MINOR: quic: QUIC stats counters handling
- MINOR: quic: Add tune.quic.retry-threshold keyword
- MINOR: quic: Dynamic Retry implementation
- MINOR: quic/mux-quic: define CONNECTION_CLOSE send API
- MINOR: mux-quic: emit FLOW_CONTROL_ERROR
- MINOR: mux-quic: emit STREAM_LIMIT_ERROR
- MINOR: mux-quic: close connection on error if different data at offset
- BUG/MINOR: peers: fix error reporting of "bind" lines
- CLEANUP: config: improve address parser error report for unmatched protocols
- CLEANUP: config: provide cleare hints about unsupported QUIC addresses
- MINOR: protocol: replace ctrl_type with xprt_type and clarify it
- MINOR: listener: provide a function to process all of a bind_conf's arguments
- MINOR: config: use the new bind_parse_args_list() to parse a "bind" line
- CLEANUP: listener: add a comment about what the BC_SSL_O_* flags are for
- MINOR: listener: add a new "options" entry in bind_conf
- CLEANUP: listener: replace all uses of bind_conf->is_ssl with BC_O_USE_SSL
- CLEANUP: listener: replace bind_conf->generate_cers with BC_O_GENERATE_CERTS
- CLEANUP: listener: replace bind_conf->quic_force_retry with BC_O_QUIC_FORCE_RETRY
- CLEANUP: listener: store stream vs dgram at the bind_conf level
- MINOR: listener: detect stream vs dgram conflict during parsing
- MINOR: listener: set the QUIC xprt layer immediately after parsing the args
- MINOR: listener/ssl: set the SSL xprt layer only once the whole config is known
- MINOR: connection: add flag MX_FL_FRAMED to mark muxes relying on framed xprt
- MINOR: config: detect and report mux and transport incompatibilities
- MINOR: listener: automatically select a QUIC mux with a QUIC transport
- MINOR: listener: automatically enable SSL if a QUIC transport is found
- BUG/MINOR: quic: Fixe a typo in qc_idle_timer_task()
- BUG/MINOR: quic: Missing <conn_opening> stats counter decrementation
- BUILD/MINOR: cpuset fix build for FreeBSD 13.1
- CI: determine actual OpenSSL version dynamically
David CARLIER [Wed, 18 May 2022 14:45:40 +0000 (15:45 +0100)]
BUILD/MINOR: cpuset fix build for FreeBSD 13.1
The cpuset API changes made for the future 14 release have been
backported to the 13.1 release, so the condition selecting the cpuset
API is changed accordingly.
When we receive a CONNECTION_CLOSE frame, we should decrement this counter
if the handshake state was not successful and if we have not received
a TLS alert from the TLS stack.
[WARNING] (17867) : config : Proxy 'decrypt': A certificate was specified but SSL was not enabled on bind 'quic4@:4449' at [quic-mini.cfg:24] (use 'ssl').
Let's automatically turn SSL on when QUIC is detected, as it doesn't
exist without SSL anyway. It solves the runtime issue, and also makes
sure it is not possible to accidentally configure a quic listener with
no certificate since the error is detected via the SSL checks.
A warning is emitted in this case, to encourage the user to fix the
configuration so that it remains reviewable.
Willy Tarreau [Fri, 20 May 2022 16:07:06 +0000 (18:07 +0200)]
MINOR: listener: automatically select a QUIC mux with a QUIC transport
When no mux protocol is configured on a bind line with "proto", and the
transport layer is QUIC, right now mux_h1 is being used, leading to a
crash.
Now when the transport layer of the bind line is already known as being
QUIC, let's automatically try to configure the QUIC mux, so that users
do not have to enter "proto quic" all the time while it's the only
supported option. This means that the following line now works:
Willy Tarreau [Fri, 20 May 2022 15:53:32 +0000 (17:53 +0200)]
MINOR: config: detect and report mux and transport incompatibilities
Till now, placing "proto h1" or "proto h2" on a "quic" bind or placing
"proto quic" on a TCP line would parse fine but would crash when traffic
arrived. The reason is that there's a strong binding between the QUIC
mux and QUIC transport and that they're not expected to be called with
other types at all.
Now that we have the mux's type and we know the type of the protocol used
on the bind conf, we can perform such checks. This now returns:
[ALERT] (16978) : config : frontend 'decrypt' : stream-based MUX protocol 'h2' is incompatible with framed transport of 'bind quic4@:4448' at [quic-mini.cfg:27].
[ALERT] (16978) : config : frontend 'decrypt' : frame-based MUX protocol 'quic' is incompatible with stream transport of 'bind :4448' at [quic-mini.cfg:29].
This config tightening is only tagged MINOR: such a config, despite not
reporting any error, cannot work at all, so even if this breaks
experimental configs, they were just waiting for a single connection
to crash.
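The compatibility rule can be summarized by a sketch like the following. The structure, the function and the numeric value given to MX_FL_FRAMED are hypothetical; only the flag name comes from the commits of this series.

```c
#include <stdbool.h>

#define MX_FL_FRAMED 0x1u   /* flag name from the series, value is hypothetical */

/* Hypothetical, minimal mux descriptor. */
struct mux_desc {
    const char *name;
    unsigned int flags;
};

/* A framed mux requires a framed transport and a stream-based mux
 * requires a stream transport: both sides must agree.
 */
static bool mux_xprt_compatible(const struct mux_desc *mux, bool xprt_is_framed)
{
    bool mux_is_framed = (mux->flags & MX_FL_FRAMED) != 0;

    return mux_is_framed == xprt_is_framed;
}
```

A config checker built on such a predicate would emit an alert like the ones quoted above whenever it returns false.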
MINOR: connection: add flag MX_FL_FRAMED to mark muxes relying on framed xprt
In order to be able to check compatibility between muxes and transport
layers, we'll need a new flag to tag muxes that work on framed transport
layers like QUIC. Only QUIC has this flag now.
Willy Tarreau [Fri, 20 May 2022 15:14:31 +0000 (17:14 +0200)]
MINOR: listener/ssl: set the SSL xprt layer only once the whole config is known
We used to preset XPRT_SSL on bind_conf->xprt when parsing the "ssl"
keyword, which required to be careful about what QUIC could have set
before, and which makes it impossible to consider the whole line to
set all options.
Now that we have the BC_O_USE_SSL option on the bind_conf, it becomes
easier to set XPRT_SSL only once the bind_conf's args are parsed.
Willy Tarreau [Fri, 20 May 2022 15:10:00 +0000 (17:10 +0200)]
MINOR: listener: set the QUIC xprt layer immediately after parsing the args
It used to be set when parsing the listeners' addresses but this comes
with some difficulties in that other places have to be careful not to
replace it (e.g. the "ssl" keyword parser).
Now we know what protocols a bind_conf line relies on, we can set it
after having parsed the whole line.
Willy Tarreau [Fri, 20 May 2022 14:20:52 +0000 (16:20 +0200)]
MINOR: listener: detect stream vs dgram conflict during parsing
Now that we have a function to parse all bind keywords, and that we
know what types of sock-level and xprt-level protocols a bind_conf
is using, it's easier to centralize the check for stream vs dgram
conflict by putting it directly at the end of the args parser. This
way it also works for peers, provides better precision in the report,
and will also allow to validate transport layers. The check was even
extended to detect inconsistencies between xprt layer (which were not
covered before). It can even detect that there are two incompatible
"bind" lines in a single peers section.
Willy Tarreau [Fri, 20 May 2022 14:15:01 +0000 (16:15 +0200)]
CLEANUP: listener: store stream vs dgram at the bind_conf level
Let's collect the set of xprt-level and sock-level dgram/stream protocols
seen on a bind line and store that in the bind_conf itself while they're
being parsed. This will make it much easier to detect incompatibilities
later than the current approach which consists in scanning all listeners
in post-parsing.
Willy Tarreau [Fri, 20 May 2022 13:52:31 +0000 (15:52 +0200)]
MINOR: listener: add a new "options" entry in bind_conf
There is no way to store useful info there, yet there's about one entry
per boolean. Let's add an "options" attribute which will collect various
options.
In practice, even the BC_O_SSL_* flags and a few pieces of info such as
strict_sni could move there.
Willy Tarreau [Fri, 20 May 2022 13:44:17 +0000 (15:44 +0200)]
MINOR: config: use the new bind_parse_args_list() to parse a "bind" line
This now makes sure that both the peers' "bind" line and the regular one
will use the exact same parser with the exact same behavior. Note that
the parser applies after the address and that it could be factored
further, since the peers one still does quite a bit of duplicated work.
Willy Tarreau [Fri, 20 May 2022 13:41:45 +0000 (15:41 +0200)]
MINOR: listener: provide a function to process all of a bind_conf's arguments
The "bind" parsing code was duplicated for the peers section and as a
result it wasn't kept updated, resulting in slightly different error
behavior (e.g. errors were not freed, warnings were emitted as alerts).
Let's first unify it into a new dedicated function that properly reports
and frees the error.
Willy Tarreau [Fri, 20 May 2022 14:36:46 +0000 (16:36 +0200)]
MINOR: protocol: replace ctrl_type with xprt_type and clarify it
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) alter the nature of the underlying control layer, except for
QUIC which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.
Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
Willy Tarreau [Fri, 20 May 2022 13:19:48 +0000 (15:19 +0200)]
BUG/MINOR: peers: fix error reporting of "bind" lines
In case the str2listener() parser reports a generic error with no message
when parsing the argument of a "bind" statement in a "peers" section, the
reported error indicates an invalid address on the empty arg. This has
existed since 2.0 with commit 355b2033e ("MINOR: cfgparse: SSL/TLS binding
in "peers" sections."), so this must be backported till 2.0.
Amaury Denoyelle [Fri, 20 May 2022 13:14:57 +0000 (15:14 +0200)]
MINOR: mux-quic: close connection on error if different data at offset
As specified by the RFC, reception of different STREAM data for the same
offset should be treated with a CONNECTION_CLOSE with error
PROTOCOL_VIOLATION.
Use the ncbuf API to detect this case: it happens when an add operation
performed in mode NCB_ADD_COMPARE fails with NCB_RET_DATA_REJ.
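The principle of a comparing insertion can be sketched without the ncbuf API. The structure and function below are hypothetical and use a flat buffer with a per-byte "filled" map, which is much simpler than a real non-contiguous buffer.

```c
#include <stddef.h>

#define RXBUF_SIZE 32

/* Hypothetical flat receive buffer with a map of bytes already received. */
struct rxbuf {
    char data[RXBUF_SIZE];
    char filled[RXBUF_SIZE];
};

/* Inserts <len> bytes from <src> at offset <off>.
 * Returns 0 on success, -1 if the data does not fit, -2 if it differs
 * from data already received for the same offsets (nothing is written
 * in that case).
 */
static int rxbuf_add_compare(struct rxbuf *b, size_t off, const char *src, size_t len)
{
    size_t i;

    if (off > RXBUF_SIZE || len > RXBUF_SIZE - off)
        return -1;
    for (i = 0; i < len; i++)
        if (b->filled[off + i] && b->data[off + i] != src[i])
            return -2;
    for (i = 0; i < len; i++) {
        b->data[off + i] = src[i];
        b->filled[off + i] = 1;
    }
    return 0;
}
```

In this sketch, a return value of -2 is the case which would lead to a CONNECTION_CLOSE with PROTOCOL_VIOLATION.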
Amaury Denoyelle [Fri, 20 May 2022 14:45:32 +0000 (16:45 +0200)]
MINOR: mux-quic: emit STREAM_LIMIT_ERROR
Send a CONNECTION_CLOSE on reception of a STREAM frame for a STREAM id
exceeding the maximum value enforced. Only implemented for bidirectional
streams for the moment.
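A minimal sketch of the limit check, assuming the limit is expressed as a number of streams as in the QUIC MAX_STREAMS semantics. The function name is hypothetical and this is not the MUX code.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two low bits of a stream ID encode its type, so the sequence
 * number of a stream within its type is <id> >> 2. A peer allowed to
 * open <max_streams> streams may only use sequence numbers strictly
 * below that value.
 */
static bool stream_id_exceeds_limit(uint64_t id, uint64_t max_streams)
{
    return (id >> 2) >= max_streams;
}
```

With a limit of 100 streams, client bidirectional stream ID 396 is the last one accepted and ID 400 exceeds the limit.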
Amaury Denoyelle [Fri, 20 May 2022 13:05:07 +0000 (15:05 +0200)]
MINOR: mux-quic: emit FLOW_CONTROL_ERROR
Send a CONNECTION_CLOSE if the peer emits more data than authorized by
our flow-control. This is implemented for both stream and connection
level.
Fields have been added in qcc/qcs structures to differentiate received
offsets, used for limit enforcing, from consumed offsets, used for the
sending of MAX_DATA/MAX_STREAM_DATA frames.
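The distinction between received and consumed offsets can be sketched as follows. The structure, its field names and the policy of advertising a new limit once less than half a window remains are assumptions made for this illustration, not the actual qcc/qcs fields or policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side flow-control state. */
struct fc_rx {
    uint64_t limit;     /* highest offset the peer is allowed to send */
    uint64_t received;  /* highest offset received so far */
    uint64_t consumed;  /* bytes consumed by the upper layer */
    uint64_t window;    /* window size used to compute new limits */
};

/* Accounts for received data ending at offset <end_off>. Returns false
 * if the peer went past the advertised limit, which is the case for
 * FLOW_CONTROL_ERROR.
 */
static bool fc_on_recv(struct fc_rx *fc, uint64_t end_off)
{
    if (end_off > fc->limit)
        return false;
    if (end_off > fc->received)
        fc->received = end_off;
    return true;
}

/* Accounts for <bytes> consumed. Returns true when a new limit was
 * computed and should be advertised (MAX_DATA or MAX_STREAM_DATA).
 * <bytes> is assumed never to exceed the received and unconsumed data.
 */
static bool fc_on_consume(struct fc_rx *fc, uint64_t bytes)
{
    fc->consumed += bytes;
    if (fc->limit - fc->consumed < fc->window / 2) {
        fc->limit = fc->consumed + fc->window;
        return true;
    }
    return false;
}
```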
Amaury Denoyelle [Fri, 20 May 2022 13:04:38 +0000 (15:04 +0200)]
MINOR: quic/mux-quic: define CONNECTION_CLOSE send API
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which requires closing the
whole connection.
On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable all future send operations.
We rely on <conn_opening> stats counter and tune.quic.retry_threshold
setting to dynamically start sending Retry packets. We continue to send such packets
when "quic-force-retry" setting is set. The difference is when we receive tokens.
We check them regardless of this setting because the Retry could have been
dynamically started. We must also send Retry packets when we receive Initial
packets without token if the dynamic Retry threshold was reached, but only for
connections which are not currently opening or, in other words, for Initial packets
without a connection already instantiated. Indeed, we must not send Retry packets for all
Initial packets without token. For instance a client may have already sent an
Initial packet without receiving a Retry packet because the Retry feature was not
started, then the Retry starts on exceeding the threshold value due to other
connections, then finally our client decides to send another Initial packet
(to ACK Initial CRYPTO data for instance). It does this without token. So, for
this already existing connection we must not send a Retry packet.
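The decision described above can be sketched as a small function. All names below (struct initial_ctx, initial_pkt_action() and the action values) are hypothetical. Two points are assumptions of this sketch and not stated by the commit: the comparison with the threshold is non-strict, and an existing connection is never sent a Retry even with "quic-force-retry".

```c
#include <assert.h>

enum initial_action {
	INITIAL_PROCESS,     /* handle the packet normally */
	INITIAL_CHECK_TOKEN, /* validate the token before going further */
	INITIAL_SEND_RETRY   /* answer with a Retry packet */
};

struct initial_ctx {
	int force_retry;              /* "quic-force-retry" is set */
	unsigned int conn_opening;    /* current number of opening connections */
	unsigned int retry_threshold; /* tune.quic.retry_threshold */
	int has_token;                /* the Initial packet carries a token */
	int conn_exists;              /* a connection is already instantiated */
};

enum initial_action initial_pkt_action(const struct initial_ctx *ctx)
{
	assert(ctx);

	/* Tokens are always checked: Retry may have been started dynamically. */
	if (ctx->has_token)
		return INITIAL_CHECK_TOKEN;

	/* Never send a Retry for an already instantiated connection. */
	if (ctx->conn_exists)
		return INITIAL_PROCESS;

	if (ctx->force_retry || ctx->conn_opening >= ctx->retry_threshold)
		return INITIAL_SEND_RETRY;

	return INITIAL_PROCESS;
}
```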
This QUIC specific keyword may be used to set the threshold, in number of
connection openings, beyond which QUIC Retry feature will be automatically
enabled. Its default value is 100.
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake was successful (handshake completed
state) or failed or when the connection timed out without reaching the handshake
completed state.
BUG/MINOR: quic: Fix potential memory leak during QUIC connection allocations
Move the code which finalizes the QUIC connections initialisations after
having called qc_new_conn() into this function to benefit from its
error handling to release the memory allocated for QUIC connections
the initialization of which could not be finalized.
BUILD: stats: Missing headers inclusions from stats.h
If we add a new stats module to C source files including only
stats.h we get these errors:
include/haproxy/stats.h:39:31: error: array type has incomplete element type
‘struct name_desc’
39 | extern const struct name_desc stat_fields[];
include/haproxy/stats.h:55:50: warning: ‘struct listener’ declared inside
parameter list will not be visible outside of this definition or declaration
55 | int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags,
name_desc struct is defined in tools-t.h and listener struct in listener-t.h.
Here is the format of a token:
- format (1 byte)
- ODCID (from 9 up to 21 bytes)
- creation timestamp (4 bytes)
- salt (16 bytes)
A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.
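The field order listed above can be illustrated with a cleartext layout helper. This sketch ignores encryption entirely and makes two assumptions that the commit does not state: the 9 to 21 bytes of the ODCID field are one length byte followed by an 8 to 20 byte connection ID, and the timestamp is written in network byte order. retry_token_layout() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_SALT_LEN 16

/* Write format, ODCID, timestamp and salt into <out>, in that order.
 * Return the number of bytes written, or 0 if the ODCID length is out of
 * range or the output buffer is too small.
 */
size_t retry_token_layout(unsigned char *out, size_t outlen,
                          unsigned char format,
                          const unsigned char *odcid, size_t odcid_len,
                          uint32_t timestamp,
                          const unsigned char *salt)
{
	size_t need = 1 + 1 + odcid_len + 4 + TOKEN_SALT_LEN;
	unsigned char *p = out;

	assert(out && odcid && salt);

	if (odcid_len < 8 || odcid_len > 20 || outlen < need)
		return 0;

	*p++ = format;
	*p++ = (unsigned char)odcid_len;  /* assumed length prefix */
	memcpy(p, odcid, odcid_len);
	p += odcid_len;
	*p++ = (unsigned char)(timestamp >> 24);
	*p++ = (unsigned char)(timestamp >> 16);
	*p++ = (unsigned char)(timestamp >> 8);
	*p++ = (unsigned char)timestamp;
	memcpy(p, salt, TOKEN_SALT_LEN);
	p += TOKEN_SALT_LEN;

	return (size_t)(p - out);
}
```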
The Retry token is ciphered after having derived a strong secret from the cluster secret
and generated the AEAD AAD, as well as a 16 bytes long salt. This salt is
added to the token. Obviously it is not ciphered. The format byte is not
ciphered either.
The AAD is built by quic_generate_retry_token_aad() which concatenates the version,
the client SCID and the IP address and port. We had to implement quic_saddr_cpy()
to copy the IP address and port to the AAD buffer. Only the Retry SCID is generated
on our side to build a Retry packet, the other fields come from the first packet
received from the client. It must reuse this Retry SCID in response to our Retry packet.
So, we do not have to store it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.
quic_generate_retry_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives into an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the token
to store the ODCID into a local quic_cid struct variable. Finally this ODCID may
be stored into the transport parameter thanks to qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_generate_retry_check(). If quic_generate_retry_check() fails, the received
packet is dropped without any further processing at this time.
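A lifetime check of this kind can be sketched as follows, assuming the 4-byte creation timestamp is a number of seconds stored in network byte order (both are assumptions of this sketch, as is the retry_token_is_fresh() name). The unsigned subtraction stays correct when the 32-bit clock wraps, and a timestamp in the future yields a huge age which is rejected.

```c
#include <assert.h>
#include <stdint.h>

#define RETRY_TOKEN_LIFETIME 10 /* seconds */

/* Return 1 if the token whose timestamp field starts at <ts_field> was
 * created at most RETRY_TOKEN_LIFETIME seconds before <now>, 0 otherwise.
 */
int retry_token_is_fresh(const unsigned char *ts_field, uint32_t now)
{
	uint32_t created;

	assert(ts_field);

	created = ((uint32_t)ts_field[0] << 24) |
	          ((uint32_t)ts_field[1] << 16) |
	          ((uint32_t)ts_field[2] << 8)  |
	           (uint32_t)ts_field[3];

	return (uint32_t)(now - created) <= RETRY_TOKEN_LIFETIME;
}
```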
This function does exactly the same thing as quic_tls_decrypt(), except that
it does not reuse its input buffer as output buffer. This is needed
to decrypt the Retry token without modifying the packet buffer which
contains this token. Indeed, this would prevent us from decrypting
the packet itself as the token belongs to the AEAD AAD for the packet.
This function must be used to derive strong secrets from a non pseudo-random
secret (cluster-secret setting in our case) and an IV. First it calls
quic_hkdf_extract_and_expand() to do that for a temporary strong secret (tmpkey),
then it calls quic_hkdf_expand() twice, reusing this strong temporary secret
to derive the final strong secret and IV.
Willy Tarreau [Fri, 20 May 2022 07:13:38 +0000 (09:13 +0200)]
BUG/MINOR: cfgparse: abort earlier in case of allocation error
In issue #1563, Coverity reported a very interesting issue about a
possible UAF in the config parser if the config file ends with a
very large line followed by an empty one and the large one causes an
allocation failure.
The issue essentially is that we try to go on with the next line in case
of allocation error, while there's no point doing so. If we failed to
allocate memory to read one config line, the same may happen on the next
one, and we would be blatantly dropping the failed line while trying to
parse what follows it. In the best case, subsequent errors will be
incorrect due to this prior error (e.g. a large ACL definition with many
patterns, followed by a reference to this ACL).
Let's just immediately abort in such a condition where there's no recovery
possible.
This may be backported to all versions once the issue is confirmed to be
addressed.
Amaury Denoyelle [Thu, 19 May 2022 14:45:37 +0000 (16:45 +0200)]
BUG/MEDIUM: quic: fix initialization for local/remote TPs
The local and remote TPs were both processed through the same function
quic_transport_params_init(). This caused the remote TPs to be
overwritten with values configured for our local usage.
Change this by reserving quic_transport_params_init() only for our local
TPs. Remote TPs are simply initialized via
quic_dflt_transport_params_cpy().
This bug could result in a connection closed in error by the client due
to a violation of its TPs. For example, curl client closed the
connection after receiving too many CONNECTION_ID due to an invalid
active_connection_id value used.
Amaury Denoyelle [Wed, 18 May 2022 16:26:13 +0000 (18:26 +0200)]
MINOR: quic: abort on unlisted errno on sendto()
If an unlisted errno is reported, abort the process. If a crash is
reported on this condition, we must determine whether the error code
denotes a bug, should interrupt emission on the fd, or allows the
syscall to be retried.
Amaury Denoyelle [Wed, 18 May 2022 16:14:12 +0000 (18:14 +0200)]
BUG/MINOR: quic: break for error on sendto
If sendto returns an error, we should not retry the call and break from
the sending loop. An exception is made for EINTR, for which the syscall
is retried immediately.
This bug caused an infinite loop reproduced when the process is in the
closing state by SIGUSR1 but there is still QUIC data emission left.
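The loop behaviour can be sketched as below. This is not the quic-conn code: the send callback replaces the real sendto() so that the logic can be exercised without a socket, and send_datagrams(), struct fake_ctx and fake_send() are hypothetical names. EINTR retries the same datagram, any other error leaves the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef ssize_t (*send_fn)(const void *buf, size_t len, void *ctx);

/* Send <n> datagrams in order. Return how many were actually sent. */
size_t send_datagrams(send_fn snd, void *ctx,
                      const char *const *dgrams, size_t n)
{
	size_t i = 0;

	assert(snd && dgrams);

	while (i < n) {
		ssize_t ret = snd(dgrams[i], strlen(dgrams[i]), ctx);

		if (ret < 0) {
			if (errno == EINTR)
				continue; /* retry the same datagram at once */
			break;            /* any other error: stop sending */
		}
		i++;
	}
	return i;
}

/* Scripted stand-in for sendto(): fails with EINTR <eintr_left> times,
 * then fails with EAGAIN once <fail_at> datagrams have been sent.
 */
struct fake_ctx {
	int eintr_left;
	int fail_at;
	int calls;
	int sent;
};

ssize_t fake_send(const void *buf, size_t len, void *arg)
{
	struct fake_ctx *c = arg;

	(void)buf;
	c->calls++;
	if (c->eintr_left > 0) {
		c->eintr_left--;
		errno = EINTR;
		return -1;
	}
	if (c->sent == c->fail_at) {
		errno = EAGAIN;
		return -1;
	}
	c->sent++;
	return (ssize_t)len;
}
```

Without the break, a persistent error on the first datagram would keep the loop spinning, which matches the infinite loop described above.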
MEDIUM: check: Use the CS to handle subscriptions for read/write events
Instead of using the health-check to subscribe to read/write events, we now
rely on the conn-stream. Indeed, on the server side, the conn-stream's
endpoint is a multiplexer. Thus it seems appropriate to handle subscriptions
for read/write events the same way as for the streams. Of course, the I/O
callback function is not the same. We use srv_chk_io_cb() instead of
cs_conn_io_cb().
REORG: check: Rename and export I/O callback function
event_srv_chk_io() function is renamed srv_chk_io_cb() to be consistent with
the I/O callback function of connections. In addition, this function is
exported. It will be required to use the conn-stream's subscriptions.
MEDIUM: check: No longer shutdown the connection in .wake callback function
The connection is already closed by the health-check itself. Thus there is
no reason to duplicate this part in the .wake callback function. It is
enough to wake the health-check and wait.
BUG/MINOR: check: Reinit the buffer wait list at the end of a check
The buffer wait list is used to deal with buffer allocation failure. But at
the end of health-check, it must be reinitialized. There is no reason to
get a buffer between two health-check runs. And in fact, the
associated flags, CHK_ST_IN_ALLOC and CHK_ST_OUT_ALLOC, are already cleared
at the end of a health-check.
This patch must be backported as far as 2.2. On the 2.2, MT_LIST_ADDED and
MT_LIST_DEL must be used instead of LIST_INLIST and LIST_DEL_INIT.
BUG/MEDIUM: config: Reset outline buffer size on realloc error in readcfgfile()
When the line parsing failed because outline buffer must be reallocated, if
my_realloc2() call fails, the buffer size must be reset. Indeed, in this case
the current line is skipped, a fatal error is reported and we jump to the next
line. At this stage the outline buffer is NULL. If the buffer size is not reset,
the next call to parse_line() crashes because we try to write in the buffer. We
fail to detect the outline buffer is too small to copy any character.
To fix the issue, outlinesize variable must be set to 0 when outline allocation
failed.
This patch should fix the issue #1563. It must be backported as far as 2.2.
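The principle of the fix can be sketched as follows. struct linebuf, linebuf_grow() and failing_realloc() are hypothetical names, and the reallocation function is passed as a parameter only so that a failure can be simulated. As described for my_realloc2(), the previous area is released on failure, hence the size must be reset together with the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct linebuf {
	char *area;
	size_t size;
};

typedef void *(*realloc_fn)(void *ptr, size_t size);

/* Simulates an allocation failure. */
void *failing_realloc(void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	return NULL;
}

/* Grow <lb> to <newsize> bytes. On failure, the old area is freed and the
 * size is reset to 0 so that no later write can target a NULL buffer.
 */
int linebuf_grow(struct linebuf *lb, size_t newsize, realloc_fn ra)
{
	char *area;

	assert(lb && ra && newsize > 0);

	area = ra(lb->area, newsize);
	if (!area) {
		free(lb->area);
		lb->area = NULL;
		lb->size = 0; /* the key point: size must follow the pointer */
		return -1;
	}

	lb->area = area;
	lb->size = newsize;
	return 0;
}
```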
Amaury Denoyelle [Wed, 18 May 2022 14:19:47 +0000 (16:19 +0200)]
MINOR: mux-quic: free RX buf if empty
Release the QCS RX buffer if emptied after qcs_consume(). This improves
memory usage and avoids a QCS keeping an allocated buffer, particularly
when no data is received anymore. Buffer is automatically reallocated if
needed via qc_get_ncbuf().
Amaury Denoyelle [Tue, 17 May 2022 16:53:21 +0000 (18:53 +0200)]
BUG/MINOR: mux-quic: support nul buffer with qc_free_ncbuf()
qc_free_ncbuf() may now be used with a NCBUF_NULL buffer as parameter.
This is useful when using this function on a QCS with no allocated
buffer. This case was not reproduced for the moment, but it will soon
become more present as buffers will be released if emptied.
Also a call to offer_buffers() is added to conform with the dynamic
buffer management of haproxy.
Amaury Denoyelle [Mon, 16 May 2022 14:19:59 +0000 (16:19 +0200)]
MINOR: mux-quic: implement MAX_DATA emission
This commit is similar to the previous one but deals with MAX_DATA for
connection-level data flow control. It uses the same function
qcc_consume_qcs() to update flow control level and generate a MAX_DATA
frame if needed.
Send MAX_STREAM_DATA frames when at least half of the allocated
flow-control has been demuxed, frame and cleared. This is necessary to
support QUIC STREAM with received data greater than a buffer.
Transcoders must use the new function qcc_consume_qcs() to empty the QCS
buffer. This will allow to monitor current flow-control level and
generate a MAX_STREAM_DATA frame if required. This frame will be emitted
via qc_io_cb().
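The half-window rule can be sketched as below, with a hypothetical rx_fc structure and rx_fc_consume() function that are not the qcs fields. Whether the comparison is strict is an assumption of this sketch. When the remaining credit drops to half of the window or less, the limit is moved to a full window beyond the consumed offset and the new value is returned so that a MAX_STREAM_DATA frame can be built.

```c
#include <assert.h>
#include <stdint.h>

struct rx_fc {
	uint64_t window;   /* initial flow-control window */
	uint64_t consumed; /* bytes consumed by the upper layer so far */
	uint64_t msd;      /* limit currently advertised to the peer */
};

/* Account for <bytes> consumed. Return the new limit to advertise in a
 * MAX_STREAM_DATA frame, or 0 if no frame is needed yet.
 */
uint64_t rx_fc_consume(struct rx_fc *fc, uint64_t bytes)
{
	assert(fc);
	assert(bytes <= fc->msd - fc->consumed); /* cannot consume past limit */

	fc->consumed += bytes;

	/* At least half of the window has been used: open a full window. */
	if (fc->msd - fc->consumed <= fc->window / 2) {
		fc->msd = fc->consumed + fc->window;
		return fc->msd;
	}
	return 0;
}
```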
Adjust the mechanism for MAX_STREAMS_BIDI emission. When a bidirectional
stream is removed, current flow-control level is checked. If needed, a
MAX_STREAMS_BIDI frame is generated and inserted in a new list in the
QCS instance. The new frames will be emitted at the start of qc_send().
This has no impact on the current MAX_STREAMS_BIDI behavior. However,
this mechanism is more flexible and will make it quick to implement
MAX_STREAM_DATA/MAX_DATA emission.
Amaury Denoyelle [Wed, 18 May 2022 09:38:22 +0000 (11:38 +0200)]
MINOR: mux-quic: remove qcc_decode_qcs() call in XPRT
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible for calling qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.
This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
Amaury Denoyelle [Mon, 16 May 2022 11:54:59 +0000 (13:54 +0200)]
MEDIUM: mux-quic: implement recv on io-cb
Previously, qc_io_cb() of mux-quic only dealt with TX. Add support for
RX in it. This is done through a new function qc_recv(qcc). It loops
over all QCS instances and calls qcc_decode_qcs(qcs).
This has no impact from the quic-conn layer as qcc_decode_qcs(qcs) is
called directly. However, this provides a resume point when demux
is blocked on the upper layer HTX full buffer.
Note that for the moment, only RX for bidirectional streams is managed
in qc_io_cb(). Unidirectional streams use their own mechanism for both
TX/RX. It should be unified in the near future in a refactoring.
Amaury Denoyelle [Mon, 16 May 2022 11:54:31 +0000 (13:54 +0200)]
MINOR: h3: flag demux as full on HTX full
Flag QCS if HTX buffer is full on demux. This will block all future
operations on QCS demux and should limit unnecessary decode_qcs() calls.
The flag is cleared on rcv_buf operation called by conn-stream.
Amaury Denoyelle [Thu, 12 May 2022 14:56:16 +0000 (16:56 +0200)]
MINOR: h3: do not wait a complete frame for demuxing
Previously, H3 demuxer refused to process the payload if the frame was
not entirely received and the QCS buffer was not full. This code was
duplicated from the H2 demuxer.
In H2, this is a justified optimization as only one frame at a time can
be demuxed. However, this is not the case in H3 with interleaved frames
in the lower layer QUIC STREAM frames.
This condition is now removed. H3 demuxer will process the payload as soon
as possible. An exception is kept for HEADERS frame as the code is not
able to deal with partial HEADERS.
With this change, H3 demuxer should consume less memory. To ensure that
we never receive a HEADERS frame bigger than the RX buffer, we should use
the H3 SETTINGS_MAX_FIELD_SECTION_SIZE.
Amaury Denoyelle [Tue, 17 May 2022 16:03:37 +0000 (18:03 +0200)]
BUG/MINOR: mux-quic: update session's idle delay before stream creation
This commit is an adaptation from the following patch :
commit d0de6776826ee18da74e6949752e2f44cba8fdf2
Author: Willy Tarreau <w@1wt.eu>
Date: Fri Feb 4 09:05:37 2022 +0100
BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
This should fix the incorrect timeouts present in httplog format for
QUIC requests.
Amaury Denoyelle [Tue, 17 May 2022 16:52:39 +0000 (18:52 +0200)]
MINOR: ncbuf: refactor ncb_advance()
First, adjust some typos in comments inside the function. Second,
change the naming of some variables to reduce confusion.
A special case has been inserted when advance is done inside a GAP block
and this block is the last of the buffer. In this case, the whole buffer
will be emptied, equivalent to a ncb_init() operation.
Amaury Denoyelle [Tue, 17 May 2022 16:52:22 +0000 (18:52 +0200)]
BUG/MINOR: ncbuf: fix ncb_is_empty()
ncb_is_empty() was plainly incorrect as it directly dereferences the
memory to read offset blocks instead of ncb_read_off(). The result is
undefined.
Also, BUG_ON() statement is wrong when the buffer starts with a data
block. In this case, ncb_head() is not the first gap offset but instead
just random data. The calculated sum in BUG_ON() statement has thus no
meaning and may cause an abort. Adjust this by reorganizing the whole
function. Only the first data block size is read. If and only if not
nul, the first gap size is then checked.
ncb_is_full() has been rewritten to share the same model as
ncb_is_empty().
Amaury Denoyelle [Tue, 17 May 2022 13:01:25 +0000 (15:01 +0200)]
OPTIM: quic: realign empty Rx buffer
quic_rx_pkts_del() function removes packets from QUIC RX buffer. In most
cases, the buffer will be emptied after it. In this case, it's useful to
realign it. This will avoid future data wrapping and use of an
unnecessary junk to fill a too small contiguous space.
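The benefit of realigning can be illustrated with a minimal ring buffer model. struct ring and its helpers are hypothetical and only track positions, not the actual RX buffer of the QUIC stack. Once the buffer is empty, resetting the head to 0 restores the largest possible contiguous free space.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 16

struct ring {
	size_t head; /* index of the first stored byte */
	size_t data; /* number of stored bytes */
};

/* Account for <len> bytes appended at the tail. */
void ring_add(struct ring *r, size_t len)
{
	assert(r && len <= RING_SIZE - r->data);
	r->data += len;
}

/* Remove <len> bytes from the head, realigning the buffer if emptied. */
void ring_del(struct ring *r, size_t len)
{
	assert(r && len <= r->data);
	r->head = (r->head + len) % RING_SIZE;
	r->data -= len;
	if (!r->data)
		r->head = 0; /* empty: realign to avoid future wrapping */
}

/* Contiguous free space available at the tail. */
size_t ring_contig_space(const struct ring *r)
{
	size_t tail;

	assert(r);
	if (r->data == RING_SIZE)
		return 0;

	tail = (r->head + r->data) % RING_SIZE;
	if (tail >= r->head)
		return RING_SIZE - tail;
	return r->head - tail;
}
```

In this model, a buffer emptied at head 12 would offer only 4 contiguous bytes without the realignment, against 16 with it.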