MINOR: quic-be: helper functions to save/restore transport params (0-RTT)
Define quic_early_transport_params new struct for QUIC transport parameters
in relation with 0-RTT. This parameters must be saved during a first session to
be reused for 0-RTT next sessions.
qc_early_transport_params_cpy() copies the 0-RTT transport parameters to be
saved during a first connection to a backend. The copy is made from
a quic_transport_params struct to a quic_ealy_transport_params struct.
On the contrary, qc_early_transport_params_reuse() copies the transport parameters
to be reused for a 0-RTT session from a previous one. The copy is made
from a quic_early_transport_params strcut to a quic_transport_params struct.
Also add QUIC_EV_EARLY_TRANSP_PARAMS trace event to dump such 0-RTT
transport parameters from traces.
MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN
Add a per thread ist struct to srv_per_thread struct to store the QUIC token to
be reused for subsequent sessions.
Parse at packet level (from qc_parse_ptk_frms()) these tokens and store
them calling qc_try_store_new_token() newly implemented function. This is
this new function which does its best (may fail) to update the tokens.
Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token()
implemented by this patch.
Rename ->data qf_new_token struct field to ->w_data to distinguish it from
->r_data new field used to parse the NEW_TOKEN frame. Indeed to build the
NEW_TOKEN we need to write it to a static buffer into the frame struct. To
parse it we only need to store the address of the token field into the
RX buffer.
BUG/MEDIUM: quic-be: do not launch the connection migration process
At this time the connection migration is not supported by QUIC backends.
This patch prevents this process to be launched for connections to QUIC backends.
Furthermore, the connection migration process could be started systematically
when connecting a backend to INADDR_ANY, leading to crashes into qc_handle_conn_migration()
(when referencing qc->li).
Thank you to @InputOutputZ for having reported this issue in GH #3178.
This patch simply checks the connection type (listener or not) before checking if
a connection migration must be started.
No need to backport because support for QUIC backends is available from 3.3.
BUG/MINOR: acme: more explicit error when BIO_new_file()
Replace the error message of BIO_new_file() when the account-key cannot
be created on disk by "acme: cannot create the file '%s'". It was
previously "acme: out of memory." Which is unclear.
BUG/MEDIUM: init: 'devnullfd' not properly closed for master
Since commit "1ec59d3 MINOR: init: Make devnullfd global and create it
earlier in init" the devnullfd pointing towards /dev/null gets created
early in the init process but it was closed after the call to
"mworker_run_master". The master process never got to the FD closing
code and we had an FD leak.
Amaury Denoyelle [Mon, 10 Nov 2025 14:24:35 +0000 (15:24 +0100)]
BUG/MINOR: do not account backend connections into maxconn
Remove QUIC backend connections from global actconn accounting. Indeed,
this counter is only used on the frontend side. This is required to
ensure maxconn coherence.
BUG/MEDIUM: stats-file: fix shm-stats-file preload not working anymore
Due to recent commit 5c299dee ("MEDIUM: stats: consider that shared stats
pointers may be NULL") shm-stats-file preloading suddenly stopped working
In fact preloading should be considered as an initializing step so the
counters may be assigned there without checking for NULL first.
Indeed there are supposed to be NULL because preloading occurs before
counters_{fe,be}_shared_prepare() which takes care of setting the pointers
for counters if they weren't set before.
Obviously this corner-case was overlooked during 5c299dee writing and
testing. Thanks to Nick Ramirez for having reported the issue.
No backport needed, this issue is specific to 3.3.
MINOR: stats-proxy: ensure future-proof FN_AGE manipulation in me_generate_field()
Commit ad1bdc33 ("BUG/MAJOR: stats-file: fix crash on non-x86 platform
caused by unaligned cast") revealed an ambiguity in me_generate_field()
around FN_AGE manipulation. For now FN_AGE can only be stored as u32 or
s32, but in the future we could also support 64bit FN_AGES, and the
current code assumes 32bits types and performs and explicit unsigned int
cast. Instead we group current 32 bits operations for FF_U32 and FF_S32
formats, and let room for potential future formats for FN_AGE.
Commit ad1bdc33 also suggested that the fix was temporary and the approach
must change, but after a code review it turns out the current approach
(generic types manipulation under me_generate_field()) is legit. The
introduction of shm-stats-file feature didn't change the logic which
was initially implemented in 3.0. It only extended it and since shared
stats are now spread over thread-groups since 3.3, the use of atomic
operations made typecasting errors more visible, and structure mapping
change from d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)") was in
fact the only change to blame for the crash on non-x86 platforms.
With ambiguities removed in me_generate_field(), let's hope we don't face
similar bugs in the future. Indeed, with generic counters, and more
specifically shared ones (which leverage atomic ops), great care must be
taken when changing their underlying types as me_generate_field() solely
relies on stat_col descriptor to know how to read the stat from a generic
pointer, so any breaking change must be reflected in that function as well
Amaury Denoyelle [Mon, 10 Nov 2025 10:02:45 +0000 (11:02 +0100)]
MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check
On Initial packet parsing, a new quic_conn instance is allocated via
qc_new_conn(). Then a CID is allocated with its value derivated from
client ODCID. On CID tree insert, a collision can occur if another
thread was already parsing an Initial packet from the same client. In
this case, the connection is released and the packet will be requeued to
the other thread.
Originally, CID collision check was performed prior to quic_conn
allocation. This was changed by the commit below, as this could cause
issue on quic_conn alloc failure.
However, this procedure is less optimal. Indeed, qc_new_conn() performs
many steps, thus it could be better to skip it on Initial CID collision,
which can happen frequently. This patch restores the older order of
operations, with CID collision check prior to quic_conn allocation.
To ensure this does not cause again the same bug, the CID is removed in
case of quic_conn alloc failure. This should prevent any loop as it
ensures that a CID found in the global tree does not point to a NULL
quic_conn, unless if CID is attach to a foreign thread. When this thread
will parse a re-enqueued packet, either the quic_conn is already
allocated or the CID has been removed, triggering a fresh CID and
quic_conn allocation procedure.
BUG/MEDIUM: quic: handle collision on CID generation
CIDs are provided by haproxy so that the peer can use them as DCID of
its packets. Their value is set via a random generator. It happens on
several occasions during connection lifetime:
* via ODCID derivation if haproxy is the server
* on quic_conn init if haproxy is the client
* during post-handshake if haproxy is the server
* on RETIRE_CONNECTION_ID frame parsing
CIDs are stored in a global tree. On ODCID derivation, a check is
performed to ensure the CID is not a duplicate value. This is mandatory
to properly handle multiple INITIAL packets from the same client on
different thread.
However, for the other cases, no check is performed for CID collision.
As _quic_cid_insert() is silent, the issue is not detected at all. This
results in a CID advertized to the peer but not stored in the global
one. In the end, this may cause two issues. The first one is that
packets from the client which use the new CID will be rejected by
haproxy, most probably with a STATELESS_RESET. The second issue is that
it can cause a crash during quic_conn release. Indeed, the CID is stored
in the quic_conn local tree and thus eb_delete() for the global tree
will be performed. As <leaf_p> member is uninit, this results in a
segfault.
Note that this issue is pretty rare. It can only be observed if running
with a high number of concurrent connections in parallel, so that the
random generator will provide duplicate values. Patch is still labelled
as MEDIUM as this modifies code paths used frequently.
To fix this, _quic_cid_insert() unsafe function is completely removed.
Instead, quic_cid_insert() can be used, which reports an error code if a
collision happens. CID are then stored in the quic_conn tree only after
global tree insert success. Here is the solution for each steps if a
collision occurs :
* on init as client: the connection is completely released
* post-handshake: the CID is immediately released. The connection is
kept, but it will miss an extra CID.
* on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random
generation. It it fails several times, the connection is closed in
error.
A small convenience change is made to quic_cid_insert(). Output
parameter <new_tid> can now be NULL, which is useful as most of the
times caller do not care about it.
Split new_quic_cid() function into multiple ones. This patch should not
introduce any visible change. The objective is to render CID allocation
and generation more modular.
The first advantage of this patch is to bring code simplication. In
particular, conn CID sequence number increment and insertion into
connection tree is simpler than before. Another improvment is also that
errors could now be handled easier at each different steps of the CID
init.
This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
MINOR: quic: adjust CID conn tree alloc in qc_new_conn()
Change qc_new_conn() so that the connection CID tree is allocated
earlier in the function. This patch does not introduce a behavior
change. Its objective is to facilitate future evolutions on CIDs
handling.
This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
BUG/MINOR: quic: close connection on CID alloc failure
During RETIRE_CONNECTION_ID frame parsing, a new connection ID is
immediately reallocated after the release of the previous one. This is
done to ensure that the peer will never run out of DCID.
Prior to this patch, a CID allocation failure was be silently ignored.
This prevent the emission of a new CID, which could prevent the peer to
emit packets if it had no other CIDs available for use. Now, such error
is considered fatal to the connection. This is the safest solution as
it's better to close connections when memory is running low.
Willy Tarreau [Mon, 10 Nov 2025 10:55:33 +0000 (11:55 +0100)]
BUG/MEDIUM: config: for word expansion, empty or non-existing are the same
Amaury reported a case where "${FOO[*]}" still produces an empty field.
It happens if the variable is defined but does not contain any non-space
characters. The reason is that we special-case word expansion only on
non-existing vars. Let's change the ordering of operations so that word-
expanded vars always pretend the current arg is not an empty quote, so
that we don't make any difference between a non-existing var and an
empty one.
No backport is needed unless commit 1968731765 ("BUG/MEDIUM: config:
solve the empty argument problem again") is.
Willy Tarreau [Sat, 8 Nov 2025 11:12:00 +0000 (12:12 +0100)]
[RELEASE] Released version 3.3-dev12
Released version 3.3-dev12 with the following main changes :
- MINOR: quic: enable SSL on QUIC servers automatically
- MINOR: quic: reject conf with QUIC servers if not compiled
- OPTIM: quic: adjust automatic ALPN setting for QUIC servers
- MINOR: sample: optional AAD parameter support to aes_gcm_enc/dec
- REGTESTS: converters: check USE_OPENSSL in aes_gcm.vtc
- BUG/MINOR: resolvers: ensure fair round robin iteration
- BUG/MAJOR: stats-file: fix crash on non-x86 platform caused by unaligned cast
- OPTIM: backend: skip conn reuse for incompatible proxies
- SCRIPTS: build-ssl: allow to build a FIPS version without FIPS
- OPTIM: proxy: move atomically access fields out of the read-only ones
- SCRIPTS: build-ssl: fix rpath in AWS-LC install for openssl and bssl bin
- CI: github: update to macos-26
- BUG/MINOR: quic: fix crash on client handshake abort
- MINOR: quic: do not set conn member if ssl_sock_ctx
- MINOR: quic: remove connection arg from qc_new_conn()
- BUG/MEDIUM: server: Add a rwlock to path parameter
- BUG/MEDIUM: server: Also call srv_reset_path_parameters() on srv up
- BUG/MEDIUM: mux-h1: fix 414 / 431 status code reporting
- BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle
- BUG/MEDIUM: connections: permit to permanently remove an idle conn
- MEDIUM: cfgparse: deprecate 'master-worker' keyword alone
- MEDIUM: cfgparse: 'daemon' not compatible with -Ws
- DOC: configuration: deprecate the master-worker keyword
- MINOR: quic: remove <mux_state> field
- BUG/MEDIUM: stick-tables: Make sure we handle expiration on all tables
- MEDIUM: stick-tables: Optimize the expiration process a bit.
- MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws
- MINOR: acme: generate a temporary key pair
- MEDIUM: acme: generate a key pair when no file are available
- BUILD: ssl/ckch: wrong function name in ckch_conf_kws
- BUILD: acme: acme_gen_tmp_x509() signedness and unused variables
- BUG/MINOR: acme: fix initialization issue in acme_gen_tmp_x509()
- BUILD: ssl/ckch: fix ckch_conf_kws parsing without ACME
- MINOR: server: move the lock inside srv_add_idle()
- DOC: acme: crt-store allows you to start without a certificate
- BUG/MINOR: acme: allow 'key' when generating cert
- MINOR: stconn: Add counters to SC to know number of bytes received and sent
- MINOR: stream: Add samples to get number of bytes received or sent on each side
- MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li
- MINOR: stream: Remove bytes_in and bytes_out counters from stream
- MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li
- MINOR: stats: Add stats about request and response bytes received and sent
- MINOR: applet: Add function to get amount of data in the output buffer
- MINOR: channel: Remove total field from channels
- DEBUG: stream: Add bytes_in/bytes_out value for both SC in session dump
- MEDIUM: stktables: Limit the number of stick counters to 100
- BUG/MINOR: config: Limit "tune.maxpollevents" parameter to 1000000
- BUG/MEDIUM: server: close a race around ready_srv when deleting a server
- BUG/MINOR: config: emit warning for empty args when *not* in discovery mode
- BUG/MEDIUM: config: solve the empty argument problem again
- MEDIUM: config: now reject configs with empty arguments
- MINOR: tools: add support for ist to the word fingerprinting functions
- MINOR: tools: add env_suggest() to suggest alternate variable names
- MINOR: tools: have parse_line's error pointer point to unknown variable names
- MINOR: cfgparse: try to suggest correct variable names on errors
- IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB
- BUG/MINOR: acme: wrong dns-01 challenge in the log
- MEDIUM: backend: Defer conn_xprt_start() after mux creation
- MINOR: peers: Improve traces for peers
- MEDIUM: peers: No longer ack updates during a full resync
- MEDIUM: peers: Remove commitupdate field on stick-tables
- BUG/MEDIUM: peers: Fix update message parsing during a full resync
- MINOR: sample/stats: Add "bytes" in req_{in,out} and res_{in,out} names
- BUG/MEDIUM: stick-tables: Make sure updates are seen as local
- BUG/MEDIUM: proxy: use aligned allocations for struct proxy
- BUG/MEDIUM: proxy: use aligned allocations for struct proxy_per_tgroup
- BUG/MINOR: acme: avoid a possible crash on error paths
Willy Tarreau [Fri, 7 Nov 2025 21:27:25 +0000 (22:27 +0100)]
BUG/MINOR: acme: avoid a possible crash on error paths
In acme_EVP_PKEY_gen(), an error message is printed if *errmsg is set,
however, since commit 546c67d13 ("MINOR: acme: generate a temporary key
pair"), errmsg is passed as NULL in at least one occurrence, leading
the compiler to issue a NULL deref warning at -O3. And indeed, if the
errors are encountered, a crash will occur. No backport is needed.
Willy Tarreau [Fri, 7 Nov 2025 21:10:45 +0000 (22:10 +0100)]
BUG/MEDIUM: proxy: use aligned allocations for struct proxy_per_tgroup
In 3.2, commit f879b9a18 ("MINOR: proxies: Add a per-thread group field
to struct proxy") introduced struct proxy_per_tgroup that is declared as
thread_aligned, but is allocated using calloc(). Thus it is at risk of
crashing on machines using instructions requiring 64-byte alignment such
as AVX512. Let's use ha_aligned_zalloc_typed() instead of malloc().
For 3.2, we don't have aligned allocations, so instead the THREAD_ALIGNED()
will have to be removed from the struct definition. Alternately, we could
manually align it as is done for fdtab.
Willy Tarreau [Fri, 7 Nov 2025 21:05:21 +0000 (22:05 +0100)]
BUG/MEDIUM: proxy: use aligned allocations for struct proxy
Commit fd012b6c5 ("OPTIM: proxy: move atomically access fields out of
the read-only ones") caused the proxy struct to be 64-byte aligned,
which allows the compiler to use optimizations such as AVX512 to zero
certain fields. However the struct was allocated using calloc() so it
was not necessarily aligned, causing segv on startup on compatible
machines. Let's just use ha_aligned_zalloc_typed() to allocate the
struct.
BUG/MEDIUM: stick-tables: Make sure updates are seen as local
In stktable_touch_with_exp, if it is a local update, add it to the
pending update list even if it's already in the tree as a remote update,
otherwise it will never be communicated to other peers;
It used to work before 3.2 because of the ordering of operations, but
it's been broken by adding an extra step with the pending update list,
so we now have to explicitely check for that.
MINOR: sample/stats: Add "bytes" in req_{in,out} and res_{in,out} names
Number of bytes received or sent by a client or a server are now
saved. Sample fetches and stats fields to retrieve these informations are
renamed to add "bytes" in names to avoid any ambiguity with number of
requests and responses.
BUG/MEDIUM: peers: Fix update message parsing during a full resync
The commit 590c5ff2e ("MEDIUM: peers: No longer ack updates during a full
resync") introduced a regression. During a full resync, the ID of an update
message is not parsed at all. Thus, the parsing of the whole message in
desynchronized.
On full resync the update id itself is ignored, to not be acked, but it must
be parsed. It is now fixed.
MEDIUM: peers: Remove commitupdate field on stick-tables
This stick-table field was atomically updated with the last update id pushed
and dumped on the CLI but never used otherwise. And all peer sessions share
the same id because it is a stick-table info. So the info in peers dump is
pretty limited.
MEDIUM: peers: No longer ack updates during a full resync
ACK messages received by a peer sending updates during a full resync are
ignored. So, on the other side, there is no reason to still send these ACK
messages. Let's skip them.
In addition, the received updates during this stage are not considered as to
be acked. It is important to be sure to properly emit ACK messages once the
full sync finished.
Trace messages for peers were only protocol oriented and information
provided were quite light. With this patch, the traces were
improved. information about the peer, its applet and the section are
dumped. Several verbosities are now available and messages are dumped at
different levels depending on the context. It should easier to track issues
in the peers.
MEDIUM: backend: Defer conn_xprt_start() after mux creation
In connect_server(), defer the call to conn_xprt_start() until after we
had a chance to create the mux. The xprt can behave differently
depending on if a mux is or is not available at this point, as if it is,
it may want to wait until some data comes from the mux.
BUG/MINOR: acme: wrong dns-01 challenge in the log
Since 861fe532046 ("MINOR: acme: add the dns-01-record field to the
sink"), the dns-01 challenge is output in the dns_record trash, instead
of the global trash.
The send_log string was never updated with this change, and dumps some
data from the global trash instead. Since the last data emitted in the
trash seems to be the dns-01 token from the authorization object, it
looks like the response to the challenge.
Ben Kallus [Wed, 29 Oct 2025 12:38:51 +0000 (08:38 -0400)]
IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB
This is the same as the equivalent fix in ebtree:
The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.
Willy Tarreau [Tue, 24 Jun 2025 15:25:55 +0000 (17:25 +0200)]
MINOR: cfgparse: try to suggest correct variable names on errors
When an empty argument comes from the use of a non-existing variable,
we'll now detect the difference with an empty variable (error pointer
points to the variable's name instead), and submit it to env_suggest()
to see if another variable looks likely to be the right one or not.
This can be quite useful to quickly figure how to fix misspelled variable
names. Currently only series of letters, digits and underscores are
attempted to be resolved as a name. A typical example is:
peer "${HAPROXY_LOCAL_PEER}" 127.0.0.1:10000
which produces:
[ALERT] (24231) : config : parsing [bug-argv4.cfg:2]: argument number 1 at position 13 is empty and marks the end of the argument list:
peer "${HAPROXY_LOCAL_PEER}" 127.0.0.1:10000
^
[NOTICE] (24231) : config : Hint: maybe you meant HAPROXY_LOCALPEER instead ?
Willy Tarreau [Tue, 24 Jun 2025 15:20:33 +0000 (17:20 +0200)]
MINOR: tools: have parse_line's error pointer point to unknown variable names
When an argument is empty, parse_line() currently returns a pointer to
the empty string itself. This is convenient, but it's only actionable by
the user who will see for example "${HAPROXY_LOCALPEER}" and figure what
is wrong. Here we slightly change the reported pointer so that if an empty
argument results from the evaluation of an empty variable (meaning that
all variables in string are empty and no other char is present), then
instead of pointing to the opening quote, we'll return a pointer to the
first character of the variable's name. This will allow to make a
difference between an empty variable and an unknown variable, and for
the caller to take action based on this.
I.e. before we would get:
log "${LOG_SERVER_IP}" local0
^
if LOG_SERVER_IP is not set, and now instead we'll get this:
Willy Tarreau [Tue, 24 Jun 2025 15:18:52 +0000 (17:18 +0200)]
MINOR: tools: add env_suggest() to suggest alternate variable names
The purpose here is to look in the environment for a variable whose
name looks like the provided one. This will be used to try to auto-
correct misspelled environment variables that would silently be turned
to an empty string.
Willy Tarreau [Tue, 24 Jun 2025 15:14:47 +0000 (17:14 +0200)]
MINOR: tools: add support for ist to the word fingerprinting functions
The word fingerprinting functions are used to compare similar words to
suggest a correctly spelled one that looks like what the user proposed.
Currently the functions only support const char*, but there's no reason
for this, and it would be convenient to support substrings extracted
from random pieces of configurations. Here we're adding new variants
"_with_len" that take these ISTs and which are in fact a slight change
of the original ones that the old ones now rely on.
Willy Tarreau [Tue, 24 Jun 2025 06:24:28 +0000 (08:24 +0200)]
MEDIUM: config: now reject configs with empty arguments
As prepared during 3.2, we must error on empty arguments because they
mark the end of the line and cause subsequent arguments to be silently
ignored. It was too late in 3.2 to turn that into an error so it's a
warning, but for 3.3 it needed to be an alert.
This patch does that. It doesn't instantly break, instead it counts
one fatal error per violating line. This allows to emit several errors
at once, which can often be caused by the same variable being missed,
or a group of variables sharing a same misspelled prefix for example.
Tests show that it helps locate them better. It also explains what to
look for in the config manual for help with variables expansion.
Willy Tarreau [Tue, 24 Jun 2025 16:18:18 +0000 (18:18 +0200)]
BUG/MEDIUM: config: solve the empty argument problem again
This mostly reverts commit ff8db5a85 ("BUG/MINOR: config: Stopped parsing
upon unmatched environment variables").
As explained in commit #2367, finally the fix above was incorrect because
it causes other trouble such as this:
log "192.168.100.${NODE}" "local0"
being resolved to this:
log 192.168.100.local0
when NODE does not exist due to the loss of the spaces. In fact, while none
of us was well aware of this, when the user had:
server app 127.0.0.1:80 "${NO_CHECK}" weight 123
in fact they should have written it this way:
server app 127.0.0.1:80 "${NO_CHECK[*]}" weight 123
so that the variable is expanded to zero, one or multiple words, leaving
no empty arg (like in shell). This is supported since 2.3 with commit fa41cb6 so the right fix is in the config, let's revert the fix and
properly address the issue.
Some changes are necessary however, since after that patch, the in_arg
checks were added and are now inserting an empty argument even for
proper error reporting. For example, the following statement:
acl foo path "/a" "${FOO[*]}" "/b"
would complain about an empty arg at FOO due to in_arg=1, while dropping
this in_arg=1 with the following config:
acl foo path "/a" "${FOO}" "/b"
would silently stop after "/a" instead of complaining about an empty
field. So the approach here consists in noting whether or not something
was written since the quotes were emitted, in order to decide whether
or not to produce an argument. This way, "" continues to be an explicitly
empty arg, just like the same with an unknown variable, while "${FOO[*]}"
is allowed to prevent the creation of an argument if empty.
This should be backported to *some* versions, but the risk that some
configs were altered to rely on the broken fix is not null. At least
recent LTS should be reverted. Note that this requires previous commit:
BUG/MINOR: config: emit warning for empty args when *not* in discovery mode
otherwise this will break again configs relying on HAPROXY_LOCALPEER and
maybe a few other variables set at the end of discovery.
Willy Tarreau [Thu, 6 Nov 2025 16:57:37 +0000 (17:57 +0100)]
BUG/MINOR: config: emit warning for empty args when *not* in discovery mode
This actually reverses the condition of commit 5f1fad1690 ("BUG/MINOR:
config: emit warning for empty args only in discovery mode"). Indeed,
some variables are not known in discovery mode (e.g. HAPROXY_LOCALPEER),
and statements like:
peer "${HAPROXY_LOCALPEER}" 127.0.0.1:10000
are broken during discovery mode. It turns out that the warning is
currently hidden by commit ff8db5a85d ("BUG/MINOR: config: Stopped
parsing upon unmatched environment variables") since it silently drops
empty args which is sufficient to hide the warning, but it also breaks
other configs and needs to be reverted, which will break configs like
above again.
In issue #2995 we were not fully decided about discovery mode or not,
and already suspected some possible issues without being able to guess
which ones. The only downside of not displaying them in discovery mode
is that certain empty fields on the rare keywords specific to master
mode might remain silent until used. Let's just flip the condition to
check for empty args in normal mode only.
This should be backported to 3.2 after some time of observation.
Willy Tarreau [Thu, 6 Nov 2025 18:52:10 +0000 (19:52 +0100)]
BUG/MEDIUM: server: close a race around ready_srv when deleting a server
When a server is being disabled or deleted, in case it matches the
backend's ready_srv, this one is reset. However it's currently done in
a non-atomic way when the server goes down, and that could occasionally
reset the entry matching another server, but more importantly if in
parallel some requests are dequeued for that server, it may re-appear
there after having been removed, leading to a possible crash once it
is fully removed, as shown in issue #3177.
Let's make sure we reset the pointer when detaching the server from
the proxy, and use a CAS in both cases to only reset this server.
This fix needs to be backported to 3.2. There, srv_detach() is in
server.c instead of server.h. Thanks to Basha Mougamadou for the
detailed report and the useful backtraces.
BUG/MINOR: config: Limit "tune.maxpollevents" parameter to 1000000
"tune.maxpollevents" global parameter was not limited. It was possible to
set any integer value. But this value is used to allocate the array of
events used by epoll. With a huge value, it seems the allocation silently
fail, making haproxy totally unresponsive.
So let's to limit its value to 1 million. It is pretty high and it should
not be an issue to forbid greater values. The documentation was updated
accordingly.
This patch could be backported to all stable branches.
MEDIUM: stktables: Limit the number of stick counters to 100
"tune.stick-counters" global parameter was accepting any positive integer
value. But the maximum value is incredibly high. Setting a huge value has
signitifcant impact on memory and CPU usage. To avoid any issue, this value
is now limited to 100. It should be greater enough to all usage.
MINOR: applet: Add function to get amount of data in the output buffer
The helper function applet_output_data() returns the amount of data in the
output buffer of an applet. For applets using the new API, it is based on
data present in the outbuf buffer. For legacy applets, it is based on input
data present in the input channel's buffer. The HTX version,
applet_htx_output_data(), is also available
MINOR: stats: Add stats about request and response bytes received and sent
In previous patches, these counters were added per frontend, backend, server
and listener. With this patch, these counters are reported on stats,
including promex.
Note that the stats file minor version was incremented by one because the
shm_stats_file_object struct size has changed.
MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li
bytes_in and bytes_out counters per frontend, backend, listener and server
were removed and we now rely on, respectively on, req_in and res_in
counters.
MINOR: stream: Remove bytes_in and bytes_out counters from stream
per-stream bytes_in and bytes_out counters was removed and replaced by
req.in and res.in. Coorresponding samples still exists but replies on new
counters.
MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li
Thanks to the previous patch, and based on info available on the stream, it
is now possible to have counters for frontends, backends, servers and
listeners to report number of bytes received and sent on both sides.
MINOR: stream: Add samples to get number of bytes received or sent on each side
req.in and req.out samples can now be used to get the number of bytes
received by a client and send to the server. And res.in and res.out samples
can be used to get the number of bytes received by a server and send to the
client. These info are stored in the logs structure inside a stream.
MINOR: stconn: Add counters to SC to know number of bytes received and sent
<bytes_in> and <bytes_out> counters were added to SC to count, respectively,
the number of bytes received from an endpoint or sent to an endpoint. These
counters are updated for connections and applets.
DOC: acme: crt-store allows you to start without a certificate
If your acme certificate is declared in a crt-store, and the certificate
file does not exist on the disk, HAProxy will start with a temporary key
pair.
Willy Tarreau [Thu, 6 Nov 2025 12:12:04 +0000 (13:12 +0100)]
MINOR: server: move the lock inside srv_add_idle()
Almost all callers of _srv_add_idle() lock the list then call the
function. It's not the most efficient and it requires some care from
the caller to take care of that lock. Let's change this a little bit by
having srv_add_idle() that takes the lock and calls _srv_add_idle() that
is now inlined. This way callers don't have to handle the lock themselves
anymore, and the lock is only taken around the sensitive parts, not the
function call+return.
Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS
to 2.32-2.37M RPS on a 128-thread system.
BUG/MINOR: acme: fix initialization issue in acme_gen_tmp_x509()
src/acme.c: In function ‘acme_gen_tmp_x509’:
src/acme.c:2685:15: error: ‘digest’ may be used uninitialized [-Werror=maybe-uninitialized]
2685 | if (!(X509_sign(newcrt, pkey, digest)))
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/acme.c:2628:23: note: ‘digest’ was declared here
2628 | const EVP_MD *digest;
| ^~~~~~
MEDIUM: acme: generate a key pair when no file are available
When an acme keyword is associated to a crt and key, and the corresponding
files does not exist, HAProxy would not start.
This patch allows to configure acme without pre-generating a keypair before
starting HAProxy. If the files does not exist, it tries to generate a unique
keypair in memory, that will be used for every ACME certificates that don't
have a file on the disk yet.
This patch provides two functions acme_gen_tmp_pkey() and
acme_gen_tmp_x509().
These functions generates a unique keypair and X509 certificate that
will be stored in tmp_x509 and tmp_pkey. If the key pair or certificate
was already generated they will return the existing one.
The key is an RSA2048 and the X509 is generated with a expiration in the
past. The CN is "expired".
These are just placeholders to be used if we don't have files.
MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws
This is an API change, instead of passing a ckch_data alone, the
ckch_conf_kws.func() is called with a ckch_store.
This allows the callback to access the whole ckch_store, with the
ckch_conf and the ckch_data. But it requires the ckch_conf to be
actually put in the ckch_store before.
MEDIUM: stick-tables: Optimize the expiration process a bit.
In process_tables_expire(), if the table we're analyzing still has
entries, and thus should be put back into the tree, do not put it in the
mt_list, to have it put back into the tree the next time the task runs.
There is no problem with putting it in the tree right away, as either
the next expiration is in the future, or we handled the maximum number
of expirations per task call and we're about to stop, anyway.
BUG/MEDIUM: stick-tables: Make sure we handle expiration on all tables
In process_tables_expire(), when parsing all the tables with expiration
set, to check if the any entry expired, make sure we start from the
oldest one, we can't just rely on eb32_first(), because of sign issues
on the timestamp.
Not doing that may mean some tables are not considered for expiration.
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.
It became tedious to properly set it as initialization order of the
various quic_conn/conn/MUX layers now differ between the frontend and
backend sides, and also depending if 0-RTT is used or not. Recently, a
new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This had another
level of complexity.
Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place
only on close XPRT callback invokation. It can be mixed with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
DOC: configuration: deprecate the master-worker keyword
Deprecate the 'master-worker' keyword in the global section.
Split the configuration of the 'no-exit-on-failure' subkeyword in
another section which is not deprecated yet and explains that its only
meant for debugging purpose.
MEDIUM: cfgparse: 'daemon' not compatible with -Ws
Emit a warning when the 'daemon' keyword is used in master-worker mode
for systemd (-Ws). This never worked and was always ignored by setting
MODE_FOREGROUND during cmdline parsing.
Willy Tarreau [Wed, 5 Nov 2025 09:51:27 +0000 (10:51 +0100)]
BUG/MEDIUM: connections: permit to permanently remove an idle conn
There's currently a function conn_delete_from_tree() which is used to
detach an idle connection from the tree it's currently attached to so
that it is no longer found. This function is used in three circumstances:
- when picking a new connection that no longer has any avail stream
- when temporarily working on the connection from an I/O handler,
in which case it's re-added at the end
- when killing a connection
The 2nd case above is quite specific, as it requires to preserve the
CO_FL_LIST_MASK flags so that the connection can be re-inserted into
the proper tree when leaving the handler. However, there's a catch.
When killing a connection, we want to be certain it will not be
reinserted into the tree. The flags preservation is causing a tiny
race if an I/O happens while the connection is in the kill list,
because in this case the I/O handler will note the connection flags,
do its work, then reinsert the connection where it believed it was,
then the connection gets purged, and another user can find it in the
tree.
The issue is very difficult to reproduce. On a 128-thread machine it
happens in H2 around 500k req/s after around 50M requests. In H1 it
happens after around 1 billion requests.
The fix here consists in passing an extra argument to the function to
indicate if the removal is permanent or not. When it's permanent, the
function will clear the associated flags. The callers were adjusted
so that all those dequeuing a connection in order to kill it do it
permanently and all other ones do it only temporarily.
A slightly different approach could have worked: the function could
always remove all flags, and the callers would need to restore them.
But this would require trickier modifications of the various call
places, compared to only passing 0/1 to indicate the permanent status.
This will need to be backported to all stable versions. The issue was
at least reproduced since 3.1 (not tested before). The patch will need
to be adjusted for 3.2 and older, because a 2nd argument "thr" was
added in 3.3, so the patch will not apply to older versions as-is.
Willy Tarreau [Wed, 5 Nov 2025 07:46:33 +0000 (08:46 +0100)]
BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle
In h2_detach(), it looks possible to place a dead connection back to
the idle list, and to later call h2_release() on it once detected as
dead. It's not certain that it happens but nothing in the code shows
it is not possible, so better make sure it cannot happen.
This should be preventively backported to all versions.
BUG/MEDIUM: mux-h1: fix 414 / 431 status code reporting
The more detailed status code reporting introduced with bc967758a2 is
checking against the error state to determine whether it is a too long
URL or too large headers. The check used always returns true which
results in a 414 as the error state is only set at a later point.
This commit adjusts the check to use the current state instead to return
the intended status code.
BUG/MEDIUM: server: Also call srv_reset_path_parameters() on srv up
Also call srv_reset_path_parameters() when the server changed states,
and got up. It is not enough to do it when the server goes down, because
there's a small race condition, and a connection could get established
just after we did it, and could have set the path parameters.
BUG/MEDIUM: server: Add a rwlock to path parameter
Add a rwlock to control the server's path_parameter, to make sure
multiple threads don't set it at the same time, and it can't be seen in
an inconsistent state.
Also don't set the parameter every time, only set them if they have
changed, to prevent needless writes.
MINOR: quic: remove connection arg from qc_new_conn()
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymetric on frontend and backend side,
as connection argument was set only in the latter case.
This was required prior due to qc_alloc_ssl_sock_ctx() signature. This
has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. <conn> member of quic_conn instance
is always set outside it, in qc_xprt_start() on the backend case.
MINOR: quic: do not set conn member if ssl_sock_ctx
ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks.
Most notably it contains a <conn> member which is a pointer to struct
connection.
On QUIC frontend side, this member is always set to NULL. Indeed,
connection is only created after handshake completion. However, this has
changed for backend side, where the connection is instantiated prior to
its quic_conn counterpart. Thus, ssl_sock_ctx member would be set in
this case as a convenience for use later in qc_ssl_do_hanshake().
However, this method was unsafe as the connection can be released,
without resetting ssl_sock_ctx member. Thus, the previous patch fixes
this by using on <conn> member through the quic_conn instance which is
the proper way.
Thus, this patch resets ssl_sock_ctx <conn> member to NULL. This is
deemed the cleanest method as it ensures that both frontend and backend
sides must not use it anymore.
BUG/MINOR: quic: fix crash on client handshake abort
On backend side, a connection can be aborted and released prior to
handshake completion. This causes a crash in qc_ssl_do_hanshake() as
<conn> member of ssl_sock_ctx is not reset in this case.
To fix this, use <conn> member of quic_conn instead. This is safe as it
is properly set to NULL when a connection is released.
No impact on the frontend side as <conn> member is not accessed. Indeed,
in this case connection is most of the times allocated after handshake
completion.
macOS-15 images seems to have difficulties to run the reg-tests since a
few days for an unknown reason. Doing a rollback of both VTest2 and
haporxy doesn't seem to fix the problem so this is probably related to a
change in github actions.
This patch switches the image to the new macos-26 images which seems to
fix the problem.
Willy Tarreau [Mon, 3 Nov 2025 08:30:47 +0000 (09:30 +0100)]
OPTIM: proxy: move atomically access fields out of the read-only ones
Perf top showed that h1_snd_buf() was having great difficulties accessing
the proxy's server_id_hdr_name field in the middle of the headers loop.
Moving the assignment out of the loop to a local variable moved the
problem there as well:
| if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n
0.10 |20b0: mov -0x120(%rbp),%rdi
1.33 | mov 0x60(%rdi),%r10
0.01 | test %eax,%eax
0.18 | jne 2118
12.87 | mov 0x350(%r10),%rdi
0.01 | test %rdi,%rdi
0.05 | je 2118
| mov 0x358(%r10),%r11
It turns out that there are several atomically accessed fields in its
vicinity, causing the cache line to bounce all the time. Let's collect
the few frequently changed fields and place them together at the end
of the structure, and plug the 32-bit hole with another isolated field.
Doing so also reduced a little bit the cost of decrementing be->be_conn
in process_stream(), and overall the HTTP/1 performance increased by
about 1% both on ARM and x86_64.
OPTIM: backend: skip conn reuse for incompatible proxies
When trying to reuse a backend connection, a connection hash is
calculated to match an entry with similar parameters. Previously, this
operation was skipped if the stream content wasn't based on HTTP, as it
would have been incompatible with http-reuse.
With the introduction of SPOP backends, this condition was removed, so
that it can also benefit from connection reuse. However, this means that
now hash calcul is always performed when connecting to a server, even
for TCP or log backends. This is unnecessary as these proxies cannot
perform connection reuse.
Note also that reuse mode is resetted on postparsing for incompatible
backends. This at least guarantees that no tree lookup will be performed
via be_reuse_connection(). However, connection lookup is still performed
in the session via session_get_conn() which is another unnecessary
operation.
Thus, this patch restores the condition so that reuse operations are now
entirely skipped if a backend mode is incompatible. This is implemented
via a new utility function named be_supports_conn_reuse().
This could be backported up to 3.1, as this commit could be considered
as a performance regression for tcp/log backend modes.
Willy Tarreau [Mon, 3 Nov 2025 06:13:53 +0000 (07:13 +0100)]
BUG/MAJOR: stats-file: fix crash on non-x86 platform caused by unaligned cast
Since commit d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)"), the
last_state_change field in the counters is a uint (to match how it's
reported). However, it happens that there are explicit casts in function
me_generate_field() to retrieve the value, and which cause crashes on
aarch64 and likely other non-x86 64-bit platforms due to atomically
reading an unaligned 64-bit value, and may even randomly crash other
64-bit platforms when reading past the end of the structure.
The fix for now adapts the cast to match the one used by the accessed
type (i.e. unsigned int), but the approach must change, as there's
nothing there which allows to figure whether or not the type is correct
by just reading the code. At minima a typeof() on a named field is
needed, but this requires more invasive changes, hence this temporary
fix.
No backport is needed, as stats-file is only in 3.3.
Damien Claisse [Wed, 29 Oct 2025 09:56:34 +0000 (09:56 +0000)]
BUG/MINOR: resolvers: ensure fair round robin iteration
Previous fixes restored round robin iteration, but an imbalance remains
when the response tree contains record types other than A or AAAA. Let's
take the following example: the DNS answers two A records and a CNAME.
The response "tree" (which is actually flat, more like a list) may look
as follows, ordered by hash:
- 1st item: first A record with IP 1
- 2nd item: second A record with IP 2
- 3rd item: CNAME record
As a consequence, resolv_get_ip_from_response will iterate as follows,
while the TTL is still valid:
- 1st call: DNS request is done, response tree is created, iteration
starts at the first item, IP 1 is returned.
- 2nd call: cached response tree is used, iteration starts at the second
item, IP 2 is returned.
- 3rd call: cached response tree is used, iteration starts at the third
item, but it's a CNAME, so we continue to the next item, which restarts
iteration at the first item, and IP 1 is returned.
- 4th call: cached response tree is used and iteration restarts at the
beginning, returning IP 1 again.
The 1-2-1-1-2-1-1-2 sequence will repeat, so IP 1 will be used twice as
often as IP 2, creating a strong imbalance. Even with more IP addresses,
the first one by hashing order in the tree will always receive twice the
traffic of the others.
To fix this, set the next iteration item to the one following the selected
IP record, if any. This ensures we never use the same IP twice in a row.
This commit should be backported where 3023e9819 ("BUG/MINOR: resolvers:
Restore round-robin selection on records in DNS answers") is, so as far
as 2.6.
MINOR: sample: optional AAD parameter support to aes_gcm_enc/dec
The aes_gcm_enc() and aes_gcm_dec() sample converters now accept an
optional fifth argument for Additional Authenticated Data (AAD). When
provided, the AAD value is base64-decoded and used during AES-GCM
encryption or decryption. Both string and variable forms are supported.
This enables use cases that require authentication of additional data.
Amaury Denoyelle [Fri, 31 Oct 2025 09:12:55 +0000 (10:12 +0100)]
OPTIM: quic: adjust automatic ALPN setting for QUIC servers
If a QUIC server is declared without ALPN, "h3" value is automatically
set during _srv_parse_finalize().
This patch adjusts this operation. Instead of relying on
ssl_sock_parse_alpn(), a plain strdup() is used. This is considered more
efficient as the ALPN string is constant in this case. This method is
already used for listeners on the frontend side.
Amaury Denoyelle [Fri, 31 Oct 2025 08:57:54 +0000 (09:57 +0100)]
MINOR: quic: reject conf with QUIC servers if not compiled
Ensure that QUIC support is compiled into haproxy when a QUIC server is
configured. This check is performed during _srv_parse_finalize() so that
it is detected both on configuration parsing and when adding a dynamic
server via the CLI.
Note that this changes the behavior of srv_is_quic() utility function.
Previously, it always returned false when QUIC support wasn't compiled.
With this new check introduced, it is now guaranteed that a QUIC server
won't exist if compilation support is not active. Hence srv_is_quic()
does not rely anymore on USE_QUIC define.
Amaury Denoyelle [Fri, 31 Oct 2025 08:58:57 +0000 (09:58 +0100)]
MINOR: quic: enable SSL on QUIC servers automatically
Previously, QUIC servers were rejected if SSL was not explicitely
activated using 'ssl' configuration keyword.
Change this behavior : now SSL is automatically activated for QUIC
servers when the keyword is missing. A warning is displayed as it is
considered better to explicitely note that SSL is in use.
Willy Tarreau [Fri, 31 Oct 2025 09:09:57 +0000 (10:09 +0100)]
[RELEASE] Released version 3.3-dev11
Released version 3.3-dev11 with the following main changes :
- BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
- BUG/MINOR: quic-be: unchecked connections during handshakes
- BUG/MEDIUM: cli: also free the trash chunk on the error path
- MINOR: initcalls: Add a new initcall stage, STG_INIT_2
- MEDIUM: stick-tables: Use a per-shard expiration task
- MEDIUM: stick-tables: Remove the table lock
- MEDIUM: stick-tables: Stop if stktable_trash_oldest() fails.
- MEDIUM: stick-tables: Stop as soon as stktable_trash_oldest succeeds.
- BUG/MEDIUM: h1-htx: Don't set HTX_FL_EOM flag on 1xx informational messages
- BUG/MEDIUM: h3: properly encode response after interim one in same buf
- BUG/MAJOR: pools: fix default pool alignment
- MINOR: ncbuf: extract common types
- MINOR: ncbmbuf: define new ncbmbuf type
- MINOR: ncbmbuf: implement add
- MINOR: ncbmbuf: implement iterator bitmap utilities functions
- MINOR: ncbmbuf: implement ncbmb_data()
- MINOR: ncbmbuf: implement advance operation
- MINOR: ncbmbuf: add tests as standalone mode
- BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
- MINOR: quic: remove received CRYPTO temporary tree storage
- MINOR: stats-file: fix typo in shm-stats-file object struct size detection
- MINOR: compiler: add FIXED_SIZE(size, type, name) macro
- MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct
- BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency
- BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization
- BUG/MEDIUM: stick-tables: Don't loop if there's nothing left
- MINOR: acme: add the dns-01-record field to the sink
- MINOR: acme: display the complete challenge_ready command in the logs
- BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el
- MINOR: quic: remove unused conn-tx-buffers limit keyword
- MINOR: quic: prepare support for options on FE/BE side
- MINOR: quic: rename "no-quic" to "tune.quic.listen"
- MINOR: quic: duplicate glitches FE option on BE side
- MINOR: quic: split congestion controler options for FE/BE usage
- MINOR: quic: split Tx options for FE/BE usage
- MINOR: quic: rename max Tx mem setting
- MINOR: quic: rename retry-threshold setting
- MINOR: quic: rename frontend sock-per-conn setting
- BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage
- BUG/MINOR: quic: split option for congestion max window size
- BUG/MINOR: quic: rename and duplicate stream settings
- BUG/MEDIUM: applet: Improve again spinning loops detection with the new API
- Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency"
- Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct"
- Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro"
- BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt)
- BUG/MINOR: stick-tables: properly index string-type keys
- BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1
- BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims
- MEDIUM: quic: Fix build with openssl-compat
- MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
- MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
- BUG/MEDIUM: cli: do not return ACKs one char at a time
- BUG/MEDIUM: ssl: Crash because of dangling ckch_store reference in a ckch instance
- BUG/MINOR: ssl: Remove unreachable code in CLI function
- BUG/MINOR: acl: warn if "_sub" derivative used with an explicit match
- DOC: config: fix confusing typo about ACL -m ("now" vs "not")
- DOC: config: slightly clarify the ssl_fc_has_early() behavior
- MINOR: ssl-sample: add ssl_fc_early_rcvd() to detect use of early data
- CI: disable fail-fast on fedora rawhide builds
- MINOR: http: fix 405,431,501 default errorfile
- BUG/MINOR: init: Do not close previously created fd in stdio_quiet
- MINOR: init: Make devnullfd global and create it earlier in init
- MINOR: init: Use devnullfd in stdio_quiet calls instead of recreating a fd everytime
- MEDIUM: ssl: Add certificate password callback that calls external command
- MEDIUM: ssl: Add local passphrase cache
- MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert'
- BUG/MINOR: resolvers: Apply dns-accept-family setting on additional records
- MEDIUM: h1: Immediately try to read data for frontend
- REGTEST: quic: add ssl_reuse.vtc new QUIC test
- BUG/MINOR: ssl: returns when SSL_CTX_new failed during init
- MEDIUM: ssl/ech: config and load keys
- MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI
- MINOR: listener: implement bind_conf_find_by_name()
- MINOR: ssl/ech: key management via stats socket
- CI: github: add USE_ECH=1 to haproxy for openssl-ech job
- DOC: configuration: "ech" for bind lines
- BUG/MINOR: ech: non destructive parsing in cli_find_ech_specific_ctx()
- DOC: management: document ECH CLI commands
- MEDIUM: mux-h2: do not needlessly refrain from sending data early
- MINOR: mux-h2: extract the code to send preface+settings into its own function
- BUG/MINOR: mux-h2: send the preface along with the first request if needed
Willy Tarreau [Wed, 29 Oct 2025 16:38:07 +0000 (17:38 +0100)]
BUG/MINOR: mux-h2: send the preface along with the first request if needed
Tests involving 0-RTT and H2 on the backend show that 0-RTT is being
partially used but does not work. The analysis shows that only the
preface and settings are sent using early-data and the request is sent
separately. As explained in the previous patch, this is caused by the
fact that a wakeup of the iocb is needed just to send the preface, then
a new call to process_stream is needed to try sending again.
Here with this patch, we're making h2_snd_buf() able to send the preface
if it was not yet sent. Thanks to this, the preface, settings and first
request can now leave as a single TCP segment. In case of TLS with 0-RTT,
it now allows all the block to leave in early data.
Even in clear-text H2, we're now seeing a 15% lower context-switch count,
and the number of calls to process_stream() per connection dropped from 3
to 2. The connection rate increased by an extra 9.5%. Compared to without
the last 3 patches, this is a 22% reduction of context-switches, 33%
reduction of process_stream() calls, and 15.7% increase in connection
rate. And more importantly, 0-RTT now really works with H2 on the
backend, saving one full RTT on the first request.
This fix is only for a missed optimization and a non-functional 0-RTT
on the backend. It's worth backporting it, but it doesn't cause enough
harm to hurry a backport. Better wait for it to live a little bit in
3.3 (till at least a week or two after the final release) before
backporting it. It's not sure that it's worth going beyond 3.2 in any
case. It depends on the these two previous commits:
MEDIUM: mux-h2: do not needlessly refrain from sending data early
MINOR: mux-h2: extract the code to send preface+settings into its own function
Willy Tarreau [Wed, 29 Oct 2025 16:28:57 +0000 (17:28 +0100)]
MINOR: mux-h2: extract the code to send preface+settings into its own function
The code that deals with sending preface + settings and changing the
state currently is in h2_process_mux(), but we'll want to do it as
well from h2_snd_buf(), so let's move it to a dedicate function first.
At this point there is no functional change.
Willy Tarreau [Wed, 29 Oct 2025 16:38:07 +0000 (17:38 +0100)]
MEDIUM: mux-h2: do not needlessly refrain from sending data early
The mux currently refrains from sending data before H2_CS_FRAME_H, i.e.
before the peer's SETTINGS frame was received. While it makes sense on
the frontend, it's causing harm on the backend because it forces the
first request to be sent in two halves over an extra RTT: first the
preface and settings, second the request once the settings are received.
This is totally contrary to the philosophy of the H2 protocol, consisting
in permitting the client to send as soon as possible.
Actually what happens is the following:
- process_stream() calls connect_server()
- connect_server() creates a connection, and if the proto/alpn is guessed
or known, the mux is instantiated for the current request.
- the H2 init code wakes the h2 tasklet up and returns
- process_stream() tries to send the request using h2_snd_buf(), but that
one sees that we're before H2_CS_FRAME_H, refrains from doing so and
returns.
- process_stream() subscribes and quits
- the h2 tasklet can now execute to send the preface and settings, which
leave as a first TCP segment. The connection is ready.
- the iocb is woken again once the server's SETTINGS frame is received,
turning the connection to the H2_CS_FRAME_H state, and the iocb wake
up process_stream().
- process_stream() executes again and can try to send again.
- h2_snd_buf() is called and finally sends the request as a second TCP
segment.
Not only this is inefficient, but it also renders 0-RTT and TFO impossible
on H2 connections. When 0-RTT is used, only the preface and settings leave
as early data (the very first data of that connection), which is totally
pointless.
In order to fix this, we have to go through a few steps:
- first we need to let data be sent to a server immediately after the
SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of
H2_CS_FRAME_H). However, some protocol extensions are advertised by
the server using SETTINGS (e.g. RFC8441) and some requests might need
to know the existence of such extensions. For this reason we're adding
a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some
operations were not done because a server's SETTINGS frame is needed.
This is set when trying to send a protocol upgrade or extended CONNECT
during H2_CS_SETTINGS1, indicating that it's needed to wait for
H2_CS_FRAME_H in this case. The flag is always set on frontend
connections. This is what is being done in this patch.
- second, we need to be able to push the preface opportunistically with
the first h2_snd_buf() so that it's not needed to wake the tasklet up
just to send that and wake process_stream() again. This will be in a
separate patch.
By doing the first step, we're at least saving one needless tasklet
wakeup per connection (~9%), which results in ~5% backend connection
rate increase.
BUG/MINOR: ech: non destructive parsing in cli_find_ech_specific_ctx()
cli_find_ech_specific_ctx() parses the <frontend>/<bind_conf> and sets
a \0 in place the '/'. But the originals tring is still used to emit
messages in the CLI so we only output the frontend part.
This patch do the parsing in a trash buffer instead.
This patch extends the ECH support by adding runtime CLI commands to
view and modify ECH configurations.
New commands are added to the HAProxy CLI:
- "show ssl ech [<name>]" displays all ECH configurations or a specific
one.
- "add ssl ech <name> <payload>" adds a new PEM-formatted ECH
configuration.
- "set ssl ech <name> <payload>" replaces all existing ECH
configurations.
- "del ssl ech <name> [<age-in-secs>]" removes ECH configurations,
optionally filtered by age.
MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI
This patch adds functions to expose Encrypted Client Hello (ECH) status
and outer SNI information for logging and sample fetching.
Two new helper functions are introduced in ech.c:
- conn_get_ech_status() places the ECH processing status string into a
buffer.
- conn_get_ech_outer_sni() retrieves the outer SNI value if ECH
succeeded.
Two new sample fetch keywords are added:
- "ssl_fc_ech_status" returns the ECH status string.
- "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH.
These allow ECH information to be used in HAProxy logs, ACLs, and
captures.
This patch introduces the USE_ECH option in the Makefile to enable
support for Encrypted Client Hello (ECH) with OpenSSL.
A new function, load_echkeys, is added to load ECH keys from a specified
directory. The SSL context initialization process in ssl_sock.c is
updated to load these keys if configured.
A new configuration directive, `ech`, is introduced to allow users to
specify the ECH key directory in the listener configuration.
BUG/MINOR: ssl: returns when SSL_CTX_new failed during init
In ssl_sock_initial_ctx(), returns when SSL_CTX_new() failed instead of
trying to apply anything on the ctx. This may avoid crashing when
there's not enough memory anymore during configuration parsing.
Note that this test does not work with OpenSSL 3.5.0 QUIC API because
the callback set by SSL_CTX_sess_set_new_cb() (ssl_sess_new_srv_cb()) is not
called (at least for QUIC clients)
The role of this new QUIC test is to run the same SSL/TCP test as
reg-tests/ssl/ssl_reuse.vtc but with QUIC connections where applicable (only with
TLSv1.3).
To do so, this QUIC test uses the "include" vtc command to run ssl/ssl_reuse.vtc
It also sets the VTC_SOCK_TYPE environment variable with the "setenv" command and
"quic" as value. This will ask vtest2 to use QUIC sockets for all "fd@{...}"
addresses prefixed by "${VTC_SOCK_TYPE}+" socket type if VTC_SOCK_TYPE value is "quic".
The SSL/TCP is modified to set this environment variable with "setenv -ifunset"
from ssl/ssl_reuse.vtc with "stream" as value, if it not already set.
Thanks to this latter patch, vtest2 retrieves the VTC_SOCK_TYPE environment variable
value, then it parses the vtc file to retrieve all the fd addresses prefixed by
"${VTC_SOCK_TYPE}+" and creates a QUIC socket or a TCP socket depending on this
variable value.
Olivier Houchard [Wed, 29 Oct 2025 15:39:49 +0000 (15:39 +0000)]
MEDIUM: h1: Immediately try to read data for frontend
In h1_init(), if we're a frontend connection, immediately attempt to
read data, if the connection is ready, instead of just subscribing.
There may already be data available, at least if we're using 0RTT.
This may be backported up to 2.8 in a while, after 3.3 is released, so
that if it causes problem, we have a chance to hear about it.
BUG/MINOR: resolvers: Apply dns-accept-family setting on additional records
dns-accept-family setting was only evaluated for responses to A / AAAA DNS
queries. It was ignored when additional records in SRV responses were
parsed.
With this patch, whena SRV responses is parsed, additional records not
matching the dns-accept-family setting are ignored, as expected.