MEDIUM: peer: Improve management of reconnect timer and heartbeat messages
heartbeat messages are sent to keep a connection active. However, a
heartbeat messages was sent periodically, even if some other messages were
sent during this period. It is not an issue but it is useless. heartbeat
messages should only be sent if nothing was sent to the peer since a
moment. So, in this patch, the heartbeat timer is rearmed each time a message
is sent.
On the receiver side, the reconnect timer was only rearmed when a heartbeat
message was received instead of rearming it for any messages. Again, it is
not an issue because the inactivity is managed with PEER_F_ALIVE flag. This
flag is removed when the reconnect timer timed out but it is reinserted when
something is received. But an periodic wakeup may be uselessly performed.
So, in this patch, the reconnect timer is rearmed each time a message is
received.
MEDIUM: peers: Save date of the last update to wake the peer applet
Instead of looking for new updates in each updates lists to wake a peer
applet up, we now only detect that some updates should have been inserted by
comparing the date of the last update inserted in the list and the last
update sent to the peer.
It is not 100% accurrate of course. Some extra wakeups may be observed. But
this should not lead to any spinning loop because the operation is performed
by the sync task. This task is woken up when a timeout is fired or when an
update was inserted. However, this saves several loops on the updates lists.
MAJOR: peers: Remove the update lock by using a mt-list to deal with updates
In this patch, the update tree is replaced by a mt-list. It is a huge patch
with several changes. Main ones are in the function sending updates.
By using a list instead of a tree, we loose the order between updates and
the ability to restart for a given update using its id (the key to index
updates in the tree). However, to use the tree, it had to be locked and it
was a cause of contention between threads, leading the watchdog to kill the
process in worst cases. Because the idea it to split the updates by buckets
to divide the contention on updated, the order between updates will be lost
anyway. So, the tree can be replaced by a list. By using a mt-list, we can
also remove the update lock.
To be able to use a list instead of a tree, each peer must save its position
in the list, to be able to process new entries only at each loop. These
marker are "special" sticky session of type STKSESS_UPDT_MARKER. Of course,
these marker are not in any stick-tables but only in updates lists. And only
the ownr of a marker can move it in the list. Each peer owns two markers for
each list (so two markers per shared table). The first one used a start
point for a loop, and the other one used as stop point. The way these marker
are moved in the list is not obvious, especially for the first one.
Updates sent during a full resync are now handled exactly a the same way
than other updates. Only the moment the stop marker is set is different.
MINOR: stktable: Use an enum to type a sticky session in the updates tree
Instead of using a boolean to know if an entry in the updates tree is local
or not, an enum is used. This change will be mandatory when updates tree
will be replaced by a list to be able to add markers owned by each peer.
So now a sticky sessin has no type (STKSESS_UPDT_NONE) if it is not in the
updates tree. STKSESS_UPDT_LOCAL is used for local entries and
STKSESS_UPDT_REMOTE for remote ones. STKSESS_UPDT_MARKER is not used for
now.
MINOR: peers: Separate id of update messages from the update tree
Now the updates are no longer tracked by stick-table and we are no longer
use their id to detect missed updates, there is no reason to have a matching
between the internal update id in the id used in updated messages.
So, now, for a given peer, id of the last update messages sent is saved in
each shared table and it is incremented when a new message is updated.
MEDIUM: peers: No longer ack updates during a full resync
ACK messages received by a peer sending updates during a full resync are
ignored. So, on the other side, there is no reason to still send these ACK
messages. Let's skip them.
MAJOR: peers: Stop to track acked updates per shared table
This patch is quite small but the change is really important. Thanks to the
previous patch, we can use PEER_F_SYNCHED flag to know if a peer is
synchronized or not. So instead of tracking last ack messages for each table
to be able to restart at a given point when the peer reconnects, we decided
to restart from the begining if a peer is not synchronized when a new
connection is established.
So, it is a huge change because, on reconnect, instead of pushing some
missed updates, all local updates are pushed again. Most of time, it is not
a problem because nowadays, connection are quite stable, especially because
a heartbeat message is sent to keep it active. The only drawback is when a
peer is restarted. In that case, we have no way to know it is synchronized
because he learned table contents from it old local peer.
This change is mandatory. First to replace the update tree by a mt-list and
remove the update lock. Then to split this list by buckets to reduce
contention.
MEDIUM: peers: Add infos in peer structure to know if it is synchronized or not
Info about the last update message sent are now saved for each peer. The
shared-table and the update message id are saved. These information are used
when a ack message is received to know if it matches the last update message
sent. When this matches, we are sure the peer as received all updates sent
and is synchronized. This information is saved thanks to the flag
PEER_F_SYNCHED.
So, at any time, we know if a peer is synchronized or not.
Trace messages for peers were only protocol oriented and information
provided were quite light. With this patch, the traces were
improved. information about the peer, its applet and the section are
dumped. Several verbosities are now available and messages are dumped at
different levels depending on the context. It should easier to track issues
in the peers.
BUG/MEDIUM: h1-htx: Don't set HTX_FL_EOM flag on 1xx informational messages
1xx informational messages are part of the HTTP response. It is not expected
to have a HX_FL_EOM flag set after parsing such messages when received from
a server. It is espacially important whne an informational messages is
processed on client side while the final response was not recieved yet, to
not erroneously detect the end of the message.
The HTTP multiplexers seem to ignore the HTX_FL_EOM flag for information
messages, but it remains an error from the HTX specification point of
view. So it must be fixed.
While it should theorically be backported as far as 3.0, it is a good idea
to not do so for now because no bug was reported and regressions may happen.
Olivier Houchard [Wed, 15 Oct 2025 14:01:21 +0000 (16:01 +0200)]
MEDIUM: stick-tables: Stop as soon as stktable_trash_oldest succeeds.
stktable_trash_oldest() goes through all the shards, trying to free a
number of entries. Going through each shard is expensive, as we have to
take the shard lock, so stop as soon as we free'd at least one entry, as
it is only called when we want to make room for one entry.
Olivier Houchard [Tue, 14 Oct 2025 16:51:32 +0000 (18:51 +0200)]
MEDIUM: stick-tables: Stop if stktable_trash_oldest() fails.
In stksess_new(), if the table is full, we call stktable_trash_oldest()
to remove a few entries so that we have some room for a new one.
It is unlikely, but possible, that stktable_trash_oldest() will fail. If
so, just give up and do not add the new entry, instead of adding it
anyway.
Give up if stktable_trash_oldest() fails to free any entry
MEDIUM: stick-tables: Use a per-shard expiration task
Instead of having per-table expiration tasks, just use one per shard.
The task will now go through all the tables to expire entries. When a
table gets an expiration earlier than the one previously known, it will
be put in a mt-list, and the task will be responsible to put it into an
eb32, ordered based on the next expiration.
Each per-shard task will run on a different thread, so it should lead to
a better load distribution than the per-table tasks.
Olivier Houchard [Thu, 16 Oct 2025 13:45:52 +0000 (15:45 +0200)]
MINOR: initcalls: Add a new initcall stage, STG_INIT_2
Add a new initcall stage, STG_INIT_2, for stuff to be called after
step_init_2() is called, so after we know for sure that global.nbthread
will be set.
Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of
STG_INIT, in anticipation for it to be enhanced and have a need for
global.nbthread.
Willy Tarreau [Mon, 20 Oct 2025 12:50:27 +0000 (14:50 +0200)]
BUG/MEDIUM: cli: also free the trash chunk on the error path
Since commit 20ec1de214 ("MAJOR: cli: Refacor parsing and execution of
pipelined commands"), command not returning any response (e.g. "quit")
don't pass through the free_trash_chunk() call, possibly leaking the
cmdline buffer. A typical way to reproduce it is to loop on "quit" on
the CLI, though it very likely affects other specific commands.
Let's make sure in the release handler that we always release that
chunk in any case. This must be backported to 3.2.
BUG/MINOR: quic-be: unchecked connections during handshakes
This bug impacts only the backends.
The ->conn (pointer to struct connection) member validity of the ssl_sock_ctx
struct was not checked before being dereferenced, leading to possible crashes
in qc_ssl_do_hanshake() during handshake.
This was reported by GH #3163 issue.
No need to backport because the QUIC backend support arrived with 3.3
Olivier Houchard [Sun, 19 Oct 2025 21:17:55 +0000 (23:17 +0200)]
BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
In mt_list_delete(), if the element was not in a list, then n and p will
point to it, and so setting n->prev and n->next will be enough to unlock it.
Don't do it twice, as once it's been done the first time, another thread may
be working with it, and may have added it to a list already, and doing it
a second time can lead to list inconsistencies.
Willy Tarreau [Sat, 18 Oct 2025 09:24:05 +0000 (11:24 +0200)]
[RELEASE] Released version 3.3-dev10
Released version 3.3-dev10 with the following main changes :
- BUG/MEDIUM: connections: Only avoid creating a mux if we have one
- BUG/MINOR: sink: retry attempt for sft server may never occur
- CLEANUP: mjson: remove MJSON_ENABLE_RPC code
- CLEANUP: mjson: remove MJSON_ENABLE_PRINT code
- CLEANUP: mjson: remove MJSON_ENABLE_NEXT code
- CLEANUP: mjson: remove MJSON_ENABLE_BASE64 code
- CLEANUP: mjson: remove unused defines and math.h
- BUG/MINOR: http-ana: Reset analyse_exp date after 'wait-for-body' action
- CLEANUP: mjson: remove unused defines from mjson.h
- BUG/MINOR: acme: avoid overflow when diff > notAfter
- DEV: patchbot: use git reset+checkout instead of pull
- MINOR: proxy: explicitly permit abortonclose on frontends and clarify the doc
- REGTESTS: fix h2_desync_attacks to wait for the response
- REGTESTS: http-messaging: fix the websocket and upgrade tests not to close early
- MINOR: proxy: only check abortonclose through a dedicated function
- MAJOR: proxy: enable abortonclose by default on HTTP proxies
- MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default
- MAJOR: proxy: enable abortonclose by default on TLS listeners
- MINOR: h3/qmux: Set QC_SF_UNKNOWN_PL_LENGTH flag on QCS when headers are sent
- MINOR: stconn: Add two fields in sedesc to replace the HTX extra value
- MINOR: h1-htx: Increment body len when parsing a payload with no xfer length
- MINOR: mux-h1: Set known input payload length during demux
- MINOR: mux-fcgi: Set known input payload length during demux
- MINOR: mux-h2: Use <body_len> H2S field for payload without content-length
- MINOR: mux-h2: Set known input payload length of the sedesc
- MINOR: h3: Set known input payload length of the sedesc
- MINOR: stconn: Move data from kip to kop when data are sent to the consumer
- MINOR: filters: Reset knwon input payload length if a data filter is used
- MINOR: hlua/http-fetch: Use <kip> instead of HTX extra field to get body size
- MINOR: cache: Use the <kip> value to check too big objects
- MINOR: compression: Use the <kip> value to check body size
- MEDIUM: mux-h1: Stop to use HTX extra value when formatting message
- MEDIUM: htx: Remove the HTX extra field
- MEDIUM: acme: don't insert acme account key in ckchs_tree
- BUG/MINOR: acme: memory leak from the config parser
- CI: cirrus-ci: bump FreeBSD image to 14-3
- BUG/MEDIUM: ssl: take care of second client hello
- BUG/MINOR: ssl: always clear the remains of the first hello for the second one
- BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor
- MEDIUM: applet: Forward <kip> to applets
- DEBUG: mux-h1: Dump <kip> and <kop> values with sedesc info
- BUG/MINOR: ssl: leak in ssl-f-use
- BUG/MINOR: ssl: leak crtlist_name in ssl-f-use
- BUILD: makefile: disable tail calls optimizations with memory profiling
- BUG/MEDIUM: apppet: Improve spinning loop detection with the new API
- BUG/MINOR: ssl: Free global_ssl structure contents during deinit
- BUG/MINOR: ssl: Free key_base from global_ssl structure during deinit
- MEDIUM: jwt: Remove certificate support in jwt_verify converter
- MINOR: jwt: Add new jwt_verify_cert converter
- MINOR: jwt: Do not look into ckch_store for jwt_verify converter
- MINOR: jwt: Add new "jwt" certificate option
- MINOR: jwt: Add specific error code for known but unavailable certificate
- DOC: jwt: Add doc about "jwt_verify_cert" converter
- MINOR: ssl: Dump options in "show ssl cert"
- MINOR: jwt: Add new "add/del/show ssl jwt" CLI commands
- REGTEST: jwt: Test new CLI commands
- BUG/MINOR: ssl: Potential NULL deref in trace macro
- MINOR: regex: use a thread-local match pointer for pcre2
- BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode
- MEDIUM: pools: detect() when munmap() fails in UAF mode
- TESTS: quic: useless param for b_quic_dec_int()
- BUG/MEDIUM: pools: fix crash on filtered "show pools" output
- BUG/MINOR: pools: don't report "limited to the first X entries" by default
- BUG/MAJOR: lb-chash: fix key calculation when using default hash-key id
- BUG/MEDIUM: stick-tables: Don't forget to dec count on failure.
- BUG/MINOR: quic: check applet_putchk() for 'show quic' first line
- TESTS: quic: fix uninit of quic_cc_path const member
- BUILD: ssl: can't build when using -DLISTEN_DEFAULT_CIPHERS
- BUG/MAJOR: quic: uninitialized quic_conn_closed struct members
- BUG/MAJOR: quic: do not reset QUIC backends fds in closing state
- BUG/MINOR: quic: SSL counters not handled
- DOC: clarify the experimental status for certain features
- MINOR: config: remove experimental status on tune.disable-fast-forward
- MINOR: tree-wide: add missing TAINTED flags for some experimental directives
- MEDIUM: config: warn when expose-experimental-directives is used for no reason
- BUG/MEDIUM: threads/config: drop absent threads from thread groups
- REGTESTS: remove experimental from quic/retry.vtc
Willy Tarreau [Fri, 17 Oct 2025 18:55:43 +0000 (20:55 +0200)]
REGTESTS: remove experimental from quic/retry.vtc
Recent commit 8b7a82cd30 ("MEDIUM: config: warn when
expose-experimental-directives is used for no reason") triggered on
this test exactly for the reason it was made for. The tests were just
done without quic on it. Let's drop the unneeded option.
Willy Tarreau [Fri, 17 Oct 2025 18:36:00 +0000 (20:36 +0200)]
BUG/MEDIUM: threads/config: drop absent threads from thread groups
Thread groups can be assigned arbitrary thread ranges, but if the
mentioned threads do not exist, this causes crashes in listener_accept()
or some connections to be ignored. The reason is that the calculated
mask is derived from the thread group's enabled threads count. Examples:
This commit removes missing threads, emits a warning when the thread
group just has less threads than requested, and an error when it is
left with no threads at all.
This must be backported to 3.1 since the issue is present there already.
Willy Tarreau [Fri, 17 Oct 2025 16:15:12 +0000 (18:15 +0200)]
MEDIUM: config: warn when expose-experimental-directives is used for no reason
If users start to enable expose-experimental-directives for the purpose
of testing one specific feature, there are chances that the option remains
forever and hides the experimental status of other options.
Let's emit a warning if the option appears and is not used. This will
remind users that they can now drop it, and help keep configs safe for
future upgrades.
Willy Tarreau [Fri, 17 Oct 2025 15:57:40 +0000 (17:57 +0200)]
MINOR: tree-wide: add missing TAINTED flags for some experimental directives
We normally taint the process when using experimental directives, but
a handful of places were missed so we don't always know that they are
in use. Let's fix these places (hint for future directives, just look
for places checking for "experimental_directives_allowed", and add
"mark_tainted(TAINTED_CONFIG_EXP_KW_DECLARED);").
Willy Tarreau [Fri, 17 Oct 2025 16:55:03 +0000 (18:55 +0200)]
MINOR: config: remove experimental status on tune.disable-fast-forward
The option was turned to off by default in 2.8 with commit 2f7c82bfd
("BUG/MINOR: haproxy: Fix option to disable the fast-forward"), however
at the same time it should have dropped its experimental status since
the feature is enabled by default. The only goal of the option is to
debug something, like many other tune.xxx options. The option should
still normally not be used without being invited to do so by developers
looking for something specific though.
This could be backported if desired to simplify debugging, though this
has never been needed for now.
Willy Tarreau [Fri, 17 Oct 2025 16:39:03 +0000 (18:39 +0200)]
DOC: clarify the experimental status for certain features
Certain features require "expose-experimental-directives" to be set in
the global section. Let's clarify that experimental featuers are only
maintained in best effort mode, may break during the stable cycle, and
are generally not maintained beyond the release of the next LTS branch
since it is extremely challenging, and early adopters are expected to
upgrade to benefit from improvements anyway.
The SSL counters were not handled at all for QUIC connections. This patch
implement ssl_sock_update_counters() extracting the code from ssl_sock.c
and call this function where applicable both in TLS/TCP and QUIC parts.
BUG/MAJOR: quic: do not reset QUIC backends fds in closing state
This bug impacts only the backends.
When entering the closing state, a quic_closed_conn is used to replace the quic_conn.
In this state, the ->fd value was reset to -1 value calling qc_init_fd(). This value
is used by qc_may_use_saddr() which supposes it cannot be -1 for a backend, leading
->li to be dereferencd, which is legal only for a listener.
This bug impacts only the backend but with possible crash when qc_may_use_saddr()
is called: qc_test_fd() is false leading qc->li to be dereferenced. This is legal
only for a listener.
This patch prevents such fd value resettings for backends.
No need to backport because the QUIC backends support arrived with 3.3.
BUG/MAJOR: quic: uninitialized quic_conn_closed struct members
A quic_conn_closed struct is initialized to replace the quic_conn when the
connection enters the closing to reduce the connection memory footprint.
->max_udp_payload quic_conn_close was not initialized leading to possible
BUG_ON()s in qc_rcv_buf() when comparing the RX buf size to this payload.
->cntrs counters were alon not initialized with the only consequence
to generate wrong values for these counters.
BUILD: ssl: can't build when using -DLISTEN_DEFAULT_CIPHERS
Emeric reported that he can't build haproxy anymore since 9bc6a034
("BUG/MINOR: ssl: Free global_ssl structure contents during deinit").
src/ssl_sock.c:7020:40: error: comparison with string literal results in unspecified behavior [-Werror=address]
7020 | if (global_ssl.listen_default_ciphers != LISTEN_DEFAULT_CIPHERS)
| ^~
src/ssl_sock.c:7023:41: error: comparison with string literal results in unspecified behavior [-Werror=address]
7023 | if (global_ssl.connect_default_ciphers != CONNECT_DEFAULT_CIPHERS)
| ^~
src/ssl_sock.c: At top level:
Indeed the mentionned patch is checking the pointer in order to free
something freeable, but that can't work because these constant are
strings literal which can be passed from the compiler and not pointers.
Also the test is not useful, because these strings are strdup() in
__ssl_sock_init, so they can be free directly.
Must be backported in every stable branches with 9bc6a034.
Amaury Denoyelle [Mon, 13 Oct 2025 16:16:22 +0000 (18:16 +0200)]
BUG/MINOR: quic: check applet_putchk() for 'show quic' first line
Ensure applet_putchk() return value is checked when outputing via the
CLI 'show quic' header line.
This is only to align with other usages of the same function, as trash
output buffer should always be large enough for it. As such, the command
is simply aborted if this is not the case.
This should fix coverity report from github issue #3139.
Olivier Houchard [Tue, 14 Oct 2025 16:11:31 +0000 (18:11 +0200)]
BUG/MEDIUM: stick-tables: Don't forget to dec count on failure.
In stksess_new(), if we failed to allocate memory for the new stksess,
don't forget to decrement the table entry count, as nobody else will
do it for us.
An artificially high count could lead to at least purging entries while
there is no need to.
Willy Tarreau [Thu, 16 Oct 2025 08:30:57 +0000 (10:30 +0200)]
BUG/MAJOR: lb-chash: fix key calculation when using default hash-key id
A subtle regression was introduced in 3.0 by commit faa8c3e02 ("MEDIUM:
lb-chash: Deterministic node hashes based on server address"). When keys
are calculated from the server's ID (which is the default), due to the
reorganisation of the code, the key ended up being hashed twice instead
of being multiplied by the scaling range.
While most users will never notice it, it is blocking some large cache
users from upgrading from 2.8 to 3.0 or 3.2 because the keys are
redistributed.
After a check with users on the mailing list [1] it was estimated that
keep the current situation is the worst choice because those who have
not yet upgraded will face the problem while by fixing it, those who
already have and for whom it happened smoothly will handle it just
right again.
As such this fix must be backported to 3.0 without waiting (in order
to preserve those who upgrade from two redistributions). Please note
that only configurations featuring "hash-type consistent" and not
having "hash-key" present with a value other than "id" are affected,
others are not (e.g. "hash-key addr" is unaffected).
Willy Tarreau [Thu, 16 Oct 2025 06:38:35 +0000 (08:38 +0200)]
BUG/MINOR: pools: don't report "limited to the first X entries" by default
With the fix in commit 982805e6a3 ("BUG/MINOR: pools: Fix the dump of
pools info to deal with buffers limitations"), the max count is now
compared to the number of dumped pools instead of the configured
numbered, and keeping >= is no longer valid because maxcnt is set by
default to the same value when not set, so this means that since this
patch we're always displaying "limited to the first X entries" where X
is the number of dumped entries even in the absence of any limitation.
Let's just fix the comparison to only show this when the limit is lower.
This must be backported to 3.2 where the patch above already is.
Willy Tarreau [Thu, 16 Oct 2025 06:27:44 +0000 (08:27 +0200)]
BUG/MEDIUM: pools: fix crash on filtered "show pools" output
The truncation of pools output that was adressed in commit 982805e6a3
("BUG/MINOR: pools: Fix the dump of pools info to deal with buffers
limitations") required to split the pools filling from dumping. However
there is a problem when a limit is passed that is lower than the number
of pools or if a pool name is specified or if pool caches are disabled,
because in this case the number of filled slots will be lower than the
initially allocated one, and empty entries will be visited either by the
sort functions when filling the entries if "byxxx" is specified, or by
the dump function after the last entry, but none of these functions was
expecting to be passed a NULL entry.
Let's just re-adjust nbpools to match the number of filled entries at
the end. Anyway the totals are calculated on the number of dumped
entries.
This must be backported to 3.2 since the fix above was backported there
as well.
The third parameter passed to b_quic_dec_int() is unitialized. This is not a bug.
But this disturbs coverity for an unknown reason as revealed by GH issue #3154.
This patch takes the opportunity to use NULL as passed value to avoid using such
an uneeded third parameter.
Should be backported to 3.2 where this unit test was introduced.
Willy Tarreau [Mon, 13 Oct 2025 17:22:31 +0000 (19:22 +0200)]
MEDIUM: pools: detect() when munmap() fails in UAF mode
Better check that munmap() always works, otherwise it means we might
have miscalculated an address, and if it fails silently, it will eat
all the memory extremely quickly. Let's add a BUG_ON() on munmap's
return.
Willy Tarreau [Mon, 13 Oct 2025 17:15:55 +0000 (19:15 +0200)]
BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode
As reported by Christopher, in UAF mode memory release of aligned
objects as introduced in commit ef915e672a ("MEDIUM: pools: respect
pool alignment in allocations") does not work. The padding calculation
in the freeing code is no longer correct since it now depends on the
alignment, so munmap() fails on EINVAL. Fortunately we don't care much
about it since we know it's the low bits of the passed address, which
is much simpler to compute, since all mmaps are page-aligned.
There's no need to backport this, as this was introduced in 3.3.
Willy Tarreau [Mon, 13 Oct 2025 14:47:50 +0000 (16:47 +0200)]
MINOR: regex: use a thread-local match pointer for pcre2
The pcre2 matching requires an array of matches for grouping, that is
allocated when executing the rule by pre-processing it, and that is
immediately freed after use. This is quite inefficient and results in
annoying patterns in "show profiling" that attribute the allocations
to libpcre2 and the releases to haproxy.
A good suggestion from Dragan is to pre-allocate these per thread,
since the entry is not specific to a regex. In addition we're already
limited to MAX_MATCH matches so we don't even have the problem of
having to grow it while parsing nor processing.
The current patch adds a per-thread pair of init/deinit functions to
allocate a thread-local entry for that, and gets rid of the dynamic
allocations. It will result in cleaner memory management patterns and
slightly higher performance (+2.5%) when using pcre2.
MINOR: jwt: Add new "add/del/show ssl jwt" CLI commands
The new "add/del ssl jwt <file>" commands allow to change the "jwt" flag
of an already loaded certificate. It allows to delete certificates used
for JWT validation, which was not yet possible.
The "show ssl jwt" command iterates over all the ckch_stores and dumps
the ones that have the option set.
DOC: jwt: Add doc about "jwt_verify_cert" converter
Add information about the new "jwt_verify_cert" converter and update the
existing "jwt_converter" doc to remove mentions of certificates from it.
Add information about the new "jwt" certificate option.
MINOR: jwt: Add specific error code for known but unavailable certificate
A certificate that does not have the 'jwt' flag enabled cannot be used
for JWT validation. We now raise a specific return value so that such a
case can be identified.
This option can be used to enable the use of a given certificate for JWT
verification. It defaults to 'off' so certificates that are declared in
a crt-store and will be used for JWT verification must have a
"jwt on" option in the configuration.
This converter will be in charge of performing the same operation as the
'jwt_verify' one except that it takes a full-on pem certificate path
instead of a public key path as parameter.
The certificate path can be either provided directly as a string or via
a variable. This allows to use certificates that are not known during
init to perform token validation.
MEDIUM: jwt: Remove certificate support in jwt_verify converter
The jwt_verify converter will not take full-on certificates anymore
in favor of a new soon to come jwt_verify_cert. We might end up with a
new jwt_verify_hmac in the future as well which would allow to deprecate
the jwt_verify converter and remove the need for a specific internal
tree for public keys.
The logic to always look into the internal jwt tree by default and
resolve to locking the ckch tree as little as possible will also be
removed. This allows to get rid of the duplicated reference to
EVP_PKEYs, the one in the jwt tree entry and the one in the ckch_store.
BUG/MINOR: ssl: Free global_ssl structure contents during deinit
Some fields of the global_ssl structure are strings that are strdup'ed
but never freed. There is only one static global_ssl structure so not
much memory is used but we might as well free it during deinit.
This patch can be backported to all stable branches.
BUG/MEDIUM: apppet: Improve spinning loop detection with the new API
Conditions to detect the spinning loop for applets based on the new API are
not accurrate. We cannot continue to check the channel's buffers state to
know if an applet has made some progress. At least, we must also check the
applet's buffers.
After digging to find the right way to do, it was clear that the best is to
use something similar to what is performed for the streams, namely, checking
read and write events. And in fact, it is quite easy to do with the new
API. So let's do so.
Willy Tarreau [Fri, 10 Oct 2025 09:28:35 +0000 (11:28 +0200)]
BUILD: makefile: disable tail calls optimizations with memory profiling
The purpose of memory profiling precisely is to figure what function
allocates and what function frees for specific objects. It turns out
that a non-negligible number of release callbacks basically do nothing
but a free() or pool_free() call and return, which the compiler happily
turns into a jump, making the caller of that callback appear as the
real one. That's how we can see libcrypto release to pools such as
ssl-capture for example, which also makes the per-DSO calls appear
wrong:
Worse, as can be seen on the last line above, there can be a single pool
per call place (since we don't release to arbitrary pools), and the stats
are misleading by reporting the first used pool only when a same function
can call multiple release callbacks. This is why the free call totals
10k ssl-capture and 10072 ssl-keylogfile.
Let's just disable tail call optimization when using memory profiling.
The gains are only very marginal and complicate so much the debugging
that it's not worth it. Now the output is correct, and no longer claims
that libcrypto is the caller:
An attempt was made to only instrument pool_free() to place a compiler
barrier, but that resulted in much larger code and wouldn't cover
functions ending with a simple "free()" call. "ha_free()" however is
already immune against tail call optimization since it has to write
the NULL when returning from free().
This should be backported to recent stable releases that are still
regularly being debugged.
For now, no applets are using the <kop> value when consuming data. At least,
as far as I know. But it remains a good idea to keep the applet API
compatible. So now, the <kip> of the opposite side is properly forwarded to
applets.
BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor
By refactoring the HTX to remove the extra field, a bug was introduced in
the stream-connector part. The <kip> (known input payload) value of a sedesc
was moved to <kop> (knwon output payload) using the same sedesc. Of course,
this is totally wrong. <kip> value of a sedesc must be forwarded to the
opposite side.
In addition, the operation is performed in sc_conn_send(). In this function,
we manipulate the stream-connectors. So se_fwd_kip() function was changed to
use the stream-connectors directely.
Now, the function sc_ep_fwd_kip() is now called with the both
stream-connectors to properly forward <kip> from on side to the opposite
side.
Willy Tarreau [Thu, 9 Oct 2025 16:47:54 +0000 (18:47 +0200)]
BUG/MINOR: ssl: always clear the remains of the first hello for the second one
William rightfully pointed that despite the ssl capture being a
structure, some of its entries are only set for certain contents,
so we need to always zero it before using it so as to clear any
remains of a previous use, otherwise we could possibly report some
entries that were only present in the first hello and not the second
one. No need to clear the data though, since any remains will not be
referenced by the fields.
This must be backported wherever commit 336170007c ("BUG/MEDIUM: ssl:
take care of second client hello") is backported.
Willy Tarreau [Thu, 9 Oct 2025 14:13:18 +0000 (16:13 +0200)]
BUG/MEDIUM: ssl: take care of second client hello
For a long time we've been observing some sporadic leaks of ssl-capture
pool entries on haproxy.org without figuring exactly the root cause. All
that was seen was that less calls to the free callback were made than
calls to the hello parsing callback, and these were never reproduced
locally.
It recently turned out to be triggered by the presence of "curves" or
"ecdhe" on the "bind" line. Captures have shown the presence of a second
client hello, called "Change Cipher Client Hello" in wireshark traces,
that calls the client hello callback again. That one wasn't prepared for
being called twice per connection, so it allocates an ssl-capture entry
and assigns it to the ex_data entry, possibly overwriting the previous
one.
In this case, the fix is super simple, just reuse the current ex_data
if it exists, otherwise allocate a new one. This completely solves the
problem.
Other callbacks have been audited for the same issue and are not
affected: ssl_ini_keylog() already performs this check and ignores
subsequent calls, and other ones do not allocate data.
This must be backported to all supported versions.
MEDIUM: acme: don't insert acme account key in ckchs_tree
Don't insert the acme account key in the ckchs_tree anymore. ckch_store
are not made to only include a private key. CLI operations are not
possible with them either. That doesn't make much sense to keep it that
way until we rework the ckch_store.
Thanks for previous changes, it is now possible to remove the <extra> field
from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because
it is now unsued.
MINOR: stconn: Move data from kip to kop when data are sent to the consumer
When data are sent to the consumer, the known output payload length is
updated using the known input payload length value and this last one is then
reset. se_fwd_kip() function is used for this purpose.
MINOR: h3: Set known input payload length of the sedesc
Set <kip> value when data are transfer to the upper layer, in h3_rcv_buf().
The difference between the known length of the payload before and after a
parsing loop is added to <kip> value. When a content-length is specified in
the message, the h3s <body_len> field is used. Otherwise, it is the h3s
<data_len> field.
MINOR: mux-h2: Set known input payload length of the sedesc
Set <kip> value when data are transfer to the upper layer, in h2_rcv_buf().
The new <body_len> filed of the H2S is used to increment <kip> value and
then it is reset. The patch relies on the previous one ("MINOR: mux-h2: Save
the known length of the payload").
MINOR: mux-h2: Use <body_len> H2S field for payload without content-length
Before, the <body_len> H2S field was only use for verity the annonced
content-lenght value was respected. Now, this field is used for all
messages. Messages with a content-length are still handled the same way.
<body_len> is set to the content-length value and decremented by the size of
each DATA frame. For other messages, the value is initialized to ULLONG_MAX
and still decremented by the size of each DATA frame. This change is
mandatory to properly define the known input payload length value of the
sedesc.
MINOR: mux-fcgi: Set known input payload length during demux
Set <kip> value during the response parsing. The difference between the body
length before and after a parsing loop is added. The patch relies on the
previous one ("MINOR: h1-htx: Increment body len when parsing a payload with
no xfer length").
MINOR: mux-h1: Set known input payload length during demux
Set <kip> value during the message parsing. The difference between the body
length before and after a parsing loop is added. The patch relies on the
previous one ("MINOR: h1-htx: Increment body len when parsing a payload with
no xfer length").
MINOR: h1-htx: Increment body len when parsing a payload with no xfer length
In the H1 parseur, the body length was only incremented when the transfer
length was known. So when the content-length was specified or when the
transfer-encoding value was set to "chunk".
Now for messages with unknown transfer length, it is also incremented. It is
mandatory to be able to remove the extra field from the HTX message.
MINOR: stconn: Add two fields in sedesc to replace the HTX extra value
For now, the HTX extra value is used to specify the known part, in bytes, of
the HTTP payload we will receive. It may concerne the full payload if a
content-length is specified or the current chunk for a chunk-encoded
message. The main purpose of this value is to be used on the opposite side
to be able to announce chunks bigger than a buffer. It can also be used to
check the validity of the payload on the sending path, to properly detect
too big or too short payload.
However, setting this information in the HTX message itself is not really
appropriate because the information is lost when the HTX message is consumed
and the underlying buffer released. So the producer must take care to always
add it in all HTX messages. it is especially an issue when the payload is
altered by a filter.
So to fix this design issue, the information will be moved in the sedesc. It
is a persistent area to save the information. In addition, to avoid the
ambiguity between what the producer say and what the consumer see, the
information will be splitted in two fields. In this patch, the fields are
added:
* kip : The known input payload length
* kop : The known output payload lenght
The producer will be responsible to set <kip> value. The stream will be
responsible to decrement <kip> and increment <kop> accordingly. And the
consumer will be responsible to remove consumed bytes from <kop>.
MINOR: h3/qmux: Set QC_SF_UNKNOWN_PL_LENGTH flag on QCS when headers are sent
QC_SF_UNKNOWN_PL_LENGTH flag is set on the qcs to know a payload of message
has an unknown length and not send a RESET_STREAM on shutdown. This flag was
based on the HTX extra field value. However, it is not necessary. When
headers are processed, before sending them, it is possible to check the HTX
start-line to know if the length of the payload is known or not.
So let's do so and don't use anymore the HTX extra field for this purpose.
Willy Tarreau [Wed, 8 Oct 2025 08:32:33 +0000 (10:32 +0200)]
MAJOR: proxy: enable abortonclose by default on TLS listeners
In the continuity of https://github.com/orgs/haproxy/discussions/3146,
we must also enable abortonclose by default for TLS listeners so as not
to needlessly compute TLS handshakes on dead connections. The change is
very small (just set the default value to 1 in the TLS code when neither
the option nor its opposite were set).
It may possibly cause some TLS handshakes to start failing with 3.3 in
certain legacy environments (e.g. TLS health-checks performed using only
a client hello and closing afterwards), and in this case it is sufficient
to disable the option using "no option abortonclose" in either the
affected frontend or the "defaults" section it derives from.
Willy Tarreau [Wed, 8 Oct 2025 08:27:45 +0000 (10:27 +0200)]
MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default
With this function we can now pass the desired default value for the
abortonclose option when neither the option nor its opposite were set.
Let's also take this opportunity for using it directly from the HTTP
analyser since there's no point in re-checking the proxy's mode there.
Willy Tarreau [Wed, 8 Oct 2025 08:18:35 +0000 (10:18 +0200)]
MAJOR: proxy: enable abortonclose by default on HTTP proxies
As discussed on https://github.com/orgs/haproxy/discussions/3146 and on
the mailing list, there's a marked preference for having abortonclose
enabled by default when relevant. The point being that with todays'
internet, the large majority of requests sent with a closed input
channel are aborted requests, and that it's pointless to waste resources
processing them.
This patch now considers both "option abortonclose" and its opposite
"no option abortonclose" to figure whether abortonclose is enabled or
disabled in a backend. When neither are set (thus not even inherited
from a defaults section), then it considers the proxy's mode, and HTTP
mode implies abortonclose by default.
This may make some legacy services fail starting with 3.3. In this case
it will be sufficient to add "no option abortonclose" in either the
affected backend or the defaults section it derives from. But for
internet-facing proxies it's better to stay with the option enabled.
Willy Tarreau [Tue, 7 Oct 2025 13:36:54 +0000 (15:36 +0200)]
MINOR: proxy: only check abortonclose through a dedicated function
In order to prepare for changing the way abortonclose works, let's
replace the direct flag check with a similarly named function
(proxy_abrt_close) which returns the on/off status of the directive
for the proxy. For now it simply reflects the flag's state.
Willy Tarreau [Tue, 7 Oct 2025 15:03:35 +0000 (17:03 +0200)]
REGTESTS: http-messaging: fix the websocket and upgrade tests not to close early
By default when building an H2 request, vtest sets the END_STREAM flag
on the HEADERS frame. This is problematic with the websocket and proto
upgrade tests since we're using CONNECT, because it immediately closes
afterwards, which does not correspond to what we're testing. Doing this
in abortonclose mode rightfully produces an error. Let's fix the test
so as not to set the flag on the HEADERS frame. However, doing so means
we'll receive a window update that we must also accept. Now the test
works both with and without abortonclose.
Willy Tarreau [Tue, 7 Oct 2025 14:34:51 +0000 (16:34 +0200)]
REGTESTS: fix h2_desync_attacks to wait for the response
Tests with abortonclose showed a bug with this test where the client
would close the stream immediately after sending the request, without
waiting for the response, causing some random failures on the server
side.
Willy Tarreau [Wed, 8 Oct 2025 06:34:43 +0000 (08:34 +0200)]
MINOR: proxy: explicitly permit abortonclose on frontends and clarify the doc
The "abortonclose" option was recently deprecated in frontends because its
action was essentially limited to the backend part (queuing etc). But in
3.3 we started to support it for TLS on frontends, though it would only
work when placed in a defaults section. Let's officially support it in
frontends, and take this opportunity to clarify the documentation on this
topic, which was incomplete regarding frontend and TLS support. Now the
doc tries to better cover the different use cases.
Willy Tarreau [Wed, 8 Oct 2025 02:35:52 +0000 (04:35 +0200)]
DEV: patchbot: use git reset+checkout instead of pull
The patchbot stopped on a previous ultra-rare forced push due to wanting
the user's name and e-mail before proceeding. We don't want merges nor
rebases anyway, only to reset the tree to the next one, so let's do that.