Amaury Denoyelle [Wed, 28 Jan 2026 09:37:38 +0000 (10:37 +0100)]
MEDIUM: ssl: remove connection from msg callback args
SSL msg callbacks are used for notification about sent/received SSL
messages. Such callbacks are registered via
ssl_sock_register_msg_callback().
Prior to this patch, connection was passed as first argument of these
callbacks. However, most of them do not use it. Worse, this may lead to
confusion as connection can be NULL in QUIC context.
This patch cleans this by removing connection argument. As an
alternative, connection can be retrieved in callbacks if needed using
ssl_sock_get_conn() but the code must be ready to deal with potential
NULL instances. As an example, heartbeat parsing callback has been
adjusted in this manner.
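As an illustration only, a callback written against the new convention could look like the sketch below. All types and names here are hypothetical stand-ins (they are not HAProxy's definitions); get_conn_stub() merely plays the role of ssl_sock_get_conn() and may return NULL, as in the QUIC case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; only the shape matters. */
struct connection { int err_code; };
struct ssl_stub { struct connection *conn; /* NULL when no connection exists */ };

/* Plays the role of ssl_sock_get_conn(): may legitimately return NULL. */
static struct connection *get_conn_stub(const struct ssl_stub *ssl)
{
	return ssl->conn;
}

/* The callback no longer receives the connection. It looks it up only when
 * it needs it. Returns 1 if an error was recorded on a connection, else 0. */
static int heartbeat_cb_stub(struct ssl_stub *ssl, int is_malformed)
{
	struct connection *conn;

	assert(ssl != NULL); /* the SSL context itself always exists */
	if (!is_malformed)
		return 0;

	conn = get_conn_stub(ssl);
	if (!conn)
		return 0; /* no connection (e.g. QUIC): nothing to flag */

	conn->err_code = 1;
	return 1;
}
```

Callbacks which never touch the connection simply skip the lookup.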
Amaury Denoyelle [Wed, 28 Jan 2026 08:53:40 +0000 (09:53 +0100)]
BUG/MEDIUM: ssl: fix msg callbacks on QUIC connections
With QUIC backend implementation, SSL code has been adjusted in several
places when accessing connection instance. Indeed, with QUIC usage, SSL
context is tied up to quic_conn, and code may be executed prior/after
connection instantiation. For example, on frontend side, connection is
only created after QUIC handshake completion.
The following patch tried to fix unsafe accesses to connection. In
particular, msg callbacks are not called anymore if connection is NULL.
However, most msg callbacks do not need to use the connection instance.
The only occurrence where it is accessed is for heartbeat message
parsing, which is the only case of crash solved. The above fix is too
restrictive as it completely prevents execution of these callbacks when
connection is unset. This breaks several features with QUIC, such as SSL
key logging or samples based on ClientHello capture.
The current patch reverts the above one. Thus, this restores invocation
of msg callbacks for QUIC during the whole low-level connection
lifetime. This requires a small adjustment in heartbeat parsing callback
to prevent access on a NULL connection.
The issue on ClientHello capture was mentioned in github issue #2495.
Willy Tarreau [Thu, 29 Jan 2026 10:07:55 +0000 (11:07 +0100)]
BUG/MINOR: config/ssl: fix spelling of "expose-experimental-directives"
The help message for "ktls" mentions "expose-experimental-directive"
without the final 's', which is particularly annoying when copy-pasting
the directive from the error message directly into the config.
there exist some agents which mistakenly accept CRLF inside quoted
chunk extensions, making it possible to fool them by injecting one
extra chunk they won't see for example, or making them miss the end
of the body depending on how it's done. Haproxy, like most other
agents nowadays, doesn't care at all about chunk extensions and just
drops them, in agreement with the spec.
However, as discussed, since chunk extensions are basically never used
except for attacks, and that the cost of just matching quote pairs and
checking backslashed quotes' escape consistency remains relatively
low, it can make sense to add such a check to abort the message parsing
when this situation is encountered. Note that it has to be done at two
places, because there is a fast path and a slow path for chunk parsing.
Also note that it *will* cause transfers using improperly formatted chunk
extensions to fail, but since these are really not used, and that the
likelihood of them being used but improperly quoted certainly is much
lower than the risk of crossing a broken parser on the client's request
path or on the server's response path, we consider the risk as
acceptable. The test is not subject to the configurable parser exceptions
and it's very unlikely that it will ever be needed.
Since this is done in 3.4 which will be LTS, this patch will have to be
backported to 3.3 so that any unlikely trouble gets a chance to be
detected before users upgrade to 3.4.
Thanks to Ben for the discussion, and to Rajat Raghav for sparking it
in the first place even though the original report was mistaken.
Cc: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Cc: Rajat Raghav <xclow3n@gmail.com>
Cc: Christopher Faulet <cfaulet@haproxy.com>
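A minimal sketch of such a consistency check is given below. It is an assumption about what "matching quote pairs and checking backslashed quotes" can mean in practice, not the code from the patch: the helper name is hypothetical, and the input is assumed to be the extension part of a chunk-size line, without the terminating CRLF.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the chunk-extension bytes in ext[0..len) are consistently
 * quoted, 0 if the message parsing should be aborted. Hypothetical helper. */
static int chunk_ext_quotes_ok(const char *ext, size_t len)
{
	int in_quotes = 0;
	size_t i;

	assert(ext != NULL || len == 0);
	for (i = 0; i < len; i++) {
		char c = ext[i];

		if (!in_quotes) {
			if (c == '"')
				in_quotes = 1;
			continue;
		}
		if (c == '\\') {
			/* a backslash must escape one following byte */
			if (++i >= len)
				return 0;
			c = ext[i];
			if (c == '\r' || c == '\n')
				return 0;
			continue;
		}
		if (c == '\r' || c == '\n')
			return 0; /* CR or LF inside a quoted string */
		if (c == '"')
			in_quotes = 0;
	}
	return !in_quotes; /* an unterminated quote is rejected as well */
}
```

With this sketch, `;a="b"` and `;a="b\"c"` pass, while a quoted value containing CRLF or a quote left open is refused.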
Willy Tarreau [Wed, 28 Jan 2026 16:18:50 +0000 (17:18 +0100)]
DOC: config: mention that idle connection sharing is per thread-group
There's already a tunable "tune.idle-pool.shared" allowing to enable or
disable idle connection sharing between threads. However the doc does not
mention that these connections are only shared between threads of the same
thread group, since 2.7 with commit 15c5500b6e ("MEDIUM: conn: make
conn_backend_get always scan the same group"). Let's clarify this and
also give a hint about "max-threads-per-group" which can be helpful for
machines with unified caches.
Willy Tarreau [Wed, 28 Jan 2026 15:59:40 +0000 (15:59 +0000)]
OPTIM: server: get rid of the last use of _ha_barrier_full()
The code in srv_add_to_idle_list() has its roots in 2.0 with commit
9ea5d361ae ("MEDIUM: servers: Reorganize the way idle connections are
cleaned."). At that time we didn't yet have the current set of atomic
load/store operations and we used to perform loads using volatile casts
after a barrier. It turns out that this function has kept this schema
over the years, resulting in a big mfence stalling all the pipeline
in the function:
Switching these for a pair of atomic loads got rid of this and brought
0.5 to 3% extra performance depending on the tests due to variations
elsewhere, but it has never been below 0.5%. Note that the second load
doesn't need to be atomic since it's protected by the lock, but it's
cleaner from an API and code review perspective. That's also why it's
relaxed.
This was the last user of _ha_barrier_full(), let's try not to
reintroduce it now!
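For readers unfamiliar with the pattern, here is a stand-alone sketch using C11 atomics rather than HAProxy's own atomic macros. The structure, field names and the decision rule are made up for the example, and using relaxed ordering for both loads is an assumption of this sketch. The point is only that two relaxed atomic loads replace a full barrier followed by volatile reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified server state; field names are illustrative. */
struct srv_stub {
	atomic_uint curr_used_conns;
	atomic_uint curr_idle_conns;
	unsigned int max_idle_conns;
};

/* Decides whether one more connection may be kept idle. The counters are
 * read with relaxed atomic loads: no full fence is emitted for them. */
static int may_keep_idle(struct srv_stub *srv)
{
	unsigned int used, idle;

	assert(srv != NULL);
	used = atomic_load_explicit(&srv->curr_used_conns, memory_order_relaxed);
	idle = atomic_load_explicit(&srv->curr_idle_conns, memory_order_relaxed);

	/* made-up policy: stay under the budget and never exceed used conns */
	return idle < srv->max_idle_conns && idle <= used;
}
```

On x86 a relaxed load compiles to a plain mov, which is why the mfence disappears.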
Willy Tarreau [Wed, 28 Jan 2026 10:57:25 +0000 (10:57 +0000)]
OPTIM: proxy: separate queues fields from served
There's still a lot of contention when accessing the backend's
totpend and queueslength for every request in may_dequeue_tasks(),
even when queues are not used. This only happens because it's stored
in the same cache line as ->beconn which is being written by other
threads:
0.01 | call sess_change_server
0.02 | mov 0x188(%r15),%esi ## s->queueslength
| if (may_dequeue_tasks(srv, s->be))
0.00 | mov 0xa8(%r12),%rax
0.00 | mov -0x50(%rbp),%r11d
0.00 | mov -0x60(%rbp),%r10
0.00 | test %esi,%esi
| jne 3349
0.01 | mov 0xa00(%rax),%ecx ## p->queueslength
8.26 | test %ecx,%ecx
4.08 | je 288d
This patch moves queueslength and totpend to their own cache line,
thus adding 64 bytes to the struct proxy, but gaining 3.6% of RPS
on a 64-core EPYC thanks to the elimination of this false sharing.
process_stream() goes down from 3.88% to 3.26% in perf top, with
the next top users being inc/dec (s->served) and be->beconn.
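The layout change can be pictured with the sketch below. It is not the real struct proxy: the structure is reduced to three fields, a 64-byte cache line is assumed, and C11 alignas is used where HAProxy has its own alignment macros.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache line size */

/* Hypothetical, stripped-down proxy: names mirror the text only. */
struct proxy_stub {
	unsigned int beconn;                          /* written by many threads */
	alignas(CACHELINE) unsigned int queueslength; /* mostly read */
	unsigned int totpend;                         /* shares the queue line */
	alignas(CACHELINE) char rest[1];              /* next fields start a new line */
};

/* Returns non-zero when two member offsets fall in the same cache line. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE == off_b / CACHELINE;
}

static_assert(offsetof(struct proxy_stub, queueslength) % CACHELINE == 0,
              "queueslength must start a cache line");
```

Writes to beconn then no longer invalidate the line holding queueslength and totpend, at the cost of some padding.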
Willy Tarreau [Wed, 28 Jan 2026 10:38:22 +0000 (10:38 +0000)]
OPTIM: server: move queueslength in server struct
This field is shared by all threads and must be in the shared area
instead, because where it's placed, it slows down access to other
fields of the struct by false sharing. Just moving this field gives
a steady 2% gain on the request rate (1.93 to 1.96 Mrps) on a 64-core
EPYC.
Willy Tarreau [Wed, 28 Jan 2026 09:42:37 +0000 (10:42 +0100)]
DOC: config: mention some possible TLS versions restrictions for kTLS
It took me one hour of trial and error to figure out that kTLS and splicing
were not used only for reasons of TLS version, and that switching to
TLS v1.2 solved the issue. Thus, let's mention it in the doc so that
others find it more easily in the future.
MINOR: ssl: allow to disable certificate compression
This option allows to disable the certificate compression (RFC 8879)
using OpenSSL >= 3.2.0.
This feature is known to permit some denial of service by causing extra
memory allocations of approximately 22MiB and extra CPU work per
connection with OpenSSL versions affected by CVE-2025-66199.
( https://openssl-library.org/news/vulnerabilities/index.html#CVE-2025-66199 )
Setting this to "off" permits to mitigate the problem.
BUG/MAJOR: applet: Don't call I/O handler if the applet was shut
In 3.0, it was stated an applet could not be woken up after it was shutdown.
So the corresponding test in the applets I/O handler was removed. However,
it seems it may happen, especially when outgoing data are blocked on the
opposite side. But it is really unexpected because the "release" callback
function was already called and the appctx context was most probably
released.
Strangely, it was never detected by any applet till now. But the Prometheus
exporter was never updated and was still testing the shutdown. But when it
was refactored to use the new applet API in 3.3, the test was removed. And
this introduced a regression leading to a crash because a server object could
be corrupted. Conditions to hit the bug are not really clear however.
So, now, to avoid any issue with all other applets, the test is performed in
task_process_applet(). The I/O handler is no longer called if the applet is
already shut.
The same is performed for applets still relying on the old API.
An amazing thanks to @idl0r for his invaluable help on this issue !
This patch should fix the issue #3244. It should first be backported to 3.3
and then slowly as far as 3.0.
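The idea of centralizing the test can be sketched as follows. The flag, structure and function names are hypothetical and much simpler than the real appctx and task_process_applet(); the sketch only shows the shutdown test living in the common task function so that individual I/O handlers cannot forget it.

```c
#include <assert.h>
#include <stddef.h>

#define APPCTX_SHUT_STUB 0x1 /* hypothetical "already shut" flag */

struct appctx_stub {
	unsigned int flags;
	int io_calls; /* counts handler invocations, for the demo */
	void (*io_handler)(struct appctx_stub *);
};

static void demo_io_handler(struct appctx_stub *ctx)
{
	ctx->io_calls++;
}

/* Common task function: returns 1 if the handler ran, 0 if it was skipped
 * because the applet was already shut. */
static int process_applet_stub(struct appctx_stub *ctx)
{
	assert(ctx != NULL && ctx->io_handler != NULL);
	if (ctx->flags & APPCTX_SHUT_STUB)
		return 0;
	ctx->io_handler(ctx);
	return 1;
}
```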
BUG/MINOR: ssl: Encrypted keys could not be loaded when given alongside certificate
The SSL passphrase callback function was only called when loading
private keys from a dedicated file (separate from the corresponding
certificate) but not when both the certificate and the key were in the
same file.
We can now load them properly, regardless of how they are provided.
A flag had to be added in the 'passphrase_cb_data' structure because in
the 'ssl_sock_load_pem_into_ckch' function, when calling
'PEM_read_bio_PrivateKey' there might be no private key in the PEM file
which would mean that the callback never gets called (and cannot set the
'passphrase_idx' to -1).
BUG/MINOR: ssl: Properly manage alloc failures in SSL passphrase callback
Some error paths in 'ssl_sock_passwd_cb' (allocation failures) did not
set the 'passphrase_idx' to -1 which is the way for the caller to know
not to call the callback again so in some memory contention contexts we
could end up calling the callback 'infinitely' (or until memory is
finally available).
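The rule "every failure path resets the index" is easiest to enforce with a single exit label, as in the sketch below. Everything except the passphrase_idx name is hypothetical: the structure, the callback signature and the fail_alloc parameter (which only simulates an allocation failure) are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the callback data. */
struct cb_data_stub {
	const char **passphrases;
	int count;
	int passphrase_idx; /* -1 tells the caller to stop calling back */
};

/* Copies the next candidate passphrase into buf and returns its length,
 * or 0 on failure. Every failure path goes through the 'fail' label. */
static int passwd_cb_stub(char *buf, int size, struct cb_data_stub *d, int fail_alloc)
{
	const char *src;
	char *tmp;
	size_t len;

	assert(buf != NULL && d != NULL);
	if (d->passphrase_idx < 0 || d->passphrase_idx >= d->count)
		goto fail;

	src = d->passphrases[d->passphrase_idx];
	len = strlen(src);
	if (size <= 0 || len >= (size_t)size)
		goto fail;

	tmp = fail_alloc ? NULL : malloc(len + 1);
	if (!tmp)
		goto fail; /* allocation failure must also stop the retries */

	memcpy(tmp, src, len + 1);
	memcpy(buf, tmp, len + 1);
	free(tmp);
	d->passphrase_idx++;
	return (int)len;

fail:
	d->passphrase_idx = -1;
	return 0;
}
```

A caller looping while passphrase_idx is not -1 is then guaranteed to terminate, even under memory pressure.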
Willy Tarreau [Mon, 26 Jan 2026 10:18:04 +0000 (11:18 +0100)]
MEDIUM: pools: better check for size rounding overflow on registration
Certain object sizes cannot be controlled at declaration time because
the resulting object size may be slightly extended (tag, caller),
aligned and rounded up, or even doubled depending on pool settings
(e.g. if backup is used).
This patch addresses this by enlarging the type in the pool registration
to 64-bit so that no info is lost from the declaration, and extra checks
for overflows can be performed during registration after various rounding
steps. This allows to catch issues such as these ones and to report a
suitable error:
Willy Tarreau [Mon, 26 Jan 2026 10:31:24 +0000 (11:31 +0100)]
BUG/MINOR: stick-tables: abort startup on stk_ctr pool creation failure
Since 3.3 with commit 945aa0ea82 ("MINOR: initcalls: Add a new initcall
stage, STG_INIT_2"), stkt_late_init() calls stkt_create_stk_ctr_pool()
but doesn't check its return value, so if the pool creation fails, the
process still starts, which is not correct. This patch adds a check for
the return value to make sure we fail to start in this case. This was
not an issue before 3.3 because the function was called as a post-check
handler which did check for errors in the returned values.
Willy Tarreau [Mon, 26 Jan 2026 10:13:29 +0000 (11:13 +0100)]
BUG/MINOR: config: check capture pool creations for failures
A few capture pools can fail in case of too large values for example.
These include the req_uri, capture, and caphdr pools, and may be triggered
with "tune.http.logurilen 2147483647" in the global section, or one of
these in a frontend:
capture request header name len 2147483647
http-request capture src len 2147483647
tcp-request content capture src len 2147483647
These seem to be the only occurrences where create_pool()'s return value
is assigned without being checked, so let's add the proper check for
errors there. This can be backported as a hardening measure though the
risks and impacts are extremely low.
BUG/MEDIUM: mux-h1: Skip UNUSED htx block when formating the start line
UNUSED blocks were not properly handled when the H1 multiplexer was
formatting the start line of a request or a response. UNUSED was ignored but
not removed from HTX message. So the mux can loop infinitely on such block.
It could be seen as a major issue but in fact it happens only in a very
specific case of the response processing (at least I think so): the server
must send an interim message (a 100-continue for instance) with the final
response. HAProxy must receive both at the same time and the final response
must be intercepted (via a http-response return action for instance). In that
case, the interim message is forwarded and the server final response is
removed and replaced by a proxy error message.
Now UNUSED htx blocks are properly skipped and removed.
BUG/MINOR: promex: Detach promex from the server on error dump its metrics dump
If an error occurs during the dump of a metric for a server, we must take
care to detach promex from the watcher list for this server. It must be
performed explicitly because on error, the applet state (st1) is changed, so
it is not possible to detach it during the applet release stage.
This patch must be backported with b4f64c0ab ("BUG/MEDIUM: promex: server
iteration may rely on stale server") as far as 3.0. On older versions, 2.8
and 2.6, the watcher_detach() line must be changed by "srv_drop(ctx->p[1])".
BUG/MINOR: hlua: consume error object if ignored after a failing lua_pcall()
We frequently use lua_pcall() to provide safe alternative functions
(encapsulated helpers) that prevent the process from crashing in case
of Lua error when Lua is executed from an unsafe environment.
However, some of those safe helpers don't handle errors properly. In case
of error, the Lua API will always put an error object on top of the stack
as stated in the documentation. This error object can be used to retrieve
more info about the error. But in some cases when we ignore it, we should
still consume it to prevent the stack from being altered with an extra
object when returning from the helper function.
It should be backported to all stable versions. If the patch doesn't apply
automatically, all that's needed is to check for lua_pcall() in hlua.c
and for other cases than 'LUA_OK', make sure that the error object is popped
from the stack before the function returns.
BUG/MEDIUM: hlua: fix invalid lua_pcall() usage in hlua_traceback()
Since commit 365ee28 ("BUG/MINOR: hlua: prevent LJMP in hlua_traceback()")
we now use lua_pcall() to protect sensitive parts of hlua_traceback()
function, and this to prevent Lua from crashing the process in case of
unexpected Lua error.
This is still relevant, but an error was made, as lua_pcall() was given
the nresult argument '1' when _hlua_traceback() internal function
doesn't push any argument on the stack. Because of this, it seems Lua
API still tries to push garbage object on top of the stack before
returning. This may cause functions that leverage hlua_traceback() in
the middle of stack manipulation to end up having a corrupted stack when
continuing after the hlua_traceback().
There doesn't seem to be many places where this could be a problem, as
this was discovered using the reproducer documented in f535d3e
("BUG/MEDIUM: debug: only dump Lua state when panicking"). Indeed, when
hlua_traceback() was used from the signal handler while the thread was
previously executing Lua, when returning to Lua after the handler the
Lua stack would be corrupted.
To fix the issue, we emphasize on the fact that the _hlua_traceback()
function doesn't push anything on the stack, returns 0, thus lua_pcall()
is given 0 'nresult' argument to prevent anything from being pushed after
the execution, preserving the original stack state.
This should be backported to all stable versions (because 365ee28 was
backported there)
Willy Tarreau [Thu, 22 Jan 2026 18:02:54 +0000 (19:02 +0100)]
[RELEASE] Released version 3.4-dev3
Released version 3.4-dev3 with the following main changes :
- BUILD: ssl: strchr definition changed in C23
- BUILD: tools: memchr definition changed in C23
- BUG/MINOR: cfgparse: wrong section name upon error
- MINOR: cfgparse: Refactor "userlist" parser to print it in -dKall operation
- BUILD: sockpair: fix build issue on macOS related to variable-length arrays
- BUG/MINOR: cli/stick-tables: argument to "show table" is optional
- REGTESTS: ssl: Fix reg-tests curve check
- CI: github: remove ERR=1 temporarly from the ECH job
- BUG/MINOR: ech/quic: enable ech configuration also for quic listeners
- MEDIUM: config: warn if some userlist hashes are too slow
- MINOR: cfgparse: remove duplicate "force-persist" in common kw list
- MINOR: sample: also support retrieving fc.timer.handshake without a stream
- MINOR: tcp-sample: permit retrieving tcp_info from the connection/session stage
- CLEANUP: connection: Remove outdated note about CO_FL `0x00002000` being unused
- MINOR: receiver: Dynamically alloc the "members" field of shard_info
- MINOR: stats: Increase the tgid from 8bits to 16bits
- BUG/MINOR: stats-file: Use a 16bits variable when loading tgid
- BUG/MINOR: hlua_fcn: fix broken yield for Patref:add_bulk()
- BUG/MINOR: hlua_fcn: ensure Patref:add_bulk() is given a table object before using it
- BUG/MINOR: net_helper: fix IPv6 header length processing
- MEDIUM: counters: Dynamically allocate per-thread group counters
- MEDIUM: counters: Remove some extra tests
- BUG/MEDIUM: threads: Fix binding thread on bind.
- BUG/MEDIUM: quic: fix ACK ECN frame parsing
- MEDIUM: counters: mostly revert da813ae4d7cb77137ed
- BUG/MINOR: http_act: fix deinit performed on uninitialized lf_expr in release_http_map()
- MINOR: queues: Turn non_empty_tgids into a long array.
- MINOR: threads: Eliminate all_tgroups_mask.
- BUG/MEDIUM: queues: Fix arithmetic when feeling non_empty_tgids
- MEDIUM: thread: Turn the group mask in thread set into a group counter
- BUG/MINOR: proxy: free persist_rules
- MEDIUM: stream: refactor switching-rules processing
- REGTESTS: add test on backend switching rules selection
- MEDIUM: proxy: do not select a backend if disabled
- MEDIUM: proxy: implement publish/unpublish backend CLI
- MINOR: stats: report BE unpublished status
- MINOR: cfgparse: adapt warnif_cond_conflicts() error output
- MEDIUM: proxy: force traffic on unpublished/disabled backends
- MINOR: ssl: Factorize AES GCM data processing
- MINOR: ssl: Add new aes_cbc_enc/_dec converters
- REGTESTS: ssl: Add tests for new aes cbc converters
- MINOR: jwe: Add new jwt_decrypt_secret converter
- MINOR: jwe: Add new jwt_decrypt_cert converter
- REGTESTS: jwe: Add jwt_decrypt_secret and jwt_decrypt_cert tests
- DOC: jwe: Add doc for jwt_decrypt converters
- MINOR: jwe: Some algorithms not supported by AWS-LC
- REGTESTS: jwe: Fix tests of algorithms not supported by AWS-LC
- BUG/MINOR: cfgparse: fix "default" prefix parsing
- REORG/MINOR: cfgparse: eliminate code duplication by lshift_args()
- MEDIUM: systemd: implement directory loading
- CI: github: switch monthly Fedora Rawhide build to OpenSSL
- SCRIPTS: build-ssl: use QUICTLS_VERSION instead of QUICTLS=yes
- CI: github: define the right quictls version in each jobs
- CI: github: fix vtest.yml with "not quictls"
- MINOR: cli: use srv_drop() when server was created using new_server()
- BUG/MINOR: server: ensure server is detached from proxy list before being freed
- BUG/MEDIUM: promex: server iteration may rely on stale server
- SCRIPTS: build-ssl: clone the quictls branch directly
- SCRIPTS: build-ssl: fix quictls build for 1.1.1 versions
- BUG/MEDIUM: log: parsing log-forward options may result in segfault
- DOC: proxy-protocol: Add SSL client certificate TLV
- DOC: fix typos in the documentation files
- DOC: fix mismatched quotes typos around words in the documentation files
- REORG: cfgparse: move peers parsing to cfgparse-peers.c
- MINOR: tools: add chunk_escape_string() helper function
- MINOR: vars: store variable names for runtime access
- MINOR: vars: implement dump_all_vars() sample fetch
- DOC: vars: document dump_all_vars() sample fetch
- BUG/MEDIUM: ssl: fix error path on generate-certificates
- BUG/MEDIUM: ssl: fix generate-certificates option when SNI greater than 64bytes
- BUG/MEDIUM: mux-quic: prevent BUG_ON() on aborted uni stream close
- REGTESTS: ssl: fix generate-certificates w/ LibreSSL
- SCRIPTS: build: enable symbols in AWS-LC builds
- BUG/MINOR: proxy: fix deinit crash on defaults with duplicate name
- BUG/MEDIUM: debug: only dump Lua state when panicking
- MINOR: proxy: remove proxy_preset_defaults()
- MINOR: proxy: refactor defaults proxies API
- MINOR: proxy: simplify defaults proxies list storage
- MEDIUM: cfgparse: do not store unnamed defaults in name tree
- MEDIUM: proxy: implement persistent named defaults
Amaury Denoyelle [Thu, 18 Dec 2025 17:09:13 +0000 (18:09 +0100)]
MEDIUM: proxy: implement persistent named defaults
This patch changes the handling of named defaults sections. Prior to
this patch, all unreferenced defaults proxies were removed on post
parsing. Now by default, these sections are kept after postparsing and
only purged on deinit. The objective is to allow reusing them as base
configuration for dynamic backends.
To implement this, refcount of every still addressable named section is
incremented by one after parsing. This ensures that they won't be
removed even if referencing proxies are removed at runtime. This is done
via the new function proxy_ref_all_defaults().
To ensure defaults instances are still properly removed on deinit, the
inverse operation is performed : refcount is decremented by one on every
defaults sections via proxy_unref_all_defaults().
The original behavior can still be used by using the new global keyword
tune.defaults.purge. This is useful for users using configuration with
large number of defaults and not interested in dynamic backends
creation.
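The pinning scheme can be modelled in a few lines. The sketch below is a toy: the structure and helper names are hypothetical and only loosely echo proxy_ref_all_defaults() and proxy_unref_all_defaults(), and a "freed" marker replaces the real release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a named defaults section. */
struct defaults_stub {
	const char *name;
	int refcount;
	int freed; /* set instead of actually freeing, for the demo */
};

/* Drops one reference; the section is released when none remains. */
static void defaults_unref(struct defaults_stub *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = 1;
}

/* After parsing: pin every named section with one extra reference. */
static void ref_all_defaults(struct defaults_stub *tab, size_t n)
{
	for (size_t i = 0; i < n; i++)
		tab[i].refcount++;
}

/* On deinit: drop the extra reference taken after parsing. */
static void unref_all_defaults(struct defaults_stub *tab, size_t n)
{
	for (size_t i = 0; i < n; i++)
		defaults_unref(&tab[i]);
}
```

Removing the last referencing proxy at runtime thus leaves the section alive until deinit.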
Amaury Denoyelle [Wed, 21 Jan 2026 09:22:23 +0000 (10:22 +0100)]
MEDIUM: cfgparse: do not store unnamed defaults in name tree
Defaults sections are indexed by their name in defproxy_by_name tree. For
named sections, there is no duplicate : if two instances have the same
name, the older one is removed from the tree. However, this was not the
case for unnamed defaults which are all stored unconditionally in
defproxy_by_name.
This commit introduces a new approach for unnamed defaults. Now, these
instances are never inserted in the defproxy_by_name tree. Indeed, this
is not needed as no tree lookup is performed with empty names. This may
optimize slightly config parsing with a huge number of named and unnamed
defaults sections, as the latter won't fill up the tree needlessly.
However, defproxy_by_name tree is also used to purge unreferenced
defaults instances, both on postparsing and deinit. Thus, a new approach
is needed for unnamed sections cleanup. Now, each time a new defaults is
parsed, if the previous instance is unnamed, it is freed unless it is
referenced by a proxy. When config parsing is ended, a similar operation
is performed to ensure the last unnamed defaults section won't stay in
memory. To implement this, last_defproxy static variable is now set to
global. Unnamed sections which cannot be removed due to proxies
referencing them will still be removed when such proxies are freed
themselves, at runtime or on deinit.
Amaury Denoyelle [Tue, 20 Jan 2026 13:33:46 +0000 (14:33 +0100)]
MINOR: proxy: simplify defaults proxies list storage
Defaults proxy instances are stored in a global name tree. When there
is a name conflict and the older entry cannot be simply discarded as it
is already referenced, the older entry is instead removed from the name
tree and inserted into the orphaned list.
The purpose of the orphaned list was to guarantee that any remaining
unreferenced defaults are purged either on postparsing or deinit.
However, this is in fact completely useless. Indeed on postparsing,
orphaned entries are always referenced. On deinit instead, defaults are
already freed along with the cleanup of all frontend/backend instances,
thanks to their refcounting.
This patch streamlines this by removing orphaned list. Instead, a
defaults section is inserted into a new global defaults_list during
its whole lifetime. This is not strictly necessary but it ensures that
defaults instances can still be accessed easily in the future if needed
even if not present in the name tree. On deinit, a BUG_ON() is added to
ensure that defaults_list is indeed emptied.
Another benefit from this patch is to simplify the defaults deletion
procedure. Orphaned simple list is replaced by a proper double linked
list implementation, so a single LIST_DELETE() is now performed. This
will be notably useful as defaults may be removed at runtime in the
future if backends deletion at runtime is implemented.
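To show why the double linked list makes deletion a single operation, here is a stand-alone circular list in the spirit of HAProxy's list macros. It is a sketch with hypothetical names, not the project's LIST_* implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list element. */
struct list_stub {
	struct list_stub *n, *p;
};

static void list_init(struct list_stub *l)
{
	l->n = l->p = l;
}

static void list_append(struct list_stub *head, struct list_stub *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

/* O(1) removal: no walk from the head is needed, unlike with a single
 * linked list where the predecessor must first be searched for. */
static void list_delete(struct list_stub *el)
{
	assert(el->n != NULL && el->p != NULL);
	el->n->p = el->p;
	el->p->n = el->n;
	el->n = el->p = el;
}

static int list_is_empty(const struct list_stub *head)
{
	return head->n == head;
}
```

The final emptiness test corresponds to what a BUG_ON() on deinit would verify.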
Amaury Denoyelle [Thu, 22 Jan 2026 10:16:14 +0000 (11:16 +0100)]
MINOR: proxy: refactor defaults proxies API
This patch renames functions which deal with defaults section. A common
"defaults_px_" prefix is defined. This serves as a marker to identify
functions which can only be used with proxies defaults capability. New
BUG_ON() are enforced to ensure this is valid.
Also, older proxy_unref_or_destroy_defaults() is renamed
defaults_px_detach().
Amaury Denoyelle [Tue, 20 Jan 2026 10:41:37 +0000 (11:41 +0100)]
MINOR: proxy: remove proxy_preset_defaults()
Function proxy_preset_defaults() purpose has evolved over time.
Originally, it was only used to initialize defaults proxies instances.
Since then, it has been extended so that all proxies use it. Its objective
is to initialize settings to common default values.
To remove the confusion, this function is now removed. Its content is
integrated directly into init_new_proxy().
Willy Tarreau [Thu, 22 Jan 2026 11:01:22 +0000 (12:01 +0100)]
BUG/MEDIUM: debug: only dump Lua state when panicking
For a long time, we've tried to show the Lua state and backtrace when
dumping threads so as to be able to figure out if (and which) Lua code was
misbehaving, e.g. by performing expensive library calls. Since 3.1 with
commit 365ee28510 ("BUG/MINOR: hlua: prevent LJMP in hlua_traceback()"),
it appears that the approach is more fragile (though that fix addressed
a real issue about out-of-memory), and it's possible to occasionally
observe crashes or CPU loops with "show threads" while running Lua
heavily. While users of "show threads" are rare, the watchdog warnings,
which were also enabled on 3.1, also trigger these issues, which is
even more of a concern.
This patch goes the simple way to address this for now: since the purpose
of the Lua backtrace was to help locate Lua call places upon a panic,
let's only call the backtrace on panic but not in other situations. After
a panic we obviously don't care that the Lua stack might be corrupted
since it's never going to be resumed anyway. This may be relaxed in the
future if a solution is found to reliably produce harmless Lua backtraces.
The commit above was backported to all stable branches, so this patch
will be needed everywhere. However, TAINTED_PANIC only appeared in 2.8,
and given the rarity of this bug before 3.1, it's probably not needed
to make any extra effort to go beyond 2.8.
It's easy enough to test a version for being subject to this issue,
by running the following Lua code:
  local function stress(txn)
      for _, backend in pairs(core.backends) do
          for _, server in pairs(backend.servers) do
              local stats = server:get_stats()
          end
      end
  end
  core.register_fetches("stress", stress)
in the following config file:
  global
      stats socket /tmp/haproxy.stat level admin mode 666
      tune.lua.bool-sample-conversion normal
      lua-load-per-thread "stress.lua"
  listen stress
      bind :8001
      mode http
      timeout client 5s
      timeout server 5s
      timeout connect 5s
      http-request return status 200 content-type text/plain lf-string %[lua.stress()]
      server s1 127.0.0.1:8000
and stressing port 8001 with 100+ connections requesting / in loop, then
issuing "show threads" on the CLI using socat in loops as well. Normally
it instantly segfaults (sometimes during the first "show").
Amaury Denoyelle [Thu, 22 Jan 2026 14:20:31 +0000 (15:20 +0100)]
BUG/MINOR: proxy: fix deinit crash on defaults with duplicate name
A defaults proxy instance may be moved into the orphaned list when it is
replaced by a newer section with the same name. This is attached via
<next> member as a single linked list entry. However, proxy free does
not clear <next> attach point.
This causes a crash on deinit if orphaned list is not empty. First, all
frontend/backend instances are freed. This triggers the release of every
referenced defaults instances as their refcount reaches zero, but orphaned
list is not cleaned up. A loop is then conducted on orphaned list via
proxy_destroy_all_unref_defaults(). This causes a segfault due to access
on already freed entries.
To fix this, this patch extends proxy_destroy_defaults(). If orphaned
list is not empty, a loop is performed to remove a possible entry of the
currently released defaults instance. This ensures that loop over
orphaned list won't be able to access already freed entries.
This bug is pretty rare as it requires to have duplicate name in
defaults sections, and also to use settings which force defaults
referencing, such as TCP/HTTP rules. This can be reproduced with the
minimal config here :
  defaults def
      http-request return status 200
  frontend fe
      bind :20080
  defaults def
Note that in fact orphaned list looping is not strictly necessary, as
defaults instances are automatically removed via refcounting. This will
be the purpose of a future patch. However, to limit the risk of
regression on stable releases during backport, this patch uses the more
direct approach for now.
Since commit eb5279b15 ("BUG/MEDIUM: ssl: fix generate-certificates
option when SNI greater than 64bytes") the LibreSSL job does not seem to
work anymore.
Indeed the reg-test was modified to add a SNI longer than 64 bytes,
without any concern about the DNS standard, which allows only 63 bytes
per label.
LibreSSL is stricter than the other libraries about that, and checks
that the SNI is compliant with the DNS RFC in the
tlsext_sni_is_valid_hostname() function
https://github.com/libressl/openbsd/blob/OPENBSD_7_8/src/lib/libssl/ssl_tlsext.c#L710
This patch fixes the issue by splitting the SNI with a second label to
reach more than 64 bytes.
Must be backported with eb5279b15 in every stable branch.
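The constraint at play can be checked with a few lines of C. This is only a sketch of the 63-byte per-label rule: it does not reproduce LibreSSL's tlsext_sni_is_valid_hostname(), it ignores the allowed character set, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_MAX_LABEL 63 /* per-label limit mentioned above */

/* Returns 1 if every dot-separated label of name is 1 to 63 bytes long.
 * Empty labels (including one caused by a trailing dot) are rejected. */
static int labels_fit(const char *name)
{
	size_t cur = 0;

	assert(name != NULL);
	for (; *name; name++) {
		if (*name == '.') {
			if (cur == 0)
				return 0;
			cur = 0;
		} else if (++cur > DNS_MAX_LABEL) {
			return 0;
		}
	}
	return cur > 0;
}
```

A 65-byte single label is refused, while a name longer than 64 bytes split in two labels is accepted, which is the approach taken by the test.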
Amaury Denoyelle [Tue, 20 Jan 2026 18:00:37 +0000 (19:00 +0100)]
BUG/MEDIUM: mux-quic: prevent BUG_ON() on aborted uni stream close
When a QCS instance is fully closed on qcs_close_remote() invocation, it
is moved into purg_list for later cleanup. This reuses <el_send> list
element, so a BUG_ON() ensures that QCS is not already present in
send_list.
This code is safe for bidirectional streams, as local channel is only
closed after FIN or RESET_STREAM emission completion, so such QCS won't
be present in the send_list on full closure.
However, things are different for remote uni streams. As such streams do
not have any local channel, qcs_close_remote() will always proceed to
full closure. Most of the time this is fine, but the aforementioned
BUG_ON() could be triggered if emission is required on a remote uni
stream : this only happens after read was aborted and a STOP_SENDING
frame is prepared.
Fix this by adding an extra operation in qcs_close_remote() : on full
close, STOP_SENDING is cancelled if it was prepared and the QCS instance
is removed from send_list. This is safe as STOP_SENDING is unnecessary
after the remote channel is closed. This operation is performed before
purg_list insertion which prevents the BUG_ON() crash issue.
BUG/MEDIUM: ssl: fix generate-certificates option when SNI greater than 64bytes
The problem is that the certificate is generated with a CN greater than
64 bytes when the SNI is too long, which is not supposed to be supported,
and will end up with a handshake failure.
The patch fixes the issue by not adding a CN when the SNI is longer than
64 bytes. Indeed this is not a mandatory field anymore and was deprecated more
than 20 years ago. The SAN DNS is enough for this case.
BUG/MEDIUM: ssl: fix error path on generate-certificates
It was reported by Przemyslaw Bromber that using the "generate-certificates"
option combined with AWS-LC would crash HAProxy when a request is done with a
SNI longer than 64 bytes.
The problem is that the certificate is generated with a CN greater than 64
bytes which results in ssl_sock_do_create_cert() returning NULL. This
NULL value is then passed to SSL_set_SSL_CTX().
With OpenSSL, passing a NULL SSL_CTX does not seem to be an issue as it
would just ignore it.
With AWS-LC, passing a NULL seems to crash the function. This was
reported to upstream AWS-LC and fixed in patch 7487ad1dcd8
https://github.com/aws/aws-lc/pull/2946.
Hyeonggeun Oh [Fri, 26 Dec 2025 06:57:35 +0000 (15:57 +0900)]
DOC: vars: document dump_all_vars() sample fetch
Add documentation for the dump_all_vars() sample fetch function in the
configuration manual. This function was introduced in the previous commit
to dump all variables in a given scope with optional prefix filtering.
The documentation includes:
- Function signature and return type
- Description of output format
- Explanation of scope and prefix arguments
- Usage examples for common scenarios
This completes the implementation of GitHub issue #1623.
This patch implements dump_all_vars([scope],[prefix]) sample fetch
function that dumps all variables in a given scope, optionally
filtered by name prefix.
Output format: var1=value1, var2=value2, ...
- String values are quoted and escaped (", \, \r, \n, \b, \0)
- All sample types are supported via sample_convert()
- Scope can be: sess, txn, req, res, proc
- Prefix filtering is optional
Example usage:
http-request return string %[dump_all_vars(txn)]
http-request return string %[dump_all_vars(txn,user)]
Hyeonggeun Oh [Tue, 13 Jan 2026 18:20:27 +0000 (03:20 +0900)]
MINOR: vars: store variable names for runtime access
Currently, variable names are only used during parsing and are not
stored at runtime. This makes it impossible to iterate through
variables and retrieve their names.
This patch adds infrastructure to store variable names:
- Add 'name' and 'name_len' fields to var_desc structure
- Add 'name' field to var structure
- Add VDF_NAME_ALLOCATED flag to track memory ownership
- Store names in vars_fill_desc(), var_set(), vars_check_arg(),
and parse_store()
- Free names in var_clear() and release_store_rule()
- Add ARGT_VAR handling in release_sample_arg() to free the
allocated name when the flag is set
This prepares the ground for implementing dump_all_vars() in the
next commit.
Tested with:
- ASAN-enabled build on Linux (TARGET=linux-glibc USE_OPENSSL=1
ARCH_FLAGS="-g -fsanitize=address")
- Regression tests: reg-tests/sample_fetches/vars.vtc
- Regression tests: reg-tests/startup/default_rules.vtc
Hyeonggeun Oh [Mon, 12 Jan 2026 18:07:15 +0000 (03:07 +0900)]
MINOR: tools: add chunk_escape_string() helper function
This function takes a string and appends it to a buffer in a format
compatible with most languages (double-quoted, with special characters
escaped). It handles standard escape sequences like \n, \r, \", \\.
This generic utility is designed to be used for logging or debugging
purposes where arbitrary string data needs to be safely emitted without
breaking the output format. It will be primarily used by the upcoming
dump_all_vars() sample fetch to dump variable contents safely.
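For illustration only, the escaping described above could look like the
sketch below. This is not HAProxy's chunk_escape_string(): the name
escape_to_buf(), its signature and its return convention are assumptions,
and it writes to a plain char array instead of a HAProxy buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in, NOT HAProxy's chunk_escape_string(): writes
 * <src> into <dst> as a double-quoted string, escaping \n, \r, \" and \\.
 * Returns the number of bytes written (excluding the trailing NUL), or
 * -1 if <dst> (of <size> bytes) is too small.
 */
static int escape_to_buf(char *dst, size_t size, const char *src)
{
	size_t pos = 0;
	const char *p;

	if (size < 3)   /* need room for two quotes and the NUL */
		return -1;
	dst[pos++] = '"';

	for (p = src; *p; p++) {
		char esc = 0;

		switch (*p) {
		case '\n': esc = 'n';  break;
		case '\r': esc = 'r';  break;
		case '"':  esc = '"';  break;
		case '\\': esc = '\\'; break;
		}
		/* always keep room for the closing quote and the NUL */
		if (pos + (esc ? 2 : 1) + 2 > size)
			return -1;
		if (esc) {
			dst[pos++] = '\\';
			dst[pos++] = esc;
		} else {
			dst[pos++] = *p;
		}
	}
	dst[pos++] = '"';
	dst[pos] = '\0';
	return (int)pos;
}
```

The size check is done before each character is emitted so that the output
is never truncated in the middle of an escape sequence.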
Hyeonggeun Oh [Tue, 20 Jan 2026 13:27:40 +0000 (22:27 +0900)]
REORG: cfgparse: move peers parsing to cfgparse-peers.c
This patch moves the peers section parsing code from src/cfgparse.c to a
dedicated src/cfgparse-peers.c file. This separation improves code
organization and prepares for further refactoring of the "peers" keyword
registration system.
No functional changes in this patch - the code is moved as-is with only
the necessary adjustments for compilation (adding SPDX header and
updating Makefile for build).
This is the first patch in a series to address issue #3221, which
reports that "peers" section keywords are not displayed with -dKall.
Egor Shestakov [Mon, 19 Jan 2026 17:27:50 +0000 (17:27 +0000)]
DOC: fix mismatched quotes typos around words in the documentation files
s/"no'/"no"
s/'private"/"private"
s/"flt'/"flt"
There isn't a definite convention but people usually prefer to highlight
something important with quotation marks. For example, it's convenient
to find keywords from a text when they are quoted, mismatches make this
harder.
Add the PP2_SUBTYPE_SSL_CLIENT_CERT code point reservation in the
proxy protocol specification. This is useful in cases where the
backend needs to perform mTLS authentication, but the rules for
certificate validation are backend-specific (e.g. database of
allowed certificate hashes).
This is left optional to leave it up to the frontend configuration
to dictate whether to forward raw certificate data.
Support for this new TLV has been added in tlstunnel:
https://codeberg.org/emersion/tlstunnel/pulls/33
BUG/MEDIUM: log: parsing log-forward options may result in segfault
As reported by GH user @HiggsTeilchen on #3250, the use of "option
dont-parse-log" may result in segmentation fault when parsing the
configuration. In fact, "option assume-rfc6587-ntf" is also affected.
The reason behind this is that cfg_parse_log_forward() leverages the
cfg_parse_listen_match_option() function to check for generic proxy
options that are relevant in the PR_MODE_SYSLOG context. And while it
is not documented, this function assumes that the currently evaluated
proxy is stored in the global variable 'curproxy', which
cfg_parse_log_forward() doesn't offer.
cfg_parse_listen_match_option() uses curproxy to check that the currently
evaluated proxy's capabilities are compatible with the option. So if a
proxy with the frontend capability was defined earlier in the config,
parsing would succeed. If curproxy pointed to a proxy without the frontend
capability (ie: backend), a warning would be emitted to tell that the
option would be ignored while it is perfectly valid for the log-forward
proxy. And if no proxy was defined earlier in the config, a segfault would
be triggered.
To fix the issue, we explicitly make "curproxy" global variable point to
the log-forward proxy being parsed in cfg_parse_log_forward() before
leveraging cfg_parse_listen_match_option() to check for compatible
options.
It must be backported with 834e9af8 ("MINOR: log: add options eval for
log-forward"), which was introduced in 3.2 precisely.
SCRIPTS: build-ssl: fix quictls build for 1.1.1 versions
The quictls build function was no longer using the right make target
to build older versions. "make all" must be used instead of
"make build_sw" for 1.1.1.
BUG/MEDIUM: promex: server iteration may rely on stale server
When performing a promex dump, even though we hold a reference on the server
during resumption after a yield (ie: buffer full), the refcount mechanism
only guarantees that the server pointer will be valid upon resumption, not
that its content will be consistent. As such, sv->next may be garbage upon
resumption. Instead, we must rely on the watcher mechanism to iterate over
server list when resumption is involved like we already do for stats and
lua handlers.
It must be backported anywhere 071ae8ce3 ("BUG/MEDIUM: stats/server: use
watcher to track server during stats dump") was (up to 2.8 it seems).
BUG/MINOR: server: ensure server is detached from proxy list before being freed
There remained some cases (on error paths) where a server could be freed
while still attached to the parent proxy server list. In 3.3 this can be
problematic because new_server() automatically adds the server to the
parent proxy list.
The bug is insignificant because it is on error paths during init and
often haproxy exits right after. But let's fix that to ensure no UAF or
undefined behavior occurs because of that.
This patch depends on ("MINOR: cli: use srv_drop() when server was created using new_server()")
It must be backported in 3.3 with the above mentioned patch.
MINOR: cli: use srv_drop() when server was created using new_server()
Now that new_server() is becoming more and more complex, we need to
take care that servers created using new_server() must be released
using the corresponding release function srv_drop() which takes care
of properly de-initing the server and its members.
CI: github: define the right quictls version in each jobs
openssl+quictls is not maintained anymore (quictls/openssl), however we
still need to test openssl+quictls 1.1.1. Other openssl+quictls branches
don't need to be tested.
The quictls hardfork is tested in the 'quictls' job, it uses the
'main' branch in the quictls/quictls repository.
Ilia Shipitsin [Sat, 17 Jan 2026 20:00:47 +0000 (21:00 +0100)]
CI: github: switch monthly Fedora Rawhide build to OpenSSL
QuicTLS builds are already run on push and the openssl+quictls patchset is
not maintained anymore. The patch switches from openssl+quictls to the
native openssl of fedora.
Fedora Rawhide builds are mainly useful to test the latest gcc and clang
versions as well as default options of the distribution.
The patch also contains a workaround to re-enable legacy algorithms
which are still tested on the CI.
Egor Shestakov [Thu, 15 Jan 2026 15:41:37 +0000 (15:41 +0000)]
BUG/MINOR: cfgparse: fix "default" prefix parsing
Fix the left shift of args when "default" prefix matches. The cause of the
bug was the absence of zeroing of the right element during the shift. The
same bug for "no" prefix was fixed by commit 0f99e3497, but missed for
"default".
The shift of ("default", "option", "dontlog-normal")
produced ("option", "dontlog-normal", "dontlog-normal")
instead of ("option", "dontlog-normal", "")
As an example, a valid config line:
default option dontlog-normal
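The shift described above can be sketched as follows. This is not the
actual cfgparse code: shift_args_left() and its fixed slot count are
hypothetical, the point being only that the last slot has to be reset after
the shift.

```c
#include <stddef.h>

/* Hypothetical sketch: drop args[0] by shifting all words one slot to
 * the left. <nb> is the number of slots in <args>. The last slot is
 * reset to an empty string, otherwise the final word would appear twice.
 */
static void shift_args_left(const char **args, size_t nb)
{
	size_t i;

	if (!nb)
		return;
	for (i = 0; i + 1 < nb; i++)
		args[i] = args[i + 1];
	args[nb - 1] = "";   /* the zeroing that was missing for "default" */
}
```

Applied to ("default", "option", "dontlog-normal"), this leaves
("option", "dontlog-normal", "").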
REGTESTS: jwe: Fix tests of algorithms not supported by AWS-LC
Many tests use the A128KW algorithm which is not supported by AWS-LC but
instead of removing those tests we will just have a hardcoded value set
by default in this case.
This converter checks the validity and decrypts the content of a JWE
token that has an asymmetric "alg" algorithm (RSA). In such a case, we
must provide a path to an already loaded certificate and private key
that has the "jwt" option set to "on".
This converter checks the validity and decrypts the content of a JWE
token that has a symmetric "alg" algorithm. In such a case, we only
require a secret as parameter in order to decrypt the token.
REGTESTS: ssl: Add tests for new aes cbc converters
This test mimics what was already done for the aes_gcm converters. Some
data is encrypted and directly decrypted and we ensure that the output
was not changed.
Those converters allow to encrypt or decrypt data with AES in Cipher
Block Chaining mode. They work the same way as the already existing
aes_gcm_enc/_dec ones apart from the AEAD tag notion which is not
supported in CBC mode.
The parameter parsing and processing and the actual crypto part of the
aes_gcm converter are interleaved. This patch puts the crypto parts in a
dedicated function for better reuse in the upcoming JWE processing.
MEDIUM: proxy: force traffic on unpublished/disabled backends
A recent patch has introduced a new state for proxies : unpublished
backends. Such backends won't be eligible for traffic, thus
use_backend/default_backend rules which target them won't match and
content switching rules processing will continue.
This patch defines a new frontend keyword 'force-be-switch'. This
keyword allows to ignore unpublished or disabled state. Thus,
use_backend/default_backend will match even if the target backend is
unpublished or disabled. This is useful to be able to test a backend
instance before exposing it outside.
This new keyword is converted into a persist rule of new type
PERSIST_TYPE_BE_SWITCH, stored in persist_rules list proxy member. This
is the only persist rule applicable to frontend side. Prior to this
commit, pure frontend proxies persist_rules list were always empty.
This new feature requires an adjustment in process_switching_rules(). Now,
when a use_backend/default_backend rule matches with a non-eligible
backend, frontend persist_rules are inspected to detect if a
force-be-switch is present so that the backend may be selected.
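A simplified model of the selection logic described above is sketched
below. It is not the real process_switching_rules(): the structures, flag
values and pick_backend() are hypothetical, ACL evaluation is reduced to a
precomputed <matches> field, and the result of the force-be-switch rule is
passed as a plain <force> integer.

```c
#include <stddef.h>

/* Hypothetical flags and types, for illustration only */
#define BE_DISABLED    0x1
#define BE_UNPUBLISHED 0x2

struct backend {
	const char *name;
	unsigned int flags;
};

struct rule {
	int matches;           /* stands for the ACL condition result */
	struct backend *be;    /* target of the use_backend rule */
};

/* Returns the first matching eligible backend, else the default backend
 * if it is eligible, else NULL. When <force> is set, disabled and
 * unpublished backends are considered eligible as well.
 */
static struct backend *pick_backend(const struct rule *rules, size_t nb,
                                    struct backend *def, int force)
{
	const unsigned int skip = BE_DISABLED | BE_UNPUBLISHED;
	size_t i;

	for (i = 0; i < nb; i++) {
		if (!rules[i].matches)
			continue;
		if ((rules[i].be->flags & skip) && !force)
			continue;   /* ineligible: keep evaluating rules */
		return rules[i].be;
	}
	if (def && (!(def->flags & skip) || force))
		return def;
	return NULL;
}
```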
Utility function warnif_cond_conflicts() is used when parsing an ACL.
Previously, the function directly called ha_warning() to report an error.
Change the function so that it now takes the error message as argument.
Caller can then output it as wanted.
This change is necessary to use the function when parsing a keyword
registered as cfg_kw_list. The next patch will reuse it.
Amaury Denoyelle [Tue, 13 Jan 2026 15:24:52 +0000 (16:24 +0100)]
MINOR: stats: report BE unpublished status
A previous patch defines a new proxy status : unpublished backends. This
patch extends this by changing proxy status reported in stats. If
unpublished is set, an extra "(UNPUB)" is added to the field.
The HTML stats page is also slightly updated. If a backend is up but
unpublished, its status will be reported in orange color.
Define a new set of CLI commands publish/unpublish backend <be>. The
objective is to be able to change the status of a backend to
unpublished. Such a backend is considered ineligible for traffic : this
allows to skip use_backend rules which target it.
Note that contrary to disabled/stopped proxies, an unpublished backend
still has server checks running on it.
Internally, a new proxy flag PR_FL_BE_UNPUBLISHED is defined. CLI
command handlers "publish backend" and "unpublish backend" are executed
under thread isolation. This guarantees that the flag can safely be set
or removed in the CLI handlers, and read during content-switching
processing.
MEDIUM: proxy: do not select a backend if disabled
A proxy can be marked as disabled using the keyword with the same name.
The doc mentions that it won't process any traffic. However, this is not
really the case for backends as they may still be selected via switching
rules during stream processing.
In fact, currently access to disabled backends will be conducted up to
assign_server(). However, no eligible server is found at this stage,
resulting in a connection closure or an HTTP 503, which is expected. So
in the end, servers in disabled backends won't receive any traffic. But
this is only because post-parsing steps are not performed on such
backends. Thus, this can be considered as functional but only via
side-effects.
This patch clarifies the handling of disabled backends, so that they are
never selected via switching rules. Now, process_switching_rules() will
ignore disabled backends and continue rules evaluation.
As this is a behavior change, this patch is labelled as medium. The
documentation manual for use_backend is updated accordingly.
REGTESTS: add test on backend switching rules selection
Create a new test to ensure that switching rules selection is fine.
Currently, this checks that dynamic backend switching works as expected.
If a matching rule is resolved to a nonexistent backend, the default
backend is used instead.
This regtest should be useful as switching-rules will be extended in a
future set of patches to add new abilities on backends, linked to
dynamic backend support.
This commit rewrites process_switching_rules() function. The objective
is to simplify backend selection so that a single unified
stream_set_backend() call is kept, both for regular and default backends
case.
This patch will be useful to add new capabilities on backends, in the
context of dynamic backend support implementation.
Amaury Denoyelle [Wed, 14 Jan 2026 10:19:13 +0000 (11:19 +0100)]
BUG/MINOR: proxy: free persist_rules
force-persist proxy keyword is converted into a persist_rule, stored in
proxy persist_rules list member. Each new rule is dynamically allocated
during parsing.
This commit fixes the memory leak on deinit due to a missing free on
persist_rules list entries. This is done via deinit_proxy()
modification. Each rule in the list is freed, along with its associated
ACL condition type.
Olivier Houchard [Thu, 15 Jan 2026 04:10:03 +0000 (05:10 +0100)]
MEDIUM: thread: Turn the group mask in thread set into a group counter
If we want to be able to have more than 64 thread groups, we can no
longer use thread group masks as long.
One remaining place where it is done is in struct thread_set. However,
it is not really used as a mask anywhere, all we want is a thread group
counter, so convert that mask to a counter.
Olivier Houchard [Thu, 15 Jan 2026 03:22:10 +0000 (03:22 +0000)]
BUG/MEDIUM: queues: Fix arithmetic when filling non_empty_tgids
Fix the arithmetic when pre-filling non_empty_tgids when we still have
more than 32/64 thread groups left: to get the right index, we of course
have to divide the number of thread groups by the number of bits in a
long.
This bug was introduced by commit 7e1fed4b7a8b862bf7722117f002ee91a836beb5, but hopefully was not hit
because it requires to have at least as many thread groups as there are
bits in a long, which is impossible on 64bits machines, as MAX_TGROUPS
is still 32.
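The index arithmetic can be illustrated with the sketch below. It assumes
that pre-filling means setting the bits of the first <nb> thread groups in
an array of longs; fill_first_groups() is a hypothetical name and this is
not the code from the queue subsystem.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Hypothetical sketch: set the bits of the first <nb> thread groups in
 * the bitmap <map>, which must hold at least <nb> bits and be zeroed by
 * the caller. Whole words are filled at once; the word index is the
 * number of groups already handled divided by the number of bits in a
 * long.
 */
static void fill_first_groups(unsigned long *map, size_t nb)
{
	size_t done = 0;

	while (nb - done >= BITS_PER_LONG) {
		map[done / BITS_PER_LONG] = ~0UL;
		done += BITS_PER_LONG;
	}
	/* fewer than BITS_PER_LONG groups remain, so the shift is valid */
	if (nb - done)
		map[done / BITS_PER_LONG] |= (1UL << (nb - done)) - 1;
}
```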
Now that it is unused, eliminate all_tgroups_mask, as we can't use 64bits
masks to represent thread groups, if we want to be able to have more
than 64 thread groups.
MINOR: queues: Turn non_empty_tgids into a long array.
In order to be able to have more than 64 thread groups, turn
non_empty_tgids into a long array, so that we have enough bits to
represent every thread group, and manipulate it with the ha_bit_*
functions.
Root cause is simple, in parse_http_set_map(), we define the release
function (which is responsible to clear lf_expr expressions used by the
action), prior to initializing the expressions, while the release
function assumes the expressions are always initialized.
For all similar actions, we already perform the init prior to setting
the related release function, but this was not the case for
parse_http_set_map(). We fix the bug by initializing the expressions
earlier.
Thanks to @Lzq-001 for having reported the issue and provided a simple
reproducer.
It should be backported to all stable versions, note for versions prior to
3.0, lf_expr_init() should be replaced by LIST_INIT(), see 6810c41 ("MEDIUM: tree-wide: add logformat expressions wrapper")
Contrary to what was previously believed, there are corner cases where
the counters may not be allocated, and we may want to make them optional
at a later date, so we have to check if those counters are there.
However, just checking that shared.tg is non-NULL is enough, we can then
assume that shared.tg[tgid - 1] has properly been allocated too.
Also modify the various COUNTER_SHARED_* macros to make sure they check
for that too.
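As a rough sketch of the check described above, with per-thread group
counters allocated on demand: all names below (struct shared_counters,
shared_counters_alloc(), SHARED_INC_CUM_CONN()) are hypothetical and do not
match HAProxy's actual structures or COUNTER_SHARED_* macros.

```c
#include <stdlib.h>

struct tg_counters {
	unsigned long long cum_conn;
};

struct shared_counters {
	struct tg_counters **tg;   /* one entry per thread group, or NULL */
};

/* Allocate one counter block per thread group actually in use (assumed
 * to be at least 1). On failure everything is released and <tg> is left
 * NULL. Returns 1 on success, 0 on error.
 */
static int shared_counters_alloc(struct shared_counters *sc, int nbtgroups)
{
	int i;

	sc->tg = calloc(nbtgroups, sizeof(*sc->tg));
	if (!sc->tg)
		return 0;
	for (i = 0; i < nbtgroups; i++) {
		sc->tg[i] = calloc(1, sizeof(**sc->tg));
		if (!sc->tg[i]) {
			while (i--)
				free(sc->tg[i]);
			free(sc->tg);
			sc->tg = NULL;
			return 0;
		}
	}
	return 1;
}

/* Update helper: a NULL <tg> means "no counters", so nothing is done.
 * Checking <tg> alone is enough since all entries are allocated together.
 * Thread group IDs are 1-based.
 */
#define SHARED_INC_CUM_CONN(sc, tgid)                      \
	do {                                               \
		if ((sc)->tg)                              \
			(sc)->tg[(tgid) - 1]->cum_conn++;  \
	} while (0)
```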
Amaury Denoyelle [Tue, 13 Jan 2026 13:29:15 +0000 (14:29 +0100)]
BUG/MEDIUM: quic: fix ACK ECN frame parsing
ACK frames are either of type 0x02 or 0x03. The latter is an indication
that it contains extra ECN related fields. In haproxy QUIC stack, this
is considered as a different frame type, set to QUIC_FT_ACK_ECN, with
its own set of builder/parser functions.
This patch fixes ACK ECN parsing function. Indeed, the latter suffered
from two issues. First, 'first ACK range' and 'ACK ranges' were
inverted. Then, the three remaining ECN fields were simply ignored by
the parsing function.
This issue can cause desynchronization in the frames parsing code, which
may have various consequences. Most of the time, the connection will be
aborted by haproxy due to an invalid frame content read.
Note that this issue was not detected earlier as most clients do not
enable ECN support if the peer is not able to emit ACK ECN frame first,
which haproxy currently never sends. Nevertheless, this is not the case
for every client implementation, thus proper ACK ECN parsing is
mandatory for a proper QUIC stack support.
Fix this by adjusting quic_parse_ack_ecn_frame() function. The remaining
ECN fields are parsed to ensure correct packet parsing. Currently, they
are not used by the congestion controller.
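For reference, the field order of a type 0x03 ACK frame as laid out in
RFC 9000 can be sketched as below. This is not quic_parse_ack_ecn_frame():
dec_varint(), struct ack_ecn and parse_ack_ecn() are hypothetical names,
and additional ACK ranges are simply skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one QUIC variable-length integer (RFC 9000, section 16) from
 * <*pos>, never reading at or past <end>. Returns 1 and advances <*pos>
 * on success, 0 if the buffer is too short.
 */
static int dec_varint(const uint8_t **pos, const uint8_t *end, uint64_t *val)
{
	size_t len, i;
	uint64_t v;

	if (*pos >= end)
		return 0;
	len = (size_t)1 << (**pos >> 6);   /* 1, 2, 4 or 8 bytes */
	if ((size_t)(end - *pos) < len)
		return 0;
	v = **pos & 0x3f;
	for (i = 1; i < len; i++)
		v = (v << 8) | (*pos)[i];
	*pos += len;
	*val = v;
	return 1;
}

/* Hypothetical result structure; extra ranges are skipped, not stored */
struct ack_ecn {
	uint64_t largest, delay, range_count, first_range;
	uint64_t ect0, ect1, ecn_ce;
};

/* Parse the body of a type 0x03 ACK frame (frame type already consumed) */
static int parse_ack_ecn(const uint8_t **pos, const uint8_t *end,
                         struct ack_ecn *a)
{
	uint64_t i, gap, len;

	if (!dec_varint(pos, end, &a->largest) ||
	    !dec_varint(pos, end, &a->delay) ||
	    !dec_varint(pos, end, &a->range_count) ||  /* count comes first */
	    !dec_varint(pos, end, &a->first_range))    /* then first range */
		return 0;

	for (i = 0; i < a->range_count; i++) {
		if (!dec_varint(pos, end, &gap) || !dec_varint(pos, end, &len))
			return 0;
	}
	/* the three ECN counters must be consumed to stay in sync */
	return dec_varint(pos, end, &a->ect0) &&
	       dec_varint(pos, end, &a->ect1) &&
	       dec_varint(pos, end, &a->ecn_ce);
}
```

Consuming the three trailing counters even when they are not used is what
keeps the parser aligned on the next frame of the packet.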
Olivier Houchard [Tue, 13 Jan 2026 10:42:32 +0000 (11:42 +0100)]
BUG/MEDIUM: threads: Fix binding thread on bind.
The code to parse the "thread" keyword on bind lines was changed to
check if the thread numbers were correct against the value provided with
max-threads-per-group, if any was provided. However, at the time those
thread keywords are parsed, it may not yet have been set, and that
breaks the feature. So revert to checking against MAX_THREADS_PER_GROUP
instead, it should have no major impact.
Olivier Houchard [Tue, 13 Jan 2026 07:01:28 +0000 (08:01 +0100)]
MEDIUM: counters: Remove some extra tests
Before updating counters, a few tests are made to check if the counters
exist, but those counters should always exist at this point, so just
remove them.
This commit should have no impact, but can easily be reverted with no
functional impact if various crashes appear.
Olivier Houchard [Mon, 12 Jan 2026 03:25:34 +0000 (04:25 +0100)]
MEDIUM: counters: Dynamically allocate per-thread group counters
Instead of statically allocating the per-thread group counters,
based on the max number of thread groups available, allocate
them dynamically, based on the number of thread groups actually
used. That way we can increase the maximum number of thread
groups without using an unreasonable amount of memory.
The IPv6 header contains a payload length that excludes the 40 bytes of
IPv6 packet header, which differs from IPv4's total length which includes
it. As a result, the parser was wrong and would only see the IP part and
not the TCP one unless sufficient options were present to cover it.
This issue came in 3.4-dev2 with recent commit e88e03a6e4 ("MINOR:
net_helper: add ip.fp() to build a simplified fingerprint of a SYN"),
so no backport is needed.
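The difference between both length fields can be sketched as follows. This
is not the net_helper code; ip_total_len() is a hypothetical helper that
only reads the fixed part of the IP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the full packet length announced by the
 * IP header at <pkt>, or 0 if the header is too short or the version is
 * unknown. <len> is the number of readable bytes.
 */
static size_t ip_total_len(const uint8_t *pkt, size_t len)
{
	if (len < 1)
		return 0;

	switch (pkt[0] >> 4) {
	case 4:
		if (len < 20)
			return 0;
		/* IPv4 "total length" (bytes 2-3) already includes the header */
		return ((size_t)pkt[2] << 8) | pkt[3];
	case 6:
		if (len < 40)
			return 0;
		/* IPv6 "payload length" (bytes 4-5) excludes the 40-byte header */
		return (((size_t)pkt[4] << 8) | pkt[5]) + 40;
	default:
		return 0;
	}
}
```

For an IPv6 packet carrying only a 20-byte TCP header, the payload length
field holds 20 while the packet is 60 bytes long.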
BUG/MINOR: hlua_fcn: ensure Patref:add_bulk() is given a table object before using it
As reported by GH user @kanashimia in GH #3241, providing anything else
than a table to Patref:add_bulk() method could cause a segfault because
we were calling lua_next() with the lua object without ensuring it
actually is a table.
Let's add the missing lua_istable() check on the stack object before
calling lua_next() function on it.
It should be backported up to 3.2 with 884dc62 ("MINOR: hlua_fcn:
add Patref:add_bulk()")
BUG/MINOR: hlua_fcn: fix broken yield for Patref:add_bulk()
In GH #3241, GH user @kanashimia reported that the Patref:add_bulk()
method would raise a Lua exception when called with more than 101
elements at once.
As identified by @kanashimia there was an error in the way the
add_bulk() method was forced to yield after 101 elements precisely.
The yield is there to ensure Lua doesn't eat too many resources at
once and doesn't impact haproxy's core responsiveness, but the check
for the yield was misplaced resulting in improper stack content upon
resume.
Thanks to user @kanashimia who even provided a reproducer which helped
a lot to troubleshoot the issue.
This fix should be backported up to 3.2 with 884dc62 ("MINOR: hlua_fcn:
add Patref:add_bulk()") where the bug was introduced.
Olivier Houchard [Mon, 12 Jan 2026 08:48:54 +0000 (09:48 +0100)]
BUG/MINOR: stats-file: Use a 16bits variable when loading tgid
Now that the tgid stored in the stats file has been increased to 16bits
by commit 022cb3ab7fdce74de2cf24bea865ecf7015e5754, don't forget to
increase the variable size when reading it from the file, too.
This should have no impact given the maximum thread group limit is still
32.
MINOR: stats: Increase the tgid from 8bits to 16bits
Increase the size of the stored tgid in the stat file from 8bits to
16bits, so that we can have more than 256 thread groups. 65536 should be
enough for some time.
This bumps the stat file minor version, as the structure changes.
MINOR: receiver: Dynamically alloc the "members" field of shard_info
Instead of always allocating MAX_TGROUPS members, allocate them
dynamically, using the number of thread groups we'll use, so that
increasing MAX_TGROUPS will not have a huge impact on the structure
size.
Tim Duesterhus [Fri, 9 Jan 2026 19:09:08 +0000 (20:09 +0100)]
CLEANUP: connection: Remove outdated note about CO_FL `0x00002000` being unused
This flag is used as of commit dcce9369129f6ca9b8eed6b451c0e20c226af2e3
("MINOR: connections: Add a new CO_FL_SSL_NO_CACHED_INFO flag"). This patch
should be backported to 3.3. Apparently dcce9369129 has been backported
to 3.2 and 3.1 already, with that change already applied, so no need for a
backport there.
Willy Tarreau [Sun, 11 Jan 2026 14:19:18 +0000 (15:19 +0100)]
MINOR: tcp-sample: permit retrieving tcp_info from the connection/session stage
The fc_xxx info that are retrieved over tcp_info could currently not
be accessed before a stream is created due to a test that verified the
existence of a stream. The rationale here was that the function works
both for frontend and backend. Let's always retrieve these info from
the session for the frontend case so that it now becomes possible to
set variables at connection/session time. The doc did not mention this
limitation so this could almost be considered as a bug.
Willy Tarreau [Sun, 11 Jan 2026 14:13:42 +0000 (15:13 +0100)]
MINOR: sample: also support retrieving fc.timer.handshake without a stream
Some timers, like the handshake timer, are stored in the session and are
only copied to the logs struct when a stream is created. But this means
we can't measure it without a stream, nor store it once for all in a
variable at session creation time. Let's extend the sample fetch function
to retrieve it from the session when no stream is present. The doc did not
mention this limitation so this could almost be considered as a bug.
Willy Tarreau [Fri, 9 Jan 2026 13:49:33 +0000 (14:49 +0100)]
MEDIUM: config: warn if some userlist hashes are too slow
It was reported in GH #2956 and more recently in GH #3235 that some
hashes are way too slow. The former triggers watchdog warnings during
checks, the latter sees the config parsing take 20 seconds. This is
always due to the use of hash algorithms that are not suitable for use
in low-latency environments like web. They might be fine for a local
auth though. The difficulty, as explained by Philipp Hossner, is that
developers are not aware of this cost and adopt this without suspecting
any side effect.
The proposal here is to measure the crypt() call time and emit a warning
if it takes more than 10ms (which is already extreme). This was tested
by Philipp and confirmed to catch his case.
This is marked medium as it might start to report warnings on configs
that have been suffering from this problem undetected until now.
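The measurement idea can be sketched as follows, assuming a monotonic clock
is sampled around the call. The helper names are hypothetical, the hash
call is abstracted as a callback instead of calling crypt() directly, and
this is not the code added to the config parser.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <time.h>

#define SLOW_HASH_THRESHOLD_MS 10   /* threshold quoted in the message */

/* Milliseconds elapsed between two timespec samples (rounded down) */
static long long ts_diff_ms(const struct timespec *from,
                            const struct timespec *to)
{
	long long ns = (long long)(to->tv_sec - from->tv_sec) * 1000000000LL +
	               (to->tv_nsec - from->tv_nsec);

	return ns / 1000000LL;
}

/* Hypothetical wrapper: runs <fn>(<arg>) once, e.g. a function calling
 * crypt() on a userlist entry, and returns 1 if the call took longer
 * than the threshold, in which case the caller would emit a warning.
 */
static int call_is_too_slow(void (*fn)(void *), void *arg)
{
	struct timespec before, after;

	clock_gettime(CLOCK_MONOTONIC, &before);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &after);
	return ts_diff_ms(&before, &after) > SLOW_HASH_THRESHOLD_MS;
}
```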
akarl10 [Thu, 1 Jan 2026 13:32:56 +0000 (14:32 +0100)]
BUG/MINOR: ech/quic: enable ech configuration also for quic listeners
Patch dba4fd24 ("MEDIUM: ssl/ech: config and load keys") introduced
ECH configuration for bind lines, but the QUIC configuration parsers
still suffer from not using the same code as the TCP/TLS one, so the
init for QUIC was missed.
CI: github: remove ERR=1 temporarily from the ECH job
The ECH job still fails to compile since the openssl 4.0 deprecated
functions were not removed yet. Let's remove ERR=1 temporarily.
We do know that there's a regression in OpenSSL 4.0 with these
reg-tests though:
Error: # top TEST reg-tests/ssl/set_ssl_crlfile.vtc FAILED (0.219) exit=2
Error: # top TEST reg-tests/ssl/set_ssl_cafile.vtc FAILED (0.236) exit=2
Error: # top TEST reg-tests/quic/set_ssl_crlfile.vtc FAILED (0.196) exit=2
OpenSSL changed the output from "Server Temp Key" in prior versions to
"Peer Temp Key" in recent ones.
https://github.com/openssl/openssl/commit/a39dc27c2573da14e85ca8961970c82009bd4ff6
It looks like it affects OpenSSL >=3.5.0
This broke the reg-test for e.g. Debian 13 builds, using OpenSSL 3.5.1