Willy Tarreau [Wed, 14 May 2025 13:54:58 +0000 (15:54 +0200)]
DOC: config: move the extraneous sections out of the "global" definition
Due to some historic mistakes that have spread to newly added sections,
a number of of recently added small sections found themselves described
under section 3 "global parameters" which is specific to "global" section
keywords. This is highly confusing, especially given that sections 3.1,
3.2, 3.3 and 3.10 directly start with keywords valid in the global section,
while others start with keywords that describe a new section.
Let's just create a new chapter "12. other sections" and move them all
there. 3.10 "HTTPclient tuning" however was moved to 3.4 as it's really
a definition of the global options assigned to the HTTP client. The
"programs" that are going away in 3.3 were moved at the end to avoid a
renumbering later.
Another nice benefit is that it moves a lot of text that was previously
keeping the global and proxies sections apart.
Willy Tarreau [Wed, 14 May 2025 12:26:38 +0000 (14:26 +0200)]
DOC: config: move stick-tables and peers to their own section
As suggested by Tim in issue #2953, stick-tables really deserve their own
section to explain the configuration. And peers have to move there as well
since they're totally dedicated to stick-tables.
Now we introduce a new section "Stick-tables and Peers", explaining the
concepts, and under which there is one subsection for stick-tables
configuration and one for the peers (which mostly keeps the existing
peers section).
Willy Tarreau [Wed, 14 May 2025 13:39:15 +0000 (15:39 +0200)]
DOC: config: move address formats definition to section 2
Section 2 describes the config file format, variables naming etc, so
there's no reason why the address format used in this file should be
in a separate section, let's bring it into section 2 as well.
DEBUG: mux-spop: Review some trace messages to adjust the message or the level
Some trace messages were not really accurrate, reporting a CLOSED connection
while only an error was reported on it. In addition, an TRACE_ERROR() was
used to report a short read on HELLO/DISCONNECT frames header. But it is not
an error. a TRACE_DEVEL() should be used instead.
This patch could be backported to 3.1 to ease future backports.
BUG/MEDIUM: mux-spop; Don't report a read error if there are pending data
When an read error is detected, no error must be reported on the SPOP
connection is there are still some data to parse. It is important to be sure
to process all data before reporting the error and be sure to not truncate
received frames. However, we must also take care to handle short read case
to not wait data that will never be received.
BUG/MEDIUM: mux-spop: Properly detect truncated frames on demux to report error
There was no test in the demux part to detect truncated frames and to report
an error at the connection level. The SPOP streams were properly switch to
half-closed state. But waiting the associated SPOE applets were woken up and
released, the SPOP connection could be woken up several times for nothing. I
never triggered the watchdog in that case, but it is not excluded.
Now, at the end of the demux function, if a specific test was added to
detect truncated frames to report an error and close the connection.
BUG/MEDIUM: spop-conn: Report short read for partial frames payload
When a frame was not fully received, a short read must be reported on the
SPOP connection to help the demux to handle truncated frames. This was
performed for frames truncated on the header part but not on the payload
part. It is now properly detected.
BUG/MEDIUM: mux-spop: Properly handle CLOSING state
The CLOSING state was not handled at all by the SPOP multiplexer while it is
mandatory when a DISCONNECT frame was sent and the mux should wait for the
DISCONNECT frame in reply from the agent. Thanks to this patch, it should be
fixed.
In addition, if an error occurres during the AGENT HELLO frame parsing, the
SPOP connection is no longer switched to CLOSED state and remains in ERROR
state instead. It is important to be able to send the DISCONNECT frame to
the agent instead of closing the TCP connection immediately.
This patch depends on following commits:
* BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
* MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
* BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
* BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame
parsing, were removed. The demux process now relies on the demux stream ID
to know if it is waiting for the frame header or the frame
payload. Concretly, when the demux stream ID is not set (dsi == -1), the
demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is
waiting for the frame payload. It is especially important to be able to
properly handle DISCONNECT frames sent by the agents.
SPOP_CS_RUNNING state is introduced to know the hello handshake was finished
and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK
frames with the agents.
It depends on the following fixes:
* MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
* BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
This change will be mandatory for the next fix. It must be backported to 3.1
with the commits above.
MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
After the ACK frame was parsed, it is useless to set the SPOP connection
state to SPOP_CS_FRAME_H state because this will be automatically handled by
the demux function. If it is not an issue, but this will simplify changes
for the next commit.
BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
Till now, only SPOP connections fully closed or those with a TCP connection on
error were concerned. But available streams could be reported for SPOP
connections in error or closing state. But in these states, no NOTIFY frames
will be sent and no ACK frames will be parsed. So, no new SPOP streams should be
opened.
BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
The demux stream ID of a SPOP connection, used when received frames are
parsed, must be a signed integer because it is set to -1 when the SPOP
connection is initialized. It will be important for the next fix.
BUG/MINOR: mux-spop: Don't report error for stream if ACK was already received
When a SPOP connection was closed or was in error, an error was
systematically reported on all its SPOP streams. However, SPOP streams that
already received their ACK frame must be excluded. Otherwise if an agent
sends a ACK and close immediately, the ACK will be ignored because the SPOP
stream will handle the error first.
BUG/MINOR: spoe: Don't report error on applet release if filter is in DONE state
When the SPOE applet was released, if a SPOE filter context was still
attached to it, an error was reported to the filter. However, there is no
reason to report an error if the ACK message was already received. Because
of this bug, if the ACK message is received and the SPOE connection is
immediately closed, this prevents the ACK message to be processed.
BUG/MINOR: hlua: Fix Channel:data() and Channel:line() to respect documentation
When the channel API was revisted, the both functions above was added. An
offset can be passed as argument. However, this parameter could be reported
to be out of range if there was not enough input data was received yet. It
is an issue, especially with a tcp rule, because more data could be
received. If an error is reported too early, this prevent the rule to be
reevaluated later. In fact, an error should only be reported if the offset
is part of the output data.
Another issue is about the conditions to report 'nil' instead of an empty
string. 'nil' was reported when no data was found. But it is not aligned
with the documentation. 'nil' must only be returned if no more data cannot
be received and there is no input data at all.
This patch should fix the issue #2716. It should be backported as far as 2.6.
Willy Tarreau [Tue, 13 May 2025 15:58:28 +0000 (17:58 +0200)]
MEDIUM: config: change default limits to 1024 threads and 32 groups
A test run on a dual-socket EPYC 9845 (2x160 cores) showed that we'll
be facing new limits during the lifetime of 3.2 with our current 16
groups and 256 threads max:
Raising the default limit to 1024 threads and 32 groups is sufficient
to buy us enough margin for a long time (hopefully, please don't laugh,
you, reader from the future):
We can change this default now because it has no functional effect
without any configured cpu-policy, so this will only be an opt-in
and it's better to do it now than to have an effect during the
maintenance phase. A tiny effect is a doubling of the number of
pool buckets and stick-table shards internally, which means that
aside slightly reducing contention in these areas, a dump of tables
can enumerate keys in a different order (hence the adjustment in the
vtc).
The only really visible effect is a slightly higher static memory
consumption (29->35 MB on a small config), but that difference
remains even with 50k servers so that's pretty much acceptable.
Thanks to Erwan Velu for the quick tests and the insights!
Willy Tarreau [Tue, 13 May 2025 14:20:53 +0000 (16:20 +0200)]
MEDIUM: cpu-topo: prefer grouping by CCX for "performance" and "efficiency"
Most of the time, machines made of multiple CPU types use the same L3
for them, and grouping CPUs by frequencies to form groups doesn't bring
any value and on the opposite can impair the incoming connection balancing.
This choice of grouping by cluster was made in order to constitute a good
choice on homogenous machines as well, so better rely on the per-CCX
grouping than the per-cluster one in this case. This will create less
clusters on machines where it counts without affecting other ones.
It doesn't seem necessary to change anything for the "resource" policy
since it selects a single cluster.
Willy Tarreau [Tue, 13 May 2025 14:18:25 +0000 (16:18 +0200)]
MEDIUM: cpu-topo: change "efficiency" to consider per-core capacity
This is similar to the previous change to the "performance" policy but
it applies to the "efficiency" one. Here we're changing the sorting
method to sort CPU clusters by average per-CPU capacity, and we evict
clusters whose per-CPU capacity is above 125% of the previous one.
Per-core capacity allows to detect discrepancies between CPU cores,
and to continue to focus on efficient ones as a priority.
Willy Tarreau [Tue, 13 May 2025 14:12:52 +0000 (16:12 +0200)]
MEDIUM: cpu-topo: change "performance" to consider per-core capacity
Running the "performance" policy on highly heterogenous systems yields
bad choices when there are sufficiently more small than big cores,
and/or when there are multiple cluster types, because on such setups,
the higher the frequency, the lower the number of cores, despite small
differences in frequencies. In such cases, we quickly end up with
"performance" only choosing the small or the medium cores, which is
contrary to the original intent, which was to select performance cores.
This is what happens on boards like the Orion O6 for example where only
the 4 medium cores and 2 big cores are choosen, evicting the 2 biggest
cores and the 4 smallest ones.
Here we're changing the sorting method to sort CPU clusters by average
per-CPU capacity, and we evict clusters whose per-CPU capacity falls
below 80% of the previous one. Per-core capacity allows to detect
discrepancies between CPU cores, and to continue to focus on high
performance ones as a priority.
Willy Tarreau [Tue, 13 May 2025 14:00:23 +0000 (16:00 +0200)]
MINOR: cpu-topo: provide a function to sort clusters by average capacity
The current per-capacity sorting function acts on a whole cluster, but
in some setups having many small cores and few big ones, it becomes
easy to observe an inversion of metrics where the many small cores show
a globally higher total capacity than the few big ones. This does not
necessarily fit all use cases. Let's add new a function to sort clusters
by their per-cpu average capacity to cover more use cases.
Willy Tarreau [Tue, 13 May 2025 13:39:35 +0000 (15:39 +0200)]
MINOR: cpu-topo: add a new "group-by-ccx" CPU policy
This cpu-policy will only consider CCX and not clusters. This makes
a difference on machines with heterogenous CPUs that generally share
the same L3 cache, where it's not desirable to create multiple groups
based on the CPU types, but instead create one with the different CPU
types. The variants "group-by-2/3/4-ccx" have also been added.
Let's also add some text explaining the difference between cluster
and CCX.
Willy Tarreau [Tue, 13 May 2025 09:40:44 +0000 (11:40 +0200)]
BUG/MINOR: cpu-topo: fix group-by-cluster policy for disordered clusters
Some (rare) boards have their clusters in an erratic order. This is
the case for the Radxa Orion O6 where one of the big cores appears as
CPU0 due to booting from it, then followed by the small cores, then the
medium cores, then the remaining big cores. This results in clusters
appearing this order: 0,2,1,0.
The core in cpu_policy_group_by_cluster() expected ordered clusters,
and performs ordered comparisons to decide whether a CPU's cluster has
already been taken care of. On the board above this doesn't work, only
clusters 0 and 2 appear and 1 is skipped.
Let's replace the cluster number comparison with a cpuset to record
which clusters have been taken care of. Now the groups properly appear
like this:
Tgrp/Thr Tid CPU set
1/1-2 1-2 2: 0,11
2/1-4 3-6 4: 1-4
3/1-6 7-12 6: 5-10
MINOR: quic: display QCS info on "show quic stream"
Complete stream output for "show quic" by displaying information from
its upper QCS. Note that QCS may be NULL if already released, so a
default output is also provided.
Add a new format for "show quic" command labelled as "stream". This is
an equivalent of "show sess", dedicated to the QUIC stack. Each active
QUIC streams are listed on a line with their related infos.
The main objective of this command is to ensure there is no freeze
streams remaining after a transfer.
Add accounting at qc_stream_desc level to be able to report the number
of allocated Tx buffers and the sum of their data. This represents data
ready for emission or already emitted and waiting on ACK.
To simplify this accounting, a new counter type bdata_ctr is defined in
quic_utils.h. This regroups both buffers and data counter, plus a
maximum on the buffer value.
These values are now displayed on QCS info used both on logline and
traces, and also on "show quic" output.
Willy Tarreau [Mon, 12 May 2025 15:45:33 +0000 (17:45 +0200)]
BUG/MEDIUM: h2/h3: reject some forbidden chars in :authority before reassembly
As discussed here:
https://github.com/httpwg/http2-spec/pull/936
https://github.com/haproxy/haproxy/issues/2941
It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/'). This patch does this, both for h2 and h3.
The impact on H2 was measured in the worst case at 0.3% of the request
rate, while the impact on H3 is around 1%, but H3 was about 1% faster
than H2 before and is now on par.
It may be backported after a period of observation, and in this case it
relies on this previous commit:
MINOR: http: add a function to validate characters of :authority
Thanks to @DemiMarie for reviving this topic in issue #2941 and bringing
new potential interesting cases.
Willy Tarreau [Mon, 12 May 2025 15:39:08 +0000 (17:39 +0200)]
MINOR: http: add a function to validate characters of :authority
As discussed here:
https://github.com/httpwg/http2-spec/pull/936
https://github.com/haproxy/haproxy/issues/2941
It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/').
This patch adds a specific function which was checks all such characters
and their ranges on an ist, and benefits from modern compilers
optimizations that arrange the comparisons into an evaluation tree for
faster match. That's the version that gave the most consistent performance
across various compilers, though some hand-crafted versions using bitmaps
stored in register could be slightly faster but super sensitive to code
ordering, suggesting that the results might vary with future compilers.
This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per
char on avg). The resulting impact on H2 request processing time (small
requests) was measured around 0.3%, from 6.60 to 6.618us per request,
which is a bit high but remains acceptable given that the test only
focused on req rate.
BUG/MINOR: server: perform lbprm deinit for dynamic servers
Last commit 7361515 ("BUG/MINOR: server: dont depend on proxy for server
cleanup in srv_drop()") introduced a regression because the lbprm
server_deinit is not evaluated anymore with dynamic servers, possibly
resulting in a memory leak.
To fix the issue, in addition to free_proxy(), the server deinit check
should be manually performed in cli_parse_delete_server() as well.
BUG/MINOR: server: dont depend on proxy for server cleanup in srv_drop()
In commit b5ee8bebfc ("MINOR: server: always call ssl->destroy_srv when
available"), we made it so srv_drop() doesn't depend on proxy to perform
server cleanup.
It turns out this is now mandatory, because during deinit, free_proxy()
can occur before the final srv_drop(). This is the case when using Lua
scripts for instance.
In 2a9436f96 ("MINOR: lbprm: Add method to deinit server and proxy") we
added a freeing check under srv_drop() that depends on the proxy.
Because of that UAF may occur during deinit when using a Lua script that
manipulate server objects.
To fix the issue, let's perform the lbprm server deinit logic under
free_proxy() directly, where the DEINIT server hooks are evaluated.
Also, to prevent similar bugs in the future, let's explicitly document
in srv_drop() that server cleanups should assume that the proxy may
already be freed.
Willy Tarreau [Mon, 12 May 2025 14:06:28 +0000 (16:06 +0200)]
BUG/MINOR: cfgparse: improve the empty arg position report's robustness
OSS Fuzz found that the previous fix ebb19fb367 ("BUG/MINOR: cfgparse:
consider the special case of empty arg caused by \x00") was incomplete,
as the output can sometimes be larger than the input (due to variables
expansion) in which case the work around to try to report a bad arg will
fail. While the parse_line() function has been made more robust now in
order to avoid this condition, let's fix the handling of this special
case anyway by just pointing to the beginning of the line if the supposed
error location is out of the line's buffer.
All details here:
https://oss-fuzz.com/testcase-detail/5202563081502720
No backport is needed unless the fix above is backported.
Willy Tarreau [Mon, 12 May 2025 14:01:27 +0000 (16:01 +0200)]
BUG/MINOR: tools: improve parse_line()'s robustness against empty args
The fix in 10e6d0bd57 ("BUG/MINOR: tools: only fill first empty arg when
not out of range") was not that good. It focused on protecting against
<arg> becoming out of range to detect we haven't emitted anything, but
it's not the right way to detect this. We're always maintaining arg_start
as a copy of outpos, and that later one is incremented when emitting a
char, so instead of testing args[arg] against out+arg_start, we should
instead check outpos against arg_start, thereby eliminating the <out>
offset and the need to access args[]. This way we now always know if
we've emitted an empty arg without dereferencing args[].
There's no need to backport this unless the fix above is also backported.
BUG/MINOR: threads: fix soft-stop without multithreading support
When thread support is disabled ("USE_THREAD=" or "USE_THREAD=0" when
building), soft-stop doesn't work as haproxy never ends after stopping
the proxies.
This used to work fine in the past but suddenly stopped working with ef422ced91 ("MEDIUM: thread: make stopping_threads per-group and add
stopping_tgroups") because the "break;" instruction under the stopping
condition is never executed when support for multithreading is disabled.
To fix the issue, let's add an "else" block to run the "break;"
instruction when USE_THREAD is not defined.
BUG/MINOR: ssl/ckch: always ha_freearray() the previous entry during parsing
The ckch_conf_parse() function is the generic function which parses
crt-store keywords from the crt-store section, and also from a
crt-list.
When having multiple time the same keyword, a leak of the previous
value happens. This patch ensure that the previous value is always
freed before overwriting it.
This is the same problem as the previous "BUG/MINOR: ssl/ckch: always
free() the previous entry during parsing" patch, however this one
applies on PARSE_TYPE_ARRAY_SUBSTR.
BUG/MINOR: ssl/ckch: always free() the previous entry during parsing
The ckch_conf_parse() function is the generic function which parses
crt-store keywords from the crt-store section, and also from a crt-list.
When having multiple time the same keyword, a leak of the previous value
happens. This patch ensure that the previous value is always freed
before overwriting it.
BUG/MINOR: ssl: prevent multiple 'crt' on the same ssl-f-use line
The 'ssl-f-use' implementation doesn't prevent to have multiple time the
'crt' keyword, which overwrite the previous value. Letting users think
that is it possible to use multiple certificates on the same line, which
is not the case.
This patch emits an alert when setting the 'crt' keyword multiple times
on the same ssl-f-use line.
BUG/MINOR: ssl: doesn't fill conf->crt with first arg
Commit c7f29afc ("MEDIUM: ssl: replace "crt" lines by "ssl-f-use"
lines") forgot to remove an the allocation of the crt field which was
done with the first argument.
Since ssl-f-use takes keywords, this would put the first keyword in
"crt" instead of the certificate name.
Willy Tarreau [Fri, 9 May 2025 13:23:10 +0000 (15:23 +0200)]
MEDIUM: sock-inet: re-check IPv6 connectivity every 30s
IPv6 connectivity might start off (e.g. network not fully up when
haproxy starts), so for features like resolvers, it would be nice to
periodically recheck.
With this change, instead of having the resolvers code rely on a variable
indicating connectivity, it will now call a function that will check for
how long a connectivity check hasn't been run, and will perform a new one
if needed. The age was set to 30s which seems reasonable considering that
the DNS will cache results anyway. There's no saving in spacing it more
since the syscall is very check (just a connect() without any packet being
emitted).
The variables remain exported so that we could present them in show info
or anywhere else.
This way, "dns-accept-family auto" will now stay up to date. Warning
though, it does perform some caching so even with a refreshed IPv6
connectivity, an older record may be returned anyway.
Willy Tarreau [Thu, 8 May 2025 13:28:18 +0000 (15:28 +0200)]
DEBUG: pools: add a new integrity mode "backup" to copy the released area
This way we can preserve the entire contents of the released area for
later inspection. This automatically enables comparison at reallocation
time as well (like "integrity" does). If used in combination with
integrity, the comparison is disabled but the check of non-corruption
of the area mangled by integrity is still operated.
Willy Tarreau [Thu, 8 May 2025 17:52:16 +0000 (19:52 +0200)]
DEBUG: pool: permit per-pool UAF configuration
The new MEM_F_UAF flag can be set just after a pool's creation to make
this pool UAF for debugging purposes. This allows to maintain a better
overall performance required to reproduce issues while still having a
chance to catch UAF. It will only be used by developers who will manually
add it to areas worth being inspected, though.
BUG/MEDIUM: mux-quic: fix crash on invalid fctl frame dereference
Emission of flow-control frames have been recently modified. Now, each
frame is sent one by one, via a single entry list. If a failure occurs,
emission is interrupted and frame is reinserted into the original
<qcc.lfctl.frms> list.
This code is incorrect as it only checks if qcc_send_frames() returns an
error code to perform the reinsert operation. However, an error here
does not always mean that the frame was not properly emitted by lower
quic-conn layer. As such, an extra test LIST_ISEMPTY() must be performed
prior to reinsert the frame.
This bug would cause a heap overflow. Indeed, the reinsert frame would
be a random value. A crash would occur as soon as it would be
dereferenced via <qcc.lfctl.frms> list.
This was reproduced by issuing a POST with a big file and interrupt it
after just a few seconds. This results in a crash in about a third of
the tests. Here is an example command using ngtcp2 :
Willy Tarreau [Fri, 9 May 2025 08:51:30 +0000 (10:51 +0200)]
[RELEASE] Released version 3.2-dev15
Released version 3.2-dev15 with the following main changes :
- BUG/MEDIUM: stktable: fix sc_*(<ctr>) BUG_ON() regression with ctx > 9
- BUG/MINOR: acme/cli: don't output error on success
- BUG/MINOR: tools: do not create an empty arg from trailing spaces
- MEDIUM: config: warn about the consequences of empty arguments on a config line
- MINOR: tools: make parse_line() provide hints about empty args
- MINOR: cfgparse: visually show the input line on empty args
- BUG/MINOR: tools: always terminate empty lines
- BUG/MINOR: tools: make parseline report the required space for the trailing 0
- DEBUG: threads: don't keep lock label "OTHER" in the per-thread history
- DEBUG: threads: merge successive idempotent lock operations in history
- DEBUG: threads: display held locks in threads dumps
- BUG/MINOR: proxy: only use proxy_inc_fe_cum_sess_ver_ctr() with frontends
- Revert "BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame"
- MINOR: acme/cli: 'acme status' show the status acme-configured certificates
- MEDIUM: acme/ssl: remove 'acme ps' in favor of 'acme status'
- DOC: configuration: add "acme" section to the keywords list
- DOC: configuration: add the "crt-store" keyword
- BUG/MAJOR: queue: lock around the call to pendconn_process_next_strm()
- MINOR: ssl: add filename and linenum for ssl-f-use errors
- BUG/MINOR: ssl: can't use crt-store some certificates in ssl-f-use
- BUG/MINOR: tools: only fill first empty arg when not out of range
- MINOR: debug: bump the dump buffer to 8kB
- MINOR: stick-tables: add "ipv4" as an alias for the "ip" type
- MINOR: quic: extend return value during TP parsing
- BUG/MINOR: quic: use proper error code on missing CID in TPs
- BUG/MINOR: quic: use proper error code on invalid server TP
- BUG/MINOR: quic: reject retry_source_cid TP on server side
- BUG/MINOR: quic: use proper error code on invalid received TP value
- BUG/MINOR: quic: fix TP reject on invalid max-ack-delay
- BUG/MINOR: quic: reject invalid max_udp_payload size
- BUG/MEDIUM: peers: hold the refcnt until updating ts->seen
- BUG/MEDIUM: stick-tables: close a tiny race in __stksess_kill()
- BUG/MINOR: cli: fix too many args detection for commands
- MINOR: server: ensure server postparse tasks are run for dynamic servers
- BUG/MEDIUM: stick-table: always remove update before adding a new one
- BUG/MEDIUM: quic: free stream_desc on all data acked
- BUG/MINOR: cfgparse: consider the special case of empty arg caused by \x00
- DOC: config: recommend disabling libc-based resolution with resolvers
Willy Tarreau [Fri, 9 May 2025 07:55:39 +0000 (09:55 +0200)]
BUG/MINOR: cfgparse: consider the special case of empty arg caused by \x00
The reporting of the empty arg location added with commit 08d3caf30
("MINOR: cfgparse: visually show the input line on empty args") falls
victim of a special case detected by OSS Fuzz:
In short, making an argument start with "\x00" doesn't make it empty for
the parser, but still emits an empty string which is detected and
displayed. Unfortunately in this case the error pointer is not set so
the sanitization function crashes.
What we're doing in this case is that we fall back to the position of
the output argument as an estimate of where it was located in the input.
It's clearly inexact (quoting etc) but will still help the user locate
the problem.
No backport is needed unless the commit above is backported.
BUG/MEDIUM: quic: free stream_desc on all data acked
The following patch simplifies qc_stream_desc_ack(). The qc_stream_desc
instance is not freed anymore, even if all data were acknowledged. As
implies by the commit message, the caller is responsible to perform this
cleaning operation. f4a83fbb14bdd14ed94752a2280a2f40c1b690d2
MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
However, despite the commit instruction, qc_stream_desc_free()
invokation was not moved in the caller. This commit fixes this by adding
it after stream ACK handling. This is performed only when a transfer is
completed : all data is acknowledged and qc_stream_desc has been
released by its MUX stream instance counterpart.
This bug may cause a significant increase in memory usage when dealing
with long running connection. However, there is no memory leak, as every
qc_stream_desc attached to a connection are finally freed when quic_conn
instance is released.
Willy Tarreau [Thu, 8 May 2025 21:21:25 +0000 (23:21 +0200)]
BUG/MEDIUM: stick-table: always remove update before adding a new one
Since commit 388539faa ("MEDIUM: stick-tables: defer adding updates to a
tasklet"), between the entry creation and its arrival in the updates tree,
there is time for scheduling, and it now becomes possible for an stksess
entry to be requeued into the list while it's still in the tree as a remote
one. Only local updates were removed prior to being inserted. In this case
we would re-insert the entry, causing it to appear as the parent of two
distinct nodes or leaves, and to be visited from the first leaf during a
delete() after having already been removed and freed, causing a crash,
as Christian reported in issue #2959.
There's no reason to backport this as this appeared with the commit above
in 3.2-dev13.
MINOR: server: ensure server postparse tasks are run for dynamic servers
commit 29b76cae4 ("BUG/MEDIUM: server/log: "mode log" after server
keyword causes crash") introduced some postparsing checks/tasks for
server
Initially they were mainly meant for "mode log" servers postparsing, but
we already have a check dedicated to "tcp/http" servers (ie: only tcp
proto supported)
However when dynamic servers are added they bypass _srv_postparse() since
the REGISTER_POST_SERVER_CHECK() is only executed for servers defined in
the configuration.
To ensure consistency between dynamic and static servers, and ensure no
post-check init routine is missed, let's manually invoke _srv_postparse()
after creating a dynamic server added via the cli.
BUG/MINOR: cli: fix too many args detection for commands
d3f928944 ("BUG/MINOR: cli: Issue an error when too many args are passed
for a command") added a new check to prevent the command to run when
too many arguments are provided. In this case an error is reported.
However it turns out this check (despite marked for backports) was
ineffective prior to 20ec1de21 ("MAJOR: cli: Refacor parsing and
execution of pipelined commands") as 'p' pointer was reset to the end of
the buffer before the check was executed.
Now since 20ec1de21, the check works, but we have another issue: we may
read past initialized bytes in the buffer because 'p' pointer is always
incremented in a while loop without checking if we increment it past 'end'
(This was detected using valgrind)
To fix the issue introduced by 20ec1de21, let's only increment 'p' pointer
if p < end.
For 3.2 this is it, now for older versions, since d3f928944 was marked for
backport, a sligthly different approach is needed:
- conditional p increment must be done in the loop (as in this patch)
- max arg check must moved above "fill unused slots" comment where p is
assigned to the end of the buffer
Willy Tarreau [Wed, 7 May 2025 16:43:57 +0000 (18:43 +0200)]
BUG/MEDIUM: stick-tables: close a tiny race in __stksess_kill()
It might be possible not to see the element in the tree, then not to
see it in the update list, thus not to take the lock before deleting.
But an element in the list could have moved to the tree during the
check, and be removed later without the updt_lock.
Let's delete prior to checking the presence in the tree to avoid
this situation. No backport is needed since this arrived in -dev13
with the update list.
Willy Tarreau [Tue, 6 May 2025 09:32:34 +0000 (11:32 +0200)]
BUG/MEDIUM: peers: hold the refcnt until updating ts->seen
In peer_treat_updatemsg(), we call stktable_touch_remote() after
releasing the write lock on the TS, asking it to decrement the
refcnt, then we update ts->seen. Unfortunately this is racy and
causes the issue that Christian reported in issue #2959.
The sequence of events is very hard to trigger manually, but what happens
is the following:
T1. stktable_touch_remote(table, ts, 1);
-> at this point the entry is in the mt_list, and the refcnt is zero.
T2. stktable_trash_oldest() or process_table_expire()
-> these can run, because the refcnt is now zero.
The entry is cleanly deleted and freed.
T1. HA_ATOMIC_STORE(&ts->seen, 1)
-> we dereference freed memory.
A first attempt at a fix was made by keeping the refcnt held during
all the time the entry is in the mt_list, but this is expensive as
such entries cannot be purged, causing lots of skips during
trash_oldest_data(). This managed to trigger watchdogs, and was only
hiding the real cause of the problem.
The correct approach clearly is to maintain the ref_cnt until we
touch ->seen. That's what this patch does. It does not decrement
the refcnt, while calling stktable_touch_remote(), and does it
manually after touching ->seen. With this the problem is gone.
Note that a reproducer involves the following:
- a config with 10 stick-ctr tracking the same table with a
random key between 10M and 100M depending on the machine.
- the expiration should be between 10 and 20s. http_req_cnt
is stored and shared with the peers.
- 4 total processes with such a config on the local machine,
each corresponding to a different peer. 3 of the peers are
bound to half of the cores (all threads) and share the same
threads; the last process is bound to the other half with
its own threads.
- injecting at full load, ~256 conn, on the shared listening
port. After ~2x expiration time to 1 minute the lone process
should segfault in pools code due to a corrupted by_lru list.
This problem already exists in earlier versions but the race looks
narrower. Given how difficult it is to trigger on a given machine
in its current form, it's likely that it only happens once in a
while on stable branches. The fix must be backported wherever the
code is similar, and there's no hope to reproduce it to validate
the backport.
Add a checks on received max_udp_payload transport parameters. As
defined per RFC 9000, values below 1200 are invalid, and thus the
connection must be closed with TRANSPORT_PARAMETER_ERROR code.
Prior to this patch, an invalid value was silently ignored.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
BUG/MINOR: quic: fix TP reject on invalid max-ack-delay
Checks are implemented on some received transport parameter values,
to reject invalid ones defined per RFC 9000. This is the case for
max_ack_delay parameter.
The check was not properly implemented as it only reject values strictly
greater than the limit set to 2^14. Fix this by rejecting values of 2^14
and above. Also, the proper error code TRANSPORT_PARAMETER_ERROR is now
set.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
BUG/MINOR: quic: use proper error code on invalid received TP value
As per RFC 9000, checks must be implemented to reject invalid values for
received transport parameters. Such values are dependent on the
parameter type.
Checks were already implemented for ack_delay_exponent and
active_connection_id_limit, accordingly with the QUIC specification.
However, the connection was closed with an incorrect error code. Fix
this to ensure that TRANSPORT_PARAMETER_ERROR code is used as expected.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
BUG/MINOR: quic: reject retry_source_cid TP on server side
Close the connection on error if retry_source_connection_id transport
parameter is received. This is specified by RFC 9000 as this parameter
must not be emitted by a client. Previously, it was silently ignored.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
BUG/MINOR: quic: use proper error code on invalid server TP
This commit is similar to the previous one. It fixes the error code
reported when dealing with invalid received transport parameters. This
time, it handles reception of original_destination_connection_id,
preferred_address and stateless_reset_token which must not be emitted by
the client.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
BUG/MINOR: quic: use proper error code on missing CID in TPs
Handle missing received transport parameter value
initial_source_connection_id / original_destination_connection_id.
Previously, such case would result in an error reported via
quic_transport_params_store(), which triggers a TLS alert converted as
expected as a CONNECTION_CLOSE. The issue is that the error code
reported in the frame was incorrect.
Fix this by returning QUIC_TP_DEC_ERR_INVAL for such conditions. This is
directly handled via quic_transport_params_store() which set the proper
TRANSPORT_PARAMETER_ERROR code for the CONNECTION_CLOSE. However, no
error is reported so the SSL handshake is properly terminated without a
TLS alert. This is enough to ensure that the CONNECTION_CLOSE frame will
be emitted as expected.
This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".
MINOR: quic: extend return value during TP parsing
Extend API used for QUIC transport parameter decoding. This is done via
the introduction of a dedicated enum to report the various error
condition detected. No functional change should occur with this patch,
as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the
connection closure via a TLS alert.
This patch will be necessary to properly reject transport parameters
with the proper CONNECTION_CLOSE error code. As such, it should be
backported up to 2.6 with the following series.
Willy Tarreau [Tue, 6 May 2025 08:54:48 +0000 (10:54 +0200)]
MINOR: stick-tables: add "ipv4" as an alias for the "ip" type
However the doc purposely says the opposite, to encourage migrating away
from "ip". The goal is that in the future we change "ip" to mean "ipv6",
which seems to be what most users naturally expect. But we cannot break
configurations in the LTS version so for now "ipv4" is the alias.
The reason for not changing it in the table is that the type name is
used at a few places (look for "].kw"):
- dumps
- promex
We'd rather not change that output for 3.2, but only do it in 3.3.
This way, 3.2 can be made future-proof by using "ipv4" in the config
without any other side effect.
Please see github issue #2962 for updates on this transition.
Willy Tarreau [Wed, 7 May 2025 08:00:08 +0000 (10:00 +0200)]
MINOR: debug: bump the dump buffer to 8kB
Now with the improved backtraces, the lock history and details in the
mux layers, some dumps appear truncated or with some chars alone at
the beginning of the line. The issue is in fact caused by the limited
dump buffer size (2kB for stderr, 4kB for warning), that cannot hold
a complete trace anymore.
Let's jump bump them to 8kB, this will be plenty for a long time.
Willy Tarreau [Wed, 7 May 2025 05:22:24 +0000 (07:22 +0200)]
BUG/MINOR: tools: only fill first empty arg when not out of range
In commit 3f2c8af313 ("MINOR: tools: make parse_line() provide hints
about empty args") we've added the ability to record the position of
the first empty arg in parse_line(), but that check requires to
access the args[] array for the current arg, which is not valid in
case we stopped on too large an argument count. Let's just check the
arg's validity before doing so.
This was reported by OSS Fuzz:
https://issues.oss-fuzz.com/issues/415850462
No backport is needed since this was in the latest dev branch.
BUG/MINOR: ssl: can't use crt-store some certificates in ssl-f-use
When declaring a certificate via the crt-store section, this certificate
can then be used 2 ways in a crt-list:
- only by using its name, without any crt-store options
- or by using the exact set of crt-list option that was defined in the
crt-store
Since ssl-f-use is generating a crt-list, this is suppose to behave the
same. To achieve this, ckch_conf_parse() will parse the keywords related
to the ckch_conf on the ssl-f-use line and use ckch_conf_cmp() to
compare it to the previous declaration from the crt-store. This
comparaison is only done when any ckch_conf keyword are present.
However, ckch_conf_parse() was done for the crt-list, and the crt-list
does not use the "crt" parameter to declare the name of the certificate,
since it's the first element of the line. So when used with ssl-f-use,
ckch_conf_parse() will always see a "crt" keyword which is a ckch_conf
one, and consider that it will always need to have the exact same set of
paremeters when using the same crt in a crt-store and an ssl-f-use line.
So a simple configuration like this:
crt-store web
load crt "foo.com.crt" key "foo.com.key" alias "foo"
config : '@web/foo' in crt-list '(null)' line 0, is already defined with incompatible parameters:
- different parameter 'key' : previously 'foo.com.key' vs '(null)'
In order to fix the issue, this patch parses the "crt" parameter itself
for ssl-f-use instead of using ckch_conf_parse(), so the keyword would
never be considered as a ckch_conf keyword to compare.
This patch also take care of setting the CKCH_CONF_SET_CRTLIST flag only
if a ckch_conf keyword was found. This flag is used by ckch_conf_cmp()
to know if it has to compare or not.
Willy Tarreau [Tue, 6 May 2025 16:55:04 +0000 (18:55 +0200)]
BUG/MAJOR: queue: lock around the call to pendconn_process_next_strm()
The extra call to pendconn_process_next_strm() made in commit cda7275ef5
("MEDIUM: queue: Handle the race condition between queue and dequeue
differently") was performed after releasing the server queue's lock,
which is incompatible with the calling convention for this function.
The result is random corruption of the server's streams list likely
due to picking old or incorrect pendconns from the queue, and in the
end infinitely looping on apparently already locked mt_list objects.
Just adding the lock fixes the problem.
It's very difficult to reproduce, it requires low maxconn values on
servers, stickiness on the servers (cookie), a long enough slowstart
(e.g. 10s), and regularly flipping servers up/down to re-trigger the
slowstart.
Add the "crt-store" keyword with its argument in the "3.12" section, so
this could be detected by haproxy-dconv has a keyword and put in the
keywords list.
MINOR: acme/cli: 'acme status' show the status acme-configured certificates
The "acme status" command, shows the status of every certificates
configured with ACME, not only the running task like "acme ps".
The IO handler loops on the ckch_store tree and outputs a line for each
ckch_store which has an acme section set. This is still done under the
ckch_store lock and doesn't support resuming when the buffer is full,
but we need to change that in the future.
This patch introduced a regression leading to a loop on the frames
demultiplexing because a frame may be ignore but not consumed.
But outside this regression that can be fixed, there is a design issue that
was not totally fixed by the patch above. The SPOP connection state is mixed
with the status of the frames demultiplexer and this needlessly complexify
the connection management. Instead of fixing the fix, a better solution is
to revert it to work a a proper solution.
For the record, the idea is to deal with the spop connection state onlu
using 'state' field and to introduce a new field to handle the frames
demultiplexer state. This should ease the closing state management.
Another issue that must be fixed. We must take care to not abort a SPOP
stream when an error is detected on a SPOP connection or when the connection
is closed, if the ACK frame was already received for this stream. It is not
a common case, but it can be solved by saving the last known stream ID that
recieved a ACK.
This patch must be backported if the commit above is backported.
BUG/MINOR: proxy: only use proxy_inc_fe_cum_sess_ver_ctr() with frontends
proxy_inc_fe_cum_sess_ver_ctr() was implemented in 9969adbc
("MINOR: stats: add by HTTP version cumulated number of sessions and
requests")
As its name suggests, it is meant to be called for frontends, not backends
Also, in 9969adbc, when used under h1_init(), a precaution is taken to
ensure that the function is only called with frontends.
However, this precaution was not applied in h2_init() and qc_init().
Due to this, it remains possible to have proxy_inc_fe_cum_sess_ver_ctr()
being called with a backend proxy as parameter. While it did not cause
known issues so far, it is not expected and could result in bugs in the
future. Better fix this by ensuring the function is only called with
frontends.
Willy Tarreau [Tue, 6 May 2025 03:11:56 +0000 (05:11 +0200)]
DEBUG: threads: display held locks in threads dumps
Based on the lock history, we can spot some locks that are still held
by checking the last operation that happened on them: if it's not an
unlock, then we know the lock is held. In this case we append the list
after "locked:" with their label and state like below:
Willy Tarreau [Mon, 5 May 2025 16:24:40 +0000 (18:24 +0200)]
DEBUG: threads: merge successive idempotent lock operations in history
In order to make the lock history a bit more useful, let's try to merge
adjacent lock/unlock sequences that don't change anything for other
threads. For this we can replace the last unlock with the new operation
on the same label, and even just not store it if it was the same as the
one before the unlock, since in the end it's the same as if the unlock
had not been done.
Now loops that used to be filled with "R:LISTENER U:LISTENER" show more
useful info such as:
It's worth noting that it can sometimes induce confusion when recursive
locks of the same label are used (a few exist on peers or stick-tables),
as in such a case the two operations would be needed. However these ones
are already undebuggable, so instead they will just have to be renamed
to make sure they use a distinct label.
Willy Tarreau [Mon, 5 May 2025 16:10:57 +0000 (18:10 +0200)]
DEBUG: threads: don't keep lock label "OTHER" in the per-thread history
Most threads are filled with "R:OTHER U:OTHER" in their history. Since
anything non-important can use other it's not observable but it pollutes
the history. Let's just drop OTHER entirely during the recording.
Willy Tarreau [Mon, 5 May 2025 15:58:04 +0000 (17:58 +0200)]
BUG/MINOR: tools: make parseline report the required space for the trailing 0
The fix in commit 09a325a4de ("BUG/MINOR: tools: always terminate empty
lines") is insufficient. While it properly addresses the lack of trailing
zero, it doesn't account for it in the returned outlen that is used to
allocate a larger line. This happens at boot if the very first line of
the test file is exactly a sharp with nothing else. In this case it will
return a length 0 and the caller (parse_cfg()) will try to re-allocate an
entry of size zero and will fail, bailing out a lack of memory. This time
it should really be OK.
It doesn't need to be backported, unless the patch above would be.
Willy Tarreau [Mon, 5 May 2025 15:33:22 +0000 (17:33 +0200)]
BUG/MINOR: tools: always terminate empty lines
Since latest commit 7e4a2f39ef ("BUG/MINOR: tools: do not create an empty
arg from trailing spaces"), an empty line will no longer produce an arg
and no longer append a trailing zero to them. This was not visible because
one is already present in the input string, however all the trailing args
are set to out+outpos-1, which now points one char before the buffer since
nothing was emitted, and was noticed by ASAN, and/or when parsing garbage.
Let's make sure to always emit the zero for empty lines as well to address
this issue. No backport is needed unless the patch above gets backported.
Willy Tarreau [Mon, 5 May 2025 14:13:33 +0000 (16:13 +0200)]
MINOR: cfgparse: visually show the input line on empty args
Now when an empty arg is found on a line, we emit the sanitized
input line and the position of the first empty arg so as to help
the user figure the cause (likely an empty environment variable).
Willy Tarreau [Mon, 5 May 2025 14:11:42 +0000 (16:11 +0200)]
MINOR: tools: make parse_line() provide hints about empty args
In order to help parse_line() callers report the position of empty
args to the user, let's decide that if no error is emitted, then
we'll stuff the errptr with the position of the first empty arg
without affecting the return value.
Willy Tarreau [Fri, 2 May 2025 13:47:41 +0000 (15:47 +0200)]
MEDIUM: config: warn about the consequences of empty arguments on a config line
For historical reasons, the config parser relies on the trailing '\0'
to detect the end of the line being parsed. When the lines started to be
tokenized into arguments, this principle has been preserved, and now all
the parsers rely on *args[arg]='\0' to detect the end of a line. But as
reported in issue #2944, while most of the time it breaks the parsing
like below:
http-request deny if { path_dir '' }
it can also cause some elements to be silently ignored like below:
acl bad_path path_sub '%2E' '' '%2F'
This may also subtly happen with environment variables that don't exist
or which are empty:
acl bad_path path_sub '%2E' "$BAD_PATTERN" '%2F'
Fortunately, parse_line() returns the number of arguments found, so it's
easy from the callers to verify if any was empty. The goal of this commit
is not to perform sensitive changes, it's only to mention when parsing a
line that an empty argument was found and alert about its consequences
using a warning. Most of the time when this happens, the config does not
parse. But for examples as the ACLs above, there could be consequences
that are better detected early.
This patch depends on this previous fix:
BUG/MINOR: tools: do not create an empty arg from trailing spaces
Willy Tarreau [Fri, 2 May 2025 13:46:18 +0000 (15:46 +0200)]
BUG/MINOR: tools: do not create an empty arg from trailing spaces
Trailing spaces on the lines of the config file create an empty arg
which makes it complicated to detect really empty args. Let's first
address this. Note that it is not user-visible but prevents from
fixing user-visible issues. No backport is needed.
The initial issue was introduced with this fix that already tried to
address it:
The current patch properly addresses leading and trailing spaces by
only counting arguments if non-lws chars were found on the line. LWS
do not cause a transition to a new arg anymore but they complete the
current one. The whole new code relies on a state machine to detect
when to create an arg (!in_arg->in_arg), and when to close the current
arg. A special care was taken for word expansion in the form of
"${ARGS[*]}" which still continue to emit individual arguments past
the first LWS. This example works fine:
ARGS="100 check inter 1000"
server name 192.168.1."${ARGS[*]}"
BUG/MINOR: acme/cli: don't output error on success
Previous patch 7251c13c7 ("MINOR: acme: move the acme task init in a dedicated
function") mistakenly returned the wrong error code when "acme renew" parsing
was successful, and tried to emit an error message.
This patch fixes the issue by returning 0 when the acme task was correctly
scheduled to start.
BUG/MEDIUM: stktable: fix sc_*(<ctr>) BUG_ON() regression with ctx > 9
As reported in GH #2958, commit 6c9b315 caused a regression with sc_*
fetches and tracked counter id > 9.
As such, the below configuration would cause a BUG_ON() to be triggered:
global
log stdout format raw local0
tune.stick-counters 11
defaults
log global
mode http
frontend www
bind *:8080
acl track_me bool(true)
http-request set-var(txn.track_var) str("a")
http-request track-sc10 var(txn.track_var) table rate_table if track_me
http-request set-var(txn.track_var_rate) sc_gpc_rate(0,10,rate_table)
http-request return status 200
backend rate_table
stick-table type string size 1k expire 5m store gpc_rate(1,1m)
While in 6c9b315 the src_fetch logic was removed from
smp_fetch_sc_stkctr(), num > 9 is indeed not expected anymore as
original num value. But what we didn't consider is that num is effectively
re-assigned for generic sc_* variant.
Thus the BUG_ON() is misplaced as it should only be evaluated for
non-generic fetches. It explains why it triggers with valid configurations
Thanks to GH user @tkjaer for his detailed report and bug analysis
Willy Tarreau [Fri, 2 May 2025 14:23:28 +0000 (16:23 +0200)]
[RELEASE] Released version 3.2-dev14
Released version 3.2-dev14 with the following main changes :
- MINOR: acme: retry label always do a request
- MINOR: acme: does not leave task for next request
- BUG/MINOR: acme: reinit the retries only at next request
- MINOR: acme: change the default max retries to 5
- MINOR: acme: allow a delay after a valid response
- MINOR: acme: wait 5s before checking the challenges results
- MINOR: acme: emit a log when starting
- MINOR: acme: delay of 5s after the finalize
- BUG/MEDIUM: quic: Let it be known if the tasklet has been released.
- BUG/MAJOR: tasks: fix task accounting when killed
- CLEANUP: tasks: use the local state, not t->state, to check for tasklets
- DOC: acme: external account binding is not supported
- MINOR: hlua: ignore "tune.lua.bool-sample-conversion" if set after "lua-load"
- MEDIUM: peers: Give up if we fail to take locks in hot path
- MEDIUM: stick-tables: defer adding updates to a tasklet
- MEDIUM: stick-tables: Limit the number of old entries we remove
- MEDIUM: stick-tables: Limit the number of entries we expire
- MINOR: cfgparse-global: add explicit error messages in cfg_parse_global_env_opts
- MINOR: ssl: add function to extract X509 notBefore date in time_t
- BUILD: acme: need HAVE_ASN1_TIME_TO_TM
- MINOR: acme: move the acme task init in a dedicated function
- MEDIUM: acme: add a basic scheduler
- MINOR: acme: emit a log when the scheduler can't start the task
This patch implements a very basic scheduler for the ACME tasks.
The scheduler is a task which is started from the postparser function
when at least one acme section was configured.
The scheduler will loop over the certificates in the ckchs_tree, and for
each certificate will start an ACME task if the notAfter date is past
curtime + (notAfter - notBefore) / 12, or 7 days if notBefore is not
available.
Once the lookup over all certificates is terminated, the task will sleep
and will wakeup after 12 hours.
MINOR: cfgparse-global: add explicit error messages in cfg_parse_global_env_opts
When env variable name or value are not provided for setenv/presetenv it's not
clear from the old error message shown at stderr, what exactly is missed. User
needs to search in it's configuration.
Let's add more explicit error messages about these inconsistencies.
MEDIUM: stick-tables: Limit the number of entries we expire
In process_table_expire(), limit the number of entries we remove in one
call, and just reschedule the task if there's more to do. Removing
entries require to use the heavily contended update write lock, and we
don't want to hold it for too long.
This helps getting stick tables perform better under heavy load.
MEDIUM: stick-tables: Limit the number of old entries we remove
Limit the number of old entries we remove in one call of
stktable_trash_oldest(), as we do so while holding the heavily contended
update write lock, so we'd rather not hold it for too long.
This helps getting stick tables perform better under heavy load.
MEDIUM: stick-tables: defer adding updates to a tasklet
There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to a mt-list (with one mt-list per thread group, so that the
mt-list does not become the new point of contention that much), and
create a tasklet dedicated to adding updates to the tree, in batchs, to
avoid keeping the update lock for too long.
This helps getting stick tables perform better under heavy load.
MEDIUM: peers: Give up if we fail to take locks in hot path
In peer_send_msgs(), give up in order to retry later if we failed at
getting the update read lock.
Similarly, in __process_running_peer_sync(), give up and just reschedule
the task if we failed to get the peer lock. There is an heavy contention
on both those locks, so we could spend a lot of time trying to get them.
This helps getting peers perform better under heavy load.
MINOR: hlua: ignore "tune.lua.bool-sample-conversion" if set after "lua-load"
tune.lua.bool-sample-conversion must be set before any lua-load or
lua-load-per-thread is used for it to be considered. Indeed, lua-load
directives are parsed on the fly and will cause some parts of the scripts
to be executed during init already (script body/init contexts).
As such, we cannot afford to have "tune.lua.bool-sample-conversion" set
after some Lua code was loaded, because it would mean that the setting
would be handled differently for Lua's code executed during or after
config parsing.
To avoid ambiguities, the documentation now states that the setting must
be set before any lua-load(-per-thread) directive, and if the setting
is met after some Lua was already loaded, the directive is ignored and
a warning informs about that.
It should fix GH #2957
It may be backported with 29b6d8af16 ("MINOR: hlua: rename
"tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"")
Willy Tarreau [Fri, 2 May 2025 08:55:43 +0000 (10:55 +0200)]
CLEANUP: tasks: use the local state, not t->state, to check for tasklets
There's no point reading t->state to check for a tasklet after we've
atomically read the state into the local "state" variable. Not only it's
more expensive, it's also less clear whether that state is supposed to
be atomic or not. And in any case, tasks and tasklets have their type
forever and the one reflected in state is correct and stable.
Willy Tarreau [Fri, 2 May 2025 08:34:16 +0000 (10:34 +0200)]
BUG/MAJOR: tasks: fix task accounting when killed
After recent commit b81c9390f ("MEDIUM: tasks: Mutualize the TASK_KILLED
code between tasks and tasklets"), the task accounting was no longer
correct for killed tasks due to the decrement of tasks in list that was
no longer done, resulting in infinite loops in process_runnable_tasks().
This just illustrates that this code remains complex and should be further
cleaned up. No backport is needed, as this was in 3.2.