BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume()
hlua_ctx_resume() itself can safely be used as-is in a multithreading
context because it takes care of taking the lua lock.
However, when hlua_ctx_resume() returns, the lock is released and it is
thus the caller's responsibility to ensure it owns the lock prior to
performing additional manipulations on the Lua stack. Unfortunately, since
the early haproxy lua implementation, we have been doing it wrong.
The most common hlua_ctx_resume() pattern we can find in the code (because
it was duplicated again and again over time) is the following:
|ret = hlua_ctx_resume()
|switch (ret) {
| case HLUA_E_OK:
| break;
| case HLUA_E_ERRMSG:
| break;
| [...]
|}
The problem is: for some of the switch cases, we still perform lua stack
manipulations. This is the case for HLUA_E_ERRMSG for instance, where
we often use lua_tostring() to retrieve the last lua error message at the
top of the stack, or sometimes for the HLUA_E_OK case, when we need to
perform some lua cleanup logic once the resume has ended. But all of this
is done WITHOUT the lua lock, which means that the main lua stack could be
accessed simultaneously by concurrent threads when a script was loaded
using 'lua-load'.
While it is not critical for switch-cases dedicated to error handling
(those are not supposed to happen very often), it can be very problematic
for stack manipulations occurring in the HLUA_E_OK case under heavy load
for instance. In this case, main lua stack corruptions will eventually
happen. This is especially true inside hlua_filter_new(), where this bug
was known to cause lua stack corruptions under load, leading to lua errors
and even crashing the process as reported by @bgrooot in GH #2467.
The fix is relatively simple: once hlua_ctx_resume() returns, we should
consider that ANY lua stack access must be lua-lock protected. If the
related lua calls may raise lua errors, then the (RE)SET_SAFE_LJMP
combination should be used as usual (it locks the lua stack and catches
lua exceptions at the same time); otherwise hlua_{lock,unlock} may be
used if no exceptions are expected.
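For illustration, a minimal sketch of the corrected pattern follows
(hlua->T as the lua state and the error-reporting placeholder are
assumptions, not the exact patch):
|ret = hlua_ctx_resume()
|switch (ret) {
| case HLUA_E_OK:
|         hlua_lock(hlua);
|         /* ... lua stack cleanup using non-raising calls only ... */
|         hlua_unlock(hlua);
|         break;
| case HLUA_E_ERRMSG:
|         hlua_lock(hlua);
|         /* read the error message under the lock */
|         err = lua_tostring(hlua->T, -1);
|         /* ... report <err> ... */
|         hlua_unlock(hlua);
|         break;
| [...]
|}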
This patch should fix GH #2467.
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply]
BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP()
When we want to perform some unsafe lua stack manipulations from an
unprotected lua environment, we use the SET_SAFE_LJMP()/RESET_SAFE_LJMP()
combination to lock the lua stack and catch potential lua exceptions that
may occur between the two.
Hence, the following pattern was regularly found, duplicated over and over.
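A minimal illustrative sketch of that pattern (not the exact original
snippet; hlua->T is assumed as the lua state):
|if (!SET_SAFE_LJMP(hlua)) {
|        /* a lua exception was caught: the lua lock is already released here */
|        const char *error;
|
|        if (lua_type(hlua->T, -1) == LUA_TSTRING)
|                error = lua_tostring(hlua->T, -1); /* unprotected stack access! */
|        else
|                error = "critical error";
|        /* ... report <error> and bail out ... */
|        return 0;
|}
|/* ... unsafe lua stack manipulations ... */
|RESET_SAFE_LJMP(hlua);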
This is wrong because when SET_SAFE_LJMP() returns false (meaning that an
exception was caught), the lua lock has already been released, thus the
caller is not expected to perform lua stack manipulations (because the
main lua stack may be shared between multiple threads). In the pattern
above we only want to retrieve the lua exception message, which may be
found at the top of the stack; to do so, we now explicitly take the lua
lock before accessing the lua stack. Note that hlua_lock() doesn't catch
lua exceptions, so only safe lua functions are expected to be used there
(lua functions that may NOT raise exceptions).
It should be backported to every stable version.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply, but other fixes should stay relevant]
BUG/MINOR: hlua: improper lock usage in hlua_filter_new()
In hlua_filter_new(), after each hlua resume, we systematically try to
empty the stack by calling lua_settop(). However, we do this without
locking the lua context, so it is unsafe in a multithreading context if the
script is loaded using 'lua-load'. To fix the issue, we protect the call
with the hlua_{lock,unlock}() helpers.
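A minimal sketch of the protected call (hlua->T as the lua state is an
assumption, only the locking pattern matters here):
|hlua_lock(hlua);
|lua_settop(hlua->T, 0); /* empty the lua stack under the lock */
|hlua_unlock(hlua);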
BUG/MINOR: hlua: improper lock usage in hlua_filter_callback()
In hlua_filter_callback(), some lua stack work is performed under the
SET_SAFE_LJMP() guard, which also takes care of locking the hlua context
when needed. However, a lua_gettop() call is performed outside of the
guard, which is unsafe in a multithreading context if the script is loaded
using 'lua-load', because in this case the main lua stack is shared between
threads and each access to the lua stack must be performed under the lock.
Thus we move the lua_gettop() call under the lock.
BUG/MINOR: hlua: fix possible crash in hlua_filter_new() under load
hlua_filter_new() handles memory allocation errors by jumping to the
"end:" cleanup label in case of errors. Such errors may happen when the
system is heavily loaded for instance.
In hlua_filter_new(), we try to allocate two hlua contexts in a row before
checking if one of them failed (in which case we jump to the cleanup part
of the function), and only then we initialize them both.
If a memory allocation failure happens for only one of the two
flt_ctx->hlua[] contexts, we still jump to the cleanup part.
It means that the hlua context that was successfully allocated and wasn't
initialized yet will be passed to hlua_ctx_destroy(), resulting in invalid
reads in the cleanup function, which may ultimately cause the process to
crash.
To fix the issue, we make sure flt_ctx hlua contexts are initialized right
after they are allocated, that is, before any error handling condition that
may force the cleanup.
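A hedged sketch of the reordering (the allocation and init helpers shown
here are assumptions for illustration, not the exact patch):
|flt_ctx->hlua[0] = pool_alloc(pool_head_hlua);
|if (flt_ctx->hlua[0])
|        HLUA_INIT(flt_ctx->hlua[0]); /* made valid right after allocation */
|flt_ctx->hlua[1] = pool_alloc(pool_head_hlua);
|if (flt_ctx->hlua[1])
|        HLUA_INIT(flt_ctx->hlua[1]);
|if (!flt_ctx->hlua[0] || !flt_ctx->hlua[1]) {
|        /* hlua_ctx_destroy() may now safely run on both slots */
|        goto end;
|}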
This bug was discovered when trying to reproduce GH #2467 with haproxy
started with "-dMfail" argument.
BUG/MINOR: hlua: don't use lua_tostring() from unprotected contexts
As per lua documentation, lua_tostring() may raise a memory error.
However, we're often using it to fetch the error message at the top of
the stack (ie: after a failing lua call) from unprotected environments.
In practice, lua_tostring() is unlikely to fail, but still, if it happens
to be the case, it could crash the process and we'd better not risk it.
So here, we add the hlua_tostring_safe() function, which works exactly like
lua_tostring(), but cannot LJMP: it catches lua_tostring() exceptions and
returns NULL instead.
Everywhere lua_tostring() was used to retrieve the error string from such
unprotected contexts, we now rely on hlua_tostring_safe().
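As an illustration only, here is a minimal sketch of how such a wrapper can
be written with the standard Lua C API (this is an assumption, not the
actual haproxy implementation):
|static int hlua_tostring_protected(lua_State *L)
|{
|        const char **str = lua_touserdata(L, 1);
|
|        /* may raise a memory error: it will be caught by lua_pcall() */
|        *str = lua_tostring(L, 2);
|        return 0;
|}
|
|static const char *hlua_tostring_safe(lua_State *L, int index)
|{
|        const char *str = NULL;
|        int abs = lua_absindex(L, index); /* resolve before pushing anything */
|
|        if (!lua_checkstack(L, 3))
|                return NULL;
|        lua_pushcfunction(L, hlua_tostring_protected);
|        lua_pushlightuserdata(L, &str);
|        lua_pushvalue(L, abs);
|        if (lua_pcall(L, 2, 0, 0) != LUA_OK)
|                lua_pop(L, 1); /* drop the error object, return NULL instead */
|        /* the caller should keep the original value on the stack while using <str> */
|        return str;
|}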
This should be backported to all stable versions.
[ada: ctx adj will be required, for versions prior to 2.8 event_hdl
API didn't exist so some chunks won't apply, and prior to 2.5 filters
API didn't exist either, so again, some chunks should be ignored]
BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack
Lua documentation says that lua_tostring() returns a pointer that remains
valid as long as the object is not removed from the stack.
However there are some places where we use the returned string AFTER the
corresponding object is removed from the stack. In practice this doesn't
seem to cause visible bugs (probably because the pointer remains valid
until a GC cycle occurs), but let's fix that to comply with the
documentation and avoid undefined behavior.
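A minimal sketch of the fixed ordering (use() stands for any hypothetical
consumer of the string):
|/* wrong: the value backing <msg> is popped before <msg> is used */
|msg = lua_tostring(L, -1);
|lua_pop(L, 1);
|use(msg); /* the Lua manual no longer guarantees pointer validity here */
|
|/* fixed: keep the value on the stack until <msg> is no longer needed */
|msg = lua_tostring(L, -1);
|use(msg);
|lua_pop(L, 1);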
Willy Tarreau [Fri, 1 Mar 2024 15:17:47 +0000 (16:17 +0100)]
BUG/MINOR: tools: seed the statistical PRNG slightly better
Thomas Baroux reported a very interesting issue. "balance random" would
systematically assign the same server first upon restart. That comes from
its use of statistical_prng() which is only seeded with the thread number,
and since at low loads threads are assigned to incoming connections in
round robin order, practically speaking, the same thread always gets the
same request and will produce the same random number.
We already have a much better RNG that's also way more expensive, but we
can use it at boot time to seed the PRNG instead of using the thread ID
only.
Add the core.silent (-1) value to be able to disable logging via the
TXN:set_loglevel() call. Otherwise, there is no way to do so, and it may be
handy. This special value cannot be used with the TXN:log() function.
BUG/MINOR: hlua: Fix log level to the right value when set via TXN:set_loglevel
When the log level is changed in lua, by calling the TXN:set_loglevel
function, it must be incremented by one because it is decremented in the
strm_log() function.
This patch must be backported to all stable versions.
BUG/MINOR: config/quic: Alert about PROXY protocol use on a QUIC listener
The PROXY protocol is not supported on QUIC for now. Thus return an error
during configuration parsing if the 'accept-proxy' option is used for a
QUIC listener.
This patch should fix issue #2186. It should be backported as far as 2.6.
Ciphersuites can be used with any TLS/SSL protocol version and are not
specific to TLSv1.3. However, you can only specify TLSv1.3 ciphers in the
ciphersuite format.
CLEANUP: mux-h2: Fix h2s_make_data() comment about the return value
Two return values are specified in the h2s_make_data() function comment.
Both are more or less equivalent but the latter is probably more accurate.
So, keep the right one and remove the other one.
Amaury Denoyelle [Fri, 23 Feb 2024 16:32:14 +0000 (17:32 +0100)]
MINOR: quic: add MUX output for show quic
Extend "show quic" to be able to dump MUX related information. This is
done via the new function qcc_show_quic(). This replaces the old streams
dumping list which was incomplete.
This info is displayed on full output or by specifying the "mux" field.
Amaury Denoyelle [Mon, 26 Feb 2024 08:57:05 +0000 (09:57 +0100)]
MINOR: quic: specify show quic output fields
Add the possibility to customize show quic full output with only a
specific set of printed fields. This is specified as a comma-separated
list. Here are the currently supported values:
* tp: transport parameters
* sock: connection addresses and socket FD
* pktns: packet number space with ack ranges and in flight bytes
* cc: congestion controller and loss information
Note that streams output is not filtered by this mechanism. This is because
it will soon be replaced by an output generated from the MUX, which will
use its own field names.
Amaury Denoyelle [Mon, 26 Feb 2024 09:56:30 +0000 (10:56 +0100)]
MINOR: quic: filter show quic by address
Add the possibility to restrict show quic output to only a single
connection. This is done by specifying a quic_conn address pointer.
Default format selection has evolved with it. Indeed, it seems more
fitting to use the full format by default when filtering on a connection.
However, it's still possible to revert to the original oneline format
by specifying it explicitly.
MEDIUM: htx/http-ana: No longer close connection on early HAProxy response
When a response was returned by HAProxy, a dedicated HTX flag was
set. Thanks to this flag, it was possible to add a "connection: close"
header to the response if the request was not fully received and to close
the connection. In the same way, when a redirect rule was applied,
keep-alive was forcefully disabled for unfinished requests.
All these mechanisms are now useless because the H1 mux is able to drain the
response. So HTX_FL_PROXY_RESP flag is removed and no special processing is
performed on HAProxy response when the request is unfinished.
MAJOR: mux-h1: Drain requests on client side before shut a stream down
Unlike H2 and H3, there is no mechanism in H1 to notify the client that it
must stop uploading data when a response is replied before the end of the
request without closing the connection. There is no equivalent of the
RST_STREAM frame.
Thus, there are only two ways to deal with this situation: closing the
connection or draining the request. Until now, HAProxy didn't support
draining H1 messages. Closing the connection in this case has however a
major drawback: it leads to sending a TCP reset, thereby dropping all
in-flight data, and there is no guarantee the client has fully received
the response.
Draining H1 messages was never implemented because in old versions it was a
bit tricky to implement. However, it is now far simpler to support this
feature because it is possible to have an H1 stream without any applicative
stream. That is the purpose of this patch. Now, when a shutdown is requested
and the stream is detached from the connection, if the request is unfinished
while the response was fully sent, the request is drained.
To do so, in this case the shutdown and the detach are delayed. From the
upper layer point of view, there are no changes. The endpoint is shut down
and detached as usual. But from the H1 mux point of view, the H1 stream is
still alive and is able to drain data. However the stream-endpoint
descriptor is orphaned. Once the request is fully received (and drained),
the connection is shut down if it cannot be reused for a new transaction and
the H1 stream is destroyed.
MINOR: mux-h1: Move all stuff to detach a stream in an internal function
All code from the h1_detach() function was moved into an internal function,
h1s_finish_detach(). It will be used to defer the detach and be able to
drain the request payload.
MINOR: mux-h1: Move checks performed before a shutdown in a dedicated function
Checks performed in h1_shutw() to determine if the connection must be shut
down now or not were moved into a dedicated function. This will be used to
be able to drain the request payload.
BUG/MINOR: mux-h1: Properly report when mux is blocked during a nego
During a zero-copy forwarding negotiation, if the H1 mux is blocked for any
reason, the IOBUF_FL_FF_BLOCKED flag must be set on its iobuf to notify the
producer it must wait. However, there were two places where this was not
done: when the output buffer allocation failed and when the chunk
formatting failed.
This patch fixes the issue. It must be backported to 2.9.
BUG/MEDIUM: mux-h1: Fix again 0-copy forwarding of chunks with an unknown size
There is still an issue with zero-copy forwarding of chunks with an unknown
size. It is possible for a producer to fill the space reserved for the CRLF
at the end of the chunk. The root cause is that this space is not accounted
for in the iobuf offset. So, from the producer point of view, the space may
be used. We can also argue that the current iobuf design is not well suited
for this case. Instead of using a pointer on the consumer's buffer, it could
be easier to use a custom buffer built on top of the consumer one, via a
call to b_make(), with the size, head and data fields reflecting the
available space the producer can use.
By the way, because of this bug, it is possible to trigger a BUG_ON() when
we try to write the CRLF at the end of the chunk because the buffer is
full. It is unexpected. Only the stats applet may hit this bug.
To fix the issue, instead of writing this CRLF when the current chunk is
consumed, it is written before consuming the next one. This way, all space
reserved to create the chunk formatting is always placed before forwarding
data.
This is a followup of the previous commit: GH user @songliumeng initially
reported an issue with the GPL license version for event_hdl source file
which was fixed by the previous commit. It turns out the same mistake was
made in http_ext source file: due to a mixup between LGPL and GPL, GPL
version '2.1' was referenced instead of '2'.
Again, clarify that this is indeed GPL by making use of the banner
provided in doc/gpl.txt
This should be backported in 2.8 with b2bb925 ("MINOR: proxy/http_ext:
introduce proxy forwarded option")
As spotted by user @songliumeng in GH #2463, there was a mixup between
LGPL and GPL in event_hdl source file: GPL version '2.1' was referenced
instead of '2'. Clarify that this is indeed GPL by making use of the
banner provided in doc/gpl.txt.
This should be backported in 2.8 with 68e692d ("MINOR: event_hdl: add
event handler base api")
BUG/MINOR: ssl/cli: duplicate cleaning code in cli_parse_del_crtlist
Since 23cab33 ("BUG/MINOR: ssl: Clear the ckch instance when deleting a
crt-list line"), LIST_DELETE is done twice, one time in
cli_parse_del_crtlist() and another time in ckch_inst_free().
It could trigger a crash with -DDEBUG_LIST.
This isn't a major problem since the ptr is not freed in the meantime so
it will only trigger with the debug.
This patch removes the LIST_DELETE as well as the loop done on link_ref,
which is also done in ckch_inst_free().
Could be backported as far as 2.4. 2.4 version does not have a link_ref
loop.
Contrary to static servers, dynamic servers do not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible for setting a set of minimal values to have a correct behavior.
However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.
Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().
To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.
This patch could have unexpected effects on dynamic servers behavior as
it restored proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().
This should be backported up to 2.6, after a brief period of
observation.
BUG/MAJOR: ssl/ocsp: crash with ocsp when old process exit or using ocsp CLI
This patch reverts 2 fixes that were made in an attempt to fix the
ocsp-update feature used with the 'commit ssl cert' command.
The patches crash the worker when doing a soft-stop when the 'set ssl
ocsp-response' command was used, or during runtime if the ocsp-update
was used.
This was reported in issue #2462 and #2442.
The last patch reverted is the associated reg-test.
Revert "BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing"
This reverts commit 5e66bf26ecbf6439fafc8ef8857abe22e0874f4d.
BUG/MEDIUM: applet: Fix HTX .rcv_buf callback function to release outbuf buffer
In appctx_htx_rcv_buf(), HTX blocks found in the appctx output buffer are
copied into the channel buffer. At the end, the state of the underlying
buffer must be updated. If everything was copied, the buffer is reset. This
way, it will be released later, at the end of the applet process function.
However, there was a typo here: we did it on the input buffer instead of
the output buffer. As a side effect, an empty HTX message remained stuck in
the appctx output buffer, blocking the applet and leading to a blocked
session with no expiration date.
Willy Tarreau [Fri, 23 Feb 2024 19:01:45 +0000 (20:01 +0100)]
[RELEASE] Released version 3.0-dev4
Released version 3.0-dev4 with the following main changes :
- BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing
- BUG/MEDIUM: quic: Wrong K CUBIC calculation.
- MINOR: quic: Update K CUBIC calculation (RFC 9438)
- MINOR: quic: Dynamic packet reordering threshold
- MINOR: quic: Add a counter for reordered packets
- BUG/MAJOR: mux-h1: Fix zero-copy forwarding when sending chunks of unknown size
- MINOR: stats: Use a dedicated function to check if output is almost full
- BUG/MEDIUM: applet: Add a flag to state an applet is using zero-copy forwarding
- BUG/MEDIUM: stconn/applet: Block 0-copy forwarding if producer needs more room
- MINOR: applet: Remove uselelss test on SE_FL_SHR/SHW flags
- MEDIUM: applet: Add notion of shutdown for write for applets
- MINOR: cli: No longer check SC for shutdown to interrupt wait command
- BUG/MEDIUM: stconn: Allow expiration update when READ/WRITE event is pending
- BUG/MEDIUM: stconn: Don't check pending shutdown to wake an applet up
- CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
- MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
- MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
- MINOR: muxes: Announce support for zero-copy forwarding on consumer side
- BUG/MAJOR: stconn: Check support for zero-copy forwarding on both sides
- MINOR: muxes/applet: Simplify checks on options to disable zero-copy forwarding
- BUG/MINOR: quic: reject unknown frame type
- MINOR: quic: handle all frame types on reception
- BUG/MINOR: quic: reject HANDSHAKE_DONE as server
- BUG/MINOR: qpack: reject invalid increment count decoding
- BUG/MINOR: qpack: reject invalid dynamic table capacity
- DOC/MINOR: userlists: mention solutions to high cpu with hashes
- DOC: quic: Missing tuning setting in "Global parameters"
- BUG/MEDIUM: applet: Immediately free appctx on early error
- BUG/MEDIUM: hlua: Be able to garbage collect uninitialized lua sockets
- BUG/MEDIUM: hlua: Don't loop if a lua socket does not consume received data
- BUG/MEDIUM: quic: fix transient send error with listener socket
- MINOR: log: custom name for logformat node
- MINOR: sample: add type_to_smp() helper function
- MINOR: log: explicit typecasting for logformat nodes
- MINOR: log: simplify last_isspace in sess_build_logline()
- MINOR: log: simplify quotes handling in sess_build_logline()
- MINOR: log: print metadata prefixes separately in sess_build_logline()
- MINOR: log: automate string array construction in sess_build_logline()
- DOC: quic: fix recommandation for bind on multiple address
- MINOR: quic: warn on bind on multiple addresses if no IP_PKTINFO support
- OPTIM: quic: improve slightly qc_snd_buf() internal
- MINOR: quic: move IP_PKTINFO on send on a dedicated function
- MINOR: quic: remove sendto() usage variant
- MINOR: quic: only use sendmsg() syscall variant
- BUILD: applet: fix build on some 32-bit archs
- BUG/MINOR: quic: initialize msg_flags before sendmsg
- BUG/MEDIUM: mux-h1: Don't emit 0-CRLF chunk in h1_done_ff() when iobuf is empty
- CLEANUP: proxy/log: remove unused proxy flag
- CLEANUP: log: fix process_send_log() indentation
- CLEANUP: log: use free_logformat_list() in parse_logformat_string()
- MINOR: log: add free_logformat_node() helper function
- BUG/MINOR: log: fix potential lf->name memory leak
- BUG/MINOR: ist: allocate nul byte on istdup
- BUG/MINOR: stats: drop srv refcount on early release
- BUG/MAJOR: promex: fix crash on deleted server
- BUG/MAJOR: server: fix stream crash due to deleted server
- BUG/MEDIUM: mux-quic: do not crash on qcs_destroy for connection error
- MINOR: cli: Remove useless loop on commands to find unescaped semi-colon
- BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n
- BUG/MAJOR: cli: Restore non-interactive mode behavior with pipelined commands
- BUG/MINOR: quic: fix output of show quic
- MINOR: ssl: Call callback function after loading SSL CRL data
- BUG/MINOR: ist: only store NUL byte on succeeded alloc
Willy Tarreau [Fri, 23 Feb 2024 18:51:54 +0000 (19:51 +0100)]
BUG/MINOR: ist: only store NUL byte on succeeded alloc
The trailing NUL added at the end of istdup() by recent commit de0216758
("BUG/MINOR: ist: allocate nul byte on istdup") was placed outside of
the pointer validity test, rightfully showing null deref warnings. This
fix should be backported along with the fix above, to the same versions.
Miroslav Zagorac [Fri, 23 Feb 2024 02:24:29 +0000 (03:24 +0100)]
MINOR: ssl: Call callback function after loading SSL CRL data
Due to the possibility of calling a control process after adding CRLs, the
ssl_commit_crlfile_cb variable was added. It is actually a pointer to the
callback function, which is called if defined after initial loading of CRL
data from disk and after committing CRL data via CLI command
'commit ssl crl-file ..'.
If the callback function returns an error, then the CLI commit operation
is terminated.
Also, one case was added to the CLI context used by "commit cafile" and
"commit crlfile": CACRL_ST_CRLCB in which the callback function is called.
Signed-off-by: William Lallemand <wlallemand@haproxy.com>
Amaury Denoyelle [Fri, 23 Feb 2024 16:28:49 +0000 (17:28 +0100)]
BUG/MINOR: quic: fix output of show quic
Output of 'show quic' has been messed up since the introduction of the
reordered packets counter in the following commit. The new counter is mixed
up with the first stream line. This is due to the wrong placement of the
newline delimiter.
BUG/MAJOR: cli: Restore non-interactive mode behavior with pipelined commands
The issue was described in commit "BUG/MEDIUM: cli: Warn if pipelined
commands are delimited by a \n". In non-interactive mode, it was possible to
use a newline character as delimiter for pipelined commands. As a
consequence, it was possible to stop command processing in the middle.
With the above commit, a warning is emitted to notify users. With this one,
we restore the expected behavior, as documented in the management guide.
Only the first line of commands is parsed. This commit will not be
backported to avoid breaking changes on stable versions.
This commit has of course some visible effects. All scripts using a newline
character as delimiter to pipeline commands in non-interactive mode will
stop working. Only the first command will be evaluated, all others will be
ignored. Pipelined commands MUST now be separated by a semi-colon.
But there is a more subtle and probably more annoying change: it is no
longer possible to pipeline commands with a payload! A command with a
payload will always be the last one evaluated because it must be finished
by a newline (possibly preceded by a custom pattern).
It is really annoying to introduce such a breaking change. But, in the long
term, it is mandatory. 2.8 will be the last LTS version supporting the old
behavior (with some warning however). This gives users 4 years to adapt
their scripts.
BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n
This was broken since commit 0011c25144 ("BUG/MINOR: cli: avoid O(bufsize)
parsing cost on pipelined commands"). It is not really a bug fix but it is
labelled as is to make it more visible.
Before, a full line was first retrieved from the request buffer before
extracting the first command to evaluate it. Now, only one command is
retrieved. But we rely on the request buffer state to interrupt processing
in non-interactive mode. After processing a command, if the output side of
the request buffer is empty, we leave. Before the above commit, this was not
a problem. But since then, it is obviously a bad assumption. First because
some input data may still be there. It is not true today, but it might
change. Then, there is no guarantee that all commands are received at the
same time. For a small list of commands, it will be the case most of the
time, but it is a dangerous assumption. For a long list of commands, it is
almost always false.
For this to be an issue, the input must be chunked exactly between two
commands, in which case the remaining commands are skipped. A good way to
reproduce the issue is to wait a bit between two commands.
In fact, to properly fix the issue, we should exit on the first command
finished by a newline. Indeed, as stated in the documentation, in
non-interactive mode, a single line is processed. To pipeline commands,
commands must be separated by a semi-colon. Unfortunately, the above commit
introduced another change. It is possible to pipeline commands delimited by
a newline. It was pushed 2 years ago and backported to all stable versions.
Several scripts may rely on this behavior.
So, on stable version, the bug will not be fixed. However a warning will be
emitted to notify users their scripts don't respect the documentation and
they must adapt it. Mainly because the cli behavior on this point will be
changed in 3.0 to stick to the doc. This warning will only be emitted once
over the whole worker process life. The idea is to not flood the logs with
the same warning for every offending command.
This commit should probably be backported to all stable versions. But with
some cautions because the CLI was often modified.
MINOR: cli: Remove useless loop on commands to find unescaped semi-colon
This loop was added to detect pipelined commands when only co_getline() was
used to get commands. Now, co_getdelim() is used and the semi-colon is also
considered as a command delimiter.
As a side effect, the last semi-colon, if any, is no longer replaced by a
newline. Thus, we must take care to adapt the test to detect partial
commands.
Amaury Denoyelle [Fri, 23 Feb 2024 10:41:33 +0000 (11:41 +0100)]
BUG/MEDIUM: mux-quic: do not crash on qcs_destroy for connection error
On qcs_destroy(), a BUG_ON() statement checks that the QCS does not have
any prepared data anymore. This is to ensure connection flow control is
always coherent and to prevent transfer freezes.
However, this BUG_ON() may cause a spurious crash in case the QCC is
considered on error. Indeed, in this case, all transfers are interrupted
and qmux_strm_detach() will proceed to an immediate QCS free before
releasing the connection. In this situation, connection flow control is
irrelevant so the BUG_ON() should be ignored.
This crash occurs since the MUX refactoring via the following patch.
Previously, a similar BUG_ON() was used but it was incorrectly
implemented, rendering it ineffective even for the targeted cause.
Amaury Denoyelle [Wed, 21 Feb 2024 14:54:11 +0000 (15:54 +0100)]
BUG/MAJOR: server: fix stream crash due to deleted server
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced anymore by a stream or a
connection. This is implemented in srv_check_for_deletion().
The various criteria specified were incomplete. This allowed a server
instance to be deleted while still being referenced by a stream and a
connection.
This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.
==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
#0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
#1 0x63cb2568f465 in process_stream src/stream.c:1823
#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
#6 0x701539aa9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
#7 0x701539b26a3b (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().
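A hedged sketch of the added precondition (the surrounding message/goto
error-reporting style is assumed for illustration):
|/* in srv_check_for_deletion(): refuse deletion while connections
| * are still attached to the server
| */
|if (srv->curr_used_conns) {
|        msg = "Server still has connections attached to it.";
|        goto leave;
|}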
Outside of this bug, one case which remains sensitive is that of SF_DIRECT
streams which reference a server instance early in process_stream()
before connect_server(). This occurs with the use-server directive,
force-persist rules or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in the sess_change_server() documentation to ensure this
remains valid in the future.
Amaury Denoyelle [Thu, 22 Feb 2024 13:16:37 +0000 (14:16 +0100)]
BUG/MAJOR: promex: fix crash on deleted server
Promex applet is used to dump many metrics. Some of them are related to
a server instance. The applet can be interrupted in the middle of a
dump, for example waiting for output buffer space. In this case, its
context is save to resume dump on the correct instance.
A crash can occur if dump is interrupted during servers loop. If the
server instance is deleted during two scheduling of the promex applet,
its context will still referenced the deleted server on resume.
To fix this, use server refcount to prevent its deletion during parsing.
No backport is needed, despite all stable releases being affected. This
is because promex applet context has been recently rewritten to use
generic pointers. As such, a specific commit will be applied for earlier
releases.
Amaury Denoyelle [Thu, 22 Feb 2024 13:13:45 +0000 (14:13 +0100)]
BUG/MINOR: stats: drop srv refcount on early release
Server refcount is used to protect from server deletion while dumping a
server instance, for stats dump on both the CLI and HTTP applets. However,
the dump can be aborted prematurely before reaching the end. In this case,
the server refcount is never decremented.
This bug can cause an inconsistency on server refcounts, preventing servers
from being deleted even after "del server" succeeds.
To fix this, implement release handler for both stats CLI and HTTP
applet. Drop server reference if dump was interrupted during servers
loop.
Amaury Denoyelle [Wed, 21 Feb 2024 15:10:43 +0000 (16:10 +0100)]
BUG/MINOR: ist: allocate nul byte on istdup
istdup() is documented as having the same behavior as strdup(). However,
it may cause confusion as it allocates a block of the input length, without
an extra byte for the \0 delimiter. This behavior is incoherent since, in
the case of an empty string, a single \0 is allocated anyway.
This API inconsistency could cause a bug anywhere an IST is used as a
C-string after istdup() invocation. Currently, the only found issue is
with 'wait' CLI command using 'srv-unused'. This causes a buffer
overflow due to ist0() invocation after istdup() for be_name and
sv_name.
Backport should be done to all stable releases. Even if no bug has been
found outside of the wait CLI implementation, it ensures the code is more
consistent on every release.
Recent commit 2ed6068 ("MINOR: log: custom name for logformat node")
introduced a potential memory leak because when custom name is provided,
lf->name value is allocated using strdup(), thus is expected to be freed
alongside the node when the node is released.
However lf->name was only freed in some common places within log.c
cleanups and helper functions, but in reality there are still cases where
lf nodes are manually freed without making use of the freeing helpers.
So this is what this patch does, it makes sure all lf freeing places now
leverage the free_logformat_node() helper function that takes care of
freeing all known allocated elements within the node, including custom
name.
This commit depends on:
- "MINOR: log: add free_logformat_node() helper function"
No backport needed unless 2ed6068 gets backported.
CLEANUP: log: use free_logformat_list() in parse_logformat_string()
This is a follow up for 24a5e42db6 ("CLEANUP: log: deinitialization of
the log buffer in one function") as there was another opportunity to
make use of the new cleanup function.
Since 3d6350e10 ("MINOR: log: Remove log-error-via-logformat option"),
PR_O_ERR_LOGFMT flag is not used anymore, but it was left in the proxy-t.h
header file. Simply removing it and adding a comment to indicate that the
corresponding bit is now unused.
BUG/MEDIUM: mux-h1: Don't emit 0-CRLF chunk in h1_done_ff() when iobuf is empty
A chunk message transferred via zero-copy forwarding in H1 may be
corrupted. This only happens when the chunk size is not known during the
nego stage and when there is nothing to forward when h1_donn_ff() is
called. In this case, we always emit a chunk. Because there is nothing to
forward, a 0-CRLF is emitted in the middle of the message.
The issue occurred with the HTTP stats applet only.
A simple fix is to check the size of data in the iobuf before emitting a new
chunk in h1_done_ff(). However, we still try to send outgoing data because
when this happens, it is most of time because the H1 output buffer is almost
full.
This patch should fix the issue #2453. No backport needed.
Amaury Denoyelle [Wed, 21 Feb 2024 09:05:14 +0000 (10:05 +0100)]
BUG/MINOR: quic: initialize msg_flags before sendmsg
Previously, the msghdr struct used for sendmsg was memset to 0. This was
updated for performance reasons, with each member individually defined.
This is done by the following commit:
msg_flags is the only member left unset, as the sendmsg manual page reports
that it is unused. However, this caused a coverity report. In the end, it is
better to explicitly set it to 0 to avoid any future interrogations,
compiler warnings or even portability issues.
This should fix coverity report from github issue #2455.
Willy Tarreau [Wed, 21 Feb 2024 03:16:16 +0000 (04:16 +0100)]
BUILD: applet: fix build on some 32-bit archs
The to_forward field was added to the debugging output of applets with
commit 62a81cb6a ("MINOR: applet: Add callback function to deal with
zero-copy forwarding"), though it's a size_t printed as %lu, which causes
complaints on 32-bit archs. Let's just cast it to an unsigned long.
Amaury Denoyelle [Tue, 20 Feb 2024 09:44:48 +0000 (10:44 +0100)]
MINOR: quic: only use sendmsg() syscall variant
This patch is the direct followup of the previous one :
MINOR: quic: remove sendto() usage variant
This finalizes qc_snd_buf() simplification by removing send() syscall
usage for quic-conn owned socket. Syscall invocation is merged in a
single code location to the sendmsg() variant.
The only difference for owned socket is that destination address for
sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as
valid for connected sockets. This allows maximum performance by avoiding
unnecessary lookups on kernel socket address tables.
As with the previous patch, no functional change should happen here. However,
it will be simpler to extend qc_snd_buf() for GSO usage.
Amaury Denoyelle [Mon, 19 Feb 2024 14:29:48 +0000 (15:29 +0100)]
MINOR: quic: remove sendto() usage variant
qc_snd_buf() is a wrapper around emission syscalls. Depending on the QUIC
configuration, a different variant is used. When using a connection
socket, send() is the only one used. For listener sockets, sendmsg() and
sendto() are possible. The first one is used only if the local address has
been retrieved beforehand. This allows fixing it on sending to guarantee
the source address selection. Finally, sendto() is used for systems which
do not support local address retrieval.
All of these variants render the code too complex. As such, this patch
simplifies this by removing sendto() alternative. Now, sendmsg() is
always used for listener sockets. Source address is then specified only
if supported by the system.
This patch should not exhibit functional behavior changes. It will be
useful when implementing GSO as the code is now simpler.
Amaury Denoyelle [Mon, 19 Feb 2024 14:21:13 +0000 (15:21 +0100)]
MINOR: quic: move IP_PKTINFO on send on a dedicated function
When using a listener socket, the source address for emission is explicitly
set using ancillary data for sendmsg(). This is useful to guarantee the
correct address is used when binding on a non-explicit address.
This code was implemented directly under qc_snd_buf(). However, it is
quite complex due to portability issues. For IPv4, two parallel
implementations coexist, defined under IP_PKTINFO or IP_RECVDSTADDR. For
IPv6, another option is defined under IPV6_RECVPKTINFO. Each variant
uses its own distinct name, which increases the code complexity.
Extract ancillary data filling into a dedicated function named
cmsg_set_saddr(). This greatly reduces the body of qc_snd_buf(). Such
functions can be replicated when other ancillary data types are
implemented. This will notably be useful for the GSO implementation.
qc_snd_buf() is a wrapper for sendmsg() syscall (or its derivatives)
used for all QUIC emissions. This patch aims at removing several
non-optimal code sections :
* fd_send_ready() for connected sockets is only checked in the function
preamble instead of inside the emission loop
* zero-ing msghdr structure for unconnected sockets is removed. This is
unnecessary as all fields are properly initialized then.
* extra memcpy/memset invocations when using IP_PKTINFO/IPV6_RECVPKTINFO
are removed by setting directly the address value into cmsg buffer
Amaury Denoyelle [Fri, 16 Feb 2024 14:40:06 +0000 (15:40 +0100)]
MINOR: quic: warn on bind on multiple addresses if no IP_PKTINFO support
Binding on multiple addresses for QUIC is safe only if IP_PKTINFO or
equivalent is available. Else, the behavior may be undefined as the
system is responsible for choosing the network interface and source
address for the response.
This commit adds a warning on boot if no or partial support for
IP_PKTINFO or equivalent is detected and configuration contains UDP
binding on multiple addresses.
This should be backported up to 2.6. Special backport recommendations:
* change ha_warning() to ha_diag_warning() to ensure no spurious
warnings will be triggered on stable releases
* IP_PKTINFO usage was introduced in 2.7. For 2.6, multiple addresses
QUIC binding is always unreliable. As such, the preprocessor condition
must simply be removed so that the warning is always active regardless
of the system. The warning message should also be truncated to suppress
the IP_PKTINFO reference.
Amaury Denoyelle [Thu, 15 Feb 2024 17:43:44 +0000 (18:43 +0100)]
DOC: quic: fix recommandation for bind on multiple address
Documentation falsely mentions that binding on multiple addresses is
forbidden for QUIC listeners. This is not the case. Moreover, this
behavior is reliable when using destination address retrieval on receive
via IP_PKTINFO, which allows determining the proper source address for
the response.
This should be backported up to 2.7. On 2.6, specific source address
definition on sendmsg via IP_PKTINFO is not implemented. As such, binding
on multiple addresses should remain forbidden for this release.
MINOR: log: print metadata prefixes separately in sess_build_logline()
Some log variables may be prefixed with specific chars that represent
extra information that is relevant to the value but is not directly part
of the "raw" value.
i.e.: the '+' char is prepended before some values when "option logasap"
is used, to indicate that the value has not yet reached its final value.
However, as those "metadata" are printed using the general purpose
LOGCHAR() printing helper, it's not easy to tell if they are part of the
base value or not.
In this patch we add the LOGMETACHAR() helper that is a wrapper for
LOGCHAR(). The goal is to prepare for adding some logic to prevent such
additional infos from being generated when not relevant or needed.
MINOR: log: simplify quotes handling in sess_build_logline()
Quote building for some log formats is directly performed under each
switch case statement, so it would become painful to add other conditions
to prevent the quotes from being generated when it's not supported by the
data encoding format for instance (i.e.: JSON).
Let's centralize and simplify quotes handling by adding LOGQUOTE_START()
and LOGQUOTE_END() helper macros. If a quotation is started and not
explicitly ended, it will be automatically ended at the end of the current
logformat node:
LOGQUOTE_START() sets 'quote' variable to 1, this way LOGQUOTE_END() only
prints the ending quote when needed. LOGQUOTE_END() is systematically
called after each node switch-case (after each value). LOGQUOTE_START()
does nothing if LOG_OPT_QUOTE isn't set, so does LOGQUOTE_END().
Some rare cases such as %hsl (list of captured headers) required special
handling: in this case multiple quoted texts are generated for the same
field value so explicit LOGQUOTE_START() + LOGQUOTE_END() combination was
needed.
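A hedged sketch of the two helpers (the exact macro bodies and the
tmp->options access are assumptions based on the description above):
|#define LOGQUOTE_START() do {                           \
|        if (tmp->options & LOG_OPT_QUOTE) {             \
|                LOGCHAR('"');                           \
|                quote = 1;                              \
|        }                                               \
|} while (0)
|
|#define LOGQUOTE_END() do {                             \
|        if (quote) {                                    \
|                LOGCHAR('"');                           \
|                quote = 0;                              \
|        }                                               \
|} while (0)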
MINOR: log: simplify last_isspace in sess_build_logline()
last_isspace variable is explicitly set to 0 in all cases except
LOG_FMT_SEPARATOR case. So we can actually simplify the code by setting
last_isspace to 0 by default and skipping the assignment for the
LOG_FMT_SEPARATOR case.
MINOR: log: explicit typecasting for logformat nodes
Add the ability to manually specify desired output type after a custom
field name for logformat nodes. Forcing the type can be useful to ensure
value is stored with the proper type representation. (i.e.: forcing
numerical to string to work around the limited resolution of JS number
types)
By default, type is set to SMP_T_SAME, which means the original type will
be preserved.
type_to_smp(type) does the reverse operation of smp_to_type[smp]: it takes
a type name as input string and tries to return the corresponding SMP_T_*
smp type or SMP_TYPES if not found.
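A minimal sketch of such a reverse lookup (assuming smp_to_type[] is the
existing array of type names indexed by the SMP_T_* values):
|int type_to_smp(const char *type)
|{
|        int it;
|
|        for (it = 0; it < SMP_TYPES; it++) {
|                if (smp_to_type[it] && strcmp(smp_to_type[it], type) == 0)
|                        return it; /* matching SMP_T_* value */
|        }
|        return SMP_TYPES; /* not found */
|}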
Amaury Denoyelle [Mon, 19 Feb 2024 16:27:07 +0000 (17:27 +0100)]
BUG/MEDIUM: quic: fix transient send error with listener socket
Transient send errors are handled differently depending on whether a
connection or a listener socket is used for QUIC transfers. In the first
case, proper poller subscription is used via fd_cant_send()/fd_want_send().
For the listener socket case, the error is ignored by the qc_snd_buf()
caller and the retransmission mechanism will allow the data to be
re-emitted.
For the listener socket, the transient error code handling is buggy. It
blindly uses fd_cant_send() with the <qc.fd> member, which is set to -1 for
listener socket usage. This results in an invalid fdtab access, with a
possible crash or a modification of a totally unrelated FD.
This bug is simply fixed by using qc_test_fd() before using
fd_cant_send()/fd_want_send(). This ensures <qc.fd> is used only if
initialized, which is only the case when using a connection socket.
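A minimal sketch of the guarded call (illustrative only):
|/* only touch the fdtab when the quic-conn really owns a socket */
|if (qc_test_fd(qc))
|        fd_cant_send(qc->fd); /* transient send error: mark the FD as blocked */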
No crash was reported yet for this bug. However, it is reproducible by
using ASAN compilation and an strace-based sendmsg() errno injection.
BUG/MEDIUM: hlua: Don't loop if a lua socket does not consume received data
If some data are received for a lua socket while the lua script responsible
for consuming these data is not ready to do so, for instance because it is
sleeping, the applet is woken up in a loop because it never states that it
will not consume these data yet.
To fix the issue, in the applet I/O handler, when there are outgoing data,
we always pretend the applet will not consume them. It is the responsibility
of the lua script to reactivate receives by calling the Socket.receive()
function.
This patch must be backported to every stable version. For 2.4 and older,
si_want_get()/si_cant_get() must be used instead of
applet_will_consume()/applet_wont_consume().
BUG/MEDIUM: hlua: Be able to garbage collect uninitialized lua sockets
It is possible to create a lua socket without performing any connect. In
this case, the lua socket is released by the garbage collector.
However, the garbage collector does not release the applet, it wakes it
up. Since commit 751b59c40b ("BUG/MEDIUM: hlua: Initialize appctx used by a
lua socket on connect only"), the applet initialization is performed on
connect. So, here, it is possible to wake an uninitialized applet. It is an
unexpected case for the applet's I/O handler, leading to a segfault because
some resources are not initialized (the stream's target in this case).
So, now, in the lua socket GC function, we take care to immediately release
uninitialized applets. At worst, the release itself is delayed. But it is
safe because we are sure the applet's I/O handler will never be executed.
In addition, we take care to increment the GC counter when the lua socket is
created. This way, uninitialized lua sockets are released more quickly.
This patch should fix the issue #2451. It must be backported as far as 2.6.
BUG/MEDIUM: applet: Immediately free appctx on early error
When an error is triggered during the applet initialization, a dedicated
function is called to release it. Indeed, in this case, because the applet
was not initialized, the ->release callback must not be called. However,
because the init stage may be delayed to be performed during the first
applet wakeup, we must also take care to not rely on the default
appctx_free() function, to immediately release the applet. Otherwise, if the
error happens in a delayed init stage, the applet is never released.
This patch partially fixes issue #2451. It must be backported as far as
2.6.
Nicolas CARPi [Mon, 12 Feb 2024 17:03:52 +0000 (18:03 +0100)]
DOC/MINOR: userlists: mention solutions to high cpu with hashes
This change adds a paragraph to the documentation regarding "userlists"
and the use of hashed password value.
It indicates what a user can do to address the high CPU cost of
having to calculate the hash at each request, such as reducing the
number of rounds or the cost complexity, if the algorithm allows for it.
I believe it is necessary to mention how the musl C library
impacts the performance of hashing functions, as this has already led to a
few issues.
Currently haproxy does not implement dynamic table support for QPACK. As
such, the dynamic table capacity advertised via H3 SETTINGS is 0. When
receiving a non-null Set Dynamic Table Capacity instruction, immediately
close the connection using QPACK_ENCODER_STREAM_ERROR.
Prior to this patch, such instructions were simply ignored. This does not
conform to the QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
Close the connection using QPACK_DECODER_STREAM_ERROR when receiving an
invalid insert count increment. As haproxy does not use the dynamic table,
this instruction must never be emitted by the peer.
Prior to this patch, haproxy silently ignored such instructions, which does
not conform to the QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
Amaury Denoyelle [Wed, 14 Feb 2024 17:13:08 +0000 (18:13 +0100)]
BUG/MINOR: quic: reject HANDSHAKE_DONE as server
As specified in RFC 9000, a client must never emit a HANDSHAKE_DONE
frame. If this happens, the server must close the connection with error
PROTOCOL VIOLATION.
Previously, such a frame was silently discarded on the server side. The
connection remained open, which does not conform to the specification.
Amaury Denoyelle [Thu, 15 Feb 2024 13:42:54 +0000 (14:42 +0100)]
MINOR: quic: handle all frame types on reception
Ensure every frame type is handled in qc_parse_pkt_frms. Add an
ABORT_NOW on the default case. This is safe as an unknown frame must have
been rejected earlier via qc_parse_frm().
MINOR: muxes/applet: Simplify checks on options to disable zero-copy forwarding
Global options to disable zero-copy forwarding are now tested outside the
callbacks responsible for performing the forwarding itself. It is cleaner
this way because we don't try zero-copy forwarding at all if at least one
side does not support it. It is equivalent to what was performed before,
but it is simpler this way.
BUG/MAJOR: stconn: Check support for zero-copy forwarding on both sides
There is a nego stage when a producer is ready to forward data to the other
side. At this stage, the zero-copy forwarding may be disabled if the
consumer does not support it. However, there is a flaw with this way of
proceeding. If the channel buffer is not empty, we delay the zero-copy
forwarding to flush all data from the channel first. During this delay,
receives on the endpoint (at the connection level for muxes) are blocked to
be sure to have the opportunity to switch to zero-copy forwarding. It is a
problem if the consumer cannot flush data from the channel's buffer, waiting
for more data for instance.
It is especially annoying with the CLI applet, because this scenario can
happen if a command is partially received, for instance without the LF at
the end. In this case, the CLI applet is blocked because it waits for more
data. The frontend connection is also blocked because the channel's data
must be flushed before trying to receive more data. Worse, this happens at
a point where no timeout is armed. Thus the session is stuck indefinitely,
client aborts cannot be detected because receives are blocked, and the
applet cannot abort on its side because there are pending outgoing data. It
is clearly a situation where it is easy to consume all CLI slots.
To fix the issue, thanks to previous commits, we now check zero-copy
forwarding support on both sides before proceeding.
This patch relies on the following commits:
* MINOR: muxes: Announce support for zero-copy forwarding on consumer side
* MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
* MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
* CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
MINOR: muxes: Announce support for zero-copy forwarding on consumer side
It is unused for now, but the muxes announce their support of zero-copy
forwarding on the consumer side. All muxes, except the fcgi one, support
it.
MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
The SE_FL_MAY_FASTFWD_CONS is added and it will be used by endpoints to
announce their support for the zero-copy forwarding on the consumer
side. The flag is not necessarily permanent. However, it will be used this
way for now.
MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
To fix a bug, a flag to announce the capability to support zero-copy
forwarding on the consumer side will be added on the SE descriptor. So the
old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer
side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare the addition of the
new flag, the bitfield is slightly reordered.
CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
To fix a bug, some SE flags must be added or renamed. To avoid mixing flags
set by the endpoint and flags set by the app, the second set of flags is
moved to the end of the bitfield, leaving the holes in the middle.
BUG/MEDIUM: stconn: Don't check pending shutdown to wake an applet up
This reverts commits 0b93ff8c87 ("BUG/MEDIUM: stconn: Wake applets on
sending path if there is a pending shutdown") and 9e394d34e0 ("BUG/MINOR:
stconn: Don't report blocked sends during connection establishment") because
they were not the right fixes.
We must not wake an applet up when a shutdown is pending because it means
some output data are still blocked in the channel buffer. The applet does
not necessarily consume these data. In this case, the applet may be woken up
indefinitely, unless it explicitly reports it won't consume the data yet.
This patch must be backported as far as 2.8. For older versions, as far as
2.2, it may be backported. If so, a previous fix must be pushed to prevent
an HTTP applet from being stuck. In http_ana.c, in http_end_request() and
http_end_response(), the call to channel_htx_truncate() on the request
channel in case of MSG_ERROR must be replaced by a call to
channel_htx_erase().
BUG/MEDIUM: stconn: Allow expiration update when READ/WRITE event is pending
When a READ or a WRITE activity is reported on a channel, the corresponding
date is updated: the last-read-activity date (lra) is updated and the
first-send-block date (fsb) is reset. The event is also reported at the
channel level by setting the CF_READ_EVENT or CF_WRITE_EVENT flags. When one
of these flags is set, it prevents the update of the stream's task
expiration date from sc_notify(). It also prevents the corresponding timeout
from being reported from process_stream().
But this is a problem during the fast-forwarding stage if no expiration date
was set by the stream. Only process_stream() resets these flags. So a first
READ or WRITE event will prevent any stream expiration date update until a
new call to process_stream(). But with no expiration date, this will only
happen on a shutdown/abort event, blocking the stream for a while.
It is for instance possible to block the stats applet or the cli applet if a
client does not consume the response. The stream may be blocked, the client
timeout is not respected and the stream can only be closed on a client
abort.
So now, we update the stream's expiration date, regardless of reported
READ/WRITE events. It is not a big deal because the lra and fsb dates are
properly updated. It also means an old READ/WRITE event will no longer
prevent the stream from reporting a timeout, and that is expected too.
This patch must be backported as far as 2.8. On older versions, timeouts and
stream's expiration date are not updated in the same way and this works as
expected.
MINOR: cli: No longer check SC for shutdown to interrupt wait command
Thanks to the previous patch ("MEDIUM: applet: Add notion of shutdown for
write for applets"), it is no longer necessary to check SC flags to detect
shutdowns to interrupt the wait command. It is possible to remove this ugly
workaround. In addition, we only test the SE for shutdown because end of
stream and error are already checked by the CLI I/O handler. And it is no
longer necessary to remove output data from the channel's buffer because
shutdowns are not reported if there are remaining outgoing data.
Of course, if the "wait" command is backported, the commit above and this
one must be backported too.
MEDIUM: applet: Add notion of shutdown for write for applets
In fact there are already flags on the SE to state a shutdown for reads or
writes was performed. But for applets, this notion does not exist. Both
flags are set at the same time when the applet is released. But at the SC
level, there are functions to perform a shutdown (formerly the shutw) and an
abort (formerly the shutr). For applets, when a shutdown is performed on the
SC, if the applet is not immediately released, nothing is acknowledged at
the SE level.
With the old way to implement applets, this was not a real issue until
recently because applets accessed the channel/SC flags. It was thus possible
to catch the shutdowns. But the "wait" command on the CLI reveals the
flaw. Indeed, when this command is executed, nothing is read or sent. So, it
is not possible to detect the shutdowns. As a workaround, a dedicated test
on the SC flags was added at the end of the wait command I/O handler. But it
is pretty ugly.
With the new way to implement applets, there is no longer access to the
channel or SC. So we must add a way to acknowledge shutdowns at the SE
level.
This patch solves both sides of the issue. The shutw notion is added for
applets. Its only purpose is to set the SE_FL_SHWN flag. This flag is tested
by all applets, so it solves the issue quite simply.
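In practice, an applet's I/O handler can now detect it along these lines (a
minimal sketch assuming the usual se_fl_test() accessor; the reaction
obviously depends on the applet):
|    if (se_fl_test(appctx->sedesc, SE_FL_SHWN)) {
|        /* the SC performed a shutdown for writes: stop producing data */
|        return;
|    }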
Note that it is described as a bug fix but there is no real issue, just a
design flaw. However, if the "wait" command is backported, this patch must
be backported too. Unfortunately it will require an adaptation because there
are no appctx flags on older versions.
MINOR: applet: Remove useless test on SE_FL_SHR/SHW flags
Both flags are set after releasing the applet, in appctx_shut(). Concretely,
it means the applet is shut down for reads and writes. Once they are set,
the applet's I/O handler is no longer called. Tests on these flags are thus
useless: there is no chance to match them.
BUG/MEDIUM: stconn/applet: Block 0-copy forwarding if producer needs more room
This case does not exist yet with the H1 multiplexer, but applets may decide to
not produce data if there is not enough room in the destination buffer (the
applet's outbuf or the opposite SE buffer). It is true for the stats applets for
instance. However this case is not properly handled when the zero-copy
forwarding is in-use.
To fix the issue, the se_done_ff() function was modified to return the
number of bytes really forwarded and to subscribe for sends if nothing was
forwarded while the zero-copy forwarding was blocked by the producer. On the
applet side, we take care to block the zero-copy forwarding if the applet
requests more room. At the end, zero-copy forwarding is unblocked if
something was forwarded.
This way, it is now possible for the stats applet to report a full buffer and
block the zero-copy forwarding, even if the buffer is not really full, by
requesting more room.
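The intent on the applet side can be sketched as follows (all names except
se_done_ff() are illustrative, not the actual API):
|    size_t fwd = se_done_ff(appctx->sedesc);  /* now returns bytes forwarded */
|    if (!fwd && applet_requested_more_room) {
|        /* the producer could not progress: block zero-copy forwarding and
|         * fall back to regular sends until some room is freed */
|        block_zero_copy_forwarding(appctx);   /* illustrative helper */
|    }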
BUG/MEDIUM: applet: Add a flag to state an applet is using zero-copy forwarding
An issue was introduced when zero-copy forwarding was added to the stats and
cache applets. There is no test to be sure the upper layer is ready to use
the zero-copy forwarding. So these applets refuse to deliver the response
into the applet's output buffer if the zero-copy forwarding is supported by
the opposite endpoint. It is especially an issue when a filter, like the
compression, is in-use on the response channel.
Because of this bug, the response is not delivered and the applet is woken
up in a loop to produce data.
To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when
the zero-copy forwarding is in-use. We rely on this flag to not fill the
outbuf in the applet's I/O handler.
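The guard in the applet's I/O handler roughly looks like this
(APPCTX_FL_FASTFWD comes from this patch; the way the flag is tested is
assumed):
|    if (appctx->flags & APPCTX_FL_FASTFWD)
|        return;   /* zero-copy forwarding is in use: do not fill outbuf */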
MINOR: stats: Use a dedicated function to check if output is almost full
This simplifies the stats applet a bit. Because the CLI part has not been
refactored yet to use the applet's buffers, there are 3 ways to produce
data:
* the HTX message for the HTTP stats when zero-copy forwarding is not
used
* raw data in the opposite endpoint buffer for the HTTP stats when
zero-copy forwarding is used
* the channel buffer when the CLI "show stat" command is evaluated
There is already a dedicated function to take care of copying data to the
right place. There is now also a dedicated function to check whether the
output buffer is almost full.
BUG/MAJOR: mux-h1: Fix zero-copy forwarding when sending chunks of unknown size
Commit 91b77c1632 ("MEDIUM: mux-h1: Support zero-copy forwarding for chunks with
an unknown size") was recently pushed but it contains 3 bugs. The first one is
during the nego. The extra size reserved for the CRLF at the end of the chunk
must not be added to the offset value. Indeed, the CRLF will be appended after
the data and not prepended to them.
The second one, still during the nego, is an integer overflow when the available
room in the output buffer is computed.
Finally, the last one is when the chunk itself is formatted. This part was
totally buggy if the output buffer was not empty at the beginning.
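The overflow issue during the nego is of the classical unsigned-subtraction
kind; an illustrative (not literal) fix looks like this:
|    /* both terms are unsigned: subtracting blindly may wrap around */
|    size_t room = b_size(buf) - b_data(buf);
|    room = (room > reserve) ? room - reserve : 0;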
A packet is considered as reordered when it is detected as lost because the
largest acknowledged packet number is above its own packet number by at
least the packet reordering threshold value.
Add a new ->nb_reordered_pkt member to the quic_loss struct, at the same
location as the number of lost packets, to count such packets.
Let's say that the largest packet number acknowledged by the peer is #10.
When inspecting the not yet acknowledged packets to detect whether they are
lost or not, this is the case at least if the difference between this
largest packet number and their packet numbers is bigger than or equal to
the packet reordering threshold as defined by RFC 9002. This latter must not
be less than QUIC_LOSS_PACKET_THRESHOLD (3). With such a value, packets #7
and older are detected as lost if not acknowledged, contrary to packets #8
or #9.
So, the packet loss detection is very sensitive to such a network
characteristic where non-acknowledged packets are distant from each other by
their packet number differences.
Do not use this static value anymore for the packet reordering threshold
which is used as a criterion to detect packet loss. Instead, make it depend
on the difference between the number of the last transmitted packet and the
number of the oldest one among the packets which are still in flight, before
they are inspected to be deemed as lost.
Add a new tune.quic.reorder-ratio setting to apply a ratio in percent to
this dynamic packet reordering threshold.
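Put together, the dynamic threshold can be sketched as follows (variable
names are illustrative; QUIC_LOSS_PACKET_THRESHOLD and the percent ratio
come from the messages above):
|    /* distance between the newest transmitted packet and the oldest one
|     * still in flight, scaled by tune.quic.reorder-ratio (in percent) */
|    uint64_t reorder_th = (largest_sent_pn - oldest_inflight_pn) * reorder_ratio / 100;
|    if (reorder_th < QUIC_LOSS_PACKET_THRESHOLD)
|        reorder_th = QUIC_LOSS_PACKET_THRESHOLD;
|    /* a packet is deemed lost by reordering when the largest acknowledged
|     * packet number exceeds its own by at least this threshold */
|    int lost = (largest_acked_pn >= pkt_pn + reorder_th);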
The formula for K CUBIC calculation is as follows:
K = cubic_root(W_max * (1 - beta_quic) / C).
Note that this does not match the comment. But the aim of this patch is to not
hide a bug inside another patch to update this K CUBIC calculation.
The unit of C is bytes/s^3 (or segments/s^3), and we want to store K in
milliseconds. So, performing the seconds-to-milliseconds conversion inside
the cubic_root() is wrong: the unit used there would be bytes/(ms/1000)^3,
i.e. bytes*1000^3/ms^3. It is preferable to compute K in seconds, then
convert it to milliseconds, as done by this patch.
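A hedged sketch of the resulting computation (names are illustrative; the
real code works on the congestion window and CUBIC constants defined
elsewhere):
|    #include <math.h>   /* cbrt() */
|
|    /* K in seconds, per RFC 9438: K = cubic_root(W_max * (1 - beta_quic) / C) */
|    double k_sec = cbrt((double)W_max * (1.0 - beta_quic) / C);
|    /* then convert to the millisecond resolution used internally */
|    uint32_t k_ms = (uint32_t)(k_sec * 1000.0);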
BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP update task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.
These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.
Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.
An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.
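Conceptually, the CLI handler now guards the operation like this (field and
error message are illustrative of the description above):
|    /* "update ssl ocsp-response" handler */
|    if (ocsp->updating)
|        return cli_err(appctx, "Update already in progress.\n");
|    ocsp->updating = 1;
|    /* ... the flag is cleared once the update task has finished with it ... */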
Willy Tarreau [Sat, 10 Feb 2024 16:24:06 +0000 (17:24 +0100)]
[RELEASE] Released version 3.0-dev3
Released version 3.0-dev3 with the following main changes :
- DOC: configuration: clarify http-request wait-for-body
- BUG/MAJOR: ssl_sock: Always clear retry flags in read/write functions
- MINOR: h3: add traces for stream sending function
- BUG/MEDIUM: h3: do not crash on invalid response status code
- BUG/MEDIUM: qpack: allow 6xx..9xx status codes
- BUG/MEDIUM: quic: fix crash on invalid qc_stream_buf_free() BUG_ON
- CLEANUP: log: deinitialization of the log buffer in one function
- BUG/MINOR: h1: Don't support LF only at the end of chunks
- BUG/MEDIUM: h1: Don't support LF only to mark the end of a chunk size
- MINOR: ssl: add HAVE_SSL_0RTT constant
- MINOR: ssl: rename HA_OPENSSL_HAVE_0RTT_SUPPORT constant to HAVE_SSL_0RTT_QUIC
- MEDIUM: ssl/quic: always compile the ssl_conf.early_data test
- DOC: httpclient: add dedicated httpclient section
- BUG/MINOR: h1-htx: properly initialize the err_pos field
- BUG/MEDIUM: h1: always reject the NUL character in header values
- CLEANUP: h1: remove unused function h1_measure_trailers()
- BUG/MINOR: ssl/quic: fix 0RTT define
- MINOR: mux-quic: prepare for earlier flow control update
- MINOR: mux-quic: define a flow control related type
- MEDIUM: mux-quic: limit stream flow control on snd_buf
- MEDIUM: mux-quic: limit conn flow control on snd_buf
- MINOR: mux-quic: remove unneeded sent-offset fields
- MINOR: mux-quic: check fctl during STREAM frame build
- MAJOR: mux-quic: remove intermediary Tx buffer
- MEDIUM: mux-quic: simplify sending API
- MEDIUM: mux-quic: release Tx buf on too small room
- MEDIUM: mux-quic: properly handle conn Tx buf exhaustion
- MINOR: mux-quic: realign Tx buffer if possible
- CLEANUP: connection: remove obsolete comment in header file
- OPTIM: connection: progressive hash for conn_calculate_hash()
- MINOR: tcp_act: fix alphabetical ordering of tcp request content actions
- MINOR: tcp-act: Rename "set-{mark,tos}" to "set-fc-{mark,tos}"
- MINOR: hlua: Rename set_{tos, mark} to set_fc_{tos, mark}
- MEDIUM: tcp-act: <expr> support for set-fc-{mark,tos} actions
- MEDIUM: tcp-act/backend: support for set-bc-{mark,tos} actions
- MINOR: stats: Be able to access to registered stats modules from anywhere
- MEDIUM: stats: Be able to access a specific field into a stats module
- MINOR: promex: Add a param to override the description when a metric is dumped
- MINOR: promex: Add info in the promex context to dump extra counters
- MEDIUM: promex: Dump frontends extra counters if requested
- MEDIUM: promex: Dump backends extra counters if requested
- MEDIUM: promex: Dump servers extra counters if requested
- MEDIUM: promex: Dump listeners extra counters if requested
- DOC: promex: Add documentation about extra-counters
- MINOR: promex: Always limit the number of labels dumped for each metric
- MEDIUM: promex: Simplify the context using generic pointers for restart points
- MINOR: promex: Remove unsued htx parameter when a metric is dumped
- MEDIUM: promex: Add a registration mechanism to support modules
- MEDIUM: promex: Dump metrics of registered modules with a way to filter them
- MEDIUM: promex/stick-table: Dump stick-table metrics via a promex module
- MEDIUM: promex/resolvers: Dump resolvers metrics via a promex module
- MINOR: promex: Rename dump functions to use the right wording
- MINOR: promex: Always pass the final name and description to promex_dmp_ts()
- MEDIUM: promex: Add support for filters on metric names
- REGTESTS: promex: Adapt script to be less verbose
- MINOR: compiler: add a new DO_NOT_FOLD() macro to prevent code folding
- MINOR: debug: make sure calls to ha_crash_now() are never merged
- MINOR: debug: make ABORT_NOW() store the caller's line number when using abort
- BUG/MINOR: diag: always show the version before dumping a diag warning
- BUG/MINOR: diag: run the final diags before quitting when using -c
- MINOR: acl: add extra diagnostics about suspicious string patterns
- BUG/MINOR: quic: Wrong ack ranges handling when reaching the limit.
- BUILD: quic: Variable name typo inside a BUG_ON().
- DOC: config: fix typo for '%ms' log format alternative
- DOC: config: fix ordering for "txn.*" fetches
- MINOR: stream: add "txn.redispatch" fetch
- BUILD: debug: remove leftover parentheses in ABORT_NOW()
- MINOR: debug: make BUG_ON() catch build errors even without DEBUG_STRICT
- BUG/MINOR: ssl: Fix error message after ssl_sock_load_ocsp call
- MINOR: debug: support passing an optional message in ABORT_NOW()
- MINOR: debug: add an optional message argument to the BUG_ON() family
- DEBUG: make the "debug dev {debug|warn|check}" command print a message
- CLEANUP: quic: Code clarifications for QUIC CUBIC (RFC 9438)
- BUG/MINOR: quic: fix possible integer wrap around in cubic window calculation
- MINOR: quic: Stop using 1024th of a second.
- CI: github: abandon asan matrix.py helper
- CI: ssl: add yet another OpenSSL download fallback
- DOC: install: clarify WolfSSL chroot requirements
- MINOR: task: Move wait_event in the task header file
- MINOR: stconn: Be able to detect applets using HTX
- MINOR: stconn: Explicitly use an appctx to attach a stconn on it
- MINOR: stconn: Be prepared to handle error when a SC is attached to an applet
- MINOR: applet: Add dedicated IN/OUT buffers for appctx
- MINOR: applet: Add traces to debug receive/send and block/wake events
- MINOR: applet: Add support for callback functions to exchange data with channels
- MINOR: applet: Implement default functions to exchange data with channels
- MEDIUM: stconn: Add functions to handle applets I/O from the SC layer
- MEDIM: applet: Add the applet handler based on IN/OUT buffers
- MINOR: applet: Show IN/OUT buffers in trace messages when used
- MINOR: applet: Add flags on the appctx and stop abusing its state
- MINIOR: applet: Add flags to deal with ends of input, ends of stream and errors
- MINOR: applet: Remove appctx state field to only used the flags
- MINOR: applet: Add an appctx flag to report shutdown to applets
- MEDIUM: applet: Use appctx flags to report EOS/EOI/ERROR to SE
- MINOR: applet: Add callback function to deal with zero-copy forwarding
- MEDIUM: applet: Add support for zero-copy forwarding from an applet
- MINOR: applet: Automatically handle applets having more data for the stream
- MEDIUM: stats: Don't interrupt processing on partial post
- MAJOR: stats: Update HTTP stats applet to handle its own buffers
- MEDIUM: cache: Temporarily remove zero-copy forwarding support
- MAJOR: cache: Update HTTP cache applet to handle its own buffers
- MAJOR: cache: Send cached objects using zero-copy forwarding
- MINOR: stconn: Add support for flags during zero-copy forwarding negotiation
- MINOR: mux-h1: Be able to define the length of a chunk size when it is prepended
- MEDIUM: stconn: Nofify requested size during zero-copy forwarding nego is exact
- MINOR: mux-h1: Stop zero-copy forwarding during nego for too big requested size
- MEDIUM: mux-h1: Support zero-copy forwarding for chunks with an unknown size
- MAJOR: stats: Send stats dump over HTTP using zero-copy forwarding
- MEDIUM: applet: Simplify a bit API to exchange data with applets
- MINOR: cache: Remove unsed .data_sent field from the cache applet context
- MINOR: applet: Use an option to disable zero-copy forwarding for all applets
- MINOR: applet: Identify applets using their own buffers via a flag
- BUG/MINOR: ssl: Duplicate ocsp update mode when dup'ing ckch
- MINOR: ssl: Use OCSP_CERTID instead of ckch_store in ckch_store_build_certid
- BUG/MINOR: ssl: Clear the ckch instance when deleting a crt-list line
- BUG/MEDIUM: ocsp: Separate refcount per instance and per store
- BUG/MINOR: ssl: Destroy ckch instances before the store during deinit
- BUG/MINOR: ssl: Reenable ocsp auto-update after an "add ssl crt-list"
- REGTESTS: ssl: Add OCSP related tests
- REGTESTS: ssl: Fix empty line in cli command input
- DOC: install: recommend pcre2
- DOC: config: fix misplaced "txn.conn_retries"
- DOC: config: fix typos for "bytes_{in,out}"
- DOC: config: fix misplaced "bytes_{in,out}"
- DOC: config: add more custom log format table alternatives
- MINOR: stream: rename "txn.redispatch" to "txn.redispatched"
- MINOR: sample: implement bc_{be,srv}_queue samples
- BUG/MINOR: mux-h2: count rejected DATA frames against the connection's flow control
- MINOR: mux-h2: count excess of CONTINUATION frames as a glitch
- MINOR: mux-h2: count late reduction of INITIAL_WINDOW_SIZE as a glitch
- DOC: internal: update missing data types in peers-v2.0.txt
- MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate
- MINOR: session: add the necessary functions to update the per-session glitches
- MEDIUM: mux-h2: update session trackers with number of glitches
- BUG/MINOR: server/cli: add missing LF at the end of certain notice/error lines
- BUG/MINOR: vars/cli: fix missing LF after "get var" output
- BUG/MEDIUM: cli: fix once for all the problem of missing trailing LFs
- MINOR: cli: make sure to always print a pending message after release()
- MINOR: cli: always reset the applet task's timeout
- MINOR: cli: add a new "wait" command to wait for a certain delay
- BUG/MINOR: applet: Always release empty appctx buffers after processing
- MINOR: server: split the server deletion code in two parts
- MINOR: cli/wait: make the wait command support a more detailed help message
- MINOR: cli/wait: also support an unrecoverable failure status
- MINOR: cli/wait: also pass up to 4 arguments to the external conditions
- MINOR: cli/wait: add a condition to wait on a server to become unused
- CI: Update to actions/cache@v4
- BUILD: address a few remaining calloc(size, n) cases
- BUG/MEDIUM: pool: fix rare risk of deadlock in pool_flush()
Willy Tarreau [Sat, 10 Feb 2024 11:29:53 +0000 (12:29 +0100)]
BUG/MEDIUM: pool: fix rare risk of deadlock in pool_flush()
As reported by github user @JB0925 in issue #2427, there is a possible
crash in pool_flush(). The problem is that if the free_list is not empty
in the first test, and is empty at the moment the xchg() is performed,
for example because another thread called it in parallel, we place a
POOL_BUSY there that is never removed later, causing the next thread to
wait forever.
This was introduced in 2.5 with commit 2a4523f6f ("BUG/MAJOR: pools: fix
possible race with free() in the lockless variant"). It has probably
very rarely been detected, because:
- pool_flush() is only called when stopping is set
- the function does nothing if global pools are disabled, which is
the case on most modern systems with a fast memory allocator.
It's possible to reproduce it by modifying __task_free() to call
pool_flush() on 1% of the calls instead of only when stopping.
The fix is quite simple: it consists in moving the zeroing of the entry to
the break path, after the check that the entry was not already busy.
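The intended ordering can be sketched as follows (a simplified view; the
atomic helpers and the exact loop in pool.c differ slightly):
|    while (1) {
|        entry = HA_ATOMIC_LOAD(&pool->free_list);
|        if (entry == POOL_BUSY) {      /* another thread owns the list: wait */
|            __ha_cpu_relax();
|            continue;
|        }
|        if (entry == NULL)             /* emptied meanwhile: nothing to flush */
|            return;
|        if (HA_ATOMIC_CAS(&pool->free_list, &entry, POOL_BUSY))
|            break;                     /* we now own the list */
|    }
|    /* only once the list is really owned may the marker be cleared */
|    HA_ATOMIC_STORE(&pool->free_list, NULL);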
This must be backported wherever commit 2a4523f6f is.
Willy Tarreau [Sat, 10 Feb 2024 10:35:07 +0000 (11:35 +0100)]
BUILD: address a few remaining calloc(size, n) cases
In issue #2427 Ilya reports that gcc-14 rightfully complains about
sizeof() being placed in the left term of calloc(). There's no impact
but it's a bad pattern that gets copy-pasted over time. Let's fix the
few remaining occurrences (debug.c, halog, udp-perturb).
This can be backported to all branches, and the irrelevant parts dropped.
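For the record, the pattern being fixed versus the expected one (a trivial
sketch):
|    /* bad: element size passed as the first (count) argument */
|    p = calloc(sizeof(*p), n);
|
|    /* good: number of elements first, element size second */
|    p = calloc(n, sizeof(*p));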
Tim Duesterhus [Thu, 8 Feb 2024 18:55:23 +0000 (19:55 +0100)]
CI: Update to actions/cache@v4
No functional change, but this upgrade is required, due to the v3 runtime being
deprecated:
> Node.js 16 actions are deprecated. Please update the following actions to use
> Node.js 20: actions/cache@v3. For more information see:
> https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
Willy Tarreau [Fri, 9 Feb 2024 19:35:52 +0000 (20:35 +0100)]
MINOR: cli/wait: add a condition to wait on a server to become unused
The "wait" command now supports a condition, "srv-unused", which waits
for the designated server to become totally unused, indicating that it
is removable. Upon each wakeup it calls srv_check_for_deletion() to verify
whether the conditions are met and, if not, whether the situation is
recoverable or not, then proceeds accordingly, never waiting for a final
decision longer than the configured delay.
The purpose is to make it possible to remove servers from the CLI after
waiting for their sessions to be terminated:
$ socat -t5 /path/to/socket - <<< "
disable server px/srv1
shutdown sessions server px/srv1
wait 2s srv-unused px/srv1
del server px/srv1"
Or even wait for connections to terminate themselves:
$ socat -t70 /path/to/socket - <<< "
disable server px/srv1
wait 1m srv-unused px/srv1
del server px/srv1"
Willy Tarreau [Fri, 9 Feb 2024 19:09:59 +0000 (20:09 +0100)]
MINOR: cli/wait: also pass up to 4 arguments to the external conditions
Conditions will need context, arguments, etc. from the command line.
Since these will vary with time (otherwise we wouldn't wait), let's just
pass them as text (possibly pre-processed). We're starting with 4 strings
that are expected to be allocated by strdup() and are always sent to free()
upon release.
Willy Tarreau [Fri, 9 Feb 2024 19:05:14 +0000 (20:05 +0100)]
MINOR: cli/wait: also support an unrecoverable failure status
Since we'll support waiting for an action to succeed or permanently
fail, we need the ability to return an unrecoverable failure. Let's
add CLI_WAIT_ERR_FAIL for this. A static error message may be placed
into ctx->msg to report to the user why the failure is unrecoverable.
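A condition handler reporting such a permanent failure would do something
along these lines (a hedged sketch; the surrounding handler code is
omitted):
|    /* this situation cannot resolve by waiting any longer */
|    ctx->msg = "Operation failed and cannot succeed.";  /* static string */
|    return CLI_WAIT_ERR_FAIL;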