MINOR: stream: Add http-buffer-request option in the waiting entities
When http-buffer-request option is set on a proxy, the processing will be
paused to wait the full request payload or a full buffer. So it is an entity
that block the processing, just like a rule or a filter that yields. So now,
it is reported as a waiting entity if an error or a timeout occurred.
To do so, an stream entity type is added for this option. There is no
pointer. And "waiting_entity" sample fetch returns the option name.
MINOR: stream: Save the entity waiting to continue its processing
When a rule or a filter yields because it waits for something to be able to
continue its processing, this entity is saved in the stream. If an error or
a timeout occurred, info on this entity may be retrieved via the
"waiting_entity" sample fetch, for instance to dump it in the logs. This
info may be useful to found root cause of some bugs because it is a way to
know the processing was temporarily stopped. This may explain timeouts for
instance.
MINOR: stream: Save the last filter evaluated interrupting the processing
It is very similar to the last evaluated rule. When a filter returns an
error that interrupts the processing, it is saved in the stream, in the
last_entity field, with the type 2. The pointer on filter config is
saved. This pointer never changes during runtime and is part of the proxy's
structure. It is an element of the filter_configs list in the proxy
structure.
"last_entity" sample fetch was update accordingly. The filter identifier is
returned, if defined. Otherwise the save pointer.
MINOR: stream: Replace last_rule_file/line fields by a more generic field
The last evaluated rule is now saved in a generic structure, named
last_entity, with a type to identify it. The idea is to be able to store
other kind of entity that may interrupt a specific processing.
The type of the last evaluated rule is set to 1. It will be replace later by
an enum to be more explicit. In addition, the pointer to the rule itself is
saved instead of its location.
The sample fetch "last_entity" was added to retrieve the information about
it. In this case, it is the rule localtion, the config file containing the
rule followed by the line where the rule is defined, separated by a
colon. This sample fetch is not documented yet.
MINOR: stream: Save last evaluated rule on invalid yield
When an action yields while it is not allowed, an internal error is
reported. This interrupts the processing. So info about the last evaluated
rule must be filled.
This patch may be bakcported if needed. If so, the commit ("MINOR: stream:
Save last evaluated rule on invalid yield") must be backported first.
BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
There were several flaws in the way the different timeouts were applied on
H1 connections. First, the H1C task handling timeouts was not created if no
client/server timeout was specified. But there are other timeouts to
consider. First, the client-fin/server-fin timeouts. But for frontend
connections, http-keey-alive and http-request timeouts may also be used. And
finally, on soft-stop, the close-spread-time value must be considered too.
So at the end, it is probably easier to always create a task to manage H1C
timeouts. Especially since the client/server timeouts are most often set.
Then, when the expiration date of the H1C's task must only be updated if the
considered timeout is set. So tick_add_ifset() must be used instead of
tick_add(). Otherwise, if a timeout is undefined, the taks may expire
immediately while it should in fact never expire.
Finally, the idle expiration date must only be considered for idle
connections.
This patch should be backported in all stable versions, at least as far as
2.6. On the 2.4, it will have to be slightly adapted for the idle_exp
part. On 2.2 and 2.0, the patch will have to be rewrite because
h1_refresh_timeout() is quite different.
Ilia Shipitsin [Wed, 30 Oct 2024 18:27:52 +0000 (19:27 +0100)]
CI: LibreSSL QUIC Interop: fix docker context
in the commit https://github.com/haproxy/haproxy/commit/98099287ee37e697d1d2aaf01e19107bd064c3c6
building docker was switched to URL, but I forgotten to change context.
Following previous commit, when glitch_cnt and glitch_rate data types were
implemented in c9c6b683f ("MEDIUM: stick-tables: add a new stored type for
glitch_cnt and glitch_rate"), newly exposed samples such as
table_glitch_cnt(), table_glitch_rate, src_glitch_cnt() and
src_glitch_rate() were documented but their definitions was missing in
supported keywords list.
DOC: config: add missing glitch_{cnt,rate} data types
When glitch_cnt and glitch_rate data types were implemented in c9c6b683f ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and
glitch_rate"), the data types list for "stick-table" keyword documentation
was overlooked.
BUG/MINOR: startup: don't dump polling info for master in verbose mode
As master-worker fork happens now before step_init_2(), when pollers are
initialized and polling settings and dumped then in verbose and in debug modes
to stdout, it turns out that master and worker dump its same polling
settings separately. This creates long and messy output in these modes.
Polling settings are the same for master and for worker process for the moment.
Even if they would diverge in future we are interested here in worker's
settings. So, when started in the master-worker mode let's dump it only in the
worker context.
This doesn't need to be backported as related to the latest master-worker
refactoring.
BUG/MINOR: startup: dump keywords only in worker if started with -W -dKAll
If haproxy was started with -W -dK*, after master-worker refactoring, we dump
registered keywords to stdout twice in master and in worker processes. This
information is redundant and output has no longer the right format. So, as the
keyword registration happens very early before the fork, let's dump keywords
only in the worker context, if haproxy was launched with -W.
This does not need to be backported, as related to the latest master-worker
refactoring.
BUG/MINOR: startup: dump libs only in worker if started with -W -dL
If haproxy was started with -W -dL, after master-worker refactoring we dump
libs to stdout twice in master and in worker processes. This is information is
redundant. So let's show linked libraries only in the worker context, if
haproxy was started also with -W.
This does not need to be backported, as related to the latest master-worker
rework.
BUG/MINOR: startup: don't fork worker if started with -c -W
Don't do master-worker fork if MODE_CHECK is detected from the command line along
with the master-worker mode. We should exit in MODE_CHECK, after the
configuration parsing and validation. So, with the new master-worker architecture
it's better to align this mode with the standalone.
This patch does not need to be backported, as related to the latest
master-worker rework.
BUG/MINOR: errors: print_message: don't allocate startup logs ring
Don't call startup_logs_init() in order to allocate the startup logs ring
again, if startup_logs pointer is NULL. Startup logs ring is allocated
explicitly in step_init_1 routine, when the process starts, and it's freed
explicitly for master process at the end of mworker_reexec scope. So, when
we no longer have this pointer, let's just save the log message in the
message buffer.
Otherwise, in case of master process, we will allocate the startup logs ring
again here and we will lost its address after execvp.
No need to backport this fix as it's related to the latest master-worker
refactoring.
BUG/MINOR: errors: startup_logs_free: set global startup_logs ptr to NULL
ring_free() calls free() on the ring struct pointer, but startup_logs continues
to keep this address. So let's reset at the end startup_logs to NULL.
startup_logs is checked in print_message().
No need to backport this fix, as it's related to the latest master-worker
refactoring.
BUG/MINOR: mworker: mworker_reexec: unset MODE_STARTING before free startup logs ring
Flag MODE_STARTING should be unset for master just before freeing the startup
logs ring, as it triggers the copy of process logs to this ring, see the code
of print_message().
Moreover with this flag set, if startup logs ring pointer is NULL, any
print_message() triggered just before the execvp in mworker_reexec() will call
startup_logs_init(). So ring will be allocated again "discretely" and after
execvp we will lost its address, as in step_init_1() we will call again
startup_logs_init().
No need to backport this fix as it's related to the latest master-worker
refactoring.
BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
Since commit 089c13850f ("MEDIUM: ssl: ssl-load-extra-del-ext work
only with .crt"), the 'set ssl cert' CLI command does not check
correctly if the transaction you are trying to update is the right one.
The consequence is that you could commit accidentaly a transaction on
the wrong certificate.
The fix introduces the check again in case you are not using
ssl-load-extra-del-ext.
Tristan [Mon, 28 Oct 2024 15:23:31 +0000 (16:23 +0100)]
MEDIUM: socket: add zero-terminated ABNS alternative
When an abstract unix socket is bound by HAProxy (using "abns@" prefix),
NUL bytes are appended at the end of its path until sun_path is filled
(for a total of 108 characters).
Here we add an alternative to pass only the non-NUL length of that path
to connect/bind calls, such that the effective path of the socket's name
is as humanly written. This may be useful to interconnect with existing
softwares that implement abstract sockets with this logic instead of the
default haproxy one.
This is achieved by implementing the "abnsz" socket prefix (instead of
"abns"), which stands for "zero-terminated ABNS". "abnsz" prefix may be
used anywhere "abns" is. Internally, haproxy uses the custom socket
family (AF_CUST_ABNS vs AF_CUST_ABNSZ) to differentiate default abns
sockets from zero-terminated ones.
MEDIUM: sock_unix: use per-family addrcmp function
Thanks to previous commit, we may now use dedicated addrcmp functions for
each UNIX address family. This allows to simplify sock_unix_addrcmp()
function and avoid useless checks in order to try to guess the socket
type.
In this patch we implement sock_abns_addrcmp() and sock_abnsz_addrcmp()
functions, which are respectively used for ABNS and ABNSZ custom families
sock_unix_addrcmp() now only holds regular UNIX socket comparing logic.
MEDIUM: sock: also restore effective unix family in get_{src,dst}()
As in previous commit, let's push the logic a bit further in order to
properly restore the effective UNIX socket type when leveraging
get_src() and get_dst() sock functions, since they rely on getpeername()
and getsockname() under the hood, both of which will actually loose the
effective family and return AF_UNIX for all our custom UNIX sockets.
To do this, add sock_restore_unix_family() helper function from the logic
implemented in the previous commit, and call this function from get_src()
and get_dst() in case of unix socket prior to returning.
MINOR: sock: restore effective UNIX family in sock_get_old_sockets()
When getting sockets from older process in sock_get_old_sockets(), we
leverage getsockname() to fill sockaddr struct from known fd.
However, the kernel doesn't know about our custom UNIX families such
as CUST_ABNS and CUST_ABNSZ which are both based on AF_UNIX real family.
Since haproxy socket API relies on effective family (and not real family)
to recognize the socket type instead of having to guess it by analyzing
the path content, let's restore it right after getsockname() since we
have all the infos needed to deduce the right family.
If the path starts with a NULL byte, we know that it is an abstract sock.
Then we simply check <addrlen> value from getsockname() to know if the
addr makes uses of the whole path space (normal ABNS) or partial path
space (zero ABNS / aka ABNZ) terminated by 0.
Willy Tarreau [Fri, 9 Aug 2024 19:12:23 +0000 (21:12 +0200)]
MINOR: protocol: create abnsz socket address family
For now it's the same as abns. We'll need to modify sock_unix_addrcmp(),
and a few other ones to support effective path length when dealing with
the \0. Let's check with Tristan's patch for this (upcoming patch).
CLEANUP: tools: rely on address family to detect ABNS sockets
Following previous commit, in str2sa_range(), make use of address' family
which was just set to check if the socket is ABNS or not instead of
relying on an extra boolean to save this info.
MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets
Now that we can easily distinguish regular UNIX socket from ABNS sockets
by simply looking at the address family, stop looking at the first byte
from addr->sun_path to guess if the socket is an ABNS one or not. Looking
at the family is straightforward and will allow to differentiate between
upcoming ABNSZ and ABNS (where looking at the first byte from path won't
help anymore).
Willy Tarreau [Fri, 9 Aug 2024 16:48:14 +0000 (18:48 +0200)]
MEDIUM: protocol: make abns a custom unix socket address family
This is a pre-requisite to adding the abnsz socket address family:
in this patch we make use of protocol API rework started by 732913f
("MINOR: protocol: properly assign the sock_domain and sock_family") in
order to implement a dedicated address family for ABNS sockets (based on
UNIX parent family).
Thanks to this, it will become trivial to implement a new ABNSZ (for abns
zero) family which is essentially the same as ABNS but with a slight
difference when it comes to path handling (ABNS uses the whole sun_path
length, while ABNSZ's path is zero terminated and evaluation stops at 0)
It was verified that this patch doesn't break reg-tests and behaves
properly (tests performed on the CLI with show sess and show fd).
Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no
distinction needs to be made, real_family() is used to fetch the proper
real family type to handle it properly.
Both stream and dgram were converted, so no functional change should be
expected for this "internal" rework, except that proto will be displayed
as "abns_{stream,dgram}" instead of "unix_{stream,dgram}".
When using trace with -dt, the trace_parse_cmd() function is doing a
strtok which write \0 into the argv string.
When using the mworker mode, and reloading, argv was modified and the
trace won't work anymore because the first : is replaced by a '\0'.
This patch fixes the issue by allocating a temporary string so we don't
modify the source string directly. It also replace strtok by its
reentrant version strtok_r.
Willy Tarreau [Mon, 28 Oct 2024 16:51:29 +0000 (17:51 +0100)]
DEV: gdb: add a number of gdb scripts to navigate in core dumps
These is a collection of functions I'm occasionally using to navigate
in core dumps. Only working ones were extracted.
Those requiring knowledge of global variables (e.g. pools, proxy list)
use the one extracted from the post_mortem struct. That one is defined
in post-mortem.gdb and needs to be initialized using "pm_init post_mortem"
or "pm_init <pointer>". From this point a number of global variables are
accessible even if symbols are missing; those ones are then used by other
functions to dump streams, threads, pools, proxies etc.
The files can be sourced or copy-pasted into a gdb session. It's worth
trying to keep them up-to-date, as the old ones used to navigate through
tasks are no longer usable due to massive changes.
Willy Tarreau [Mon, 28 Oct 2024 06:47:23 +0000 (07:47 +0100)]
MINOR: debug: remove the redundant process.thread_info array from post_mortem
That one is huge and unneeded since we now have the pointer to the
whole thread_info[] array, which does contain the freshest version
of these info and many more. Let's just get rid of it entirely.
Willy Tarreau [Mon, 28 Oct 2024 16:10:19 +0000 (17:10 +0100)]
DOC: design: add notes about more detailed error reporting for logs
These are the notes of a day long code analysis session (CFA+WTA)
aimed at figuring what's missing during most code troubleshooting
sessions. The goal is to provide good indications about what rules/
filters were still active when the processing ended (timeout, error
etc), what subscribers are still active (indicating waiting for an
event), and what shut/abort events were met at the various levels
of each side's stack, in each direction.
CLEANUP: log: use strnlen2() in _lf_text_len() to compute string length
Thanks to previous commit, we can now use strnlen2() function to perform
strnlen() portable equivalent instead of re-implementing the logic under
_lf_text_len() function.
strnlen2() is functionally equivalent to strnlen(). Goal is to provide
an alternative to strnlen() which is not portable since it requires
_POSIX_C_SOURCE >= 200809L
BUG/MEIDUM: mworker: fix fd leak from master to worker
During re-execution master keeps always opened "reload" sockpair FDs and
shared sockpair ipc_fd[0], the latter is using to transfert listeners sockets
from the previously forked worker to the new one. So, these master's FDs are
inherited in the newly forked worker and must be closed in its context.
"reload" sockpair inherited FDs and shared sockpair FD (ipc_fd[0]) are closed
separately, becase master doesn't recreate "reload" sockpair each time after
its re-exec. It always keeps the same FDs for this "reload" sockpair. So in
worker context it can be closed immediately after the fork.
At contrast, shared sockpair is created each time after reload, when the new
worker will be forked. So, if N previous workers are still exist at this moment,
the new worker will inherit N ipc_fd[0] from master. So, it's more save to
close all these FDs after get_listeners_fd() and bind_listeners() calls.
Otherwise, early closed FDs in the worker context will be immediately bound to
listeners and we could potentially have some bugs.
CLEANUP: mworker: make mworker_create_master_cli more readable
Using nested 'if' operator, while checking if we will need to allocate again the
"reload" sockpair, does not degrade performance, as mworker_create_master_cli is
a startup routine.
This nested 'if' (we check one condition in each operator) makes more visible the
fact, that the "reload" sockpair is allocated only once, when the master process
starts and it does not re-allocated again (hence, its FDs are not closed) during
reloads. This way of checking multiple conditions here makes more easy to spot
this fact, while analysing the code in order to investigate FD leaks between
master and worker.
Willy Tarreau [Sat, 26 Oct 2024 09:33:09 +0000 (11:33 +0200)]
MINOR: debug: also add a pointer to struct global to post_mortem
The pointer to struct global is also an important element to have in
post_mortem given that it's used a lot to take decisions in the code.
Let's just add it. It's worth noting that we could get rid of argc/argv
at this point since they're also present in the global struct, but they
don't cost much there anyway.
Willy Tarreau [Thu, 24 Oct 2024 13:14:55 +0000 (15:14 +0200)]
MINOR: debug: do not limit backtraces to stuck threads
Historically for size limitation reasons, we would only dump the
backtrace of stuck threads. The problem is that when triggering
a panic or other reasons, we have no backtrace, which effectively
limits it to the watchdog timer. It's also visible in "show threads"
which used to report backtraces for all threads in 2.4 and displays
none nowadays, making its use much more limited.
A first approach could be to just dump the thread that triggers the
panic (in addition to stuck threads). But that remains quite limited
since "show threads" would still display nothing. This patch takes a
better approach consisting in dumping all non-idle threads. This way
the output is less polluted that with the older approach (no need to
dump all those waiting in the poller), and all active threads are
visible, in panics as well as in "show threads". As such, the CLI
command "debug dev panic" now dmups backtraces again. This is already
a benefit which will ease testing of various locations against the
ability to resolve useful symbols.
Willy Tarreau [Thu, 24 Oct 2024 12:37:12 +0000 (14:37 +0200)]
MINOR: debug: store important pointers in post_mortem
Dealing with a core and a stripped executable is a pain when it comes
to finding pools, proxies or thread contexts. Let's put a pointer to
these heads and arrays in the post_mortem struct for easier location.
Other critical lists like this could possibly benefit from being added
later.
Here we now have:
- tgroup_info
- thread_info
- tgroup_ctx
- thread_ctx
- pools
- proxies
Willy Tarreau [Thu, 24 Oct 2024 09:59:32 +0000 (11:59 +0200)]
MINOR: debug: place the post_mortem struct in its own section.
Placing it in its own section will ease its finding, particularly in
gdb which is too dumb to find anything in memory. Now it will be
sufficient to issue this:
Willy Tarreau [Thu, 24 Oct 2024 09:56:07 +0000 (11:56 +0200)]
MINOR: debug: place a magic pattern at the beginning of post_mortem
In order to ease finding of the post_mortem struct in core dumps, let's
make it start with a recognizable pattern of exactly 32 chars (to
preserve alignment):
Willy Tarreau [Thu, 24 Oct 2024 12:11:06 +0000 (14:11 +0200)]
CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data()
During 11th and 12th iteration of the development cycle for the H2 auto
rx window, several approaches were attempted to figure if another buffer
could be allocated or not. One of them consisted in looping back to the
beginning of the function requesting a new buffer slot and getting one
if the buffer was either apparently or confirmed full. The latest one
consisted in directly allocating the next buffer from the two places
where it's found to be proven full, instead of checking with the now
defunct h2s_may_get_rxbuf() if we were allowed to get once an loop.
That approach was retained. In this case the "full" variabled is no
longer needed, so let's get rid of it because the construct looks bogus
and confuses coverity (and possibly code readers as the intent is unclear
compared to the code).
Willy Tarreau [Thu, 24 Oct 2024 13:04:25 +0000 (15:04 +0200)]
BUILD: debug: silence a build warning with threads disabled
Commit 091de0f9b2 ("MINOR: debug: slightly change the thread_dump_pointer
signification") caused the following warning to be emitted when threads
are disabled:
src/debug.c: In function 'ha_thread_dump_one':
src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
Let's just disguise the pointer to silence it. It should be backported
where the patch above was backported, since it was part of a series aiming
at making thread dumps more exploitable from core dumps.
BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
There is no reason to disable the 0-copy data forwarding if an end-of-stream
was reported on the consumer side. Indeed, the consumer will send data in
this case. So there is no reason to check the read side here.
BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
When the response forwarding is aborted, we must not report a client abort
if a EOS was seen on client side. On abort performed by the stream must be
considered.
This bug was introduced when the SHUTR was splitted in 2 flags.
BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
When some data must be sent to the endpoint but an error was previously
reported, nothing is performed and we leave. But, in this case, the SC is not
notified the sends are blocked.
It is indeed an issue if the endpoint reports an error after consuming all
data from the SC. In the endpoint the outgoing data are trashed because of
the error, but on the SC, everything was sent, even if an error was also
reported.
Because of this bug, it is possible to have outgoing data blocked at the SC
level but without any write timeout armed. In some cases, this may lead to
blocking conditions where the stream is never closed.
So now, when outgoing data cannot be sent because an previous error was
triggered, a blocked send is reported. This way, it is possible to report a
write timeout.
This patch should fix the issue #2754. It must be backported as far as 2.8.
Amaury Denoyelle [Wed, 23 Oct 2024 16:18:48 +0000 (18:18 +0200)]
BUG/MEDIUM: server: fix race on servers_list during server deletion
Each server is inserted in a global list named servers_list on
new_server(). This list is then only used to finalize servers
initialization after parsing.
On dynamic server creation, there is no issue as new_server() is under
thread isolation. However, when a server is deleted after its refcount
reached zero, srv_drop() removes it from servers_list without lock
protection. In the longterm, this can cause list corruption and crashes,
especially if multiple adjacent servers are removed in parallel.
To fix this, convert servers_list to a mt_list. This should not impact
performance as servers_list is not used during runtime outside of server
creation/deletion.
This should fix github issue #2733. Thanks to Chris Staite who first
found the issue here.
Amaury Denoyelle [Tue, 22 Oct 2024 09:02:15 +0000 (11:02 +0200)]
BUG/MINOR: server: fix dynamic server leak with check on failed init
If a dynamic server is added with check or agent-check, its refcount is
incremented after server keyword parsing. However, if add server fails
at a later stage, refcount is only decremented once, which prevented the
server to be fully released.
This causes a leak with a server which is detached from most of the
lists but still exits in the system.
This bug is considered minor as only a few conditions may cause a
failure in add server after check/agent-check initialization. This is
the case if there is a naming collision or the dynamic ID cannot be
generated.
To fix this, simply decrement server refcount on add server error path
if either check and/or agent-check are flagged as activated.
This bug is related to github issue #2733. Thanks to Chris Staite who
first found the leak.
There are two parts in mworker_cli_proxy_create(): allocating and setting up
MASTER proxy and allocating and setting up servers on ipc_fd[0] of the
sockpairs shared with workers.
So, let's split mworker_cli_proxy_create() into two functions respectively.
Each of them takes **errmsg as an argument to write an error message, which may
be triggered by some subcalls. The content of this errmsg will allow to extend
the final alert message shown to user, if these new functions will fail.
The main goals of this split is to allow to move these two parts independantly
in future and makes the code of haproxy initialization in haproxy.c more
transparent.
Before refactoring master-worker architecture, resources to setup master CLI
for the new worker process (shared sockpair, entry in proc_list) were created
in init() before parsing the configuration and binding listening sockets. So,
master during its re-exec has had to cleanup the new worker's ressources in
a case, when it fails at some initialization step before the fork.
Now fork happens very early and worker parses its configuration by itself. If
it fails during the initialization stage, all clean ups (deleting the fds of
the shared sockpair, proc_list cleanup) are performed in SIGCHLD handler up to
catching the SIGCHLD corresponded to this new worker. So, there is no longer
need to call mworker_cleanup_proc() in mworker_reexec().
As for mworker_cleanlisteners(), there is no longer need to call this function.
Master parses now only "global" and "program" sections, so it allocates only
MASTER proxy, which is stopped in mworker_reexec() by mworker_cli_proxy_stop().
Let's keep the definitions of mworker_cleanlisteners() and
mworker_cleanup_proc() in mworker.c for the moment. We may reuse parts of its
code later.
BUG/MINOR: mworker: show worker warnings in startup logs
As master-worker fork happens now at early init stage and worker then parses
its configuration and performs all initialization steps, let's duplicate
startup logs ring for it, just before the moment when it enters in its pollong
loop. Startup logs ring content is shown as an output of the "reload" master
CLI command and we should be able to dump here worker initialization logs.
Log messages are written in startup logs ring only, when mode MODE_STARTING is
set (see print_message()). So, to be able to keep in startup logs the last
worker alerts, let's withdraw MODE_STARTING and let's reset user messages
context respectively just before entering in polling loop.
This fix does not need to be backported as it is a part of previous patches
from this version, which refactor master-worker architecture.
This patch simplifies the code of startup_logs_init_shm(). We no longer re-exec
master process twice after each reload to free its unused memory, which it had
to allocate, because it has parsed all configuration sections. So, there is no
longer need to keep SHM fd opened between the first and the next reloads. We
can completely remove HAPROXY_STARTUPLOGS_FD.
In step_init_1() we continue to call startup_logs_init_shm() to open SHM and to
allocate startup logs ring area within it. In master-worker mode, worker
duplicates initial startup logs ring after sending its READY state to master.
Sharing the same ring between two processes until the worker finishes its
initialization allows to show at master CLI output worker's startup logs.
During the next reload master process should free the memory allocated for the
ring structure. Then after the execvp() it will reopen and map SHM area again
and it will reallocate again the ring structure.
MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair
After sending its "READY" status worker should not keep the access
to MASTER proxy, thus, it shouldn't be able to send any other commands further
to master process.
To achieve this, let's stop in master context master CLI listener attached on
the sockpair shared with worker. We do this just after receiving the worker's
status message.
BUG/MINOR: mworker/cli: show master startup logs in recovery mode
When master enters in recovery mode after unsuccessfull reload
HAPROXY_LOAD_SUCCESS should be set as 0. Like this
cli_io_handler_show_cli_sock() could dump in master CLI its warnings and alerts,
saved in startup logs ring.
No need to backport this fix, as this is related to the previous patches in
this version to refactor master-worker architecture.
Willy Tarreau [Thu, 24 Oct 2024 08:46:06 +0000 (10:46 +0200)]
MINOR: activity/memprofile: show per-DSO stats
On systems where many libs are loaded, it's hard to track suspected
leaks. Having a per-DSO summary makes it more convenient. That's what
we're doing here by summarizing all calls per DSO before showing the
total.
DEBUG: mux-h1: Add debug counters to track errors with in/out pending data
Debug counters were added on all connection error when pending data remain
blocked in the input or ouput buffers. The same is performed when the H1C is
released, when the connection is closed and when a timeout is reached. Idea
is to be able to count all cases where data are lost, especially the
outgoing ones.
In 3.1-dev10, commit 9fbc01710a ("OPTIM: mux-h2: make h2_send() report
more accurate wake up conditions") leveraged the more accurate distinction
between demux and recv to decide when to wake the tasklet up after a send.
But other cases are needed. When we just need to wake the processing task
up so that it itself wakes up other streams, for example because these ones
are blocked. Indeed, a temporarily blocked stream may block other ones,
which will never be woken up if the demux has nothing to do.
In an ideal world we would check all cases where blocking flags were
dropped. However it looks like this case after a send is probably the
only one that deserves waking up the connection again. It's likely that
in practice the MUX_MFULL flag was dropped and that it was that one that
was blocking the send.
In addition, dealing with these cases was not sufficient, as one case was
encountered where dbuf was empty, subs=0, short_read still present while
in FRH state... and the timeouts were still there (easily found with
halog -tcn cD at a rate of 1-2 every 2 minutes roughly).
Interestingly, in a dump, some MBUF_HAS_DATA were seen on an empty mbuf,
so it means that certain conditions must be taken very carefully in the
wakeup conditions.
So overall this indicates that there remain subtle inconsistencies that
this optimization is sensitive to. It may have to be revisited later but
for now better revert it.
No backport is needed.
Annex:
- first dump showing a dependency on WAIT_INLIST after h2_send():
Willy Tarreau [Wed, 23 Oct 2024 13:10:45 +0000 (15:10 +0200)]
BUILD: spoe: fix build warning on older gcc around sub-struct initialization
gcc-4.8 is unhappy with the cfg_file initialization:
src/flt_spoe.c: In function 'parse_spoe_flt':
src/flt_spoe.c:2202:9: warning: missing braces around initializer [-Wmissing-braces]
struct cfgfile cfg_file = {0};
^
src/flt_spoe.c:2202:9: warning: (near initialization for 'cfg_file.list') [-Wmissing-braces]
This is due to the embedded list member. Initializing it to empty like
we do almost everywhere else makes it happy. No backport is needed as
this was changed in 3.1-dev5 only.
BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
As described in GH #2765, there were situations where http connections
would be re-used for requests to different endpoints, which is obviously
unexpected. In GH #2765, this occured with httpclient and UNIX socket
combination, but later code analysis revealed that while disabling http
reuse on httpclient proxy helped, it didn't fix the underlying issue since
it was found that conn_calculate_hash_sockaddr() didn't take into account
families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the
sock_addr part of the connection wasn't hashed.
To properly fix the issue, let's explicly handle UNIX (both regular and
ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is
properly hashed. To prevent this bug from re-appearing: when the family
isn't known, instead of doing nothing like before, let's fall back to a
generic (unoptimal) hashing which hashes the whole sockaddr_storage struct
As a workaround, http-reuse may be disabled on impacted proxies.
(unfortunately this doesn't help for httpclient since reuse policy
defaults to safe and cannot be modified from the config)
It should be backported to all stable versions.
Shout out to @christopherhibbert for having reported the issue and
provided a trivial reproducer.
[ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s
prototype is slightly different]
Willy Tarreau [Tue, 22 Oct 2024 17:43:18 +0000 (19:43 +0200)]
MINOR: sample: add the "when" converter to condition some expressions
Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.
Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through or not based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.
A suggested example will be:
# log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
Willy Tarreau [Tue, 22 Oct 2024 14:57:03 +0000 (16:57 +0200)]
MINOR: filters: add per-filter call counters
The idea here is to record how many times a filter is being called on a
stream. We're incrementing the same counter all along, regardless of the
type of event, since the purpose is essentially to detect one that might
be misbehaving. The number of calls is reported in "show sess all" next
to the filter name. It may also help detect suboptimal processing. For
example compressing 1GB shows 138k calls to the compression filter, which
is roughly two calls per buffer. Maybe we wake up with incomplete buffers
and compress less. That's left for a future analysis.
Willy Tarreau [Tue, 22 Oct 2024 14:35:04 +0000 (16:35 +0200)]
MINOR: stream: maintain per-stream counters of the number of passes on code
Process_stream() is a complex function and a few times some lopos were
either witnessed or suspected. Each time this happens it's extremely
difficult to figure why because it involves combinations of analysers,
filters, errors etc.
Let's at least maintain a set of 4 counters per stream that report the
number of times we've been through each of the 4 most important blocks
(stconn changes, request analysers, response analysers, and propagation
of changes down). These ones are stored in the stream and reported in
"show sess all", just like they will be reported in panic dumps.
DEBUG: mux-h1: Add debug counters to track some errors
Debug counters are added to track errors about wrong the payload length
during the message formatting (on the sending path). Aborts are also
concerned. connection shutdowns and errors while the end of the message was
not reached are now tracked. On the sending path, shutdown performed while
all the message was not forwarded are tracked too.
DEBUG: stream: Add debug counters to track some client/server aborts
Not all aborts are tracked for now but only those a bit ambiguous. Mainly,
aborts during the data forwarding are concerned. Those triggered during the
request or the response analysis are easier to analyze with the stream
termination state.
BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose
When abortonclose option is enabled on the backend, at the SC level, we must
still pretend the SE have more data to deliver to be able to receive the
EOS. It must be performed at 2 places:
* When the backend is set and the connection is requested. It is when the
option is seen for the first time.
* After a receive attempt, if the EOI flag is set on the sedesc.
Otherwise, when an abort is detected by the mux, the SC is not
notified.
This patch should fix the issue #2764.
This bug probably exists in all stable version but is only visible since bca5e1423 ("OPTIM: stconn: Don't pretend mux have more data to deliver on
EOI/EOS/ERROR"). So I suggest to not backport it for now, except if the commit
above is backported.
BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
When data are sent via the zero-copy data forwarding, in h2_done_ff, we must
be sure to remove the H2 stream from the send list if something is send. It
was only performed if no blocking condition was encountered. But we must
also do it if something is sent. Otherwise the transfer may be blocked till
timeout.
BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
During the zero-copy data forwarding, the caller specify the maximum amount
of data the producer may push. However, the HTML stats applet does not use
it and can fill all the free space in the buffer. It is especially an issue
when the consumer is limited by a flow control, like the H2. Because we may
emit too large DATA frame in this case. It is especially visible with big
buffer (for instance 32k).
In the early age or zero-copy data forwarding, the caller was responsible to
pass a properly resized buffer. And during the different refactoring steps,
this has changed but the HTML stats applet was not updated accordingly.
To fix the bug, the buffer used to dump the HTML page is resized to be sure
not too much data are dumped.
This patch should solve the issue #2757. It must be backported to 3.0.
Willy Tarreau [Mon, 21 Oct 2024 16:51:55 +0000 (18:51 +0200)]
MINOR: debug: add "debug dev counters" to list code counters
Issuing "debug dev counters" on the CLI will now scan all existing
counters, and report their count, type, location, function name, the
condition and an optional comment passed to the macro.
The command takes a number of arguments:
- "show": this is the default, it will just list the counters
- "reset": will reset the matching counters instead of listing them
- "all": by default, only non-zero counters are listed. With "all",
they are all listed
- "bug": restrict the reset or dump to counters of type "BUG" (BUG_ON usually)
- "chk": restrict the reset or dump to counters of type "CHK" (CHECK_IF)
- "cnt": restrict the reset or dump to counters of type "CNT" (COUNT_IF)
The types may be cumulated, and the option entered in any order. Here's
an example of the output of "debug dev counters show all bug":
Willy Tarreau [Mon, 21 Oct 2024 16:34:21 +0000 (18:34 +0200)]
MINOR: debug: add a new debug macro COUNT_IF()
This macro works exactly like BUG_ON() except that it never logs anything
nor crashes, it only implements an atomic counter that is incremented on
every call. This can be used to count a number of unlikely events that are
worth checking at run time on setups showing unusual and unreproducible
behaviors.
Willy Tarreau [Mon, 21 Oct 2024 16:29:00 +0000 (18:29 +0200)]
MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
These macros do not always kill the process, and sometimes it would be
nice to know if some match or not, and how many times (especially for the
CHECK_IF one).
This commit adds a new section "dbg_cnt" made of structs that contain
function name, file name, line number, check type, condition and match
count. A newe macro __DBG_COUNT() adds one to the counter, and is placed
inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact
type of the check is not very precise but in practice we don't care,
as most checks will cause the process to die anyway unless they're of
type _BUG_ON_ONCE() (used by CHECK_IF by default).
All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're
creating a section, thus we need a modern linker to be able to scan
this section later. Doing so adds ~50kB to the executable due to the
~1266 BUG_ON() and others placed there. That's not huge in comparison
to the visibility it can provide.
Willy Tarreau [Mon, 21 Oct 2024 16:17:25 +0000 (18:17 +0200)]
CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
The BUG_ON() macros are made of two levels so as to resolve the condition
to a string. However this doesn't offer much flexibility for performing
other operations when the condition is validated, so let's adjust them so
that the condition is checked in the outer macro and the operations are
performed in the inner one.
Amaury Denoyelle [Fri, 18 Oct 2024 15:46:06 +0000 (17:46 +0200)]
BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
A stream may be shut without any HTX EOM reported to report a proper
closure. This is the case for QCS instances flagged with
QC_SF_UNKNOWN_PL_LENGTH. Shut is performed with an empty FIN emission
instead of a RESET_STREAM. This has been implemented since the following
patch :
However, in case of HTTP/3, an empty FIN should only be done after a
full message is emitted, which requires at least a HEADERS frame. If an
empty FIN is emitted without it, client may interpret this as invalid
and close the connection. To prevent this, fallback to a RESET_STREAM
emission if no data were emitted on the stream.
This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a
remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error
ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug
occurs.
Note that this change is incomplete. The message validity depends solely
on the application protocol in use. As such, a new app_ops callback
should be implemented to ensure the stream is closed accordingly.
However, this first patch ensures that at least HTTP/3 case is valid
while keeping a minimal backport process.
Amaury Denoyelle [Mon, 21 Oct 2024 08:28:18 +0000 (10:28 +0200)]
MINOR: mux-quic: simplify sending of empty STREAM FIN
An empty STREAM frame can be emitted by QUIC MUX to notify about a
delayed FIN when there is no data left to transmit. This requires a
tedious comparison on stream offset in qmux_ctrl_send() to ensure an
empty stream frame is not always considered as retransmitted, which is
necessary to locally close the QCS instance.
Simplify this by unsubscribe from streamdesc layer when the QCS is
locally closed on FIN transmission notification. This prevents all
future retransmitted frames to be reported to the QCS instance,
especially any potentially retransmitted empty FIN.
Before this patch, when wrong argument was provided in the configuration for
mworker-max-reloads keyword, parser shows these errors below on the stderr:
So, as 'mworker-max-reloads' is parsed in discovery mode by master process
let's align now its parser with all others, which could be called for this
mode. Like this in cases, when there are too many args or argument isn't a
valid integer we return proper error codes to global section parser and
messages are formated properly.
This fix should be backported in all stable versions.
Ilya Shipitsin [Thu, 17 Oct 2024 21:02:40 +0000 (23:02 +0200)]
CI: bump development builds explicitely to Ubuntu 24.04
Initially we agreed to split builds into "latest" for development branch
and fixed 22.04 for stable branches. It got broken when "latest" label migrated
from ubuntu-22 to ubuntu-24 ... because of build cache. Cache key is built using
runner label, it was not prepared to use the same "latest" cache from ubuntu 22
on ubuntu 24. To make things clear, let's stick explicitely to ubuntu 24.
Ilya Shipitsin [Thu, 17 Oct 2024 21:02:39 +0000 (23:02 +0200)]
CI: prepare Coverity build for Ubuntu 24
PCRE2 is recommended, PCRE was chosen for no reason. GHA Ubuntu 22 images include both libs,
but recent Ubuntu 24 does not. Let us prepare for Ubuntu 24
Willy Tarreau [Mon, 21 Oct 2024 02:17:59 +0000 (04:17 +0200)]
BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame
Commit cf3fe1eed ("MINOR: mux-h2/traces: print the size of the DATA
frames") added the size of the DATA frame to the traces. Unfortunately
it uses ullong instead of ulong to cast a pointer, which breaks the
build on 32-bit platforms. Let's just switch it to ulong which works
on both.
Willy Tarreau [Sat, 19 Oct 2024 12:53:11 +0000 (14:53 +0200)]
MEDIUM: debug: on panic, make the target thread automatically allocate its buf
One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.
Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:
(gdb) set $t=0
while ($t < global.nbthread)
printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
set $t=$t+1
end
For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continues to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.
This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:
MINOR: debug: make mark_tainted() return the previous value
MINOR: chunk: drop the global thread_dump_buffer
MINOR: debug: split ha_thread_dump() in two parts
MINOR: debug: slightly change the thread_dump_pointer signification
MINOR: debug: make ha_thread_dump_done() take the pointer to be used
MINOR: debug: replace ha_thread_dump() with its two components
Willy Tarreau [Sat, 19 Oct 2024 12:15:20 +0000 (14:15 +0200)]
MINOR: debug: replace ha_thread_dump() with its two components
At the few places we were calling ha_thread_dump(), now we're
calling separately ha_thread_dump_fill() and ha_thread_dump_done()
once the data are consumed.
Willy Tarreau [Sat, 19 Oct 2024 12:52:35 +0000 (14:52 +0200)]
MINOR: debug: make ha_thread_dump_done() take the pointer to be used
This will allow the caller to decide whether to definitely clear the
pointer and release the thread, or to leave it unlocked so that it's
easy to analyse from the struct (the goal will be to use that in panic()
so that cores are easy to analyse).
Willy Tarreau [Sat, 19 Oct 2024 11:53:39 +0000 (13:53 +0200)]
MINOR: debug: slightly change the thread_dump_pointer signification
Now the thread_dump_pointer is returned ORed with 1 once done, or NULL
when cancelled (for now noone cancels). The goal will be to permit
the callee to provide its own pointer.
The ha_thread_dump_fill() function now returns the buffer pointer that
was used (without OR 1) or NULL, for ease of use from the caller.
Willy Tarreau [Sat, 19 Oct 2024 11:45:57 +0000 (13:45 +0200)]
MINOR: debug: split ha_thread_dump() in two parts
We want to have a function to trigger the dump and another one to wait
for it to be completed. This will be important to permit panic dumps to
be done on local threads. For now this does not change anything, as the
function still calls the two new functions one after the other.
Willy Tarreau [Sat, 19 Oct 2024 13:18:44 +0000 (15:18 +0200)]
MINOR: chunk: drop the global thread_dump_buffer
This variable is not very useful and is confusing anyway. It was mostly
used to detect that a panic dump was still in progress, but we can now
check mark_tainted() for this. The pointer was set to one of the dumping
thread's trash chunks. Let's temporarily continue to copy the dumps to
that trash, we'll remove it later.