git.ipfire.org Git - thirdparty/haproxy.git/log


Olivier Houchard [Fri, 2 May 2025 11:46:54 +0000 (11:46 +0000)]

MEDIUM: stick-tables: defer adding updates to a tasklet

There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to a mt-list (with one mt-list per thread group, so that the
mt-list does not become the new point of contention that much), and
create a tasklet dedicated to adding updates to the tree, in batches, to
avoid keeping the update lock for too long.
This helps stick tables perform better under heavy load.
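
The idea of queueing updates per thread group and applying them in bounded batches can be sketched as follows. This is only an illustration, not HAProxy's code: it is single-threaded, uses a plain linked list instead of an mt-list, and the names NB_GROUPS, BATCH_SIZE, queue_update() and flush_updates() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NB_GROUPS  4   /* hypothetical number of thread groups */
#define BATCH_SIZE 3   /* hypothetical max number of updates applied per run */

struct update {
    int key;
    struct update *next;
};

/* One pending list per thread group, so that producers from different
 * groups never touch the same list head. */
static struct update *pending[NB_GROUPS];
static int tree_size;   /* stands in for the shared update tree */

/* Producer side: queue the update locally instead of touching the tree. */
static void queue_update(int group, struct update *u)
{
    assert(group >= 0 && group < NB_GROUPS);
    u->next = pending[group];
    pending[group] = u;
}

/* Consumer side: move at most BATCH_SIZE updates into the tree.
 * Returns non-zero if work remains and the caller should run again. */
static int flush_updates(void)
{
    int done = 0;

    for (int g = 0; g < NB_GROUPS; g++) {
        while (pending[g]) {
            if (done == BATCH_SIZE)
                return 1;   /* batch is full, leave the rest for later */
            pending[g] = pending[g]->next;
            tree_size++;    /* real code would insert under the update lock */
            done++;
        }
    }
    return 0;
}
```

Bounding the batch keeps each lock hold short; the consumer simply gets woken up again while work remains.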

Olivier Houchard [Fri, 2 May 2025 11:29:05 +0000 (11:29 +0000)]

MEDIUM: peers: Give up if we fail to take locks in hot path

In peer_send_msgs(), give up in order to retry later if we failed at
getting the update read lock.
Similarly, in __process_running_peer_sync(), give up and just reschedule
the task if we failed to get the peer lock. There is heavy contention
on both those locks, so we could spend a lot of time trying to get them.
This helps peers perform better under heavy load.
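
The "try the lock, give up and retry later" pattern can be sketched as below. This is a minimal illustration under assumptions: a C11 atomic_flag stands in for the real lock, and peer_lock, synced and try_sync_peer() are hypothetical names, not HAProxy symbols.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a heavily contended lock. */
static atomic_flag peer_lock = ATOMIC_FLAG_INIT;

static int synced;   /* counts how many times the work actually ran */

/* Returns true if the work was done, false if the lock was busy and the
 * caller should reschedule itself instead of waiting. */
static bool try_sync_peer(void)
{
    if (atomic_flag_test_and_set(&peer_lock))
        return false;               /* lock is busy: give up for now */
    synced++;                       /* critical section */
    atomic_flag_clear(&peer_lock);
    return true;
}
```

A caller that gets false would typically re-arm its task and return, so the thread can do other useful work in the meantime.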

Aurelien DARRAGON [Fri, 2 May 2025 11:56:08 +0000 (13:56 +0200)]

MINOR: hlua: ignore "tune.lua.bool-sample-conversion" if set after "lua-load"

tune.lua.bool-sample-conversion must be set before any lua-load or
lua-load-per-thread is used for it to be considered. Indeed, lua-load
directives are parsed on the fly and will cause some parts of the scripts
to be executed during init already (script body/init contexts).

As such, we cannot afford to have "tune.lua.bool-sample-conversion" set
after some Lua code was loaded, because it would mean that the setting
would be handled differently for Lua's code executed during or after
config parsing.

To avoid ambiguities, the documentation now states that the setting must
be set before any lua-load(-per-thread) directive, and if the setting
is met after some Lua was already loaded, the directive is ignored and
a warning informs about that.

It should fix GH #2957

It may be backported with 29b6d8af16 ("MINOR: hlua: rename
"tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"")

William Lallemand [Fri, 2 May 2025 10:03:21 +0000 (12:03 +0200)]

DOC: acme: external account binding is not supported

Add a note on external account binding in the ACME section.

Willy Tarreau [Fri, 2 May 2025 08:55:43 +0000 (10:55 +0200)]

CLEANUP: tasks: use the local state, not t->state, to check for tasklets

There's no point reading t->state to check for a tasklet after we've
atomically read the state into the local "state" variable. Not only is it
more expensive, it's also less clear whether that state is supposed to
be atomic or not. And in any case, tasks and tasklets have their type
forever and the one reflected in state is correct and stable.
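
The principle can be illustrated with C11 atomics: read the shared state once into a local variable, then run every check on that snapshot. The flag values and the struct below are hypothetical and do not match HAProxy's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define F_TASKLET 0x1u   /* hypothetical flag values */
#define F_KILLED  0x2u

struct task {
    _Atomic unsigned int state;
};

static bool is_killed_tasklet(struct task *t)
{
    /* One atomic read; every later test uses the local snapshot. */
    unsigned int state = atomic_load(&t->state);

    return (state & F_TASKLET) && (state & F_KILLED);
}
```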

Willy Tarreau [Fri, 2 May 2025 08:34:16 +0000 (10:34 +0200)]

BUG/MAJOR: tasks: fix task accounting when killed

After recent commit b81c9390f ("MEDIUM: tasks: Mutualize the TASK_KILLED
code between tasks and tasklets"), the task accounting was no longer
correct for killed tasks due to the decrement of tasks in list that was
no longer done, resulting in infinite loops in process_runnable_tasks().
This just illustrates that this code remains complex and should be further
cleaned up. No backport is needed, as this was in 3.2.

Olivier Houchard [Thu, 1 May 2025 12:39:39 +0000 (14:39 +0200)]

BUG/MEDIUM: quic: Let it be known if the tasklet has been released.

quic_conn_release() may, or may not, free the tasklet associated with
the connection. So make it return 1 if it was, and 0 otherwise, so that
if it was called from the tasklet handler itself, the said handler can
act accordingly and return NULL if the tasklet was destroyed.
This should be backported if 9240cd4a2771245fae4d0d69ef025104b14bfc23
is backported.
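
The contract described above can be sketched as follows. The types and the names conn_release() and io_handler() are simplified assumptions for the example, not the actual QUIC code.

```c
#include <stdlib.h>

struct tasklet { int calls; };   /* hypothetical, minimal */

struct conn {
    struct tasklet *tl;
    int must_free_tasklet;       /* whether release also frees the tasklet */
};

/* Releases the connection. Returns 1 if the tasklet was freed as well,
 * 0 if it is still alive. */
static int conn_release(struct conn *c)
{
    if (!c->must_free_tasklet)
        return 0;
    free(c->tl);
    c->tl = NULL;
    return 1;
}

/* Tasklet handler: it must return NULL once its own tasklet is gone, so
 * that the scheduler does not touch freed memory. */
static struct tasklet *io_handler(struct tasklet *tl, struct conn *c, int closing)
{
    tl->calls++;
    if (closing && conn_release(c))
        return NULL;             /* tl was freed, do not use it anymore */
    return tl;
}
```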

William Lallemand [Fri, 2 May 2025 08:34:48 +0000 (10:34 +0200)]

MINOR: acme: delay of 5s after the finalize

Give the server 5 seconds by default after the finalize to generate
the certificate. Some servers do not send a Retry-After during
processing.
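
A fallback of this kind could look like the sketch below. It assumes the delay comes from a Retry-After header when one is present, and it only handles the delta-seconds form. The function name next_delay_s() and the MAX_DELAY_S cap are hypothetical, not taken from the ACME client code.

```c
#include <stdlib.h>

#define DEFAULT_DELAY_S 5      /* default quoted in the commit message */
#define MAX_DELAY_S     3600   /* hypothetical sanity cap */

/* hdr: value of a Retry-After header, or NULL if the server sent none.
 * Returns the delay in seconds before the next request. Anything that is
 * not a plain number of seconds falls back to the default. */
static unsigned int next_delay_s(const char *hdr)
{
    char *end;
    unsigned long v;

    if (!hdr || *hdr < '0' || *hdr > '9')
        return DEFAULT_DELAY_S;
    v = strtoul(hdr, &end, 10);
    if (*end != '\0' || v > MAX_DELAY_S)
        return DEFAULT_DELAY_S;
    return (unsigned int)v;
}
```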

William Lallemand [Fri, 2 May 2025 08:23:42 +0000 (10:23 +0200)]

MINOR: acme: emit a log when starting

Emit an administrative log when starting the ACME client for a
certificate.

William Lallemand [Fri, 2 May 2025 08:18:24 +0000 (10:18 +0200)]

MINOR: acme: wait 5s before checking the challenges results

Wait 5 seconds before trying to check if the challenges are ready, to
give the server time to execute the challenges.

William Lallemand [Fri, 2 May 2025 08:16:12 +0000 (10:16 +0200)]

MINOR: acme: allow a delay after a valid response

Use the retryafter value to set a delay before doing the next request
when the previous response was valid.

William Lallemand [Fri, 2 May 2025 07:40:12 +0000 (09:40 +0200)]

MINOR: acme: change the default max retries to 5

Change the default max retries constant to 5 instead of 3.
Some servers can take a while to execute the challenge.

William Lallemand [Fri, 2 May 2025 07:27:15 +0000 (09:27 +0200)]

BUG/MINOR: acme: reinit the retries only at next request

The retries were reinitialized incorrectly: they must be reinitialized
only when we didn't retry. So any valid response would reinitialize the
number of retries.

William Lallemand [Fri, 2 May 2025 07:22:23 +0000 (09:22 +0200)]

MINOR: acme: does not leave task for next request

The next request was always leaving the task before initializing the
httpclient. This patch optimizes it by jumping to the next step at the
end of the current one. This way, only the httpclient does a
task_wakeup() to handle the response, and transitioning from a response
to the next request does not leave the task.

William Lallemand [Fri, 2 May 2025 07:15:07 +0000 (09:15 +0200)]

MINOR: acme: retry label always do a request

Doing a retry always results in initializing a request again, so set
ACME_HTTP_REQ directly in the label instead of doing it for each step.

Willy Tarreau [Wed, 30 Apr 2025 16:25:28 +0000 (18:25 +0200)]

[RELEASE] Released version 3.2-dev13

Released version 3.2-dev13 with the following main changes :
    - MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
    - MEDIUM: listener: Make sure we return the tasklet from accept_queue_process
    - MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
    - MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
    - MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
    - MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
    - BUG/MAJOR: tasklets: Make sure the tasklet can't run twice
    - BUG/MAJOR: listeners: transfer connection accounting when switching listeners
    - MINOR: ssl/cli: add a '-t' option to 'show ssl sni'
    - DOC: config: fix ACME paragraph rendering issue
    - DOC: config: clarify log-forward "host" option
    - MINOR: promex: expose ST_I_PX_RATE (current_session_rate)
    - BUILD: acme: use my_strndup() instead of strndup()
    - BUILD: leastconn: fix build warning when building without threads on old machines
    - MINOR: threads: prepare DEBUG_THREAD to receive more values
    - MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2
    - MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0
    - MINOR: threads/cli: display the lock history on "show threads"
    - MEDIUM: thread: set DEBUG_THREAD to 1 by default
    - BUG/MINOR: ssl/acme: free EVP_PKEY upon error
    - MINOR: acme: separate the code generating private keys
    - MINOR: acme: failure when no directory is specified
    - MEDIUM: acme: generate the account file when not found
    - MEDIUM: acme: use 'crt-base' to load the account key
    - MINOR: compiler: add more macros to detect macro definitions
    - MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags
    - MEDIUM: cli: make the prompt mode configurable between n/i/p
    - MEDIUM: mcli: make the prompt mode configurable between i/p
    - MEDIUM: mcli: replicate the current mode when entering the worker process
    - DOC: configuration: acme account key are auto generated
    - CLEANUP: acme: remove old TODO for account key
    - DOC: configuration: add quic4 to the ssl-f-use example
    - BUG/MINOR: acme: does not try to unlock after a failed trylock
    - BUG/MINOR: mux-h2: fix the offset of the pattern for the ping frame
    - MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides
    - BUG/MINOR: acme: creating an account should not end the task
    - MINOR: quic: rename min/max fields for congestion window algo
    - MINOR: quic: refactor BBR API
    - BUG/MINOR: quic: ensure cwnd limits are always enforced
    - MINOR: thread: define cshared type
    - MINOR: quic: account for global congestion window
    - MEDIUM: quic: limit global Tx memory
    - MEDIUM: acme: use a map to store tokens and thumbprints
    - BUG/MINOR: acme: remove references to virt@acme
    - MINOR: applet: add appctx_schedule() macro
    - BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers
    - CLEANUP: dns: remove unused dns_stream_server struct member
    - BUG/MINOR: dns: prevent ds accumulation within dss
    - CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases
    - DOC: ring: refer to newer RFC5424
    - MINOR: tools: make my_strndup() take a size_t len instead of an int
    - MINOR: Add "sigalg" to "sigalg name" helper function
    - MINOR: ssl: Add traces to ssl init/close functions
    - MINOR: ssl: Add traces to recv/send functions
    - MINOR: ssl: Add traces to ssl_sock_io_cb function
    - MINOR: ssl: Add traces around SSL_do_handshake call
    - MINOR: ssl: Add traces to verify callback
    - MINOR: ssl: Add ocsp stapling callback traces
    - MINOR: ssl: Add traces to the switchctx callback
    - MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback
    - MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx
    - BUG/MEDIUM: mux-spop: Wait end of handshake to declare a spop connection ready
    - BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame
    - BUG/MINOR: mux-h1: Don't pretend connection was released for TCP>H1>H2 upgrade
    - BUG/MINOR: mux-h1: Fix trace message in h1_destroy() to not rely on connection
    - BUILD: ssl: Fix wolfssl build
    - BUG/MINOR: mux-spop: Use the right bitwise operator in spop_ctl()
    - MEDIUM: mux-quic: increase flow-control on each bufsize
    - MINOR: mux-quic: limit emitted MSD frames count per qcs
    - MINOR: add hlua_yield_asap() helper
    - MINOR: hlua_fcn: enforce yield after *_get_stats() methods
    - DOC: config: restore default values for resolvers hold directive
    - MINOR: ssl/cli: "acme ps" shows the acme tasks
    - MINOR: acme: acme_ctx_destroy() returns upon NULL
    - MINOR: acme: use acme_ctx_destroy() upon error
    - MEDIUM: tasks: Mutualize code between tasks and tasklets.
    - MEDIUM: tasks: More code factorization
    - MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead.
    - MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list
    - MEDIUM: tasks: Mutualize the TASK_KILLED code between tasks and tasklets
    - BUG/MEDIUM: connections: Report connection closing in conn_create_mux()
    - BUILD/MEDIUM: quic: Make sure we build with recent changes

Olivier Houchard [Wed, 30 Apr 2025 16:22:46 +0000 (18:22 +0200)]

BUILD/MEDIUM: quic: Make sure we build with recent changes

TASK_IN_LIST has been changed to TASK_QUEUED, but one was missed in
quic_conn.c, so fix that.

Olivier Houchard [Wed, 30 Apr 2025 11:19:38 +0000 (13:19 +0200)]

BUG/MEDIUM: connections: Report connection closing in conn_create_mux()

Add an extra parameter to conn_create_mux(), "closed_connection".
If a pointer is provided, it is used to report whether the connection
was closed. Callers have no other way to determine that, and we need to
know it at least in ssl_sock_io_cb(): if the connection was closed, the
tasklet was freed, so we must return NULL, otherwise this can lead to
memory corruption and crashes.
This should be backported if 9240cd4a2771245fae4d0d69ef025104b14bfc23
is backported too.
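
An optional out-parameter of this kind can be sketched as follows. The function create_mux() and its "fail" argument are hypothetical simplifications, not the real conn_create_mux() prototype.

```c
#include <stddef.h>

struct conn { int alive; };   /* hypothetical, minimal */

/* Returns 0 on success, -1 on error. If closed is non-NULL, *closed is
 * set to 1 when the connection was closed during the call, else to 0.
 * Callers that do not care may pass NULL. */
static int create_mux(struct conn *c, int fail, int *closed)
{
    if (closed)
        *closed = 0;
    if (fail) {
        c->alive = 0;          /* the connection is gone */
        if (closed)
            *closed = 1;
        return -1;
    }
    return 0;
}
```

A tasklet handler calling such a function would check the flag and return NULL when it is set, since its own tasklet no longer exists.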

Olivier Houchard [Tue, 29 Apr 2025 13:46:20 +0000 (15:46 +0200)]

MEDIUM: tasks: Mutualize the TASK_KILLED code between tasks and tasklets

The code to handle a task/tasklet when it's been killed before it could
run is mostly identical, so move it out of the task- and tasklet-specific
code and into the common code.

This commit is just cosmetic, and should have no impact.

Olivier Houchard [Tue, 29 Apr 2025 13:25:53 +0000 (15:25 +0200)]

MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list

Remove tasklet_remove_from_tasklet_list, as the function hasn't been
used for a long time, and there is little reason to keep it.

Olivier Houchard [Tue, 29 Apr 2025 13:24:54 +0000 (15:24 +0200)]

MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead.

TASK_QUEUED was used to mean "the task has been scheduled to run",
TASK_IN_LIST was used to mean "the tasklet has been scheduled to run",
remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead.

This commit is just cosmetic, and should not have any impact.

Olivier Houchard [Tue, 29 Apr 2025 13:15:27 +0000 (15:15 +0200)]

MEDIUM: tasks: More code factorization

There is some code that should run no matter if the task was killed or
not, and was needlessly duplicated, so only use one instance.
This also fixes a small bug when a tasklet that got killed before it
could run would still count as a tasklet that ran, when it should not,
which just means that we'd run one less useful task before going back to
the poller.
This commit is mostly cosmetic, and should not have any impact.

Olivier Houchard [Tue, 29 Apr 2025 13:15:01 +0000 (15:15 +0200)]

MEDIUM: tasks: Mutualize code between tasks and tasklets.

The code that checks if we're currently running, and waits if so, was
identical between tasks and tasklets, so move it in code common to tasks
and tasklets.
This commit is just cosmetic, and should not have any impact.

William Lallemand [Wed, 30 Apr 2025 15:18:46 +0000 (17:18 +0200)]

MINOR: acme: use acme_ctx_destroy() upon error

Use acme_ctx_destroy() instead of a simple free() upon error in the
"acme renew" error handling.

It's better to use this function to be sure that everything has been
freed.

William Lallemand [Wed, 30 Apr 2025 15:17:58 +0000 (17:17 +0200)]

MINOR: acme: acme_ctx_destroy() returns upon NULL

acme_ctx_destroy() returns when its argument is NULL.

William Lallemand [Wed, 30 Apr 2025 13:49:53 +0000 (15:49 +0200)]

MINOR: ssl/cli: "acme ps" shows the acme tasks

Implement a way to display the running acme tasks over the CLI.

It currently only displays a "Running" status with the certificate name
and the acme section from the configuration.

The displayed running tasks are limited to the size of a buffer for now,
it will require a backref list later to be called multiple times to
resume the list.

Aurelien DARRAGON [Wed, 30 Apr 2025 14:56:00 +0000 (16:56 +0200)]

DOC: config: restore default values for resolvers hold directive

Default values for hold directive (resolver context) used to be documented
but this was lost when the keyword description was reworked in 24b319b
("Default value is 10s for "valid", 0s for "obsolete" and 30s for
others.")

Restoring the part that describes the default value.

It may be backported to all stable versions with 24b319b

Aurelien DARRAGON [Wed, 30 Apr 2025 14:41:16 +0000 (16:41 +0200)]

MINOR: hlua_fcn: enforce yield after *_get_stats() methods

{listener,proxy,server}_get_stats() methods are known to be expensive,
especially if used under an iteration. Indeed, while automatic yield
is performed every X lua instructions (defaults to 10k), computing an
object's stats 10K times in a single cpu loop is not desirable and
could create contention.

In this patch we leverage hlua_yield_asap() at the end of *_get_stats()
methods in order to force the automatic yield to occur ASAP after the
method returns. Hopefully this should help in similar scenarios as the
one described in GH #2903

Aurelien DARRAGON [Wed, 30 Apr 2025 14:37:56 +0000 (16:37 +0200)]

MINOR: add hlua_yield_asap() helper

When called, this function will try to enforce a yield (if available) as
soon as possible. Indeed, automatic yield is already enforced every X
Lua instructions. However, there may be some cases where we know after
running a heavy operation that we should yield right away to avoid taking
too much CPU at once.

This is what this function offers, instead of asking the user to manually
yield using "core.yield()" from Lua itself after using an expensive
Lua method offered by haproxy, we can directly enforce the yield without
the need to do it in the Lua script.
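
The mechanism can be pictured with a toy model of an instruction budget. This is not the Lua or HAProxy implementation: struct vm, vm_step() and yield_asap() are hypothetical names, and the only value taken from the log is the default interval of 10k instructions mentioned for automatic yields.

```c
#define YIELD_INTERVAL 10000u   /* automatic yield every 10k instructions */

struct vm {
    unsigned int budget;   /* instructions left before the next forced yield */
    unsigned int yields;   /* number of forced yields so far */
};

/* Executes one instruction and yields when the budget is exhausted. */
static void vm_step(struct vm *vm)
{
    if (--vm->budget == 0) {
        vm->yields++;
        vm->budget = YIELD_INTERVAL;
    }
}

/* Called by an expensive native function: shrink the budget so that the
 * very next instruction triggers the yield. */
static void yield_asap(struct vm *vm)
{
    vm->budget = 1;
}
```

With this model, a script calling an expensive method in a loop yields after each call instead of after 10k instructions.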

Amaury Denoyelle [Mon, 28 Apr 2025 13:36:44 +0000 (15:36 +0200)]

MINOR: mux-quic: limit emitted MSD frames count per qcs

The previous commit implemented a new calculation method for
MAX_STREAM_DATA frame emission. Now, a frame may be emitted as soon as a
buffer was consumed by a QCS instance.

This will probably increase the number of MAX_STREAM_DATA frames
emitted. Under high load, it may even cause a series of frames to be
emitted for the same stream with increasing values, which is completely
unnecessary.

To improve this, limit the number of MAX_STREAM_DATA frames built to one
per QCS instance. This is implemented by storing a reference to this
frame in QCS structure via a new member <tx.msd_frm>.

Note that to properly reset QCS msd_frm member, emission of flow-control
frames has been changed. Now, each frame is emitted individually. On
one side, this is better as it prevents emitting frames related to
different streams in a single datagram, which is not desirable in case of
packet loss. However, this can also increase sendto() syscall invocation.

Amaury Denoyelle [Wed, 19 Mar 2025 15:09:08 +0000 (16:09 +0100)]

MEDIUM: mux-quic: increase flow-control on each bufsize

Recently, the QCS Rx buffer allocation method has been improved. It is now
possible to allocate multiple buffers per QCS instance, which was
necessary to improve HTTP/3 POST throughput.

However, a limitation remained related to the emission of
MAX_STREAM_DATA. These frames are only emitted once at least half of the
receive capacity has been consumed by its QCS instance. This may be too
restrictive when a client needs to upload a large payload.

Improve this by adjusting MAX_STREAM_DATA allocation. If QCS capacity is
still limited to 1 or 2 buffers max, the old calculation is still used. This
is necessary when the user has limited upload throughput via their
configuration. If QCS capacity is more than 2 buffers, a new frame is
emitted if at least one buffer was consumed.

This patch has reduced the number of STREAM_DATA_BLOCKED frames received in
POST tests with some specific clients.
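
The two emission rules described in this message can be written as a small predicate. This is a sketch under assumptions: the function name should_emit_msd() and its parameters are hypothetical, and all sizes are expressed in bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* capacity: receive window granted to the stream
 * consumed: bytes consumed since the last MAX_STREAM_DATA frame
 * bufsize:  size of one buffer */
static bool should_emit_msd(size_t capacity, size_t consumed, size_t bufsize)
{
    if (capacity <= 2 * bufsize)
        return 2 * consumed >= capacity;   /* old rule: half of the window */
    return consumed >= bufsize;            /* new rule: one full buffer */
}
```

For a window of 8 buffers, the old rule alone would wait for 4 consumed buffers, whereas the new rule allows a frame after the first one.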

Christopher Faulet [Wed, 30 Apr 2025 13:58:53 +0000 (15:58 +0200)]

BUG/MINOR: mux-spop: Use the right bitwise operator in spop_ctl()

Because of a typo, '||' was used instead of '|' to test the SPOP connection
flags and decide if the mux is ready or not. The regression was introduced
in the commit fd7ebf117 ("BUG/MEDIUM: mux-spop: Wait end of handshake to
declare a spop connection ready").

This patch must be backported to 3.1 with the commit above.
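
The effect of such a typo is easy to reproduce in isolation. The flag names and values below are hypothetical and only serve the illustration; they are not the SPOP mux flags.

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define FL_ERROR     0x01u
#define FL_SHUT      0x02u
#define FL_HANDSHAKE 0x04u

/* Buggy: (FL_ERROR || FL_SHUT || FL_HANDSHAKE) is the logical value 1,
 * so only bit 0 of flags is actually tested. */
static bool not_ready_buggy(unsigned int flags)
{
    return (flags & (FL_ERROR || FL_SHUT || FL_HANDSHAKE)) != 0;
}

/* Fixed: the bitwise OR builds the intended mask 0x07. */
static bool not_ready_fixed(unsigned int flags)
{
    return (flags & (FL_ERROR | FL_SHUT | FL_HANDSHAKE)) != 0;
}
```

Some compilers warn about a logical operator applied to constant operands, which is a useful hint for catching this class of bug.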

Remi Tricot-Le Breton [Wed, 30 Apr 2025 13:26:30 +0000 (15:26 +0200)]

BUILD: ssl: Fix wolfssl build

The newly added SSL traces require an extra 'conn' parameter to
ssl_sock_chose_sni_ctx which was added in the "regular" code but not in
the wolfssl specific one.
Wolfssl also has a different prototype for some getter functions
(SSL_get_servername for instance), which do not expect a const SSL while
openssl version does.

Christopher Faulet [Wed, 30 Apr 2025 12:32:16 +0000 (14:32 +0200)]

BUG/MINOR: mux-h1: Fix trace message in h1_destroy() to not rely on connection

h1_destroy() may be called to release a H1C after a multiplexer upgrade. In
that case, the connection is no longer attached to the H1C. It must not be
used in the h1 trace message because the connection context is no longer a H1C.

Because of this bug, when a H1>H2 upgrade is performed, a crash may be
experienced if the H1 traces are enabled.

This patch must be backported to all stable versions.

Christopher Faulet [Wed, 30 Apr 2025 12:16:42 +0000 (14:16 +0200)]

BUG/MINOR: mux-h1: Don't pretend connection was released for TCP>H1>H2 upgrade

When an applicative upgrade of the H1 multiplexer is performed, we must not
pretend the connection was released. Indeed, in that case, a H1 stream is
still there with a stream connector attached to it. It must be detached
first before releasing the H1 connection and the underlying connection. So
it is important to not pretend the connection was already released.

Concretely, in that case h1_process() must return 0 instead of -1. It is
a minor error because, AFAIK, it is harmless. But it is not correct. So let's
fix it to avoid future bugs.

To be clear, this happens when a TCP connection is upgraded to H1 connection
and a H2 preface is detected, leading to a second upgrade from H1 to H2.

This patch may be backported to all stable versions.

Christopher Faulet [Mon, 28 Apr 2025 06:08:06 +0000 (08:08 +0200)]

BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame

In the SPOE specification, when an error occurs on the SPOP connection,
HAProxy must send a DISCONNECT frame and wait for the agent DISCONNECT frame
in return before truly closing the connection.

However, this part was not properly handled by the SPOP multiplexer. In this
case, the SPOP connection should be in the CLOSING state. But this state was
not used at all. Depending on when the error was encountered, the connection
could be closed immediately, without sending any DISCONNECT frame. It was
the case when an early error was detected during the AGENT-HELLO frame
parsing. Or it could be moved from ERROR to FRAME_H state, as if no error
were detected. This case was less dramatic than it seemed because some flags
were also set to prevent any problem. But it was not obvious.

So now, the SPOP connection is properly switched to the CLOSING state when a
DISCONNECT is sent to the agent, to be able to wait for its DISCONNECT in
reply. spop_process_demux() was updated to parse frames in that state and
some validity checks were added.

This patch must be backported to 3.1.

Christopher Faulet [Mon, 28 Apr 2025 06:01:40 +0000 (08:01 +0200)]

BUG/MEDIUM: mux-spop: Wait end of handshake to declare a spop connection ready

A SPOP connection must not be considered as ready while the hello handshake
is not finished with success. In addition, no error or shutdown must have
been reported for the underlying connection. Otherwise a freshly opened
spop connection may be reused while it is in fact dead, leading to a
connection retry.

This patch must be backported to 3.1.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:58 +0000 (17:26 +0200)]

MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx

This is only useful in the traces, the conn parameter won't be used
otherwise.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:57 +0000 (17:26 +0200)]

MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback

We had to parse the sigAlg extension by hand in order to properly select
the certificate used by the SSL frontends. These traces make it possible
to dump the allowed sigAlg list sent by the client in its clientHello.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:56 +0000 (17:26 +0200)]

MINOR: ssl: Add traces to the switchctx callback

This callback picks the certificate used on an SSL frontend.
The certificate selection is made according to the information sent by
the client in the clientHello. The traces that were added will help to
better understand what certificate was chosen and why. It will also warn
us if the chosen certificate was the default one.
The actual certificate parsing happens in ssl_sock_chose_sni_ctx. It's
in this function that we actually get the filename of the certificate
used.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:55 +0000 (17:26 +0200)]

MINOR: ssl: Add ocsp stapling callback traces

If OCSP stapling fails because of a missing or invalid OCSP response we
used to silently disable stapling for the given session. We can now know
a bit more what happened regarding OCSP stapling.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:54 +0000 (17:26 +0200)]

MINOR: ssl: Add traces to verify callback

Those traces show which errors were met during certificate
chain validation as well as which ones were ignored.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:53 +0000 (17:26 +0200)]

MINOR: ssl: Add traces around SSL_do_handshake call

Those traces dump information about the multiple SSL_do_handshake calls
(renegotiation and regular call). Some errors could also be dumped in
case of rejected early data.
Depending on the chosen verbosity, some information about the current
handshake can be dumped as well (servername, tls version, chosen cipher
for instance).
In case of failed handshake, the error codes and messages will also be
dumped in the log to ease debugging.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:52 +0000 (17:26 +0200)]

MINOR: ssl: Add traces to ssl_sock_io_cb function

Add new SSL traces.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:51 +0000 (17:26 +0200)]

MINOR: ssl: Add traces to recv/send functions

Those traces make it possible to identify sessions on which early data is used
as well as some forcefully closed connections.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:50 +0000 (17:26 +0200)]

MINOR: ssl: Add traces to ssl init/close functions

Add a dedicated trace for some unlikely allocation failures and async
errors. Those traces will mostly be used to identify the start and end of
a given SSL connection.

Remi Tricot-Le Breton [Fri, 18 Apr 2025 15:26:49 +0000 (17:26 +0200)]

MINOR: Add "sigalg" to "sigalg name" helper function

This function can be used to convert a TLSv1.3 sigAlg entry (2 bytes)
from the signature_algorithms client hello extension into a string.

In order to ease debugging, some TLSv1.2 combinations can also be
dumped. In TLSv1.2 those signature algorithm pairs were built out of a
one-byte signature identifier combined with a one-byte hash identifier.
In TLSv1.3 those identifiers are two-byte blocks that must be treated as
such.
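
A lookup of this kind can be sketched as follows. The function name sigalg_name() and the two small helpers are hypothetical and this is not the helper added by the commit. The code points come from the TLS specifications (RFC 8446 for TLSv1.3 schemes, RFC 5246 for the TLSv1.2 hash/signature pairs), and only a handful of them are listed.

```c
#include <stdint.h>

/* Returns a name for a 2-byte signature algorithm code point, or
 * "unknown" if it is not in this (deliberately short) table. */
static const char *sigalg_name(uint16_t sigalg)
{
    switch (sigalg) {
    /* TLSv1.3 SignatureScheme values, treated as opaque 2-byte IDs */
    case 0x0401: return "rsa_pkcs1_sha256";
    case 0x0403: return "ecdsa_secp256r1_sha256";
    case 0x0804: return "rsa_pss_rsae_sha256";
    case 0x0807: return "ed25519";
    /* TLSv1.2-only pair: hash 0x04 (sha256) + signature 0x02 (dsa) */
    case 0x0402: return "dsa_sha256";
    default:     return "unknown";
    }
}

/* TLSv1.2 view of the same 16 bits: the first byte on the wire is the
 * hash identifier, the second one is the signature identifier. */
static inline uint8_t sigalg_tls12_hash(uint16_t sigalg)
{
    return (uint8_t)(sigalg >> 8);
}

static inline uint8_t sigalg_tls12_sign(uint16_t sigalg)
{
    return (uint8_t)(sigalg & 0xff);
}
```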

Willy Tarreau [Wed, 30 Apr 2025 03:15:22 +0000 (05:15 +0200)]

MINOR: tools: make my_strndup() take a size_t len instead of an int

In relation to issue #2954, it appears that converting some size_t length
calculations to the int that my_strndup() takes upsets coverity a bit.
Instead of dealing with such warnings each time, better address it at
the root. An inspection of all call places shows that the size passed
there is always positive so we can safely use an unsigned type, and
size_t will always suit it like for strndup() where it's available.
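
For illustration, a strndup()-like helper taking a size_t length could look like the sketch below. It is not HAProxy's my_strndup() implementation; the name dup_n() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicates at most n bytes of src into a new NUL-terminated string.
 * Returns NULL on allocation failure; the caller must free() the result.
 * Taking a size_t avoids any narrowing when callers compute the length
 * from pointer differences or strlen(). */
static char *dup_n(const char *src, size_t n)
{
    size_t len = 0;
    char *ret;

    while (len < n && src[len])
        len++;
    ret = malloc(len + 1);
    if (!ret)
        return NULL;
    memcpy(ret, src, len);
    ret[len] = '\0';
    return ret;
}
```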

Lukas Tribus [Mon, 28 Apr 2025 12:07:31 +0000 (12:07 +0000)]

DOC: ring: refer to newer RFC5424

In the ring configuration example we refer to RFC3164 - the original BSD
syslog protocol without support for structured data (SDATA).

Let's refer to RFC5424 instead so SDATA is by default forwarded if
someone copy & pastes from the documentation:

https://discourse.haproxy.org/t/structured-data-lost-when-forwarding-logs-voa-syslog-forwarding-feature/11741/5

Should be backported to 2.6.

Aurelien DARRAGON [Mon, 28 Apr 2025 14:52:33 +0000 (16:52 +0200)]

CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases

Since 91e785edc ("MINOR: stream: Rely on a per-stream max connection
retries value"), px->conn_retries may be ignored in the following cases:

* proxy not part of a list which gets properly post-init (ie: main proxy
list, log-forward list, sink list)
* proxy lacking the CAP_FE capability

Documenting such cases where the px->conn_retries is set but effectively
ignored, so that we either remove ignored statements or fix them in
the future if they are really needed. In fact all cases affected here are
autonomous applets that already handle the retries themselves so the fact
that 91e785edc made ->conn_retries ineffective should not be a big deal
anyway.

Aurelien DARRAGON [Tue, 29 Apr 2025 08:22:38 +0000 (10:22 +0200)]

BUG/MINOR: dns: prevent ds accumulation within dss

When the dns session callback (dns_session_release()) is called upon error
(ie: when some pending queries were not sent), we try our best to
re-create the applet in order to preserve the pending queries and give
them a chance to be retried. This is done at the end of
dns_session_release().

However, doing so exposes us to an issue: if the error preventing queries
from being sent is still encountered over and over, the dns session could
stay there indefinitely. Meanwhile, other dns sessions may be created on
the same dns_stream_server periodically. If previous failing dns sessions
don't terminate but we also keep creating new ones, we end up accumulating
failing sessions on a given dns_stream_server, which can eventually cause
resource shortage.

This issue was found when trying to address ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers")

To fix it, we track the number of failed consecutive sessions for a given
dns server. When we reach the threshold (set to 100), we consider that the
link to the dns server is broken (at least temporarily) and we force
dns_session_new() to fail, so that we stop creating new sessions until one
of the existing ones eventually succeeds.

A workaround for this issue consists in setting the "maxconn" parameter on
the nameserver directive (under the resolvers section) to a reasonable value
so that no more than "maxconn" sessions may co-exist on the same server at
a given time.

This may be backported to all stable versions.
("CLEANUP: dns: remove unused dns_stream_server struct member") may be
backported to ease the backport.
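
The failure accounting described in this message can be sketched as follows. Only the threshold of 100 comes from the commit message; the struct and function names are hypothetical and do not match the DNS code.

```c
#include <stdbool.h>

#define MAX_CONSECUTIVE_FAILURES 100u   /* threshold from the commit message */

struct dns_server_state {               /* hypothetical, minimal */
    unsigned int consecutive_failures;
};

/* New sessions are refused while the link is considered broken. */
static bool may_create_session(const struct dns_server_state *s)
{
    return s->consecutive_failures < MAX_CONSECUTIVE_FAILURES;
}

/* Called when a session ends: a success resets the counter, a failure
 * increments it up to the threshold. */
static void session_ended(struct dns_server_state *s, bool success)
{
    if (success)
        s->consecutive_failures = 0;
    else if (s->consecutive_failures < MAX_CONSECUTIVE_FAILURES)
        s->consecutive_failures++;
}
```

Existing sessions keep retrying, so a single success is enough to reopen the gate.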

Aurelien DARRAGON [Tue, 29 Apr 2025 14:48:28 +0000 (16:48 +0200)]

CLEANUP: dns: remove unused dns_stream_server struct member

dns_stream_server "max_slots" is unused, let's get rid of it

Aurelien DARRAGON [Tue, 29 Apr 2025 18:13:00 +0000 (20:13 +0200)]

BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers

As reported by Lukas Tribus on the mailing list [1], trying to connect to
a nameserver with invalid network settings causes haproxy to retry a new
connection attempt immediately which eventually causes unexpected CPU usage
on the thread responsible for the applet (namely 100% on one CPU will be
observed).

This can be reproduced with the test config below:

resolvers default
  nameserver ns1 tcp4@8.8.8.8:53 source 192.168.99.99
listen listen
  mode http
  bind :8080
  server s1 www.google.com resolvers default init-addr none

To fix the issue, we add a delay of one second before a new connection
attempt is retried. We do this in dns_session_create() when we know that
the applet was created in the release callback (when the previous query
attempt was unsuccessful), which means the initial connection is not
affected.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45665.html

This should fix GH #2909 and may be backported to all stable versions.
This patch depends on ("MINOR: applet: add appctx_schedule() macro")

commit | commitdiff | tree

Aurelien DARRAGON [Mon, 28 Apr 2025 16:03:36 +0000 (18:03 +0200)]

MINOR: applet: add appctx_schedule() macro

Just like task_schedule() but for applets: it wakes up an applet at a
specific time and leverages _task_schedule() internally.

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 14:35:35 +0000 (16:35 +0200)]

BUG/MINOR: acme: remove references to virt@acme

"virt@acme" was the default map used during development. The map must now
be configured in the acme section, otherwise no map will be used.

This patch removes the references to virt@acme in the comments and the
code.

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 14:08:31 +0000 (16:08 +0200)]

MEDIUM: acme: use a map to store tokens and thumbprints

The stateless mode which was documented previously in the ACME example
is not convenient for all use cases.

First, when HAProxy generates the account key itself, you cannot put the
thumbprint in the configuration beforehand, so you have to retrieve the
thumbprint and then reload.
Second, when you are using multiple account keys, there are multiple
thumbprints, and it's not easy to know which one to use when responding
to the challenger.

This patch allows configuring a map in the acme section, which will be
filled by the acme task with the token corresponding to the challenge
as the key, and the thumbprint as the value. This way it's easy to reply
with the right thumbprint.

Example:
http-request return status 200 content-type text/plain lf-string "%[path,field(-1,/)].%[path,field(-1,/),map(virt@acme)]\n" if { path_beg '/.well-known/acme-challenge/' }

commit | commitdiff | tree

Amaury Denoyelle [Tue, 29 Apr 2025 09:39:42 +0000 (11:39 +0200)]

MEDIUM: quic: limit global Tx memory

Define a new setting tune.quic.frontend.max-tot-window. It takes a
size argument which can be used to set a limit on the sum of the
congestion windows of all QUIC connections. This is applied both in
quic_cc_path_set() and quic_cc_path_inc().

Note that this limitation cannot reduce a congestion window more than
the minimal limit which is set to 2 datagrams.

commit | commitdiff | tree

Amaury Denoyelle [Mon, 28 Apr 2025 06:52:43 +0000 (08:52 +0200)]

MINOR: quic: account for global congestion window

Use the newly defined cshared type to account for the sum of the
congestion windows of all QUIC connections. This value is stored in the
global counter quic_mem_global defined in the proto_quic module.

commit | commitdiff | tree

Amaury Denoyelle [Fri, 25 Apr 2025 09:37:07 +0000 (11:37 +0200)]

MINOR: thread: define cshared type

Define a new type "struct cshared". This can be used as a tool to
manipulate a global counter with thread-safety ensured. Each thread
would declare its thread-local cshared type, which would point to a
global counter.

Each thread can then add/subtract a value to its own thread-local
cshared instance via cshared_add(). If the difference exceeds a
configured limit, either positively or negatively, the global counter is
updated and the thread-local instance is reset to 0. Each thread can
safely read the global counter value using cshared_read().
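
The batching idea can be sketched as below. This is a minimal illustration
using C11 atomics, not the actual implementation: the structure layout and
the "_sketch" names are assumptions, and only the behaviour described above
(local accumulation, flush once the limit is exceeded) is reproduced.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout; the real "struct cshared" may differ. */
struct cshared_sketch {
    _Atomic int64_t *global; /* shared counter, updated atomically */
    int64_t diff;            /* thread-local delta not yet published */
    int64_t lim;             /* flush once |diff| exceeds this value */
};

static void cshared_sketch_init(struct cshared_sketch *c,
                                _Atomic int64_t *global, int64_t lim)
{
    c->global = global;
    c->diff = 0;
    c->lim = lim;
}

/* Accumulate locally; publish to the global counter past the limit. */
static void cshared_sketch_add(struct cshared_sketch *c, int64_t val)
{
    c->diff += val;
    if (c->diff > c->lim || c->diff < -c->lim) {
        atomic_fetch_add(c->global, c->diff);
        c->diff = 0;
    }
}

/* Read the global value; deltas still held locally are not included. */
static int64_t cshared_sketch_read(const struct cshared_sketch *c)
{
    return atomic_load(c->global);
}
```

The trade-off of such a design is that the global value may lag behind the
true sum by up to the limit per thread, in exchange for far fewer atomic
operations on the shared cache line.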

commit | commitdiff | tree

Amaury Denoyelle [Mon, 20 Jan 2025 15:24:21 +0000 (16:24 +0100)]

BUG/MINOR: quic: ensure cwnd limits are always enforced

The congestion window is limited by minimum and maximum values which can
never be exceeded. The min value is hardcoded to 2 datagrams as
recommended by the specification. The max value is specified via the
haproxy configuration.

These values must be respected each time the congestion window size is
adjusted. However, on some rare occasions, limits were not always
enforced. Fix this by implementing wrappers to set or increment the
congestion window. These functions ensure limits are always applied
after the operation.

Additionally, the wrappers also ensure that if the window reaches a new
maximum value, it is saved in the <cwnd_last_max> field.
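
A minimal sketch of such wrappers is shown below. The structure and the
function names are hypothetical; the field names follow the
<limit_min>/<limit_max>/<cwnd_last_max> naming mentioned elsewhere in this
log, and the real code may be organised differently.

```c
#include <stdint.h>

/* Hypothetical path state holding the window and its enforced limits. */
struct cc_path_sketch {
    uint64_t cwnd;          /* current congestion window */
    uint64_t limit_min;     /* enforced lower bound */
    uint64_t limit_max;     /* enforced upper bound */
    uint64_t cwnd_last_max; /* highest value reached so far */
};

/* Set the window, clamping it to [limit_min, limit_max]. */
static void cc_path_set_sketch(struct cc_path_sketch *p, uint64_t val)
{
    if (val < p->limit_min)
        val = p->limit_min;
    if (val > p->limit_max)
        val = p->limit_max;
    p->cwnd = val;

    /* Remember the highest window reached so far. */
    if (p->cwnd > p->cwnd_last_max)
        p->cwnd_last_max = p->cwnd;
}

/* Increment the window without overflowing, then apply the limits. */
static void cc_path_inc_sketch(struct cc_path_sketch *p, uint64_t val)
{
    uint64_t next = (val > UINT64_MAX - p->cwnd) ? UINT64_MAX
                                                 : p->cwnd + val;

    cc_path_set_sketch(p, next);
}
```

Routing every modification through these two functions is what guarantees
that no code path can leave the window outside its limits.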

This should be backported up to 2.6, after a brief period of
observation.

commit | commitdiff | tree

Amaury Denoyelle [Mon, 28 Apr 2025 07:22:37 +0000 (09:22 +0200)]

MINOR: quic: refactor BBR API

Write minor adjustments to QUIC BBR functions. The objective is to
centralize every modification of path cwnd field.

No functional change. This patch will be useful to simplify
implementation of global QUIC Tx memory usage limitation.

commit | commitdiff | tree

Amaury Denoyelle [Thu, 23 Jan 2025 09:47:57 +0000 (10:47 +0100)]

MINOR: quic: rename min/max fields for congestion window algo

There was some possible confusion between fields related to congestion
window size min and max limit which cannot be exceeded, and the maximum
value previously reached by the window.

Fix this by adopting a new naming scheme. Enforced limits are now renamed
<limit_max>/<limit_min>, while the previously reached max value is
renamed <cwnd_last_max>.

This should be backported up to 3.1.

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 12:09:46 +0000 (14:09 +0200)]

BUG/MINOR: acme: creating an account should not end the task

The account creation was mistakenly ending the task instead of letting it
be woken up for the NewOrder state. This prevented the creation of the
certificate, although the account was correctly created.

To fix this, only the jump to the end label needs to be removed: the
standard leaving codepath of the function allows the task to be woken up.

No backport needed.

commit | commitdiff | tree

Willy Tarreau [Tue, 29 Apr 2025 09:43:46 +0000 (11:43 +0200)]

MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides

TCP_NOTSENT_LOWAT is very convenient as it indicates when to report
EAGAIN on the sending side. It takes a margin on top of the estimated
window, meaning that it's no longer necessary to store too much data in
socket buffers. Instead there's just enough to fill the send window
and a little bit of margin to cover the scheduling time to restart
sending. Experiments on a 100ms network have shown a 10-fold reduction
in the memory used by socket buffers by just setting this value to
tune.bufsize, without noticing any performance degradation. Theoretically
the responsiveness on multiplexed protocols such as H2 should also be
improved.
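
For reference, on systems exposing it the option is set with setsockopt()
at the IPPROTO_TCP level. The helper below is a small sketch; its name is
hypothetical and it does not show how haproxy wires the option to its
configuration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 0 on success, -1 on failure or when the
 * option is not available on the build platform.
 */
static int set_notsent_lowat_sketch(int fd, int bytes)
{
#ifdef TCP_NOTSENT_LOWAT
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                      &bytes, sizeof(bytes));
#else
    (void)fd;
    (void)bytes;
    return -1;
#endif
}
```

A caller would typically pass a value close to its buffer size, for
instance set_notsent_lowat_sketch(fd, 16384), and simply ignore a failure
since the option is only an optimization.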

commit | commitdiff | tree

Willy Tarreau [Tue, 29 Apr 2025 10:05:08 +0000 (12:05 +0200)]

BUG/MINOR: mux-h2: fix the offset of the pattern for the ping frame

The ping frame's pattern must be written at offset 9 (frame header
length), not 8. This was added in 3.2 with commit 4dcfe098a6 ("MINOR:
mux-h2: prepare to support PING emission"), so no backport is needed.
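
The offset follows from the HTTP/2 frame layout: a 9-byte header (24-bit
length, 8-bit type, 8-bit flags, 32-bit stream identifier) followed by the
8-byte PING payload. The sketch below builds such a frame; the function
and macro names are hypothetical and not taken from the mux code.

```c
#include <stdint.h>
#include <string.h>

#define SK_H2_FRAME_HDR_LEN 9    /* length(3) + type(1) + flags(1) + sid(4) */
#define SK_H2_PING_LEN      8    /* PING payload is always 8 bytes */
#define SK_H2_FT_PING       0x06 /* PING frame type */

/* Writes a PING frame into <out> (at least 17 bytes) and returns its size. */
static size_t build_ping_frame_sketch(uint8_t *out,
                                      const uint8_t pattern[SK_H2_PING_LEN])
{
    out[0] = 0;                 /* 24-bit payload length, big endian */
    out[1] = 0;
    out[2] = SK_H2_PING_LEN;
    out[3] = SK_H2_FT_PING;     /* frame type */
    out[4] = 0;                 /* flags: no ACK */
    memset(out + 5, 0, 4);      /* stream id 0: connection-level frame */

    /* The payload starts right after the header, i.e. at offset 9. */
    memcpy(out + SK_H2_FRAME_HDR_LEN, pattern, SK_H2_PING_LEN);
    return SK_H2_FRAME_HDR_LEN + SK_H2_PING_LEN;
}
```

Writing the pattern at offset 8 instead would overwrite the last byte of
the stream identifier and leave the final payload byte unset.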

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 09:29:52 +0000 (11:29 +0200)]

BUG/MINOR: acme: does not try to unlock after a failed trylock

Return after a failed trylock in acme_update_certificate() instead of
jumping to the error label which does an unlock.
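
The pattern can be illustrated with the sketch below, which uses a tiny
try-lock built on a C11 atomic_flag instead of haproxy's own lock macros.
All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal try-lock built on a C11 atomic_flag. */
struct sketch_lock {
    atomic_flag held;
};

static bool sketch_trylock(struct sketch_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void sketch_unlock(struct sketch_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns 0 on success, -1 on failure. */
static int sketch_update(struct sketch_lock *l, int *value, int new_value)
{
    if (!sketch_trylock(l))
        return -1;      /* lock not taken: return directly, never unlock */

    if (new_value < 0)
        goto error;     /* lock taken: the error path must release it */

    *value = new_value;
    sketch_unlock(l);
    return 0;

error:
    sketch_unlock(l);
    return -1;
}
```

Jumping to the error label after a failed trylock would release a lock
owned by someone else, which is the kind of bug the patch removes.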

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 08:50:39 +0000 (10:50 +0200)]

DOC: configuration: add quic4 to the ssl-f-use example

The ssl-f-use keyword is very useful in the case of multiple SSL bind
lines. Add a quic4 bind line in the example to show that.

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 07:59:32 +0000 (09:59 +0200)]

CLEANUP: acme: remove old TODO for account key

Remove old TODO comments about the account key.

commit | commitdiff | tree

William Lallemand [Tue, 29 Apr 2025 07:27:45 +0000 (09:27 +0200)]

DOC: configuration: acme account key are auto generated

Explain that account keys are auto-generated when they do not exist.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 17:09:02 +0000 (19:09 +0200)]

MEDIUM: mcli: replicate the current mode when enterin the worker process

While humans can find it convenient to enter the worker process in prompt
mode, for external tools it will not be convenient to have to systematically
disable it. A better approach is to replicate the master socket's mode
there, since it has already been configured to suit the user: interactive,
prompt and timed modes are automatically passed to the worker process.
This makes using the worker commands more natural from the master
process, without having to systematically adapt it for each new connection.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 16:51:47 +0000 (18:51 +0200)]

MEDIUM: mcli: make the prompt mode configurable between i/p

Support the same syntax in master mode as in worker mode in order to
configure the prompt. The only thing is that for now the master doesn't
have a non-interactive mode and it doesn't seem necessary to implement
it, so we only support the interactive and prompt modes. However the code
was written in a way that makes it easy to change this later if desired.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 16:36:57 +0000 (18:36 +0200)]

MEDIUM: cli: make the prompt mode configurable between n/i/p

Now the prompt mode can more finely be configured between non-interactive
(default), interactive without prompt, and interactive with prompt. This
will ease the usage from automated tools which are not necessarily
interested in having to consume '> ' after each command nor displaying
"+" on payload lines. This can also be convenient when coming from the
master CLI to keep the same output format.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 15:42:03 +0000 (17:42 +0200)]

MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags

The CLI's "prompt" command toggles two distinct things:
- displaying or hiding the prompt at the beginning of the line
- single-command vs interactive mode

These are two independent concepts and the prompt mode doesn't
always cope well with tools that would like to upload data without
having to read the prompt on return. Also, the master command line
works in interactive mode by default with no prompt, which is not
consistent (and not convenient for tools). So let's start by splitting
the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated
to the interactive mode. For now the "prompt" command alone continues
to toggle the two at once.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 06:56:56 +0000 (08:56 +0200)]

MINOR: compiler: add more macros to detect macro definitions

We add __equals_0(NAME) which is only true if NAME is defined as zero,
and __def_as_empty(NAME) which is only true if NAME is defined as an
empty string.
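
One well-known way to build such tests is the token-pasting "placeholder"
trick sketched below. This is not necessarily how haproxy implements its
macros: every SK_* name is hypothetical, and the trick only works when the
tested macro expands to a single token (or to nothing).

```c
/* A placeholder that expands to "0," only for the pasted token we expect. */
#define SK_ZERO_PLACEHOLDER_0 0,
#define SK_EMPTY_PLACEHOLDER_ 0,

/* Picks the second argument; extra arguments are ignored. */
#define SK_SECOND_ARG(ignored, val, ...) val

/* If <p> expanded to "0,", the second argument is 1, otherwise it is 0. */
#define SK_PROBE(p) SK_SECOND_ARG(p 1, 0, 0)

/* 1 only if NAME is defined as 0. */
#define SK_EQUALS_0_(x) SK_PROBE(SK_ZERO_PLACEHOLDER_##x)
#define SK_EQUALS_0(x)  SK_EQUALS_0_(x)

/* 1 only if NAME is defined as an empty string. */
#define SK_DEF_AS_EMPTY_(x) SK_PROBE(SK_EMPTY_PLACEHOLDER_##x)
#define SK_DEF_AS_EMPTY(x)  SK_DEF_AS_EMPTY_(x)

/* Example settings used to exercise the macros. */
#define SK_FEATURE_OFF 0
#define SK_FEATURE_ON  1
#define SK_FEATURE_BLANK
/* SK_FEATURE_UNSET is intentionally left undefined. */
```

With these definitions, SK_EQUALS_0(SK_FEATURE_OFF) and
SK_DEF_AS_EMPTY(SK_FEATURE_BLANK) expand to 1, while the other
combinations, including the undefined SK_FEATURE_UNSET, expand to 0.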

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 15:52:28 +0000 (17:52 +0200)]

MEDIUM: acme: use 'crt-base' to load the account key

Prefix the filename with the 'crt-base' before loading the account key,
in order to work like every other keypair in haproxy.

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 15:40:26 +0000 (17:40 +0200)]

MEDIUM: acme: generate the account file when not found

Generate the private key in the account file when the file does not
exist. This generates a private key of the type and parameters
configured in the acme section.

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 15:37:21 +0000 (17:37 +0200)]

MINOR: acme: failure when no directory is specified

The "directory" parameter of the acme section is mandatory. This patch
exits with an alert when this parameter is not found.

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 14:27:45 +0000 (16:27 +0200)]

MINOR: acme: separate the code generating private keys

acme_EVP_PKEY_gen() generates private keys of specified <keytype>,
<curves> and <bits>. Only RSA and EC are supported for now.

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 14:33:48 +0000 (16:33 +0200)]

BUG/MINOR: ssl/acme: free EVP_PKEY upon error

Free the EVP_PKEY upon error when the X509_REQ generation fails.

No backport needed.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 13:57:26 +0000 (15:57 +0200)]

MEDIUM: thread: set DEBUG_THREAD to 1 by default

Setting DEBUG_THREAD to 1 allows recording the lock history for each
thread. Tests have shown that (as predicted) the cost of updating a
single thread-local variable is not perceptible in the noise, especially
when compared to the cost of obtaining a lock. Since this can provide
useful value when debugging deadlocks, let's enable it by default when
threads are enabled.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 13:19:35 +0000 (15:19 +0200)]

MINOR: threads/cli: display the lock history on "show threads"

This will display the lock labels and modes for each non-empty step
at the end of "show threads" when these are defined. This allows
emitting up to the last 8 locking operations for each thread on 64-bit
machines.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 07:42:58 +0000 (09:42 +0200)]

MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0

By only storing a word in each thread context, we can keep the history
of all taken/dropped locks by label. This is expected to be very cheap
and to make it possible to store up to 8 consecutive lock operations in
64 bits. That should significantly help detect recursive locks as well as
figure out which thread was likely to hinder another one waiting for a
lock.

For now we only store the final state of the lock, we don't store the
attempt to get it. It's just a matter of space since we already need
4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're
already around 45. We could also multiply by 5 and still keep 8 bits
total per lock, that would limit us to 51 locks max. It seems that
most of the time if we get a watchdog panic, anyway the victim thread
will be perfectly located so that we don't need a specific value for
this. Another benefit is that we perform a single memory write per
lock.
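
The arithmetic above (8 bits per entry, 2 bits for the operation, 6 bits
for the label, 8 entries per 64-bit word) can be sketched as follows. The
exact bit layout and all names are assumptions made for illustration; the
sketch also assumes labels start at 1 so that an all-zero entry means
"empty".

```c
#include <stdint.h>

/* The four final lock states, each fitting in 2 bits. */
enum lock_op_sketch { SK_OP_UN = 0, SK_OP_RD = 1, SK_OP_SK = 2, SK_OP_WR = 3 };

/* Push one entry: shift the history by 8 bits and store label + op in
 * the low byte. The oldest of the 8 entries falls off the top.
 */
static inline uint64_t lock_hist_push(uint64_t hist, unsigned int label,
                                      enum lock_op_sketch op)
{
    uint64_t entry = ((uint64_t)(label & 0x3f) << 2) | ((unsigned int)op & 0x3);

    return (hist << 8) | entry;
}

/* Label of the entry <step> operations ago (0 = most recent, max 7). */
static inline unsigned int lock_hist_label(uint64_t hist, unsigned int step)
{
    return (unsigned int)((hist >> (step * 8 + 2)) & 0x3f);
}

/* Operation of the entry <step> operations ago (0 = most recent, max 7). */
static inline enum lock_op_sketch lock_hist_op(uint64_t hist, unsigned int step)
{
    return (enum lock_op_sketch)((hist >> (step * 8)) & 0x3);
}
```

Each lock operation then costs a shift, an OR and a single write to a
thread-local word, which is consistent with the cost argument made above.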

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 07:05:02 +0000 (09:05 +0200)]

MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2

At level 1 it now does nothing. This is reserved for some subsequent
patches which will implement lighter debugging.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 07:00:00 +0000 (09:00 +0200)]

MINOR: threads: prepare DEBUG_THREAD to receive more values

We now default the value to zero and make sure all tests properly take
care of values above zero. This is in preparation for supporting several
degrees of debugging.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 14:48:42 +0000 (16:48 +0200)]

BUILD: leastconn: fix build warning when building without threads on old machines

Machines lacking CAS8B/DWCAS emit a warning in lb_fwlc.c without
threads due to declaration ordering. Let's just move the variable
declaration into the block that uses it as a last variable. No
backport is needed.

commit | commitdiff | tree

Willy Tarreau [Mon, 28 Apr 2025 14:35:24 +0000 (16:35 +0200)]

BUILD: acme: use my_strndup() instead of strndup()

Not all systems have strndup(), that's why we have our "my_strndup()",
so let's make use of it here. This fixes the build on Solaris 10. No
backport is needed.

commit | commitdiff | tree

Aurelien DARRAGON [Mon, 28 Apr 2025 10:19:36 +0000 (12:19 +0200)]

MINOR: promex: expose ST_I_PX_RATE (current_session_rate)

It has been requested to have the current_session_rate exposed at the
frontend level. For now only the per-process value was exposed
(ST_I_INF_SESS_RATE).

Thanks to the work done lately to merge promex and stat_cols_px[]
array, let's simply define an .alt_name for the ST_I_PX_RATE metric in
order to have promex exposing it as current_session_rate for relevant
contexts.

commit | commitdiff | tree

Aurelien DARRAGON [Mon, 28 Apr 2025 10:09:45 +0000 (12:09 +0200)]

DOC: config: clarify log-forward "host" option

log-forward "host" option may be confusing because we often mention the
host field when talking about syslog RFC3164 or RFC5424 messages, but
neither RFC actually defines a "host" field. In fact, everywhere we used
"host field" we actually meant "hostname field" as documented in RFC5424.
This was an abuse of language on our side.

In this patch we replace "host" with "hostname" where relevant in the
documentation to prevent confusion.

Thanks to Nick Ramirez for having reported the issue.

commit | commitdiff | tree

Aurelien DARRAGON [Mon, 28 Apr 2025 09:30:01 +0000 (11:30 +0200)]

DOC: config: fix ACME paragraph rendering issue

Nick Ramirez reported that the ACME paragraph (3.13) caused a rendering
issue where simple text was rendered as a directive. This was caused
by the use of unescaped <name> which confuses dconv.

Let's escape <name> by putting quotes around it to prevent the rendering
issue.

No backport needed.

commit | commitdiff | tree

William Lallemand [Mon, 28 Apr 2025 09:35:11 +0000 (11:35 +0200)]

MINOR: ssl/cli: add a '-t' option to 'show ssl sni'

Add a -t option to 'show ssl sni' which adds an offset to the current
date, making it possible to check which certificates will have expired
after a certain period of time.

commit | commitdiff | tree

Willy Tarreau [Fri, 25 Apr 2025 16:32:02 +0000 (18:32 +0200)]

BUG/MAJOR: listeners: transfer connection accounting when switching listeners

Since we made it possible for a bind_conf to listen to multiple thread
groups with shards in 2.8 with commit 9d360604bd ("MEDIUM: listener:
rework thread assignment to consider all groups"), the per-listener
connection count was not properly transferred to the target listener
with the connection when switching to another thread group. This results
in one listener possibly reaching high values and another one possibly
reaching negative values. Usually it's not visible, unless a maxconn is
set on the bind_conf, in which case comparisons will quickly put an end
to the willingness to accept new connections.

This problem only happens when thread groups are enabled, and it seems
very hard to trigger normally. It only impacts sockets having a single
shard, hence currently the CLI (or any conf with "bind ... shards 1"),
where it can be reproduced with a config having a very low "maxconn" on
the stats socket directive (here, 4), and issuing a few tens of
socat <<< "show activity" in parallel, or sending HTTP connections to a
single-shard listener. Very quickly, haproxy stops accepting connections
and eats CPU in the poller which tries to get its connections accepted.

A BUG_ON(l->nbconn<0) after HA_ATOMIC_DEC() in listener_release() also
helps spotting them better.
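
The accounting rule can be sketched as below: when a connection moves to
another listener, its count moves with it, so that the later release
decrements the listener that actually owns it. The structure and function
names are hypothetical and the sketch uses C11 atomics rather than
haproxy's HA_ATOMIC_* macros.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical listener with only its connection counter. */
struct listener_sketch {
    atomic_int nbconn;
};

/* Account for a newly accepted connection. */
static void sketch_accept(struct listener_sketch *l)
{
    atomic_fetch_add(&l->nbconn, 1);
}

/* Move one connection's accounting from <from> to <to>. */
static void sketch_transfer(struct listener_sketch *from,
                            struct listener_sketch *to)
{
    if (from == to)
        return;
    atomic_fetch_add(&to->nbconn, 1);
    atomic_fetch_sub(&from->nbconn, 1);
}

/* Release a connection; the counter must never become negative. */
static void sketch_release(struct listener_sketch *l)
{
    int prev = atomic_fetch_sub(&l->nbconn, 1);

    assert(prev > 0);
}
```

Without the transfer step, the source listener would keep counting a
connection it no longer owns while the target would be decremented for a
connection it never counted.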

Many thanks to Christian Ruppert who once again provided a very accurate
report in GH #2951 with the required data permitting this analysis.

This fix must be backported to 2.8.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 10:17:07 +0000 (12:17 +0200)]

BUG/MAJOR: tasklets: Make sure he tasklet can't run twice

Tasklets were originally designed to always run on only one thread, so it
was not possible to have one run on 2 threads concurrently.
The API has been extended so that another thread may wake the tasklet,
but the idea was still that we wanted to have it run on one thread only.
However, the way it's been done meant that unless a tasklet was bound to
a specific tid with tasklet_set_tid(), or we explicitly used
tasklet_wakeup_on() to specify the thread for the target to run on, it
would be scheduled to run on the current thread.
This is in fact a desirable feature. There is however a race condition
in which the tasklet would be scheduled on a thread, while it is running
on another. This could lead to the same tasklet running on multiple
threads, which we do not want.
To fix this, just do what we already do for regular tasks, set the
"TASK_RUNNING" flag, and when it's time to execute the tasklet, wait
until that flag is gone.
Only one case has been found in the current code, where the tasklet
could run on different threads depending on who wakes it up, in the
leastconn load balancer, since commit
627280e15f03755b8f59f0191cd6d6bcad5afeb3.
It should not be a problem in practice, as the function called can be
called concurrently.
If a bug is eventually found in relation to this problem, and this patch
should be backported, the following patches should be backported too :
MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process
MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
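
The "running flag" idea can be sketched with C11 atomics as below. This is
an illustration only: the names are hypothetical, the flag value is
arbitrary, and the real scheduler handles many more states than this.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SK_TASK_RUNNING 0x1u /* hypothetical flag value */

/* Hypothetical tasklet with a state word and a run counter. */
struct tasklet_sketch {
    atomic_uint state;
    int runs;
};

/* Try to become the only runner; returns false if already running. */
static bool tasklet_try_claim(struct tasklet_sketch *t)
{
    unsigned int old = atomic_fetch_or(&t->state, SK_TASK_RUNNING);

    return !(old & SK_TASK_RUNNING);
}

/* Drop the running flag so that another thread may run the tasklet. */
static void tasklet_release(struct tasklet_sketch *t)
{
    atomic_fetch_and(&t->state, ~SK_TASK_RUNNING);
}

/* Wait until the flag is gone, then run the handler exactly once. */
static void tasklet_run(struct tasklet_sketch *t)
{
    while (!tasklet_try_claim(t))
        ; /* busy wait; a real scheduler would relax the CPU here */

    t->runs++; /* stands for the tasklet's handler */
    tasklet_release(t);
}
```

Since only the thread that set the flag executes the handler, two threads
waking the same tasklet cannot run it at the same time.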

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 11:03:29 +0000 (13:03 +0200)]

MEDIUM: quic: Make sure we return the tasklet from quic_accept_run

In quic_accept_run, return the tasklet to tell the scheduler the tasklet
is still alive. It is not yet needed, but will be soon.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 11:02:47 +0000 (13:02 +0200)]

MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed

In quic_conn_app_io_cb, make sure we return NULL if the tasklet has been
destroyed, so that the scheduler knows. It is not yet needed, but will
be soon.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 11:01:58 +0000 (13:01 +0200)]

MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb

In qcc_io_cb, return the tasklet to tell the scheduler the tasklet is
still alive. It is not yet needed, but will be soon.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 11:01:15 +0000 (13:01 +0200)]

MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut

In fcgi_deferred_shut, return the tasklet to tell the scheduler the
tasklet is still alive. It is not yet needed, but will be soon.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 11:00:34 +0000 (13:00 +0200)]

MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process

In accept_queue_process, return the tasklet to tell the scheduler the
tasklet is still alive. It is not yet needed, but will be soon.

commit | commitdiff | tree

Olivier Houchard [Fri, 25 Apr 2025 10:59:37 +0000 (12:59 +0200)]

MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb

In srv_chk_io_cb, return the tasklet to tell the scheduler the tasklet
is still alive. It is not yet needed, but will be soon.

commit | commitdiff | tree

Willy Tarreau [Fri, 25 Apr 2025 08:19:03 +0000 (10:19 +0200)]

[RELEASE] Released version 3.2-dev12

Released version 3.2-dev12 with the following main changes :
    - BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure
    - BUG/MINOR: proxy: always detach a proxy from the names tree on free()
    - CLEANUP: proxy: detach the name node in proxy_free_common() instead
    - CLEANUP: Slightly reorder some proxy option flags to free slots
    - MINOR: proxy: Add options to drop HTTP trailers during message forwarding
    - MINOR: h1-htx: Skip C-L and T-E headers for 1xx and 204 messages during parsing
    - MINOR: mux-h1: Keep custom "Content-Length: 0" header in 1xx and 204 messages
    - MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value
    - CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function
    - BUG/MEDIUM: mux-spop: Respect the negociated max-frame-size value to send frames
    - MINOR: http-act: Add 'pause' action to temporarily suspend the message analysis
    - MINOR: acme/cli: add the 'acme renew' command to the help message
    - MINOR: httpclient: add an "https" log-format
    - MEDIUM: acme: use a customized proxy
    - MEDIUM: acme: rename "uri" into "directory"
    - MEDIUM: acme: rename "account" into "account-key"
    - MINOR: stick-table: use a separate lock label for updates
    - MINOR: h3: simplify h3_rcv_buf return path
    - BUG/MINOR: mux-quic: fix possible infinite loop during decoding
    - BUG/MINOR: mux-quic: do not decode if conn in error
    - BUG/MINOR: cli: Issue an error when too many args are passed for a command
    - MINOR: cli: Use a full prompt command for bidir connections with workers
    - MAJOR: cli: Refacor parsing and execution of pipelined commands
    - MINOR: cli: Rename some CLI applet states to reflect recent refactoring
    - CLEANUP: applet: Update st0/st1 comment in appctx structure
    - BUG/MINOR: hlua: Fix I/O handler of lua CLI commands to not rely on the SC
    - BUG/MINOR: ring: Fix I/O handler of "show event" command to not rely on the SC
    - MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
    - MINOR: cache: Add a pointer on the cache in the cache applet context
    - MINOR: hlua: Use the applet name in error messages for lua services
    - MINOR: applet: Save the "use-service" rule in the stream to init a service applet
    - CLEANUP: applet: Remove unsued rule pointer in appctx structure
    - BUG/MINOR: master/cli: properly trim the '@@' process name in error messages
    - MEDIUM: resolvers: add global "dns-accept-family" directive
    - MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS
    - MINOR: sock-inet: detect apparent IPv6 connectivity
    - MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6
    - MEDIUM: acme: use Retry-After value for retries
    - MEDIUM: acme: reset the remaining retries
    - MEDIUM: acme: better error/retry management of the challenge checks
    - BUG/MEDIUM: cli: Handle applet shutdown when waiting for a command line
    - Revert "BUG/MINOR: master/cli: properly trim the '@@' process name in error messages"
    - BUG/MINOR: master/cli: only parse the '@@' prefix on complete lines
    - MINOR: resolvers: use the runtime IPv6 status instead of boot time one

commit | commitdiff | tree

Willy Tarreau [Fri, 25 Apr 2025 07:26:44 +0000 (09:26 +0200)]

MINOR: resolvers: use the runtime IPv6 status instead of boot time one

On systems where the network is not reachable at boot time (certain HA
systems for example, or dynamically addressed test machines), we'll want
to be able to periodically revalidate the IPv6 reachability status. The
current code makes it complicated because it sets the config bits once
for all at boot time. This commit changes this so that the config bits
are not changed, but instead we rely on a static inline function that
relies on sock_inet6_seems_reachable for every test (really cheap). This
also removes the now unneeded resolvers late init code.

This variable for now is still set at boot time but this will ease the
transition later, as the resolvers code is now ready for this.

Mirror of https://github.com/haproxy/haproxy.git
