git.ipfire.org Git - thirdparty/haproxy.git/log

MEDIUM: cache: Add refcount on cache_entry

Add a reference counter on the cache_entry. Its value will be atomically
increased and decreased via the retain_entry and release_entry
functions.
The release_entry function has two distinct versions,
release_entry_locked and release_entry_unlocked that should be called
when the cache lock is already taken in write mode or not
(respectively). In the unlocked case the cache lock will only be taken
in write mode on the last reference of the entry (before calling
delete_entry). This allows to limit the amount of times when we need to
take the cache lock during a release operation.

MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope

Since a lock on the cache tree was added in the latest cache changes, we
do not need to use the shared_context's lock to lock more than pure
shared_context related data anymore. This already existing lock will now
only cover the 'avail' list from the shared_context. It can then be
changed to a rwlock instead of a spinlock because we might want to only
run through the avail list sometimes.

Apart form changing the type of the shctx lock, the main modification
introduced by this patch is to limit the amount of code covered by the
shctx lock. This lock does not need to cover any code strictly related
to the cache tree anymore.

MINOR: cache: Use dedicated trash for "show cache" cli command

After the latest changes in the cache/shared_context mechanism, the
cache and shared_context logic were decorrelated and in some unlikely
cases we might end up using the "show cache" command while some regular
cache processing is occurring (a response being stored in the cache for
instance). In such a case, because we used the same 'trash' buffer in
those two contexts, we could end up with the contents of a response in
the ouput of the "show cache" command.
This patch fixes this problem by allocating a dedicated trash for the
CLI command.

MEDIUM: shctx: Remove 'hot' list from shared_context

The "hot" list stored in a shared_context was used to keep a reference
to shared blocks that were currently being used and were thus removed
from the available list (so that they don't get reused for another cache
response). This 'hot' list does not ever need to be shared across
threads since every one of them only works on their current row.

The main need behind this 'hot' list was to detach the corresponding
blocks from the 'avail' list and to have a known list root when calling
list_for_each_entry_from in shctx_row_data_append (for instance).

Since we actually never need to iterate over all members of the 'hot'
list, we can remove it and replace the inc_hot/dec_hot logic by a
detach/reattach one.

MEDIUM: cache: Use rdlock on cache in cache_use

When looking for a valid entry in the cache tree in
http_action_req_cache_use, we do not need to delete an expired entry at
once because even if an expired entry exists, since the request will be
forwarded to the server, then the expired entry will be overwritten when
the updated response is seen. We can then use a simpler rdlock during
cache_use operation.

MINOR: cache: Add option to avoid removing expired entries in lookup function

Any lookup in the cache tree done through entry_exist or
secondary_entry_exist functions could end up deleting the corresponding
entry if it is expired which prevents from using a rdlock on code paths
that would just perform a lookup on the tree (in
http_action_req_cache_use for instance).
Adding a 'delete_expired' boolean as a parameter allows for "pure"
lookups and thus it will allow to perform operations on the tree that
simply require a rdlock instead of a "heavier" wrlock.

MINOR: cache: Remove expired entry delete in "show cache" command

The "show cache" CLI command iterates over all the entries of the cache
tree and it used this opportunity to remove expired entries from the
cache. This behavior was completely undocumented and does not seem that
necessary. By removing it we can take the cache lock in read mode only
which limits the impact on the other threads.

MEDIUM: cache: Use dedicated cache tree lock alongside shctx lock

Every use of the cache tree was covered by the shctx lock even when no
operations were performed on the shared_context lists (avail and hot).
This patch adds a dedicated RW lock for the cache so that blocks of code
that work on the cache tree only can use this lock instead of the
superseding shctx one. This is useful for operations during which the
concerned blocks are already in the hot list.
When the two locks need to be taken at the same time, in
http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one
must be taken first.
A new parameter needed to be added to the shared_context's free_block
callback prototype so that cache_free_block can take the cache lock and
release it afterwards.

MINOR: shctx: Remove explicit 'from' param from shctx_row_data_append

This parameter is not necessary since the first element of a row always
has a pointer to the row's tail.

MEDIUM: shctx: Simplify shctx_row_reserve_hot loop

The shctx_row_reserve_hot relied on two loop levels in order to first
look for the first block of a preused row and then iterate on all the
blocks of this row to reserve them for the new row. This was not the
simplest nor the easiest to read way so this logic could be replaced by
a single iteration on the avail list members.
The two use cases of calling this function with or without a preexisting
"first" member were a bit cumbersome as well and were replaced by a more
straightforward approach.

MEDIUM: shctx: Move list between hot and avail list in O(1)

Instead of iterating over all the elements of a given row when moving it
between the hot and available lists, we can make use of the last_reserved
pointer that already points to the last block of the list to perform the
move in O(1).

MINOR: shctx: Set last_append to NULL when reserving block in hot list

Ensure that the last_append pointer is always set to NULL on first block
of rows reserved by the subsystems using the shctx (cache for instance).
This pointer will be used directly in shctx_row_data_append instead of
the 'from' param which will simplify its uses.

MINOR: server: force add to idle on reverse

A backend connection is inserted in server idle list via
srv_add_to_idle_list(). This function has several conditions which may
cause the connection to be rejected instead.

One of this condition is based on the current estimate count of needed
connections for the server. If the count of idle connections stored has
already reached this estimation, the new connection is rejected. This is
in opposition with the purpose of reverse HTTP. On active reverse,
haproxy can instantiate several connections to properly serve the future
traffic. However, the opposite passive haproxy will have only a low
estimate of needed connection and will reject most of them.

To fix this, simply check CO_FL_REVERSED connection flag on
srv_add_to_idle_list(). If set, the connection is inserted without
checking for estimate count. Note that all other conditions are not
impacted, so it's still possible to reject a connection, for example if
process FD limit is reached.

This commit relies on recent patch which change CO_FL_REVERSED flag for
connection after passive reverse.

BUG/MINOR: mux_h2: reject passive reverse conn if error on add to idle

On passive reverse, H2 mux is responsible to insert the connection in
the server idle list. This is done via srv_add_to_idle_list(). However,
this function may fail for various reason, such as FD usage limit
reached.

Handle properly this error case. H2 mux flags the connection on error
which will cause its release. Prior to this patch, the connection was
only released on server timeout.

This bug was found inspecting server curr_used_conns counter. Indeed, on
connection reverse, this counter is first incremented. It is decremented
just after on srv_add_to_idle_list() if insertion is validated. However,
if insertion is rejected, the connection was not released which cause
curr_used_conns to remains positive. This has the major downside to
break the reusing of idle connection on rhttp causing spurrious 503
errors.

No need to backport.

MINOR: connection: update rhttp flags usage

Change the flags used for reversed connection :
* CO_FL_REVERSED is now put after reversal for passive connect. For
active connect, it is delayed when accept is completed after reversal.
* CO_FL_ACT_REVERSING replace the old CO_FL_REVERSED. It is put only for
active connect on reversal and removes once accept is done.

This allows to identify a connection as reversed during its whole
lifetime. This should be useful to extend reverse connect.

BUG/MEDIUM: stream: Don't call mux .ctl() callback if not implemented

The commit 5ff7d2276 ("BUG/MEDIUM: stream: Properly handle abortonclose when set
on backend only") introduced a regression. Not all multiplexer implement the
.ctl() callback function. Thus we must be sure this callback function is defined
first to call it.

This patch should fix a crash reported by Tristan in the issue #2095. It must be
backported as far as 2.2, with the commit above.

BUG/MEDIUM: mworker: set the master variable earlier

Since 2.7 and the mcli_reload_bind_conf (56f73b21a5c), upon a reload
failure because of a bind error, the mcli_reload_bind_conf go through a
sock_unbind((). This is not supposed to do anything when a listener is
RX_F_INHERITED in the master, but unfortunately this is done too early
and provokes an exit of the master.

We already suspected in the past that setting the 'master' variable this
late could have negative impact.

The fix sets the master variable earlier before the bind.

This must be backported at least to 2.7. This could be backported
earlier but better wait any feedbacks on the fix.

MINOR: activity: report profiling duration and age in "show profiling"

Seeing counters in "show profiling" is not always very helpful without
an indication of how long the analysis lasted nor if it's still active
or not. Let's add a pair of start/stop timers for tasks and memory so
that we can now indicate how long the measurements lasted and when they
ended (or 0 if still running).

Note that for tasks profiling set to "auto", the measurement is considered
enabled since it can automatically switch on and off on a per-thread
basis.

REGTESTS: http: Improve script testing abortonclose option

We now take care to properly handle the abortonclose close option if it is
set on the backend and be sure we ignore it when it is set on the frontend
(inherited from the defaults section).

MINOR: stconn: Use SC to detect frontend connections in sc_conn_recv()

In sc_conn_recv(), instead of using the connection to know we are on the
frontend side, we now use the SC flags. It changes nothing but it is
cleaner.

BUG/MEDIUM: stream: Properly handle abortonclose when set on backend only

Since the 2.2 and the commit dedd30610 ("MEDIUM: h1: Don't wake the H1 tasklet
if we got the whole request."), we avoid to subscribe for reads if the H1
message is fully received. However, this broke the abortonclose option. To fix
the issue, a CO_RFL flag was added to instruct the mux it should still wait for
read events to properly handle read0. Only the H1 mux was concerned.

But since then, most of time, the option is only handled if it is set on the
frontend proxy because the request is fully received before selecting the
backend. If the backend is selected before the end of the request there is no
issue. But otherwise, because the backend is not known yet, we are unable to
properly handle the option and we miss to subscribe for reads.

Of course the option cannot be set on a frontend proxy. So concretly it means
the option is properly handled if it is enabled in the defaults section (if
common to frontend and backend) or a listen proxy, but it is ignored if it is
set on backend only.

Thanks to previous patches, we can now instruct the mux it should subscribe for
reads if not already done. We use this mechanism in process_stream() when the
connection is set up, ie when backend SC is set to SC_ST_REQ state.

This patch relies on following patches:
* MINOR: connection: Add a CTL flag to notify mux it should wait for reads again
* MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads

This patch should be the issue #2344. All the series must be backported as far
as 2.2.

MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads

The H1 mux now handle MUX_SUBS_RECV flag in h1_ctl(). If it is not already
subscribed for reads, it does so. This patch will be mandatory to properly
handle abortonclose option.

MINOR: connection: Add a CTL flag to notify mux it should wait for reads again

MUX_SUBS_RECV ctl flag is added to instruct the mux it should wait for read
events. This flag will be pretty useful to handle abortonclose option.

BUG/MINOR: stconn: Handle abortonclose if backend connection was already set up

abortonclose option is a backend option, it should not be handle on frontend
side. Of course a frontend can also be a backend but the option should not
be handled too early because it is not necessarily the selected backend
(think about a listen proxy routing requests to another backend).

It is especially an issue when the abortonclose option is enabled in the
defaults section and disabled by the selected backend. Because in this case,
the option may still be enabled while it should not.

Thus, now we wait the backend connection was set up to handle the option. To
do so, we check the backend SC state. The option is ignored if it is in
ST_CS_INI state. For all other states, it means the backend was already
selected.

This patch could be backported as far as 2.2.

BUG/MEDIUM: connection: report connection errors even when no mux is installed

An annoying issue was met when testing the reverse-http mechanism, by
which failed connection attempts would apparently not be attempted again
when there was no connect timeout. It turned out to be more generalized
than the rhttp system, and actually affects all outgoing connections
relying on NPN or ALPN to choose the mux, on which no mux is installed
and for which the subscriber (ssl_sock) must be notified instead.

The problem appeared during 2.2-dev1 development. First, commit
062df2c23 ("MEDIUM: backend: move the connection finalization step to
back_handle_st_con()") broke the error reporting by testing CO_FL_ERROR
only under CO_FL_CONNECTED. While it still worked OK for cases where a
mux was present, it did not for this specific situation because no
single error path would be considered when no mux was present. Changing
the CO_FL_CONNECTED test to also include CO_FL_ERROR did work, until
a few commits later with 477902bd2 ("MEDIUM: connections: Get ride of
the xprt_done callback.") which removed the xprt_done callback that was
used to indicate success or failure of the transport layer setup, since,
as the commit explains, we can report this via the mux. What this last
commit says is true, except when there is no mux.

For this, however, the sock_conn_iocb() function (formerly conn_fd_handler)
is called for such errors, evaluates a number of conditions, none of which
is matched in this error condition case, since sock_conn_check() instantly
reports an error causing a jump to the leave label. There, the mux is
notified if installed, and the function returns. In other error condition
cases, readiness and activity are checked for both sides, the tasklets
woken up and the corresponding subscriber flags removed. This means that
a sane (and safe) approach would consist in just notifying the subscriber
in case of error, if such a subscriber still exists: if still there, it
means the event hasn't been caught earlier, then it's the right moment
to report it. And since this is done after conn_notify_mux(), it still
leaves all control to the mux once it's installed.

This commit should be progressively backported as far as 2.2 since it's
where the problem was introduced. It's important to clearly check the
error path in each function to make sure the fix still does what it's
supposed to.

BUG/MINOR: quic: maximum window limits do not match the doc

This bug arrived with this commit:
MINOR: quic: Add a max window parameter to congestion control algorithms

The documentation was been modified with missing/wrong modifications in the code part.
The 'g' suffix must be accepted to parse value in gigabytes. And exctly 4g is
also accepted.

No need to backport.

DOC: quic: Maximum congestion control window configuration

Document the optional parameter which may be supplied after the congestion
control algorithm name to set the maximum congestion control window.

DOC: quic: Wrong syntax for "quic-cc-algo" keyword.

As the argument to "quic-cc-algo" is mandatory, brackets must be used here
in the documentation.

Must be backported as far as 2.6.

MINOR: quic: Maximum congestion control window for each algo

Make all the congestion support the maximum congestion control window
set by configuration. There is nothing special to explain. For each
each algo, each time the window is incremented it is also bounded.

MINOR: quic: Add a max window parameter to congestion control algorithms

Add a new ->max_cwnd member to bind_conf struct to store the maximum
congestion control window value for each QUIC binding.
Modify the "quic-cc-algo" keyword parsing to add an optional parameter
to its value: the maximum congestion window value between parentheses
as follows:

ex: quic-cc-algo cubic(10m)

This value must be bounded, greater than 10k and smaller than 1g.

BUG/MEDIUM: quic: Non initialized CRYPTO data stream deferencing

This bug arrived with this commit:
BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data

Before this commit qc->cstream was tested before entering qc_treat_rx_crypto_frms().
This patch restablishes this behavior. Furthermore, it simplyfies
qc_ssl_provide_all_quic_data() which is a little bit ugly: the CRYPTO data
frame may be freed asap in the list_for_each_entry_safe() block after
having store its data pointer and length in local variables.
Also interrupt the CRYPTO data process as soon as qc_ssl_provide_quic_data()
or qc_treat_rx_crypto_frms() fail.

No need to be backported.

REGTESTS: startup: -conf-OK requires -V with current VTest

Current version of VTest tests the output of "haproxy -c" instead of the
return code. Since we don't output anymore when the configuration is
valid, this broke the test. (a06f621).

This fixes the issue by adding the -V when doing a -conf-OK. But this
must fixed in VTest.

DOC: config: Fix name for tune.disable-zero-copy-forwarding global param

"disable-" prefix was missing. This param was correctly named in the list of
supported keywords in the global section, but not in the keyword
description.

No backport needed.

BUG/MEDIUM: quic: fix FD for quic_cc_conn

Since following commit, quic_conn closes its owned socket before
transition to quic_cc_conn for closing state. This allows to save FDs as
quic_cc_conn could use the listener socket for their I/O.

commit 150c0da8895be50a39fd8e44f1db28e52c938569
MEDIUM: quic: release conn socket before using quic_cc_conn

This patch is incomplete as it removes initialization of <fd> member for
quic_cc_conn. Thus, if sending is done on closing state, <fd> value is
undefined which in most cases will result in a crash. Fix this by simply
initializing <fd> member with qc_init_fd() in qc_new_cc_conn().

This bug should fix recent issue from #2095. Thanks to Tristan for its
reporting and then testing of this patch.

No need to backport.

BUG/MINOR: quic: fix decrement of half_open counter on qc alloc failure

Half open counter is used to comptabilize QUIC connections waiting for
address validation. It was recently reworked to adjust its scope. With
each decrement operation, a BUG_ON() was added to ensure the counter
never wraps.

This BUG_ON() could be triggered if an allocation fails for one of
quic_conn members in qc_new_conn(). This is because half open counter is
incremented at the end of qc_new_conn(). However, in case of alloc
failure, quic_conn_release() is called immediately to ensure the counter
is decremented if a connection is freed before peer address has been
validated.

To fix this, increment half open counter early in qc_new_conn() prior to
every quic_conn members allocations.

This issue was reproduced using -dMfail argument.

This issue has been introduced by
commit 278808915b05bab1ff1e08e95fe32be22109719f
MINOR: quic: reduce half open counters scope

No need to backport.

BUG/MINOR: quic: fix crash on qc_new_conn alloc failure

A new counter was recently introduced to comptabilize the current number
of active QUIC handshakes. This counter is stored on the listener
instance.

This counter is incremented at the beginning of qc_new_conn() to check
if limit is not reached prior to quic_conn allocation. If quic_conn or
one of its inner member allocation fails, special care is taken to
decrement the counter as the connection instance is released. However,
it relies on <l> variable which is initialized too late to cover
pool_head_quic_conn allocation failure.

To fix this, simply initialize <l> at the beginning of qc_new_conn().

This issue was reproduced using -dMfail argument.

This issue was introduced by the following commit
commit 3df6a60113773d8abff3457c7a06e30c1892b7dc
MEDIUM: quic: limit handshake per listener

No need to backport.

BUG/MINOR: log: keep the ref in dup_logger()

This bug was introduced with 969e212 ("MINOR: log: add dup_logsrv() helper
function")

When duplicating an existing log entry, we must take care to inherit from
its original ->ref if it is set, because not doing so would make 28ac0999
("MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries")
ineffective given that global log directives will lose their original
reference when duplicated resursively (at least twice), which is what
happens when global log directives are first inherited to defaults which
are then inherited to a regular proxy at the end of the chain.

This can be easily reproduced using the following configuration:

   |global
   |  log stdout format raw local0
   |
   |defaults
   |  log global
   |
   |frontend test
   |  log global
   |  ...

Logs from "test" proxy will be duplicated because test incorrectly
inherited from global "log" directives twice, which 28ac0999 would
normally detect and prevent.

No backport needed unless 969e212 gets backported.

BUG/MINOR: sample: Fix bytes converter if offset is bigger than sample length

When the bytes converter was improved to be able to use variables (915e48675
["MEDIUM: sample: Enhances converter "bytes" to take variable names as
arguments"]), the behavior of the sample slightly change. A failure is
reported if the given offset is bigger than the sample length. Before, a
empty binary sample was returned.

This patch fixes the converter to restore the original behavior. The
function was also refactored to properly handle failures by removing
SMP_F_MAY_CHANGE flag. Because the converter now handles variables, the
conversion to an integer may fail. In this case SMP_F_MAY_CHANGE flag must
be removed to be sure the caller will not retry.

This patch should fix the issue #2335. No backport needed except if commit
above is backported.

MEDIUM: startup: 'haproxy -c' is quiet when valid

MODE_CHECK does not output "Configuration file is valid" by default
anymore. To display this message the -V option must be used with -c.

However the warning and errors are still output by default if they
exist.

This allows to clean the output of the systemd unit file with is doing a
-c.

BUG/MEDIUM: proxy: always initialize the default settings after init

The proxy's initialization is rather odd. First, init_new_proxy() is
called to zero all the lists and certain values, except those that can
come from defaults, which are initialized by proxy_preset_defaults().
The default server settings are also only set there.

This results in these settings not to be set for a number of internal
proxies that do not explicitly call proxy_preset_defaults() after
allocation, such as sink and log forwarders.

This was revealed by last commit 79aa63823 ("MINOR: server: always
initialize pp_tlvs for default servers") which crashes in log parsers
when applied to certain proxies which did not initialize their default
servers.

In theory this should be backported, however it would be desirable to
wait a bit before backporting it, in case certain parts would rely on
these elements not being initialized.

MINOR: server: always initialize pp_tlvs for default servers

In commit 6f4bfed3a ("MINOR: server: Add parser support for
set-proxy-v2-tlv-fmt") a suspicious check for a NULL srv_tlv was placed
in the list_for_each_entry(), that should not be needed. In practice,
it's caused by the list head not being initialized, hence the first
element is NULL, as shown by Alexander's reproducer below which crashes
if the test in the loop is removed:

  backend dummy
    default-server send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)]
    server dummy_server 127.0.0.1:2319

The right place to initialize this field is proxy_preset_defaults().
We'd really need a function to initialize a server :-/

The check in the loop was removed. No backport is needed.

BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data

This issue could be reproduced with a TLS client certificate verificatio to
generate enough CRYPTO data between the client and haproxy and with dev/udp/udp-perturb
as network perturbator. Haproxy could crash thanks to a BUG_ON() call as soon as
in disorder data were bufferized into a non-contiguous buffer.

There is no need to pass a non NULL non-contiguous to qc_ssl_provide_quic_data()
from qc_ssl_provide_all_quic_data() which handles in order CRYPTO data which
have not been bufferized. If not, the first call to qc_ssl_provide_quic_data()
to process the first block of in order data leads the non-contiguous buffer
head to be advanced to a wrong offset, by <len> bytes which is the length of the
in order CRYPTO frame. This is detected by a BUG_ON() as follows:

FATAL: bug condition "ncb_ret != NCB_RET_OK" matched at src/quic_ssl.c:620
  call trace(11):
  | 0x5631cc41f3cc [0f 0b 8b 05 d4 df 48 00]: qc_ssl_provide_quic_data+0xca7/0xd78
  | 0x5631cc41f6b2 [89 45 bc 48 8b 45 b0 48]: qc_ssl_provide_all_quic_data+0x215/0x576
  | 0x5631cc3ce862 [48 8b 45 b0 8b 40 04 25]: quic_conn_io_cb+0x19a/0x8c2
  | 0x5631cc67f092 [e9 1b 02 00 00 83 45 e4]: run_tasks_from_lists+0x498/0x741
  | 0x5631cc67fb51 [89 c2 8b 45 e0 29 d0 89]: process_runnable_tasks+0x816/0x879
  | 0x5631cc625305 [8b 05 bd 0c 2d 00 83 f8]: run_poll_loop+0x8b/0x4bc
  | 0x5631cc6259c0 [48 8b 05 b9 ac 29 00 48]: main-0x2c6
  | 0x7fa6c34a2ea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
  | 0x7fa6c33c2a2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Thank you to @Tristan971 for having reported this issue in GH #2095.

No need to backport.

CLEANUP: sink: useless leftover in sink_add_srv()

Removing a useless leftover which has been introduced with 31e8a003a5
("MINOR: sink: function to add new sink servers")

CLEANUP: sink: bad indent in sink_new_from_logger()

Fixing bad indent in sink_new_from_logger() which was recently introduced

BUG/MINOR: sink: don't learn srv port from srv addr

Since 04276f3d ("MEDIUM: server: split the address and the port into two
different fields") we should not use srv->addr to store server's port
and rely on srv->svc_port instead.

For sink servers, we correctly set >svc_port upon server creation but
we didn't use it when initializing address for the connection.

As a result, FQDN resolution will not work properly with sink servers.
Hopefully, this used to work by accident because sink servers were
resolved using the PA_O_RESOLVE flag in str2sa_range(), which made the
srv->addr contain the port in addition to the address.

But this will fail to work when FQDN resolution is postponed because only
->svc_port will contain the proper server port upon resolution.

For instance, FQDN resolution with servers from log backends (which are
resolved as regular servers, that is, without the PA_O_RESOLVE) will fail
to work because of this.

This may be backported as far as 2.2 even though the bug didn't have
noticeable effects for versions below 2.9

[In 2.2, sink_forward_session_init() didn't exist it should be applied in
sink_forward_session_create()]

BUG/MEDIUM: server: invalid address (post)parsing checks

This bug was introduced with 29b76ca ("BUG/MEDIUM: server/log: "mode log"
after server keyword causes crash ")

Indeed, we cannot safely rely on addr_proto being set when str2sa_range()
returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because
proto lookup might be bypassed when FQDN addresses are involved.

Unfortunately, the above patch wrongly assumed that proto would always
be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when
str2sa_range() was called), resulting in invalid postparsing checks being
performed, which could as well lead to crashes with log backends
("mode log" set) because some postparsing init was skipped as a result of
proto not being set and this wasn't expected later in the init code.

To fix this, we now make use of the previous patch to perform server's
address compatibility checks on hints that are always set when
str2sa_range() succesfully returns.

For log backend, we're also adding a complementary test to check if the
address family is of expected type, else we report an error, plus we're
moving the postinit logic in log api since _srv_check_proxy_mode() is
only meant to check proxy mode compatibility and we were abusing it.

This patch depends on:
- "MINOR: tools: make str2sa_range() directly return type hints"

No backport required unless 29b76ca gets backported.

MINOR: tools: make str2sa_range() directly return type hints

str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")

However, as stated into the commit message, there is a trick:
   "we can fail to return a protocol in case the caller
    accepts an fqdn for use later. This is what servers do and in this
    case it is valid to return no protocol"

In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.

While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.

Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.

BUG/MEDIUM: applet: Remove appctx from buffer wait list on release

For now, the appctx is removed from the buffer wait list when it is
freed. However, when it is released, it is not necessarily freed
immediately. But it is detached from the SC. If it is still registered in
the buffer wait list, it could then be woken up to get a buffer. At this
stage it is totally unexpected, especially because we must access the SC.

The fix is obvious, the appctx must be removed from the buffer wait list on
release.

Note this bug exists because the appctx was moved at the mux level.

This patch must be backported as far as 2.6.

DOC: config: use the word 'backend' instead of 'proxy' in 'track' description

User @nwehrman reported in issue #2328 that the used of "proxy" instead
of "backend" in the argument of the "track" server keyword is confusing.
Admittedly, all other places in the doc use "backend/server" instead of
"proxy/server", so let's update it for the sake of consistency.

MEDIUM: quic: release conn socket before using quic_cc_conn

After emission/reception of a CONNECTION_CLOSE, a connection enters the
CLOSING state. In this state, only minimal exchanges occurs as only the
packets which containted the CONNECTION_CLOSE frame can be reemitted. In
conformance with the RFC, most resources are released and quic_conn
instance is converted to the lighter quic_cc_conn.

Push further this optimization by closing quic_conn socket FD before
switching to a quic_cc_conn. This means that quic_cc_conn will rely on
listener socket for its send/recv operation. This should not impact
performance as as stated input/output are minimal on this state.

This patch should improve FD consumption as prior to this a socket FD
was kept during the closing delay which could cause maxsock to be
reached for other connections.

Note that fd member is kept in QUIC_CONN_COMMON and not removed from
quic_cc_conn. This is because quic_cc_conn relies on qc_snd_buf() which
access this field.

As a side-effect to this change, jobs accounting for quic_conn is also
updated. quic_cc_conn instances are now not counted as jobs. Indeed, the
main objective of jobs is to prevent haproxy process to be stopped with
data truncation. However, this relies on the connection to uses its
owned socket as the listener socket is shut down inconditionaly on
shutdown.

A consequence of the jobs handling change is that haproxy process will
be closed if only quic_cc_conn instances are present, thus preventing to
respect the closing state. In case of a reload, if a client missed a
CONNECTION_CLOSE frame just before process shutdown, it will probably
received a Stateless Reset on sending retry.

This change is considered safe as, for now, haproxy only emits
CONNECTION_CLOSE on error conditions (such as protocol violation or
timeout). It is considered as expected to suffer from data truncation
from this. However, if connection closing is reused by haproxy to
implement clean shutdown, it should be necessary to delay
CONNECTION_CLOSE frame emission to ensure no data truncation happens
here.

MEDIUM: quic: respect closing state even on soft-stop

Prior to this patch, a special condition was set when idle timer was
rearmed for closing connections during haproxy process stopping. In this
case, the timeout was ditched and the idle task woken up immediatly.

The objective was to release quickly closing connections to not prevent
the process stopping to be too long. However, it is not conform with RFC
9000 recommandations and may cause some clients to miss a
CONNECTION_CLOSE in case of a packet loss.

A recent fix was set to use a shorter timeout for closing state. Now a
connection should only be left in this state for one second or less.
This reduces greatly the importance of stopping special condition. Thus,
this patch removes it completely.

BUG/MINOR: quic: remove dead code in error path

In quic_rx_pkt_retrieve_conn(), err label is now only used if qc is
NULL. Thus, condition on qc can be removed.

No need to backport.

This issue was reported by coverity on github.
This should fix issue #2338.

OPTIM: mux-h2: don't allocate more buffers per connections than streams

When an H2 mux works with a slow downstream connection and without the
mux-mux mode, it is possible that a single stream will allocate all 32
buffers in the connection. This is not desirable at all because 1) it
brings no value, and 2) it allocates a lot of memory per connection,
which, in addition to using a lot of memory, tends to degrade performance
due to cache thrashing.

This patch improves the situation by refraining from sending data frames
over a connection when more mbufs than streams are allocated. On a test
featuring 10k connections each with a single stream reading from the
cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs,
and improves the bandwidth. This may even be backported later to recent
versions to improve memory usage. Note however that it is efficient only
when combined with e16762f8a ("OPTIM: mux-h2: call h2_send() directly
from h2_snd_buf()"), and tends to slightly reduce the single-stream
performance without it, so in case of a backport, the two need to be
considered together.

MINOR: task/debug: make task_queue() and task_schedule() possible callers

It's common to see process_stream() being woken up by wake_expired_tasks
in the profiling output, without knowing which timeout was set to cause
this. By making it possible to record the call places of task_queue()
and task_schedule(), and by making wake_expired_tasks() explicitly not
replace it, we'll be able to know which task_queue() or task_schedule()
was triggered for a given wakeup.

For example below:
  process_stream                51200   311.4ms   6.081us   34.59s    675.6us <- run_tasks_from_lists@src/task.c:659 task_queue
  process_stream                19227   70.00ms   3.640us   9.813m    30.62ms <- sc_notify@src/stconn.c:1136 task_wakeup
  process_stream                 6414   102.3ms   15.95us   8.093m    75.70ms <- stream_new@src/stream.c:578 task_wakeup

It's visible that it's the run_tasks_from_lists() which in fact applies
on the task->expire returned by the ->process() function itself.

MINOR: task/debug: explicitly support passing a null caller to wakeup functions

This is used for tracing and profiling. By permitting to have a NULL
caller, we allow a caller to explicitly pass zero to state that the
current caller must not be replaced. This will soon be used by
wake_expired_tasks() to avoid replacing a caller in the expire loop.

BUG/MINOR: quic: fix retry token check inconsistency

A client may send multiple INITIAL packets if ClientHello is too big for
only one. In case a Retry token is used, the client must reuse it for
every INITIAL packets.

On the haproxy server side, there was an inconsistency to handle these
packets depending on the socket mode :
* when using listener socket, token is always revalidated.
* when using connection socket, token check is bypassed. This is because
quic_conn instance is known through its socket and thus
quic_rx_pkt_retrieve_conn() is not necessary.

RFC 9000 does not seems to mandate retry token validation after the
first INITIAL packet per connection. Thus, this patch chooses to bypass
the check every time the connection instance is known, as this indicates
that a previous token was already validated.

This should be backported up to 2.7.

MEDIUM: quic: define an accept queue limit

QUIC connections are pushed manually into a dedicated listener queue
when they are ready to be accepted. This happens after handshake
finalization or on 0-RTT packet reception. Listener is then woken up to
dequeue them with listener_accept().

This patch comptabilizes the number of connections currently stored in
the accept queue. If reaching a certain limit, INITIAL packets are
dropped on reception to prevent further QUIC connections allocation.
This should help to preserve system resources.

This limit is automatically derived from the listener backlog. Half of
its value is reserved for handshakes and the other half for accept
queues. By default, backlog is equal to maxconn which guarantee that
there can't be no more than maxconn connections in handshake or waiting
to be accepted.

MEDIUM: quic: limit handshake per listener

Implement a limit per listener for concurrent number of QUIC
connections. When reached, INITIAL packets for new connections are
automatically dropped until the number of handshakes is reduced.

The limit value is automatically based on listener backlog, which itself
defaults to maxconn.

This feature is important to ensure CPU and memory resources are not
consume if too many handshakes attempt are started in parallel.

Special care is taken if a connection is released before handshake
completion. In this case, counter must be decremented. This forces to
ensure that member <qc.state> is set early in qc_new_conn() before any
quic_conn_release() invocation.

MINOR: quic: reduce half open counters scope

Accounting is implemented for half open connections which represent QUIC
connections waiting for handshake completion. When reaching a certain
limit, Retry mechanism is automatically activated prior to instantiate
new connections.

The issue with this behavior is that two notions are mixed : QUIC
connection handshake phase and Retry which is mechanism against
amplification attacks. As such, only peer address validation should be
taken into account to activate Retry protection.

This patch chooses to reduce the scope of half_open_conn. Now only
connection waiting to validate the peer address are now accounted for.
Most notably, connections instantiated with a validated Retry token
check are not accounted.

One impact of this patch is that it should prevent to activate Retry
mechanism too early, in particular in case if multiple handshakes are
too slow. Another limitation should be implemented to protect against
this scenario.

MEDIUM: quic: adjust address validation

When a new QUIC connection is created, server considers peer address as
not yet validated. The server must limit its sending up to 3 times the
content already received. This is a defensive measure to avoid flooding
a remote host victim of address spoofing.

This patch adjust the condition to consider the peer address as
validated. Two conditions are now considered :
* successful handling of a received HANDSHAKE packet. This was already
done before although implemented in a different way.
* validation of a Retry token. This was not considered prior this patch
despite RFC recommandation.

This patch also adjusts how a connection is internally labelled as using
a validated peer address. Before, above conditions were checked via
quic_peer_validated_addr(). Now, a flag QUIC_FL_CONN_PEER_VALIDATED_ADDR
is set to labelled this. It already existed prior this patch but was
only used for quic_cc_conn. This should now be more explicit.

BUG/MEDIUM: mux-h1: Exit early if fast-forward is not supported by opposite SC

The commit 4be0c7c65 ("MEDIUM: stconn/muxes: Loop on data fast-forwarding to
forward at least a buffer") introduced a regression. In h1_fastfwd(), if
data fast-forwarding is not supported by the opposite SC, we must exit
without calling se_donn_ff(). Otherwise a BUG_ON() will be triggered because
the opposite mux has no .done_fastfwd() callback function.

No backport needed.

MEDIUM: mworker: -W is mandatory when using -S

Defining a master CLI without the master-worker mode emits a warning
since version 1.8. This patch enforce the behavior by forbiding the
usage of the -S option without the master-worker mode.

DOC: management: -q is quiet all the time

The documentation about -q seems wrong, it does not output messages
after the startup, it disables all messages. It was always quiet with
the stdio_quiet() function.

Must be backported in all stable versions.

MEDIUM: errors: move the MODE_QUIET test in print_message()

Move the MODE_QUIET and MODE_VERBOSE test in print_message() so we
always output in the startup-logs even with MODE_QUIET.

ha_warning(), ha_alert() and ha_notice() does not check the MODE_QUIET
and MODE_VERBOSE anymore, it is done before doing the fprintf() in
print_message().

MINOR: errors: does not check MODE_STARTING for log emission

ha_alert(), ha_warning() and ha_notice() shouldn't check MODE_STARTING
for log emission. Let's remove the check.

This shouldn't do much since the stdio_quiet() function mute the output
in main().

MINOR: errors: ha_alert() and ha_warning() uses warn_exec_path()

Move the code to display the haproxy version and path during starting
mode, which is called by the first ha_alert() or ha_warning().

BUG/MEDIUM: stconn: Don't update stream expiration date if already expired

The commit 08d7169f4 ("MINOR: stconn: Don't queue stream task in past in
sc_notify()") tried to fix issues with epiration date set in past for the
stream in sc_notify(). However it remains some cases where the stream
expiration date may already be expired before recomputing it. This happens
when an event is reported by the mux exactly when a timeout is triggered. In
this case, depending on the scheduling, the SC may be woken up before the
stream. For these cases, we fall into the BUG_ON() preventing to queue in
the past.

So, it remains unexpected to queue a task in the past. The BUG_ON() is
correct at this place. We must just avoid to recompute the stream expiration
date if it is already expired. At worst, the stream will be woken up for
nothing. But it is not really a big deal because it will only happen on
timeouts from time to time. It is so sporadic that we can ignore it from a
performance point of view.

This patch must be backpoted to 2.8. Be careful to remove the BUG_ON() on
the 2.8.

BUG/MEDIUM: quic: Possible crashes during secrets allocations (heavy load)

This bug could be reproduced with -dMfail option and detected by libasan.
During the TLS secrets allocations, when failed, quic_tls_ctx_secs_free()
is called. It resets the already initialized secrets. Some were detected
as initialized when not, or with a non initialized length, which leads
to big "memset(0)" detected by libsasan.

Ensure that all the secrets are really initialized with correct lengths.

No need to be backported.

BUG/MEDIUM: quic: Avoid some crashes upon TX packet allocation failures

If a TX packet cannot be allocated (by qc_build_pkt()), as it can be coalesced
to another one, this leads the TX buffer to have remaining not sent prepared data.
Then haproxy crashes upon a BUG_ON() triggered by the next call to qc_txb_release().
This may happen only during handshakes.

To fix this, qc_build_pkt() returns a new -3 error to dected such allocation
failures followed which is for now on followed by a call to qc_purge_txbuf() to
send the TX prepared data and purge the TX buffer.

Must be backported as far as 2.6.

BUG/MEDIUM: quic: Possible crashes when sending too short Initial packets

This may happen during handshakes when Handshake packets cannot be coalesced
to a first Initial packet because of TX frame allocation failures (from
qc_build_frms()). This leads too short (not padded) Initial packets to be sent.
This is detected by a BUG_ON() in qc_send_ppkts().

To avoid this an Handshake packet without ack-eliciting frames which should have
been built by qc_build_frms() is built.

Must be backported as far as 2.6.

BUG/MEDIUM: quic: Avoid trying to send ACK frames from an empty ack ranges tree

This may happen upon ack ranges allocation failures (from quic_update_ack_ranges_list().
This can lead to empty trees of ack ranges to be used to build ACK frames which
is not good at all. Furthermore this is detected by a BUG_ON() (in qc_do_build_pkt()).

To avoid this, simply update the acknowledgemen state of the connection only if
quic_update_ack_ranges_list() succeeds, as it fails only in case of memory
allocation failures.

Must be backported as far as 2.6.

BUG/MEDIUM: quic: Too short Initial packet sent (enc. level allocation failed)

If the Handshake encryption level could not be allocated, this could lead
to Initial packets to be sent because no Handshake CRYPTO frames were generated.

Furthermore in such an allocation failure case, the connection should be closed
as soon as possible. This is done making ha_quic_set_encryption_secrets() return
0 upon an encryption level allocation failure.

Also fix a typo in the trace in relation to this allocation failure.

No need to be backported.

MINOR: quic: Avoid zeroing frame structures

Do not initialize anymore ->type of quic_frame structures which leads
to the others to be zeroed.

CLEANUP: quic: Indentation fix in qc_do_build_pkt()

Modification without any functional impact.

BUG/MINOR: quic: idle timer task requeued in the past

When the idle timer expired with a still present mux, this task was not freed
and even requeued with a timer in the past.

Fix this issue calling task_destroy() in this case. As the task is freed,
its handler must return NULL setting local <t> variable to NULL in every cases.

Also ensure that this timer task is not armed again after having been released
with a <return> statement when this is the case from qc_idle_timer_do_rearm().

Must be backported as far as 2.6.

MINOR: quic: Add idle timer task pointer to traces

Helpful to detect if this timer was freed or not.

MINOR: quic: release the TLS context asap from quic_conn_release()

This was no reason not to release as soon as possible the TLS/SSL QUIC connection
context from quic_conn_release() before allocating a "closing connection" connection
(quic_cc_conn struct).

MEDIUM: quic: Heavy task mode with non contiguously bufferized CRYPTO data

This patch sets the handshake task in heavy task mode when receiving in disorder
CRYPTO data which results in in order bufferized CRYPTO data. This is done
thanks to a non-contiguous buffer and from qc_handle_crypto_frm() after having
potentially bufferized CRYPTO data in this buffer.
qc_treat_rx_crypto_frms() is no more called from qc_treat_rx_pkts() but instead
this is where the task is set in heavy task mode. Consequently,
this is the job of qc_ssl_provide_all_quic_data() to call directly
qc_treat_rx_crypto_frms() to provide the in order bufferized CRYPTO data to the
TLS stack. As this function releases the non-contiguous buffer for the CRYPTO
data, if possible, there is no need to do that from qc_treat_rx_crypto_frms()
anymore.

MEDIUM: quic: Heavy task mode during handshake

Add a new pool for the CRYPTO data frames received in order.
Add ->rx.crypto_frms list to each encryption level to store such frames
when they are received in order from qc_handle_crypto_frm().
Also set the handshake task (qc_conn_io_cb()) in heavy task mode from
this function after having received such frames. When this task
detects that it is set in heavy mode, it calls qc_ssl_provide_all_quic_data()
newly implemented function to provide the CRYPTO data to the TLS task.
Modify quic_conn_enc_level_uninit() to release these CRYPTO frames
when releasing the encryption level they are in relation with.

MINOR: stconn/mux-h2: Use a iobuf flag to report EOI to consumer side during FF

IOBUF_FL_EOI iobuf flag is now set by the producer to notify the consumer
that the end of input was reached. Thanks to this flag, we can remove the
ugly ack in h2_done_ff() to test the opposite SE flags.

Of course, for now, it works and it is good enough. But we must keep in mind
that EOI is always forwarded from the producer side to the consumer side in
this case. But if this change, a new CO_RFL_ flag will have to be added to
instruct the producer if it can forward EOI or not.

MEDIUM: stconn/muxes: Loop on data fast-forwarding to forward at least a buffer

In the mux-to-mux data forwarding, we now try, as far as possible to send at
least a buffer. Of course, if the consumer side is congested or if nothing
more can be received, we leave. But the idea is to retry to fast-forward
data if less than a buffer was forwarded. It is only performed for buffer
fast-forwarding, not splicing.

The idea behind this patch is to optimise the forwarding, when a first
forward was performed to complete a buffer with some existing data. In this
case, the amount of data forwarded is artificially limited because we are
using a non-empty buffer. But without this limitation, it is highly probable
that a full buffer could have been sent. And indeed, with H2 client, a
significant improvement was observed during our test.

To do so, .done_fastfwd() callback function must be able to deal with
interim forwards. Especially for the H2 mux, to remove H2_SF_NOTIFIED flags
on the H2S on the last call only. Otherwise, the H2 stream can be blocked by
itself because it is in the send_list. IOBUF_FL_INTERIM_FF iobuf flag is
used to notify the consumer it is not the last call. This flag is then
removed on the last call.

BUG/MEDIUM: pool: try once to allocate from another bucket if empty

In order to limit inter-thread contention on the global pool, in 2.9-dev3
with commit 7bf829ace ("MAJOR: pools: move the shared pool's free_list
over multiple buckets"), it was decided that if the selected bucket had
an empty free list, we would simply give up and fall back to the OS
allocator.

But this causes allocations to be made from the OS for certain threads,
to be released to overloaded pools that are sent back to the OS. One
visible effect is that sending a lot of traffic using h2load with 100
parallel streams over 100 connections causes 5-10k buffers to be
allocated, then reducing the load to only 10 connections doesn't make
these allocations go down, just because some buckets are no longer
visited.

Tests show that giving a second chance to pick another bucket in this
case is sufficient to visit all other buckets and recycle their pending
objects. Now "show pools" that starts at 10k buffers at 100 connections
goes down to about 150 with 1 connection and 100 streams in a fraction
of a second.

No backport is needed, as the issue is only in 2.9.

BUG/MINOR: pool: check one other random bucket on alloc conflict

Since 2.9-dev3 with commit 7bf829ace ("MAJOR: pools: move the shared
pool's free_list over multiple buckets"), the global pool supports
multiple heads to reduce inter-thread contention. However, when
grabbing a freelist head fails because another thread is already
picking from it, we just skip to the next one and try again.

Unfortunately, it still maintains a bit of contention between thread
pairs when for some reasons only a few threads are used. This may
happen for example when running on a 4- or 8- thread system and
the two most active ones end up on adjacent buckets.

A better and much simpler solution consists in visiting a random bucket
instead of the current one. Tests show that the CPU usage spent in
pool_refill_local_from_shared() reduces at low number of connections
(hence threads).

No backport is needed, as the issue is only in 2.9.

BUG/MEDIUM: pool: fix releasable pool calculation when overloaded

In 2.6-dev1, the method used to decide how many pool entries could be
released at once was revisited to support releases in batches. This was
done with commits 91a8e28f9 ("MINOR: pool: add a function to estimate
how many may be released at once") and 361e31e3f ("MEDIUM: pool: compute
the number of evictable entries once per pool").

The first commit takes care of the possible inconsistency between the
moment the allocated count and the used count are read, but unfortunately
fixed it the wrong way, by adjusting "used" to match "alloc" whenever it
was lower (i.e. almost always). This results in a nasty case which is that
as soon as the allocated value becomes higher than the estimated count of
needed entries, we end up returning pool->minavail, which causes very
small batches to be released, starting from commit 1513c5479 ("MEDIUM:
pools: release cached objects in batches").

The problem was further amplified in 2.9-dev3 with commit 7bf829ace
("MAJOR: pools: move the shared pool's free_list over multiple buckets")
because it now becomes possible for a thread to allocate from one bucket
and release into a few other different ones, causing an accumulation of
entries in that bucket.

The fix is trivial, simply adjust the alloc counter if the used one is
higher, before performing operations.

This must be backported to 2.6.

BUG/MEDIUM: freq-ctr: Don't report overshoot for long inactivity period

The function returning the excess of events over the current period for a
target frequency (the overshoot) has a flaw if the inactivity period is too
long. In this case, the result may overflow. Instead to be negative, a very
high positive value is returned.

This function is used by the bandwidth limitation filter. It means after a
long inactivity period, a huge burst may be detected while it should not.

In fact, the problem arise from the moment we're past the current period. In
this case, we should not report any overshoot and just get the number of
remaining events as usual.

This patch should be backported as far as 2.7.

BUG/MINOR: mux-h1: Properly handle http-request and http-keep-alive timeouts

It is now the turn for the H1 mux to be fix to properly handle http-request
and http-keep-alive timeouts. It is quite surprising but it is broken since
the 2.2. For idle connections on client side, the smallest value between the
client timeout and the http-request/http-keep-alive timeout is used while
the client timeout should only be used if other ones are not defined. So, if
the client timeout is the smallest value, the keep-alive timeout is not
respected.

It is only an issue for idle client connections. The http-request timeout is
respected from the moment part of the next request was received.

This patch should fix the issue #2334. It must be backported as far as 2.2. But
be careful during the backports. The H1 mux had evolved a lot since the 2.2.

MINOR: stktable/cli: support v6tov4 and v4tov6 conversions

Add a special treatment for the IPV4 and IPV6 cases in
table_process_entry_per_key() function so that input string is parsed
in best effort (STR to pseudo type ADDR): input format is first considered
over table type and then let smp_to_stkey() do the type conversion for us
when needed.

This patch heavily depends on:
- "MEDIUM: stktable/cli: simplify entry key handling"

And optionally depends on:
- 72514a44 ("MEDIUM: tools/ip: v4tov6() and v6tov4() rework")

MEDIUM: stktable/cli: simplify entry key handling

Make use of smp_to_stkey() in table_process_entry_per_key() to simplify
key handling and leverage auto type conversions from sample API.

One noticeable side effect is that integer input checks will be relaxed
given that c_str2int() sample conv is more permissible than the integrated
table_process_entry_per_key() integer parser.

BUG/MINOR: stick-table/cli: Check for invalid ipv4 key

When an ipv4 key is used to filter a CLI command on a stick table
clear/set/show table ...), inetaddr_host+htonl combination was used
with no error checking.

Instead, we now use inet_pton(), which is what we use for ipv6 addresses
since b7c962b0c0 ("BUG/MINOR: stick-table/cli: Check for invalid ipv6 key")

Doing this allows us to easily check for parsing errors: we're trading off
some parsing efficience to better catch input errors and ensure we get
similar behavior between ipv4 and ipv6 addresses handling.

This patch may be backported to all supported versions.

BUG/MINOR: mux-h1: Release empty ibuf during data fast-forwarding

We must take care to release H1 input buffer when it is emptied during the
fast-forwarding nego. Otherwise, it may be kept allocated for a while,
waiting for the next "normal" receive or the H1C release.

No backport needed.

MINOR: proto_reverse_connect: use connect timeout

Use backend connect timeout when a new connection is instantiated for
rhttp. This ensures that if connect operation fails after a certain
delay, reverse_connect listener task is woken up. This allows to free
the current connection and retry a new connect.

As a consequence of this change, rev_process() may be woken up even if
connection is not reported with CO_FL_ERROR. This happens if timeout
fired before any network reported issue. Connection freeing is adjusted
as in this case MUX instance is already allocated. Use destroy callback
to release MUX context prior to the connection itself.

This patch is really useful as a side measure for a haproxy bug
impacting connect with SSL for both backend connections and active
reverse connect. This is caused by the delayed allocation of MUX
allocation. Asynchronous connect error detected at the socket layer is
not notified to upper layers. Currently, only connect timeout allows to
release this failed connection.

BUG/MEDIUM: mux-h1: Be sure xprt support splicing to use it during fast-forward

The commit d6d4abdc3 ("BUILD: mux-h1: Fix build without kernel splicing
support") introduced a regression. The kernel support for the underlying
XPRT is no longer checked. So it is possible to enable the splicing for SSL
connection. This of course leads to a segfault.

This patch restore the test on the xprt rcv_pipe/snd_pipe functions.

This patch should fix a crash reported by Tristan in #2095
(#issuecomment-1788949014). No backport needed.

BUG/MEDIUM: quic: fix sslconns on quic_conn alloc failure

QUIC connections are accounted inside global sslconns. As with QUIC
actconn, it suffered from a similar issue if an intermediary allocation
failed inside qc_new_conn().

Fix this similarly by moving increment operation inside qc_new_conn().
Increment and error path are now centralized and much easier to
validate.

The consequences are similar to the actconn fix : on memory allocation
global sslconns may wrap, this time blocking any future QUIC or SSL
connections on the process.

This must be backported up to 2.6.

BUG/MEDIUM: quic: fix actconn on quic_conn alloc failure

Since the following commit, quic_conn instances are accounted into
global actconn and compared against maxconn.

  commit 7735cf3854eb155a50a5ea747406f2a25657e25c
  MEDIUM: quic: count quic_conn instance for maxconn

Increment is always done prior to real allocation to guarantee minimal
resource consumption. Special care is taken to ensure there will always
be one decrement operation for each increment. To help this, decrement
is centralized in quic_conn_release().

This behaves incorrectly in case of an intermediary allocation failure
inside qc_new_conn(). In this case, quic_conn_release() will decrement
actconn. Then, a NULL qc is returned in quic_rx_pkt_retrieve_conn()
which will also decrement the counter on its own error code path.

To properly fix this, actconn incrementation has been moved directly
inside qc_new_conn(). It is thus easier to cover every cases :
* if alloc failure before or on pool_head_quic_conn, actconn is
  decremented manually at the end of qc_new_conn()
* after this step, actconn will be decremented by quic_conn_release()
  either on intermediary alloc failure or on proper connection release

This bug happens on memory allocation failure so it should be rare.
However, its impact is not negligeable as if actconn counter is wrapped
it will block any future connection allocation for both QUIC and TCP.

One small downside of this change is that a CID is now always allocated
before quic_conn even if maxconn will be reached. However, this is
considered as of minor importance compared to a more robust code.

This must be backported up to 2.6.

DOC: stconn: Improve comments about lra and fsb usage

Recent fixes have shown <lra> and <fsb> uses were not prettu clear. So let's
try to improve documentation about these value. Especially when <lra> is
updated and how to used it.

CLEANUP: htx: Properly indent htx_reserve_max_data() function

Spaces were used instead of tabs to indent htx_reserve_max_data()
function. Let's reindent the whole function.

BUG/MINOR: stconn: Sanitize report for read activity

When a EOS or EOI is detected on the endpoint and when the event is reported
at the SC level, a read activity must be reported. It is not really a big
deal because these flags already inhibit any read timeout. But it is
consistent with the <lra> comment. In addition, no read activity is reported
on abort. It is up-down event and it is not an event unblocking the
reads. So there is no reason to report a read activity.

This patch must be backported to 2.8.

BUG/MEDIUM: Don't apply a max value on room_needed in sc_need_room()

In sc_need_room(), we compute the maximum room that can be requested to
restarted reading to be sure to be able to unblock the SC. At worst when the
buffer is emptied. Here, the buffer reserve is considered but it is an issue.

Counting the reserve can lead to a wicked bug with the H1 multiplexer, when
small amount of data are found at the end of the HTX buffer. In this case,
to not wrap, the H1 mux requests more room. It is an optim to be able to
resync the buffer with the consumer side and to be able to perform zero-copy
transfers. However, if this amount of data is smaller than the reserve and
if the consumer is congested, we fall in a loop because the wrong value is
used to request more room. The H1 mux continues to pretend there is not
enough space in the buffer, while the effective requested value is lower
than the free space in the buffer. While the consumer is congested and does
not consume these data, the is no way to stop the loop.

We can fix the function by removing the buffer reserve from the
computation. But it remains a dangerous decision to apply a max value on
room_needed. It is safer to require the caller must set a correct value. For
now, it is true. But at the end, it is totally unexepected to wait for more
room than an empty buffer can contain.

This patch must be backported to 2.8.

MINOR: stconn: Don't queue stream task in past in sc_notify()

A task must never be queued in past. However, in sc_notify(), the stream
task, if not woken up, is queued. Thanks to previous fixes, the stream task
expiration date should be correct. But to prevent any issue, a BUG_ON() is
added to be sure it never happens. I guess a good idea could be to remove it
or change it to BUG_ON_HOT() for the final release.

BUG/MEDIUM: stconn: Don't report rcv/snd expiration date if SC cannot epxire

When receive or send expiration date of a stream-connector is retrieved, we
now automatically check if it may expire. If not, TICK_ETERNITY is returned.

The expiration dates of the frontend and backend stream-connectors are used
to compute the stream expiration date. This operation is performed at 2
places: at the end of process_stream() and in sc_notify() if the stream is
not woken up.

With this patch, there is no special changes for process_stream() because it
was already handled. It make thing a little simpler. However, it fixes
sc_notify() by avoiding to erroneously compute an expiration date in
past. This highly reduce the stream wakeups when there is contention on the
consumer side.

The bug was introduced with the commit 8073094bf ("NUG/MEDIUM: stconn:
Always update stream's expiration date after I/O"). It was an error to
unconditionnaly set the stream expiration data, without testing blocking
conditions on both SC.

This patch must be backported to 2.8.