git.ipfire.org Git - thirdparty/haproxy.git/log

MEDIUM: tree-wide: avoid manually initializing proxies

In this patch we try to use the proxy API init functions as much as
possible to avoid code redundancy and prevent proxy initialization
errors. As such, we prefer using alloc_new_proxy() and setup_new_proxy()
instead of manually allocating the proxy pointer and performing the
base init ourselves.

MINOR: flt_spoe: mark spoe agent frontend as internal

spoe agent frontend is used by the agent internally, but it is not meant
to be directly exposed like user-facing proxies defined in the config.

As such, better mark it as internal using PR_CAP_INT capability to prevent
any mis-use.

MINOR: checks: mark CHECKS-FE dummy frontend as internal

CHECKS-FE frontend is a dummy frontend used to create checks sessions
as such, it is internal and should not be exposed to the user.
Better mark it as internal using PR_CAP_INT capability to prevent
proxy API from ever exposing it.

MINOR: proxy: add setup_new_proxy() function

Split alloc_new_proxy() in two functions: the preparing part is now
handled by setup_new_proxy() which can be called individually, while
alloc_new_proxy() takes care of allocating a new proxy struct and then
calling setup_new_proxy() with the freshly allocated proxy.

BUG/MINOR: hlua: fix invalid errmsg use in hlua_init()

errmsg is used with memprintf and friends, thus it must be NULL
initialized before being passed to memprintf, else invalid read will
occur.

However in hlua_init() the errmsg value isn't initialized, let's fix that

This is really minor because it would only cause issue on error paths,
yet it may be backported to all stable versions, just in case.

BUG/MINOR: backend: do not use the source port when hashing clientip

The server's "usesrc" keyword supports among other options "client"
and "clientip". The former means we bind to the client's IP and port
to connect to the server, while the latter means we bind to its IP
only. It's done in two steps, first alloc_bind_address() retrieves
the IP address and port, and second, tcp_connect_server() decides
to either bind to the IP only or IP+port.

The problem comes with idle connection pools, which hash all the
parameters: the hash is calculated before (and ideally withouy) calling
tcp_connect_server(), and it considers the whole struct sockaddr_storage
for the hash, except that both client and clientip entirely fill it with
the client's address. This means that both client and clientip make use
of the source port in the hash calculation, making idle connections
almost not reusable when using "usesrc clientip" while they should for
clients coming from the same source. A work-around is to force the
source port to zero using "tcp-request session set-src-port int(0)" but
it's ugly.

Let's fix this by properly zeroing the port for AF_INET/AF_INET6 addresses.

This can be backported to 2.4. Thanks to Sebastien Gross for providing a
reproducer for this problem.

DEV: h2: fix h2-tracer.lua nil value index

Nick Ramirez reported the following error while testing the h2-tracer.lua
script:

Lua filter 'h2-tracer' : [state-id 0] runtime error: /etc/haproxy/h2-tracer.lua:227: attempt to index a nil value (field '?') from /etc/haproxy/h2-tracer.lua:227: in function line 109.

It is caused by h2ff indexing with an out of bound value. Indeed, h2ff
is indexed with the frame type, which can potentially be > 9 (not common
nor observed during Willy's tests), while h2ff only defines indexes from
0 to 9.

The fix was provided by Willy, it consists in skipping h2ff indexing if
frame type is > 9. It was confirmed that doing so fixes the error.

MINOR: ring/cli: support delimiting events with a trailing \0 on "show events"

At the moment it is not supported to produce multi-line events on the
"show events" output, simply because the LF character is used as the
default end-of-event mark. However it could be convenient to produce
well-formatted multi-line events, e.g. in JSON or other formats. UNIX
utilities have already faced similar needs in the past and added
"-print0" to "find" and "-0" to "xargs" to mention that the delimiter
is the NUL character. This makes perfect sense since it's never present
in contents, so let's do exactly the same here.

Thus from now on, "show events <ring> -0" will delimit messages using
a \0 instead of a \n, permitting a better and safer encapsulation.

MINOR: ring: support arbitrary delimiters through ring_dispatch_messages()

In order to support delimiting output events with other characters than
just the LF, let's pass the delimiter through the API. The default remains
the LF, used by applet_append_line(), and ignored by the log forwarder.

DOC: configuration: rework the crt-list section

The crt-list section was unclear, this patch reworks it, giving more
details on the matching algorithms and how the things are loaded.

BUG/MEDIUM: sample: fix risk of overflow when replacing multiple regex back-refs

Aleandro Prudenzano of Doyensec and Edoardo Geraci of Codean Labs
reported a bug in sample_conv_regsub(), which can cause replacements
of multiple back-references to overflow the temporary trash buffer.

The problem happens when doing "regsub(match,replacement,g)": we're
replacing every occurrence of "match" with "replacement" in the input
sample, which requires a length check. For this, a max is applied, so
that a replacement may not use more than the remaining length in the
buffer. However, the length check is made on the replaced pattern and
not on the temporary buffer used to carry the new string. This results
in the remaining size to be usable for each input match, which can go
beyond the temporary buffer size if more than one occurrence has to be
replaced with something that's larger than the remaining room.

The fix proposed by Aleandro and Edoardo is the correct one (check on
"trash" not "output"), and is the one implemented in this patch.

While it is very unlikely that a config will replace multiple short
patterns each with a larger one in a request, this possibility cannot
be entirely ruled out (e.g. mask a known, short IP address using
"XXX.XXX.XXX.XXX"). However when this happens, the replacement pattern
will be static, and not be user-controlled, which is why this patch is
marked as medium.

The bug was introduced in 2.2 with commit 07e1e3c93e ("MINOR: sample:
regsub now supports backreferences"), so it must be backported to all
versions.

Special thanks go to Aleandro and Edoardo for reporting this bug with
a simple reproducer and a fix.

BUG/MINOR: log: fix CBOR encoding with LOG_VARTEXT_START() + lf_encode_chunk()

There have been some reports that using %HV logformat alias with CBOR
encoder would produce invalid CBOR payload according to some CBOR
implementations such as "cbor.me". Indeed, with the below log-format:

  log-format "%{+cbor}o %(protocol)HV"

And the resulting CBOR payload:

  BF6870726F746F636F6C7F48485454502F312E31FFFF

cbor.me would complain with: "bytes/text mismatch (ASCII-8BIT != UTF-8) in
streaming string") error message.

It is due to the version string being first announced as text, while CBOR
encoder actually encodes it as byte string later when lf_encode_chunk()
is used.

In fact it affects all patterns combining LOG_VARTEXT_START() with
lf_encode_chunk() which means  %HM, %HU, %HQ, %HPO and %HP are also
affected. To fix the issue, in _lf_encode_bytes() (which is
lf_encode_chunk() helper), we now check if we are inside a VARTEXT (we
can tell it if ctx->in_text is true), in which case we consider that we
already announced the current data as regular text so we keep the same
type to encode the bytes from the chunk to prevent inconsistencies.

It should be backported in 3.0

BUILD: atomics: fix build issue on non-x86/non-arm systems

Commit f435a2e518 ("CLEANUP: atomics: also replace __sync_synchronize()
with __atomic_thread_fence()") replaced the builtins used for barriers,
but the different API required an argument while the macros didn't specify
any, resulting in double parenthesis that were causing obscure build errors
such as "called object type 'void' is not a function or function pointer".
Let's just specify the args for the macro. No backport is needed.

MEDIUM: ssl/crt-list: warn on negative filters only

negative SNI filters on crt-list lines only have a meaning when they
match a positive wildcard filter. This patch adds a warning which
is emitted when trying to use negative filters without any wildcard on
the same line.

This was discovered in ticket #2900.

MEDIUM: ssl/crt-list: warn on negative wildcard filters

negative wildcard filters were always a noop, and are not useful for
anything unless you want to use !* alone to remove every name from a
certificate.

This is confusing and the documentation never stated it correctly. This
patch adds a warning during the bind initialization if it founds one,
only !* does not emit a warning.

This patch was done during the debugging of issue #2900.

CLEANUP: log: adjust _lf_cbor_encode_byte() comment

_lf_cbor_encode_byte() comment was not updated in c33b857df ("MINOR: log:
support true cbor binary encoding") to reflect the new behavior.

Indeed, binary form is now supported. Updating the comment that says
otherwise.

MEDIUM: task: make notification_* API thread safe by default

Some notification_* functions were not thread safe by default as they
assumed only one producer would emit events for registered tasks.

While this suited well with the Lua sockets use-case, this proved to
be a limitation with some other event sources (ie: lua Queue class)

instead of having to deal with both the non thread safe and thread
safe variants (_mt suffix), which is error prone, let's make the
entire API thread safe regarding the event list.

Pruning functions still require that only one thread executes them,
with Lua this is always the case because there is one cleanup list
per context.

MINOR: hlua_fcn: add Queue:alarm()

Queue:alarm() sets a wakeup alarm on the task when new data becomes
available on Queue. It must be re-armed for each event.

Lua documentation was updated

MINOR: hlua: add AppletTCP:try_receive()

This is the non-blocking variant for AppletTCP:receive(). It doesn't
take any argument, instead it tries to read as much data as available
at once. If no data is available, empty string is returned.

Lua documentation was updated.

MINOR: hlua: split hlua_applet_tcp_recv_yield() in two functions

Split hlua_applet_tcp_recv_yield() in order to create
hlua_applet_tcp_recv_try() helper function which does a single receive
attempt.

MINOR: hlua: core.wait() takes optional delay paramater

core.wait() now accepts optional delay parameter in ms. Passed this delay
the task is woken up if no event woke the task before.

Lua documentation was updated.

MINOR: hlua: add core.wait()

Similar to core.yield(), except that the task is not woken up
automatically, instead it waits for events to trigger the task
wakeup.

Lua documentation was updated.

MINOR: hlua_fcn: register queue class using hlua_register_metatable()

Most lua classes are registered by leveraging the
hlua_register_metatable() helper. Let's use that for the Queue class as
well for consitency.

BUG/MINOR: hlua_fcn: fix potential UAF with Queue:pop_wait()

If Queue:pop_wait() excecuted from a stream context and pop_wait() is
aborted due to a Lua or ressource error, then the waiting object pointing
to the task will still be registered, so if the task eventually dissapears,
Queue:push() may try to wake invalid task pointer..

To prevent this bug from happening, we now rely on notification_* API to
deliver waiting signals. This way signals are properly garbage collected
when a lua context is destroyed.

It should be backported in 2.8 with 86fb22c55 ("MINOR: hlua_fcn: add Queue
class").
This patch depends on ("MINOR: task: add thread safe notification_new and
notification_wake variants")

MINOR: task: add thread safe notification_new and notification_wake variants

notification_new and notification_wake were historically meant to be
called by a single thread doing both the init and the wakeup for other
tasks waiting on the signals.

In this patch, we extend the API so that notification_new and
notification_wake have thread-safe variants that can safely be used with
multiple threads registering on the same list of events and multiple
threads pushing updates on the list.

MINOR: check: implement check-pool-conn-name srv keyword

This commit is a direct follow-up of the previous one. It defines a new
server keyword check-pool-conn-name. It is used as the default value for
the name parameter of idle connection hash generation.

Its behavior is similar to server keyword pool-conn-name, but reserved
for checks reuse. If check-pool-conn-name is set, it is used in priority
to match a connection for reuse. If unset, a fallback is performed on
check-sni.

MINOR: check/backend: support conn reuse with SNI

Support for connection reuse during server checks was implemented
recently. This is activated with the server keyword check-reuse-pool.

Similarly to stream processing via connect_backend(), a connection hash
is calculated when trying to perform reuse for checks. This is necessary
to retrieve for a connection which shares the check connect parameters.
However, idle connections can additionnally be tagged using a
pool-conn-name or SNI under connect_backend(). Check reuse does not test
these values, which prevent to retrieve a matching connection.

Improve this by using "check-sni" value as idle connection hash input
for check reuse. be_calculate_conn_hash() API has been adjusted so that
name value can be passed as input, both when using streams or checks.

Even with the current patch, there is still some scenarii which could
not be covered for checks connection reuse. most notably, when using
dynamic pool-conn-name/SNI value. It is however at least sufficient to
cover simpler cases.

MINOR: server: activate automatically check reuse for rhttp@ protocol

Without check-reuse-pool, it is impossible to perform check on server
using @rhttp protocol. This is due to the inherent nature of the
protocol which does not implement an active connect method.

Thus, ensure that check-reuse-pool is always set when a reverse HTTP
server is declared. This reduces server configuration and should prevent
any omission. Note that it is still require to add "check" server
keyword so activate server checks.

BUG/MINOR: server: ensure check-reuse-pool is copied from default-server

Duplicate server check.reuse_pool boolean value in srv_settings_cpy().
This is necessary to ensure that check-reuse-pool value can be set via
default-server or server-template.

This does not need to be backported.

MINOR: backend: mark srv as nonnull in alloc_dst_address()

Server instance can be NULL on connect_server(), either when dispatch or
transparent proxy are active. However, in alloc_dst_address() access to
<srv> is safe thanks to SF_ASSIGNED stream flag. Add an ASSUME_NONNULL()
to reflect this state.

This should fix coverity report from github issue #2922.

DOC: configuration: replace "crt" by "ssl-f-use" in listeners

Replace the "crt" keyword from the frontend section with a "ssl-f-use"
keyword, "crt" could be ambigous in case we don't want to put a
certificate filename.

MEDIUM: ssl: replace "crt" lines by "ssl-f-use" lines

The new "crt" lines in frontend and listen sections are confusing:

- a filename is mandatory but we could need a syntax without the
filename in the future, if the filename is generated for example
- there is no clue about the fact that its only used on the frontend
side when reading the line

A new "ssl-f-use" line replaces the "crt" line, but a "crt" keyword
can be used on this line. "f" indicates that this is the frontend
configuration, a "ssl-b-use" keyword could be used in the future.

The "crt" lines only appeared in 3.2-dev so this won't change anything
for people using configurations from previous major versions.

TESTS: Fix build for filltab25.c

Give a return type to main(), so that filltab25.c compiles with
modern compilers.

CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()

The drop of older compilers also allows us to focus on clearer
barriers, so let's use them.

CLEANUP: atomics: remove support for gcc < 4.7

The old __sync_* API is no longer necessary since we do not support
gcc before 4.7 anymore. Let's just get rid of this code, the file is
still ugly enough without it.

CLEANUP: assorted typo fixes in the code, commits and doc

CI: codespell: add "pres" to spellcheck whitelist

spellcheck was triggered by the following:

  * pres  : same as "res" but using the parent stream, if any. "pres"
            variables are only accessible during response processing of the
            parent stream.

CI: spell check: allow manual trigger

BUG/MEDIUM: peers: prevent learning expiration too far in futur from unsync node

This patch sets the expire of the entry to the max value in
configuration if the value showed in the peer update message
is too far in futur.

This should be backported an all supported branches.

BUG/MINOR: peers: fix expire learned from a peer not converted from ms to ticks

This is has now impact currently since MS_TO_TICKS macro does nothing
but it will prevent further bugs.

MEDIUM: stream: Save SC and channel flags earlier in process_steam()

At the begining of process_stream(), the flags of the stream connectors and
channels are saved to be able to handle changes performed in sub-functions
(for instance in analyzers). But, some operations were performed before
saving these flags: Synchronous receives and forced shutdowns. While it
seems to safe for now, it is a bit annoying because some events could be
missed.

So, to avoid bugs in the future, the channels and stream connectors flags
are now really saved before any other processing.

BUG/MEDIUM: stream: Fix a possible freeze during a forced shut on a stream

When a forced shutdown is performed on a stream, it is possible to freeze it
infinitly because it is performed in an unexpected way from process_stream()
point of view, especially when the stream is waiting for a server
connection. The events sequence is a bit complex but at the end the stream
remains blocked in turn-around state and no event are trriggered to unblock
it.

By trying to fix the issue, we considered it was safer to rethink the
feature. The idea is to quickly shutdown a stream to release resources. For
instance to be able to delete a server. So, instead of scheduling a
shutdown, it is more efficient to trigger an error and detach the stream
from the server, if neecessary. The same code than the one used to deal with
connection errors in back_handle_st_cer() is used.

This patch must be slowly backported as far as 2.6.

REORG: ssl: move curves2nid and nid2nist to ssl_utils

curves2nid and nid2nist are generic functions that could be used outside
the JWS scope, this patch put them at the right place so they can be
reused.

[RELEASE] Released version 3.2-dev9

Released version 3.2-dev9 with the following main changes :
    - MINOR: quic: move global tune options into quic_tune
    - CLEANUP: quic: reorganize TP flow-control initialization
    - MINOR: quic: ignore uni-stream for initial max data TP
    - MINOR: mux-quic: define config for max-data
    - MINOR: quic: define max-stream-data configuration as a ratio
    - MEDIUM: lb-chash: add directive hash-preserve-affinity
    - MEDIUM: pools: be a bit smarter when merging comparable size pools
    - REGTESTS: disable the test balance/balance-hash-maxqueue
    - BUG/MINOR: log: fix gcc warn about truncating NUL terminator while init char arrays
    - CI: fedora rawhide: allow "on: workflow_dispatch" in forks
    - CI: fedora rawhide: install "awk" as a dependency
    - CI: spellcheck: allow "on: workflow_dispatch" in forks
    - CI: coverity scan: allow "on: workflow_dispatch" in forks
    - CI: cross compile: allow "on: workflow_dispatch" in forks
    - CI: Illumos: allow "on: workflow_dispatch" in forks
    - CI: NetBSD: allow "on: workflow_dispatch" in forks
    - CI: QUIC Interop on AWS-LC: allow "on: workflow_dispatch" in forks
    - CI: QUIC Interop on LibreSSL: allow "on: workflow_dispatch" in forks
    - MINOR: compiler: add __nonstring macro
    - MINOR: thread: dump the CPU topology in thread_map_to_groups()
    - MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal()
    - MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode
    - MINOR: cpu-topo: add a dump of thread-to-CPU mapping to -dc
    - MINOR: cpu-topo: pass an extra argument to ha_cpu_policy
    - MINOR: cpu-topo: add new cpu-policies "group-by-2-clusters" and above
    - BUG/MINOR: config: silence .notice/.warning/.alert in discovery mode
    - EXAMPLES: add "games.cfg" and an example game in Lua
    - MINOR: jws: emit the JWK thumbprint
    - TESTS: jws: change the jwk format
    - MINOR: ssl/ckch: add substring parser for ckch_conf
    - MINOR: mt_list: Implement mt_list_try_lock_prev().
    - MINOR: lbprm: Add method to deinit server and proxy
    - MINOR: threads: Add HA_RWLOCK_TRYRDTOWR()
    - MAJOR: leastconn; Revamp the way servers are ordered.
    - BUG/MINOR: ssl/ckch: leak in error path
    - BUILD: ssl/ckch: potential null pointer dereference
    - MINOR: log: support "raw" logformat node typecast
    - CLEANUP: assorted typo fixes in the code and comments
    - DOC: config: fix two missing "content" in "tcp-request" examples
    - MINOR: cpu-topo: cpu_dump_topology() SMT info check little optimisation
    - BUILD: compiler: undefine the CONCAT() macro if already defined
    - BUG/MEDIUM: leastconn: Don't try to reposition if the server is down
    - BUG/MINOR: rhttp: fix incorrect dst/dst_port values
    - BUG/MINOR: backend: do not overwrite srv dst address on reuse
    - BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port
    - MINOR: sample: define bc_reused fetch
    - REGTESTS: extend conn reuse test with transparent proxy
    - MINOR: backend: fix comment when killing idle conns
    - MINOR: backend: adjust conn_backend_get() API
    - MINOR: backend: extract conn hash calculation from connect_server()
    - MINOR: backend: extract conn reuse from connect_server()
    - MINOR: backend: remove stream usage on connection reuse
    - MINOR: check define check-reuse-pool server keyword
    - MEDIUM: check: implement check-reuse-pool
    - BUILD: backend: silence a build warning when not using ssl
    - BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6
    - BUILD: ssl_ckch: use my_strndup() instead of strndup()
    - DOC: update INSTALL to reflect the minimum compiler version

DOC: update INSTALL to reflect the minimum compiler version

The mt_list update in 3.1 mandated the support for c11-like atomics that
arrived with gcc-4.7. As such, older versions are no longer supported.
For special cases in single-threaded environments, mt_lists could be
replaced with regular lists but it doesn't seem worth the hassle. It
was verified that gcc 4.7 to 14 and clang 3.0 and 19 do build fine.
That leaves us with 10 years of coverage of compiler versions, which
remains reasonable assuming that users of old ultra-stable systems are
unlikely to upgrade haproxy without touching the rest of the system.

This should be backported to 3.1.

BUILD: ssl_ckch: use my_strndup() instead of strndup()

Not all systems have strndup(), that's why we have our "my_strndup()",
so let's make use of it here. This fixes the build on Solaris 10.
No backport is needed, this was just merged with commit fdcb97614c
("MINOR: ssl/ckch: add substring parser for ckch_conf").

BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6

The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6):

  src/quic_sock.c: In function 'cmsg_set_gso':
  src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    *((uint16_t *)CMSG_DATA(c)) = gso_size;
    ^

Let's just use the write_u16() function that's made for this purpose.
It was verified that for all versions from 5 to 13, gcc produces the
exact same code with the fix (and without the warning). It arrived in
3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock
send API") so this can be backported there.

BUILD: backend: silence a build warning when not using ssl

Since recent commit ee94a6cfc1 ("MINOR: backend: extract conn reuse
from connect_server()") a build warning "set but not used" on the
"reuse" variable is emitted, because indeed the variable is now only
checked when SSL is in use. Let's just mark it as such.

MEDIUM: check: implement check-reuse-pool

Implement the possibility to reuse idle connections when performing
server checks. This is done thanks to the recently introduced functions
be_calculate_conn_hash() and be_reuse_connection().

One side effect of this change is that be_calculate_conn_hash() can now
be called with a NULL stream instance. As such, part of the functions
are adjusted accordingly.

Note that to simplify configuration, connection reuse is not performed
if any specific check connection parameters are defined on the server
line or via the tcp-check connect rule. This is performed via newly
defined tcpcheck_use_nondefault_connect().

MINOR: check define check-reuse-pool server keyword

Define a new server keyword check-reuse-pool, and its counterpart with a
"no" prefix. For the moment, only parsing is implemented. The real
behavior adjustment will be implemented in the next patch.

MINOR: backend: remove stream usage on connection reuse

Adjust newly defined be_reuse_connection() API. The stream argument is
removed. This will allows checks to be able to invoke it without relying
on a stream instance.

MINOR: backend: extract conn reuse from connect_server()

Following the previous patch, the part directly related to connection
reuse is extracted from connect_server(). It is now define in a new
function be_reuse_connection().

MINOR: backend: extract conn hash calculation from connect_server()

On connection reuse, a hash is first calculated. It is generated from
various connection parameters, to retrieve a matching connection.

Extract hash calculation from connect_server() into a new dedicated
function be_calculate_conn_hash(). The objective is to be able to
perform connection reuse for checks, without connect_server() invokation
which relies on a stream instance.

MINOR: backend: adjust conn_backend_get() API

The main objective of this patch is to remove the stream instance from
conn_backend_get() parameters. This would allow to perform reuse outside
of stream contexts, for example for checks purpose.

MINOR: backend: fix comment when killing idle conns

Previously, if a server reached its pool-high-count limit, connection
were killed on connect_server() when reuse was not possible. However,
this is now performed even if reuse is done since the following patch :
b3397367dc7cec9e78c62c54efc24d9db5cde2d2
MEDIUM: connections: Kill connections even if we are reusing one.

Thus, adjust the related comment to reflect this state.

REGTESTS: extend conn reuse test with transparent proxy

Recently, work on connection reuses reveals an issue when mixed with
transparent proxy and set-dst. This patch rewrites the related regtests
to be able to catch this now fixed bug.

Note that it is the first regtest which relies on bc_reused recently
introduced sample fetches. This fetch could be reuse in other related
connection reuse regtests to simplify them.

MINOR: sample: define bc_reused fetch

Define a new layer4 sample fetch "bc_reused". It is used as a boolean,
set to true if backend connection was reused for the request.

BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port

On backend connection reuse, a hash is calculated from various
parameters, to ensure the selected connection match the requested
parameters. Notably, destination address is one of these parameters.
However, it is only taken into account if using a transparent server
(server address 0.0.0.0).

This may cause issue where an incorrect connection is reused, which is
not targetted to the correct destination address. This may be the case
if a set-dst/set-dst-port is used with a transparent proxy (proxy option
transparent).

The fix is simple enough. Destination address is now always used as
input to the connection reuse hash.

This must be backported up to 2.6. Note that for reverse HTTP to work,
it relies on the following patch, which ensures destination address
remains NULL in this case.

commit e94baf6ca71cb2319610baa74dbf17b9bc602b18
BUG/MINOR: rhttp: fix incorrect dst/dst_port values

BUG/MINOR: backend: do not overwrite srv dst address on reuse

Previously, destination address of backend connection was systematically
always reassigned. However, this step is unnecessary on connection
reuse. Indeed, reuse should only be conducted with connection using the
same destination address matching the stream requirements.

This patch removes this unnecessary assignment. It is now only performed
when reuse cannot be conducted and a new connection is instantiated.

Functionnally speaking, this patch should not change anything in theory,
as reuse is performed in conformance with the destination address.
However, it appears that it was not always properly enforced. The
systematic assignment of the destination address hides these issues, so
it is now remove. The identified bogus cases will then be fixed in the
following patches.would

This should be backported up to all stable versions.

BUG/MINOR: rhttp: fix incorrect dst/dst_port values

With a @rhttp server, connect is not possible, transfer is only possible
via idle connection reuse. The server does not have any network address.

Thus, it is unnecessary to allocate the stream destination address prior
to connection reuse. This patch adjusts this by fixing
alloc_dst_address() to take this into account.

Prior to this patch, alloc_dst_address() would incorrectly assimilate a
@rhttp server with a transparent proxy mode. Thus stream destination
address would be copied from the destination address. Connection adress
would then be rewrote with this incorrect value. This did not impact
connect or reuse as destination addr is only used in idle conn hash
calculation for transparent servers. However, it causes incorrect values
for dst/dst_port samples.

This should be backported up to 2.9.

BUG/MEDIUM: leastconn: Don't try to reposition if the server is down

It may happen that the server is going down, and fwlc_srv_reposition()
is still called, because streams still attached to the server are
being terminated.
So in fwlc_srv_reposition(), just do nothing if we've been removed from
the tree.

This should fix github issue #2919.

This should not be backported, unless commit
9fe72bba3cf3484577fa1ef00723de08df757996 is also backported.

BUILD: compiler: undefine the CONCAT() macro if already defined

As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD
which defines its own as __CONCAT() (which is exactly the same). Let's
just undefine it before ours to fix the issue instead of renaming, but
keep ours so that we don't have doubts about what we're running with.

Note that the patch introducing this breaking change was backported
to 3.0.

MINOR: cpu-topo: cpu_dump_topology() SMT info check little optimisation

Once we stumble across the first cpu having the criteria, we exit
earlier from the loop.

DOC: config: fix two missing "content" in "tcp-request" examples

As reported by Uku Sõrmus in GitHub issue #2917, two "tcp-request" rules
in an example were mistakenly missing the "content" hook, rendering them
invalid.

This can be backported.

CLEANUP: assorted typo fixes in the code and comments

code, comments and doc actually.

MINOR: log: support "raw" logformat node typecast

"raw" logformat node typecast is a special value (unlike str,bool,int..)
which tells haproxy to completely ignore logformat options (including
encoding ones) and force binary output for the current node only. It is
mainly intended for use with JSON or CBOR encoders in order to generate
nested CBOR or nested JSON by storing intermediate log-formats within
variables and assembling the final object in the parent log-format.

Example:

  http-request set-var-fmt(txn.intermediate) "%{+json}o %(lower)[str(value)]"

  log-format "%{+json}o %(upper)[str(value)] %(intermediate:raw)[var(txn.intermediate)]"

Would produce:

   {"upper": "value", "intermediate": {"lower": "value"}}

BUILD: ssl/ckch: potential null pointer dereference

src/ssl_ckch.c: In function ‘ckch_conf_parse’:
src/ssl_ckch.c:4852:40: error: potential null pointer dereference [-Werror=null-dereference]
4852 | while (*r) {
| ^~

Add a test on r before using *r.

No backport needed

BUG/MINOR: ssl/ckch: leak in error path

fdcb97614cb ("MINOR: ssl/ckch: add substring parser for ckch_conf")
introduced a leak in the error path when the strndup fails.

This patch fixes issue #2920. No backport needed.

MAJOR: leastconn; Revamp the way servers are ordered.

For leastconn, servers used to just be stored in an ebtree.
Each server would be one node.
Change that so that nodes contain multiple mt_lists. Each list
will contain servers that share the same key (typically meaning
they have the same number of connections). Using mt_lists means
that as long as tree elements already exist, moving a server from
one tree element to another does no longer require the lbprm write
lock.
We use multiple mt_lists to reduce the contention when moving
a server from one tree element to another. A list in the new
element will be chosen randomly.
We no longer remove a tree element as soon as they no longer
contain any server. Instead, we keep a list of all elements,
and when we need a new element, we look at that list only if it
contains a number of elements already, otherwise we'll allocate
a new one. Keeping nodes in the tree ensures that we very
rarely have to take the lbrpm write lock (as it only happens
when we're moving the server to a position for which no
element is currently in the tree).

The number of mt_lists used is defined as FWLC_NB_LISTS.
The number of tree elements we want to keep is defined as
FWLC_MIN_FREE_ENTRIES, both in defaults.h.
The value used were picked afrer experimentation, and
seems to be the best choice of performances vs memory
usage.

Doing that gives a good boost in performances when a lot of
servers are used.
With a configuration using 500 servers, before that patch,
about 830000 requests per second could be processed, with
that patch, about 1550000 requests per second are
processed, on an 64-cores AMD, using 1200 concurrent connections.

MINOR: threads: Add HA_RWLOCK_TRYRDTOWR()

Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock
from reader to writer, and fails if any seeker or writer already
holds it.

MINOR: lbprm: Add method to deinit server and proxy

Add two new methods to lbprm, server_deinit() and proxy_deinit(),
in case something should be done at the lbprm level when
removing servers and proxies.

MINOR: mt_list: Implement mt_list_try_lock_prev().

Implement mt_list_try_lock_prev(), that does the same thing
as mt_list_lock_prev(), exceot if the list is locked, it
returns { NULL, NULL } instaed of waiting.

MINOR: ssl/ckch: add substring parser for ckch_conf

Add a substring parser for the ckch_conf keyword parser, this will split
a string into multiple substring, and strdup them in a array.

TESTS: jws: change the jwk format

The format of the jwk output changed a little bit because of the
previous commit.

MINOR: jws: emit the JWK thumbprint

jwk_thumbprint() is a function which is a function which implements
RFC7368 and emits a JWK thumbprint using a EVP_PKEY.

EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in
order to match what is required to emit a thumbprint (ie, no spaces or
lines and the lexicographic order of the fields)

EXAMPLES: add "games.cfg" and an example game in Lua

The purpose is mainly to exhibit certain limitations that come with such
less common programming models, to show users how to program interactive
tools in Lua, and how to connect interactively.

Other use cases that could be envisioned are "top" and various monitoring
utilities, with sliding graphs etc. Lua is particularly attractive for
this usage, easy to program, well known from most AI tools (including its
integration into haproxy), making such programs very quick to obtain in
their basic form, and to improve later.

A very limited example game is provided, following the principle of a
very popular one, where the player must compose lines from falling
pieces. It quickly revealed the need to the ability to enforce a timeout
to applet:receive(). Other identified limitations include the difficulty
from the Lua side to monitor multiple events at once, but it seems that
callbacks and/or event dispatchers would be useful here.

At the moment the CLI is not workable (it interactivity was broken in 2.9
when line buffering was adopted), though it was verified that it works
with older releases.

The command needed to connect to the game is displayed as a notice message
during boot.

BUG/MINOR: config: silence .notice/.warning/.alert in discovery mode

When first pre-parsing the config to detect the presence or absence of
the master mode, we must not emit messages because they are not supposed
to be visible at this point, otherwise they appear twice each. The
pre-parsing, also called discovery mode, is only for internal use,
thus it should remain silent.

This should be backported to 3.1 where this mode was introduced.

MINOR: cpu-topo: add new cpu-policies "group-by-2-clusters" and above

This adds "group-by-{2,3,4}-clusters", which, as its name implies,
create one thread group per X clusters. This can be useful when CPUs
are split into too small clusters, as well as when the total number
of assigned cores is not even between the clusters, to try to spread
the load between less different ones.

MINOR: cpu-topo: pass an extra argument to ha_cpu_policy

This extra argument will allow common functions to distinguish between
multiple policies. For now it's not used.

MINOR: cpu-topo: add a dump of thread-to-CPU mapping to -dc

When emitting the CPU topology info with -dc, also emit a list of
thread-to-CPU mapping. The group/thread and thread ID are emitted
with the list of their CPUs on each line. The count of CPUs is shown
to ease comparisons, and as much as possible, we try to pack identical
lines within a group by showing thread ranges.

MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode

The new function "print_cpu_set()" will print cpu sets in a human-friendly
way, with commas and dashes for intervals. The goal is to keep them compact
enough.

MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal()

This function returns true if two CPU sets are equal.

MINOR: thread: dump the CPU topology in thread_map_to_groups()

It was previously done in thread_detect_count() but that's not quite
handy because we still don't know about the groups setting. Better do
it slightly later and have all the relevant info instead.

MINOR: compiler: add __nonstring macro

GCC 15 throws the following warning on fixed-size char arrays if they do not
contain terminated NUL:

src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
2041 | const char hextab[16] = "0123456789ABCDEF";

We are using a couple of such definitions for some constants. Converting them
to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences,
as enlarged arrays won't fit anymore where they were possibly located due to
the memory alignement constraints.

GCC adds 'nonstring' variable attribute for such char arrays, but clang and
other compilers don't have it. Let's wrap 'nonstring' with our
__nonstring macro, which will test if the compiler supports this attribute.

This fixes the issue #2910.

CI: QUIC Interop on LibreSSL: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: QUIC Interop on AWS-LC: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: NetBSD: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: Illumos: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: cross compile: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: coverity scan: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: spellcheck: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

CI: fedora rawhide: install "awk" as a dependency

for some reason it is not installed by default on rawhide anymore

CI: fedora rawhide: allow "on: workflow_dispatch" in forks

previously that build were limited to "haproxy" github organization
only. let's allow manual builds from forks

BUG/MINOR: log: fix gcc warn about truncating NUL terminator while init char arrays

gcc 15 throws such kind of warnings about initialization of some char arrays:

src/log.c:181:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
  181 | const char sess_term_cond[16] = "-LcCsSPRIDKUIIII"; /* normal, Local, CliTo, CliErr, SrvTo, SrvErr, PxErr, Resource, Internal, Down, Killed, Up, -- */
      |                                 ^~~~~~~~~~~~~~~~~~
src/log.c:182:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Werror=unterminated-string-initialization]
  182 | const char sess_fin_state[8]  = "-RCHDLQT";     /* cliRequest, srvConnect, srvHeader, Data, Last, Queue, Tarpit */

So, let's make it happy by not giving the sizes of these char arrays
explicitly, thus he can accomodate there NUL terminators.

Reported in GitHub issue #2910.

This should be backported up to 2.6.

REGTESTS: disable the test balance/balance-hash-maxqueue

This test brought by commit 8ed1e91efd ("MEDIUM: lb-chash: add directive
hash-preserve-affinity") seems to have hit a limitation of what can be
expressed in vtc, as it would be desirable to have one server response
release two clients at once but the various attempts using barriers
have failed so far. The test seems to work fine locally but still fails
almost 100% of the time on the CI, so it remains timing dependent in
some ways. Tests have been done with nbthread 1, pool-idle-shared off,
http-reuse never (since always fails locally) etc but to no avail. Let's
just mark it broken in case we later figure another way to fix it. It's
still usable locally most of the time, though.

MEDIUM: pools: be a bit smarter when merging comparable size pools

By default, pools of comparable sizes are merged together. However, the
current algorithm is dumb: it rounds the requested size to the next
multiple of 16 and compares the sizes like this. This results in many
entries which are already multiples of 16 not being merged, for example
1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are
separate (though 56 merges with 64).

This commit changes this to consider not just the entry size but also the
average entry size, that is, it compares the average size of all objects
sharing the pool with the size of the object looking for a pool. If the
object is not more than 1% bigger nor smaller than the current average
size or if it neither 16 bytes smaller nor larger, then it can be merged.
Also, it always respects exact matches in order to avoid merging objects
into larger pools or worse, extending existing ones for no reason, and
when there's a tie, it always avoids extending an existing pool.

Also, we now visit all existing pools in order to spot the best one, we
do not stop anymore at the smallest one large enough. Theoretically this
could cost a bit of CPU but in practice it's O(N^2) with N quite small
(typically in the order of 100) and the cost at each step is very low
(compare a few integer values). But as a side effect, pools are no
longer sorted by size, "show pools bysize" is needed for this.

This causes the objects to be much better grouped together, accepting to
use a little bit more sometimes to avoid fragmentation, without causing
everyone to be merged into the same pool. Thanks to this we're now
seeing 36 pools instead of 48 by default, with some very nice examples
of compact grouping:

  - Pool qc_stream_r (80 bytes) : 13 users
      >  qc_stream_r : size=72 flags=0x1 align=0
      >  quic_cstrea : size=80 flags=0x1 align=0
      >  qc_stream_a : size=64 flags=0x1 align=0
      >  hlua_esub   : size=64 flags=0x1 align=0
      >  stconn      : size=80 flags=0x1 align=0
      >  dns_query   : size=64 flags=0x1 align=0
      >  vars        : size=80 flags=0x1 align=0
      >  filter      : size=64 flags=0x1 align=0
      >  session pri : size=64 flags=0x1 align=0
      >  fcgi_hdr_ru : size=72 flags=0x1 align=0
      >  fcgi_param_ : size=72 flags=0x1 align=0
      >  pendconn    : size=80 flags=0x1 align=0
      >  capture     : size=64 flags=0x1 align=0

  - Pool h3s (56 bytes) : 17 users
      >  h3s         : size=56 flags=0x1 align=0
      >  qf_crypto   : size=48 flags=0x1 align=0
      >  quic_tls_se : size=48 flags=0x1 align=0
      >  quic_arng   : size=56 flags=0x1 align=0
      >  hlua_flt_ct : size=56 flags=0x1 align=0
      >  promex_metr : size=48 flags=0x1 align=0
      >  conn_hash_n : size=56 flags=0x1 align=0
      >  resolv_requ : size=48 flags=0x1 align=0
      >  mux_pt      : size=40 flags=0x1 align=0
      >  comp_state  : size=40 flags=0x1 align=0
      >  notificatio : size=48 flags=0x1 align=0
      >  tasklet     : size=56 flags=0x1 align=0
      >  bwlim_state : size=48 flags=0x1 align=0
      >  xprt_handsh : size=48 flags=0x1 align=0
      >  email_alert : size=56 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0

  - Pool quic_cids (32 bytes) : 13 users
      >  quic_cids   : size=16 flags=0x1 align=0
      >  quic_tls_ke : size=32 flags=0x1 align=0
      >  quic_tls_iv : size=12 flags=0x1 align=0
      >  cbuf        : size=32 flags=0x1 align=0
      >  hlua_queuew : size=24 flags=0x1 align=0
      >  hlua_queue  : size=24 flags=0x1 align=0
      >  promex_modu : size=24 flags=0x1 align=0
      >  cache_st    : size=24 flags=0x1 align=0
      >  spoe_appctx : size=32 flags=0x1 align=0
      >  ehdl_sub_tc : size=32 flags=0x1 align=0
      >  fcgi_flt_ct : size=16 flags=0x1 align=0
      >  sig_handler : size=32 flags=0x1 align=0
      >  pipe        : size=24 flags=0x1 align=0

  - Pool quic_crypto (1032 bytes) : 2 users
      >  quic_crypto : size=1032 flags=0x1 align=0
      >  requri      : size=1024 flags=0x1 align=0

  - Pool quic_conn_r (65544 bytes) : 2 users
      >  quic_conn_r : size=65536 flags=0x1 align=0
      >  dns_msg_buf : size=65540 flags=0x1 align=0

On a very unscientific test consisting in sending 1 million H1 requests
and 1 million H2 requests to the stats page, we're seeing an ~6% lower
memory usage with the patch:

  before the patch:
    Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches).

  after the patch:
    Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches).

This should be taken with care however since pools allocate and release
in batches.

MEDIUM: lb-chash: add directive hash-preserve-affinity

When using hash-based load balancing, requests are always assigned to
the server corresponding to the hash bucket for the balancing key,
without taking maxconn or maxqueue into account, unlike in other load
balancing methods like 'first'. This adds a new backend directive that
can be used to take maxconn and possibly maxqueue in that context. This
can be used when hashing is desired to achieve cache locality, but
sending requests to a different server is preferable to queuing for a
long time or failing requests when the initial server is saturated.

By default, affinity is preserved as was the case previously. When
'hash-preserve-affinity' is set to 'maxqueue', servers are considered
successively in the order of the hash ring until a server that does not
have a full queue is found.

When 'maxconn' is set on a server, queueing cannot be disabled, as
'maxqueue=0' means unlimited. To support picking a different server
when a server is at 'maxconn' irrespective of the queue,
'hash-preserve-affinity' can be set to 'maxconn'.

MINOR: quic: define max-stream-data configuration as a ratio

MINOR: mux-quic: define config for max-data

Define a new global configuration tune.quic.frontend.max-data. This
allows users to explicitely set the value for the corresponding QUIC TP
initial-max-data, with direct impact on haproxy memory consumption.

MINOR: quic: ignore uni-stream for initial max data TP

Initial TP value for max-data is automatically calculated to be adjusted
to the maximum number of opened streams over a QUIC connection. This
took into account both max-streams-bidi-remote and uni-streams. By
default, this is equivalent to 100 + 3 = 103 max opened streams.

This patch simplifies the calculation by only using bidirectional
streams. Uni streams are ignored because they are only used for HTTP/3
control exchanges, which should only represents a few bytes. For now,
users can only configure the max number of remote bidi streams, so the
simplified calculation should make more sense to them.

Note that this relies on the assumption that HTTP/3 is used as
application protocol. To support other protocols, it may be necessary to
review this and take into account both local bidi and uni streams.