git.ipfire.org Git - thirdparty/haproxy.git/log

MEDIUM: quic: remove deprecated keywords

Several QUIC-related keywords were removed in 3.4. The legacy options
were marked as deprecated and scheduled for removal in 3.5. This patch
applies this removal for the upcoming 3.5.

BUG/MEDIUM: fd: Fix a deadlock when closing other tgroups fds

Commit 061754b249b9903913d6766c1ab31bb393ee5c0d attempted to make it
possible to close file descriptors belonging to other thread groups by
using thread isolation.
The problem is, closing other thread groups' fds usually happens when we
destroy a listener, in which case we hold the listener lock. If any
other thread tries to get that lock while we're waiting for the thread
isolation, then they will deadlock.
This can happen with any type of listener, but it is easier to reproduce
with a suspend/resume loop with ABNS sockets, as a suspend translates to
a close here.
To fix that, instead of using thread isolation, do something similar to
what's done when the fd belongs to our thread group. Increase the tgid
ref counter, so that we're sure nobody will close the fd while we're
dealing with it, then set the FD_MUST_CLOSE bit and set thread_mask to
0.
At this point, if no thread was running on that fd, no one will and we
can safely close it. So just call _fd_delete_orphan() if the
running_mask is 0 and if the FD_MUST_CLOSE bit is still there, otherwise
we can safely assume another thread will take care of it.
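The close sequence above can be modelled with C11 atomics. This is a simplified, hypothetical sketch: struct fd_slot, close_foreign_fd() and delete_orphan() are made-up names, not HAProxy's fdtab API. It only shows the ordering: pin the slot, request the close, clear thread_mask, then close only if nobody is running and the FD_MUST_CLOSE bit can still be consumed. Consuming the bit with an atomic fetch-and is one way to make "still there" race-free; the actual patch may do it differently.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified fd slot. Field names loosely follow the commit
 * message; this is not HAProxy's real struct fdtab layout. */
#define FD_MUST_CLOSE 0x1u

struct fd_slot {
    atomic_uint refc_tgid;      /* pins the slot while another tgroup works on it */
    atomic_uint state;          /* carries FD_MUST_CLOSE */
    atomic_ulong thread_mask;   /* threads allowed to pick the fd up */
    atomic_ulong running_mask;  /* threads currently running on the fd */
    int closed;                 /* set by delete_orphan() */
};

/* Stand-in for _fd_delete_orphan(): releases the fd for real. */
static void delete_orphan(struct fd_slot *f)
{
    f->closed = 1;
}

/* Close an fd owned by another thread group without thread isolation.
 * Returns 1 if we closed it, 0 if a thread still running on it will. */
int close_foreign_fd(struct fd_slot *f)
{
    int done = 0;

    atomic_fetch_add(&f->refc_tgid, 1);         /* nobody may close it under us */
    atomic_fetch_or(&f->state, FD_MUST_CLOSE);  /* request the close */
    atomic_store(&f->thread_mask, 0);           /* no new thread may take it */

    /* Nobody running now means nobody will run on it: close it ourselves,
     * but only if the MUST_CLOSE bit was not consumed by someone else. */
    if (atomic_load(&f->running_mask) == 0 &&
        (atomic_fetch_and(&f->state, ~FD_MUST_CLOSE) & FD_MUST_CLOSE)) {
        delete_orphan(f);
        done = 1;
    }

    assert(atomic_load(&f->refc_tgid) > 0);
    atomic_fetch_sub(&f->refc_tgid, 1);
    return done;
}
```

When a thread is still running on the fd, the bit is left set for that thread to act on.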

This should be backported up to 2.8.

BUG/MINOR: quic: ignore STREAM after MUX closure on BE side

For frontend connections, the quic_conn layer is able to reject any new
streams opened after MUX closure. This is necessary as the peer may not
have been notified yet of the closure.

This operation is unnecessary on the backend side. This is due to the
fact that only HTTP protocols are currently supported on top of QUIC,
with requests initiated by the client. Requests started before the MUX
closure are either already completed or closed early with a STOP_SENDING
emitted during stream shut.

Prior to this patch, spurious RESET_STREAM frames could have been emitted
on backend connections after MUX closure, as quic_conn stream_max_bidi was
not correctly set. Rejection is now only performed for frontend
connections, so this should not occur anymore.

This should be backported up to 3.3.

BUG/MEDIUM: mux_quic: complete stream shutdown for read channel

Prior to this patch, the shut stream callback only handled write channel
closure. In case of an early closure, a RESET_STREAM would be emitted.

On the frontend side this is sufficient in most cases, as the read channel
is already closed because HTTP/3 GET requests have been fully received.
However, this may not be the case for POST requests. Also, on the backend
side, haproxy acts as a client. In this case, an early stream closure will
typically happen before the full response is received. Nothing is emitted
(RESET_STREAM is unnecessary as the write channel is already closed), thus
the server peer will continue to emit.

To fix this situation, this patch implements read channel closure on shut
if SE_SHR_RESET is set. The lclose callback from app_ops is called with a
new dedicated mode for read channel closure, which results in a
STOP_SENDING frame generated by the H3 and hq transcoders. This instructs
the peer to stop emission.

This should be backported up to 3.3. Note that this depends on the
following patch:
dde3ee06c30f20091443bdafdda0e0294f7ac26b
MINOR: mux_quic: use separate error code for STOP_SENDING

MINOR: mux_quic: adjust shut stream callback

Clean up the MUX QUIC stream shut callback with a central invocation of
lclose from app_ops. Also, traces are added at the H3 layer to log the
closure method.

MINOR: mux_quic: use separate error code for STOP_SENDING

Prior to this patch, a single error code was registrable at the QCS
level. This code was used both for RESET_STREAM and STOP_SENDING
emission. It was specified via qcc_reset_stream().

This patch extends the API so that a dedicated error code is now
implemented for STOP_SENDING as well. This may be necessary as both
frames can be sent in different contexts, with diverging error codes.

This patch is required to implement STOP_SENDING emission during the shut
callback when the read channel is closed.

BUG/MEDIUM: mux_quic: do not free QCS if STOP_SENDING to send

When the stream detach callback is called, the default behavior is to free
the associated QCS instance. However, QCS may be preserved in so-called
detached state if there is remaining data to send.

This condition is checked via qcs_is_close_local() which ensures that
either FIN or a RESET_STREAM was emitted. However, this does not take
into account a scheduled STOP_SENDING emission, which can happen in case
of request abort for example.

Adjust qcm_strm_detach() to also take into account a pending STOP_SENDING
emission before freeing or keeping a detached QCS instance. As a
complement, a QCS must be purged after STOP_SENDING emission once it
reaches completion.
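A minimal sketch of the detach decision described above, assuming made-up flag names (QCS_FIN_SENT, QCS_RESET_SENT, QCS_STOP_SENDING) rather than HAProxy's real QC_SF_* flags and qcs_is_close_local() implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical QCS flags for illustration; HAProxy's real flags differ. */
#define QCS_FIN_SENT      0x1u  /* FIN emitted on the write channel */
#define QCS_RESET_SENT    0x2u  /* RESET_STREAM emitted */
#define QCS_STOP_SENDING  0x4u  /* STOP_SENDING scheduled, not yet emitted */

struct qcs_sketch {
    unsigned int flags;
};

/* Mirrors the idea of qcs_is_close_local(): the write side is done. */
static int qcs_close_local(const struct qcs_sketch *qcs)
{
    return !!(qcs->flags & (QCS_FIN_SENT | QCS_RESET_SENT));
}

/* On detach: free the QCS only if nothing is left to emit, otherwise keep
 * it in detached state. Checking qcs_close_local() alone would lose a
 * pending STOP_SENDING together with the freed QCS. */
int qcs_can_free_on_detach(const struct qcs_sketch *qcs)
{
    assert(qcs != NULL);
    return qcs_close_local(qcs) && !(qcs->flags & QCS_STOP_SENDING);
}
```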

On the frontend side, this bug is probably only visible with HTTP/3 POST.
With GET, FIN is most of the time received earlier, which renders
STOP_SENDING unnecessary. This issue however has a bigger impact on the
backend side. In case of stream abort, for example on timeout, the server
may be left unnotified and will continue to emit STREAM data despite QCS
closure on the haproxy client side.

Note that this fix also has a side effect on backend connection reuse.
Indeed it may increase the rate of QCS in detached state. This may
prevent an idle connection from being reinserted in the server pool, with
no possibility to reinsert it later. In the end this causes a lower
reuse rate. This is an issue which must be addressed in a dedicated
patch. For now, add a COUNT_IF_HOT() to report when such a situation
occurs.

This should be backported to all stable releases, after a period of
observation. COUNT_IF_HOT() is unnecessary on 3.2 and below.

BUILD: makefile: add a new generic target "tiny"

This target disables all possible features except poll(). It is meant to
serve as a base for small embedded setups, on top of which one may manually
enable selected features. Even threads, traces, H2, FCGI and SPOE are
disabled. The default executable is roughly 80% smaller than with linux-glibc:

  $ size haproxy-linux-glibc haproxy-tiny
     text    data     bss      dec    hex filename
  3660924  176964 9868784 13706672 d125b0 haproxy-linux-glibc
  2537864  146512   84928  2769304 2a4198 haproxy-tiny

With SSL enabled, the difference shrinks a bit (-77%):

  $ size haproxy-linux-glibc-ssl haproxy-tiny-ssl
     text    data     bss      dec    hex filename
  4163373  208788 9873904 14246065 d960b1 haproxy-linux-glibc-ssl
  2950852  177732   90048  3218632 311cc8 haproxy-tiny-ssl
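The percentages can be checked from the dec column (text + data + bss). A small, purely illustrative helper:

```c
#include <assert.h>

/* Percentage size reduction based on the "dec" column of size(1) output
 * (text + data + bss), truncated to an integer. */
unsigned int size_reduction_pct(unsigned long long big, unsigned long long small)
{
    assert(big > 0 && small <= big);
    return (unsigned int)((big - small) * 100 / big);
}
```

With the figures above this gives 79 (79.8%) without SSL and 77 (77.4%) with SSL. Most of the saving is in bss; the text segment alone shrinks by about 31% (3660924 to 2537864).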

BUILD: makefile: add an option to enable or disable SPOE (USE_SPOE)

USE_SPOE is enabled by default; forcing it to zero disables SPOE. It saves
roughly 92kB on the executable.

BUILD: makefile: add an option to enable or disable FCGI (USE_FCGI)

USE_FCGI is enabled by default; forcing it to zero disables FCGI. It saves
roughly 75kB on the executable.

BUILD: makefile: add an option to enable or disable HTTP/2 (USE_H2)

USE_H2 is enabled by default; forcing it to zero disables HTTP/2. It saves
roughly 127kB on the executable.

BUILD: makefile: add macros enable_opts and disable_opts

These are used to enable or disable only selected options.

BUILD: makefile: only build trace.c and ssl_trace.c when USE_TRACE is set

There's no point in building these anymore when traces are disabled, since
nothing relies on them. This brings an extra 28kB of savings, resulting in
709kB of total savings when disabling traces.

CLEANUP: trace/tree-wide: drop trace decoding/definition when USE_TRACE=0

The various trace sources always have the same pattern:
  - trace events
  - trace source
  - trace decoding function

Dropping these when USE_TRACE=0 definitely makes sense. There are two
modes of definition here:
  - those designed after mux_h2, which interleave each #define with the
    entry definition in the event. These cannot be removed without a
    significant code move to split the #define and usage apart. Instead,
    we mark the struct __maybe_unused here, so that the compiler will
    simply not emit it.

  - those designed like stream.c, where defines are separated. Here we
    can simply enclose the events definition inside the USE_TRACE guard.

For most of these the static declaration of the trace function was moved
after the events so that the #if defined(USE_TRACE) could be placed between
the two. Nothing else was changed.
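The two definition modes can be illustrated as follows. The struct and event names are hypothetical; only __maybe_unused (defined here as the GCC unused attribute to keep the example self-contained) and the USE_TRACE guard come from the text.

```c
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))  /* as HAProxy's compiler.h does */
#endif

/* Hypothetical event descriptor, not HAProxy's struct trace_event. */
struct trace_event_sketch {
    unsigned long long mask;
    const char *name;
};

/* Mode 1 (mux_h2-like): each #define sits inside the table, so the table
 * cannot be guarded without moving code. It is marked unused instead, and
 * an unreferenced static table is simply not emitted. */
static const struct trace_event_sketch h2like_events[] __maybe_unused = {
#define EV_RX_FRAME (1ULL << 0)
    { .mask = EV_RX_FRAME, .name = "rx_frame" },
#define EV_TX_FRAME (1ULL << 1)
    { .mask = EV_TX_FRAME, .name = "tx_frame" },
};

/* Mode 2 (stream.c-like): the #defines live apart from the table, so the
 * whole table can be enclosed in the USE_TRACE guard. */
#define EV_STRM_NEW (1ULL << 0)
#if defined(USE_TRACE)
static const struct trace_event_sketch strm_events[] = {
    { .mask = EV_STRM_NEW, .name = "strm_new" },
};
#endif

/* The event masks stay usable by the rest of the code either way. */
unsigned long long ev_mask_all(void)
{
    return EV_RX_FRAME | EV_TX_FRAME | EV_STRM_NEW;
}
```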

This saves another 51 kB of object code when USE_TRACE=0.

CLEANUP: trace/config: do not register section "traces" with USE_TRACE=0

We don't want to register the "traces" section when traces are disabled.
Registering it would also force us to keep building trace.c.

CLEANUP: trace/h3: allow to disable traces in H3

It essentially requires a few ifdefs and a dummy definition of
h3_trace_header() to completely disable traces in H3. This reduces the
object code by 35 kB.

BUG/MINOR: trace/quic_frame: use buf, not trace_buf in chunk_frm_appendf()

The function takes the target buffer as an argument. The first calls
properly use it, but the subsequent ones, probably due to reused/moved
code, write directly into &trace_buf, thus ignoring the buf argument.
Fortunately all call places pass &trace_buf for buf, so it currently has
no impact, but that could change.

No backport is needed, but it doesn't hurt to backport it if it helps.

CLEANUP: debug/trace: remove "debug dev trace" when USE_TRACE is not set

The functions associated with this command were already subject to
DEBUG_DEV; let's also require USE_TRACE for them to be defined. This saves
8kB.

CLEANUP: mux-h2/traces: remove unused trace code when building without USE_TRACE

By just moving a few definitions, creating two dummy inline functions and
a few ifdefs, we can get rid of the entire trace generation code in the
H2 mux and save ~96 kB. This is what this patch does. Even the trace_h2
struct is removed in this case.

CLEANUP: haproxy: remove -dt parsing and help when !USE_TRACE

Better not show -dt in the help message and stop parsing it when
USE_TRACE is disabled since traces won't work anyway.

BUILD: ssl: avoid a wrong null deref warning in ssl_sock_handshake

When disabling traces, "conn" isn't used between ctx assignment and its
first usage, and as usual, gcc wrongly believes that a null check in a
shared function implies the checked argument may be NULL where it's used,
leading to this warning:

src/ssl_sock.c: In function 'ssl_sock_handshake.constprop':
src/ssl_sock.c:6049:7: warning: null pointer dereference [-Wnull-dereference]

Assigning ctx after the conn_ctrl_ready() check is sufficient to shut it
up, so let's do this. It should also result in slightly better code.

MINOR: trace: always pretend to use args when disabled

When traces are disabled, we used to make TRACE() and other macros just
emit a "do { } while (0)" statement, which has the unfortunate effect of
leaving the arguments unused. As such, all variables that are initialized
in functions for the sole purpose of being passed to the trace calls end
up emitting warnings about "foo defined but not used". It is difficult to
keep these in a clean state all the time, and to always think about adding
__maybe_unused after each declaration, and the traces try hard to be
developer-friendly in order to gain adoption.

Let's just remap all macros to __eat_all_args() which will mark all
arguments as used. No code is emitted, the output binary is the same
as with the while(0) stuff, but syntactically speaking the argument is
used and the compiler is happy.

It may be useful to backport this to 3.4 as it's already expected that
some future fixes will trigger build warnings there otherwise. This
commit requires these two ones:

CLEANUP: traces: get rid of a few rare empty args in TRACE calls
MINOR: compiler: add a macro to ignore all arguments

MINOR: compiler: add a macro to ignore all arguments

When disabling features (e.g. traces), some macros that would normally use
some arguments regularly end up not consuming them at all, making the
compiler complain that "variable foo defined but not used".

An elegant way to generically mark arguments as used is to pass them to
a variadic function. However, a variadic function needs at least one named
argument. So we create a macro that passes (0, __VA_ARGS__) to an inline
function that does nothing with its arguments, and that's done.
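A self-contained sketch of the trick, with hypothetical names (eat_all_args_fn, EAT_ALL_ARGS, TRACE_SKETCH); HAProxy's actual __eat_all_args() definition may differ:

```c
/* The leading 0 fills the named parameter a variadic function requires;
 * the body ignores everything it receives. */
static inline void eat_all_args_fn(int dummy, ...)
{
    (void)dummy;
}
#define EAT_ALL_ARGS(...) eat_all_args_fn(0, __VA_ARGS__)

/* With traces disabled, a TRACE()-like macro can map to it: every argument
 * is syntactically used, so no "defined but not used" warning is raised.
 * Note that, unlike "do { } while (0)", the arguments are still evaluated,
 * which is free as long as they have no side effects. */
#define TRACE_SKETCH(msg, mask, a1, a2) EAT_ALL_ARGS(msg, mask, a1, a2)

int compute_with_trace(int x)
{
    int doubled = x * 2;    /* only consumed by the trace call */

    TRACE_SKETCH("compute", 0x1, &x, doubled);
    return x + 1;
}
```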

CLEANUP: traces: get rid of a few rare empty args in TRACE calls

The TRACE macro allows leaving args empty and automatically turns them
into zeroes. However, it also limits how we can remap the macro, because
functions, for example, do not accept this. There are very few places in
the code where ',,' exists in TRACE calls, so let's explicitly add the 0
there. It could even make some editors' syntax highlighting happier.

BUILD: quic: workaround a gcc bug saying "maybe used uninitialized" when USE_TRACE=0

In quic_transport_params_store(), we call qc_early_transport_params_cpy()
if edata_accepted is set, which copies one by one all tx_params into the
locally allocated etps struct, and later, after updates, we call
qc_early_transport_params_validate() to check if they changed. It turns
out that when USE_TRACE is disabled, gcc 4 to 13 are confused and believe
that one or several of the fields compared in the latter function might be
used uninitialized. A careful code inspection proves that this is not the
case. Setting them to zero in the _cpy() function makes the warning
disappear; it really seems to be an issue related to variable propagation,
which may explain why it doesn't happen with traces (the code is a bit more
complex). Gcc-13 only emits a warning about a single field, and gcc-14
completely solved it. Playing with consts, __maybe_unused etc has no
effect.

One thing works, however: marking the _validate() function noinline. In
this case it is implemented normally, and the compiler doesn't put its
nose into the propagation path and doesn't complain.

Such warnings are always scary because one may seriously wonder whether
the compiler emits valid code when it says this...

It should be backported to 3.4 which experiences the same warning with
USE_TRACE=0.

BUG/MINOR: mux_quic: prevent multiple STOP_SENDING emission per stream

A QUIC stream may be aborted to ignore future data read. This also
prepares a STOP_SENDING frame to instruct the peer to close its write
channel.

This capability is exposed via qcc_abort_stream_read(), which should be
guarded against multiple invocations for a single stream. This was
checked via the QC_SF_TO_STOP_SENDING flag. However, this flag is reset
once the STOP_SENDING frame is emitted. Thus in theory it could be
possible to emit several STOP_SENDING frames for a single stream.

This patch improves this by checking the QC_SF_READ_ABORTED flag instead.
This flag is set during qcc_abort_stream_read() and never removed, even
after the STOP_SENDING frame is emitted.
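A sketch contrasting the transient and the sticky flag, with hypothetical names standing in for QC_SF_TO_STOP_SENDING and QC_SF_READ_ABORTED:

```c
#include <assert.h>

/* Hypothetical flags mirroring the commit message. */
#define SF_TO_STOP_SENDING 0x1u  /* frame pending; cleared once emitted */
#define SF_READ_ABORTED    0x2u  /* sticky: set once, never cleared */

struct qcs_flags_sketch {
    unsigned int flags;
    int stop_sending_built;  /* number of STOP_SENDING frames prepared */
};

/* Guarded on the sticky flag: a second call is a no-op even after the
 * first STOP_SENDING was emitted and SF_TO_STOP_SENDING was cleared. */
void abort_stream_read(struct qcs_flags_sketch *qcs)
{
    if (qcs->flags & SF_READ_ABORTED)
        return;
    qcs->flags |= SF_READ_ABORTED | SF_TO_STOP_SENDING;
    qcs->stop_sending_built++;
}

/* Emission clears only the transient flag. */
void emit_stop_sending(struct qcs_flags_sketch *qcs)
{
    assert(qcs->flags & SF_TO_STOP_SENDING);
    qcs->flags &= ~SF_TO_STOP_SENDING;
}
```

Guarding on SF_TO_STOP_SENDING instead would let the second abort_stream_read() call prepare another frame.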

This bug was never encountered in a real situation. However, this patch
is necessary to definitely guarantee that it cannot occur.

This should be backported up to 2.8.

BUG/MEDIUM: h3: fix parser desync on error with multiple frames

On success, h3_rcv_buf() returns the number of parsed STREAM bytes which
are removed by the caller afterwards. A success value is mandatory so
that the underlying QUIC packet is acknowledged.

When the H3 parser detects an error during HEADERS or DATA parsing, the
stream or the connection is flagged for closure. If there are remaining
frames, they are simply ignored and h3_rcv_buf() returns the remaining
input buffer size.

However, this value is wrong in case one or several frames were already
parsed before the invalid frame in the same h3_rcv_buf() invocation.
This instructs the caller to only remove a subset of the data, and parsing
is restarted on a random boundary. Most of the time this again generates
a new, final but bogus error, possibly overwriting a stream error with a
full connection closure.

This patch fixes the return value in case of an error during HEADERS or
DATA parsing by ensuring that the total variable is always incremented
instead of being directly assigned.
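The difference between assigning and incrementing total can be shown on a toy frame format (a 1-byte length followed by the payload, with length 0 meaning an invalid frame). Both the format and toy_rcv_buf() are hypothetical; they only mimic the return-value contract of h3_rcv_buf():

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes the caller must remove from its buffer. */
size_t toy_rcv_buf(const unsigned char *buf, size_t len)
{
    size_t total = 0;

    while (total < len) {
        size_t flen = buf[total];

        if (flen == 0) {
            /* Invalid frame: flag for closure and consume everything.
             * "total = len - total" would forget the frames already
             * parsed in this call and desync the next parsing pass. */
            total += len - total;
            break;
        }
        if (total + 1 + flen > len)
            break;              /* incomplete frame: wait for more data */
        total += 1 + flen;      /* frame fully parsed */
    }
    assert(total <= len);
    return total;
}
```

For the input {2, 'a', 'b', 0, 'x', 'y'}, the fixed version returns 6, whereas an assignment would return only the 3 bytes left after the first frame.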

This must be backported up to 2.8.

BUG/MINOR: http-htx: Don't by-pass HTX API when merging cookie values

The http_cookie_merge() function is responsible for adding a cookie header
and merging all values from a list. However, this was performed by
appending values by hand, using the pointer to the header value and
changing the block length. This was totally bypassing the HTX API.

For now, there is no bug because the function is called by the h2 and quic
muxes when an HTTP message is parsed, and the cookie header is the last one
inserted. The HTX message is never fragmented and data from other blocks
cannot be overwritten. But it could become an issue if it were called in
another context, from the HTTP analysis for instance.

To fix the issue, the function now relies on htx_replace_blk_value()
function to add the value separator first and then a cookie value.

This patch must be backported as far as 2.6.

BUG/MINOR: mux-quic: Fix handling EOM after in qcs_http_rcv_buf()

In the qcs_http_rcv_buf() function, when the buffers cannot be swapped and
the htx_xfer() function is called, the EOM flag is handled incorrectly. The
htx_xfer() function is responsible for transferring HTX flags from the QCS
message to the CS one. And when it does so, the HTX flags of the original
message are reset.

So the subsequent test on the EOM flag when the QCS message is empty is
never true. Because of this bug, the QC_SF_EOI_SUSPENDED flag is never
tested on this code path and the <fin> variable is not set to 1 as expected.

To fix the issue, we must test the EOM flag on the CS message.
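A reduced model of the bug, assuming a toy message type with only a data count and an EOM flag (msg_sketch, msg_xfer() and rcv_fin() are made-up names, not the HTX API):

```c
#include <assert.h>

#define MSG_FL_EOM 0x1u

struct msg_sketch {
    unsigned int data;
    unsigned int flags;
};

/* Like htx_xfer() as described above: moves data and flags from src to
 * dst, then resets the flags of src. */
void msg_xfer(struct msg_sketch *dst, struct msg_sketch *src)
{
    assert(dst != src);
    dst->data += src->data;
    src->data = 0;
    dst->flags |= src->flags;
    src->flags = 0;
}

/* Returns 1 ("fin") when the end of message reached the CS side. */
int rcv_fin(struct msg_sketch *cs, struct msg_sketch *qcs)
{
    msg_xfer(cs, qcs);
    /* Testing qcs->flags here would always fail since msg_xfer() already
     * cleared them: the EOM flag must be read on the CS message. */
    return qcs->data == 0 && (cs->flags & MSG_FL_EOM);
}
```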

This patch must be backported to 3.4.

CLEANUP: applet/http-client: Don't needlessly copy HTX flags after htx_xfer()

The htx_xfer() function already takes care of copying HTX flags (EOM and
errors), so it is useless to do so in caller functions.

BUG/MAJOR: htx: Don't swap buffers for empty HTX message with an error

A recent fix making some HTX muxes drain remaining data when the stream is
in closed state revealed a bug, mainly due to a corner case of the HTX API.

It is possible to have an empty HTX message with a parsing/internal
error. In that case, the underlying buffer remains full. It is mandatory to
prevent any buffer release and make sure the error will be handled.

On the other hand, at several places, when data must be transferred from an
HTX message to another one, we try to swap the underlying buffers instead of
performing a block-by-block copy. To do so, we rely on the b_xfer() function.
One condition is that the destination message must be empty. And here is the
issue. The HTX message can be empty while the buffer is full, because an HTX
error was triggered earlier and not handled yet. In that case, attempting to
call b_xfer() leads to a crash because the destination buffer is full.
b_xfer() is not expected to be called if there is not enough space in the
destination buffer.

So, it appears the HTX API should be improved/fixed, but first of all, the
bug must be fixed, especially because stable versions are also affected. The
htx_is_empty_noerr() function was added to know if an HTX message is empty
and no error was reported on it. This function is now used, instead of
htx_is_empty(), to know whether we can safely swap the underlying buffers.
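A sketch of the check, with a hypothetical message model where an empty message carrying a parsing error still holds a full buffer. The names only echo htx_is_empty() and htx_is_empty_noerr(); they are not the real HTX structures:

```c
#include <assert.h>
#include <stddef.h>

#define MSG_FL_PARSING_ERROR 0x1u

struct htx_sketch {
    size_t used;         /* number of blocks in the message */
    unsigned int flags;  /* error flags */
    size_t buf_data;     /* bytes held by the underlying buffer */
    size_t buf_size;
};

int msg_is_empty(const struct htx_sketch *m)
{
    return m->used == 0;
}

/* Empty AND no pending error: only then is the buffer really free. */
int msg_is_empty_noerr(const struct htx_sketch *m)
{
    return msg_is_empty(m) && !(m->flags & MSG_FL_PARSING_ERROR);
}

/* Swap buffers only when the destination is safely empty; otherwise the
 * caller falls back to a block-by-block copy. */
int can_swap_buffers(const struct htx_sketch *dst)
{
    assert(dst->buf_data <= dst->buf_size);
    return msg_is_empty_noerr(dst);
}
```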

The FCGI, H2 and QUIC multiplexers are concerned. The HTTP client and the
applet API were also fixed, although the bug seems harder to trigger at
these places.

The fix must be backported to all supported versions.

BUG/MINOR: tools: fix invalid character detection in strl2ic()

ASCII characters with a value smaller than '0' were not properly
detected as invalid characters, leading to incorrect behavior. The
strl2irc() and strl2llrc() functions are not impacted because this
situation is detected by their overflow checks.
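One common way for such a bug to appear is computing the digit value as a signed int: characters below '0' then give a negative value that slips past a "> 9" test. This may or may not be exactly what happened in strl2ic(). The sketch below is a hypothetical parse_dec_len(), not HAProxy's function, that avoids the issue by doing the subtraction on unsigned values:

```c
#include <assert.h>
#include <stddef.h>

/* Parse <len> decimal digits from <s> into <out>; returns 0 on success and
 * -1 on an invalid character. No overflow check in this sketch. */
int parse_dec_len(const char *s, size_t len, int *out)
{
    int val = 0;

    assert(s != NULL && out != NULL);
    while (len--) {
        /* Unsigned arithmetic: '/' - '0' wraps to a huge value instead of
         * -1, so one "> 9" test rejects both sides of the digit range. */
        unsigned int d = (unsigned char)*s++ - (unsigned int)'0';

        if (d > 9)
            return -1;
        val = val * 10 + (int)d;
    }
    *out = val;
    return 0;
}
```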

Fixes Github issue #3357.

BUG/MINOR: init: fix default global settings being overwritten by -G

The default global configuration tuning settings (tune.memory.hot-size,
expose-experimental-directives, and tune.pipesize) were lost when the
-G option was used.

The bug was caused by an incorrect scope: these default settings were
nested inside the block that generates the default global header, which
is skipped/overwritten when -G is provided. Fix this by closing the
conditional block early.

This patch depends on this commit:
"01f4e33ea MINOR: hbuf: new lightweight hbuf API"

Must be backported to 3.4.

MINOR: log: add app_log_raw() and send_log_raw() for binary-safe logging

app_log() and send_log() build the message with vsnprintf(), which stops
at the first NUL byte and therefore cannot emit an arbitrary binary
payload.

Add two variants that pass a pre-built <msg> of <len> bytes straight to
__send_log() without formatting it, so embedded NUL bytes are preserved:

* app_log_raw() : takes an explicit list of loggers and a tag
* send_log_raw() : derives both from a proxy

The send path still strips trailing LF / NUL bytes (kept for the legacy
text logs), so the message must be self-terminating by its own encoding
and must not rely on a meaningful trailing '\n' or NUL.
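Both constraints can be illustrated: a "%s" formatting path stops at the first NUL, while the send path strips trailing LF/NUL bytes from a raw <msg, len> pair. The two helpers are hypothetical models, not HAProxy's __send_log():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Length actually sent once trailing LF / NUL bytes are stripped. */
size_t raw_len_after_strip(const char *msg, size_t len)
{
    assert(msg != NULL || len == 0);
    while (len && (msg[len - 1] == '\n' || msg[len - 1] == '\0'))
        len--;
    return len;
}

/* What a vsnprintf()-style path keeps of the same payload with "%s". */
size_t formatted_len(const char *msg)
{
    char tmp[64];

    return (size_t)snprintf(tmp, sizeof(tmp), "%s", msg);
}
```

For the 6-byte payload "AB\0CD\0", the formatted path keeps only "AB", while the raw path keeps 5 bytes, including the embedded NUL; only the final NUL is dropped, which is why the encoding must not rely on it.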

MINOR: haload: import source code and documentation

This patch imports the implementation of haload, a lightweight,
multi-threaded traffic generator designed to benchmark HTTP infrastructures
under heavy load. Built on HAProxy's highly scalable architecture, it
natively supports HTTP/1, HTTP/2, and HTTP/3 (QUIC).

It uses the previously exposed initialization functions, the no-listener mode,
the lightweight hbuf API, and the specialized hldstream object types to
dynamically derive and generate its configuration in memory from basic
command-line inputs. By leveraging HAProxy's internal HTX
(Internal HTTP Native Representation) format, haload abstractly manipulates
HTTP elements independently of the wire protocol. This
abstraction allows it to generate unified requests and process responses
seamlessly across HTTP/1.1, HTTP/2, or HTTP/3 without duplicating the payload
handling logic for each version.

- Makefile:
   Introduce the 'haload' compilation target and define HALOAD_OBJS.

- src/haload.c, include/haproxy/haload.h:
   Add user and stream task scheduling handlers, HTX-driven traffic orchestration
   mechanisms, and rendering of the benchmark statistics summary in the terminal.

- src/haload_init.c:
   Implement program arguments parsing, fileless HAProxy memory configuration
   generation, and target URL allocations.

- src/stconn.c:
   Wire up sc_attach_mux() to properly allocate the specific tasklet
   context when dealing with a haload stream.

- doc/haload.txt:
   Add detailed documentation covering compilation, flags, and usage examples.

MINOR: server: export functions used during server initialization

Export _srv_parse_kw() and srv_postinit() so they can be called from
haload (to come), which needs to configure servers using HAProxy's configuration
parser keywords.

MINOR: stconn: export sc_new()

This patch exports sc_new() by removing its static storage class and
adding its prototype to include/haproxy/stconn.h.

This is required to allow external modules, such as the upcoming haload
benchmarking tool, to allocate and initialize new stream connectors
from a stream endpoint descriptor (sedesc).

MINOR: stconn: add sc_hastream() and __sc_hastream() helpers

This patch introduces the sc_hastream() and __sc_hastream() inline
helpers to retrieve a haload stream context (struct hastream) from
a stream connector.

These functions allow the stconn layer to safely access haload-specific
stream data when the application type is OBJ_TYPE_HXLOAD.

MINOR: obj_type: add OBJ_TYPE_HALOAD for haload stream objects

This patch introduces the OBJ_TYPE_HXLOAD object type to distinguish
the haload stream objects (struct hastream).

It also adds the associated inline helper functions objt_hastream()
and __objt_hastream() to allow safe casting and retrieval of
hastream contexts from a generic object pointer, following the
standard container_of pattern.
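The container_of pattern behind objt_hastream() and __objt_hastream() can be sketched as follows. The type tags, struct layout and function names are hypothetical; HAProxy's enum obj_type and struct hastream differ:

```c
#include <assert.h>
#include <stddef.h>

/* Each object embeds a type tag; generic code passes a pointer to it. */
enum obj_type_sketch { OBJT_NONE = 0, OBJT_STREAM, OBJT_HASTREAM };

struct hastream_sketch {
    int id;
    enum obj_type_sketch obj_type;  /* deliberately not the first member */
};

#define container_of_sketch(ptr, type, name) \
    ((type *)((char *)(ptr) - offsetof(type, name)))

/* Unchecked variant: the caller guarantees the type (like __objt_hastream()). */
static inline struct hastream_sketch *objt_hastream_unchecked(enum obj_type_sketch *t)
{
    assert(t != NULL);
    return container_of_sketch(t, struct hastream_sketch, obj_type);
}

/* Checked variant: NULL unless the tag matches (like objt_hastream()). */
static inline struct hastream_sketch *objt_hastream_sketch(enum obj_type_sketch *t)
{
    if (!t || *t != OBJT_HASTREAM)
        return NULL;
    return objt_hastream_unchecked(t);
}
```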

MINOR: hldstream: add definition of hldstream struct objects

haload is a client-side HTTP benchmarking tool designed to manage
concurrent HTTP streams.

This patch defines the hldstream C structure, which serves as the
core object to represent a haload HTTP stream for all HTTP protocol versions.
It will be used by the upcoming haload module to handle specialized
stream contexts.

MINOR: trace: add definitions for haload streams

haload is the successor to the h1load HTTP benchmarking tool.

This patch adds haload stream definitions as arguments for the TRACE API.
These will be used by the upcoming haload module, which will handle
hldstream struct objects instead of regular stream structs.

MINOR: init: add no listener mode

Introduce the new <no_listener_mode> global variable to define a new operating mode
for haproxy. This variable can be set to 1 to allow haproxy to start without
any listeners. Without such a setting, haproxy refuses to start without any listener.

During the initialization cycle, setting this variable to 1 ensures that the
lack of configured listeners is no longer treated as a fatal error. This allows
programs based on haproxy source code to initialize the stack and use its
features even without a frontend. This will be the case for haload.

MINOR: hbuf: new lightweight hbuf API

Add a new lightweight hbuf API to buffer formatted strings, similar to the
existing buffer API (struct buffer), extracting the code which already does this
in haterm_init.c. This is used by haterm to build its configuration in memory
(fileless mode). And this will be used by haload to do the same thing.

Update haterm to use this new API.

Note: hstream_str_buf_append() has been renamed to hbuf_str_append().
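A sketch of what a growable formatted-string buffer of this kind can look like, assuming a hypothetical struct sbuf and sbuf_appendf(); these are not the actual hbuf names or signatures:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct sbuf {
    char *area;
    size_t data;  /* bytes used, excluding the trailing NUL */
    size_t size;  /* bytes allocated */
};

/* Append a formatted string, growing the area as needed.
 * Returns 0 on success, -1 on error. */
int sbuf_appendf(struct sbuf *b, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);      /* measure first */
    va_end(ap);
    if (n < 0)
        return -1;

    if (b->data + (size_t)n + 1 > b->size) {
        size_t nsize = (b->data + (size_t)n + 1) * 2;
        char *narea = realloc(b->area, nsize);

        if (!narea)
            return -1;
        b->area = narea;
        b->size = nsize;
    }

    va_start(ap, fmt);
    vsnprintf(b->area + b->data, b->size - b->data, fmt, ap);
    va_end(ap);
    b->data += (size_t)n;
    assert(b->area[b->data] == '\0');
    return 0;
}

void sbuf_free(struct sbuf *b)
{
    free(b->area);
    b->area = NULL;
    b->data = b->size = 0;
}
```

A fileless configuration is then built by successive calls such as sbuf_appendf(&b, "global\n") and sbuf_appendf(&b, "    maxconn %d\n", 100), and b.area is handed to the parser.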

BUG/MEDIUM: servers: Use a refcount for port_range and free it properly

port_range was never freed. That used not to be a problem, but now that
we can dynamically add and remove servers, it becomes one, as that leads
to a memory leak each time a server with a "source" directive is destroyed.
However, just adding a free() is not enough. We have to add a refcount,
because the server is not the only one with a reference to it. We may
also have one in fdinfo, so that we know which port to release when we
finally close the fd.
So add a refcount, and make sure to call port_range_release() when a
server is destroyed.
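A single-threaded sketch of the refcounting scheme, with hypothetical names (port_range_sketch, port_range_get(), port_range_put()); the real port_range_release() and its atomics or locking may differ:

```c
#include <assert.h>
#include <stdlib.h>

struct port_range_sketch {
    unsigned int refcnt;
    int lo, hi;
};

struct port_range_sketch *port_range_new(int lo, int hi)
{
    struct port_range_sketch *r = malloc(sizeof(*r));

    if (r) {
        r->refcnt = 1;      /* reference held by the creator (the server) */
        r->lo = lo;
        r->hi = hi;
    }
    return r;
}

/* Another holder, e.g. an fd that must release its port on close. */
struct port_range_sketch *port_range_get(struct port_range_sketch *r)
{
    r->refcnt++;
    return r;
}

/* Drop one reference; the last one frees the range. Returns 1 if freed. */
int port_range_put(struct port_range_sketch *r)
{
    assert(r->refcnt > 0);
    if (--r->refcnt)
        return 0;
    free(r);
    return 1;
}
```

Destroying the server drops its reference, but the range survives until the last fd using it is closed.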

This should be backported up to 3.0.

DOC: server: document 'set server name' CLI command

Document the new 'set server <b>/<s> name <newname>' CLI command in
management.txt.

The documentation states the two preconditions that gate the operation
(server must be in maintenance, server's name must not be statically
referenced via use-server / track / ARGT_SRV), notes that the command
is not gated by a per-backend opt-in directive (parity with 'add
server' / 'del server'), and mentions the EVENT_HDL_SUB_SERVER_NAME
event published on successful rename so Lua and other event consumers
know to subscribe to it.

REGTESTS: server: add test for 'set server name' CLI command

Tests cover:
  - error cases: missing name, not-in-maintenance, invalid chars
    (rejected by invalid_char()), duplicate name in the same backend,
    name-referenced server (use-server target, track target)
  - same-name rename as a no-op success
  - successful rename with verification via 'show servers state'
  - old name no longer resolves after rename
  - round-trip rename back to original name
  - traffic still works after rename round-trip

The use-server and tracked-server cases exercise the SRV_F_NAME_REFD
gating added in the preceding patch. Servers pinned only via resolvers
(SRV_F_NON_PURGEABLE without SRV_F_NAME_REFD) remain renamable; that
positive case is not exercised here as it would require a real DNS
resolver in the test environment.

MEDIUM: server: add 'set server name' CLI command for runtime server renaming

Add the ability to rename a HAProxy server at runtime via the CLI:

  set server <backend>/<server> name <newname>

This is useful in slot-based dynamic scaling setups where servers are
pre-allocated with generic names (e.g. srv001, srv002) but the operator
wants the names to reflect the current workload (e.g. pod name or
IP:port) for observability and server-state-file consistency.

The implementation:
  - validates the new name: non-empty, passes invalid_char() check
    (allows [A-Za-z0-9_:.-]), and fits in the event data name field
  - requires the server to be administratively in maintenance mode
    (same precondition as 'del server')
  - rejects the rename if the server has SRV_F_NAME_REFD set (use-server
    target, track target, sample-fetch ARGT_SRV referent) - keeps the
    running state consistent with the configuration text
  - re-indexes the server in the name tree under thread_isolate(),
    mirroring the locking pattern used by 'add server' / 'del server'
  - publishes a new EVENT_HDL_SUB_SERVER_NAME event with the old and
    new names so downstream consumers (logs, observability backends)
    can track the rename
  - frees the old name immediately under thread isolation: srv_name
    sample consumers (ACLs, log formats, ...) act on the fetched pointer
    within the current task and do not retain it across wake-ups, so
    no extra deferred-free machinery is needed
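The name validation rules listed above can be sketched as follows. NAME_FIELD_SIZE is a hypothetical stand-in for the size of the event data name field, and the character test mirrors the [A-Za-z0-9_:.-] set given for invalid_char():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_FIELD_SIZE 64  /* hypothetical size, NUL included */

static int name_char_ok(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_' || c == ':' ||
           c == '.' || c == '-';
}

/* Returns 1 if <name> is non-empty, uses only allowed characters and fits
 * in the name field with its trailing NUL. */
int server_name_valid(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len >= NAME_FIELD_SIZE)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!name_char_ok(name[i]))
            return 0;
    return 1;
}
```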

There is no opt-in directive: like 'add server' and 'del server', the
operation is gated by the server's properties rather than by a
per-backend toggle. This avoids the runtime-surprise failure mode
where an operator discovers at the CLI that renaming is forbidden by
a missing 'option server-rename' rather than by an actual structural
reference.

This feature was discussed in:
  https://github.com/haproxy/haproxy/issues/952

MINOR: server: distinguish name references with new SRV_F_NAME_REFD flag

Until now, every form of "this server is referenced by something in the
running config" was collapsed onto a single flag, SRV_F_NON_PURGEABLE,
which prevents the server from being removed via 'del server'. This
catches everything but conflates two distinct properties:

  - the server object itself is pinned by another runtime structure
    (e.g. DNS resolution attached to it), versus
  - the server's *name* is referenced statically (use-server rules,
    track chains, sample-fetch arguments of type ARGT_SRV)

These differ for any operation that touches the name but not the
object identity, e.g. the runtime rename feature added next. Removing
a name-referenced server is still forbidden (the rule text would
dangle), but renaming such a server should also be forbidden for the
same reason - while renaming a resolver-pinned server is fine, since
the resolver holds the object pointer and doesn't care about the name.

Introduce SRV_F_NAME_REFD for the name-reference case and move the
three name-based setters (sample.c ARGT_SRV resolution, proxy.c
use-server resolution, server.c track chain setup) from
SRV_F_NON_PURGEABLE to SRV_F_NAME_REFD. The resolvers.c call site
keeps SRV_F_NON_PURGEABLE since it is the object-pinned case.

Adjust 'del server' to check both flags so the set of servers it
refuses to remove is unchanged: same observable behavior, just a
richer internal taxonomy.

A subsequent patch introducing 'set server name' will gate on
SRV_F_NAME_REFD only.
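The resulting gating can be summarised with two predicates. The flag values are made up; only the names SRV_F_NON_PURGEABLE and SRV_F_NAME_REFD come from the text:

```c
#define SRV_F_NON_PURGEABLE 0x1u  /* object pinned (e.g. by a resolver) */
#define SRV_F_NAME_REFD     0x2u  /* name referenced by the configuration */

/* 'del server' refuses either kind of reference (unchanged behavior). */
int can_delete_server(unsigned int flags)
{
    return !(flags & (SRV_F_NON_PURGEABLE | SRV_F_NAME_REFD));
}

/* 'set server name' only cares about references by name. */
int can_rename_server(unsigned int flags)
{
    return !(flags & SRV_F_NAME_REFD);
}
```

A resolver-pinned server can thus be renamed but not deleted, while a use-server target can be neither.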

BUG/MINOR: sample: set SMP_F_CONST on srv_name fetch

smp_fetch_srv_name() stored a raw pointer to srv->id in the sample
without setting SMP_F_CONST. Every other sibling id-pointer fetch
(smp_fetch_be_name on px->id, smp_fetch_fe_name on fe->id, the SSL
helpers using OBJ_nid2sn() / SSL_get_cipher_name(), etc.) correctly
sets SMP_F_CONST to prevent in-place mutation by converters such as
,upper / ,lower / ,regsub.

Without SMP_F_CONST, an expression like srv_name,lower would write
into srv->id for the lifetime of the process. In practice this has
gone unnoticed because srv->id is a private allocation that is never
read back by name, but the bug is real and the divergence from the
other id fetches is unintentional.

This becomes more important with the introduction of runtime server
renaming (next patch in series): SMP_F_CONST ensures that callers go
through smp_make_rw() / smp_dup() before mutating, isolating the
sample's bytes from the server's id storage.

This is a stand-alone fix and should be backported.
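
Here is a minimal sketch of the contract. It uses simplified stand-ins (sample_sketch, SMP_F_CONST_SKETCH, make_rw()) rather than HAProxy's real struct sample and smp_make_rw(). A mutating converter first moves constant bytes into writable storage, so the referenced id is never modified:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SMP_F_CONST_SKETCH 0x1u /* hypothetical stand-in for SMP_F_CONST */

struct sample_sketch {
	unsigned int flags;
	char *ptr;      /* points to the sample's bytes (may be someone else's) */
	size_t len;
};

/* Hypothetical stand-in for smp_make_rw(): copy constant bytes into a
 * caller-provided scratch buffer before any in-place mutation. */
static int make_rw(struct sample_sketch *smp, char *scratch, size_t size)
{
	if (!(smp->flags & SMP_F_CONST_SKETCH))
		return 1;               /* already writable */
	if (smp->len > size)
		return 0;               /* does not fit */
	memcpy(scratch, smp->ptr, smp->len);
	smp->ptr = scratch;
	smp->flags &= ~SMP_F_CONST_SKETCH;
	return 1;
}

/* ",lower"-like converter: mutates in place, but only after make_rw() */
static int conv_lower(struct sample_sketch *smp, char *scratch, size_t size)
{
	if (!make_rw(smp, scratch, size))
		return 0;
	for (size_t i = 0; i < smp->len; i++)
		smp->ptr[i] = (char)tolower((unsigned char)smp->ptr[i]);
	return 1;
}
```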

BUG/MEDIUM: server: initialise agent.health in srv_settings_init()

srv_settings_init() sets agent.rise but forgets agent.health, while
srv_settings_cpy() sets both. check.health is fixed up later when the
server's admin state is updated at startup, but nothing does the same
for agent.health.

This used to be harmless because servers were always set up through
srv_settings_cpy(). But since 49a619aca ("MEDIUM: proxy: no longer
allocate the default-server entry by default") the defsrv pointer is
NULL when a proxy has no "default-server" line, and srv_settings_cpy()
then falls back to srv_settings_init(). So a server whose agent-check is
declared entirely on its "server" line ends up with agent.health == 0,
which is below agent.rise.

The wrong value only bites when the server has to come back up. While it
stays up nobody notices agent.health is 0, but as soon as the regular
health check fails and recovers, agent.health is still 0 (below rise) and
check_notify_success() won't bring the server back up. The agent never
sends an explicit "up", which is the only thing that raises agent.health,
so the server stays down for good. Moving the agent settings to a
"default-server" line works around it.

Just initialise agent.health in srv_settings_init() like
srv_settings_cpy() already does.

This should be backported to 3.3 and 3.4.
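
The failure mode can be reduced to a toy model. As a simplification of check_notify_success(), it assumes that a server only comes back up once every enabled check has health >= rise. The structures, DEF_RISE and the 'fixed' switch below are hypothetical. They only contrast the buggy and fixed initialisations:

```c
#include <assert.h>

/* Hypothetical, simplified model of a check's state. */
struct check_sketch {
	int rise;
	int health; /* this check considers the server up when health >= rise */
};

struct srv_sketch {
	struct check_sketch check;
	struct check_sketch agent;
};

#define DEF_RISE 2 /* assumed default, for illustration */

/* With 'fixed' set, agent.health is initialised next to agent.rise,
 * as the copy path already does; without it, the bug is reproduced. */
static void settings_init(struct srv_sketch *s, int fixed)
{
	s->check.rise   = DEF_RISE;
	s->check.health = s->check.rise;
	s->agent.rise   = DEF_RISE;
	s->agent.health = fixed ? s->agent.rise : 0;
}

/* Simplified recovery rule: the server only comes back up when both the
 * health check and the agent report health >= rise. */
static int can_come_back_up(const struct srv_sketch *s)
{
	return s->check.health >= s->check.rise &&
	       s->agent.health >= s->agent.rise;
}
```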

MINOR: hq-interop: trace HTX headers

Similar to the previous patch, complete HTTP/0.9 user traces by logging
received HTX headers on request (BE side) or response (FE). This is only
for debugging purposes: the final HTTP/0.9 content does not contain any
of these.

MINOR: hq-interop: trace transcoding of response status line

Add a user trace when HTTP/0.9 response is either emitted (FE side) or
received (BE). The status code is displayed despite not being present in
the HTTP/0.9 response.

MINOR: hq-interop: add request start-line traces

Add traces to log the start-line of received (FE side) or sent (BE)
requests.

This uses a pattern similar to the already supported HTTP/3 header traces.
However, these traces only require minimal trace verbosity, because they
will mostly be useful for QUIC interop testing, where using advanced
verbosity is probably undesirable as it would increase the trace output.

MINOR: mux_quic: add minimal traces for QUIC MUX init/release

Add user traces in qcm_init() and qcc_release(). This makes it possible
to quickly account for connection allocations/releases without using the
developer trace level.

BUG/MINOR: h3: adjust HTTP headers traces

In a recent patch, dedicated HTTP/3 traces have been added to log the
transferred HTTP content, with the start/status line and headers as well.

This patch adjusts these traces, correcting the "qcc" typo to "qcs". It
also now correctly passes qcc and qcs as arguments, which are used for
trace follow.

No need to backport unless HTTP/3 header traces are picked into previous
releases.

BUG/MINOR: hq-interop: support transcoding of absolute URI

On the backend side, the HTTP/0.9 transcoder is responsible for converting
an HTX request into an HTTP start line. In particular, the path is
generated from the HTX request URI.

However, an absolute URI was not correctly converted into an HTTP/0.9
simple path. This notably occurs in most cases when using HTTP/2 or 3 on
the frontend side.

This issue was detected when running QUIC interop. Some server
implementations such as picoquic would reject these requests as they are
considered invalid.

To fix this, extract the path component from the HTX URI using the
http_uri_parser API.

This should be backported up to 3.3 as this is a QUIC backend fix.
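
For illustration, here is a self-contained sketch of the transformation itself. It does not use the http_uri_parser API. uri_path() is a hypothetical helper that shows which part of an absolute-form URI has to be kept for an HTTP/0.9 request line:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the path component of a request URI.
 * Origin-form ("/x") is returned as-is; for absolute-form
 * ("scheme://authority/path") the scheme and authority are skipped.
 * An absolute URI without a path yields "/". Corner cases such as a
 * query string directly after the authority are NOT handled here. */
static const char *uri_path(const char *uri)
{
	const char *p;

	if (uri[0] == '/')
		return uri;             /* already origin-form */
	p = strstr(uri, "://");
	if (!p)
		return uri;             /* not absolute-form, leave unchanged */
	p += 3;                     /* skip "://" */
	p = strchr(p, '/');         /* first '/' ends the authority */
	return p ? p : "/";
}
```

The ignored corner cases are one reason why relying on an existing URI parser is preferable to ad-hoc string scanning.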

BUG/MINOR: hq-interop: fix transcoding of wrapping response buffer

The patch below implemented support for response wrapping in the HTTP/0.9
transcoder, similar to what is already performed in HTTP/3.

1e144c488c612092067e07f18618482c64d33d17
BUG/MINOR: hq-interop: support response buffer wrapping

However, some bits were incorrectly written and the transcoding could not
handle all of the wrapping data in one pass, although no crash was
possible. This patch fixes these, so that a wrapping response can be
handled in a single pass by the HTTP/0.9 transcoder if there is enough
room in the output HTX buffer.
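
The general idea of handling a wrapping buffer in one pass can be sketched with a simplified ring model. struct ring and ring_xfer() are hypothetical and are not HAProxy's struct buffer API. The readable data is copied as two contiguous segments, bounded by the room available in the output:

```c
#include <assert.h>
#include <string.h>

/* Minimal ring buffer model (hypothetical, not HAProxy's struct buffer). */
struct ring {
	char *area;
	size_t size;  /* total storage, must be non-zero */
	size_t head;  /* offset of the first readable byte */
	size_t data;  /* number of readable bytes */
};

/* Copy up to 'room' bytes from the ring into 'out' in a single call,
 * handling the wrap by copying the two contiguous segments in turn.
 * Returns the number of bytes copied; the ring is consumed accordingly. */
static size_t ring_xfer(struct ring *r, char *out, size_t room)
{
	size_t total = r->data < room ? r->data : room;
	size_t first = r->size - r->head;   /* bytes before the wrap point */

	if (first > total)
		first = total;
	memcpy(out, r->area + r->head, first);        /* segment at the end */
	memcpy(out + first, r->area, total - first);  /* wrapped segment */
	r->head = (r->head + total) % r->size;
	r->data -= total;
	return total;
}
```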

This should fix github issue #3430.

This should be backported up to 3.3.

MEDIUM: httpclient: initialize the httpclient with default SSL values

The current httpclient implementation does not initialize its server
with the options from the global section: ciphers, ciphersuites and
various SSL options are always the default of the SSL library.

This patch changes the behavior and applies the ssl-default-server-*
keywords to the httpclient SSL server.

MINOR: ssl: export ssl_sock_init_srv()

Export ssl_sock_init_srv() so it can be called at other places where we
initialize servers.

BUG/MEDIUM: mux_quic: fix memory leak of rx app_buf on stream free

When freeing a QUIC stream (qcs), the receive application buffer
(qcs->rx.app_buf) was not released if it still contained data or
had been allocated. This led to a memory leak over time as streams
were opened and closed.

Fix this by explicitly freeing qcs->rx.app_buf via b_free() in
qcs_free() if its size is non-zero, and by calling offer_buffers() to
notify the buffer pool.

This should be backported as far as 2.6.
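
Here is a simplified sketch of the fix pattern. It uses hypothetical stand-ins buf_free() and offer_buffers_stub() in place of b_free() and offer_buffers(), and a reduced qcs_sketch structure:

```c
#include <assert.h>
#include <stdlib.h>

struct buf_sketch { char *area; size_t size; };
struct qcs_sketch { struct buf_sketch app_buf; };

static int buffers_offered; /* counts calls to the notification stub */

/* Hypothetical, simplified stand-ins for b_free() and offer_buffers(). */
static void buf_free(struct buf_sketch *b)
{
	free(b->area);
	b->area = NULL;
	b->size = 0;
}

static void offer_buffers_stub(void)
{
	buffers_offered++;
}

static void qcs_free_sketch(struct qcs_sketch *qcs)
{
	/* release the rx application buffer only if one was allocated */
	if (qcs->app_buf.size) {
		buf_free(&qcs->app_buf);
		offer_buffers_stub();   /* let buffer waiters know one was released */
	}
	free(qcs);
}
```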

CI: github: remove OpenTracing leftovers

When removing support for USE_OT=1 I only saw the tests but not the
build matrix, and this now causes build failures on the CI for tests
with "all features". Let's remove it there as well as the checks for
the OT cache and libs.

BUG/MEDIUM: h3: fix trace crash on frontend response headers

Fix a segfault when using HTTP/3 header traces on the frontend side. This
occurred because the headers list was dumped before the end marker was
inserted.
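
The hazard can be illustrated with a sketch that assumes a header array terminated by an entry whose name is NULL. hdr_sketch and its helpers are hypothetical, and the real list and its end marker may be represented differently. Any walk over the list, such as a trace dump, is only safe once the terminator has been written:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header entry; the list ends with an entry whose name is NULL. */
struct hdr_sketch { const char *name; const char *value; };

/* Count entries up to the end marker. Walking a list whose marker has
 * not been written yet would run past the valid entries. */
static size_t hdr_count(const struct hdr_sketch *list)
{
	size_t n = 0;

	while (list[n].name)
		n++;
	return n;
}

/* Write the end marker first; only then is it safe to dump the list. */
static size_t finalize_then_count(struct hdr_sketch *list, size_t used)
{
	list[used].name = NULL;
	list[used].value = NULL;
	return hdr_count(list);
}
```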

This issue is introduced by the following patch :
commit 00c081b5f388b655dd2c0fe5fdda8aacceb6b97e
MINOR: h3: trace HTTP headers on FE side

No need to backport, unless HTTP/3 header traces are picked into previous
releases.

MAJOR: ot: remove deprecated OpenTracing support

OpenTracing support has long been best-effort and was deprecated in 3.3
with removal planned in 3.5. Let's clean it up now.

This commit removes addons/ot, the build script, ARGC_OT, USE_OT and
OT_* variables in the Makefile, and replaces the config section with a
mention for the OpenTelemetry filter instead.

For more info, see GH issues #1640 and #2782, as well as the wiki's
"breaking changes" page.

CLEANUP: trace: remove backend retrieval attempt from conn->target

Since we may no longer see conn->target point to the proxy, let's drop
the retrieval attempt for a backend there in __trace_enabled().

CLEANUP: backend: drop checks for OBJ_TYPE_PROXY in connect() code

In tcp_connect_server(), uxst_connect_server(), and quic_connect_server(),
we can no longer see obj_type(conn->target) == OBJ_TYPE_PROXY so let's
drop that code. This implies that srv may no longer be NULL so we can
drop these checks as well.

CLEANUP: connection: remove some checks for objt_proxy(conn->target)

Since a connection's target may no longer be a proxy and is necessarily
a server, let's simplify such checks. This is essentially in mux install
code and in the debugging code.

MEDIUM: cli/show-fd: no longer accept filtering for dispatch mode

"show fd" supports various flags, one of which is 'd' for "dispatch",
which also catches "transparent" connections, i.e. connections whose
target is a proxy. Since these can no longer happen, let's remove that.
The 'b' and 's' flags are now aliases of each other for simplicity.

MAJOR: proxy: remove support for "dispatch" and "transparent" proxy keywords

These ones were deprecated in 3.3-dev2 with commits 5c15ba5eff ("MEDIUM:
proxy: mark the "dispatch" directive as deprecated") and e93f3ea3f8
("MEDIUM: proxy: deprecate the "transparent" and "option transparent"
directives"), and were planned for removal in 3.5. See also:

https://github.com/orgs/haproxy/discussions/2921

as well as the wiki page about breaking changes.

They've lived their lives and always cause internal limitations
(exceptions between connecting to a server or connecting to a proxy), and
are even confusing to some extent (especially "transparent", which users
often get wrong).

This commit removes the ability to configure them, tests based on them
and all the doc related to them. The keywords remain detected by the
parser and indicate how to proceed instead.

It's likely that other deeper parts will be changed as well (e.g.
conn->target will no longer be of OBJ_TYPE_PROXY). This will be done
over the long term.

MINOR: proxy: permit to report version info for option deprecation

It's already possible to report that some options are not supported due
to build options by passing 0 instead of PR_CAP_* in the option's cap
field. Let's extend that by passing a non-zero value in the val field,
where the 3rd byte will be the major version and the 4th one the minor.
In this case haproxy will now indicate that support for that option was
removed in that version.
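
Here is a sketch of the encoding under an explicit assumption: bytes are numbered from the least significant one, so the 3rd byte is bits 16-23 (major) and the 4th byte is bits 24-31 (minor). REMOVED_IN() and removed_version() are hypothetical names used for illustration only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: 1st byte = bits 0-7, so the 3rd byte holds the major
 * version and the 4th byte the minor one. Hypothetical macro name. */
#define REMOVED_IN(maj, min) \
	(((unsigned int)(maj) << 16) | ((unsigned int)(min) << 24))

/* Format the "removed in X.Y" hint from an option's val field.
 * Returns 0 when val carries no version information. */
static int removed_version(unsigned int val, char *out, size_t size)
{
	unsigned int maj = (val >> 16) & 0xff;
	unsigned int min = (val >> 24) & 0xff;

	if (!val)
		return 0;
	snprintf(out, size, "%u.%u", maj, min);
	return 1;
}
```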

[RELEASE] Released version 3.5-dev1

Released version 3.5-dev1 with the following main changes :
    - BUG/MEDIUM: check: Skip tcpcheck post-config for external checks
    - BUG/MEDIUM: check: Ignore small-buffer option when starting an external check
    - MINOR: check: Don't dump buffers state in check traces for external checks
    - BUG/MEDIUM: server/checks: Support healtcheck keyword on default-server lines
    - BUG/MEDIUM: mux_quic: prevent risk of infinite loop on recv
    - OPTIM: mux_quic: remove QCS from recv_list on reset
    - BUG/MINOR: mux_quic: do not interrupt recv on error/incomplete data
    - BUG/MINOR: tcpcheck: Override external check if healthcheck section is set
    - REGTESTS: checks: Add script for external healthchecks
    - BUG/MEDIUM: regex: initialize the match array earlier during boot
    - BUG/MEDIUM: threads: Fiw build when using no thread
    - BUG/MEDIUM: xprt_qmux: implement ->get_ssl_sock_ctx() to get the SSL laye
    - CLEANUP: sessions: simplify the sess_priv_conns pool name
    - MINOR: pools: reject creation of pools containing invalid chars in their name
    - BUG/MINOR: acl: report "ACL" not "map" in ACL ID lookup failures
    - MINOR: memprof: make in_memprof a bitfield instead of a counter
    - MINOR: memprof: be careful to account allocations only once
    - BUG/MEDIUM: checks: Dequeue checks on purge
    - MINOR: servers: Add a back-pointer to the server in srv_per_thread
    - MEDIUM: servers: Move to a per-thread idle connection cleanup task
    - REGTESTS: Fix log matching in healthcheck-section.vtc
    - BUG/MINOR: quic: fix Initial length value in sent packets
    - BUILD: servers: Fix build with -std=gnu89
    - BUG/MEDIUM: acme: stuck ACME task when authz is already "valid"
    - MINOR: acme: introduce acme_challenge_ready() for reuse outside the CLI
    - MINOR: h3: extend trace verbosity
    - MINOR: h3: trace HTTP headers on FE side
    - MINOR: h3: trace HTTP headers on BE side
    - BUILD: h3: fix compilation with USE_TRACE=0
    - MINOR: lua: add REGISTER_HLUA_STATE_INIT() to register state init callbacks
    - MEDIUM: lua: move longjmp annotation macros to hlua.h
    - MINOR: acme/lua: implement ACME.challenge_ready() Lua function
    - BUG/MEDIUM: ktls: defer enabling TLS ULP on a socket until connected
    - MINOR: errors: add ha_diag_notice() to report diag-level notifications
    - BUG/MINOR: cpu-topo: use ha_diag_notice() to report thread creations
    - MINOR: acme: publish ACME_NEWCERT event via event_hdl
    - MINOR: acme: publish ACME_DEPLOY event via event_hdl
    - EXAMPLES: lua/acme: add a dns-01 handler for Gandi LiveDNS API
    - DOC: acme: add mentions of lua features
    - MINOR: tasks: Introduce __task_set_state_and_tid
    - MINOR: tasks: Add __task_get_new_tid_field()
    - MINOR: tasks: Introduce __task_get_current_owner
    - MINOR: tasks: Use __task_get_current_owner() in task_kill.
    - MINOR: tasks: Start using __task_set_state_and_tid()
    - MEDIUM: tasks: Remove the per-thread group wait queue
    - MINOR: tasks: Use __task_set_state_and_tid() in task_instant_wakeup()
    - MINOR: tasks: Remove wq_lock and the per-thread group wait queues
    - MEDIUM: tasks: Redispatch shared tasks when the thread is loaded
    - BUG/MEDIUM: h3: Properly handle PUSH_PROMISE on backend connections
    - BUG/MINOR: server: fix add server with consistent hash balancing
    - MINOR: lua: export hlua_pusherror() and check_args()
    - REORG: httpclient/lua: move the lua httpclient code to http_client.c
    - MEDIUM: httpclient/lua: allow multiple requests from a single core.httpclient() instance
    - MEDIUM: httpclient: set res.status to 0 upon SF_ERR_MASK
    - DOC: httpclient: document status 0 on internal error
    - DEBUG: stconn: Add a BUG_ON on shut flags when the endpoint is shut
    - BUG/MINOR: http-ana: Remove a debugging memset on redirect
    - BUG/MEDIUM: http-ana: Don't ignore L7 retry errors
    - BUG/MINOR: mux-h1: Properly resolve file path for 'h1-case-adjust-file'
    - BUG/MINOR: quic: fix rxbuf settings on backend side
    - EXAMPLES: lua/acme: fix acme-gandi-livedns.lua configuration example
    - BUG/MEDIUM: ssl: Don't free the early data buffer too early
    - BUG/MINOR: hpack-tbl: add missing NULL check after hpack_dht_defrag()
    - BUG/MEDIUM: mux_quic: fix freeze transfer after QCS rxbuf realign
    - BUG/MEDIUM: http-act: Make a copy of the sample expr in (set/add)-headers-bin
    - BUG/MEDIUM: mux-fcgi: fix uint16_t overflow in drl += drp
    - OPTIM: mux-fcgi: Reorganise fcgi_conn structure to fill some holes
    - BUG/MINOR: hq-interop: reject too big content
    - MINOR: hq_interop: do not rely on stream layer for HTX stline encoding
    - BUG/MINOR: hq-interop: prevent reset if missing content-length
    - BUG/MEDIUM: hlua: Properly report EOS when http applet exits
    - BUG/MINOR: hq-interop: support full demux buf on large response
    - BUG/MINOR: hq-interop: support response buffer wrapping
    - DOC: sched: Document the wait queue modifications
    - DOC: lua: remove incorrect init tags
    - BUG/MEDIUM: h3: increment unknown request payload length
    - REGTESTS: quic: test H3 request without content-length
    - DEBUG: cli: relax tid check in "debug dev task" for recent sched changes
    - MINOR: debug: add "print" to "debug dev sched"
    - BUG/MINOR: poller: fix wait time calculation that is always 1 extra ms
    - BUILD: quic_pacing: add missing includes for api and activity in the file
    - MINOR: task: move the profiling checks to the called functions not callers
    - MINOR: task: add a new explicitly local tasklet wakeup function
    - MINOR: task: make tasklet_wakeup() explicitly call _tasklet_wakeup_here()
    - MINOR: task: make task_instant_wakeup() explicitly call _tasklet_wakeup_here()
    - MEDIUM: task: make __tasklet_wakeup_on() only accept non-local threads
    - MEDIUM: task: add a new flag TASK_RT to permit a task to skip the priority queue
    - MINOR: debug: add "rt=1" to "debug dev task" to tune the RT flag
    - BUG/MEDIUM: mux-fcgi: Truly drain outgoing HTX data when the stream is closed
    - BUG/MEDIUM: mux-h2: Truly drain outgoing HTX data when the stream is closed
    - BUG/MEDIUM: mux-spop: Truly drain outgoing data when the stream is closed
    - BUG/MEDIUM: mux-quic: Drain the given amount of data in qcs_http_reset_buf()
    - CLEANUP: task: remove duplicated code in __tasklet_wakeup_after()
    - BUILD: task: silence a build warning with threads disabled
    - MINOR: task: do not try to redistribute the WQ when single-threaded
    - MEDIUM: task: add a new tasklet class for real-time: TL_RT

MEDIUM: task: add a new tasklet class for real-time: TL_RT

This adds a new class, TL_RT, which is processed before other queues for
one (and only one) tasklet featuring the TASK_RT flag. This is meant to
process real time wakeups under load with even less latency. We only
process one entry to make sure it will not be abused for unimportant
stuff, and if tune.sched.low-latency is set, we also avoid picking more
tasks from the current run queues and looping after the first call to
run_tasks_from_list().

Measurements under a load of 10k concurrent connections injecting at
10 Gbps (~58k 20kB objects/s) on 4 threads, with task profiling enabled,
show that the average wakeup latency for wakeups every 10ms dropped from
220 microseconds to 1.8 microseconds, and even to ~550 nanoseconds when
tune.sched.low-latency is set, i.e. 400 times less.

The doc was updated, including the schematics.
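
The ordering rule can be shown with a much simplified model. It assumes plain FIFOs instead of HAProxy's tasklet lists, and fifo, sched_pass() and the ids are hypothetical. Each pass runs at most one real-time entry before the regular queue, so a burst of RT wakeups cannot starve other work. The tune.sched.low-latency behavior is not modelled:

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8

/* Hypothetical FIFO of task ids; 'tail' is set directly in this sketch. */
struct fifo { int item[QLEN]; size_t head, tail; };

static int fifo_pop(struct fifo *q, int *out)
{
	if (q->head == q->tail)
		return 0;
	*out = q->item[q->head++ % QLEN];
	return 1;
}

/* One scheduling pass: at most ONE real-time entry first, then up to
 * 'budget' regular entries. Executed ids are appended to 'ran'. */
static size_t sched_pass(struct fifo *rt, struct fifo *normal, int budget,
                         int *ran)
{
	size_t n = 0;
	int id;

	if (fifo_pop(rt, &id))
		ran[n++] = id;
	while (budget-- > 0 && fifo_pop(normal, &id))
		ran[n++] = id;
	return n;
}
```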

MINOR: task: do not try to redistribute the WQ when single-threaded

When running with nbthread=1, we still try to redistribute once; it
fails (new_tid=tid) and leaves the loop. That's just a waste for no
reason. Let's make the redispatch conditional on the presence of at least
one other thread.

BUILD: task: silence a build warning with threads disabled

The compiler doesn't know that a random value based on global.nbthread
is necessarily smaller than MAX_THREADS, and when picking a random
thread number while single-threaded it complains that new_tid 1 is
out of bounds for the array. In fact all this is dead code in this
case.

Let's give it this hint to silence the warning.

CLEANUP: task: remove duplicated code in __tasklet_wakeup_after()

In the case where the task is first inserted (!head), the code is
exactly __tasklet_wakeup_here(), so let's rely on this one. The
profiling and rq_total parts are already handled there so let's
move them to the head!=NULL branch.

BUG/MEDIUM: mux-quic: Drain the given amount of data in qcs_http_reset_buf()

The name of the qcs_http_reset_buf() function is confusing, but its
comment is clear: a given amount of HTX data must be cleared from the
buffer. However, in practice, the whole buffer was always reset. Most of
the time this is equivalent, but unsent data may have to be kept in the
buffer, for instance when a filter is registered on the data forwarding
stage.

So, instead of calling htx_reset(), htx_drain() must be used.

This patch must be backported to all supported versions.
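
The difference between the two operations can be sketched on a plain linear buffer that stands in for an HTX message. msg, msg_reset() and msg_drain() are hypothetical. htx_drain() works on HTX blocks, not raw bytes:

```c
#include <assert.h>
#include <string.h>

/* Simplified linear buffer standing in for an HTX message. */
struct msg { char data[32]; size_t len; };

/* Old behaviour: drop everything, including data not meant to be cleared. */
static void msg_reset(struct msg *m)
{
	m->len = 0;
}

/* Fixed behaviour: drop only the first 'count' bytes and keep the rest. */
static size_t msg_drain(struct msg *m, size_t count)
{
	if (count > m->len)
		count = m->len;
	memmove(m->data, m->data + count, m->len - count);
	m->len -= count;
	return count;
}
```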

BUG/MEDIUM: mux-spop: Truly drain outgoing data when the stream is closed

After the H2 and FCGI multiplexers, this is the third mux affected by
this issue.

When we try to send data to the server and the stream is closed (in error,
in half-closed state or fully closed), remaining data must be drained. This
way the upper stream is able to properly handle the stream close.

However, there was a bug here. The mux claimed to have consumed these data
without draining them from the buffer. The issue was never reported on the
SPOP multiplexer but, in theory, the same problem as with the FCGI
multiplexer is possible.

This patch must be backported as far as 3.2.

BUG/MEDIUM: mux-h2: Truly drain outgoing HTX data when the stream is closed

It is the same bug as the previous one on the FCGI mux.

When we try to send data to the server and the stream is closed (in error,
in half-closed state or fully closed), remaining data must be drained. This
way the upper stream is able to properly handle the stream close.

However, there was a bug here. The mux claimed to have consumed these data
without draining them from the buffer. The issue was never reported on the
H2 multiplexer but, in theory, the same problem as with the FCGI
multiplexer is possible.

This patch must be backported to all supported versions.

BUG/MEDIUM: mux-fcgi: Truly drain outgoing HTX data when the stream is closed

When we try to send data to the server and the stream is closed (in error,
in half-closed state or fully closed), remaining data must be drained. This
way the upper stream is able to properly handle the stream close.

However, there was a bug here. The mux claimed to have consumed these data
without draining them from the buffer, so the upper stream would try to
send these data in a loop. Because of this bug, it is possible to trigger
the watchdog with a bogus stream.

This patch should fix the issue #3425. It must be backported to all
supported versions.
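
A toy model shows why a mismatch between reported and drained data can spin. obuf, snd_closed_*() and send_loop() are hypothetical and only count pending bytes. The loop is bounded here so that the buggy case terminates:

```c
#include <assert.h>
#include <stddef.h>

struct obuf { size_t len; }; /* only the amount of pending data matters */

/* Buggy variant: reports the data as consumed but leaves it in place. */
static size_t snd_closed_buggy(struct obuf *b)
{
	return b->len;
}

/* Fixed variant: what is reported as consumed is really removed. */
static size_t snd_closed_fixed(struct obuf *b)
{
	size_t n = b->len;

	b->len = 0;
	return n;
}

/* Upper-layer loop, bounded for the sketch: keeps sending while data
 * remains. Returns the iterations needed to empty the buffer, or 'max'
 * if it never empties (the spin that the bug could cause). */
static int send_loop(size_t (*snd)(struct obuf *), struct obuf *b, int max)
{
	int i;

	for (i = 0; i < max && b->len; i++)
		snd(b);
	return i;
}
```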

MINOR: debug: add "rt=1" to "debug dev task" to tune the RT flag

When rt=1 is set, the created task is set to real time. This will
essentially be used jointly with print=1 to print the wakeup date.

Now it is clearly visible that TASK_RT helps keep recurring delays stable
under load:

  $ socat - /tmp/sock1 <<< "expert-mode on;debug dev sched task inter=1 count=100000"
  $ socat - /tmp/sock1 <<< "expert-mode on;debug dev sched task print=1 inter=1000 count=1"

  $ taskset -c 0  ./haproxy -db -f h2-h1.cfg
  [NOTICE]   (11262) : Automatically setting global.maxconn to 2033.
  task 0x5a67740: time_ms=355817139.2593
  task 0x5a67740: time_ms=355818165.934056
  task 0x5a67740: time_ms=355819192.609666
  task 0x5a67740: time_ms=355820219.528467
  task 0x5a67740: time_ms=355821245.778249
  task 0x5a67740: time_ms=355822271.800731
  task 0x5a67740: time_ms=355823297.836419
  task 0x5a67740: time_ms=355824327.894992
  task 0x5a67740: time_ms=355825354.92738
  task 0x5a67740: time_ms=355826381.242925
  ^C
  => ~1030ms interval

  $ socat - /tmp/sock1 <<< "expert-mode on;debug dev sched task inter=1 count=100000"
  $ socat - /tmp/sock1 <<< "expert-mode on;debug dev sched task print=1 inter=1000 count=1 rt=1"

  $ taskset -c 0  ./haproxy -db -f h2-h1.cfg
  [NOTICE]   (11274) : Automatically setting global.maxconn to 2033.
  task 0x1c89d740: time_ms=355842657.804670
  task 0x1c89d740: time_ms=355843657.130495
  task 0x1c89d740: time_ms=355844657.159396
  task 0x1c89d740: time_ms=355845657.176695
  task 0x1c89d740: time_ms=355846657.144914
  task 0x1c89d740: time_ms=355847657.416217
  task 0x1c89d740: time_ms=355848657.192072
  ^C
  => ~1000ms interval

Also, mixing one non-RT and one RT task easily shows that the non-RT one drifts:

  task 0x3180080: time_ms=356074147.738657
  task 0x7f82705749c0: time_ms=356074167.140429
  task 0x3180080: time_ms=356075157.58365
  task 0x7f82705749c0: time_ms=356075167.155128
  task 0x3180080: time_ms=356076167.23508
  task 0x7f82705749c0: time_ms=356076167.163529
  task 0x7f82705749c0: time_ms=356077167.131421
  task 0x3180080: time_ms=356077176.204301
  task 0x7f82705749c0: time_ms=356078167.172336
  task 0x3180080: time_ms=356078186.76828
  task 0x7f82705749c0: time_ms=356079167.110302
  task 0x3180080: time_ms=356079195.814623

Under traffic with a run queue of 50k, with non-rt we can see delays
of up to a few hundred ms, while the rt one is stable:

  task 0xf52228e3c580: time_ms=12817821323.738291
  task 0xf52228e3c580: time_ms=12817822375.954267
  task 0xf52228e3c580: time_ms=12817823437.542732
  task 0xf522c85fe340: time_ms=12817823901.552931
  task 0xf52228e3c580: time_ms=12817824487.769129
  task 0xf522c85fe340: time_ms=12817824901.491252
  task 0xf52228e3c580: time_ms=12817825553.539234
  task 0xf522c85fe340: time_ms=12817825901.930177
  task 0xf52228e3c580: time_ms=12817826613.611365
  task 0xf522c85fe340: time_ms=12817826901.94851
  task 0xf52228e3c580: time_ms=12817827659.386720
  task 0xf522c85fe340: time_ms=12817827901.655777
  task 0xf52228e3c580: time_ms=12817828694.795097
  task 0xf522c85fe340: time_ms=12817828901.882341
  task 0xf52228e3c580: time_ms=12817829765.216886
  task 0xf522c85fe340: time_ms=12817829901.723780
  task 0xf52228e3c580: time_ms=12817830844.706481
  task 0xf522c85fe340: time_ms=12817830901.571820
  task 0xf522c85fe340: time_ms=12817831901.440380
  task 0xf52228e3c580: time_ms=12817831951.556375

This indicates that this should be a sufficient first step to unblock
haload.

MEDIUM: task: add a new flag TASK_RT to permit a task to skip the priority queue

For some very rare tasks that need to be woken up at an exact date (right
now the only known use case is haload's periodic stats collection), it's
currently difficult to guarantee the wake up date on a heavily loaded
run queue.

This patch introduces TASK_RT for real-time tasks. Right now, all it does
is modify __task_wakeup() to immediately switch to __tasklet_wakeup_*()
and effectively bypass the priority-based run queue. Doing it here has
the benefit of making sure that it automatically applies to tasks found
in the wait queue, and that it will also work for _task_drop_running().

For now nothing uses it. The doc was updated.
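
The decision point can be sketched as follows. The flag value and the destination enum are hypothetical and do not reflect HAProxy's real TASK_RT definition or queues:

```c
#include <assert.h>

/* Hypothetical flag value and destinations, for illustration only. */
#define TASK_RT_SKETCH 0x1u
enum dest { DEST_PRIO_RQ, DEST_TASKLET_LIST };

struct task_sketch { unsigned int state; };

/* A real-time task bypasses the priority-ordered run queue and is queued
 * like a tasklet. Since the choice is made in the common wakeup function,
 * tasks expiring from the wait queue would take the same path. */
static enum dest wakeup_dest(const struct task_sketch *t)
{
	return (t->state & TASK_RT_SKETCH) ? DEST_TASKLET_LIST : DEST_PRIO_RQ;
}
```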

MEDIUM: task: make __tasklet_wakeup_on() only accept non-local threads

The ambiguity in usage for __tasklet_wakeup_on() is now gone. All known
callers that used to be able to pass a negative value now call
__tasklet_wakeup_here(), and remaining ones always pass an explicit
thread number. This means that we can remove the "if (thr<0)" branch,
but still leave a BUG_ON_HOT() to catch any possibly missed case. The
comment around tasklet_wakeup_on() not supporting remotely waking a
tasklet whose tid<0 was also removed since it was addressed long ago.

MINOR: task: make task_instant_wakeup() explicitly call _tasklet_wakeup_here()

This patch moves the tid check upper in the chain, in task_instant_wakeup()
so as to branch to _tasklet_wakeup_here() for run-anywhere tasks, or
_tasklet_wakeup_on() for designated threads.

At this point there is no longer any direct caller of __tasklet_wakeup_on()
passing a negative thread value.

MINOR: task: make tasklet_wakeup() explicitly call _tasklet_wakeup_here()

This patch moves the tid check upper in the chain, in tasklet_wakeup()
so as to branch to _tasklet_wakeup_here() for run-anywhere tasklets, or
_tasklet_wakeup_on() for designated threads. The tid is retrieved via
__task_get_current_owner() so that the call remains compatible with
tasklets that would have a super-negative tid due to being tasks used
as tasklets.
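
Here is a sketch of the resulting dispatch. It assumes that the owner value has already been resolved, with a negative value meaning "no designated thread". The wakeup_*_sketch() stubs and tasklet_wakeup_sketch() are hypothetical and only record which path was taken:

```c
#include <assert.h>

static int last_path; /* 1 = local path (_here), 2 = remote path (_on) */
static int last_thr;

/* Hypothetical stubs standing in for the local and remote wakeup paths. */
static void wakeup_here_sketch(void)
{
	last_path = 1;
	last_thr = -1;
}

static void wakeup_on_sketch(int thr)
{
	last_path = 2;
	last_thr = thr;
}

/* Compat dispatcher: no designated thread means "run here", otherwise
 * the designated thread is explicitly woken up. */
static void tasklet_wakeup_sketch(int owner)
{
	if (owner < 0)
		wakeup_here_sketch();
	else
		wakeup_on_sketch(owner);
}
```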

MINOR: task: add a new explicitly local tasklet wakeup function

The current tasklet_wakeup() call relies on tasklet_wakeup_on(tl->tid),
which was already quite ambiguous till now due to the sole reliance on
tid being negative or not to decide to run locally, but it no longer
works correctly if used to wake tasks up since the new set of possible
negative values for ->tid (particularly if some code calls
__tasklet_wakeup_on() on a task as is done in task_instant_wakeup()).

The problem is that it is not possible in the current API to explicitly
say that we want a task/tasklet to run locally or remotely without having
to play games with a thread number. The chosen approach to address this
is to change tasklet_wakeup_on() to always be remote and have
tasklet_wakeup_here() which will always be local, with tasklet_wakeup()
choosing one or the other depending on the tid, for backwards compat
only.

This patch implements tasklet_wakeup_here() on top of
__tasklet_wakeup_here(), which reimplements the part of
__tasklet_wakeup_on() that used to deal with the local thread only
(negative tid). No other change was made. For now it remains unused.

The doc was updated.

MINOR: task: move the profiling checks to the called functions not callers

The checks on TH_FL_TASK_PROFILING that are used to decide whether or not
to set t->wake_date from now_mono_time() used to be made in callers of
__tasklet_wakeup_on() and __tasklet_wakeup_after(). Not only does this
needlessly inflate the code by placing the check in every caller (~4kB),
it also renders the design fragile since each caller needs to blindly
copy-paste that statement.

Let's move the operation into the callees instead. As a bonus, this makes
it possible to check the flag on the target thread and not on the calling
thread (which was arguably a bug, though without a noticeable effect since
for now profiling is enabled for all threads or none).

BUILD: quic_pacing: add missing includes for api and activity in the file

quic_pacing.c is missing a number of include files that it got by chance
through task.h, resulting in build breakage as soon as that one gets
cleaned up. Let's add api.h and activity.h that are needed. No backport
is needed.

BUG/MINOR: poller: fix wait time calculation that is always 1 extra ms

In 1.3.11, 19 years ago, commit bdefc513a0 ("[BUG] fix null timeouts in
*poll-based pollers") addressed an issue where some wakeup times could
sometimes be rounded to less than one millisecond (by then they were
calculated on timeval), and would make the poller wake up too early and
loop with a timeout of zero. The solution used by then consisted in
always adding 1 to the wait delay so that poll() was never called with
a null timeout.

Nowadays our internal wakeup delays are in milliseconds, so we cannot wake
up too early. Also, all the timeout calculation was moved to
compute_poll_timeout(), which has a specific check for an expired next
wakeup event, so a real delay calculation cannot even accidentally result
in a null timeout. Yet, strace clearly shows that a task created with an
interval of 10ms results in a poll timeout of 11ms, causing some small
time drift in periodic wakeups.

Let's now just drop this "+1", which is no longer needed nor relevant and
only causes wrong delays to be calculated. Now creating a time-printing
task results in correct delays passed to poll() and measured intervals
around:
  - ~10.3ms interval for 10ms
  - ~100.5ms for 100ms
  - ~1001ms for 1000ms

E.g:

  $ socat - /tmp/sock1 <<< "expert-mode on;debug dev sched task print=1 inter=10 count=1"
  (...)

  17:58:05.191885 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=44590744}) = 0
  17:58:05.191919 epoll_wait(4, [], 200, 10) = 0
  17:58:05.202215 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=44601494}) = 0
  17:58:05.202237 write(1, "task 0x3aeeb080: time_ms=3553"..., 42task 0x3aeeb080: time_ms=355304053.757383
  ) = 42
  17:58:05.202253 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=44610199}) = 0
  17:58:05.202265 epoll_wait(4, [], 200, 10) = 0
  17:58:05.212579 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=44631754}) = 0
  17:58:05.212639 write(1, "task 0x3aeeb080: time_ms=3553"..., 42task 0x3aeeb080: time_ms=355304064.157626
  ) = 42

The extra delays seen with longer sleeps are entirely on the system side,
most likely due to the CPU switching to low-power states for such long
delays (tests run on a laptop).

There is no reason to backport this fix, though it shouldn't hurt either.
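
The resulting calculation can be sketched as below. poll_timeout_ms() is a hypothetical helper, not compute_poll_timeout(). It shows that once an expired event is handled explicitly, the remaining delay can be passed to the poller unchanged, without the historical "+1":

```c
#include <assert.h>

/* Hypothetical helper: poll() timeout in ms from the current date and the
 * next expiration date (both in ms), capped to 'max'. An expired event
 * gives 0; otherwise the exact remaining delay is returned. */
static int poll_timeout_ms(unsigned long long now, unsigned long long next,
                           int max)
{
	unsigned long long delta;

	if (next <= now)
		return 0;                   /* already expired: don't sleep */
	delta = next - now;
	return delta > (unsigned long long)max ? max : (int)delta;
}
```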

MINOR: debug: add "print" to "debug dev sched"

Passing "print=1" for a periodic task will cause it to print the exact
monotonic time at each wakeup to stdout and do nothing else. This is
convenient to watch wakeup delay drifts.

DEBUG: cli: relax tid check in "debug dev task" for recent sched changes

Since commit 0988b9c773 ("MEDIUM: tasks: Remove the per-thread group
wait queue") in 3.5-dev, a task's tid may be as negative as -MAX_THREAD-1
and not just -1, so we must accept this when trying to check if a pointer
looks like a valid task.

No backport is needed.

REGTESTS: quic: test H3 request without content-length

Add a QUIC regression test for an HTTP/3 request body without
Content-Length forwarded to an HTTP/1 backend.

The client side sends a chunked HTTP/1 request to a first HAProxy
instance, which forwards it to a second HAProxy instance over QUIC/H3.
The second instance then forwards it to a plain HTTP/1 server. This
exercises the H3 frontend to H1 backend path with an unknown request
body length.

BUG/MEDIUM: h3: increment unknown request payload length

When an HTTP/3 request carries DATA frames without a Content-Length
header, the H3 mux updates the stream endpoint known input payload
length so the stream layer can pass this information to the output mux.

The current code assigns h3s->data_len to qcs->sd->kip. However,
h3s->data_len is cumulative, while sedesc->kip is an incremental value:
it is moved to the opposite side as kop and then consumed by the output
mux. With multiple DATA frames, the H1 output mux may therefore announce
chunk sizes based on the total body length received so far instead of
the next payload length.

For an H3-to-H1 request without Content-Length, this can produce
malformed chunked encoding on the backend connection. A backend HTTP/1
parser may then reject the request, and HAProxy can return a 500 to the
client.

Fix this by incrementing qcs->sd->kip with the current DATA frame length
instead of assigning the cumulative body length.

This should be backported up to 3.3.
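The difference between assigning a cumulative length and incrementing an incremental counter can be shown with a simplified model. This is a sketch, not the H3 mux code. struct demo_stream only mirrors the roles of h3s->data_len and sedesc->kip, and take_chunk_size() stands in, hypothetically, for the output mux consuming the known payload as the next chunk size.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the fields named above. */
struct demo_stream {
	size_t data_len;  /* cumulative DATA payload received so far */
	size_t kip;       /* incremental: drained by the output side */
};

/* Called for each received DATA frame of length <len>. */
static void on_data_frame(struct demo_stream *s, size_t len, int buggy)
{
	s->data_len += len;
	if (buggy)
		s->kip = s->data_len;   /* old behaviour: cumulative value */
	else
		s->kip += len;          /* fixed behaviour: this frame only */
}

/* Simulates the output mux consuming the known payload as the size of
 * the next chunk it announces. */
static size_t take_chunk_size(struct demo_stream *s)
{
	size_t sz = s->kip;

	s->kip = 0;
	return sz;
}
```

With two DATA frames of 100 and 50 bytes, the old assignment yields announced chunk sizes of 100 and 150, while the increment yields 100 and 50.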

DOC: lua: remove incorrect init tags

The `core.httpclient()` and `core.tcp()` functions aren't actually
available in the init context, as they require the event loop to be set
up.

As such, remove "init" from the list of contexts for those functions.

It probably makes sense to backport this to previous versions.

Fixes: #3420
Signed-off-by: Thayne McCombs <astrothayne@gmail.com>

DOC: sched: Document the wait queue modifications

Commit 0988b9c7730663e24c0809879e40ffcdb245c1e9 removed the global wait
queue, and introduced the concept of ownership for shared tasks, so
properly document it.

BUG/MINOR: hq-interop: support response buffer wrapping

When using QUIC on the backend side, transcoding of a large HTTP
response may cause the rxbuf to wrap. This patch introduces buffer
wrapping support for the HTTP/0.9 transcoder. This is similar to what is
already implemented in the HTTP/3 layer.

This should be backported up to 3.3.
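Supporting a wrapped buffer means that a read may need two copies: one up to the end of the storage area and one from its beginning. The sketch below assumes a ring described by a start offset and a data length. struct demo_ring and ring_peek() are hypothetical and do not reproduce HAProxy's struct buffer API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal ring buffer. */
struct demo_ring {
	char  *area;
	size_t size;  /* total storage size */
	size_t head;  /* offset of the first byte of data */
	size_t data;  /* number of bytes stored */
};

/* Copies up to <max> bytes from the start of <r> into <dst> without
 * consuming them, handling data that wraps past the end of <area>.
 * Returns the number of bytes copied. */
static size_t ring_peek(const struct demo_ring *r, char *dst, size_t max)
{
	size_t len = r->data < max ? r->data : max;
	size_t first = r->size - r->head;   /* contiguous bytes before end */

	if (first > len)
		first = len;
	memcpy(dst, r->area + r->head, first);
	memcpy(dst + first, r->area, len - first);  /* wrapped part, may be 0 */
	return len;
}
```

For instance, with an 8-byte area holding "DEFxxABC", head 5 and 6 bytes of data, ring_peek() returns "ABCDEF".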

BUG/MINOR: hq-interop: support full demux buf on large response

When dealing with large responses, the QUIC MUX demux buffer may become
full. In this case, the QC_SF_DEM_FULL flag must be set to pause
transcoding. This patch implements it for HTTP/0.9 response transcoding,
similarly to what HTTP/3 already provides. This is a backend side fix.

This should be backported up to 3.3.
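The pause mechanism amounts to copying only what fits and raising a flag when input remains. This is a minimal sketch assuming a fixed-size demux buffer. DEMO_SF_DEM_FULL, struct demo_qcs and demo_transcode() are hypothetical stand-ins, not the real QC_SF_DEM_FULL handling.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SF_DEM_FULL 0x1  /* hypothetical stand-in for QC_SF_DEM_FULL */

struct demo_qcs {
	unsigned int flags;
	char   buf[16];   /* hypothetical, deliberately small demux buffer */
	size_t used;
};

/* Appends as much of <src> as fits in the demux buffer. If input remains
 * once the buffer is full, DEMO_SF_DEM_FULL is set so that the caller
 * pauses transcoding until the buffer is drained. Returns bytes consumed. */
static size_t demo_transcode(struct demo_qcs *qcs, const char *src, size_t len)
{
	size_t room = sizeof(qcs->buf) - qcs->used;
	size_t n = len < room ? len : room;

	memcpy(qcs->buf + qcs->used, src, n);
	qcs->used += n;
	if (n < len)
		qcs->flags |= DEMO_SF_DEM_FULL;
	return n;
}
```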

BUG/MEDIUM: hlua: Properly report EOS when http applet exits

When the Lua HTTP applet was migrated to the new API to use its own buffers,
a regression was introduced. The EOS flag at the end of the response was no
longer set. While it is not an issue when the response length is known
(because of a content-length or a transfer-encoding header), it is an issue
for responses with an unknown payload size. In that case, the stconn and the
stream rely on the EOS to detect the end of the response. Without this
information, the stream remains blocked.

To fix the issue, the EOS flag is now set as expected on the applet.

This patch should fix issue #3422. It must be backported as far as 3.3.
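The reasoning above can be summarised as a completion predicate. This simplified sketch models only the Content-Length case for a known length and ignores chunked transfer-encoding. struct demo_resp and demo_resp_complete() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what the stream side knows about a
 * response. */
struct demo_resp {
	bool   len_known;  /* e.g. a Content-Length header was present */
	size_t expected;   /* announced length, meaningful if len_known */
	size_t received;   /* payload bytes received so far */
	bool   eos;        /* end of stream reported by the producer */
};

/* Returns true once the response can be considered complete. */
static bool demo_resp_complete(const struct demo_resp *r)
{
	if (r->len_known)
		return r->received >= r->expected;
	/* Unknown length: only EOS can tell that the response ended, so a
	 * producer that never reports it leaves the stream waiting. */
	return r->eos;
}
```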

BUG/MINOR: hq-interop: prevent reset if missing content-length

The HTTP/0.9 transcoder is minimal. In particular, it did not check
whether the HTX payload length was unknown, in which case the stream
shutdown is the normal termination signal. As this condition was not
reported to the MUX, the stream would be closed via a RESET_STREAM during
the stream shut callback invocation.

Fix this by properly inspecting the HTX response start line prior to
generating the HTTP/0.9 response. If the HTX_SL_F_XFER_LEN flag is not
set, set the QCS flag QC_SF_UNKNOWN_PL_LENGTH accordingly. This ensures
that the MUX will use a FIN signal instead of a RESET_STREAM frame when
shut is called by the upper stream layer.

This procedure is already implemented by the HTTP/3 transcoder.

This bug was detected with haterm because, contrary to httpterm, haterm
does not honour the "Connection: keep-alive" header with HTTP/1.0. Thus
connection close mode is used and no content-length is added.

This must be backported up to 2.8.
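The flag conversion and its effect on shut can be sketched as two small functions. The flag values, the DEMO_ prefixed names and the shut decision are hypothetical simplifications. The sketch only covers a shut that arrives before the end of the message was transmitted.

```c
/* Hypothetical flag values for this sketch only. */
#define DEMO_HTX_SL_F_XFER_LEN        0x00000001u
#define DEMO_QC_SF_UNKNOWN_PL_LENGTH  0x00000100u

enum demo_shut_action { DEMO_SHUT_FIN, DEMO_SHUT_RESET };

/* Propagates the absence of a known payload length from the HTX start
 * line to the stream flags. */
static unsigned int demo_sl_to_qcs_flags(unsigned int sl_flags, unsigned int qcs_flags)
{
	if (!(sl_flags & DEMO_HTX_SL_F_XFER_LEN))
		qcs_flags |= DEMO_QC_SF_UNKNOWN_PL_LENGTH;
	return qcs_flags;
}

/* Simplified decision when the upper layer shuts the stream: with an
 * unknown length the shutdown is the normal end of the message, so a FIN
 * is sent; otherwise the message is truncated and the stream is reset. */
static enum demo_shut_action demo_on_shut(unsigned int qcs_flags)
{
	return (qcs_flags & DEMO_QC_SF_UNKNOWN_PL_LENGTH) ? DEMO_SHUT_FIN : DEMO_SHUT_RESET;
}
```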

MINOR: hq_interop: do not rely on stream layer for HTX stline encoding

HTTP/0.9 is a simple protocol. The response only contains the body
without any status line or headers. An HTX start line must be built when
transcoding the message for the haproxy stream layer on the first
invocation of the rcv_buf() callback.

Previously, this condition was detected by accessing the stream object.
However, it's possible to rely only on the QCS by checking the value of
its <rx.offset> field. This is a better solution which completely removes
the superfluous dependency between hq-interop and the stream layer.
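A minimal sketch of detecting the first rcv_buf() invocation from the stream offset alone. It assumes that the offset counts bytes already passed to the upper layer and that each call transfers at least one byte. struct demo_qcs_rx and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced stream: <rx_offset> stands for the QCS <rx.offset>
 * field and is assumed to count bytes already passed to the upper layer. */
struct demo_qcs_rx {
	uint64_t rx_offset;
};

/* A synthetic start line is only needed before the first body byte. */
static bool demo_need_start_line(const struct demo_qcs_rx *qcs)
{
	return qcs->rx_offset == 0;
}

/* Simulated rcv_buf() transferring <len> body bytes (assumed > 0).
 * Returns true if a start line had to be emitted before them. */
static bool demo_rcv_buf(struct demo_qcs_rx *qcs, uint64_t len)
{
	bool emit_sl = demo_need_start_line(qcs);

	qcs->rx_offset += len;
	return emit_sl;
}
```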

BUG/MINOR: hq-interop: reject too big content

The hq-interop request parser is minimal. It simply extracts a method and
a path until the whitespace delimiter. Parsing is interrupted if the
delimiter cannot be found, and the MUX is responsible for reinvoking it
later once new content has been added.

This patch adjusts hq-interop parsing in case of a missing delimiter.
Now it also checks if there is space remaining in the buffer. If this is
not the case, it returns a fatal error as parsing cannot be completed at
all.

This change has the side effect of preventing a BUG_ON() crash in the MUX:
in case of a truncated parsing, qcs_transfer_rx_data() may be used to
realign content from the next buffer. However, this function explicitly
forbids being called with a full buffer, as it could do nothing in that
case, hence the BUG_ON() ensuring that parsing is never fully blocked.

The impact of this bug remains low despite the potential BUG_ON() crash.
This is because hq-interop is only used for QUIC debugging purposes and
should not be activated in production. The HTTP/3 layer is immune as it
already ensures that frame length is never bigger than a buffer size
(except for DATA frames which can be parsed in a streaming mode).

Thanks to BeaCox <root@beacox.space> for having reported us this issue.

This should be backported up to 2.6. From 3.3, qcm_stream_rx_bufsz() is
using the older "qmux" prefix and must be renamed. Also, this function
does not exist in 3.0 and older, so the test must be adjusted there as
well.
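The new rule distinguishes "delimiter not found yet" from "delimiter can never be found". The following is a minimal sketch of a "METHOD PATH" parser in the same spirit. demo_parse_req(), its return codes and the <bufsz> parameter are hypothetical and stand in for the buffer-size check described above. This is not the hq-interop code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes for this sketch. */
enum demo_parse_ret { DEMO_PARSE_ERROR = -1, DEMO_PARSE_NEED_MORE = 0, DEMO_PARSE_OK = 1 };

/* Parses "METHOD PATH<ws>" from the <len> bytes received so far, <bufsz>
 * being the receive buffer capacity. On success, <*path>/<*path_len>
 * point into <data>. */
static enum demo_parse_ret demo_parse_req(const char *data, size_t len, size_t bufsz,
                                          const char **path, size_t *path_len)
{
	const char *sp = memchr(data, ' ', len);
	size_t i;

	if (sp) {
		size_t start = (size_t)(sp - data) + 1;

		/* the path runs until the next whitespace delimiter */
		for (i = start; i < len; i++) {
			if (data[i] == ' ' || data[i] == '\r' || data[i] == '\n') {
				*path = data + start;
				*path_len = i - start;
				return DEMO_PARSE_OK;
			}
		}
	}

	/* Delimiter not found: waiting only makes sense if more data can
	 * still be stored, as a full buffer can never complete the request. */
	return len < bufsz ? DEMO_PARSE_NEED_MORE : DEMO_PARSE_ERROR;
}
```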

OPTIM: mux-fcgi: Reorganise fcgi_conn structure to fill some holes

The <drl> field was moved before <dsi> and <term_evts_log> was moved before
the demux buffer. 8 bytes were saved this way.
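Structure holes come from alignment padding. Grouping small members together lets them share a slot instead of each being padded up to the next pointer-aligned member. Tools such as pahole report these holes. The two layouts below are hypothetical and are not the real fcgi_conn. On a typical 64-bit (LP64) target they differ by 8 bytes.

```c
#include <stdint.h>

/* Each 4-byte member sits between 8-byte-aligned ones and gets padded. */
struct demo_holes {
	void    *a;     /* 8 bytes */
	uint32_t x;     /* 4 bytes + 4-byte hole */
	void    *b;     /* 8 bytes */
	uint32_t y;     /* 4 bytes + 4 bytes of tail padding */
};                  /* 32 bytes on LP64 */

/* Same members, with the two 4-byte fields sharing one 8-byte slot. */
struct demo_reordered {
	void    *a;
	void    *b;
	uint32_t x;
	uint32_t y;
};                  /* 24 bytes on LP64 */
```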