git.ipfire.org Git - thirdparty/haproxy.git/log

BUG/MINOR: ssl: suboptimal certificate selection with TLSv1.3 and dual ECDSA/RSA

When using TLSv1.3, the signature algorithms extension is used to chose
the right ECDSA or RSA certificate.

However there was an old test for previous version of TLS (< 1.3) which
was testing if the cipher is compatible with ECDSA when an ECDSA
signature algorithm is used. This test was relying on
SSL_CIPHER_get_auth_nid(cipher) == NID_auth_ecdsa to verify if the
cipher is still good.

Problem is, with TLSv1.3, all ciphersuites are compatible with any
authentication algorithm, but SSL_CIPHER_get_auth_nid(cipher) does not
return NID_auth_ecdsa, but NID_auth_any.

Because of this, with TLSv1.3 when both ECDSA and RSA certificates are
available for a domain, the ECDSA one is not chosen in priority.

This patch also introduces a test on the cipher IDs for the signaling
ciphersuites, because they would always return NID_auth_any, and are not
relevent for this selection.

This patch fixes issue #2300.

Must be backported in all stable versions.

MEDIUM: quic: count quic_conn for global sslconns

Similar to the previous commit which check for maxconn before allocating
a QUIC connection, this patch checks for maxsslconn at the same step.
This is necessary as a QUIC connection cannot run without a SSL context.

This should be backported up to 2.6. It relies on the following patch :
"BUG/MINOR: ssl: use a thread-safe sslconns increment"

MEDIUM: quic: count quic_conn instance for maxconn

Increment actconn and check maxconn limit when a quic_conn is
instantiated. This is necessary because prior to this patch, quic_conn
instances where not counted. Global actconn was only incremented after
the handshake has been completed and the connection structure is
allocated.

The increment is done using increment_actconn() on INITIAL packet
parsing if a new connection is about to be created. If the limit is
reached, the allocation is cancelled and the INITIAL packet is dropped.

The decrement is done under quic_conn_release(). This means that
quic_cc_conn instances are not taken into account. This seems safe
enough because quic_cc_conn are only used for minimal usage.

The counterpart of this change is that maxconn must not be checked a
second time when listener_accept() is done over a QUIC connection. For
this, a new bind_conf flag BC_O_XPRT_MAXCONN is set for listeners when
maxconn is already counted by the lower layer. For the moment, it is
positionned only for QUIC listeners.

Without this patch, haproxy process could suffer from heavy memory/CPU
load if the number of concurrent handshake is high.

This patch is not considered a bug fix per-se. However, it has a major
benefit to protect against too many QUIC handshakes. As such, it should
be backported up to 2.6. For this, it relies on the following patch :
"MINOR: frontend: implement a dedicated actconn increment function"

BUG/MINOR: ssl: use a thread-safe sslconns increment

Each time a new SSL context is allocated, global.sslconns is
incremented. If global.maxsslconn is reached, the allocation is
cancelled.

This procedure was not entirely thread-safe due to the check and
increment operations conducted at different stage. This could lead to
global.maxsslconn slightly exceeded when several threads allocate SSL
context while sslconns is near the limit.

To fix this, use a CAS operation in a do/while loop. This code is
similar to the actconn/maxconn increment for connection.

A new function increment_sslconn() is defined for this operation. For
the moment, only SSL code is using it. However, it is expected that QUIC
will also use it to count QUIC connections as SSL ones.

This should be backported to all stable releases. Note that prior to the
2.6, sslconns was outside of global struct, so this commit should be
slightly adjusted.

MINOR: frontend: implement a dedicated actconn increment function

When a new frontend connection is instantiated, actconn global counter
is incremented. If global maxconn value is reached, the connection is
cancelled. This ensures that system limit are under control.

Prior to this patch, the atomic check/increment operations were done
directly into listener_accept(). Move them in a dedicated function
increment_actconn() in frontend module. This will be useful when QUIC
connections will be counted in actconn counter.

BUG/MINOR: quic: do not consider idle timeout on CLOSING state

When entering closing state, a QUIC connection is maintained during a
certain delay. The principle is to ensure the other peer has received
the CONNECTION_CLOSE frame. In case of packet duplication/reordering,
CONNECTION_CLOSE is reemitted.

QUIC RFC recommends to use at least 3 times the PTO value. However,
prior to this patch, haproxy used instead the max value between 3 times
the PTO and the connection idle timeout. In the default case, idle
timeout is set to 30s which is in most of the times largely superior to
the PTO. This has the downside of keeping the connection in memory for
too long whereas all resources could be released much earlier.

Fix this behavior by using 3 times the PTO on closing or draining state.
This value is limited up to 1s. This ensures that most of connections
are covered by this. If a connection runs with a very high RTT, it must
not impact the whole process and should be released in a reasonable
delay.

This should be backported up to 2.6.

DEBUG: pools: detect that malloc_trim() is in progress

Now when calling ha_panic() with a thread still under malloc_trim(),
we'll set a new tainted flag to easily report it, and the output
trace will report that this condition happened and will suggest to
use no-memory-trimming to avoid it in the future.

DEBUG: lua: add tainted flags for stuck Lua contexts

William suggested that since we can detect the presence of Lua in the
stack, let's combine it with stuck detection to set a new pair of flags
indicating a stuck Lua context and a stuck Lua shared context.

Now, executing an infinite loop in a Lua sample fetch function with
yield disabled crashes with tainted=0xe40 if loaded from a lua-load
statement, or tainted=0x640 from a lua-load-per-thread statement.

In addition, at the end of the panic dump, we can check if Lua was
seen stuck and emit recommendations about lua-load-per-thread and
the choice of dependencies depending on the presence of threads
and/or shared context.

DEBUG: add a tainted flag when ha_panic() is called

This will make it easier to know that the panic function was called,
for the occasional case where the dump crashes and/or the stack is
corrupted and not much exploitable. Now at least it will be sufficient
to check the tainted value to know that someone called ha_panic(), and
it will also be usable to condition extra analysis.

MINOR: server: add helper function to detach server from proxy list

Remove some code duplication by introducing a basic helper function
to detach a server from its parent proxy. It is supported to call
the function even if the server is not yet listed in the proxy list.

If the server is not yet listed in the proxy, the function will do
nothing. In delete_server(), we previously performed some BUG_ON()
to ensure that the detach always succeeded given that we were certain
that the server was in the proxy list because it was retrieved through
get_backend_server().

However this test is superfluous, we can safely assume that the operation
will always succeed if get_backend_server() returned != NULL (we're under
full thread isolation), and if it's not the case, then we have a bigger
API issue anyway..

BUG/MEDIUM: server: "proto" not working for dynamic servers

In 304672320e ("MINOR: server: support keyword proto in 'add server' cli")
improper use of conn_get_best_mux_entry() function was made:

First, server's proxy mode was directly passed as "proto_mode" argument
to conn_get_best_mux_entry(), but this is strictly invalid because while
there is some relationship between proto modes and proxy modes, they
don't use the same storage mechanism and cannot be used interchangeably.

Because of this bug, conn_get_best_mux_entry() would not work at all for
TCP because PR_MODE_TCP equals 0, where PROTO_MODE_TCP normally equals 1.

Then another, less sensitive bug, remains:
as its name and description implies, conn_get_best_mux_entry() will try
its best to return something to the user, only using keyword (mux_proto)
input as an hint to return the most relevant mux within the list of
mux that are compatibles with proto_side and proto_mode values.

This means that even if mux_proto cannot be found or is not available
with current proto_side and proto_mode values, conn_get_best_mux_entry()
will most probably fallback to a more generic mux.

However in cli_parse_add_server(), we directly check the result of
conn_get_best_mux_entry() and consider that it will return NULL if the
provided keyword hint for mux_proto cannot be found. This will result in
the function not raising errors as expected, because most of the times if
the expected proto cannot be found, then we'll silently switch to the
fallback one, despite the user providing an explicit proto.

To fix that, we store the result of conn_get_best_mux_entry() to compare
the returned mux proto name with the one we're expecting to get, as it
is originally performed in cfgparse during initial server keyword parsing.

This patch depends on
- "MINOR: connection: add conn_pr_mode_to_proto_mode() helper func")

It must be backported up to 2.6.

MINOR: connection: add conn_pr_mode_to_proto_mode() helper func

This function allows to safely map proxy mode to corresponding proto_mode

This will allow for easier code maintenance and prevent mixups between
proxy mode and proto mode.

BUG/MEDIUM: server/log: "mode log" after server keyword causes crash

In 9a74a6c ("MAJOR: log: introduce log backends"), a mistake was made:
it was assumed that the proxy mode was already known during server
keyword parsing in parse_server() function, but this is wrong.

Indeed, "mode log" can be declared late in the proxy section. Due to this,
a simple config like this will cause the process to crash:

   |backend test
   |
   |  server name 127.0.0.1:8080
   |  mode log

In order to fix this, we relax some checks in _srv_parse_init() and store
the address protocol from str2sa_range() in server struct, then we set-up
a postparsing function that is to be called after config parsing to
finish the server checks/initialization that depend on the proxy mode
to be known. We achieve this by checking the PR_CAP_LB capability from
the parent proxy to know if we're in such case where the effective proxy
mode is not yet known (it is assumed that other proxies which are implicit
ones don't provide this possibility and thus don't suffer from this
constraint).

Only then, if the capability is not found, we immediately perform the
server checks that depend on the proxy mode, else the check is postponed
and it will automatically be performed during postparsing thanks to the
REGISTER_POST_SERVER_CHECK() hook.

Note that we remove the SRV_PARSE_IN_LOG_BE flag because it was introduced
in the above commit and it is no longer relevant.

No backport needed unless 9a74a6c gets backported.

DEBUG: mux-h2/flags: fix list of h2c flags used by the flags decoder

The two recent commits below each added one flag to h2c but omitted to
update the __APPEND_FLAG macro used by dev/flags so they are not
properly decoded:

3dd963b35 ("BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again")
68d02e5fa ("BUG/MINOR: mux-h2: make up other blocked streams upon removal from list")

This can be backported along with these commits.

MINOR: backend: refactor insertion in avail conns tree

Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion in available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.

BUG/MAJOR: backend: fix idle conn crash under low FD

Since the following commit, idle conns are stored in a list as secondary
storage to retrieve them in usage order :
  5afcb686b93c3811bd859a331efd6a8341a61218
  MAJOR: connection: purge idle conn by last usage

The list usage has been extended wherever connections lookup are done
both on idle and safe trees. This reduced the code size by replacing a
two tree loops by a single list loop.

LIST_ELEM() is used in this context to retrieve the first idle list
element from the server list head. However, macro usage was wrong due to
an extra '&' operator which returns an invalid connection reference.
This will most of the time caused a crash on conn_delete_from_tree() or
affiliated functions.

This bug only occurs if the FD pool is exhausted and some idle
connections are selected to be killed.

It can be reproduced using the following config and h2load command :
$ h2load -t 8 -c 800 -m 10 -n 800 "http://127.0.0.1:21080/?s=10k"

global
maxconn 100

defaults
mode http
timeout connect 20s
timeout client  20s
timeout server  20s

listen li
bind :21080 proto h2
server nginx 127.99.0.1:30080 proto h1

This bug has been introduced by the above commit. Thus no need to
backport this fix.

Note that LIST_ELEM() macro usage was slightly adjusted also in
srv_migrate_conns_to_remove(). The function used toremove_list instead
of idle_list connection list element. This is not a bug as they are
stored in the same union. However, the new code is clearer as it intends
to move connection from the idle_list only into the toremove_list
mt-list.

BUG/MINOR: backend: fix wrong BUG_ON for avail conn

Idle connections are both stored in an idle/safe tree and in an idle
list. The list is used as a secondary storage to be able to retrieve
them by usage order.

If a connection is moved into the available tree, it must not be present
in the idle list. A BUG_ON() was written to check this but was placed at
the wrong code section. Fix this by removing the misplaced one and write
new ones for avail_conns tree insertion and lookup.

The impact of this bug is minor as the misplaced BUG_ON() did not seem
to be triggered.

No need to backport.

MINOR: lua: change tune.lua.log.stderr default from 'on' to 'auto'

After making it configurable in previous commit "MINOR: lua: Add flags
to configure logging behaviour", this patch changes the default value
of tune.lua.log.stderr from 'on' (unconditionally forward LUA logs to
stderr) to 'auto' (only forward LUA logs to stderr if logging via a
standard logger is disabled, or none is configured for the current context)

Since this is a change in behaviour, it shouldn't be backported

MINOR: lua: Add flags to configure logging behaviour

Until now, messages printed from LUA log functions were sent both to
the any logger configured for the current proxy, and additionally to
stderr (in most cases)

This introduces two flags to configure LUA log handling:
- tune.lua.log.loggers to use standard loggers or not
- tune.lua.log.stderr to use stderr, or not, or only conditionally

This addresses github feature request #2316

This can be backported to 2.8 as it doesn't change previous behaviour.

BUG/MINOR: ssl: load correctly @system-ca when ca-base is define

The configuration parser still adds the 'ca-base' directory when loading
the @system-ca, preventing it to be loaded correctly.

This patch fixes the problem by not adding the ca-base when a file
starts by '@'.

Fix issue #2313.

Must be backported as far as 2.6.

DOC: internal: filters: fix reference to entities.pdf

In doc/internals/api/filters.txt was the referece to
doc/internals/entities.pdf which was delteted in the
past.

[RELEASE] Released version 2.9-dev8

Released version 2.9-dev8 with the following main changes :
    - MINOR: ssl: add an explicit error when 'ciphersuites' are not supported
    - BUILD: ssl: enable 'ciphersuites' for WolfSSL
    - BUILD: ssl: add 'ssl_c_r_dn' fetch for WolfSSL
    - BUILD: ssl: add 'secure_memcmp' converter for WolfSSL and awslc
    - BUILD: ssl: enable keylog for awslc
    - CLEANUP: ssl: remove compat functions for openssl < 1.0.0
    - BUILD: ssl: enable keylog for WolfSSL
    - REGTESTS: pki: add a pki for SSL tests
    - REGTESTS: ssl: update common.pem with the new pki
    - REGTESTS: ssl: disable ssl_dh.vtc for WolfSSL
    - REGTESTS: wolfssl: temporarly disable some failing reg-tests
    - CI: ssl: add wolfssl to build-ssl.sh
    - CI: ssl: add git id support for wolfssl download
    - CI: github: add a wolfssl entry to the CI
    - CI: github: update wolfssl to git revision d83f2fa
    - CI: github: add awslc 1.16.0 to the push CI
    - BUG/MINOR: quic: Avoid crashing with unsupported cryptographic algos
    - REORG: quic: cleanup traces definition
    - BUG/MINOR: quic: reject packet with no frame
    - BUG/MEDIUM: mux-quic: fix RESET_STREAM on send-only stream
    - BUG/MINOR: mux-quic: support initial 0 max-stream-data
    - BUG/MINOR: h3: strengthen host/authority header parsing
    - CLEANUP: connection: drop an uneeded leftover cast
    - BUG/MAJOR: connection: make sure to always remove a connection from the tree
    - BUG/MINOR: quic: fix qc.cids access on quic-conn fail alloc
    - BUG/MINOR: quic: fix free on quic-conn fail alloc
    - BUG/MINOR: mux-quic: fix free on qcs-new fail alloc
    - BUG/MEDIUM: quic-conn: free unsent frames on retransmit to prevent crash
    - MEDIUM: tree-wide: logsrv struct becomes logger
    - MEDIUM: log: introduce log target
    - DOC: config: log <address> becomes log <target> in "log" related doc
    - MEDIUM: sink/log: stop relying on AF_UNSPEC for rings
    - MINOR: log: support explicit log target as argument in __do_send_log()
    - MINOR: log: remove the logger dependency in do_send_log()
    - MEDIUM: log/sink: simplify log header handling
    - MEDIUM: sink: inherit from caller fmt in ring_write() when rings didn't set one
    - MINOR: sink: add sink_new_from_srv() function
    - MAJOR: log: introduce log backends
    - MINOR: log/balance: support for the "sticky" lb algorithm
    - MINOR: log/balance: support for the "random" lb algorithm
    - MINOR: lbprm: support for the "none" hash-type function
    - MINOR: lbprm: compute the hash avalanche in gen_hash()
    - MINOR: sample: add sample_process_cnv() function
    - MEDIUM: log/balance: support for the "hash" lb algorithm
    - REGTEST: add a test for log-backend used as a log target
    - MINOR: server: introduce "log-bufsize" kw
    - BUG/MEDIUM: stconn: Report a send activity everytime data were sent
    - BUG/MEDIUM: applet: Report a send activity everytime data were sent
    - BUG/MINOR: mux-h1: Send a 400-bad-request on shutdown before the first request
    - MINOR: support for http-response set-timeout
    - BUG/MINOR: mux-h2: make up other blocked streams upon removal from list
    - DEBUG: pool: store the memprof bin on alloc() and update it on free()
    - BUG/MEDIUM: quic_conn: let the scheduler kill the task when needed
    - CLEANUP: hlua: Remove dead-code on error path in hlua_socket_new()
    - BUG/MEDIUM: mux-h1: do not forget TLR/EOT even when no data is sent
    - BUG/MINOR: htpp-ana/stats: Specify that HTX redirect messages have a C-L header
    - BUG/MEDIUM: mux-h2: Don't report an error on shutr if a shutw is pending
    - MEDIUM: stconn/channel: Move pipes used for the splicing in the SE descriptors
    - MINOR: stconn: Start to introduce mux-to-mux fast-forwarding notion
    - MINOR: stconn: Extend iobuf to handle a buffer in addition to a pipe
    - MINOR: connection: Add new mux callbacks to perform data fast-forwarding
    - MINOR: stconn: Temporarily remove kernel splicing support
    - MINOR: mux-pt: Temporarily remove splicing support
    - MINOR: mux-h1: Temporarily remove splicing support
    - MINOR: connection: Remove mux callbacks about splicing
    - MEDIUM: stconn: Add mux-to-mux fast-forward support
    - MINOR: mux-h1: Use HTX extra field only for responses with known length
    - MEDIUM: mux-h1: Properly handle state transitions of chunked outgoing messages
    - MEDIUM: raw-sock: Specifiy amount of data to send via snd_pipe callback
    - MINOR: mux-h1: Add function to add size of a chunk to an outgoind message
    - MEDIUM: mux-h1: Simplify zero-copy on sending path
    - MEDIUM: mux-h1: Simplify payload formatting based on HTX blocks on sending path
    - MEDIUM: mux-h1: Add fast-forwarding support
    - MINOR: h2: Set the BODYLESS_RESP flag on the HTX start-line if necessary
    - MEDIUM: mux-h2: Add consumer-side fast-forwarding support
    - MEDIUM: channel: don't look at iobuf to report an empty channel
    - MINOR: tree-wide: Only rely on co_data() to check channel emptyness
    - REGTESTS: Reenable HTTP tests about splicing
    - CLEAN: mux-h1: Remove useless __maybe_unused attribute on h1_make_chunk()
    - MEDIUM: mux-pt: Add fast-forwarding support
    - MINOR: global: Add an option to disable the zero-copy forwarding
    - BUILD: mux-h1: Fix build without kernel splicing support
    - REORG: stconn/muxes: Rename init step in fast-forwarding
    - MINOR: dgram: allow to set rcv/sndbuf for dgram sockets as well
    - BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again
    - BUG/MINOR: trace: fix trace parser error reporting
    - BUG/MEDIUM: peers: Be sure to always refresh recconnect timer in sync task
    - BUG/MEDIUM: peers: Fix synchro for huge number of tables
    - MINOR: cfgparse: forbid mixing reverse and standard listeners
    - MINOR: listener: add nbconn kw for reverse connect
    - MINOR: server: convert @reverse to rev@ standard format
    - MINOR: cfgparse: rename "rev@" prefix to "rhttp@"
    - REGTESTS: remove maxconn from rhttp bind line
    - MINOR: listener: forbid most keywords for reverse HTTP bind
    - MINOR: sample: Added support for Arrays in sample_conv_json_query in sample.c
    - MINOR: mux-h2/traces: explicitly show the error/refused stream states
    - MINOR: mux-h2/traces: clarify the "rejected H2 request" event
    - BUG/MINOR: mux-h2: commit the current stream ID even on reject
    - BUG/MINOR: mux-h2: update tracked counters with req cnt/req err

BUG/MINOR: mux-h2: update tracked counters with req cnt/req err

Originally H2 would transfer everything to H1 and parsing errors were
handled there, so that if there was a track-sc rule in effect, the
counters would be updated as well. As we started to add more and more
HTTP-compliance checks at the H2 layer, then switched to HTX, we
progressively lost this ability. It's a bit annoying because it means
we will not maintain accurate error counters for a given source, for
example.

This patch adds the calls to session_inc_http_req_ctr() and
session_inc_http_err_ctr() when needed (i.e. when failing to parse
an HTTP request since all other cases are handled by the stream),
just like mux-h1 does. The same should be done for mux-h3 by the
way.

This can be backported to recent stable versions. It's not exactly a
bug, rather a missing feature in that we had never updated this counter
for H2 till now, but it does make sense to do it especially based on
what the doc says about its usage.

BUG/MINOR: mux-h2: commit the current stream ID even on reject

The H2 spec says that a HEADERS frame turns an idle stream to the open
state, and it may then turn to half-closed(remote) on ES, then to close,
all at once, if we respond with RST (e.g. on error). Due to the fact that
we process a complete frame at once since h2_dec_hdrs() may reassemble
CONTINUATION frames until everything is complete, the state was only
committed after the frame was completley valid (otherwise multiple passes
could result in subsequent frames being rejected as the stream ID would
be equal to the highest one).

However this is not correct because it means that a client may retry on
the same ID as a previously failed one, which technically is forbidden
(for example the client couldn't know which of them a WINDOW_UPDATE or
RST_STREAM frame is for).

In practice, due to the error paths, this would only be possible when
failing to decode HPACK while leaving the HPACK stream intact, thus
when the valid decoded HPACK stream cannot be turned into a valid HTTP
representation, e.g. when the resulting headers are too large for example.
The solution to avoid this consists in committing the stream ID on this
error path as well. h2spec continues to be happy.

Thanks to Annika Wickert and Tim Windelschmidt for reporting this issue.

This fix must be backported to all stable versions.

MINOR: mux-h2/traces: clarify the "rejected H2 request" event

In h2_frt_handle_headers() all failures lead to a generic message saying
"rejected H2 request". It's quite inexpressive while there are a few
distinct tests that are made before jumping there:

  - trailers on closed stream
  - unparsable request
  - refused stream

Let's emit the traces from these call points instead so that we get more
info about what happened. Since these are user-level messages, we take
care of keeping them aligned as much as possible.

For example before it would say:

  [04|h2|1|mux_h2.c:2859] rejected H2 request : h2c=0x7f5d58036fd0(F,FRE)
  [04|h2|5|mux_h2.c:2860] h2c_frt_handle_headers(): leaving on error : h2c=0x7f5d58036fd0(F,FRE) dsi=1 h2s=0x9fdb60(0,CLO)

And now it says:

  [04|h2|1|mux_h2.c:2817] rcvd unparsable H2 request : h2c=0x7f55f8037160(F,FRH) dsi=1 h2s=CLO
  [04|h2|5|mux_h2.c:2875] h2c_frt_handle_headers(): leaving on error : h2c=0x7f55f8037160(F,FRE) dsi=1 h2s=CLO

MINOR: mux-h2/traces: explicitly show the error/refused stream states

Sometimes it's unclear whether a stream is still open or closed when
certain traces are emitted, for example when the stream was refused,
because the reported pointer and ID in fact correspond to the refused
stream. And for closed streams, no pointer/name is printed, leaving
some confusion about the state. This patch makes the situation easier
to analyse by explicitly reporting "h2s=CLO" on closed/error/refused
streams so that we don't waste time comparing pointers and we instantly
know the stream is closed. Now instead of emitting:

[03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=0x9fdb60(0,CLO)

It will emit:

[03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=CLO

MINOR: sample: Added support for Arrays in sample_conv_json_query in sample.c

Method now returns the content of Json Arrays, if it is specified in
Json Path as String. The start and end character is a square bracket. Any
complex object in the array is returned as Json, so that you might get Arrays
of Array or objects. Only recommended for Arrays of simple types (e.g.,
String or int) which will be returned as CSV String. Also updated
documentation and fixed issue with parenthesis and other changes from
comments.

This patch was discussed in issue #2281.

Signed-off-by: William Lallemand <wlallemand@haproxy.com>

MINOR: listener: forbid most keywords for reverse HTTP bind

Reverse HTTP bind is very specific in that in rely on a server to
initiate connection. All connection settings are defined on the server
line and ignored from the bind line.

Before this patch, most of keywords were silently ignored. This could
result in a configuration from doing unexpected things from the user
point of view. To improve this situation, add a new 'rhttp_ok' field in
bind_kw structure. If not set, the keyword is forbidden on a reverse
bind line and will cause a fatal config error.

For the moment, only the following keywords are usable with reverse bind
'id', 'name' and 'nbconn'.

This change is safe as it's already forbidden to mix reverse and
standard addresses on the same bind line.

REGTESTS: remove maxconn from rhttp bind line

The maxconn keyword is not used anymore for reverse HTTP bind. It has
been replaced recently by the new keyword nbconn. As it's default value
is 1, it can be safely removed from the regtest without affecting its
behavior.

MINOR: cfgparse: rename "rev@" prefix to "rhttp@"

'rev@' was used to specify a bind/server used with reverse HTTP
transport. This notation was deemed not explicit enough. Rename it
'rhttp@' instead.

MINOR: server: convert @reverse to rev@ standard format

Remove the recently introduced '@reverse' notation for HTTP reverse
servers. Instead, reuse the 'rev@' prefix already defined for bind
lines.

MINOR: listener: add nbconn kw for reverse connect

Previously, maxconn keyword was reused for a specific usage on reverse
HTTP binds to specify the number of active connect to proceed. To avoid
confusion, introduce a new dedicated keyword 'nbconn' which is specific
to reverse HTTP bind.

This new keyword is forbidden for non-reverse listener. A fatal error is
emitted during config parsing if this rule is not respected. It's safe
because it's also forbidden to mix standard and reverse addresses on the
same bind line.

Internally, nbconn value will be reassigned to 'maxconn' member of
bind_conf structure. This ensures that listener layer will automatically
reenable the preconnect task each time a connection is closed.

MINOR: cfgparse: forbid mixing reverse and standard listeners

Reverse HTTP listeners are very specific and share only a very limited
subset of keywords with other listeners. As such, it is probable
meaningless to mix standard and reverse addresses on the same bind line.
This patch emits a fatal error during configuration parsing if this is
the case.

BUG/MEDIUM: peers: Fix synchro for huge number of tables

The number of updates sent at once was limited to not loop too long to emit
updates when the buffer size is huge or when the number of sync tables is
huge. The limit can be configured and is set to 200 by default. However,
this fix introduced a bug. It is impossible to syncrhonize two peers if the
number of tables is higher than this limit. Thus by default, it is not
possible to sync two peers if there are more than 200 tables to sync.

Technically speacking, a teaching process is finished if we loop on all tables
with no new update messages sent. Because we are limited at each call, the loop
is splitted on several calls. However the restart point for the next loop is
always the last table for which we emitted an update message. Thus with more
tables than the limit, the loop never reachs the end point.

Worse, in conjunction with the bug fixed by "BUG/MEDIUM: peers: Be sure to
always refresh recconnect timer in sync task", it is possible to trigger the
watchdog because the applets may be woken up in loop and leave requesting
more room while its buffer is empty.

To fix the issue, restart conditions for a teaching loop were changed. If
the teach process is interrupted, we now save the restart point, called
stop_local_table. It is the last evaluated table on the previous loop. This
restart point is reset when the teach process is finished.

In additionn, the updates_sent variable in peer_send_msgs() was renamed to
updates to avoid ambiguities. Indeed, the variable is incremented, whether
messages were sent or not.

This patch must be backported as far as 2.6.

BUG/MEDIUM: peers: Be sure to always refresh recconnect timer in sync task

A sync task used to manage reconnect, sessions creation or shutdown and data
synchronization is responsible to refresh reconnect and heartbeat timers for
each remote peers and trigger applets wakeup. These timers are used to
refresh the sync task timeer itself. Thus it is important to take care to
always properly refresh them.

However, when there are some data to push, the reconnect timer is not
checked. It may be expired and not refreshed. In this case, an expired timer
may be used to the sync task, leading to a storm of wakeups. The sync task
is woken up in loop because its timer is in the past, waking up Peer applets
at each time.

To fix the issue, the peer's reconnect timer is now refresh to the default
reconnect timeout, if necessary, when there are some data to push.

This patch must be backported to all stable versions.

BUG/MINOR: trace: fix trace parser error reporting

Since traces were adapted to support being declared in the global section
in 2.7 with commit c11f1cdf4 ("MINOR: trace: split the CLI "trace" parser
in CLI vs statement"), the method used to return the error message was
unreliable. For example an invalid sink name in the global section would
produce:

  [ALERT]    (26685) : config : parsing [test-trace.cfg:51] : 'trace': No such sink
  [ALERT]    (26685) : config : parsing [test-trace.cfg:51] : (null)
  [ALERT]    (26685) : config : Error(s) found in configuration file : test-trace.cfg
  [ALERT]    (26685) : config : Fatal errors found in configuration.

The reason is that the trace is emitted manually using ha_error() in
cfg_parse_trace() and -1 is returned without setting the message, and
the caller also prints the empty message. That's quite awkward given
that the API originally comes from the CLI which does support dynamic
strings and that config keywords do as well.

This commit modifies both cli_parse_trace() and cfg_parse_trace() to
return a dynamically allocated message instead, and adapts the central
function trace_parse_statement() to do the same, replacing a few direct
assignments with strdup() or memprintf(). This way the alert is no
longer emitted by the parser function, it just passes the message to
the caller.

A few of the static messages switching to memprintf() also took this
opportunity to report the faulty word:

  [ALERT]    (26772) : config : parsing [test-trace.cfg:51] : No such trace sink 'stduot'
  [ALERT]    (26772) : config : Error(s) found in configuration file : test-trace.cfg
  [ALERT]    (26772) : config : Fatal errors found in configuration.

This may be backported to 2.8 and 2.7.

BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again

Stefan Behte reported that since commit f279a2f14 ("BUG/MINOR: mux-h2:
refresh the idle_timer when the mux is empty"), the http-request and
http-keep-alive timeouts don't work anymore on H2. Before this patch,
and since 3e448b9b64 ("BUG/MEDIUM: mux-h2: make sure control frames do
not refresh the idle timeout"), they would only be refreshed after stream
frames were sent (HEADERS or DATA) but the patch above that adds more
refresh points broke these so they don't expire anymore as long as
there's some activity.

We cannot just revert the fix since it also addressed an isse by which
sometimes the timeout would trigger too early and provoque truncated
responses. The right approach here is in fact to only use refresh the
idle timer when the mux buffer was flushed from any such stream frames.

In order to achieve this, we're now setting a flag on the connection
whenever we write a stream frame, and we consider that flag when deciding
to refresh the buffer after it's emptied. This way we'll only clear that
flag once the buffer is empty and there were stream data in it, not if
there were no such stream data. In theory it remains possible to leave
the flag on if some control data is appended after the buffer and it's
never cleared, but in practice it's not a problem as a buffer will always
get sent in large blocks when the window opens. Even a large buffer should
be emptied once in a while as control frames will not fill it as much as
data frames could.

Given the patch above was backported as far as 2.6, this patch should
also be backported as far as 2.6.

MINOR: dgram: allow to set rcv/sndbuf for dgram sockets as well

tune.rcvbuf.client and tune.rcvbuf.server are not suitable for shared
dgram sockets because they're per connection so their units are not the
same. However, QUIC's listener and log servers are not connected and
take per-thread or per-process traffic where a socket log buffer might
be too small, causing undesirable packet losses and retransmits in the
case of QUIC. This essentially manifests in listener mode with new
connections taking a lot of time to set up under heavy traffic due to
the small queues causing delays. Let's add a few new settings allowing
to set these shared socket sizes on the frontend and backend side (which
reminds that these are per-front/back and not per client/server hence
not per connection).

REORG: stconn/muxes: Rename init step in fast-forwarding

Instead of speaking of an initialisation stage for each data
fast-forwarding, we now use the negociate term. Thus init_ff/init_fastfwd
functions were renamed nego_ff/nego_fastfwd.

BUILD: mux-h1: Fix build without kernel splicing support

Data fast-forwarding does not build without the kernel splicing support
because counters about splicing don't exist. To make the code more readable,
all code about splicing is disabled if kernel splicing is not supported.

MINOR: global: Add an option to disable the zero-copy forwarding

The zero-copy forwarding or the mux-to-mux forwarding is a way to
fast-forward data without using the channels buffers. Data are transferred
from a mux to the other one. The kernel splicing is an optimization of the
zero-copy forwarding. But it can also use normal buffers (but not channels
ones). This way, it could be possible to fast-forward data with muxes not
supporting the kernel splicing (H2 and H3 muxes) but also with applets.

However, this mode can introduce regressions or bugs in future (just like
the kernel splicing). Thus, It could be usefull to disable this optim. To do
so, in configuration, the global tune settting
'tune.disable-zero-copy-forwarding' may be set in a global section or the
'-dZ' command line parameter may be used to start HAProxy. Of course, this
also disables the kernel splicing.

MEDIUM: mux-pt: Add fast-forwarding support

The PT multiplexer now implements callbacks function to produce and consume
fast-forwarded data. Only splicing is support because the mux-pt does not
use its own buffers.

CLEAN: mux-h1: Remove useless __maybe_unused attribute on h1_make_chunk()

This attribute was added during the dev stage. But it is useless now the
function is used. So, just remove it.

REGTESTS: Reenable HTTP tests about splicing

MINOR: tree-wide: Only rely on co_data() to check channel emptyness

Because channel_is_empty() function does now only check the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.

channel_is_empty() is thus removed.

MEDIUM: channel: don't look at iobuf to report an empty channel

It is important to split channels and I/O buffers. When data are pushed in
an I/O buffer, we consider them as forwarded. The channel never sees
them. Fast-forwarded data are now handled in the SE only.

MEDIUM: mux-h2: Add consumer-side fast-forwarding support

The H2 multiplexer now implements callbacks to consume fast-forwarded
data. It is the most usful case: A H2 client getting data from a H1
server. It is also the easiest case to implement. The producer side is
trickier because of multiplexing. It is not obvious this case would be
improved with data fast-forwarding.

MINOR: h2: Set the BODYLESS_RESP flag on the HTX start-line if necessary

When message headers are parsed and an HTX start-line is created, if we
detect the response must not have any payload, a specific flag must be set
on the HTX start-line. It happens for instance for response to HEAD
requests. This flag is useb by the multiplexers to know response payload, if
any, must be silently skipped.

This was not performed when h2 HEADERS frames were decoded. This HTX flag
was specifically added to fix a bug when the splicing is inuse. Thus the H2
multiplexer was not concerned. Because the mux-to-mux fast-forwarding will
be introduced, it is important handle this flag in the H2 multiplexer too.

MEDIUM: mux-h1: Add fast-forwarding support

The H1 multiplexer now implements callbacks function to produce and consume
fast-forwarded data.

MEDIUM: mux-h1: Simplify payload formatting based on HTX blocks on sending path

Just like for the zero-copy, this patch tries to simplify the code
responsible to format the message payload before sending it. But here, we
take care to simplify the loop on the HTX blocks. The result should be
less errorrpone.

MEDIUM: mux-h1: Simplify zero-copy on sending path

In h1_make_data(), the function responsible to format the message payload
before sending it, the code dealing with zero-copy was slighly simplified
(at least for me :).

There is no real change but there is a better split between messages with a
content-length and cunked messages.

MINOR: mux-h1: Add function to add size of a chunk to an outgoind message

This function should be used to send the chunk size, before appending the
chunk payload. It also takes care to add a CRLF to finish a previous chunk,
if necessary. This function will be used to fix the splicing for re-chunk
responses with an unknown length.

MEDIUM: raw-sock: Specifiy amount of data to send via snd_pipe callback

When data were sent using the kernel splicing, we tried to send all data
with no restriction. Most of time it is valid. However, because the payload
representation may differ between the producer and the consumer, it is
important to be able to specify how must data to send via the splicing.

Of course, for performance reason, it is important to maximize amount of
data send via splicing at each call. However, on edge-cases, this now can be
limited.

MEDIUM: mux-h1: Properly handle state transitions of chunked outgoing messages

On the sending path, there are 3 states for chunked payload in H1:

  * H1_MSG_CHUNK_SIZE: the chunk size must be emitted
  * H1_MSH_CHUNK_CRLF: The end of the chunk must be emitted
  * H1_MSG_DATA: Chunked data must be emitted

However, some shortcuts were used on the sending path to avoid some
transitions. Especially, outgoing messages were never switched in
H1_MSG_CHUNK_SIZE state.

However, it will be necessary to properly handle all transitions on the payload
to implement mux-to-mux forwarding, to be sure to always known when the chunk
size or the end of the chunk must be emitted.

MINOR: mux-h1: Use HTX extra field only for responses with known length

For now, it is not an issue, but it is safer to explicitly ignore HTX extra
field for responses with unknown length. This will be mandatory to future
fixes, to be able to re-chunk responses with an unknown length..

MEDIUM: stconn: Add mux-to-mux fast-forward support

Now the kernel splicing support was removed, we can add mux-to-mux
fast-forward support. Of course, the splicing support will be reintroduced
in the muxes themselves but this will be transparent.

Changes are mainly located into sc_conn_recv() and sc_conn_send().

MINOR: connection: Remove mux callbacks about splicing

The kernel splicing support was totally remove waiting for the mux-to-mux
fast-forward implementation. So corresponding mux callbacks can be removed
now.

MINOR: mux-h1: Temporarily remove splicing support

Because the kernel splicing support was removed from the stconn, it is
useless to keep it in muxes. In this patch, we remove the kernel splicing
support from the H1 multiplexer. It will be replaced by the mux-to-mux data
fast-forwarding.

MINOR: mux-pt: Temporarily remove splicing support

Because the kernel splicing support was removed from the stconn, it is
useless to keep it in muxes. In this patch, we remove the kernel splicing
support from the passthough multiplexer. It will be replaced by the
mux-to-mux data fast-forwarding.

MINOR: stconn: Temporarily remove kernel splicing support

mux-to-mux fast-forwarding will be added. To avoid mix with the splicing and
simplify the commits, the kernel splicing support is removed from the
stconn. CF_KERN_SPLICING flag is removed and the support is no longer tested
in process_stream().

In the stconn part, rcv_pipe() callback function is no longer called.

Reg-tests scripts testing the kernel splicing are temporarly marked as
broken.

MINOR: connection: Add new mux callbacks to perform data fast-forwarding

To perform the mux-to-mux data fast-forwarding, 4 new callbacks were added
into the mux_ops structure. 2 callbacks will be used from the stconn for
fast-forward data. The 2 other callbacks will be used by the endpoint to
request an iobuf to the opposite endpoint.

* fastfwd() callback function is used by a producer to forward data

* resume_fastfwd() callback function is used by a consumer if some data are
   blocked in the iobuf, to resume the data forwarding.

* init_fastfwd() must be used by an endpoint (the producer one), inside the
   fastfwd() callback to request an iobuf to the opposite side (the consumer
   one).

* done_fastfwd() must be used by an endpoint (the producer one) at the end
   of fastfwd() to notify the opposite endpoint (the consumer one) if data
   were forwarded or not.

This API is still under development, so it may evolved. Especially when the
fast-forward will be extended to applets.

2 helper functions were also added into the SE api to wrap init_fastfwd()
and done_fastfwd() callback function of the underlying endpoint.

For now, this API is unsed and not implemented at all in muxes.

MINOR: stconn: Extend iobuf to handle a buffer in addition to a pipe

It is unused for now, but the iobuf structure now owns a pointer to a
buffer. This buffer will be used to perform mux-to-mux fast-forwarding when
splicing is not supported or unusable. This pointer should be filled by an
endpoint to let the opposite one forward data.

Extra fields, in addition to the buffer, are mandatory because the buffer
may already contains some data. the ".offset" field may be used may be used
as the position to start to copy data. Finally, the amount of data copied in
this buffer must be saved in ".data" field.

Some flags are also added to prepare next changes. And helper stconn
fnuctions are updated to also count data in the buffer. For a first
implementation, it is not planned to handle data in the buffer and in the
pipe in same time. But it will be possible to do so.

MINOR: stconn: Start to introduce mux-to-mux fast-forwarding notion

Instead of talking about kernel splicing at stconn/sedesc level, we now try
to talk about mux-to-mux fast-forwarding. To do so, 2 functions were added
to know if there are fast-forwarded data and to retrieve this amount of
data. Of course, for now, there is only data in a pipe.

In addition, some flags were renamed to reflect this notion. Note the
channel's documentation was not updated yet.

MEDIUM: stconn/channel: Move pipes used for the splicing in the SE descriptors

The pipes used to put data when the kernel splicing is in used are moved in
the SE descriptors. For now, it is just a simple remplacement but there is a
major difference with the pipes in the channel. The data are pushed in the
consumer's pipe while it was pushed in the producer's pipe. So it means the
request data are now pushed in the pipe of the backend SE descriptor and
response data are pushed in the pipe of the frontend SE descriptor.

The idea is to hide the pipe from the channel/SC side and to be able to
handle fast-forwading in pipe but also in buffer. To do so, the pipe is
inside a new entity, called iobuf. This entity will be extended.

BUG/MEDIUM: mux-h2: Don't report an error on shutr if a shutw is pending

If a shutw is blocked because the mux is full or busy, we must defer the
shutr. In this case, the H2 stream is not in H2_SS_CLOSED state because the
shutw is also deferred. If the shutr is performed, this will lead to a
error.

Concretly, when the mux is unblocked, a RST_STREAM is sent while in some
cases, an empty DATA frame with ES flag set could be sent.

This patch should be backported to all stable versions.

BUG/MINOR: htpp-ana/stats: Specify that HTX redirect messages have a C-L header

Redirect responses sent during the HTTP analysis have no payload. However
there is still a "Content-Length" header. It is important to set the
corresponding flag on the HTX start-line to be sure to preserve this header
when the reponse is sent to the client. The same is true with the stats
applet, when it returns a redirect responses.

It is especially important because we no ignore in-fly modifications of
"Content-Length" or "Transfer-Encoding" headers without updating the HTX
start-line flags.

This patch may be backported to all stable versions but it is probably
useless because only the 2.9-dev is affected by the bug.

BUG/MEDIUM: mux-h1: do not forget TLR/EOT even when no data is sent

Since commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make code
more readable"), outgoing H1 chunked messages with no data at all get
delayed by 200ms. It is due to the fact that we end processing too early and
we don't have the opportunity to process trailers in this case.

This fix addresses it by verifying if it's required to emit EOT or trailers,
if any, when retruning from h1_make_data()

No backport is needed, this was in 2.9-dev.

CLEANUP: hlua: Remove dead-code on error path in hlua_socket_new()

Since last fixes about the lua cosocket, the appctx is no longer initialized
in hlua_socket_new(). The code to deal with error at this stage can be
removed.

This patch should fix the issue #2308.

BUG/MEDIUM: quic_conn: let the scheduler kill the task when needed

The two timer handlers qc_process_timer() and qc_idle_timer_task() would
inadvertently return NULL when they don't want to be requeued, instead
of just returning the task itself. The effect of returning NULL for the
scheduler is that it considers the task as freed, so it must not touch
it anymore. As such, the TASK_F_RUNNING flag is never removed from these
tasks, and when quic_conn_release() later tries to release these tasks
using task_destroy(), the latter sees the RUNNING flag and just sets
->process to NULL, hoping that the scheduler will kill them on return,
but there's no longer being executed so this never happens and they are
leaked.

Interestingly, this doesn't seem to happen as much when multi-queue is
set to off, but it's likely because the tasks are being replaced and the
first ones have already been woken up and leaked, while the latter might
only trigger on a timeout or timer renewal.

This should address github issue #2310. Thanks to @hpn0t0ad for the
numerous traces that helped understand this sequence.

This must be backported to 2.7 at least, and adapted for 2.6
(qc_idle_timer_task must return t there).

DEBUG: pool: store the memprof bin on alloc() and update it on free()

When looking at "show pools", it's often difficult to know which alloc()
corresponds to which free() since it's not often 1:1. But sometimes we
have all elements available to maintain a link between alloc and free.
Indeed, when the caller is recorded in the allocated area, we can store
the pointer to the just created bin instead of the caller address itself,
since the caller address is already in the memprof bin. By doing so, we
permit the pool_free() call to locate the allocator bin and update its
free count when caller tracing is enabled. This for example allows to
produce outputs like this on "show profiling" and a process started with
-dMcaller:

  1391967  1391968  22805987328  22806003712|  0x59f72f process_stream+0x19f/0x3a7a p_alloc(0) [delta=-16384] [pool=buffer]
  1391936  1391937  22805479424  22805495808|  0x6e1476 task_run_applet+0x426/0xea2 p_alloc(0) [delta=-16384] [pool=buffer]
  1391925  1391925  22805299200  22805299200|  0x58435a main+0xdf07a p_alloc(0) [delta=0] [pool=buffer]
        0  2087930            0  34208645120|  0x59b519 stream_release_buffers+0xf9/0x110 p_free(-16384) [pool=buffer]
   695993   695992  11403149312  11403132928|  0x66018f main+0x1baeaf p_alloc(0) [delta=16384] [pool=buffer]
        0  1391957            0  22805823488|  0x59b47c stream_release_buffers+0x5c/0x110 p_free(-16384) [pool=buffer]
   695968   695970  11402739712  11402772480|  0x587b85 h1_io_cb+0x9a5/0xe7c p_alloc(0) [delta=-32768] [pool=buffer]
        0  1391923            0  22805266432|  0x57f388 main+0xda0a8 p_free(-16384) [pool=buffer]
   695959   695960  11402592256  11402608640|  0x586add main+0xe17fd p_alloc(0) [delta=-16384] [pool=buffer]
          0      695978              0    11402903552|         0x59cc58 stream_free+0x178/0x9ea p_free(-16384) [pool=buffer]
(...)

Here it's quickly visible that all of them got properly released.

BUG/MINOR: mux-h2: make up other blocked streams upon removal from list

An interesting issue was met when testing the mux-to-mux forwarding code.

In order to preserve fairness, in h2_snd_buf() if other streams are waiting
in send_list or fctl_list, the stream that is attempting to send also goes
to its list, and will be woken up by h2_process_mux() or h2_send() when
some space is released. But on rare occasions, there are only a few (or
even a single) streams waiting in this list, and these streams are just
quickly removed because of a timeout or a quick h2_detach() that calls
h2s_destroy(). In this case there's no even to wake up the other waiting
stream in its list, and this will possibly resume processing after some
client WINDOW_UPDATE frames or even new streams, so usually it doesn't
last too long and it not much noticeable, reason why it was left that
long. In addition, measures have shown that in heavy network-bound
benchmark, this exact situation happens on less than 1% of the streams
(reached 4% with mux-mux).

The fix here consists in replacing these LIST_DEL_INIT() calls on
h2s->list with a function call that checks if other streams were queued
to the send_list recently, and if so, which also tries to resume them
by calling h2_resume_each_sending_h2s(). The detection of late additions
is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is set
when a stream is queued due to other streams being present, and which is
cleared when this is function is called.

It is particularly difficult to reproduce this case which is particularly
timing-dependent, but in a constrained environment, a test involving 32
conns of 20 streams each, all downloading a 10 MB object previously
showed a limitation of 17 Gbps with lots of idle CPU time, and now
filled the cable at 25 Gbps.

This should be backported to all versions where it applies.

MINOR: support for http-response set-timeout

Added set-timeout action for http-response. Adapted reg-tests and
documentation.

BUG/MINOR: mux-h1: Send a 400-bad-request on shutdown before the first request

Except if we must silently ignore empty connections by enabling
http-ignore-probes or dontlognull options, when a client connection is
closed before the first request, a 400-bad-request response must be sent
with the corresponding log message. However, that is broken since the commit
fc473a6453 ("MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for
reads").

The bug is subtle. Parsing errors are no longer reported on connection errors
before the first request while it should be.

This patch must be backported where the above commit is (as far as 2.7).

BUG/MEDIUM: applet: Report a send activity everytime data were sent

In the same way than for stream-connectors (see "BUG/MEDIUM: stconn: Report
a send activity everytime data were sent" for details), we now report a send
activity everytime something was consumed by an applet, even if some output
data remains blocked into the channel's buffer.

This patch must be backported to 2.8.

BUG/MEDIUM: stconn: Report a send activity everytime data were sent

When read/write timeouts were refactored in 2.8, we decided to change when a
send activity had to be reported. Before, everytime some data were sent a
send activity were reported. At this time, the channel's wex timer were
updated. During the refactoring, we decided to limit send activity to sends
that ampty te channel's buffer, consuming all outgoing data. Idea behind
this change was to protect haproxy against clients consumming data very
slowly.

However, it is too strict. Some congested muxes but still active can hit the
client or the server timeout. It seems a bit unfair. It is especially
visible with QUIC/H3 but it is probably also possible with H2 if the window
size is small.

The better is to restore the old behavior.

This patch must be backported to 2.8.

MINOR: server: introduce "log-bufsize" kw

"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of implicit ring associated to the server (which
defaults to BUFSIZE).

REGTEST: add a test for log-backend used as a log target

This regtest declares and uses 3 log backends, one of which has TCP syslog
servers declared in it and other ones UDP syslog servers.

Some tests aims at testing log distribution reliability by leveraging the
log-balance hash algorithm with a key extracted from the request URL, and
the dummy vtest syslog servers ensure that messages are sent to the
correct endpoint. Overall this regtest covers essential parts of the log
message distribution and log-balancing logic involved with log backends.

It also leverages the log-forward section to perform the TCP->UDP
translation required to test UDP endpoints since vtest syslog servers
work in UDP mode.

Finally, we have some tests to ensure that the server queuing/dequeuing
and failover (backup) logics work properly.

MEDIUM: log/balance: support for the "hash" lb algorithm

hash lb algorithm can be configured with the "log-balance hash <cnv_list>"
directive. With this algorithm, the user specifies a converter list with
<cnv_list>.

The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.

MINOR: sample: add sample_process_cnv() function

split sample_process() in 2 parts in order to be able to only process
the converter part of a sample expression from an existing input sample
struct passed as parameter.

MINOR: lbprm: compute the hash avalanche in gen_hash()

Instead of systematically computing the avalanche hash right after the
gen_hash() call, do it inside the gen_hash() function directly to ensure
avalanche setting is always considered.

MINOR: lbprm: support for the "none" hash-type function

Allow the use of the "none" hash-type function so that the key resulting
from the sample expression is directly used as the hash.

This can be useful to do the hashing manually using available hashing
converters, or even custom ones, and then inform haproxy that it can
directly rely on the sample expression result which is explictly handled
as an integer in this case.

MINOR: log/balance: support for the "random" lb algorithm

In this patch we add basic support for the random algorithm:

random algorithm picks a random server using the result of the
statistical_prng() function as if it was a hash key to then compute the
related server ID.

There is no support for the <draw> parameter (which is implemented for
tcp/http load-balancing), because we don't have the required metrics to
evaluate server's load in log backends for the moment. Plus it would add
more complexity to the __do_send_log_backend() function so we'll keep it
this way for now but this might be needed in the future.

MINOR: log/balance: support for the "sticky" lb algorithm

sticky algorithm always tries to send log messages to the first server in
the farm. The server will stay in front during queue and dequeue
operations (no other server can steal its place), unless it becomes
unavailable, in which case it will be replaced by another server from
the tree.

MAJOR: log: introduce log backends

Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)

log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:

| log backend@mybackend local0

A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.

MINOR: sink: add sink_new_from_srv() function

This helper function can be used to create a new sink from an existing
server struct (and thus existing proxy as well), in order to spare some
resources when possible.

MEDIUM: sink: inherit from caller fmt in ring_write() when rings didn't set one

implicit rings were automatically forced to the parent logger format, but
this was done upon ring creation.

This is quite restrictive because we might want to choose the desired
format right before generating the log header (ie: when producing the
log message), depending on the logger (log directive) that is
responsible for the log message, and with current logic this is not
possible. (To this day, we still have dedicated implicit ring per log
directive, but this might change)

In ring_write(), we check if the sink->fmt is specified:
- defined: we use it since it is the most precise format
(ie: for named rings)
- undefined: then we fallback to the format from the logger

With this change, implicit rings' format is now set to UNSPEC upon
creation. This is safe because the log header building function
automatically enforces the "raw" format when UNSPEC is set. And since
logger->format also defaults to "raw", no change of default behavior
should be expected.

MEDIUM: log/sink: simplify log header handling

Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.

While at it, some outdated comments were updated as well.

No change in behavior should be expected.

MINOR: log: remove the logger dependency in do_send_log()

do_send_log() now exlusively relies on explicit parameters to remove
logger dependency in low-level log sending chain.

MINOR: log: support explicit log target as argument in __do_send_log()

__do_send_log() now takes an extra target parameter to pass an explicit
log target instead of getting it from logger->target.

This will allow __do_send_log() to be called multiple times within a
logger entry containing multiple log targets.

MEDIUM: sink/log: stop relying on AF_UNSPEC for rings

Since a5b325f92 ("MINOR: protocol: add a real family for existing FDs"),
we don't rely anymore on AF_UNSPEC for buffer rings in do_send_log.

But we kept it as a parsing hint to differentiate between implicit and
named rings during ring buffer postparsing.

However it is still a bit confusing and forces us to systematically rely
on target->addr, even for named buffer rings where it doesn't make much
sense anymore.

Now that target->addr was made a pointer in a recent commit, we can
choose not to initialize it when not needed (i.e.: named rings) and use
this as a hint to distinguish implicit rings during init since they rely
on the addr struct to temporarily store the ring's address until the ring
is actually created during postparsing step.

DOC: config: log <address> becomes log <target> in "log" related doc

This is a follow up of the previous commit to emphasize that "log"
directive allows to provide a log target which may directly be a server
address but may also be a log transport facility such as rings. Thus we
use the term "target" instead of "address" to make it more generic.

MEDIUM: log: introduce log target

log targets were immediately embedded in logger struct (previously
named logsrv) and could not be used outside of this context.

In this patch, we're introducing log_target type with the associated
helper functions so that it becomes possible to declare and use log
targets outside of loggers scope.

MEDIUM: tree-wide: logsrv struct becomes logger

When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.

But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.

Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.

So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.

So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).

This is internal rewording, so no functional change should be expected
on user-side.

BUG/MEDIUM: quic-conn: free unsent frames on retransmit to prevent crash

Since the following patch :
  commit 33c49cec987c1dcd42d216c6d075fb8260058b16
  MINOR: quic: Make qc_dgrams_retransmit() return a status.
retransmission process is interrupted as soon as a fatal send error has
been encounted. However, this may leave frames in local list. This cause
several issues : a memory leak and a potential crash.

The crash happens because leaked frames are duplicated of an origin
frame via qc_dup_pkt_frms(). If an ACK arrives later for the origin
frame, all duplicated frames are also freed. During qc_frm_free(),
LIST_DEL_INIT() operation is invalid as it still references the local
list used inside qc_dgrams_retransmit().

This bug was reproduced using the following injection from another
machine :
  $ h2load --npn-list h3 -t 8 -c 10000 -m 1 -n 2000000000 \
      https://<host>:<port>/?s=4m

Haproxy was compiled using ASAN. The crash resulted in the following
trace :
==332748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff82bf9d78 at pc 0x556facd3b95a bp 0x7fff82bf8b20 sp 0x7fff82bf8b10
WRITE of size 8 at 0x7fff82bf9d78 thread T0
    #0 0x556facd3b959 in qc_frm_free include/haproxy/quic_frame.h:273
    #1 0x556facd59501 in qc_release_frm src/quic_conn.c:1724
    #2 0x556facd5a07f in quic_stream_try_to_consume src/quic_conn.c:1803
    #3 0x556facd5abe9 in qc_treat_acked_tx_frm src/quic_conn.c:1866
    #4 0x556facd5b3d8 in qc_ackrng_pkts src/quic_conn.c:1928
    #5 0x556facd60187 in qc_parse_ack_frm src/quic_conn.c:2354
    #6 0x556facd693a1 in qc_parse_pkt_frms src/quic_conn.c:3203
    #7 0x556facd7531a in qc_treat_rx_pkts src/quic_conn.c:4606
    #8 0x556facd7a528 in quic_conn_app_io_cb src/quic_conn.c:5059
    #9 0x556fad3284be in run_tasks_from_lists src/task.c:596
    #10 0x556fad32a3fa in process_runnable_tasks src/task.c:876
    #11 0x556fad24a676 in run_poll_loop src/haproxy.c:2968
    #12 0x556fad24b510 in run_thread_poll_loop src/haproxy.c:3167
    #13 0x556fad24e7ff in main src/haproxy.c:3857
    #14 0x7fae30ddd0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #15 0x556facc9375d in _start (/opt/haproxy-quic-2.8/haproxy+0x1ea75d)

Address 0x7fff82bf9d78 is located in stack of thread T0 at offset 40 in frame
    #0 0x556facd74ede in qc_treat_rx_pkts src/quic_conn.c:4580

This must be backported up to 2.7.

BUG/MINOR: mux-quic: fix free on qcs-new fail alloc

qcs_new() allocates several elements in intermediary steps. All elements
must first be properly initialized to be able to free qcs instance in
case of an intermediary failure.

Previously, qc_stream_desc allocation was done in the middle of
qcs_new() before some elements initializations. In case this fails, a
crash can happened as some elements are left uninitialized.

To fix this, move qc_stream_desc allocation at the end of qcs_new().
This ensures that all qcs elements are initialized first.

This should be backported up to 2.6.

BUG/MINOR: quic: fix free on quic-conn fail alloc

qc_new_conn() allocates several elements in intermediary steps. If one
of the fails, a global free is done on the quic_conn and its elements.
This requires that most elements are first initialized to NULL or
equivalent to ensure freeing operation is done only on proper values.

Once of this element is qc.tx.cc_buf_area. It was initialized too late
which could caused crashes. This is introduced by
9f7cfb0a56352188854bdaef9617ca836c2a30c9
MEDIUM: quic: Allow the quic_conn memory to be asap released.

No need to backport.

BUG/MINOR: quic: fix qc.cids access on quic-conn fail alloc

CIDs tree is now allocated dynamically since the following commit :
276697438d50456f92487c990f20c4d726dfdb96
MINOR: quic: Use a pool for the connection ID tree.

This can caused a crash if qc_new_conn() is interrupted due to an
intermediary failed allocation. When freeing all connection members,
free_quic_conn_cids() is used. However, this function does not support a
NULL cids.

To fix this, simply check that cids is NULL during free_quic_conn_cids()
prologue.

This bug was reproduced using -dMfail.

No need to backport.

BUG/MAJOR: connection: make sure to always remove a connection from the tree

Since commit 5afcb686b ("MAJOR: connection: purge idle conn by last usage")
in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag()
in 2.8 by commit 3a7b539b1 ("BUG/MEDIUM: connection: Preserve flags when a
conn is removed from an idle list") becomes misleading. Indeed, now both
toremove_list and idle_list are shared by a union since the presence in
these lists is mutually exclusive. However, in conn_get_idle_flag() we
check for the presence in the toremove_list to decide whether or not to
delete the connection from the tree. This test now fails because instead
it sees the presence in the idle or safe list via the union, and concludes
the element must not be removed. Thus the element remains in the tree and
can be found later after the connection is released, causing crashes that
Tristan reported in issue #2292.

The following config is sufficient to reproduce it with 2 threads:

   defaults
        mode http
        timeout client 5s
        timeout server 5s
        timeout connect 1s

   listen front
        bind :8001
        server next 127.0.0.1:8002

   frontend next
        bind :8002
        timeout http-keep-alive 1
        http-request redirect location /

Sending traffic with a few concurrent connections and some short timeouts
suffices to instantly crash it after ~10k reqs:

   $ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/

With Amaury we analyzed the conditions in which the function is called
in order to figure a better condition for the test and concluded that
->toremove_list is never filled there so we can safely remove that part
from the test and just move the flag retrieval back to what it was prior
to the 2.8 patch above. Note that the patch is not reverted though, as
the parts that would drop the unexpected flags removal are unchanged.

This patch must NOT be backported. The code in 2.8 works correctly, it's
only the change in 2.9 that makes it misbehave.

CLEANUP: connection: drop an uneeded leftover cast

In conn_delete_from_tree() there remains a cast of the toremove_list
to struct list while the introduction of the union precisely was to
avoid this cast. It's a leftover from the first version of patch
5afcb686b ("MAJOR: connection: purge idle conn by last usage") merged
into in 2.9-dev4, let's fix that.

No backport is needed.

BUG/MINOR: h3: strengthen host/authority header parsing

HTTP/3 specification has several requirement when parsing authority or
host header inside a request. However, it was until then only partially
implemented.

This commit fixes this by ensuring the following :
* reject an empty authority/host header
* reject a host header if an authority was found with a different value
* no authority neither host header present

This must be backported up to 2.6.