BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
SSL_get_ciphers() in AWS-LC seems to lack the TLSv1.3 ciphersuites,
which breaks the ECDSA key selection when doing TLSv1.3.
An issue was opened: https://github.com/aws/aws-lc/issues/1638
Indeed, in ssl_sock_switchctx_cbk(), the sigalgs are used to determine
whether ECDSA is doable, then the function compares the list of ciphers in
the clienthello with the list of configured ciphers. Since the TLSv1.3
ciphersuites are missing from the configured list returned by AWS-LC, they
are skipped, so ECDSA is never selected for TLSv1.3.
The fix solves the issue by never skipping the TLSv1.3 ciphersuites,
even if they are not in SSL_get_ciphers().
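A minimal sketch of the idea behind the fix, assuming ciphers are handled as 16-bit IANA identifiers: TLSv1.3 suites (the 0x13xx range) are always kept, while other suites are kept only if they appear in the configured list. The helper names (is_tls13_suite(), in_configured(), count_usable_suites()) are hypothetical; this is not the actual ssl_sock_switchctx_cbk() code.

```c
#include <stddef.h>
#include <stdint.h>

/* TLSv1.3 cipher suites are registered in the 0x13xx range
 * (0x1301 TLS_AES_128_GCM_SHA256 ... 0x1305 TLS_AES_128_CCM_8_SHA256). */
static int is_tls13_suite(uint16_t id)
{
	return (id >> 8) == 0x13;
}

/* Hypothetical helper: returns 1 if <id> is among the <n> configured suites. */
static int in_configured(uint16_t id, const uint16_t *conf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (conf[i] == id)
			return 1;
	return 0;
}

/* Counts the ClientHello suites that remain usable. TLSv1.3 suites are
 * never skipped, even when the library's configured list (for example
 * SSL_get_ciphers() on AWS-LC) does not report them. */
size_t count_usable_suites(const uint16_t *hello, size_t nhello,
                           const uint16_t *conf, size_t nconf)
{
	size_t usable = 0;

	for (size_t i = 0; i < nhello; i++) {
		if (is_tls13_suite(hello[i]) || in_configured(hello[i], conf, nconf))
			usable++;
	}
	return usable;
}
```

With a ClientHello offering 0x1301, 0x1302 and 0xc02b (TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256), the two TLSv1.3 suites stay usable even if the configured list only reports 0xc02b.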
REGTESTS: ssl: fix some regtests 'feature cmd' start condition
Since patch fde517b ("REGTESTS: wolfssl: temporarly disable some failing
reg-tests") some 'feature cmd' lines have an extra quotation mark, so
they were disabled in every case.
DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
hlua burst timeout was introduced in 58e36e5b1 ("MEDIUM: hlua: introduce
tune.lua.burst-timeout").
It is a safety measure that allows detecting when too much time is spent
on a single Lua execution (between 2 interruptions/yields), meaning that
the current thread is not able to perform other tasks. Such a scenario
should be avoided because it will cause thread contention which may have
negative performance impact and could cause the watchdog to trigger. When
the burst timeout is exceeded, the current Lua execution is aborted and a
timeout error is reported to the user.
Unfortunately, the same error is currently being reported for cumulative
(AKA execution) timeout and for burst timeout, which may be confusing to
the user.
Indeed, the "execution timeout" error historically results from the current
hlua context exceeding the total (cumulative) time it's allowed to run.
It is set per lua context using the dedicated tunables:
- tune.lua.session-timeout
- tune.lua.task-timeout
- tune.lua.service-timeout
We've already faced a user report where the user was able to trigger the
burst timeout and got the "Lua task: execution timeout." error while no
cumulative timeout was set. The error was thus confusing, because it was
in fact the burst timeout that caused it, due to the use of a CPU-intensive
call from within the task without sufficient manual "yield" points around
the CPU-intensive call to ensure it runs on a dedicated scheduler cycle.
In this patch we make it so burst timeout related errors are reported as
"burst timeout" errors instead of "execution timeout" errors (which
in fact became the generic timeout errors catchall with 58e36e5b1).
To do this, hlua_timer_check() now returns a different value depending on
whether the exceeded timeout is the burst one or the cumulative one, which
allows us to return either HLUA_E_ETMOUT or HLUA_E_BTMOUT in hlua_ctx_resume().
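A sketch of the distinction, assuming a simplified timer with a burst limit and a cumulative limit (0 meaning "unset"). The enum values, struct fields and function names below are hypothetical stand-ins for hlua_timer_check(), HLUA_E_BTMOUT and HLUA_E_ETMOUT; the real return values and checks may differ.

```c
#include <stdint.h>

/* Hypothetical result codes for a timer check. */
enum timer_res {
	TIMER_OK = 0,
	TIMER_BURST_EXPIRED,  /* a single run (between yields) took too long */
	TIMER_EXEC_EXPIRED,   /* cumulative execution time exceeded */
};

/* Hypothetical error codes standing in for HLUA_E_BTMOUT / HLUA_E_ETMOUT. */
enum lua_err { E_OK = 0, E_BTMOUT, E_ETMOUT };

struct lua_timer {
	uint32_t burst_ms;   /* time spent since the last yield */
	uint32_t total_ms;   /* cumulative time for this context */
	uint32_t burst_max;  /* 0 = no burst limit */
	uint32_t total_max;  /* 0 = no cumulative limit */
};

/* Reports which timeout, if any, was exceeded. */
enum timer_res lua_timer_check(const struct lua_timer *t)
{
	if (t->burst_max && t->burst_ms > t->burst_max)
		return TIMER_BURST_EXPIRED;
	if (t->total_max && t->total_ms > t->total_max)
		return TIMER_EXEC_EXPIRED;
	return TIMER_OK;
}

/* Maps the check result to a distinct error so the user can tell
 * which timeout fired. */
enum lua_err timer_to_err(enum timer_res r)
{
	switch (r) {
	case TIMER_BURST_EXPIRED: return E_BTMOUT;
	case TIMER_EXEC_EXPIRED:  return E_ETMOUT;
	default:                  return E_OK;
	}
}
```

In the reported scenario (no cumulative timeout set, burst limit exceeded), this mapping yields the burst error rather than the execution timeout error.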
It should improve the situation described in GH #2356 and may possibly be
backported with 58e36e5b1 to improve error reporting if it applies without
resistance.
In 12d08cf912 ("BUG/MEDIUM: log: don't ignore disabled node's options"),
while trying to restore historical node option inheritance behavior, I
broke the '+bin' logformat node option recently introduced in b7c3d8c87c
("MINOR: log: add +bin logformat node option").
Indeed, because of 12d08cf912, LOG_OPT_BIN is not set anymore on
individual nodes even if it was set globally, making the feature unusable.
('+bin' is also used for binary cbor encoding)
What I should have done instead is include LOG_OPT_BIN in the options
inherited from global ones. This is what's being done in this commit.
Misleading comment was adjusted.
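A sketch of option inheritance through a mask, assuming node options are a bit field and that a node picks up only the inherited subset of the global options. All OPT_* values and the masks are hypothetical; the real LOG_OPT_* definitions are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical option bits. */
#define OPT_QUOTE   0x0001
#define OPT_ENCODE  0x0004
#define OPT_BIN     0x0008  /* stands in for LOG_OPT_BIN */

/* Inherited masks before and after the fix (illustrative only). */
#define INHERITED_BEFORE (OPT_ENCODE)            /* LOG_OPT_BIN missing */
#define INHERITED_AFTER  (OPT_ENCODE | OPT_BIN)  /* LOG_OPT_BIN inherited */

/* A node keeps its own options and picks up the inherited global ones. */
uint32_t node_effective_opts(uint32_t node_opts, uint32_t global_opts,
                             uint32_t inherited_mask)
{
	return node_opts | (global_opts & inherited_mask);
}
```

With '+bin' set globally, a node only ends up with the binary option when the bit is part of the inherited mask.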
Released version 3.1-dev1 with the following main changes :
- REGTESTS: Remove REQUIRE_VERSION=2.1 from all tests
- REGTESTS: Remove REQUIRE_VERSION=2.2 from all tests
- CI: use "--no-install-recommends" for apt-get
- CI: switch to lua 5.4
- CI: use USE_PCRE2 instead of USE_PCRE
- DOC: replace the README by a markdown version
- CI: VTest: accelerate package install a bit
- ADMIN: acme.sh: remove the old acme.sh code
- BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
- BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
- BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
- DOC: configuration: add an example for keywords from crt-store
- CI: speedup apt package install
- DOC: add the FreeBSD status badge to README.md
- DOC: change the link to the FreeBSD CI in README.md
- MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
- BUG/MINOR: hlua: use CertCache.set() from various hlua contexts
- CLEANUP: hlua: fix CertCache class comment
- CI: FreeBSD: upgrade image, packages
- BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
- MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
- BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
- BUG/MINOR: quic: prevent crash on qc_kill_conn()
- CLEANUP: hlua: use hlua_pusherror() where relevant
- BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
- BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage
- BUG/MINOR: hlua: prevent LJMP in hlua_traceback()
- CLEANUP: hlua: get rid of hlua_traceback() security checks
- BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
- CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
- BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
- MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
- BUG/MEDIUM: ssl: wrong priority whem limiting ECDSA ciphers in ECDSA+RSA configuration
- BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
- BUG/MINOR: quic: fix computed length of emitted STREAM frames
- BUG/MINOR: quic: ensure Tx buf is always purged
- BUG/MEDIUM: stconn/mux-h1: Fix suspect change causing timeouts
- BUG/MAJOR: mux-h1: Properly copy chunked input data during zero-copy nego
- BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
- DOC: install: remove boringssl from the list of supported libraries
- MINOR: log: fix "http-send-name-header" ignore warning message
- BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
- BUG/MINOR: proxy: fix log_tag leak on deinit()
- BUG/MINOR: proxy: fix email-alert leak on deinit()
- BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
- BUG/MINOR: proxy: fix dyncookie_key leak on deinit()
- BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
- BUG/MINOR: proxy: fix header_unique_id leak on deinit()
- MINOR: proxy: add proxy_free_common() helper function
- BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
- MINOR: log: change wording in lf_expr_postcheck() error message
- BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
- CLEANUP: log/proxy: fix comment in proxy_free_common()
- DOC: config: move "hash-key" from proxy to server options
- DOC: config: add missing section hint for "guid" proxy keyword
- DOC: config: add missing context hint for new server and proxy keywords
- BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
- DOC: internals: add a documentation about the master worker
- BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
- BUG/MINOR: quic: fix padding of INITIAL packets
- OPTIM: quic: fill whole Tx buffer if needed
- MINOR: quic: refactor qc_build_pkt() error handling
- MINOR: quic: use global datagram headlen definition
- MINOR: quic: refactor qc_prep_pkts() loop
- DOC/MINOR: management: add missed -dR and -dv options
- DOC/MINOR: management: add -dZ option
- DOC: management: rename show stats domain cli "dns" to "resolvers"
- REORG: log: reorder send log helpers by dependency order
- MINOR: session: expose session_embryonic_build_legacy_err() function
- MEDIUM: log/session: handle embryonic session log within sess_log()
- MINOR: log: provide sending log context to process_send_log() when available
- MINOR: log: add log_orig_to_str() function
- MINOR: log: provide log origin in logformat expressions using '%OG'
- CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
- MINOR: log/backend: always free parsing hints in resolve_logger()
- MINOR: log: make resolve_logger() static
- MINOR: log: provide proxy context to resolve_logger()
- MINOR: log: add __send_log_set_metadata_sd helper
- MINOR: log: add logger flags
- MINOR: log: add log-profile parsing logic
- MINOR: log: add log profile buildlines
- MEDIUM: log: handle log-profile in process_send_log()
- DOC: config: add documentation for log profiles
- REGTESTS: log: add a test for log-profile
- MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h
- REORG: ssl: move the SNI selection code in ssl_clienthello.c
- BUILD: ssl: fix build with wolfSSL
- CI: github: upgrade aws-lc to 1.29.0
- Revert "CI: github: upgrade aws-lc to 1.29.0"
- MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
- BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
- MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
- CI: github: upgrade aws-lc to 1.29.0
- DOC: INSTALL: minimum AWS-LC version is v1.22.0
- CI: github: do the AWS-LC weekly build with ERR=1
The weekly CI that tries new versions of AWS-LC was not building with
ERR=1, which made us think that everything was fine while there were in
fact new warnings that we missed.
Add ERR=1 to the build so the CI will fail on any new warning.
MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
Some libraries are ignoring SSL_CTX_set_tmp_dh_callback(), but disabling
the 'ssl.default-dh-param' keyword when the function is not supported would
result in an error instead of silently continuing. This patch emits a
warning when the keyword is not supported instead of a loading failure.
BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
AWS-LC has a lot of functions that do nothing, which are now
deprecated and emit warnings.
This patch disables the following useless functions that emit a warning:
SSL_CTX_get_security_level(), SSL_CTX_set_tmp_dh_callback(),
ERR_load_SSL_strings(), RAND_keep_random_devices_open()
MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
AWS-LC does not support the SSL_CTX_set_client_hello_cb() function from
OpenSSL, which allows analyzing the ciphers and signature algorithms of the
ClientHello. However, it supports SSL_CTX_set_select_certificate_cb(),
which allows the same thing but is the implementation coming from the
BoringSSL side.
This patch uses the SSL_CTX_set_select_certificate_cb() as well as the
SSL_early_callback_ctx_extension_get() function to get the signature
algorithms.
This was successfully tested with openssl s_client as well as
testssl.sh.
This should allow enabling more reg-tests that depend on certificate
selection.
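Once SSL_early_callback_ctx_extension_get() has returned the raw signature_algorithms extension, deciding whether ECDSA is doable is a matter of scanning the list. The sketch below only parses the bytes (2-byte list length followed by 2-byte SignatureScheme values, per RFC 8446 section 4.2.3). The low-byte heuristic and the function name are assumptions for illustration, not HAProxy's exact check.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the raw signature_algorithms extension body <ext> of
 * <len> bytes advertises an ECDSA scheme, 0 otherwise (or if malformed).
 * Simplified heuristic: a scheme is treated as ECDSA when its low byte is
 * 0x03 (the TLS 1.2 "ecdsa" signature value), which covers 0x0203, 0x0403,
 * 0x0503 and 0x0603; the 0x08xx range (RSA-PSS, EdDSA, ...) is excluded. */
int sigalgs_have_ecdsa(const uint8_t *ext, size_t len)
{
	size_t list_len;

	if (len < 2)
		return 0;
	list_len = ((size_t)ext[0] << 8) | ext[1];
	if (list_len + 2 > len || (list_len & 1))
		return 0; /* truncated or odd-sized list */

	for (size_t i = 2; i + 1 < 2 + list_len; i += 2) {
		uint8_t hi = ext[i], lo = ext[i + 1];

		if (lo == 0x03 && hi != 0x08)
			return 1;
	}
	return 0;
}
```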
REORG: ssl: move the SNI selection code in ssl_clienthello.c
Move the code which is used to select the final certificate with the
clienthello callback to ssl_clienthello.c. ssl_sock_client_sni_pool needs
to be exposed outside ssl_sock.c.
Try to cover some common use-cases for "log-profile" feature. The tests
mainly focus on log-profile section declaration, and testing the behavior
of logformat / log-tag overriding capabilities.
For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling. Indeed, for now we rely on
"option logasap" and proxy log-format string content "hacks" to force
the log emission at some specific steps, thus more tests should be added
over time, when new mechanisms allowing the emission of logs at
expected processing steps will be added, or if new keywords are added to
the log-profile section.
Now that log-profile parsing logic has been implemented in "MINOR: log:
add log-profile parsing logic" and is actually effective since "MEDIUM:
log: handle log-profile in process_send_log()", let's document the feature
and add some examples.
Log-profile section is declared like this:
  log-profile myprof
    log-tag "custom-tag"
    on error format "%ci: error"
    on any format "(custom httplog) ${HAPROXY_HTTP_LOG_FMT}" sd "[exampleSDID@1234 step=\"accept\" id=\"%ID\"]"
(check out the documentation for the full list of options, some options
are only relevant under specific contexts)
And used this way (from usual "log" directive lines):
  global
    log stdout format rfc5424 profile myprof local0
--------------
For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling, but it should gain more traction over
time as the feature evolves and new mechanisms allowing the emission
of logs at expected processing steps will be added.
MEDIUM: log: handle log-profile in process_send_log()
In the previous commit we implemented log-profile parsing logic. Now let's
actually make use of available log-profile information from logger struct
to decide whether we need to rebuild the logline under process_send_log()
according to log profile settings. Nothing is done if the logger didn't
specify a log-profile.
Now that we have log-profile parsing done, let's prepare for runtime
log-profile handling by adding the necessary string buffer required to
re-build log strings using sess_build_logline() on the fly without
altering regular loglines content.
Indeed, since a different log-profile may (or may not) be specified for
each logger, we must keep the original string and only rebuild a custom
one when required for the current logger (according to the selected log-
profile).
This patch implements prerequisite log-profile struct and parser logic.
It has no effect during runtime for now.
Logformat expressions provided in log-profile "steps" are postchecked
during postparsing for each proxy "log" directive that makes use of a
given profile (this ensures that the logformat expressions used in the
profile are compatible with the proxies using them).
The logger struct may benefit from having a "flags" member to set or
remove different logger states. For that, we reuse an existing 4-byte
hole in the logger struct to store a 2-byte flags integer, leaving the
struct with a 2-byte hole now.
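The layout argument can be illustrated with two hypothetical structs, assuming a typical LP64 ABI (4-byte int, 8-byte pointers aligned on 8 bytes). These are not the real logger struct members; they only show why adding a 2-byte field into an existing 4-byte hole leaves the struct size unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 4-byte int followed by a 4-byte hole
 * so that <target> lands on an 8-byte boundary. */
struct logger_before {
	int type;        /* offset 0, then 4 bytes of padding */
	void *target;    /* offset 8 */
};

/* Hypothetical "after" layout: <flags> fills 2 bytes of the former hole,
 * a 2-byte hole remains before <target>. */
struct logger_after {
	int type;        /* offset 0 */
	uint16_t flags;  /* offset 4 */
	/* 2-byte hole */
	void *target;    /* offset 8 */
};
```

On such an ABI both structs are 16 bytes; on a 32-bit ABI the result differs, which is why the comparison is tied to LP64 here.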
Extract sd metadata assignment in __send_log() to make an inline helper
function out of it in order to be able to use it from other functions if
needed.
MINOR: log: provide proxy context to resolve_logger()
Prerequisite work for log-profiles: we need to know under which proxy
context the logger is being used. When the info is not available (i.e.
global section or log-forward section), <px> is set to NULL.
MINOR: log/backend: always free parsing hints in resolve_logger()
Since resolve_logger() always resolves logger target (even when error
occurs), we must take care of freeing parsing hints because free_logger()
won't try to do it if the RESOLVED flag is set on the target.
This isn't considered as a bug because resolve_logger(), being a
postparsing check, will make haproxy immediately exit upon fatal error
in haproxy.c, but it's better to ensure that everything will be properly
freed if we decide to perform a clean exit upon postparsing checks error
in the future.
CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
It is no longer relevant to say that <logger> is used for implicit
settings. In fact the function resolves <logger>, but currently
mainly focuses on the logger's target. However we could extend the
function to perform additional work on the logger itself in the future.
Let's adjust the comment to prevent any confusion.
MINOR: log: provide log origin in logformat expressions using '%OG'
'%OG' logformat alias may be used to report the log origin (when/where)
that triggered log generation using sess_build_logline().
Possible values are:
- "sess_error": log was generated during session error handling
- "sess_killed": log was generated during session abortion (killed
embryonic session)
- "txn_accept": log was generated right after frontend conn was accepted
- "txn_request": log was generated after client request was received
- "txn_connect": log was generated after backend connection establishment
- "txn_response": log was generated during server response handling
- "txn_close": log was generated at the final txn step, before closing
- "unspec": unknown or not specified
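A table-based sketch of turning a log origin into the strings listed above, assuming the origin is carried as an enum. The enum names, log_orig_name() and log_orig_parse() are hypothetical and do not claim to match the real log_orig_to_str() implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical origins mirroring the list above. */
enum log_orig {
	LOG_ORIG_UNSPEC = 0,
	LOG_ORIG_SESS_ERROR,
	LOG_ORIG_SESS_KILL,
	LOG_ORIG_TXN_ACCEPT,
	LOG_ORIG_TXN_REQUEST,
	LOG_ORIG_TXN_CONNECT,
	LOG_ORIG_TXN_RESPONSE,
	LOG_ORIG_TXN_CLOSE,
	LOG_ORIG_COUNT,       /* number of entries, not a real origin */
};

static const char *const log_orig_names[LOG_ORIG_COUNT] = {
	[LOG_ORIG_UNSPEC]       = "unspec",
	[LOG_ORIG_SESS_ERROR]   = "sess_error",
	[LOG_ORIG_SESS_KILL]    = "sess_killed",
	[LOG_ORIG_TXN_ACCEPT]   = "txn_accept",
	[LOG_ORIG_TXN_REQUEST]  = "txn_request",
	[LOG_ORIG_TXN_CONNECT]  = "txn_connect",
	[LOG_ORIG_TXN_RESPONSE] = "txn_response",
	[LOG_ORIG_TXN_CLOSE]    = "txn_close",
};

/* Returns the '%OG' string for <orig>; out-of-range values map to "unspec". */
const char *log_orig_name(enum log_orig orig)
{
	if ((unsigned)orig >= LOG_ORIG_COUNT)
		return "unspec";
	return log_orig_names[orig];
}

/* Reverse lookup, e.g. to check '%OG' output in tests. */
enum log_orig log_orig_parse(const char *name)
{
	for (size_t i = 0; i < LOG_ORIG_COUNT; i++)
		if (strcmp(log_orig_names[i], name) == 0)
			return (enum log_orig)i;
	return LOG_ORIG_UNSPEC;
}
```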
MINOR: log: provide sending log context to process_send_log() when available
This is another prerequisite work in preparation for log-profiles: in this
patch we make process_send_log() aware of the log origin, primarily aiming
for sess and txn logging steps such as error, accept, connect, close, as
well as relevant sess and stream pointers.
MEDIUM: log/session: handle embryonic session log within sess_log()
Move the embryonic session logging logic down to sess_log() in preparation
for log-profiles because then log preferences will be set per logger and
not per proxy. Indeed, as each logger may come with its own log-profile
that possibly overrides proxy logformat preferences, the check will need
to be performed at a central place by lower sending functions.
To ensure the change doesn't break existing behavior, a dedicated
sess_log_embryonic() wrapper was added and is exclusively used by
session_kill_embryonic() to indicate that a special logging logic must
be performed under sess_log().
Also, thanks to this change, log-format-sd will now be taken into account
for legacy embryonic session logging.
MINOR: session: expose session_embryonic_build_legacy_err() function
Rename session_build_err_string() to session_embryonic_build_legacy_err()
and add new <out> buffer argument to the prototype. <out> will be used as
destination for the generated string instead of implicitly relying on the
trash buffer. Finally, expose the new function through the header file so
that it becomes usable from any source file.
The function is expected to be called with a session originating from
a connection and should not be used for applets.
DOC: management: rename show stats domain cli "dns" to "resolvers"
In commit f8642ee82 ("MEDIUM: resolvers: rename dns extra counters to
resolvers extra counters"), we renamed "dns" counters to "resolvers", but
we forgot to update the documentation accordingly.
Amaury Denoyelle [Thu, 30 May 2024 12:53:06 +0000 (14:53 +0200)]
MINOR: quic: refactor qc_prep_pkts() loop
qc_prep_pkts() is built around a double loop iteration. First, it
iterates over every QEL instance registered for sending. The inner loop is
used to repeatedly call qc_build_pkt() with a QEL instance. If the QEL
instance has no more data to send, the next QEL entry is selected. It
can also be interrupted earlier if there is not enough room left in the
send buffer.
Clarify the inner loop by using qc_may_build_pkt() directly inside it,
alongside the check on remaining buffer room. This function is used to
test if the QEL instance has something to send.
This should simplify the evolution of the send path, in particular the
GSO implementation.
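A simplified model of the double loop, assuming each QEL only tracks a count of pending bytes and each packet consumes at most <pkt_max> bytes. may_build_pkt() and build_pkt() are hypothetical stand-ins for qc_may_build_pkt() and qc_build_pkt(); the point shown is that both stop conditions sit in the inner loop header.

```c
#include <stddef.h>

/* Hypothetical QEL model: only the amount of pending data is tracked. */
struct qel {
	size_t pending;   /* bytes still to send on this level */
};

/* Stand-in for qc_may_build_pkt(): does this QEL have something to send? */
static int may_build_pkt(const struct qel *qel)
{
	return qel->pending > 0;
}

/* Stand-in for qc_build_pkt(): emits one packet of at most <pkt_max>
 * bytes and returns its size. */
static size_t build_pkt(struct qel *qel, size_t pkt_max)
{
	size_t len = qel->pending < pkt_max ? qel->pending : pkt_max;

	qel->pending -= len;
	return len;
}

/* Outer loop over the registered QELs; the inner loop runs while the QEL
 * has data AND the buffer still has room for a packet. Returns the number
 * of buffer bytes used (never more than <room>). */
size_t prep_pkts(struct qel *qels, size_t nqel, size_t room, size_t pkt_max)
{
	size_t used = 0;

	for (size_t i = 0; i < nqel; i++) {
		while (may_build_pkt(&qels[i]) && room - used >= pkt_max)
			used += build_pkt(&qels[i], pkt_max);
	}
	return used;
}
```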
Amaury Denoyelle [Tue, 28 May 2024 16:28:41 +0000 (18:28 +0200)]
MINOR: quic: use global datagram headlen definition
Each emitted QUIC datagram is prefixed by an out-of-band header. This
header specifies the datagram length and the pointer to the first QUIC
packet instance. This header length is defined via QUIC_DGRAM_HEADLEN.
Replace every occurrence of manually calculated header length with the
globally defined QUIC_DGRAM_HEADLEN. This should ease code maintenance
and simplify the GSO implementation.
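A sketch of such an out-of-band header, assuming it consists of a 16-bit datagram length followed by a pointer to the first packet, as the description suggests. DGRAM_HEADLEN stands in for QUIC_DGRAM_HEADLEN, and the exact types and field order are assumptions; the value of a single definition is that writers and readers can no longer disagree on the size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout: datagram length, then first-packet pointer. */
#define DGRAM_HEADLEN (sizeof(uint16_t) + sizeof(void *))

/* Writes the header at <pos> (at least DGRAM_HEADLEN bytes available) and
 * returns where the datagram payload starts. memcpy() avoids assuming any
 * alignment for <pos>. */
unsigned char *dgram_write_head(unsigned char *pos, uint16_t dglen, void *first_pkt)
{
	memcpy(pos, &dglen, sizeof(dglen));
	memcpy(pos + sizeof(dglen), &first_pkt, sizeof(first_pkt));
	return pos + DGRAM_HEADLEN;
}

/* Reads the header back, e.g. when the datagram is about to be sent. */
void dgram_read_head(const unsigned char *pos, uint16_t *dglen, void **first_pkt)
{
	memcpy(dglen, pos, sizeof(*dglen));
	memcpy(first_pkt, pos + sizeof(*dglen), sizeof(*first_pkt));
}
```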
qc_build_pkt() error handling was difficult due to the multiple possible
error codes. Improve this by defining a proper enum to describe the various
error codes. Also clean up ending labels inside qc_build_pkt().
Previously, packet encoding was stopped as soon as the remaining buffer
room was less than the UDP MTU. This is suboptimal if the next packet
would be smaller than that.
To improve this, only check if there is at least enough room for the
mandatory packet header. qc_build_pkt() is thus responsible for returning
QC_BUILD_PKT_ERR_BUFROOM as soon as the remaining buffer room is
insufficient, which stops packet encoding. An extra check is added to
ensure the end pointer never exceeds the buffer end.
This should not have any significant impact on performance. However,
it makes the code intention clearer.
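A sketch of the "header room only" precondition combined with an explicit buffer-room error, assuming packet sizes are known up front. Only QC_BUILD_PKT_ERR_BUFROOM appears in the text above; the enum names, build_one_pkt() and fill_buffer() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical return codes in the spirit of the qc_build_pkt() enum. */
enum build_pkt_ret {
	BUILD_PKT_OK = 0,
	BUILD_PKT_ERR_BUFROOM,   /* not enough room left: stop encoding */
	BUILD_PKT_ERR_ALLOC,     /* illustrative fatal error, unused here */
};

/* Appends a packet of <pkt_len> bytes if it fits between *pos and <end>;
 * otherwise reports BUFROOM and leaves *pos untouched, so *pos never
 * exceeds <end>. */
enum build_pkt_ret build_one_pkt(size_t *pos, size_t end, size_t pkt_len)
{
	if (pkt_len > end - *pos)
		return BUILD_PKT_ERR_BUFROOM;
	*pos += pkt_len;
	return BUILD_PKT_OK;
}

/* Encodes packets while at least <hdr_len> bytes remain, instead of
 * requiring a full MTU of room. Returns the bytes used. */
size_t fill_buffer(size_t bufsize, size_t hdr_len,
                   const size_t *pkt_lens, size_t npkts)
{
	size_t pos = 0;

	for (size_t i = 0; i < npkts && bufsize - pos >= hdr_len; i++) {
		if (build_one_pkt(&pos, bufsize, pkt_lens[i]) == BUILD_PKT_ERR_BUFROOM)
			break;
	}
	return pos;
}
```

With a 3000-byte buffer and packets of 1200, 1200 and 300 bytes, the 300-byte packet still fits after 2400 bytes are used, whereas an MTU-sized room check (1200) would have stopped at 2400.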
Amaury Denoyelle [Thu, 30 May 2024 16:06:27 +0000 (18:06 +0200)]
BUG/MINOR: quic: fix padding of INITIAL packets
The API for sending has been extended to support emission on more than 2
QEL instances. However, this has rendered the PADDING emission for INITIAL
packets less predictable. Indeed, if qc_send() is used with empty QEL
instances, a padding frame may be generated before handling the last QEL
registered, which could cause unnecessary padding to be emitted.
This commit simplifies PADDING by only activating it for the last QEL
registered. This ensures that no superfluous padding is generated: if
the minimal INITIAL datagram length is already reached, padding is reset
before handling the last QEL instance.
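A sketch of "pad only at the last QEL", assuming the per-QEL packet sizes for one datagram are known and using the 1200-byte minimum from RFC 9000 section 14.1 for datagrams carrying (ack-eliciting) Initial packets. initial_padding() is a hypothetical helper, not the quic_tx code.

```c
#include <stddef.h>

/* RFC 9000 section 14.1: datagrams carrying Initial packets must be
 * expanded to at least 1200 bytes. */
#define MIN_INITIAL_DGRAM_LEN 1200

/* <lens> holds the size produced by each registered QEL for one datagram
 * (0 if the QEL had nothing to send). Padding is computed only once the
 * last QEL has been accounted for, so an empty QEL in the middle cannot
 * trigger premature padding. Returns the number of padding bytes. */
size_t initial_padding(const size_t *lens, size_t nqel, int has_initial)
{
	size_t total = 0;

	for (size_t i = 0; i < nqel; i++) {
		total += lens[i];
		if (i + 1 == nqel && has_initial && total < MIN_INITIAL_DGRAM_LEN)
			return MIN_INITIAL_DGRAM_LEN - total;
	}
	return 0;
}
```

An INITIAL packet coalesced with a large HANDSHAKE one needs no padding, while an INITIAL packet sent alone is padded up to 1200 bytes.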
This bug is labelled as minor as haproxy already emits big enough INITIAL
packets coalesced with HANDSHAKE ones without needing padding. This
however renders the padding code difficult to test. Thus, it may be
useful to force emission on the INITIAL qel only without coalescing
HANDSHAKE packets. Here is a sample to reproduce it :
BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
Since 2.9, it is possible to drain the request payload from the H1
multiplexer in case of early reply. When this happens, the upper stream is
detached but the H1 stream is not destroyed. Once the whole request is
drained, the detach stage is completed: the H1 stream is destroyed and
the H1 connection is ready to be reused if possible, otherwise it is
released.
And here is the issue. If some data of the next request are received with
the last bytes of the drained one, parsing of the next request is immediately
started. The previous H1 stream is destroyed and a new one is created to
handle the parsing. At this stage the H1 connection may be released, for
instance because of a parsing error. This case was not properly handled.
Instead of immediately exiting the mux, it was still possible to access the
released H1 connection to refresh its timeouts, leading to a UAF issue.
Many thanks to Annika for her invaluable help on this issue.
The patch should fix the issue #2602. It must be backported as far as 2.9.
DOC: internals: add a documentation about the master worker
Add documentation about the history of the master-worker, how it was
implemented in its first version and how it currently works.
This is a global view of the architecture, and not an exhaustive
explanation of all mechanisms.
BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
By default, there is always at least one resolver section, the default one,
based on "/etc/resolv.conf" content. However, it is possible to have no
resolver at all if the file is empty or if any error occurred. Errors are
silently ignored at this stage.
In that case, there was a bug in the Prometheus exporter leading to a crash
because the resolver section list is empty. An invalid resolver entity was
used. To fix the issue we must only take care to not dump resolvers metrics
when there is no resolver.
Thanks to Aurelien for having spotted the offending commit.
This patch should fix the issue #2604. It must be backported to 3.0.
DOC: config: add missing context hint for new server and proxy keywords
To stay consistent with the work started in 54627f991 ("DOC: config: add
context hint for proxy keywords") and 3d4e1e682 ("DOC: config: add context
hint for server keywords"), we add the missing context hints for "guid" (both
proxy and server) keyword and "hash-key" server keyword that were added
during 3.0 development.
DOC: config: add missing section hint for "guid" proxy keyword
"guid" proxy keyword added in da754b45 ("MINOR: proxy: implement GUID
support") was lacking the section hint in the keyword description, let's
fix that.
DOC: config: move "hash-key" from proxy to server options
As reported by Ashley Morris, "hash-key" keyword which was introduced in
commit faa8c3e0 ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") doesn't belong to proxy keywords and should be found in
5.2 "Server and default-server options" instead.
BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
Since 7a21c3a4ef ("MAJOR: log: implement proper postparsing for logformat
expressions"), logformat expressions stored in a default section are not
postchecked anymore. This is because the REGISTER_POST_PROXY_CHECK() only
evaluates regular proxies. Because of this, proxy options which are
automatically enabled on the proxy depending on the logformat expression
features in use are not set on the default proxy, which means such options
are not passed to the regular proxies that inherit from it (proxies that
will actually be running the logformat expression during runtime).
Because of that, a logformat expression stored inside a default section
and executed by a regular proxy may not behave properly. Also, since
03ca16f38b ("OPTIM: log: resolve logformat options during postparsing"),
it's even worse because logformat node options postresolving is also
skipped, which may also alter logformat expression encoding feature.
To fix the issue, let's add a special case for default proxies in
parse_logformat_string() and lf_expr_postcheck() so that default proxies
are postchecked on the fly during parsing time in a "relaxed" way as we
cannot assume that the features involved in the logformat expression won't
be compatible with the proxy actually running it since we may have
different types of proxies inheriting from the same default section.
This bug was discovered while trying to address GH #2597.
MINOR: log: change wording in lf_expr_postcheck() error message
logformat_node was referenced as "node" in the error message reported
to the user, but in fact it is referred to as "item" in user
documentation. Use "item" in the error message to better comply with
the doc.
Error message was introduced with 7a21c3a4ef ("MAJOR: log: implement
proper postparsing for logformat expressions")
BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
When parsing a logformat expression using parse_logformat_string(), the
caller passes the proxy under which the expression is found as argument.
This information allows the logformat expression API to check if the
expression is compatible with the proxy settings.
Since 7a21c3a ("MAJOR: log: implement proper postparsing for logformat
expressions"), the proxy compatibility checks are postponed after the proxy
is fully parsed to ensure proxy properties are fully resolved for checks
consistency.
The way it works is that each time parse_logformat_string() is called for
a given expression and proxy, it schedules the expression for postchecking
by appending the expression to the list of pending expression checks on
the proxy (lf_checks struct). Then, when the proxy is called with the
REGISTER_POST_PROXY_CHECK() hook, it iterates over unchecked expressions
and performs the check, then it removes the expression from its list.
However, I overlooked a special case: if a logformat expression is used
on a proxy that is disabled or on a default proxy, the
REGISTER_POST_PROXY_CHECK() hook is never called. Because of that, lf
expressions may still point to the proxy after the proxy is freed.
For most logformat expressions, this isn't an issue because they are
stored within the proxy itself, but this isn't the case with
{tcp,http}checks logformat expressions: during deinit() sequence, all
proxies are first cleaned up, and only then shared checks are freed.
Because of that, the below config will trigger UAF since 7a21c3a:
  uaf.conf:
    listen dummy
      bind localhost:2222
    backend testback
      disabled
      mode http
      option httpchk
      http-check send hdr test "test"
      http-check expect status 200
  haproxy -f uaf.conf -c:
    ==152096== Invalid write of size 8
    ==152096==    at 0x21C317: lf_expr_deinit (log.c:3491)
    ==152096==    by 0x2334A3: free_tcpcheck_http_hdr (tcpcheck.c:84)
    ==152096==    by 0x2334A3: free_tcpcheck_http_hdr (tcpcheck.c:79)
    ==152096==    by 0x2334A3: free_tcpcheck_http_hdrs (tcpcheck.c:98)
    ==152096==    by 0x23365A: free_tcpcheck.part.0 (tcpcheck.c:130)
    ==152096==    by 0x2338B1: free_tcpcheck (tcpcheck.c:108)
    ==152096==    by 0x2338B1: deinit_tcpchecks (tcpcheck.c:3780)
    ==152096==    by 0x2CF9A4: deinit (haproxy.c:2949)
    ==152096==    by 0x2D0065: deinit_and_exit (haproxy.c:3052)
    ==152096==    by 0x169BC0: main (haproxy.c:3996)
    ==152096==  Address 0x52a8df8 is 6,968 bytes inside a block of size 7,168 free'd
    ==152096==    at 0x484B27F: free (vg_replace_malloc.c:872)
    ==152096==    by 0x2CF8AD: deinit (haproxy.c:2906)
    ==152096==    by 0x2D0065: deinit_and_exit (haproxy.c:3052)
    ==152096==    by 0x169BC0: main (haproxy.c:3996)
To fix the issue, let's ensure in proxy_free_common() that no unchecked
expressions may still point to the proxy after the proxy is freed by
purging the list (DEL_INIT is used to reset list items).
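A sketch of why purging with a "delete and re-init" operation prevents the UAF, assuming circular doubly linked lists similar in spirit to HAProxy's. The list helpers and struct lf_expr_sketch are simplified hypothetical stand-ins, not the real LIST_* macros or struct lf_expr. After the purge each detached element points to itself, so a later unlink during deinit only touches the element and never writes into the freed proxy.

```c
#include <stddef.h>

/* Minimal circular doubly linked list. */
struct list {
	struct list *n, *p;
};

static void list_init(struct list *l) { l->n = l->p = l; }

static void list_append(struct list *head, struct list *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

/* Unlinks <el> and makes it point to itself, similar in spirit to
 * LIST_DEL_INIT(). */
static void list_del_init(struct list *el)
{
	el->p->n = el->n;
	el->n->p = el->p;
	list_init(el);
}

/* Hypothetical expression with a link into the proxy's pending list. */
struct lf_expr_sketch {
	const char *str;
	struct list check_list;
};

/* Called while freeing the proxy: detach every unchecked expression so
 * none of them keeps pointers into the freed proxy. */
void purge_pending_checks(struct list *pending)
{
	while (pending->n != pending)
		list_del_init(pending->n);
}
```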
Special thanks to GH user @mhameed who filed a comprehensive issue with
all the relevant information required to reproduce the bug (see GH #2597),
after having first reported the issue on the alpine project bug tracker.
MINOR: proxy: add proxy_free_common() helper function
As shown by the previous patch series, having to free some common proxy
struct members twice (in free_proxy() and proxy_free_defaults()) is
error-prone: we often overlook one of the two free locations when
adding new features.
To prevent such bugs from being introduced in the future, and also avoid
code duplication, we now have a proxy_free_common() function to free all
proxy struct members that are common to all proxy types (either regular or
default ones).
This should greatly improve code maintenance related to proxy freeing
logic.
BUG/MINOR: proxy: fix header_unique_id leak on deinit()
proxy header_unique_id wasn't cleaned up in proxy_free_defaults(),
resulting in a small memory leak if "unique-id-header" was used on a
default proxy section.
BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
proxy conn_src.iface_name was only freed in proxy_free_defaults(), whereas
proxy conn_src.bind_hdr_name was only freed in free_proxy().
Because of that, using "source usesrc hdr_ip()" in a default proxy, or
"source interface" in a regular or default proxy would cause memory leaks
during deinit.
BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
proxy check_{command,path} members (used for "external-check" feature)
weren't cleaned up in free_proxy(), resulting in a small memory leak if
"external-check command" or "external-check path" were used on a regular
or default proxy.
BUG/MINOR: proxy: fix email-alert leak on deinit()
proxy email-alert settings weren't cleaned up in free_proxy(), resulting
in a small memory leak if "email-alert to" or "email-alert from" were used
on a regular or default proxy.
BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
proxy server_id_hdr_name member (used for "http-send-name-header" option)
wasn't cleaned up in free_proxy(), resulting in a small memory leak if
"http-send-name-header" was used on a regular or default proxy.
The warning message indicating that the "http-send-name-header" option is
ignored for backends in "mode log" referenced the option using its internal
struct wording instead of its public name (as seen in the documentation).
Let's fix that.
It may be backported with c7783fb ("MINOR: log/backend: prevent
"http-send-name-header" use with LOG mode") in 2.9.
BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
Instead of setting this flag on the ones used for the zero-copy negotiation,
it was set on the connection flags used for the xprt->rcv_buf()
call. Fortunately, there is no real consequence. The only visible effect is
that the chunk size is written on 8 bytes for no reason.
This patch is related to issue #2598. It must be backported to 3.0.
BUG/MAJOR: mux-h1: Properly copy chunked input data during zero-copy nego
When data are transferred via zero-copy data forwarding, if some data were
already received, we try to transfer them immediately during the negotiation
step. If data are chunked and the chunk size is unknown, 10 bytes are reserved
to write the chunk size during the done step. However, when input data are
finally transferred, the offset is ignored: data are copied into the output
buffer without skipping the reserved area, and their first 10 bytes are then
overwritten by the chunk size. Thus the chunk is truncated, leading to a
malformed message.
This patch should fix the issue #2598. It must be backported to 3.0.
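A minimal C sketch of the intended copy, over a plain byte array rather than HAProxy's buffer/HTX API. It assumes the 10 reserved bytes later hold 8 hex digits followed by CRLF; CHUNK_HDR_RESERVE, copy_after_reserve() and write_chunk_size() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: room reserved for "XXXXXXXX\r\n" (8 hex digits + CRLF). */
#define CHUNK_HDR_RESERVE 10

/* Copies <len> bytes of already-received payload into <out>, leaving
 * CHUNK_HDR_RESERVE bytes free at the start for the chunk size, which is
 * only known at the done step. Returns the number of bytes used in <out>,
 * or 0 if it doesn't fit. */
size_t copy_after_reserve(char *out, size_t out_size, const char *in, size_t len)
{
	if (out_size < CHUNK_HDR_RESERVE + len)
		return 0;
	/* Copying at out + 0 instead would let the chunk size overwrite data. */
	memcpy(out + CHUNK_HDR_RESERVE, in, len);
	return CHUNK_HDR_RESERVE + len;
}

/* Done step: writes the chunk size into the reserved area. */
void write_chunk_size(char *out, size_t chunk_len)
{
	char hdr[CHUNK_HDR_RESERVE + 1];

	snprintf(hdr, sizeof(hdr), "%08zx\r\n", chunk_len);
	memcpy(out, hdr, CHUNK_HDR_RESERVE); /* the trailing NUL is not copied */
}
```

With a 5-byte payload "hello", the output starts with "00000005\r\nhello", i.e. the payload stays intact behind the chunk size.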
This fixes an issue I had where a connection that was idle for ~23s
would end up in a bad state. I don't understand this code, so I'm
not sure exactly why it was failing.
I discovered this by bisecting to identify the commit that caused the
regression between 2.9 and 3.0. The commit is
d2c3f8dde7c2474616c0ea51234e6ba9433a4bc1 ("MINOR: stconn/connection:
Move shut modes at the SE descriptor level"), part of v3.0-dev8.
It seems to be an innocent renaming, so I looked through it and this
stood out as suspect:
- if (mode != CO_SHW_NORMAL)
+ if (mode & SE_SHW_NORMAL)
It looks like the negation went missing here, so this patch reverses that
condition. It fixes my test.
I don't quite understand what this is doing or is for so I can't write
a regression test or decent commit message. Hopefully someone else
will be able to pick this up from where I've left it.
[CF: The condition to perform clean shutdowns was inverted, meaning that
clean shutdowns were not performed when they should have been. This patch
must be backported to 3.0]
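To illustrate the lost negation, here is a minimal C sketch, assuming each shutdown mode carries exactly one SE_SHW_* flag. The flag values and both helper functions are hypothetical; only the SE_SHW_NORMAL name comes from the diff above.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SE_SHW_NORMAL 0x01 /* clean shutdown requested */
#define SE_SHW_SILENT 0x02 /* silent shutdown requested */

/* Both return non-zero when the branch for non-clean shutdowns is taken. */

/* Regressed test: the negation was lost during the rename. */
int buggy_not_clean(unsigned int mode)
{
	return (mode & SE_SHW_NORMAL) != 0;
}

/* Reversed test: equivalent to the former "mode != CO_SHW_NORMAL" as long
 * as a mode carries exactly one SE_SHW_* flag. */
int fixed_not_clean(unsigned int mode)
{
	return !(mode & SE_SHW_NORMAL);
}
```

With the regressed test, a normal shutdown takes the non-clean branch and a silent one does not, which is exactly backwards.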
Amaury Denoyelle [Fri, 31 May 2024 07:42:13 +0000 (09:42 +0200)]
BUG/MINOR: quic: ensure Tx buf is always purged
The quic_conn sending API was recently refactored. The main objective
was to regroup the different functions present for both handshake and
application emission.
After this refactoring, an optimization was introduced to avoid calling
qc_send() if there was nothing new to emit. However, this prevents the Tx
buffer from being purged if a previous send was interrupted, until new
frames finally become available.
To fix this, simply remove the optimization. qc_send() is thus now
always called in quic_conn IO handlers.
The impact of this bug should be minimal as it only happens on a
temporary sending error. However, in this case, it could cause extra
latency or even completely freeze sending in the worst case.
BUG/MINOR: quic: fix computed length of emitted STREAM frames
qc_build_frms() is responsible for encoding multiple frames in a single
QUIC packet. It accounts for the room left in the packet buffer for each
newly encoded frame.
An incorrect computation was performed when encoding a STREAM frame in a
single packet. The frame length was accounted for twice, which excessively
reduced the room left in the packet buffer. This caused the remaining built
frames to be reduced, with the resulting packet unable to fill the whole MTU.
The impact of this bug should be minimal. It is only present when
multiple frames are encoded in a single packet after a STREAM frame. However,
in this case the datagrams built are smaller than expected, which is
suboptimal for bandwidth.
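A simplified C model of the room accounting, assuming each frame costs a header plus its payload; struct frame_sketch and room_left() are hypothetical and do not reflect the real qc_build_frms() code. The double_count switch reproduces subtracting the length twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an encoded frame. */
struct frame_sketch {
	size_t hdr_len;  /* encoded header: type, stream ID, offset, length */
	size_t data_len; /* payload bytes carried by the frame */
};

/* Returns the room left in the packet after encoding <nb> frames, or
 * (size_t)-1 if they don't fit. A non-zero <double_count> mimics the bug
 * where the frame length was accounted for twice. */
size_t room_left(size_t room, const struct frame_sketch *frms, size_t nb,
                 int double_count)
{
	for (size_t i = 0; i < nb; i++) {
		size_t cost = frms[i].hdr_len + frms[i].data_len;

		if (double_count)
			cost += frms[i].data_len;
		if (cost > room)
			return (size_t)-1;
		room -= cost;
	}
	return room;
}
```

For two frames of 4+100 and 4+50 bytes in a 1200-byte packet, the correct accounting leaves 1042 bytes, whereas double counting leaves only 892, so fewer frames get packed.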
BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
The ClientHello callback for WolfSSL, introduced in haproxy 2.9, does not
seem to behave correctly with TLSv1.2.
In TLSv1.2, the cipher is what determines the authentication algorithm
(ECDSA or RSA); however, an SSL client can also send signature algorithms.
In TLSv1.3, the authentication is not part of the ciphersuites, and
is selected using the signature algorithm.
The mistake in the code is that, in TLSv1.2, the signature algorithms
overwrite the auth that was selected using the ciphers.
The ClientHello callback, which is used for certificate selection, uses
both the signature algorithms and the ciphers sent by the client.
However, when a client announces both ECDSA and RSA capabilities,
with ECDSA ciphers that are not available on the haproxy side and RSA
ciphers that are compatible, the ECDSA certificate will still be used,
resulting in a "no shared cipher" error instead of a fallback on the
RSA certificate.
For example, a client could send
'ECDHE-ECDSA-AES128-CCM:ECDHE-RSA-AES256-SHA' and HAProxy could be
configured with only 'ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA'.
This patch fixes the issue by validating that at least one ECDSA cipher
is available on both sides before choosing the ECDSA certificate.
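A minimal C sketch of the "at least one ECDSA cipher on both sides" check, using the example lists above. It compares cipher names and uses a "-ECDSA-" substring heuristic, both assumptions for illustration; the real callback works on the SSL library's cipher objects, and ecdsa_cipher_shared() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Heuristic for this sketch only: TLSv1.2 cipher names containing
 * "-ECDSA-" require an ECDSA certificate. */
static int is_ecdsa_cipher(const char *name)
{
	return strstr(name, "-ECDSA-") != NULL;
}

/* Returns 1 if at least one ECDSA cipher offered by the client is also
 * configured locally, i.e. the ECDSA certificate can actually be used. */
int ecdsa_cipher_shared(const char *const *client, size_t nclient,
                        const char *const *server, size_t nserver)
{
	for (size_t i = 0; i < nclient; i++) {
		if (!is_ecdsa_cipher(client[i]))
			continue;
		for (size_t j = 0; j < nserver; j++)
			if (strcmp(client[i], server[j]) == 0)
				return 1;
	}
	return 0;
}
```

With the example lists, no ECDSA cipher is shared, so the RSA certificate should be selected instead of failing with "no shared cipher".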
MINOR: mux-quic: Don't send an empty H3 DATA frame during zero-copy forwarding
This may only happen when there is no data to forward but a last stream frame
must be sent with the FIN bit. It is not invalid, but it is useless to send
an empty H3 DATA frame in that case.
BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
The previous fix (792a645ec2 ["BUG/MEDIUM: mux-quic: Unblock zero-copy
forwarding if the txbuf can be released"]) introduced a regression. The
zero-copy data forwarding must only be unblocked if it was blocked by the
producer, after a successful negotiation.
This is important because during a negotiation, the consumer may be blocked
for another reason, for instance because of flow control. In that case,
there is not necessarily a TX buffer, and it is unexpected to try to release
an unallocated TX buf.
In addition, the same may happen while a TX buf is still in use. In that
case, it must not be released either. So testing the TX buffer is not the
right solution.
To fix the issue, a new IOBUF flag was added (IOBUF_FL_FF_WANT_ROOM). It
must be set by the producer if it is blocked after a successful negotiation
because it needs more room. In that case, we know a buffer was provided by
the consumer. In the done_fastfwd() callback function, it is then possible to
safely unblock the zero-copy data forwarding if this flag is set.
This patch must be backported to 3.0 with the commit above.
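A minimal C sketch of the flag handshake described above. Only the IOBUF_FL_FF_BLOCKED and IOBUF_FL_FF_WANT_ROOM names come from the commit messages; their values, struct iobuf_sketch, both functions, and the choice to clear WANT_ROOM on unblock are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical values for illustration only. */
#define IOBUF_FL_FF_BLOCKED   0x01 /* zero-copy forwarding is blocked */
#define IOBUF_FL_FF_WANT_ROOM 0x02 /* blocked after a successful nego */

struct iobuf_sketch {
	unsigned int flags;
};

/* Producer side: negotiation succeeded (so a buffer was provided), but
 * the producer now needs more room to proceed. */
void producer_blocked_after_nego(struct iobuf_sketch *iobuf)
{
	iobuf->flags |= IOBUF_FL_FF_BLOCKED | IOBUF_FL_FF_WANT_ROOM;
}

/* Consumer side, in done_fastfwd(): unblock only when WANT_ROOM is set.
 * A block set during the negotiation itself (e.g. flow control) carries
 * no WANT_ROOM flag and is left untouched. */
void done_fastfwd_sketch(struct iobuf_sketch *iobuf)
{
	if (iobuf->flags & IOBUF_FL_FF_WANT_ROOM)
		iobuf->flags &= ~(IOBUF_FL_FF_BLOCKED | IOBUF_FL_FF_WANT_ROOM);
}
```

The point of the dedicated flag is that the consumer no longer infers the unblock condition from the TX buffer state, which may be unallocated or still in use.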
CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
'lua_insert(lua->T, -lua_gettop(lua->T))' is actually used to rotate the
top value with the bottom one, so the code was overkill and the comment
was misleading. Let's fix that by using the explicit equivalent form
(absolute index).
It may be backported with 5508db9a2 ("BUG/MINOR: hlua: fix unsafe
lua_tostring() usage with empty stack") to all stable versions to ease
code maintenance.
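To show why -lua_gettop() and the absolute bottom index are equivalent, here is a toy C model of a Lua stack that mimics lua_insert() semantics without linking Lua; struct stack_sketch, abs_index() and insert_sketch() are hypothetical. Under this model the absolute form is index 1, which presumably corresponds to lua_insert(lua->T, 1).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy Lua stack: Lua index k maps to v[k - 1], <top> = element count. */
struct stack_sketch {
	int v[16];
	int top;
};

/* Converts a negative index to an absolute one, like Lua does for valid
 * stack indices: -1 is the top element, -top is the bottom (index 1). */
int abs_index(const struct stack_sketch *s, int idx)
{
	return idx > 0 ? idx : s->top + idx + 1;
}

/* Mimics lua_insert(): moves the top element to <idx>, shifting up the
 * elements above that position. */
void insert_sketch(struct stack_sketch *s, int idx)
{
	int pos = abs_index(s, idx);
	int topval = s->v[s->top - 1];

	memmove(&s->v[pos], &s->v[pos - 1], (size_t)(s->top - pos) * sizeof(int));
	s->v[pos - 1] = topval;
}
```

On a stack {10, 20, 30}, both insert_sketch(s, -3) and insert_sketch(s, 1) move 30 to the bottom, giving {30, 10, 20}.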
BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
In hlua_ckch_commit_yield() and hlua_ckch_set(), when an error occurs,
we enter the error path and try to raise an error from the <err> msg
pointer, which must be freed afterwards.
However, the fact that luaL_error() never returns was overlooked; because
of that, the <err> msg is never freed in that case.
To fix the issue, let's use the hlua_pushfstring_safe() helper to push the
error message on the Lua stack, then free it before throwing the error using
lua_error().
It should be backported up to 2.6 with 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
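The ordering issue can be modeled in plain C with setjmp()/longjmp() standing in for Lua's non-returning error raising. This is a toy sketch, not HAProxy code: raise_error() plays the role of lua_error(), the static <pushed> array plays the role of the string pushed on the Lua stack, and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf raise_env;  /* stands in for Lua's error handler */
static char pushed[128];   /* stands in for the string on the Lua stack */

/* Like lua_error(): never returns to its caller. */
static void raise_error(void)
{
	longjmp(raise_env, 1);
}

/* Error path following the fix: copy <err> where the error handler can
 * see it, free it, and only then raise. Freeing after raise_error() would
 * never run, leaking <err>. */
void fail_with(char *err)
{
	strncpy(pushed, err, sizeof(pushed) - 1);
	pushed[sizeof(pushed) - 1] = '\0';
	free(err);
	raise_error();
}
```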
CLEANUP: hlua: get rid of hlua_traceback() security checks
Thanks to the previous commit, we may now assume that hlua_traceback()
won't LJMP, so it's safe to use it from an unprotected environment without
any precautions.
The function is often used on error paths where no precaution is taken
against LJMP. Since these error paths include out-of-memory ones, the
lua_getinfo() call could also raise a memory exception, causing the process
to crash or errors to be improperly handled if the caller isn't prepared
for it. Since the function is only used on rare events (error handling) and
lacks the __LJMP prototype prefix, let's make it safe by protecting the
lua_getinfo() call so that hlua_traceback() callers may now use it safely
(the function will always succeed; the output will be truncated in case of
error).
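In Lua itself, such protection is normally done with a protected call such as lua_pcall(). The toy C sketch below models only the resulting behavior with setjmp()/longjmp(): a step that may "throw" is guarded so the builder always returns, keeping the frames gathered so far. append_frame(), traceback_sketch() and <fail_at> are hypothetical, not HAProxy's implementation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

static jmp_buf guard;
static int fail_at = -1; /* level at which a simulated failure occurs */

/* May longjmp() out on a simulated memory error, the way lua_getinfo()
 * may raise an error under Lua. */
static void append_frame(char *out, size_t size, int level)
{
	size_t len;

	if (level == fail_at)
		longjmp(guard, 1);
	len = strlen(out);
	snprintf(out + len, size - len, "[%d]", level);
}

/* Protected builder: always returns <out>. On failure, the frames
 * gathered so far are kept and the output is simply truncated. */
const char *traceback_sketch(char *out, size_t size, int depth)
{
	out[0] = '\0';
	if (setjmp(guard) == 0) {
		for (int level = 0; level < depth; level++)
			append_frame(out, size, level);
	}
	return out;
}
```

With no failure, a depth of 3 yields "[0][1][2]"; with a simulated failure at level 2, the result is the truncated "[0][1]" instead of an unhandled jump.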
BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
lua_pushfstring() is used in multiple cleanup paths (upon error) to
push the error message that will be raised by lua_error(). However, this
is often done from an unprotected environment, or in the middle of a
cleanup sequence, so we don't want the function to LJMP (it may cause
various issues ranging from memory leaks to crashing the process).
Fortunately, this is unlikely to happen, but since the use of
lua_pushfstring() is limited to error reporting here, it's ok to use our
own hlua_pushfstring_safe() implementation, at the cost of a little
overhead, to ensure that the function will never LJMP.
CLEANUP: hlua: use hlua_pusherror() where relevant
In hlua_map_new(), when an error occurs, we use a combination of luaL_where,
lua_pushfstring and lua_concat to build the error string before calling
lua_error().
It turns out that we already have the hlua_pusherror() macro which is
exactly made for that purpose so let's use it.
It could be backported to all stable versions to ease code maintenance.
Ensure the idle_timer task is allocated in qc_kill_conn() before waking it
up. It can be NULL if the idle timer has already fired but the MUX layer is
still present, which prevents immediate quic_conn release.
qc_kill_conn() is only used on send() syscall fatal errors to notify the
upper layer of an error and close the whole connection asap.
This crash is pretty rare as it relies on timing issues. It only happens
if the idle timer fires before the MUX release (a larger client timeout is
thus required) and a send() syscall error is detected.
For now, it was only reproduced using GDB to interrupt haproxy for longer
than the idle timeout.
BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
In the done_fastfwd() callback function, if nothing was forwarded while the
SD is blocked, it means there is not enough space in the buffer to proceed.
It may be because there are data to be sent, but it may also be because of
data already sent and waiting for an ack. In the latter case, the mux has no
data to send, so the quic stream is not woken up when data are finally
removed from the buffer. The data forwarding can thus be stuck. This happens
when the stats page is requested in QUIC/H3. Only applets are affected by
this issue, and only with the QUIC multiplexer, because it is the only mux
keeping already sent data in the TX buf.
To fix the issue, the idea is to release the txbuf if possible and then
unblock the SD to perform a new zero-copy data forwarding attempt. Doing so,
and thanks to the previous patch ("MEDIUM: applet: Be able to unblock
zero-copy data forwarding from done_fastfwd"), the applet will be woken up.
This patch should fix the issue #2584. It must be backported to 3.0.
MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
This part only concerns applets. When an applet tries to forward data
via an iobuf, it may decide to block for any reason, even if there is free
space in the buffer. For instance, the stats applet doesn't produce data if
the buffer is almost full.
However, in this case, it is useful to let the consumer decide that a new
attempt is possible because more space was made available. So, if the
IOBUF_FL_FF_BLOCKED flag is removed by the consumer when the done_fastfwd()
callback function is called, the SE_FL_WANT_ROOM flag is removed on the
producer sedesc. It is only done for applets. Thanks to this change, the
applet can be woken up for a new attempt.
This patch is required for a fix on the QUIC multiplexer.
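A minimal C sketch of the rule above: an applet producer loses SE_FL_WANT_ROOM only if the consumer cleared IOBUF_FL_FF_BLOCKED during done_fastfwd(). The flag names come from the message; their values, struct sedesc_sketch and after_done_fastfwd() are hypothetical.

```c
#include <assert.h>

/* Hypothetical values for illustration only. */
#define IOBUF_FL_FF_BLOCKED 0x01
#define SE_FL_WANT_ROOM     0x02

struct sedesc_sketch {
	unsigned int flags;
	int is_applet;
};

/* <before> and <after> are the iobuf flags around the consumer's
 * done_fastfwd() call. If the consumer removed IOBUF_FL_FF_BLOCKED, an
 * applet producer loses SE_FL_WANT_ROOM so it can be woken up again. */
void after_done_fastfwd(struct sedesc_sketch *prod, unsigned int before,
                        unsigned int after)
{
	if (prod->is_applet &&
	    (before & IOBUF_FL_FF_BLOCKED) && !(after & IOBUF_FL_FF_BLOCKED))
		prod->flags &= ~SE_FL_WANT_ROOM;
}
```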
BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
Interim responses are by definition bodyless. But we must not set the
corresponding HTX start-line flag, because the start-line of the final
response is still expected. Setting this flag too early may lead the
multiplexer on the sending side to consider the message finished after
the headers of the interim message.
It happens with the H2 multiplexer on the frontend side if a "100-Continue"
is received from the server. The interim response is sent and the
HTX_SL_F_BODYLESS_RESP flag is evaluated. Then, the headers of the final
response are sent with the ES flag, because the HTX_SL_F_BODYLESS_RESP flag
was seen too early, leading to a protocol error if the response has a body.
Thanks to grembo for this analysis.
This patch should fix the issue #2587. It must be backported as far as 2.9.
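A simplified C sketch of the rule: interim (1xx) responses never get the bodyless flag, final 204/304 responses and responses to HEAD do. HTX_SL_F_BODYLESS_RESP is the real flag name but its value here is hypothetical, and resp_sl_flags() ignores the other criteria the real HTTP/1 parser considers.

```c
#include <assert.h>

/* Hypothetical value for illustration only. */
#define HTX_SL_F_BODYLESS_RESP 0x01

/* Start-line flags for an HTTP/1 response with status <status>. */
unsigned int resp_sl_flags(int status, int is_head_req)
{
	if (status < 200)
		return 0; /* interim: the final start-line is still expected */
	if (status == 204 || status == 304 || is_head_req)
		return HTX_SL_F_BODYLESS_RESP;
	return 0;
}
```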
crash.conf:
global
lua-load crash.lua
listen front
bind localhost:9090 ssl crt reg-tests/ssl/set_cafile_client.pem ca-file reg-tests/ssl/set_cafile_interCA1.crt verify none
./haproxy -f crash.conf
[NOTICE] (267993) : haproxy version is 3.0-dev2-640ff6-910
[NOTICE] (267993) : path to executable is ./haproxy
[WARNING] (267993) : config : missing timeouts for proxy 'front'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[1] 267993 segmentation fault (core dumped) ./haproxy -f crash.conf
This is because in hlua_ckch_set/hlua_ckch_commit_yield, we always
consider that we're being called from a yield-capable runtime context.
As such, hlua_gethlua() is never checked for NULL and we systematically
try to wake hlua->task and yield every 10 instances.
In fact, if we're called from the body or init context (that is, during
haproxy startup), hlua_gethlua() will return NULL, and in this case we
shouldn't care about yielding because it is ok to commit all instances
at once since haproxy is still starting up.
Also, when calling CertCache.set() from a non-yield-capable runtime
context (such as the hlua fetch context), we kept acting as if the yield
succeeded, resulting in unexpected function termination (the operation
would be aborted and the CertCache lock wouldn't be released). Instead,
we now explicitly state in the doc that CertCache.set() cannot be used
from a non-yield-capable runtime context, and we raise a runtime error
if it is used that way.
These bugs were discovered by reading the code while trying to address
the Svace report documented by @Bbulatov in GH #2586.
It should be backported up to 2.6 with 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
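A minimal C sketch of the three cases described above: no Lua context (startup) commits everything at once, a yield-capable runtime context yields every 10 instances, and a non-yield-capable one is refused with an error. struct hlua_sketch and commit_instances() are hypothetical; the yield is only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Lua context: NULL during body/init. */
struct hlua_sketch {
	int can_yield; /* 0 in contexts such as sample fetches */
	int yields;    /* number of yields that would have happened */
};

/* Returns the number of committed instances, or -1 where a runtime error
 * would be raised instead. */
int commit_instances(struct hlua_sketch *hlua, int nb_instances)
{
	if (hlua && !hlua->can_yield)
		return -1;

	for (int i = 0; i < nb_instances; i++) {
		/* ... commit instance i ... */
		if (hlua && (i + 1) % 10 == 0 && i + 1 < nb_instances)
			hlua->yields++; /* would wake hlua->task and yield here */
	}
	return nb_instances;
}
```

With 25 instances, a runtime context yields twice (after 10 and 20), while a NULL context commits all 25 without ever touching a task.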
MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
As reported by @Bbulatov in GH #2586, the stktable_data_ptr() return value
is used without first checking that it isn't NULL, which may happen if the
given type is invalid or not stored in the table.
However, since data_type is set by table_prepare_data_request() right
before cli_io_handler_table() is invoked, data_type is not expected to
be invalid: table_prepare_data_request() has normally already checked that
the type is stored inside the table. Thus stktable_data_ptr() should not
fail at this point, so we add a BUG_ON() to indicate that.
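A minimal C sketch of the pattern: a lookup that may return NULL, called from a place where the type was validated beforehand, so a NULL result can only be a bug. BUG_ON_SKETCH() is a hypothetical assert()-based stand-in for HAProxy's BUG_ON(), and data_ptr() only imitates the contract of stktable_data_ptr() described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BUG_ON(): abort if the condition holds. */
#define BUG_ON_SKETCH(cond) assert(!(cond))

#define NB_TYPES 4

/* Hypothetical entry: stored[t] is non-zero when data type t is kept. */
struct entry_sketch {
	int stored[NB_TYPES];
	long data[NB_TYPES];
};

/* NULL when the type is invalid or not stored in the table. */
long *data_ptr(struct entry_sketch *e, int type)
{
	if (type < 0 || type >= NB_TYPES || !e->stored[type])
		return NULL;
	return &e->data[type];
}

/* Caller whose <type> was validated earlier: NULL here means a bug. */
long read_validated(struct entry_sketch *e, int type)
{
	long *ptr = data_ptr(e, type);

	BUG_ON_SKETCH(ptr == NULL);
	return *ptr;
}
```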
Willy Tarreau [Fri, 31 May 2024 16:52:51 +0000 (18:52 +0200)]
BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
In GH issue #2586 @Bbulatov reported a theoretical null-deref in
env_expand() in case there's no memory left to expand an environment
variable. The function should return NULL in this case so that the only
caller (str2sa_range) sees it. In practice it may only happen during
boot and is thus harmless, but it's better to fix it since it's easy. This
can be backported to all versions where this applies.
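A simplified C sketch of an expansion helper that reports allocation failures by returning NULL, as the fix requires. It only handles "${NAME}" references; env_expand_sketch() and append() are hypothetical and do not reproduce the real env_expand() signature or syntax support.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Appends <n> bytes of <s> to the heap string *<buf> of length *<len>.
 * Returns 0 on allocation failure, leaving *<buf> valid. */
static int append(char **buf, size_t *len, const char *s, size_t n)
{
	char *nbuf = realloc(*buf, *len + n + 1);

	if (!nbuf)
		return 0;
	memcpy(nbuf + *len, s, n);
	*len += n;
	nbuf[*len] = '\0';
	*buf = nbuf;
	return 1;
}

/* Expands "${NAME}" references (unset variables expand to nothing).
 * Returns a new string, or NULL when memory runs out so that the caller
 * can detect and report the failure. */
char *env_expand_sketch(const char *in)
{
	char *out = NULL;
	size_t len = 0;

	if (!append(&out, &len, "", 0))
		return NULL;
	while (*in) {
		const char *beg = strstr(in, "${");
		const char *end = beg ? strchr(beg + 2, '}') : NULL;
		const char *val;
		char *name;

		if (!beg || !end) {
			/* no more complete reference: copy the rest verbatim */
			if (!append(&out, &len, in, strlen(in)))
				goto fail;
			break;
		}
		if (!append(&out, &len, in, (size_t)(beg - in)))
			goto fail;
		name = strndup(beg + 2, (size_t)(end - (beg + 2)));
		if (!name)
			goto fail;
		val = getenv(name);
		free(name);
		if (val && !append(&out, &len, val, strlen(val)))
			goto fail;
		in = end + 1;
	}
	return out;
fail:
	free(out); /* never return a partially expanded string */
	return NULL;
}
```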
Willy Tarreau [Fri, 31 May 2024 16:37:56 +0000 (18:37 +0200)]
BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
When parsing tcp-check expect-header, a copy-paste error in the error
message causes the name of the header to be reported as the invalid
format string instead of its value. This is really harmless but should
be backported to all versions to help users understand the cause of the
problem when this happens. This was reported in GH issue #2586 by
@Bbulatov.
Willy Tarreau [Fri, 31 May 2024 16:30:16 +0000 (18:30 +0200)]
BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
In GH issue #2586 @Bbulatov reported a bug where the http-check
send-state flag is removed from options instead of options2 when
http-check is disabled. It only has an effect when this option is
set and http-check is disabled, in which case a warning is displayed
indicating that it will be ignored. The option removed instead is then
srvtcpka. Both options being so minor, it's likely that nobody ever
faced it.
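A minimal C sketch of why clearing the wrong option word hits another option: the same bit position means different things in the two words. The OPT_* macros, their values, and struct proxy_sketch are hypothetical; only the options/options2 split and the two affected options come from the message.

```c
#include <assert.h>

/* Hypothetical: same bit, different meaning in each option word. */
#define OPT_SRVTCPKA   0x00000001 /* in options:  srvtcpka */
#define OPT2_CHK_SNDST 0x00000001 /* in options2: http-check send-state */

struct proxy_sketch {
	unsigned int options;
	unsigned int options2;
};

/* Bug: clears the bit in the wrong word, dropping srvtcpka instead. */
void drop_send_state_buggy(struct proxy_sketch *px)
{
	px->options &= ~OPT2_CHK_SNDST;
}

/* Fix: clears send-state where it actually lives. */
void drop_send_state_fixed(struct proxy_sketch *px)
{
	px->options2 &= ~OPT2_CHK_SNDST;
}
```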