Willy Tarreau [Fri, 25 Nov 2022 08:40:06 +0000 (09:40 +0100)]
DOC: halog: explain how to use -ac and -ad in the help message
Tim reported in issue #1435 that halog options -ac/-ad were poorly
documented. They're indeed used to spot infrastructure outages between
the clients and haproxy by detecting abnormal periods of silence followed
by bursts, either affecting the network itself, or also a single machine
(e.g. swapping on an edge client or proxy can cause such patterns).
Willy Tarreau [Fri, 25 Nov 2022 08:27:15 +0000 (09:27 +0100)]
DOC: config: refer to section about quoting in the "add_item" converter
As requested by Nick in issue #1719, let's add a reference to the section
about quoting there, since add_item() will often be used with commas and
it's easy to mess up.
Willy Tarreau [Fri, 25 Nov 2022 08:17:18 +0000 (09:17 +0100)]
DOC: config: provide some configuration hints for "http-reuse"
This adds some configuration hints regarding various workloads that do
not manage to achieve high reuse rates due to too low a global maxconn
or thread groups.
BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.
This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.
Willy Tarreau [Thu, 24 Nov 2022 16:13:05 +0000 (17:13 +0100)]
[RELEASE] Released version 2.7-dev10
Released version 2.7-dev10 with the following main changes :
- MEDIUM: tcp-act: add parameter rst-ttl to silent-drop
- BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
- MINOR: cli: print parsed command when not found
- BUG/MAJOR: quic: Crash after discarding packet number spaces
- CLEANUP: quic: replace "choosen" with "chosen" all over the code
- MINOR: cli/pools: store "show pools" results into a temporary array
- MINOR: cli/pools: add sorting capabilities to "show pools"
- MINOR: cli/pools: add pool name filtering capability to "show pools"
- DOC: configuration: fix quic prefix typo
- MINOR: quic: report error if force-retry without cluster-secret
- MINOR: global: generate random cluster.secret if not defined
- BUG/MINOR: resolvers: do not run the timeout task when there's no resolution
- BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
- MINOR: server/idle: make the next_takeover index per-tgroup
- BUILD: listener: fix build warning on global_listener_rwlock without threads
- BUG/MAJOR: sched: protect task during removal from wait queue
- BUILD: sched: fix build with DEBUG_THREAD with the previous commit
- DOC: quic: add note on performance issue with listener contention
- BUG/MINOR: cfgparse-listen: fix ebpt_next_dup pointer dereference on proxy "from" inheritance
- BUG/MINOR: log: fix parse_log_message rfc5424 size check
- CLEANUP: arg: remove extra check in make_arg_list arg escaping
- CLEANUP: tools: extra check in utoa_pad
- MINOR: h1: Consider empty port as invalid in authority for CONNECT
- MINOR: http: Considere empty ports as valid default ports
- BUG/MINOR: http-htx: Normalized absolute URIs with an empty port
- BUG/MINOR: h1: Replace authority validation to conform RFC3986
- REG-TESTS: http: Add more tests about authority/host matching
- BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action
- BUG/MEDIUM: mux-h1: Don't release H1C on timeout if there is a SC attached
- BUG/MEDIUM: mux-h1: Subscribe for reads on error on sending path
- BUILD: http-htx: Silent build error about a possible NULL start-line
- DOC: configuration.txt: add default_value for table_idle signature
- BUILD: ssl-sock: Silent error about NULL deref in ssl_sock_bind_verifycbk()
- BUG/MEDIUM: mux-h1: Remove H1C_F_WAIT_NEXT_REQ flag on a next request
- BUG/MINOR: mux-h1: Fix handling of 408-Request-Time-Out
- MINOR: mux-h1: Remove H1C_F_WAIT_NEXT_REQ in functions handling errors
- MINOR: mux-h1: Avoid useless call to h1_send() if no error is sent
- DOC: configuration.txt: fix typo in table_idle signature
- BUILD: stick-tables: fix build breakage in xxhash on older compilers
- BUILD: compiler: include compiler's definitions before ours
- BUILD: quic: global.h is needed in cfgparse-quic
- CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h
- BUILD: flags: really restrict the cases where flags are exposed
- BUILD: makefile: minor reordering of objects by build time
- BUILD: quic: silence two invalid build warnings at -O1 with gcc-6.5
- BUILD: quic: use openssl-compat.h instead of openssl/ssl.h
- MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode
- MINOR: sample: make the rand() sample fetch function use the statistical_prng
- MINOR: auth: silence null dereference warning in check_user()
- CLEANUP: peers: fix format string for status messages (int signedness)
- CLEANUP: qpack: fix format string in debugging code (int signedness)
- CLEANUP: qpack: properly use the QPACK macros not HPACK ones in debug code
- BUG/MEDIUM: quic: fix datagram dropping on queueing failed
Amaury Denoyelle [Thu, 24 Nov 2022 14:24:38 +0000 (15:24 +0100)]
BUG/MEDIUM: quic: fix datagram dropping on queueing failed
After reading a datagram, it is enqueud for the thread attached to the
DCID. This is done via quic_lstnr_dgram_dispatch() function. If this
step fails, we remove the datagram from the buffer of quic_receiver_buf.
This step is faulty because we use b_del() instead of b_sub(). If
quic_receiver_buf was not empty, we will remove content from another
datagram while leaving the content of the last read datagram. This
probably produces valid datagram dropping and may even result in crash.
As stated, this bug can only happen if qc_lstnr_dgram_dispatch() fails
which happen on two occaions :
* on quic_dgram allocation failure, which should be pretty rare
* on datagram labelled as invalid for QUIC protocol. This may happen
more frequently depending on the network conditions. Thus, this bug
has been labelled as a medium one.
Willy Tarreau [Thu, 24 Nov 2022 14:38:26 +0000 (15:38 +0100)]
CLEANUP: qpack: properly use the QPACK macros not HPACK ones in debug code
There were a few leftovers of DEBUG_HPACK and HPACK_SHT_SIZE instead of
their QPACK equivalent in the QPACK debug code. There's no harm anyway,
but it could lead to confusing results if the tables are not sized
equally.
Willy Tarreau [Thu, 24 Nov 2022 14:35:17 +0000 (15:35 +0100)]
CLEANUP: qpack: fix format string in debugging code (int signedness)
In issue #1939, Ilya mentions that cppchecks warned about use of "%d" to
report the QPACK table's index that's locally stored as an unsigned int.
While technically valid, this will never cause any trouble since indexes
are always small positive values, but better use %u anyway to silence
this warning.
Willy Tarreau [Thu, 24 Nov 2022 14:32:20 +0000 (15:32 +0100)]
CLEANUP: peers: fix format string for status messages (int signedness)
In issue #1939, Ilya mentions that cppchecks warned about use of "%d" to
report the status state that's locally stored as an unsigned int. While
technically valid, this will never cause any trouble since in the end
what we store there are the applet's states (just a few enum values).
Better use %u anyway to silence this warning.
MINOR: auth: silence null dereference warning in check_user()
In GH issue #1940 cppcheck warns about a possible null-dereference in
check_user() when DEBUG_AUTH is enabled. Indeed, <ep> may potentially
be NULL because upon error crypt_r() and crypt() may return a null
pointer. However it's not directly derefenced, it is only passed to
printf() with '%s' fmt. While it is in practice fine with the printf
implementations we care about (that check strings against null before
printing them), it is undefined behavior according to the spec, hence
the warning.
Let's check <ep> before passing it to printf. This should partly
solve GH #1940.
Willy Tarreau [Thu, 24 Nov 2022 13:54:29 +0000 (14:54 +0100)]
MINOR: sample: make the rand() sample fetch function use the statistical_prng
Emeric noticed that producing many randoms to fill a stick table was
saturating on the rand_lock. Since 2.4 we have the statistical PRNG
for low-quality randoms like this one, there is no point in using the
one that was originally implemented for the purpose of creating safe
UUIDs, since the doc itself clearly states that these randoms are not
secure and they have not been in the past either. With this change,
locking contention is completely gone.
Willy Tarreau [Thu, 24 Nov 2022 08:16:41 +0000 (09:16 +0100)]
BUILD: quic: silence two invalid build warnings at -O1 with gcc-6.5
Gcc 6.5 is now well known for triggering plenty of false "may be used
uninitialized", particularly at -O1, and two of them happen in quic,
quic_tp and quic_conn. Both of them were reviewed and easily confirmed
as wrong (gcc seems to ignore the control flow after the function
returns and believes error conditions are not met). Let's just preset
the variables that bothers it. In quic_tp the initialization was moved
out of the loop since there's no point inflating the code just to
silence a stupid warning.
Willy Tarreau [Thu, 24 Nov 2022 07:57:13 +0000 (08:57 +0100)]
BUILD: makefile: minor reordering of objects by build time
This time the current ordering of common objects remained mostly
unchanged, except for flt_bwlim that was added. However, the SSL
and QUIC build order still had not been handled and were extremely
imbalanced, so they were adjusted. It's even possible to start
building QUIC before openssl to save a little bit more but more
likely that a few large quic files will get split again over time.
Willy Tarreau [Thu, 24 Nov 2022 07:22:40 +0000 (08:22 +0100)]
BUILD: flags: really restrict the cases where flags are exposed
A number of internal flags started to be exposed to external programs
at the location of their definition since commit 77acaf5af ("MINOR:
flags: add a new file to host flag dumping macros"). This allowed the
"flags" utility to decode many more of them and always correctly. The
condition to expose them was to rely on the preliminary definition of
EOF that indicates that stdio is already included. But this was a
wrong approach. It only guarantees that snprintf() can safely be used
but still causes large functions to be built. But stdio is often
included before some of these includes, so these heavy inline functions
actually have to be compiled in many cases. The result is that the
build time significantly increased, especially with fast compilers
like gcc -O0 which took +50% or TCC which took +100%!
This patch addresses the problem by instead relying on an explicit
macro HA_EXPOSE_FLAGS that the calling program must explicitly define
before including these files. flags.c does this and that's all. The
previous build time is now restored with a speed up of 20 to 50%
depending on the build options.
Willy Tarreau [Thu, 24 Nov 2022 07:09:12 +0000 (08:09 +0100)]
CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h
These includes brought by commit 9c76637ff ("MINOR: anon: add new macros
and functions to anonymize contents") resulted in an increase of exactly
20% of the number of lines to build. These include are not needed there,
only tools.c needs xxhash.h.
Willy Tarreau [Thu, 24 Nov 2022 07:28:43 +0000 (08:28 +0100)]
BUILD: quic: global.h is needed in cfgparse-quic
cfgparse-quic accesses some members of the "global" struct but only
includes global-t.h. It actually used to work via tools.h due to a
long dependency chain that brought it, but it will be fixed and will
break cfgparse-quic, so let's fix it first.
Willy Tarreau [Thu, 24 Nov 2022 06:51:57 +0000 (07:51 +0100)]
BUILD: compiler: include compiler's definitions before ours
Building with TCC caused a warning on __attribute__() being redefined,
because we do define it on compilers that don't have it, but we didn't
include the compiler's definitions first to leave it a chance to expose
its definitions. The correct way to do this would be to include
sys/cdefs.h but we currently don't include it explicitly and a few
reports on the net mention some platforms where it could be missing
by default. Let's use inttypes.h instead, it always causes it (or its
equivalent) to be included and we know it's present on supported
platforms since we already depend on it.
Willy Tarreau [Thu, 24 Nov 2022 06:35:17 +0000 (07:35 +0100)]
BUILD: stick-tables: fix build breakage in xxhash on older compilers
Commit 36d156564 ("MINOR: peers: Support for peer shards") reintroduced
a direct dependency on import/xxhash.h which was previously dropped by
commit d5fc8fcb8 ("CLEANUP: Add haproxy/xxhash.h to avoid modifying
import/xxhash.h"). This results in xxhash.h being included twice, which
breaks the build on older compilers which do not like seeing XXH32_hash_t
being defined twice.
DOC: configuration.txt: fix typo in table_idle signature
An extra ',' was mistakenly added in table_idle converter signature
with commit ed36968 ("DOC: configuration.txt: add default_value for
table_idle signature").
MINOR: mux-h1: Avoid useless call to h1_send() if no error is sent
If we choose to not send an error to the client, there is no reason to call
h1_send() for nothing. This happens because functions handling errors return
1 instead of 0 when nothing is performed.
MINOR: mux-h1: Remove H1C_F_WAIT_NEXT_REQ in functions handling errors
If is cleaner to remove this flag in the internal functions handling errors,
responsible to update the H1 connection state, instead to do so in calling
functions. This will hopefully avoid bugs in future.
BUG/MINOR: mux-h1: Fix handling of 408-Request-Time-Out
When a timeout is detected waiting for the request, a 408-Request-Time-Out
response is sent. However, an error was introduced by commit 6858d76cd3
("BUG/MINOR: mux-h1: Obey dontlognull option for empty requests"). Instead
of inhibiting the log message, the option was stopping the error sending.
BUG/MEDIUM: mux-h1: Remove H1C_F_WAIT_NEXT_REQ flag on a next request
When an idle H1 connection starts to process an new request, we must take
care to remove H1C_F_WAIT_NEXT_REQ flag. This flag is used to know an idle
H1 connection has already processed at least one request and is waiting for
a next one, but nothing was received yet.
Keeping this flag leads to a crash because some running H1 connections may
be erroneously released on a soft-stop. Indeed, only idle or closed
connections must be released.
This bug was reported into #1903. It is 2.7-specific. No backport needed.
BUILD: ssl-sock: Silent error about NULL deref in ssl_sock_bind_verifycbk()
In ssl_sock_bind_verifycbk(), when compiled without QUIC support, the
compiler may report an error during compilation about a possible NULL
dereference:
src/ssl_sock.c: In function ‘ssl_sock_bind_verifycbk’:
src/ssl_sock.c:1738:12: error: potential null pointer dereference [-Werror=null-dereference]
1738 | ctx->xprt_st |= SSL_SOCK_ST_FL_VERIFY_DONE;
| ~~~^~~~~~~~~
A BUG_ON() was addeded because it must never happen. But when compiled
without DEBUG_STRICT, there is nothing to help the compiler. Thus
ALREADY_CHECKED() macro is used. The ssl-sock context and the bind config
are concerned.
DOC: configuration.txt: add default_value for table_idle signature
table_idle converter takes optional default_value argument.
The documentation correctly describes this usage but <default_value> was
missing in the converter signature.
It should be backported with bbeec37b3 ("MINOR: stick-table:
Add table_expire() and table_idle() new converters")
BUILD: http-htx: Silent build error about a possible NULL start-line
In http_replace_req_uri(), if the URI was successfully replaced, it means
the start-line exists. However, the compiler reports an error about a
possible NULL pointer dereference:
src/http_htx.c: In function ‘http_replace_req_uri’:
src/http_htx.c:392:19: error: potential null pointer dereference [-Werror=null-dereference]
392 | sl->flags &= ~HTX_SL_F_NORMALIZED_URI;
So, ALREADY_CHECKED() macro is used to silent the build error. This patch
must be backported with 84cdbe478a ("BUG/MINOR: http-htx: Don't consider an
URI as normalized after a set-uri action").
BUG/MEDIUM: mux-h1: Subscribe for reads on error on sending path
The recent refactoring about errors handling in the H1 multiplexer
introduced a bug on abort when the client wait for the server response. The
bug only exists if abortonclose option is not enabled. Indeed, in this case,
when the end of the message is reached, the mux stops to receive data
because these data are part of the next request. However, error on the
sending path are no longer fatal. An error on the reading path must be
caught to do so. So, in case of a client abort, the error is not reported to
the upper layer and the H1 connection cannot be closed if some pending data
are blocked (remember, an error on sending path was detected, blocking
outgoing data).
To be sure to have a chance to detect the abort in the case, when an error
is detected on the sending path, we force the subscription for reads.
This patch, with the previous one, should fix the issue #1943. It is
2.7-specific, no backport is needed.
BUG/MEDIUM: mux-h1: Don't release H1C on timeout if there is a SC attached
When the H1 task timed out, we must be careful to not release the H1
conneciton if there is still a H1 stream with a stream-connector
attached. In this case, we must wait. There are some tests to prevent it
happens. But the last one only tests the UPGRADING state while there is also
the CLOSING state with a SC attached. But, in the end, it is safer to test
if there is a H1 stream with a SC attached.
This patch should partially fix the issue #1943. However, it only prevent
the segfault. There is another bug under the hood. BTW, this one is
2.7-specific. Not backport is needed.
BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action
An abosulte URI is marked as normalized if it comes from an H2 client. This
way, we know we can send a relative URI to an H1 server. But, after a
set-uri action, the URI must no longer be considered as normalized.
Otherwise there is no way to send an absolute URI on the server side.
If it is important to update a normalized absolute URI without altering this
property, the host, path and/or query-string must be set separatly.
This patch should fix the issue #1938. It should be backported as far as
2.4.
BUG/MINOR: h1: Replace authority validation to conform RFC3986
Except for CONNECT method, where a normalization is performed, we expected
to have an exact match between the authority and the host header value.
However it was too strict. Indeed, default port must be handled and the
matching must respect the RFC3986.
There is already a scheme based normalizeation performed on the URI later,
on the HTX message. And we cannot normalize the URI during H1 parsing to be
able to report proper errors on the original raw buffer. And a systematic
read-only normalization to validate the authority will consume CPU for only
few requests. At the end, we decided to perform extra-checks when the exact
match fails. Now, following authority/host are considered as equivalent:
* MINOR: h1: Consider empty port as invalid in authority for CONNECT
* MINOR: http: Considere empty ports as valid default ports
It is a bug regarding the RFC3986. Technically, I may be backported as far
as 2.4. However, this must be discussed first. If it is backported, the
commits above must be backported too.
BUG/MINOR: http-htx: Normalized absolute URIs with an empty port
Thanks to the previous commit ("MINOR: http: Considere empty ports as valid
default ports"), empty ports are now considered as valid default ports. Thus,
absolute URIs with empty port should be normalized.
* MINOR: h1: Consider empty port as invalid in authority for CONNECT
* MINOR: http: Considere empty ports as valid default ports
It is a bug regarding the RFC3986. Technically, I may be backported as far
as 2.4. However, this must be discussed first. If backported, the commits
above must be backported too.
The third one is interristing because the port is empty and it is still
considered as a default port. Thus, http_get_host_port() does no longer
return IST_NULL when the port is empty. Now, a ist is returned, it points on
the first character after the colon (':') with a length of 0. In addition,
http_is_default_port() now considers an empty port as a default port,
regardless the scheme.
This patch must not be backported, if so, without the previous one ("MINOR:
h1: Consider empty port as invalid in authority for CONNECT").
MINOR: h1: Consider empty port as invalid in authority for CONNECT
For now, this change is useless because http_get_host_port() returns
IST_NULL when the port is empty. But this will change. For other methods,
empty ports are valid. But not for CONNECT method. To still return a
400-Bad-Request if a CONNECT is performed with an empty port, istlen() is
used to test the port, instead of isttest().
In parse_log_message(), if log is rfc5424 compliant, p pointer
is incremented and size is not. However size is still used in further
checks as if p pointer was not incremented.
This could lead to logic error or buffer overflow if input buf is not
null-terminated.
Fixing this by making sure size is up to date where it is needed.
BUG/MINOR: cfgparse-listen: fix ebpt_next_dup pointer dereference on proxy "from" inheritance
ebpt_next_dup() was used 2 times in a row but only the first call was
checked against NULL, probably assuming that the 2 calls always yield the
same result here.
gcc is not OK with that, and it should be safer to store the result of
the first call in a temporary var to dereference it once checked against NULL.
This should fix GH #1869.
Thanks to Ilya for reporting this issue.
Willy Tarreau [Tue, 22 Nov 2022 09:24:07 +0000 (10:24 +0100)]
BUILD: sched: fix build with DEBUG_THREAD with the previous commit
The build with DEBUG_THREAD was broken by commit fc50b9dd1 ("BUG/MAJOR:
sched: protect task during removal from wait queue"). It took me a while
to figure how to declare and aligned and initialized rwlock that wasn't
static, but it turns out that __decl_aligned_rwlock() does exactly this,
so that we don't have to assign an integer value when a struct is expected
in case of debugging.
Willy Tarreau [Tue, 22 Nov 2022 06:05:44 +0000 (07:05 +0100)]
BUG/MAJOR: sched: protect task during removal from wait queue
The issue addressed by commit fbb934da9 ("BUG/MEDIUM: stick-table: fix
a race condition when updating the expiration task") is still present
when thread groups are enabled, but this time it lies in the scheduler.
What happens is that a task configured to run anywhere might already
have been queued into one group's wait queue. When updating a stick
table entry, sometimes the task will have to be dequeued and requeued.
For this a lock is taken on the current thread group's wait queue lock,
but while this is necessary for the queuing, it's not sufficient for
dequeuing since another thread might be in the process of expiring this
task under its own group's lock which is different. This is easy to test
using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread
groups. The process crashes almost instantly under heavy traffic.
One approach could consist in storing the group number the task was
queued under in its descriptor (we don't need 32 bits to store the
thread id, it's possible to use one short for the tid and another
one for the tgrp). Sadly, no safe way to do this was figured, because
the race remains at the moment the thread group number is checked, as
it might be in the process of being changed by another thread. It seems
that a working approach could consist in always having it associated
with one group, and only allowing to change it under this group's lock,
so that any code trying to change it would have to iterately read it
and lock its group until the value matches, confirming it really holds
the correct lock. But this seems a bit complicated, particularly with
wait_expired_tasks() which already uses upgradable locks to switch from
read state to a write state.
Given that the shared tasks are not that common (stick-table expirations,
rate-limited listeners, maybe resolvers), it doesn't seem worth the extra
complexity for now. This patch takes a simpler and safer approach
consisting in switching back to a single wq_lock, but still keeping
separate wait queues. Given that shared wait queues are almost always
empty and that otherwise they're scanned under a read lock, the
contention remains manageable and most of the time the lock doesn't
even need to be taken since such tasks are not present in a group's
queue. In essence, this patch reverts half of the aforementionned
patch. This was tested and confirmed to work fine, without observing
any performance degradation under any workload. The performance with
8 groups on an EPYC 74F3 and 3 tables remains twice the one of a
single group, with the contention remaining on the table's lock first.
Willy Tarreau [Tue, 22 Nov 2022 08:08:23 +0000 (09:08 +0100)]
BUILD: listener: fix build warning on global_listener_rwlock without threads
The global_listener_rwlock was introduced by recent commit 13e86d947
("BUG/MEDIUM: listener: Fix race condition when updating the global mngmt
task"), but it's declared static and is not used when threads are disabled,
thus causing a warning to be emitted in this case. Let's just condition it
to thread usage to shut the warning.
This will need to be backported where the patch above is backported.
Willy Tarreau [Mon, 21 Nov 2022 13:14:06 +0000 (14:14 +0100)]
MINOR: server/idle: make the next_takeover index per-tgroup
In order to evenly pick idle connections from other threads, there is
a "next_takeover" index in the server, that is incremented each time
a connection is picked from another thread, and indicates which one to
start from next time.
With thread groups this doesn't work well because the index is the same
regardless of the group, and if a group has more threads than another,
there's even a risk to reintroduce an imbalance.
This patch introduces a new per-tgroup storage in servers which, for now,
only contains an instance of this next_takeover index. This way each
thread will now only manipulate the index specific to its own group, and
the takeover will become fair again. More entries may come soon.
Willy Tarreau [Mon, 21 Nov 2022 13:32:33 +0000 (14:32 +0100)]
BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.
Willy Tarreau [Mon, 21 Nov 2022 18:15:22 +0000 (19:15 +0100)]
BUG/MINOR: resolvers: do not run the timeout task when there's no resolution
The function resolv_update_resolvers_timeout() always schedules a wakeup
of the process_resolvers() task based on the "timeout resolve" setting,
regardless of the presence of an ongoing resolution or not. This is causing
one wakeup every second by default even when there's no resolvers section
(due to the default one), and can even be worse: creating a section with
"timeout resolve 1" bombs the process with 1000 wakeups per second.
Let's condition the setting to the presence of a resolution to address
this.
This issue has been there forever, but it doesn't cause that much trouble,
and given how fragile and tricky this code is, it's probably wise to
refrain from backporting it until it's reported to really cause trouble.
Amaury Denoyelle [Mon, 14 Nov 2022 15:18:46 +0000 (16:18 +0100)]
MINOR: global: generate random cluster.secret if not defined
If no cluster-secret is defined by the user, a random one is silently
generated.
This ensures that at least QUIC Retry tokens are generated if abnormal
conditions are detected. However, it is advisable to specify it in the
configuration for tokens to be valid even after a reload or across LBs
instances in the same cluster.
Amaury Denoyelle [Mon, 14 Nov 2022 15:17:13 +0000 (16:17 +0100)]
MINOR: quic: report error if force-retry without cluster-secret
QUIC Retry generation relies on global cluster-secret to produce token
valid even after a process restart and across several LBs instances.
Before this patch, Retry is automatically deactivated if no
cluster-secret is provided. This is the case even if a user has
configured a QUIC listener with quic-force-retry. Change this behavior
by now returning an error during configuration parsing. The user must
provide a cluster-secret if quic-force-retry is used.
Willy Tarreau [Mon, 21 Nov 2022 08:34:02 +0000 (09:34 +0100)]
MINOR: cli/pools: add sorting capabilities to "show pools"
The "show pools" command is used a lot for debugging but didn't get much
love over the years. This patch brings new capabilities:
- sorting the output by pool names to ese their finding ("byname").
- sorting the output by reverse item size to spot the biggest ones("bysize")
- sorting the output by reverse number of allocated bytes ("byusage")
The last one (byusage) also omits displaying the ones with zero allocation.
In addition, an optional max number of output entries may be passed so as
to dump only the N most relevant ones.
BUG/MAJOR: quic: Crash after discarding packet number spaces
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:
"BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"
This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.
This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.
Thank you to @gabrieltz for having reported this issue in GH #1903.
Abhijeet Rastogi [Thu, 17 Nov 2022 12:42:38 +0000 (04:42 -0800)]
MINOR: cli: print parsed command when not found
It is useful because when we're passing data to runtime API, specially
via code, we can mistakenly send newlines leading to some lines being
wrongly interpretted as commands.
This is analogous to how it's done in a shell, example bash:
$ not_found arg1
bash: not_found: command not found...
$
Real world example: Following the official docs to add a cert:
Note, how the payload is sent via '<<\n$(cat ./cert.pem)\n'. If cert.pem
contains a newline between various PEM blocks, which is valid, the above
command would generate a flood of 'Unknown command' messages for every
line sent after the first newline. As a new user, this detail is not
clearly visible as socket API doesn't say what exactly what was 'unknown'
about it. The cli interface should be obvious around guiding user on
"what do do next".
This commit changes that by printing the parsed cmd in output like
'Unknown command: "<cmd>"' so the user gets clear "next steps", like
bash, regarding what indeed was the wrong command that HAproxy couldn't
interpret.
Unknown command, but maybe one of the following ones is a better match:
add map [@<ver>] <map> <key> <val> : add a map entry (payload supported instead of key/val)
Unknown command: 'helpp', but maybe one of the following ones is a better match:
add map [@<ver>] <map> <key> <val> : add a map entry (payload supported instead of key/val)
BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.
Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)
With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).
Thank you to @gabrieltz for having reported this issue.
MEDIUM: tcp-act: add parameter rst-ttl to silent-drop
The silent-drop action was extended with an additional optional parameter,
[rst-ttl <ttl> ], causing HAProxy to send a TCP RST with the specified TTL
towards the client.
With this behaviour, the connection state on your own client-
facing middle-boxes (load balancers, firewalls) will be purged,
but the client will still assume the TCP connection is up because
the TCP RST packet expires before reaching the client.
Willy Tarreau [Fri, 18 Nov 2022 16:48:49 +0000 (17:48 +0100)]
[RELEASE] Released version 2.7-dev9
Released version 2.7-dev9 with the following main changes :
- BUILD: quic: QUIC mux build fix for 32-bit build
- BUILD: scripts: disable tests build on QuicTLS build
- BUG/MEDIUM: httpclient: segfault when the httpclient parser fails
- BUILD: ssl_sock: fix null dereference for QUIC build
- BUILD: quic: Fix build for m68k cross-compilation
- BUG/MINOR: quic: fix buffer overflow on retry token generation
- MINOR: quic: add version field on quic_rx_packet
- MINOR: quic: extend pn_offset field from quic_rx_packet
- MINOR: quic: define first packet flag
- MINOR: quic: extract connection retrieval
- MINOR: quic: split and rename qc_lstnr_pkt_rcv()
- MINOR: quic: refactor packet drop on reception
- MINOR: quic: extend Retry token check function
- BUG/MINOR: log: Preserve message facility when the log target is a ring buffer
- BUG/MINOR: ring: Properly parse connect timeout
- BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient
- BUG/MEDIUM: httpclient: check if the httpclient was released in the IO handler
- REGTESTS: httpclient/lua: test the lua task timeout with the httpclient
- CI: github: dump the backtrace of coredumps in the alpine container
- BUILD: Makefile: add "USE_SHM_OPEN" on the linux-musl target
- DOC: lua: add a note about compression w/ httpclient
- CLEANUP: mworker/cli: rename the status function to loadstatus
- MINOR: mworker/cli: does no try to dump the startup-logs w/o USE_SHM_OPEN
- MINOR: list: fixing typo in MT_LIST_LOCK_ELT
- DOC/MINOR: list: fixing MT_LIST_LOCK_ELT macro documentation
- MINOR: list: adding MT_LIST_APPEND_LOCKED macro
- BUG/MINOR: mux-quic: complete flow-control for uni streams
- BUG/MEDIUM: compression: handle rewrite errors when updating response headers
- MINOR: quic: do not crash on unhandled sendto error
- MINOR: quic: display unknown error sendto counter on stat page
- MINOR: peers: Support for peer shards
- MINOR: peers: handle multiple resync requests using shards
- BUG/MINOR: sink: Only use backend capability for the sink proxies
- BUG/MINOR: sink: Set default connect/server timeout for implicit ring buffers
- MINOR: ssl: add the SSL error string when failing to load a certificate
- MINOR: ssl: add the SSL error string before the chain
- MEDIUM: ssl: be stricter about chain error
- BUG/MAJOR: stick-table: don't process store-response rules for applets
- MINOR: quic: remove unnecessary quic_session_accept()
- BUG/MINOR: quic: fix subscribe operation
- BUG/MINOR: log: fixing bug in tcp syslog_io_handler Octet-Counting
- MINOR: ssl: dump the SSL string error when SSL_CTX_use_PrivateKey() failed.
- MINOR: quic: add counter for interrupted reception
- BUG/MINOR: quic: fix race condition on datagram purging
- CI: add monthly gcc cross compile jobs
- CLEANUP: assorted typo fixes in the code and comments
- CLEANUP: ssl: remove dead code in ssl_sock_load_pem_into_ckch()
- BUG/MINOR: httpclient: fixed memory allocation for the SSL ca_file
- BUG/MINOR: ssl: Memory leak of DH BIGNUM fields
- BUG/MINOR: ssl: Memory leak of AUTHORITY_KEYID struct when loading issuer
- BUG/MINOR: ssl: ocsp structure not freed properly in case of error
- CI: switch to the "latest" LibreSSL
- CI: enable QUIC for LibreSSL builds
- BUG/MEDIUM: ssl: Verify error codes can exceed 63
- MEDIUM: ssl: {ca,crt}-ignore-err can now use error constant name
- MINOR: ssl: x509_v_err_str converter transforms an integer to a X509_V_ERR name
- CLEANUP: cli: rename dynamic error printing state
- MINOR: cli: define usermsgs print context
- MINOR: server: clear prefix on stderr logs after add server
- BUG/MINOR: ssl: bind_conf is uncorrectly accessed when using QUIC
- BUILD: ssl_utils: fix build on gcc versions before 8
- BUILD: debug: remove unnecessary quotes in HA_WEAK() calls
- CI: emit the compiler's version in the build reports
- IMPORT: xxhash: update xxHash to version 0.8.1
- IMPORT: slz: declare len to fix debug build when optimal match is enabled
- IMPORT: slz: mention the potential header in slz_finish()
- IMPORT: slz: define and use a __fallthrough statement for switch/case
- BUILD: compiler: add a macro to detect if another one is set and equals 1
- BUILD: compiler: add a default definition for __has_attribute()
- BUILD: compiler: define a __fallthrough statement for switch/case
- BUILD: sample: use __fallthrough in smp_is_rw() and smp_dup()
- BUILD: quic: use __fallthrough in quic_connect_server()
- BUILD: ssl/crt-list: use __fallthrough in cli_io_handler_add_crtlist()
- BUILD: ssl: use __fallthrough in cli_io_handler_commit_{cert,cafile_crlfile}()
- BUILD: ssl: use __fallthrough in cli_io_handler_tlskeys_files()
- BUILD: hlua: use __fallthrough in hlua_post_init_state()
- BUILD: stream: use __fallthrough in stats_dump_full_strm_to_buffer()
- BUILD: tcpcheck: use __fallthrough in check_proxy_tcpcheck()
- BUILD: stats: use __fallthrough in stats_dump_proxy_to_buffer()
- BUILD: peers: use __fallthrough in peer_io_handler()
- BUILD: hash: use __fallthrough in hash_djb2()
- BUILD: tools: use __fallthrough in url_decode()
- BUILD: args: use __fallthrough in make_arg_list()
- BUILD: acl: use __fallthrough in parse_acl_expr()
- BUILD: spoe: use __fallthrough in spoe_handle_appctx()
- BUILD: logs: use __fallthrough in build_log_header()
- BUILD: check: use __fallthrough in __health_adjust()
- BUILD: http_act: use __fallthrough in parse_http_del_header()
- BUILD: h1_htx: use __fallthrough in h1_parse_chunk()
- BUILD: vars: use __fallthrough in var_accounting_{diff,add}()
- BUILD: map: use __fallthrough in cli_io_handler_*()
- BUILD: compression: use __fallthrough in comp_http_payload()
- BUILD: stconn: use __fallthrough in various shutw() functions
- BUILD: prometheus: use __fallthrough in promex_dump_metrics() and IO handler()
- CLEANUP: ssl: remove printf in bind_parse_ignore_err
- BUG/MINOR: ssl: crt-ignore-err memory leak with 'all' parameter
- BUG/MINOR: ssl: Fix potential overflow
- CLEANUP: stick-table: remove the unused table->exp_next
- OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible
- BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task
- MEDIUM: http-ana: remove set-cookie2 support
- BUG/MEDIUM: wdt/clock: properly handle early task hangs
- MINOR: deinit: add a "quick-exit" option to bypass the deinit step
- OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's pfx
- OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's key
- MINOR: ssl: ssl_sock_load_cert_chain() display error strings
- MINOR: ssl: reintroduce ERR_GET_LIB(ret) == ERR_LIB_PEM in ssl_sock_load_pem_into_ckch()
- BUG/MINOR: http-htx: Fix error handling during parsing http replies
- BUG/MINOR: resolvers: Don't wait periodic resolution on healthcheck failure
- BUG/MINOR: resolvers: Set port before IP address when processing SRV records
- BUG/MINOR: mux-fcgi: Be sure to send empty STDING record in case of zero-copy
- BUG/MEDIUM: mux-fcgi: Avoid value length overflow when it doesn't fit at once
- BUG/MINOR: ssl: SSL_load_error_strings might not be defined
- MINOR: pool/debug: create a new pool_alloc_flag() macro
- MINOR: dynbuf: switch allocation and release to macros to better track users
- BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers
- REG-TESTS: cache: Remove T-E header for 304-Not-Modified responses
- DOC: config: fix alphabetical ordering of global section
- MINOR: trace: split the CLI "trace" parser in CLI vs statement
- MEDIUM: trace: create a new "trace" statement in the "global" section
- BUG/MEDIUM: ring: fix creation of server in uninitialized ring
- BUILD: quic: fix dubious 0-byte overflow on qc_release_lost_pkts
- BUILD: makefile: mark poll and tcploop targets as phony
- BUILD: makefile: properly pass CC to sub-projects
- BUILD: makefile: move default verbosity settings to include/make/verbose.mk
- BUILD: makefile: use $(cmd_MAKE) in quiet mode
- BUILD: makefile: move the compiler option detection stuff to compiler.mk
- DEV: poll: make the connect() step an action as well
- DEV: poll: strip the "do_" prefix from reported function names
- DEV: poll: indicate the FD's side in front of its value
- BUG/MINOR: pool/cli: use ullong to report total pool usage in bytes
- MINOR: mux-h1: Remove usless code inside shutr callback
- CLEANUP: mux-h1; Rename H1S_F_ERROR flag into H1S_F_ERROR_MASK
- REORG: mux-h1: Reorg the H1C structure
- CLEANUP: mux-h1: Rename H1C_F_ST_ERROR and H1C_F_ST_SILENT_SHUT flags
- MINOR: mux-h1: Add a dedicated enum to deal with H1 connection state
- MEDIUM: mux-h1: Handle H1C states via its state field instead of H1C_F_ST_*
- MINOR: mux-h1: Don't handle subscribe for reads in h1_process_demux()
- CLEANUP: mux-h1: Rename H1C_F_ERR_PENDING into H1C_F_ABRT_PENDING
- MINOR: mux-h1: Add flag on H1 stream to deal with internal errors
- MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for reads
- CLEANUP: mux-h1: Reorder H1 connection flags to avoid holes
- MEDIUM: mux-h1: Don't report a final error whe a message is aborted
- MEDIUM: mux-pt: Don't always set a final error on SE on the sending path
- MEDIUM: mux-h2: Introduce flags to deal with connection read/write errors
- CLEANUP: mux-h2: Remove unused fields in h2c structures
- MEDIUM: mux-fcgi: Introduce flags to deal with connection read/write errors
- MINOR: sconn: Set SE_FL_ERROR only when there is no more data to read
- MINOR: mux-h1: Rely on a H1S flag to know a WS key was found or not
- DOC: lua-api: Remove warning about the lua filters
- BUG/MEDIUM: listener: Fix race condition when updating the global mngmt task
- CLEANUP: listener: Remove useless task_queue from manage_global_listener_queue
- BUG/MINOR: mux-h1: Fix error handling when H1S allocation failed on client side
- DOC: internal: commit notes about polling states and flags
- DOC: internal: commit notes about polling states and flags on connect()
- CLEANUP: mux-h1: Don't test h1c in h1_shutw_conn()
- BUG/MINOR: http_ana/txn: don't re-initialize txn and req var lists
- BUG/MEDIUM: raw-sock: Don't report connection error if something was received
- BUG/MINOR: ssl: don't initialize the keylog callback when not required
- BUILD: Makefile: enable USE_SHM_OPEN by default on freebsd
- BUG/MEDIUM: peers: messages about unkown tables not correctly ignored
- MINOR: cfgparse: Always check the section position
- MEDIUM: thread: Restric nbthread/thread-group(s) to very first global sections
- BUILD: peers: Remove unused variables
- MINOR: ncbuf: complete doc for ncb_advance()
- BUG/MEDIUM: quic: fix unsuccessful handshakes on ncb_advance error
- BUG/MEDIUM: quic: fix memleak for out-of-order crypto data
- MINOR: quic: complete traces/debug for handshake
Amaury Denoyelle [Fri, 18 Nov 2022 14:24:08 +0000 (15:24 +0100)]
MINOR: quic: complete traces/debug for handshake
Add more traces to follow CRYPTO data buffering in ncbuf. Offset for
quic_enc_level is now reported for event QUIC_EV_CONN_PRHPKTS. Also
ncb_advance() must never fail so a BUG_ON() statement is here to
guarantee it.
This was useful to track handshake failure reported too often. This is
related to github issue #1903.
Amaury Denoyelle [Thu, 17 Nov 2022 09:12:52 +0000 (10:12 +0100)]
BUG/MEDIUM: quic: fix memleak for out-of-order crypto data
Liberate quic_enc_level ncbuf in quic_stream_free(). In most cases, this
will already be done when handshake is completed via
qc_treat_rx_crypto_frms(). However, if a connection is released before
handshake completion, a leak was present without this patch.
Under normal situation, this leak should have been limited due to the
majority of QUIC connection success on handshake. However, another bug
caused handshakes to fail too frequently, especially with chrome client.
This had the side-effect to dramatically increase this memory leak.
Amaury Denoyelle [Fri, 18 Nov 2022 13:50:06 +0000 (14:50 +0100)]
BUG/MEDIUM: quic: fix unsuccessful handshakes on ncb_advance error
QUIC handshakes were frequently in error due to haproxy misuse of
ncbuf. This resulted in one of the following scenario :
- handshake rejected with CONNECTION_CLOSE due to overlapping data
rejected
- CRYPTO data fully received by haproxy but handshake completion signal
not reported causing the client to emit PING repeatedly before timeout
This was produced because ncb_advance() result was not checked after
providing crypto data to the SSL stack in qc_provide_cdata(). However,
this operation can fail if a too small gap is formed. In the meantime,
quic_enc_level offset was always incremented. In some cases, this caused
next ncb_add() to report rejected overlapping data. In other cases, no
error was reported but SSL stack never received the end of CRYPTO data.
Change slightly the handling of new CRYPTO frames to avoid this bug :
when receiving a CRYPTO frame for the current offset, handle them
directly as previously done only if quic_enc_level ncbuf is empty. In
the other case, copy them to the buffer before treating contiguous data
via qc_treat_rx_crypto_frms().
This change ensures that ncb_advance() operation is now conducted only
in a data block : thus this is guaranteed to never fail.
This bug was easily reproduced with chromium as it fragments CRYPTO
frames randomly in several frames out of order.
This commit has two drawbacks :
- it is slightly less worst on performance because as sometimes even
data at the current offset will be memcpy
- if a client uses too many fragmented CRYPTO frames, this can cause
repeated ncb_add() error on gap size. This can be reproduced with
chrome, albeit with a slighly less frequent rate than the fixed issue.
This change should fix in part github issue #1903.
Since 0909f62266 ("BUG/MEDIUM: peers: messages about unkown tables not
correctly ignored"), the 'sc' variable is no longer used in
peer_treat_updatemsg() and peer_treat_definemsg() functions. So, we must
remove them to avoid compilation warning.
This patch must be backported with the commit above.
MEDIUM: thread: Restric nbthread/thread-group(s) to very first global sections
nbhread, thead-group and thread-groups directives must only be defined in
very first global sections. It means no other section must have been parsed
before. Indeed, some parts of the configuratio depends on the value of these
settings and it is undefined to change them after.
MINOR: cfgparse: Always check the section position
In diag mode, the section position is checked and a warning is emitted if a
global section is defined after any non-global one. Now, this check is
always performed. But the warning is still only emitted in diag mode. In
addition, the result of this check is now stored in a global variable, to be
used from anywhere.
The aim of this patch is to be able to restrict usage of some global
directives to the very first global sections. It will be useful to avoid
undefined behaviors. Indeed, some config parts may depend on global settings
and it is a problem if these settings are changed after.
Emeric Brun [Fri, 18 Nov 2022 13:52:54 +0000 (14:52 +0100)]
BUG/MEDIUM: peers: messages about unkown tables not correctly ignored
Table defintion's messages and update messages are not correctly
ignored if the table is not configured on the local peer.
It is a bug because, receiving those messages, the parser
returns an error and the upper layer considers that the state of the
peer's connection is modified (as it is done in the case of protocol
error) and switch immediatly the automate to process the new state.
But, even if message is silently ignored because the connection's
state doesn't change and we continue to process the next message, some
processing remains not performed: for instance the ALIVE flag is not set
on the peer's connection as it should be done after receiving any valid
messages. This results in a shutdown of the connection when timeout
is elapsed as if no message has been received during this delay.
This patch fix the behavior, those messages are now silently ignored
and the upper layer continue the processing as it is done for any valid
messages.
This bug appears with the code re-work of the peers on 2.0 so
it should be backported until this version.
BUG/MINOR: ssl: don't initialize the keylog callback when not required
The registering of the keylog callback seems to provoke a loss of
performance. Disable the registration as well as the fetches if
tune.ssl.keylog is off.
BUG/MEDIUM: raw-sock: Don't report connection error if something was received
In raw_sock_to_buf(), if a low-level error is reported, we no longer
immediately set an error on the connexion if something was received. This
may happen when a RST is received with data. This way, we let a chance to
the mux to process received data first instead of immediately aborting.
This patch should fix some spurious health-check failures. It is pretty hard
to observe, but with a server immediately returning the response followed by
a RST, without waiting the request, it is possible to have some health-check
errors. For instance, with the following tcploop server:
tcploop 8000 L Q W N1 A S:"HTTP/1.0 200 OK\r\n\r\n" F K
( Accept -> send response -> FIN -> Close)
The response was received, but it is ignored because an error was reported
too. The error handling must be refactored. But it a titanic stain. Thus,
for now, a good fix is to delay the error report when something was
received. The error will be reported on the next receive, if any.
This patch should fix the issue #1863, but it must be confirmed. At least it
fixes the above example. It must be backported to 2.6. For older versions,
it must be evaluated first.
BUG/MINOR: http_ana/txn: don't re-initialize txn and req var lists
In http_create_txn(): vars_init_head() was performed on both s->vars_txn
and s->var_reqres lists.
But this is wrong, these two lists are already initialized upon stream
creation in stream_new().
Moreover, between stream_new() and http_create_txn(), some variable may
be defined (e.g.: by the frontend), resulting in lists not being empty.
Because of this "extra" list initialization, already defined variables
can be lost.
This causes txn dependant code not being able to access previously defined
variables as well as memory leak because http_destroy_txn() relies on these
lists to perform the purge.
This proved to be the case when a frontend sets variables and lua sample
fetch is used in backend section as described in GH #1935.
Many thanks to Darragh O'Toole for his detailed report.
Removing extra var_init_head (x2) in http_create_txn() to fix the issue.
Adding somme comments in the code in an attempt to prevent future misuses
of s->var_reqres, and s->var_txn lists.
It should be backported in every stable version.
(This is an old bug that seems to exist since 1.6-dev6)
[cf: On 2.0 and 1.8, for the legacy HTTP code, vars_init() are used during
the TXN cleanup, when the stream is reused. So, these calls must be
moved from http_init_txn() to http_reset_txn() and not removed.]
Willy Tarreau [Thu, 17 Nov 2022 15:14:23 +0000 (16:14 +0100)]
DOC: internal: commit notes about polling states and flags on connect()
Let's keep these notes as references for later use. Polling on connect()
can sometimes return a few unexpected state combinations that such tests
illustrate. They can serve as reminders for special error handling.
BUG/MINOR: mux-h1: Fix error handling when H1S allocation failed on client side
The goto label is not at the right place. When H1S allocation failed, the
error is immediately handled. Thus, "no_parsing" label must be set just
after h1_send() call to skip the request parsing part.
CLEANUP: listener: Remove useless task_queue from manage_global_listener_queue
At the end of manage_global_listener_queue(), the task expire date is set to
TICK_ETERNITY. Thus, it is useless to call task_queue() just after because
the function does nothing in this case.
BUG/MEDIUM: listener: Fix race condition when updating the global mngmt task
It is pretty similar to fbb934da90 ("BUG/MEDIUM: stick-table: fix a race
condition when updating the expiration task"). When the global management
task is running, at the end of its process function, it resets the expire
date by setting it to TICK_ETERNITY. In same time, a listener may reach a
global limit and decides to schedule the task. Thus it is possible to queue
the task and trigger the BUG_ON() on the expire date because its value was
set to TICK_ETERNITY in the means time:
To fix the bug, a RW lock is introduced. It is used to fix the race
condition. A read lock is taken when the task is scheduled, in
listener_accpet() and a write lock is used at the end of process function to
set the expire date to TICK_ETERNITY. This lock should not be used very
often and most of time by "readers". So, the impact should be really
limited.
This patch should fix the issue #1930. It must be backported as far as 1.8
with some cautions because the code has evolved a lot since then.
MINOR: mux-h1: Rely on a H1S flag to know a WS key was found or not
h1_process_mux() is written to allow partial headers formatting. For now,
all headers are forwarded in one time. But it is still good to keep this
ability at the H1 mux level. So we must rely on a H1S flag instead of a
local variable to know a WebSocket key was found in headers to be able to
generate a key if necessary.
MINOR: sconn: Set SE_FL_ERROR only when there is no more data to read
SE_FL_ERR_PENDING flag is used when there is still data to be read. So we
must take care to not set SE_FL_ERROR too early. Thus, on sending path, it
must be set if SE_FL_EOS was already set.
MEDIUM: mux-fcgi: Introduce flags to deal with connection read/write errors
Similarly to the H1 and H2 multiplexers, FCFI_CF_ERR_PENDING is now used to
report an error when we try to send data and FCGI_CF_ERROR to report an
error when we try to read data. In other funcions, we rely on these flags
instead of connection ones. Only FCGI_CF_ERROR is considered as a final
error. FCGI_CF_ERR_PENDING does not block receive attempt.
In addition, FCGI_CF_EOS flag was added. we rely on it to test if a read0
was received or not.
CLEANUP: mux-h2: Remove unused fields in h2c structures
Some fields in h2c structures are not used: .mfl, .mft and .mff. Just remove
them.
.msi field is also removed. It is tested but never set, except when a H2
connection is initialized. It also means h2c_mux_busy() function is useless
because it always returns 0 (.msi is always -1). And thus, by transitivity,
H2_CF_DEM_MBUSY is also useless because it is never set. So .msi field,
h2c_mux_busy() function and H2C_MUX_BUSY flag are removed.
MEDIUM: mux-h2: Introduce flags to deal with connection read/write errors
Similarly to the H1 multiplexer, H2_CF_ERR_PENDING is now used to report an
error when we try to send data and H2_CF_ERROR to report an error when we
try to read data. In other funcions, we rely on these flags instead of
connection ones. Only H2_CF_ERROR is considered as a final error.
H2_CF_ERR_PENDING does not block receive attempt.
In addition, we rely on H2_CF_RCVD_SHUT flag to test if a read0 was received
or not.
MEDIUM: mux-pt: Don't always set a final error on SE on the sending path
SE_FL_ERROR must be set on the SE descriptor only if EOS was already
reported. So call se_fl_set_error() function to properly the
ERR_PENDING/ERROR flags. It is not really a bug because the mux-pt is really
simple. But it is better to do it now the right way.
MEDIUM: mux-h1: Don't report a final error whe a message is aborted
When the H1 connection is aborted, we no longer set a final error. To do so,
the flag H1C_F_ABORTED was added. For now, it is only set when a error is
detected on the H1 stream. Idea is to use ERR_PENDING/ERROR for upgoing
errors and ABRT_PENDING/ABRTED for downgoing errors.
MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for reads
read0 is now handled with a H1 connection flag (H1C_F_EOS). Corresponding
flag was removed on the H1 stream and we fully rely on the SE descriptor at
the stream level.
Concretly, it means we rely on the H1 connection flags instead of the
connection one. H1C_F_EOS is only set in h1_recv() or h1_rcv_pipe() after a
read if a read0 was detected.
MINOR: mux-h1: Add flag on H1 stream to deal with internal errors
A new error is added on H1 stream to deal with internal errors. For now,
this error is only reported when we fail to create a stream-connector. This
way, the error is reported at the H1 stream level and not the H1 connection
level.
CLEANUP: mux-h1: Rename H1C_F_ERR_PENDING into H1C_F_ABRT_PENDING
H1C_F_ERR_PENDING flags will be used to refactor error handling at the H1
connection level. It will be used to notify error during sends. Thus, the
flag to notify an error must be sent before closing the connection is now
named H1C_F_ABRT_PENDING.
This introduce a naming convertion: ERROR must be used to notify upper layer
of an event at the lower ones while ABORT must be used in the opposite
direction.
MINOR: mux-h1: Don't handle subscribe for reads in h1_process_demux()
When the request headers are not fully received, we must subscribe the H1
connection for reads to be able to receive more data. This was performed in
h1_process_demux(). It is now perfoemd in h1_process_demux().
MEDIUM: mux-h1: Handle H1C states via its state field instead of H1C_F_ST_*
The H1 connection state is now handled in a dedicated state. H1C_F_ST_*
flags are removed. All states are now exclusives. It is easier to know the
H1 connection states. It is alive, or usable, if it is not CLOSING or
CLOSED. It is CLOSING if it should be closed ASAP but a stream is still
attached and/or the output buffer is not empty. CLOSED is used when the H1
connection is ready to be closed. Other states are quite easy to understand.
There is no special changes in the H1 connection behavior. Except in
h1_send(). When a CLOSING connection is CLOSED, the function now reports an
activity. In addition, when an embryonic H1 stream is aborted, it is
destroyed. This way, the H1 connection can be switched to CLOSED state.
MINOR: mux-h1: Add a dedicated enum to deal with H1 connection state
The H1 connection state will be handled is a dedicated field. To do so,
h1_cs enum was added. The different states are more or less equivalent to
H1C_F_ST_* flags:
In addition, in this patch, the h1_is_alive() and h1_close() function are
added. The first one will be used to know if a H1 connection is alive or
not. The second one will be used to set the connection in CLOSING or CLOSED
state, depending on the output buffer state and if there is still a H1
stream or not.
Fields in H1C structure are reorganised to not have the output buffer
straddled between to cache lines. There is 4-bytes hole after the flags, but
it will be partially filled by an enum representing the H1 connection state.
For now, at the transport-layer level, there is no shutr callback (in
xprt_ops). Thus, to ease the aborts refacotring, the code is removed from
the h1_shutr() function. The callback is not removed for now. It is only
kept to have a trace message. It may be handy for debugging sessions.
Willy Tarreau [Thu, 17 Nov 2022 10:08:03 +0000 (11:08 +0100)]
BUG/MINOR: pool/cli: use ullong to report total pool usage in bytes
As noticed by Gabriel Tzagkarakis in issue #1903, the total pool size
in bytes is historically still in 32 bits, but at least we should report
the product of the number of objects and their size in 64 bits so that
the value doesn't wrap around 4G.