Willy Tarreau [Fri, 20 Nov 2020 16:22:44 +0000 (17:22 +0100)]
BUG/MAJOR: connection: reset conn->owner when detaching from session list
Baptiste reported a new crash affecting 2.3 which can be triggered
when using H2 on the backend, with http-reuse always and with a tens
of clients doing close only. There are a few combined cases which cause
this to happen, but each time the issue is the same, an already freed
session is dereferenced in session_unown_conn().
Two cases were identified to cause this:
- a connection referencing a session as its owner, which is detached
from the session's list and is destroyed after this session ends.
The test on conn->owner before calling session_unown_conn() is not
sufficent as the pointer is not null but is not valid anymore.
- a connection that never goes idle and that gets killed form the
mux, where session_free() is called first, then conn_free() calls
session_unown_conn() which scans the just freed session for older
connections. This one is only triggered with DEBUG_UAF
The reason for this session to be present here is that it's needed during
the connection setup, to be passed to conn_install_mux_be() to mux->init()
as the owning session, but it's never deleted aftrewards. Furthermore, even
conn_session_free() doesn't delete this pointer after freeing the session
that lies there. Both do definitely result in a use-after-free that's more
easily triggered under DEBUG_UAF.
This patch makes sure that the owner is always deleted after detaching
or killing the session. However it is currently not possible to clear
the owner right after a synchronous init because the proxy protocol
apparently needs it (a reg test checks this), and if we leave it past
the connection setup with the session not attached anywhere, it's hard
to catch the right moment to detach it. This means that the session may
remain in conn->owner as long as the connection has never been added to
nor removed from the session's idle list. Given that this patch needs to
remain simple enough to be backported, instead it adds a workaround in
session_unown_conn() to detect that the element is already not attached
anywhere.
This fix absolutely requires previous patch "CLEANUP: connection: do not
use conn->owner when the session is known" otherwise the situation will
be even worse, as some places used to rely on conn->owner instead of the
session.
The fix could theorically be backported as far as 1.8. However, the code
in this area has significantly changed along versions and there are more
risks of breaking working stuff than fixing real issues there. The issue
was really woken up in two steps during 2.3-dev when slightly reworking
the idle conns with commit 08016ab82 ("MEDIUM: connection: Add private
connections synchronously in session server list") and when adding
support for storing used H2 connections in the session and adding the
necessary call to session_unown_conn() in the muxes. But the same test
managed to crash 2.2 when built in DEBUG_UAF and patched like this,
proving that we used to already leave dangling pointers behind us:
It's uncertain whether an existing code path there can lead to dereferencing
conn->owner when it's bad, though certain suspicious memory corruption bugs
make one think it's a likely candidate. The patch should not be hard to
adapt there.
Backports to 2.1 and older are left to the appreciation of the person
doing the backport.
A reproducer consists in this:
global
nbthread 1
listen l
bind :9000
mode http
http-reuse always
server s 127.0.0.1:8999 proto h2
frontend f
bind :8999 proto h2
mode http
http-request return status 200
Then this will make it crash within 2-3 seconds:
$ h1load -e -r 1 -c 10 http://0:9000/
If it does not, it might be that DEBUG_UAF was not used (it's harder then)
and it might be useful to restart.
Willy Tarreau [Fri, 20 Nov 2020 16:08:15 +0000 (17:08 +0100)]
CLEANUP: connection: do not use conn->owner when the session is known
At a few places we used to rely on conn->owner to retrieve the session
while the session is already known. This is not correct because at some
of these points the reason the connection's owner was still the session
(instead of NULL) is a mistake. At one place a comparison is even made
between the session and conn->owner assuming it's valid without checking
if it's NULL. Let's clean this up to use the session all the time.
Note that this will be needed for a forthcoming fix and will have to be
backported.
Joao Morais [Sat, 21 Nov 2020 10:42:20 +0000 (07:42 -0300)]
DOC: clarify how to create a fallback crt
HAProxy uses CN and SAN of the certificates to match incoming SNI, and
use the matching certificate in the TLS handshake. `crt-list` goes
further and allows to configure SNI filters to explicitly define the
FQDNs that should match a certificate.
The first declared certificate of the `crt-list` option follows the same
rules, and it's also used as a fallback - the certificate that should be
used if SNI isn't provided or the provided one cannot match any
certificate or SNI filter. If a provided SNI matches the CN or SAN of
the first certificate, the first certificate would be used even if a
matching SNI filter is declared later.
This change clarifies this scenario and documents a filter that can be
used to convert the first declared certificate as a proper fallback.
Should be merged as far as the first SNI filter implementation.
Tim Duesterhus [Thu, 19 Nov 2020 23:16:01 +0000 (00:16 +0100)]
CI: Clean up Windows CI
This patch cleans up the Windows CI to look more similar to the refactored
Linux CI on GitHub Actions.
It switches the environment set-up from some manual cygwin setup via choco to
the msys2/setup-msys2@v2 action which just works and allows later steps to look
like any others without need to manually specify the shell.
This new setup is much faster than before where a single Windows build required
more than 10 minutes with more than 5 minutes just spent setting up the
environment and more than 6 minutes compiling HAProxy.
With this patch the setting of of the environment is done in less than a minute
and HAProxy is compiled in less than 2 minutes.
The only drawback is that Lua does not appear to be readily available. I expect
this to be acceptable and that the benefits far outweight this small drawback.
Ilya Shipitsin [Sat, 21 Nov 2020 09:37:34 +0000 (14:37 +0500)]
BUILD: SSL: guard TLS13 ciphersuites with HAVE_SSL_CTX_SET_CIPHERSUITES
HAVE_SSL_CTX_SET_CIPHERSUITES is newly defined macro set in openssl-compat.h,
which helps to identify ssl libs (currently OpenSSL-1.1.1 only) that supports
TLS13 cipersuites manipulation on TLS13 context
BUG/MEDIUM: ssl/crt-list: bundle support broken in crt-list
In issue #970 it was reported that the bundle loading does not work
anymore with crt-list.
This bug was introduced by 47da821 ("MEDIUM: ssl: emulates the
multi-cert bundles in the crtlist") which incorrectly uses "path"
instead of "crt_path" in the name resolution.
BUG/MEDIUM: http-ana: Don't eval http-after-response ruleset on empty messages
It is not possible on response comming from a server, but an errorfile may be
empty. In this case, the http-after-response ruleset must not be evaluated
because it is totally unexpected to manipulate headers on an empty HTX message.
This patch must be backported everywhere the http-after-response rules are
supported, i.e as far as 2.2.
BUG/MINOR: ssl: segv on startup when AKID but no keyid
In bug #959 it was reported that haproxy segfault on startup when trying
to load a certifcate which use the X509v3 AKID extension but without the
keyid field.
This field is not mandatory and could be replaced by the serial or the
DirName.
Willy Tarreau [Wed, 18 Nov 2020 18:53:45 +0000 (19:53 +0100)]
DOC: coding-style: update a few rules about pointers
It's really annoying to see that in 2020 we're still facing bugs caused
by dangling pointers in the code that result from poorly written rules
about how these pointers are supposed to be handled, set and reset. Let's
add a few supposedly obvious (but apparently not) rules about how pointers
have to be used through out the code in hope to make such bad practices
disappear (or at least have something to point the authors to after
reviewing their code).
William Dauchy [Sat, 14 Nov 2020 18:25:33 +0000 (19:25 +0100)]
MEDIUM: cli/ssl: configure ssl on server at runtime
in the context of a progressive backend migration, we want to be able to
activate SSL on outgoing connections to the server at runtime without
reloading.
This patch adds a `set server ssl` command; in order to allow that:
- add `srv_use_ssl` to `show servers state` command for compatibility,
also update associated parsing
- when using default-server ssl setting, and `no-ssl` on server line,
init SSL ctx without activating it
- when triggering ssl API, de/activate SSL connections as requested
- clean ongoing connections as it is done for addr/port changes, without
checking prior server state
example config:
backend be_foo
default-server ssl
server srv0 127.0.0.1:6011 weight 1 no-ssl
Tim Duesterhus [Fri, 13 Nov 2020 19:15:53 +0000 (20:15 +0100)]
CI: Make the h2spec workflow more consistent with the VTest workflow
This PR aims to make the workflow more consistent, by reusing the same wording
for the step names and the same commands to make it look like the vtest
workflow as much as possible.
It was renamed to compliance.yml to match the human readable name better. This
also allows to extend the workflow with other compliance tools later on, nicely
grouping those jobs together in a single file.
William Dauchy [Sun, 15 Nov 2020 13:04:43 +0000 (14:04 +0100)]
REGTESTS: converter: add url_dec test
while looking at `url_dec` implementation I realised there was not yet a
simple test to avoid future regressions.
This one is testing simple case, including the "+" behaviour depending
on the argument passed to `url_dec`
Willy Tarreau [Tue, 17 Nov 2020 10:45:10 +0000 (11:45 +0100)]
REGTESTS: mark the abns test as broken again
There is apparently a race in this test that would require relying on
haproxy's output to make it reliably work, but currently vtest doesn't
have this option. Let's mark it broken again to avoid polluting the CI.
REGTESTS: Add a script to test the random forwarding with several filters
Three filters are used. The compression filter is enclose by two trace filters,
both with the random forwarding enabled. An HTTP test is performed but also a
TCP test using an HTTP tunnel.
BUG/MINOR: http-ana: Don't wait for the body of CONNECT requests
CONNECT requests are bodyless messages but with no EOM blocks. Thus, conditions
to stop waiting for the message payload are not suited to this kind of
messages. Indeed, the message finishes on an EOH block. But the tunnel mode at
the stream level is only set in HTTP_XFER_BODY analyser. So, the stream is
blocked, waiting for a body that does not exist till a timeout expires.
To fix this bug, we just stop waiting for a body for CONNECT requests. Another
solution is to rely on HTX_SL_F_BODYLESS/HTTP_MSGF_BODYLESS flags. But this one
is less intrusive.
This message must be backported as far as 2.0. For the 2.0, only the HTX part
must be fixed.
BUG/MEDIUM: filters: Forward all filtered data at the end of http filtering
When http filtering ends, if there are some filtered data not forwarded yet, we
forward them, in flt_http_end(). Most of time, this doesn't happen, except when
a tunnel is established using a CONNECT. In this case, there is not EOM on the
request and there is no body. Thus the headers are never forwarded, blocking the
stream.
This patch must be backported as far as 2.0. Prior versions don't suffer of this
bug because there is no HTX support. On the 2.0, the change is only applicable
on HTX streams. A special test must be performed to make sure.
CLEANUP: config: Return ERR_NONE from config callbacks instead of 0
Return ERR_NONE instead of 0 on success for all config callbacks that should
return ERR_* codes. There is no change because ERR_NONE is a macro equals to
0. But this makes the return value more explicit.
MINOR: init: Fix the prototype for per-thread free callbacks
Functions registered to release memory per-thread have no return value. But the
registering function and the function pointer in per_thread_free_fct structure
specify it should return an integer. This patch fixes it.
BUG/MINOR: tcpcheck: Don't warn on unused rules if check option is after
When tcp-check or http-check rules are used, if the corresponding check option
(option tcp-check and option httpchk) is declared after the ruleset, a warning
is emitted about an unused check ruleset while there is no problem in reality.
MINOR: spoe: Don't close connection in sync mode on processing timeout
In sync mode, if an applet receives a ack while the processing delay has already
expired, there is not frame waiting for this ack. But there is no reason to
close the connection in this case. The ack may be ignored and the connection may
be reused to process another frame. The only reason to trigger an error and
close the connection is when the wrong ack is received while there is still a
frame waiting for its ack. In sync mode, this should never happen.
This patch may be backported in all versions supporting the SPOE.
BUG/MAJOR: spoe: Be sure to remove all references on a released spoe applet
When a SPOE applet is used to send a frame, a reference on this applet is saved
in the spoe context of the offladed stream. But, if the applet is released
before receving the corresponding ack, we must be sure to remove this
reference. This was performed for fragmented frames only. But it must also be
performed for a spoe contexts in the applet waiting_queue and in the thread
waiting_queue (used in async mode).
This bug leads to a memory corruption when an offloaded stream try to update the
state of a released applet because it still have a reference on it. There are
many ways to trigger this bug. The easiest is probably during reloads. On the
old process, all applets are woken up to be released ASAP.
Many thanks to Maciej Zdeb to report the bug and to work on it for 2
months. Without his help, it would have been much more difficult to fix the
bug. It is always a huge pleasure to see how some users are enthousiast and
helpful. Thanks again Maciej !
This patch must be backported to all versions where the spoe is supported (>=
1.7).
BUG/MINOR: http-htx: Handle warnings when parsing http-error and http-errors
First of all, this patch is tagged as a bug. But in fact, it only fixes a bug in
the 2.2. On the 2.3 and above, it only add the ability to display warnings, when
an http-error directive is parsed from a proxy section and when an errorfile
directive is parsed from a http-errors section.
But on the 2.2, it make sure to display the warning emitted on a content-length
mismatch when an errorfile is parsed. The following is only applicable to the
2.2.
commit "BUG/MINOR: http-htx: Just warn if payload of an errorfile doesn't match
the C-L" (which is only present in 2.2, 2.1 and 2.0 trees, i.e see commit 7bf3d81d3cf4b9f4587 in 2.2 tree), is changing the behavior of `http_str_to_htx`
function. It may now emit warnings. And, it is the caller responsibility to
display it.
But the warning is missing when an 'http-error' directive is parsed from
a proxy section. It is also missing when an 'errorfile' directive is
parsed from a http-errors section.
This bug only exists on the 2.2. On earlier versions, these directives
are not supported and on later ones, an error is triggered instead of a
warning.
Amaury Denoyelle [Fri, 13 Nov 2020 11:34:58 +0000 (12:34 +0100)]
MINOR: check: report error on incompatible connect proto
Report an error when using an explicit proto for a connect rule with
non-compatible mode in regards with the selected check type (tcp-check
vs http-check).
Amaury Denoyelle [Fri, 13 Nov 2020 11:34:57 +0000 (12:34 +0100)]
MINOR: check: report error on incompatible proto
If the check mux has been explicitly defined but is incompatible with
the selected check type (tcp-check vs http-check), report a warning and
prevent haproxy startup.
Amaury Denoyelle [Fri, 13 Nov 2020 11:34:56 +0000 (12:34 +0100)]
BUG/MEDIUM: check: reuse srv proto only if using same mode
Only reuse the mux from server if the check is using the same mode.
For example, this prevents a tcp-check on a h2 server to select the h2
multiplexer instead of passthrough.
This bug was introduced by the following commit :
BUG/MEDIUM: checks: Use the mux protocol specified on the server line
It must be backported up to 2.2.
BUG/MINOR: http-fetch: Fix calls w/o parentheses of the cookie sample fetches
req.cook, req.cook_val, req.cook_cnt and and their response counterparts may be
called without cookie name. In this case, empty parentheses may be used, or no
parentheses at all. In both, the result must be the same. But only the first one
works. The second one always returns a failure. This patch fixes this bug.
Note that on old versions (< 2.2), both cases fail.
This patch must be backported in all stable versions.
Maciej Zdeb [Fri, 13 Nov 2020 09:38:06 +0000 (09:38 +0000)]
BUG/MINOR: http-fetch: Extract cookie value even when no cookie name
HTTP sample fetches dealing with the cookies (req/res.cook,
req/res.cook_val and req/res.cook_cnt) must be prepared to be called
without cookie name. For the first two, the first cookie value is
returned, regardless its name. For the last one, all cookies are counted.
To do so, http_extract_cookie_value() may now be called with no cookie
name (cookie_name_l set to 0). In this case, the matching on the cookie
name is ignored and the first value found is returned.
Note this patch also fixes matching on cookie values in ACLs.
Willy Tarreau [Fri, 13 Nov 2020 13:10:20 +0000 (14:10 +0100)]
BUG/MEDIUM: peers: fix decoding of multi-byte length in stick-table messages
There is a bug in peer_recv_msg() due to an incorrect cast when trying
to decode the varint length of a stick-table message, causing lengths
comprised between 128 and 255 to consume one extra byte, ending in
protocol errors. The root cause of this is that peer_recv_msg() tries
hard to reimplement all the parsing and control that is already done in
intdecode() just to measure the length before calling it. And it got it
wrong.
Let's just get rid of this unneeded code duplication and solely rely on
intdecode() instead. The bug was introduced in 2.0 as part of a cleanup
pass on this code with commit 95203f218 ("MINOR: peers: Move high level
receive code to reduce the size of I/O handler."), so this patch must
be backported to 2.0.
The TX part of a cache for a dictionary is made of an reserved array of ebtree nodes
which are pointers to dictionary entries. So when we flush the TX part of such a
cache, we must not only remove these nodes to dictionary entries from their ebtree.
We must also reset their values. Furthermore, the LRU key and the last lookup
result must also be reset.
BUG/MINOR: peers: Do not ignore a protocol error for dictionary entries.
If we could not decode the ID of a dictionary entry from a peer update message,
we must inform the remote peer about such an error as this is done for
any other decoding error.
Amaury Denoyelle [Tue, 10 Nov 2020 13:24:31 +0000 (14:24 +0100)]
BUG/MEDIUM: stats: prevent crash if counters not alloc with dummy one
Define a per-thread counters allocated with the greatest size of any
stat module counters. This variable is named trash_counters.
When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent
segfault.
This is useful for all the proxies used internally and not
belonging to the global proxy list. As these objects does not appears on
the stat report, it does not matter to use the dummy counters.
For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.
Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There is probably other parts that may be impacted (SPOE for example).
This bug was introduced in the current release and do not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
Do not cache responses that do not have an explicit expiration time
(s-maxage or max-age Cache-Control directives or Expires header) or a
validator (ETag or Last-Modified headers) anymore, as suggested in
RFC 7234#3.
The TX_FLAG_IGNORE flag is used instead of the TX_FLAG_CACHEABLE so as
not to change the behavior of the checkcache option.
Thierry Fournier [Tue, 10 Nov 2020 19:38:20 +0000 (20:38 +0100)]
BUG/MINOR: lua: set buffer size during map lookups
This size is used by some pattern matching to determine if there
is sufficient room in the buffer to add final \0 if necessary.
If the size is not set, the conditions use uninitialized value.
Thierry Fournier [Tue, 10 Nov 2020 19:51:36 +0000 (20:51 +0100)]
BUG/MINOR: pattern: a sample marked as const could be written
The functions add final 0 to string if the final 0 is not set,
but don't check the flag CONST. This patch duplicates the strings
if the final zero is not set and the string is CONST.
Tim Duesterhus [Tue, 28 Jul 2020 21:00:35 +0000 (23:00 +0200)]
CI: Expand use of GitHub Actions for CI
Travis is becoming overall increasingly unreliable lately. We've already
seen that the timing sensitive tests regularly fail and thus they were
disabled.
Additionally they recently announced a new pricing model that caps the number
of minutes for Open Source projects:
https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing
GitHub Actions VMs are working well, possibly allowing to use custom runners
for special tasks in the future.
In addition to this better performance its workflow configuration language
is more expressive compared to the Travis CI one. Specifically the build
matrix does not need to be specified in YAML. Instead it can be generated
ad-hoc using a script. This allows us to cleanly define the various build
configurations without having an unreadable 80 line mess where the flags
are inconsistently activated. As an example in the current Travis CI
configuration the prometheus exporter is tested together with LibreSSL 2.9.2
for whatever reason.
In addition to all the previous points the UI of Travis is not that nice.
On GitHub you are just seeing that "Travis failed" without any details which
exact job failed. This requires you to visit the slow Travis page and look
up the details there. GitHub Actions creates a single entry for each
configuration that is tested, allowing you to see the details without needing
to leave GitHub.
This new GitHub Actions workflow aims to reproduce the configurations tested
in Travis. It comes close, but is not completely there, yet. Consider this
patch a proof of concept that will evolve in the future, ideally with Ilya's
expertise.
The current configurations are as follows. Each one is tested with both gcc
and clang.
- All features disabled (no USE flags)
- All features enabled (all USE flags)
- Standalone test of each of the supported compression libraries:
- USE_ZLIB=1
- USE_SLZ=1
- Standalone test of various SSL libraries:
- stock (the SSL installed by default on the VM)
- OpenSSL 1.0.2u
- LibreSSL 2.9.2, 3.0.2, 3.1.1
- All features enabled with ASAN (clang only)
Future additions of new tests should take care to not test unrelated stuff.
Instead a distinct configuration should be added.
Additionally there is a Mac OS test with clang and all features disabled.
Known issues:
- Apparently the git commit is not properly detected during build. The HEAD
currently shows as 2.4-dev0.
BUG/MEDIUM: ssl/crt-list: correctly insert crt-list line if crt already loaded
In issue #940, it was reported that the crt-list does not work correctly
anymore. Indeed when inserting a crt-list line which use a certificate
previously seen in the crt-list, this one won't be inserted in the SNI
list and will be silently ignored.
This bug was introduced by commit 47da821 "MEDIUM: ssl: emulates the
multi-cert bundles in the crtlist".
This patch also includes a reg-test which tests this issue.
REGTEST: ssl: test wildcard and multi-type + exclusions
This test checks that the bug #818 and #810 are fixed.
It test if there is no inconsistency with multiple certificate types and
that the exclusion of the certificate is correctly working with a negative
filter.
Willy Tarreau [Fri, 6 Nov 2020 13:24:02 +0000 (14:24 +0100)]
BUILD: http-htx: fix build warning regarding long type in printf
Commit a66adf41e ("MINOR: http-htx: Add understandable errors for the
errorfiles parsing") added a warning when loading malformed error files,
but this warning may trigger another build warning due to the %lu format
used. Let's simply cast it for output since it's just used for end user
output.
This must be backported to 2.0 like the commit above.
Willy Tarreau [Fri, 6 Nov 2020 12:19:18 +0000 (13:19 +0100)]
BUILD: ssl: silence build warning on uninitialised counters
Since commit d0447a7c3 ("MINOR: ssl: add counters for ssl sessions"),
gcc 9+ complains about this:
CC src/ssl_sock.o
src/ssl_sock.c: In function 'ssl_sock_io_cb':
src/ssl_sock.c:5416:3: warning: 'counters_px' may be used uninitialized in this function [-Wmaybe-uninitialized]
5416 | ++counters_px->reused_sess;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:5133:23: note: 'counters_px' was declared here
5133 | struct ssl_counters *counters, *counters_px;
| ^~~~~~~~~~~
Either a listener or a server are expected there, so ther counters are
always initialized and the compiler cannot know this. Let's preset
them and test before updating the counter, we're not in a hot path
here.
Willy Tarreau [Fri, 6 Nov 2020 11:07:31 +0000 (12:07 +0100)]
MINOR: server: remove idle lock in srv_cleanup_connections
This function used to grab the idle lock when scanning the threads for
idle connections, but it doesn't need it since the lock only protects
the tree. Let's remove it.
There is a typo on the ssl_c_chain_der sample fetch
(s/ssl_c_der_chain/ssl_c_chain_der/). This implies a move of the fetch to keep
it at the right place.
This should be backported as far as 2.2 or anywhere the commit a598b500b
("MINOR: ssl: add ssl_{c,s}_chain_der fetch methods") is.
MINOR: http-htx: Add understandable errors for the errorfiles parsing
No details are provided when an error occurs during the parsing of an errorfile,
Thus it is a bit hard to diagnose where the problem is. Now, when it happens, an
understandable error message is reported.
This patch is not a bug fix in itself. But it will be required to change an
fatal error into a warning in last stable releases. Thus it must be backported
as far as 2.0.
The default dh_param value is 2048 and it's preset to zero unless explicitly
set, so we must not report a warning about DH param not being loadble in 1024
bits when we're going to use 2048. Thanks to Dinko for reporting this.
CLEANUP: cfgparse: remove duplicate registration for transparent build options
Since commit 37bafdcbb ("MINOR: sock_inet: move the IPv4/v6 transparent mode code
to sock_inet"), build options for transparent proxying are registered twice.
This patch removes the older one.
Willy Tarreau [Tue, 3 Nov 2020 13:50:29 +0000 (14:50 +0100)]
MEDIUM: pattern: turn the pattern chaining to single-linked list
It does not require heavy deletion from the expr anymore, so we can now
turn this to a single-linked list since most of the time we want to delete
all instances of a given pattern from the head. By doing so we save 32 bytes
of memory per pattern. The pat_unlink_from_head() function was adjusted
accordingly.
Willy Tarreau [Tue, 3 Nov 2020 10:22:04 +0000 (11:22 +0100)]
MINOR: pattern: prepare removal of a pattern from the list head
Instead of using LIST_DEL() on the pattern itself inside an expression,
we look it up from its head. The goal is to get rid of the double-linked
list while this usage remains exclusively for freeing on startup error!
Willy Tarreau [Tue, 3 Nov 2020 12:36:58 +0000 (13:36 +0100)]
MINOR: pattern: during reload, delete elements frem the ref, not the expression
Instead of scanning all elements from the expression and using the
slow delete path there, let's use the faster way which involves
pat_delete_gen() while the elements are detached from ther reference.
Willy Tarreau [Tue, 3 Nov 2020 09:37:31 +0000 (10:37 +0100)]
MEDIUM: pattern: make pat_ref_prune() rely on pat_ref_purge_older()
When purging all of a reference, it's much more efficient to scan the
reference patterns from the reference head and delete all derivative
patterns than to scan the expressions. The only thing is that we need
to proceed both for the current and next generations, in case there is
a huge gap between the two. With this, purging 20M IP addresses in small
batches of 100 takes roughly 3 seconds.
Willy Tarreau [Wed, 28 Oct 2020 17:23:49 +0000 (18:23 +0100)]
MINOR: pattern: add pat_ref_purge_older() to purge old entries
This function will be usable to purge at most a specified number of old
entries from a reference. Entries are declared old if their generation
number is in the past compared to the one passed in argument. This will
ease removal of early entries when new ones have been appended.
We also call malloc_trim() when available, at the end of the series,
because this is one place where there is a lot of memory to save. Reloads
of 1M IP addresses used in an ACL made the process grow up to 1.7 GB RSS
after 10 reloads and roughly stabilize there without this call, versus
only 260 MB when the call is present. Sadly there is no direct equivalent
for jemalloc, which stabilizes around 800MB-1GB.
Willy Tarreau [Thu, 29 Oct 2020 08:21:43 +0000 (09:21 +0100)]
MINOR: pattern: implement pat_ref_load() to load a pattern at a given generation
pat_ref_load() basically combines pat_ref_append() and pat_ref_commit().
It's very similar to pat_ref_add() except that it also allows to set the
generation ID and the line number. pat_ref_add() was modified to directly
rely on it to avoid code duplication. Note that a previous declaration
of pat_ref_load() was removed as it was just a leftover of an earlier
incarnation of something possibly similar, so no existing functionality
was changed here.
Willy Tarreau [Wed, 28 Oct 2020 17:45:45 +0000 (18:45 +0100)]
MINOR: pattern: add pat_ref_commit() to commit a previously inserted element
This function will be used after a successful pat_ref_append() to propagate
the pattern to all use places (including parsing and indexing). On failure,
it will entirely roll back all insertions and free the pattern itself. It
also preserves the generation number so that it is convenient for use in
association with pat_ref_append(). pat_ref_add() was modified to rely on
it instead of open-coding the insertion and roll-back.
Willy Tarreau [Thu, 29 Oct 2020 08:41:34 +0000 (09:41 +0100)]
MEDIUM: pattern: only match patterns that match the current generation
Instead of matching any pattern found in the tree, only match those
matching the current generation of entries. This will make sure that
reloads are atomic, regardless of the time they take to complete, and
that newly added data are not matched until the whole reference is
committed. For consistency we proceed the same way on "show map" and
"show acl".
This will have no impact for now since generations are not used.
Willy Tarreau [Wed, 28 Oct 2020 10:43:49 +0000 (11:43 +0100)]
MINOR: pattern: store a generation number in the reference patterns
Right now it's not possible to perform a safe reload because we don't
know what patterns were recently added or were already present. This
patch adds a generation counter to the reference patterns so that it
is possible to know what generation of the reference they were loaded
with. A reference now has two generations, the current one, used for
all additions, and the next one, allocated to those wishing to update
the contents. The generation wraps at 2^32 so comparisons must be made
relative to the current position.
The idea will be that upon full reload, the caller will first get a new
generation ID, will insert all new patterns using it, will then switch
the current ID to the new one, and will delete all entries older than
the current ID. This has the benefit of supporting chunked updates that
remain consistent and that won't block the whole process for ages like
pat_ref_reload() currently does.
Willy Tarreau [Mon, 2 Nov 2020 16:30:17 +0000 (17:30 +0100)]
MINOR: pattern: introduce pat_ref_delete_by_ptr() to delete a valid reference
Till now the only way to remove a known reference was via
pat_ref_delete_by_id() which scans the whole list to find a matching pointer.
Let's add pat_ref_delete_by_ptr() which takes a valid pointer. It can be
called by the function above after the pointer is found, and can also be
used to roll back a failed insertion much more efficiently.
Willy Tarreau [Mon, 2 Nov 2020 19:20:47 +0000 (20:20 +0100)]
CLEANUP: pattern: remove pat_delete_fcts[] and pattern_head->delete()
These ones are not used anymore, so let's remove them to remove a bit
of the complexity. The ACL keyword's delete() function could be removed
as well, though most keyword declarations are positional and we have a
high risk of introducing a mistake here, so let's not touch the ACL part.
Willy Tarreau [Mon, 2 Nov 2020 19:23:10 +0000 (20:23 +0100)]
CLEANUP: acl: don't reference the generic pattern deletion function anymore
A few ACL keyword used to reference pat_delete_gen() as the deletion
function but this is not needed since it's the default one now. Let's
just remove this reference.
Willy Tarreau [Mon, 2 Nov 2020 19:15:40 +0000 (20:15 +0100)]
MINOR: pattern: perform a single call to pat_delete_gen() under the expression
When we're removing an element under the expression lock, we don't need
anymore to run over all ->delete() functions via the expressions, since
we know that the single function does it fine now. Note that at this
point, pattern->delete() is not used at all through out the code anymore.
Willy Tarreau [Mon, 2 Nov 2020 18:53:16 +0000 (19:53 +0100)]
MINOR: pattern: remerge the list and tree deletion functions
pat_del_tree_gen() was already chained onto pat_del_list_gen() to deal
with remaining cases, so let's complete the merge and have a generic
pattern deletion function acting on the reference and taking care of
reliably removing all elements.
Willy Tarreau [Mon, 2 Nov 2020 12:55:22 +0000 (13:55 +0100)]
MEDIUM: pattern: change the pat_del_* functions to delete from the references
This is the next step in speeding up entry removal. Now we don't scan
the whole lists or trees for elements pointing to the target reference,
instead we start from the reference and delete all linked patterns.
This simplifies some delete functions since we don't need anymore to
delete multiple times from an expression since all nodes appear after
the reference element. We can now have one generic list and one generic
tree deletion function.
This required the replacement of pattern_delete() with an open-coded
version since we now need to lock all expressions first before proceeding.
This means there is a high risk of lock inversion here but given that the
expressions are always scanned in the same order from the same head, this
must not happen.
Now deleting first entries is instantaneous, and it's still slow to
delete the last ones when looking up their ID since it still requires
to look them up by a full scan, but it's already way faster than
previously. Typically removing the last 10 IP from a 20M entries ACL
with a full-scan each took less than 2 seconds.
It would be technically possible to make use of indexed entries to
speed up most lookups for removal by value (e.g. IP addresses) but
that's for later.
Willy Tarreau [Mon, 2 Nov 2020 11:10:48 +0000 (12:10 +0100)]
MEDIUM: pattern: link all final elements from the reference
There is a data model issue in the current pattern design that makes
pattern deletion extremely expensive: there's no direct way from a
reference to access all indexed occurrences. As such, the only way
to remove all indexed entries corresponding to a reference update
is to scan all expressions's lists and trees to find a link to the
reference. While this was possibly OK when map removal was not
common and most maps were small, this is not conceivable anymore
with GeoIP maps containing 10M+ entries and del-map operations that
are triggered from http-request rulesets.
This patch introduces two list heads from the pattern reference, one
for the objects linked by lists and one for those linked by tree node.
Ideally a single list would be enough but the linked elements are too
much unrelated to be distinguished at the moment, so we'll need two
lists. However for the long term a single-linked list will suffice but
for now it's not possible due to the way elements are removed from
expressions. As such this patch adds 32 bytes of memory usage per
reference plus 16 per indexed entry, but both will be cut in half
later.
The links are not yet used for deletion, this patch only ensures the
list is always consistent.
Willy Tarreau [Mon, 2 Nov 2020 18:26:02 +0000 (19:26 +0100)]
MINOR: pattern: make the delete and prune functions more generic
Now we have a single prune() function to act on an expression, and one
delete function for the lists and one for the trees. The presence of a
pointer in the lists is enough to warrant a free, and we rely on the
PAT_SF_REGFREE flag to decide whether to free using free() or regfree().
Willy Tarreau [Mon, 2 Nov 2020 18:16:23 +0000 (19:16 +0100)]
MINOR: pattern: new sflag PAT_SF_REGFREE indicates regex_free() is needed
Currently we have no way to know how to delete/prune a pattern in a
generic way. A pattern doesn't contain its own type so we don't know
what function to call. Tree nodes are roughly OK but not lists where
regex are possible. Let's add one new bit for sflags at index time to
indicate that regex_free() will be needed upon deletion. It's not used
for now.
Willy Tarreau [Tue, 27 Oct 2020 17:55:20 +0000 (18:55 +0100)]
CLEANUP: pattern: delete the back refs at once during pat_ref_reload()
It's pointless to delete a backref and relink it to the next entry since
the next entry is going to do the exact same and so on until all of them
are deleted. Let's simply delete backrefs on reload.
Willy Tarreau [Mon, 2 Nov 2020 14:26:51 +0000 (15:26 +0100)]
MINOR: pattern: move the update revision to the pat_ref, not the expression
It's not possible to uniquely update a single expression without updating
the pattern reference, I don't know why we've put the revision in the
expression back then, given that it in fact provides an update for a
full pattern. Let's move the revision into the reference's head instead.
Willy Tarreau [Tue, 3 Nov 2020 14:55:35 +0000 (15:55 +0100)]
MEDIUM: pattern: call malloc_trim() on pat_ref_reload()
This is one case where we may release large amounts of data at once. Tests
show that without this, after 10 full reloads of an ACL containing 1M IP
addresses, the memory usage grew and stabilized around 1.7 GB of RSS. With
this change, it stays around 260 MB and is stable across reloads.
Willy Tarreau [Tue, 3 Nov 2020 14:50:40 +0000 (15:50 +0100)]
MINOR: compat: automatically include malloc.h on glibc
This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.
REGTEST: converter: Add a regtest for MQTT converters
This new script tests mqtt_is_valid() and mqtt_get_field_value() converters used
to validate and extract information from a MQTT (Message Queuing Telemetry
Transport) message.