Willy Tarreau [Mon, 18 Oct 2021 14:46:38 +0000 (16:46 +0200)]
MEDIUM: resolvers: use a kill list to preserve the list consistency
When scanning resolution.curr it's possible to try to free some
resolutions which will themselves result in freeing other ones. If
one of these other ones is exactly the next one in the list, the list
walk visits deleted nodes and causes memory corruption, double-frees
and so on. The approach taken using the "safe" argument to some
functions seems to work but it's extremely brittle as it is required
to carefully check all call paths from process_ressolvers() and pass
the argument to 1 there to refrain from deleting entries, so the bug
is very likely to come back after some tiny changes to this code.
A variant was tried, checking at various places that the current task
corresponds to process_resolvers() but this is also quite brittle even
though a bit less.
This patch uses another approach which consists in carefully unlinking
elements from the list and deferring their removal by placing it in a
kill list instead of deleting them synchronously. The real benefit here
is that the complexity only has to be placed where the complications
are.
A thread-local list is fed with elements to be deleted before scanning
the resolutions, and it's flushed at the end by picking the first one
until the list is empty. This way we never dereference the next element
and do not care about its presence or not in the list. One function,
resolv_unlink_resolution(), is exported and used outside, so it had to
be modified to use this list as well. Internal code has to use
_resolv_unlink_resolution() instead.
Willy Tarreau [Tue, 19 Oct 2021 20:01:36 +0000 (22:01 +0200)]
CLEANUP: resolvers: replace all LIST_DELETE with LIST_DEL_INIT
The code as it is uses crossed lists between many elements, and at
many places the code relies on list iterators or emptiness checks,
which does not work with only LIST_DELETE. Further, it is quite
difficult to place debugging code and checks in the current situation,
and gdb is helpless.
This code replaces all LIST_DELETE calls with LIST_DEL_INIT so that
it becomes possible to trust the lists.
This function allocates requesters by hand for each and every type. This
is complex and error-prone, and it doesn't even initialize the list part,
leaving dangling pointers that complicate debugging.
This patch introduces a new function resolv_get_requester() that either
returns the current pointer if valid or tries to allocate a new one and
links it to its destination. Then it makes use of it in the function
above to clean it up quite a bit. This allows to remove complicated but
unneeded tests.
Willy Tarreau [Tue, 19 Oct 2021 09:29:21 +0000 (11:29 +0200)]
CLEANUP: always initialize the answer_list
Similar to the previous patch, the answer's list was only initialized the
first time it was added to a list, leading to bogus outdated pointer to
appear when debugging code is added around it to watch it. Let's make
sure it's always initialized upon allocation.
Willy Tarreau [Tue, 19 Oct 2021 09:17:33 +0000 (11:17 +0200)]
BUG/MEDIUM: resolvers: always check a valid item in query_list
The query_list is physically stored in the struct resolution itself,
so we have a list that contains a list to items stored in itself (and
there is a single item). But the list is first initialized in
resolv_validate_dns_response(), while it's scanned in
resolv_process_responses() later after calling the former. First,
this results in crashes as soon as the code is instrumented a little
bit for debugging, as elements from a previous incarnation can appear.
But in addition to this, the presence of an element is checked by
verifying that the return of LIST_NEXT() is not NULL, while it may
never be NULL even for an empty list, resulting in bugs or crashes
if the number of responses does not match the list's contents. This
is easily triggered by testing for the list non-emptiness outside of
the function.
Let's make sure the list is always correct, i.e. it's initialized to
an empty list when the structure is allocated, elements are checked by
first verifying the list is not empty, they are deleted once checked,
and in any case at end so that there are no dangling pointers.
This should be backported, but only as long as the patch fits without
modifications, as adaptations can be risky there given that bugs tend
to hide each other.
Willy Tarreau [Wed, 20 Oct 2021 15:29:28 +0000 (17:29 +0200)]
BUILD: resolvers: avoid a possible warning on null-deref
Depending on the code that precedes the loop, gcc may emit this warning:
src/resolvers.c: In function 'resolv_process_responses':
src/resolvers.c:1009:11: warning: potential null pointer dereference [-Wnull-dereference]
1009 | if (query->type != DNS_RTYPE_SRV && flags & DNS_FLAG_TRUNCATED) {
| ~~~~~^~~~~~
However after carefully checking, r_res->header.qdcount it exclusively 1
when reaching this place, which forces the for() loop to enter for at
least one iteration, and <query> to be set. Thus there's no code path
leading to a null deref. It's possibly just because the assignment is
too far and the compiler cannot figure that the condition is always OK.
Let's just mark it to please the compiler.
Willy Tarreau [Tue, 19 Oct 2021 09:16:11 +0000 (11:16 +0200)]
CLEANUP: resolvers: do not export resolv_purge_resolution_answer_records()
This code is dangerous enough that we certainly don't want external code
to ever approach it, let's not export unnecessary functions like this one.
It was made static and a comment was added about its purpose.
Willy Tarreau [Mon, 18 Oct 2021 14:49:14 +0000 (16:49 +0200)]
BUG/MAJOR: resolvers: add other missing references during resolution removal
There is a fundamental design bug in the resolvers code which is that
a list of active resolutions is being walked to try to delete outdated
entries, and that the code responsible for removing them also removes
other elements, including the next one which will be visited by the
list iterator. This randomly causes a use-after-free condition leading
to crashes, infinite loops and various other issues such as random memory
corruption.
A first fix for the memory fix for this was brought by commit 0efc0993e
("BUG/MEDIUM: resolvers: Don't release resolution from a requester
callbacks"). While preparing for more fixes, some code was factored by
commit 11c6c3965 ("MINOR: resolvers: Clean server in a dedicated function
when removing a SRV item"), which inadvertently passed "0" as the "safe"
argument all the time, missing one case of removal protection, instead
of always using "safe". This patch reintroduces the correct argument.
Willy Tarreau [Wed, 20 Oct 2021 09:02:13 +0000 (11:02 +0200)]
DEBUG: dns: add a few more BUG_ON at sensitive places
A few places have been caught triggering late bugs recently, always cases
of use-after-free because a freed element was still found in one of the
lists. This patch adds a few checks for such elements in dns_session_free()
before the final pool_free() and dns_session_io_handler() before adding
elements to lists to make sure they remain consistent. They do not trigger
anymore now.
Willy Tarreau [Wed, 20 Oct 2021 12:38:43 +0000 (14:38 +0200)]
CLEANUP: dns: always detach the appctx from the dns session on release
When dns_session_release() calls dns_session_free(), it was shown that
it might still be attached there:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000006437d7 in dns_session_free (ds=0x7f895439e810) at src/dns.c:768
768 BUG_ON(!LIST_ISEMPTY(&ds->ring.waiters));
[Current thread is 1 (Thread 0x7f895bbe2700 (LWP 31792))]
(gdb) bt
#0 0x00000000006437d7 in dns_session_free (ds=0x7f895439e810) at src/dns.c:768
#1 0x0000000000643ab8 in dns_session_release (appctx=0x7f89545a4ff0) at src/dns.c:805
#2 0x000000000062e35a in si_applet_release (si=0x7f89545a5550) at include/haproxy/stream_interface.h:236
#3 0x000000000063150f in stream_int_shutw_applet (si=0x7f89545a5550) at src/stream_interface.c:1697
#4 0x0000000000640ab8 in si_shutw (si=0x7f89545a5550) at include/haproxy/stream_interface.h:437
#5 0x0000000000643103 in dns_session_io_handler (appctx=0x7f89545a4ff0) at src/dns.c:725
#6 0x00000000006d776f in task_run_applet (t=0x7f89545a5100, context=0x7f89545a4ff0, state=81924) at src/applet.c:90
#7 0x000000000068b82b in run_tasks_from_lists (budgets=0x7f895bbbf5c0) at src/task.c:611
#8 0x000000000068c258 in process_runnable_tasks () at src/task.c:850
#9 0x0000000000621e61 in run_poll_loop () at src/haproxy.c:2636
#10 0x0000000000622328 in run_thread_poll_loop (data=0x8d7440 <ha_thread_info+64>) at src/haproxy.c:2807
#11 0x00007f895c54a06b in start_thread () from /lib64/libpthread.so.0
#12 0x00007f895bf3772f in clone () from /lib64/libc.so.6
(gdb) p &ds->ring.waiters
$1 = (struct list *) 0x7f895439e8a8
(gdb) p ds->ring.waiters
$2 = {
n = 0x7f89545a5078,
p = 0x7f89545a5078
}
(gdb) p ds->ring.waiters->n
$3 = (struct list *) 0x7f89545a5078
(gdb) p *ds->ring.waiters->n
$4 = {
n = 0x7f895439e8a8,
p = 0x7f895439e8a8
}
Let's always detach it before freeing so that it remains possible to
check the dns_session's ring before releasing it, and possibly catch
bugs.
Emeric Brun [Wed, 20 Oct 2021 08:49:53 +0000 (10:49 +0200)]
BUG/MAJOR: dns: attempt to lock globaly for msg waiter list instead of use barrier
The barrier is insufficient here to protect the waiters list as we can
definitely catch situations where ds->waiter shows an inconsistency
whereby the element is not attached when entering the "if" block and
is already attached when attaching it later.
This patch uses a larger lock to maintain consistency. Without it the
code would crash in 30-180 minutes under heavy stress, always showing
the same problem (ds->waiter->n->p != &ds->waiter). Now it seems to
always resist, suggesting that this was indeed the problem.
Emeric Brun [Tue, 19 Oct 2021 13:40:10 +0000 (15:40 +0200)]
BUG/MAJOR: dns: tcp session can remain attached to a list after a free
Using tcp, after a session release and free, the session can remain
attached to the list of sessions with a response message waiting for
a commit (ds->waiter). This results to a use after free of this
session.
Also, on some error path and after free, a session could remain attached
to the lists of available idle/free sessions (ds->list).
This patch ensure to remove the session from those external lists
before a free.
This patch should be backported to all version including
the dns over tcp (2.4)
BUG/MEDIUM: tcpcheck: Properly catch early HTTP parsing errors
When an HTTP response is parsed, early parsing errors are not properly
handled. When this error is reported by the multiplexer, nothing is copied
into the input buffer. The HTX message remains empty but the
HTX_FL_PARSING_ERROR flag is set. In addition CS_FL_EOI is set on the
conn-stream. This last flag must be handled to prevent subscription for
receive events. Otherwise, in the best case, a L7 timeout error is
reported. But a transient loop is also possible if a shutdown is received
because the multiplexer notifies the check of the event while the check
never handles it and waits for more data.
Now, if CS_FL_EOI flag is set on the conn-stream, expect rules are
evaluated. Any error must be handled there.
Thanks to @kazeburo for his valuable report.
This patch should fix the issue #1420. It must be backported at least to
2.4. On 2.3 and 2.2, there is no loop but the wrong error is reported (empty
response instead of invalid one). Thus it may also be backported as far as
2.2.
BUG/MEDIUM: stream: Keep FLT_END analyzers if a stream detects a channel error
If a channel error (READ_ERRO|READ_TIMEOUT|WRITE_ERROR|WRITE_TIMEOUT) is
detected by the stream, in process_stream(), FLT_END analyers must be
preserved. It is important to be sure to ends filter analysis and be able to
release the stream.
First, filters may release some ressources when FLT_END analyzers are
called. Then, the CF_FL_ANALYZE flag is used to sync end of analysis for the
request and the response. If FLT_END analyzer is ignored on a channel, this
may block the other side and freeze the stream.
This patch must be backported to all stable versions
Replace the test based on the enum value of the algorithm by an explicit
switch statement in case someone reorders it for some reason (while
still managing not to break the regtest).
MINOR: jwt: jwt_verify returns negative values in case of error
In order for all the error return values to be distributed on the same
side (instead of surrounding the success error code), the return values
for errors other than a simple verification failure are switched to
negative values. This way the result of the jwt_verify converter can be
compared strictly to 1 as well relative to 0 (any <= 0 return value is
an error).
The documentation was also modified to discourage conversion of the
return value into a boolean (which would definitely not work).
Willy Tarreau [Fri, 15 Oct 2021 06:53:44 +0000 (08:53 +0200)]
MEDIUM: resolvers: replace bogus resolv_hostname_cmp() with memcmp()
resolv_hostname_cmp() is bogus, it is applied on labels and not plain
names, but doesn't make any distinction between length prefixes and
characters, so it compares the labels lengths via tolower() as well.
The only reason for which it doesn't break is because labels cannot
be larger than 63 bytes, and that none of the common encoding systems
have upper case letters in the lower 63 bytes, that could be turned
into a different value via tolower().
Now that all labels are stored in lower case, we don't need to burn
CPU cycles in tolower() at run time and can use memcmp() instead of
resolv_hostname_cmp(). This results in a ~22% lower CPU usage on large
farms using SRV records:
Willy Tarreau [Fri, 15 Oct 2021 05:45:38 +0000 (07:45 +0200)]
MEDIUM: resolvers: lower-case labels when converting from/to DNS names
The whole code relies on performing case-insensitive comparison on
lookups, which is extremely inefficient.
Let's make sure that all labels to be looked up or sent are first
converted to lower case. Doing so is also the opportunity to eliminate
an inefficient memcpy() in resolv_dn_label_to_str() that essentially
runs over a few unaligned bytes at once. As a side note, that call
was dangerous because it relied on a sign-extended size taken from a
string that had to be sanitized first.
This is tagged medium because while this is 100% safe, it may cause
visible changes on the wire at the packet level and trigger bugs in
test programs.
Willy Tarreau [Sat, 16 Oct 2021 13:24:22 +0000 (15:24 +0200)]
[RELEASE] Released version 2.5-dev10
Released version 2.5-dev10 with the following main changes :
- MINOR: initcall: Rename __GLOBL and __GLOBL1.
- MINOR: rules: add a new function new_act_rule() to allocate act_rules
- MINOR: rules: add a file name and line number to act_rules
- MINOR: stream: report the current rule in "show sess all" when known
- MINOR: stream: report the current filter in "show sess all" when known
- CLEANUP: stream: Properly indent current_rule line in "show sess all"
- BUG/MINOR: lua: Fix lua error handling in `hlua_config_prepend_path()`
- CI: github: switch to OpenSSL 3.0.0
- REGTESTS: ssl: Fix references to removed option in test description
- MINOR: ssl: Add ssllib_name_startswith precondition
- REGTESTS: ssl: Fix ssl_errors test for OpenSSL v3
- REGTESTS: ssl: Reenable ssl_errors test for OpenSSL only
- REGTESTS: ssl: Use mostly TLSv1.2 in ssl_errors test
- MEDIUM: mux-quic: rationalize tx buffers between qcc/qcs
- MEDIUM: h3: properly manage tx buffers for large data
- MINOR: mux-quic: standardize h3 settings sending
- CLEANUP: h3: remove dead code
- MINOR: mux-quic: implement standard method to detect if qcc is dead
- MEDIUM: mux-quic: defer stream shut if remaining tx data
- MINOR: mux: remove last occurences of qcc ring buffer
- MINOR: quic: handle CONNECTION_CLOSE frame
- REGTESTS: ssl: re-enable set_ssl_cert_bundle.vtc
- MINOR: ssl: add ssl_fc_is_resumed to "option httpslog"
- MINOR: http: Add http_auth_bearer sample fetch
- MINOR: jwt: Parse JWT alg field
- MINOR: jwt: JWT tokenizing helper function
- MINOR: jwt: Insert public certificates into dedicated JWT tree
- MINOR: jwt: jwt_header_query and jwt_payload_query converters
- MEDIUM: jwt: Add jwt_verify converter to verify JWT integrity
- REGTESTS: jwt: Add tests for the jwt_verify converter
- BUILD: jwt: fix declaration of EVP_KEY in jwt-h.h
- MINOR: proto_tcp: use chunk_appendf() to ouput socket setup errors
- MINOR: proto_tcp: also report the attempted MSS values in error message
- MINOR: inet: report the faulty interface name in "bind" errors
- MINOR: protocol: report the file and line number for binding/listening errors
- MINOR: protocol: uniformize protocol errors
- MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero
- BUG/MEDIUM: resolver: make sure to always use the correct hostname length
- BUG/MINOR: resolvers: do not reject host names of length 255 in SRV records
- MINOR: resolvers: fix the resolv_dn_label_to_str() API about trailing zero
- MEDIUM: listeners: split the thread mask between receiver and bind_conf
- MINOR: listeners: add clone_listener() to duplicate listeners at boot time
- MEDIUM: listener: add the "shards" bind keyword
- BUG/MEDIUM: resolvers: use correct storage for the target address
- MINOR: resolvers: merge address and target into a union "data"
- BUG/MEDIUM: resolvers: fix truncated TLD consecutive to the API fix
- BUG/MEDIUM: jwt: fix base64 decoding error detection
- BUG/MINOR: jwt: use CRYPTO_memcmp() to compare HMACs
- DOC: jwt: fix a typo in the jwt_verify() keyword description
- BUG/MEDIUM: sample/jwt: fix another instance of base64 error detection
- BUG/MINOR: http-ana: Don't eval front after-response rules if stopped on back
- BUG/MINOR: sample: Fix 'fix_tag_value' sample when waiting for more data
- DOC: config: Move 'tcp-response content' at the right place
- BUG/MINOR: proxy: Use .disabled field as a bitfield as documented
- MINOR: proxy: Introduce proxy flags to replace disabled bitfield
- MINOR: sample/arg: Be able to resolve args found in defaults sections
- MEDIUM: proxy: Warn about ambiguous use of named defaults sections
- MINOR: proxy: Be able to reference the defaults section used by a proxy
- MINOR: proxy: Add PR_FL_READY flag on fully configured and usable proxies
- MINOR: config: Finish configuration for referenced default proxies
- MINOR: config: No longer remove previous anonymous defaults section
- MINOR: tcpcheck: Support 2-steps args resolution in defaults sections
- MEDIUM: rules/acl: Parse TCP/HTTP rules and acls defined in defaults sections
- MEDIUM: tcp-rules: Eval TCP rules defined in defaults sections
- MEDIUM: http-ana: Eval HTTP rules defined in defaults sections
- BUG/MEDIUM: sample: Cumulate frontend and backend sample validity flags
- REGTESTS: Add scripts to test support of TCP/HTTP rules in defaults sections
- DOC: config: Add documentation about TCP/HTTP rules in defaults section
- DOC: config: Rework and uniformize how TCP/HTTP rules are documented
- BUG/MINOR: proxy: Release ACLs and TCP/HTTP rules of default proxies
- BUG/MEDIUM: cpuset: fix cpuset size for FreeBSD
- BUG/MINOR: sample: fix backend direction flags consecutive to last fix
- BUG/MINOR: listener: fix incorrect return on out-of-memory
- BUG/MINOR: listener: add an error check for unallocatable trash
- CLEANUP: listeners: remove unreachable code in clone_listener()
Willy Tarreau [Sat, 16 Oct 2021 12:58:30 +0000 (14:58 +0200)]
CLEANUP: listeners: remove unreachable code in clone_listener()
Coverity reported in issue #1416 that label oom3 is not reachable in
function close_listener() added by commit 59a877dfd ("MINOR: listeners:
add clone_listener() to duplicate listeners at boot time"). The code
leading to it was removed during the development of the function, but
not the label itself.
Willy Tarreau [Sat, 16 Oct 2021 12:54:19 +0000 (14:54 +0200)]
BUG/MINOR: listener: add an error check for unallocatable trash
Coverity noticed in issue #1416 that a missing allocation error was
introduced in tcp_bind_listener() with the rework of error messages by
commit ed1748553 ("MINOR: proto_tcp: use chunk_appendf() to ouput socket
setup errors"). In practice nobody will ever face it but better address
it anyway.
Willy Tarreau [Sat, 16 Oct 2021 12:45:29 +0000 (14:45 +0200)]
BUG/MINOR: listener: fix incorrect return on out-of-memory
When the clone_listener() function was added in commit 59a877dfd
("MINOR: listeners: add clone_listener() to duplicate listeners at
boot time"), a stupid bug was introduced when splitting the error
path because while the first case where calloc fails will leave NULL
in the output value, the other cases will return the pointer to a
freed area. This was reported by Coverity in issue #1416.
In practice nobody will face it (out-of-memory while checking config),
but let's fix it.
Willy Tarreau [Sat, 16 Oct 2021 12:41:09 +0000 (14:41 +0200)]
BUG/MINOR: sample: fix backend direction flags consecutive to last fix
Commit 7a06ffb85 ("BUG/MEDIUM: sample: Cumulate frontend and backend
sample validity flags") introduced a typo confusing the request and
the response direction when checking for validity of a rule applied
to a backend. This was reported by Coverity in issue #1417.
This needs to be backported where the patch above is backported.
Amaury Denoyelle [Fri, 15 Oct 2021 13:22:31 +0000 (15:22 +0200)]
BUG/MEDIUM: cpuset: fix cpuset size for FreeBSD
Fix the macro used to retrieve the max number of cpus on FreeBSD. The
MAXCPU is not properly defined in userspace and always set to 1 despite
the machine architecture. Replace it with CPU_SETSIZE.
See https://freebsd-hackers.freebsd.narkive.com/gw4BeLum/smp-in-machine-params-h#post6
Without this, the following config file is rejected on FreeBSD even if
the machine is SMP :
global
cpu-map 1-2 0-1
BUG/MINOR: proxy: Release ACLs and TCP/HTTP rules of default proxies
It is now possible to have TCP/HTTP rules and ACLs defined in defaults
sections. So we must try to release corresponding lists when a default proxy
is destroyed.
DOC: config: Rework and uniformize how TCP/HTTP rules are documented
Now all these rules are documented using the same structure. First there is
a general description with the list of all supported actions. Then all
actions are described in details. Thus, it is easy to have a quick list of
all supported actions and this avoids to have a huge description with all
info about these actions. In addition, when it is possible, we make a
reference to already documented parts.
DOC: config: Add documentation about TCP/HTTP rules in defaults section
Documentation of each directive that can now be used in defaults section was
updated to explain how it works. A special mark was added to specify when a
keyword is supported by defaults sections with a name but not anonymous
ones. In this case an exclamation mark is added.
REGTESTS: Add scripts to test support of TCP/HTTP rules in defaults sections
3 scripts are added:
* startup/default_rules.vtc to check configuration parsing
* http-rules/default_rules.vtc to check evaluation of HTTP rules
* tcp-rules/default_rules.vtc to check evaluation of TCP rules
BUG/MEDIUM: sample: Cumulate frontend and backend sample validity flags
When the sample validity flags are computed to check if a sample is used in
a valid scope, the flags depending on the proxy capabilities must be
cumulated. Historically, for a sample on the request, only the frontend
capability was used to set the sample validity flags while for a sample on
the response only the backend was used. But it is a problem for listen or
defaults proxies. For those proxies, all frontend and backend samples should
be valid. However, at many place, only frontend ones are possible.
For instance, it is impossible to set the backend name (be_name) into a
variable from a listen proxy.
This bug exists on all stable versions. Thus this patch should probably be
backported. But with some caution because the code has probably changed
serveral times. Note that nobody has ever noticed this issue. So the need to
backport this patch must be evaluated for each branch.
MEDIUM: http-ana: Eval HTTP rules defined in defaults sections
As for TCP rules, HTTP rules from defaults section are now evaluated. These
rules are evaluated before those of the proxy. The same default ruleset
cannot be attached to the frontend and the backend. However, at this stage,
we take care to not execute twice the same ruleset. So, in theory, a
frontend and a backend could use the same defaults section. In this case,
the default ruleset is executed before all others and only once.
MEDIUM: tcp-rules: Eval TCP rules defined in defaults sections
TCP rules from defaults section are now evaluated. These rules are evaluated
before those of the proxy. For L7 TCP rules, the same default ruleset cannot
be attached to the frontend and the backend. However, at this stage, we take
care to not execute twice the same ruleset. So, in theory, a frontend and a
backend could use the same defaults section. In this case, the default
ruleset is executed before all others and only once.
MEDIUM: rules/acl: Parse TCP/HTTP rules and acls defined in defaults sections
TCP and HTTP rules can now be defined in defaults sections, but only those
with a name. Because these rules may use conditions based on ACLs, ACLs can
also be defined in defaults sections.
However there are some limitations:
* A defaults section defining TCP/HTTP rules cannot be used by a defaults
section
* A defaults section defining TCP/HTTP rules cannot be used bu a listen
section
* A defaults sections defining TCP/HTTP rules cannot be used by frontends
and backends at the same time
* A defaults sections defining 'tcp-request connection' or 'tcp-request
session' rules cannot be used by backends
* A defaults sections defining 'tcp-response content' rules cannot be used
by frontends
The TCP request/response inspect-delay of a proxy is now inherited from the
defaults section it uses. For now, these rules are only parsed. No evaluation is
performed.
MINOR: tcpcheck: Support 2-steps args resolution in defaults sections
With the commit eaba25dd9 ("BUG/MINOR: tcpcheck: Don't use arg list for
default proxies during parsing"), we restricted the use of sample fetch in
tcpcheck rules defined in a defaults section to those depending on explicit
arguments only. This means a tcpcheck rules defined in a defaults section
cannot rely on argument unresolved during the configuration parsing.
Thanks to recent changes, it is now possible again.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MINOR: config: No longer remove previous anonymous defaults section
When the parsing of a defaults section is started, the previous anonymous
defaults section is removed. It may be a problem with referenced defaults
sections. And because all unused defautl proxies are removed after the
configuration parsing, it is not required to remove it so early.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MINOR: config: Finish configuration for referenced default proxies
If a not-ready default proxy is referenced by a proxy during the
configuration validity check, its configuration is also finished and
PR_FL_READY flag is set on it.
For now, the arguments resolution is the only step performed.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MINOR: proxy: Add PR_FL_READY flag on fully configured and usable proxies
The PR_FL_READY flags must now be set on a proxy at the end of the
configuration validity check to notify it is fully configured and may be
safely used.
For now there is no real usage of this flag. But it will be usefull for
referenced default proxies to finish their configuration only once.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MINOR: proxy: Be able to reference the defaults section used by a proxy
A proxy may now references the defaults section it is used. To do so, a
pointer on the default proxy was added in the proxy structure. And a
refcount must be used to track proxies using a default proxy. A default
proxy is destroyed iff its refcount is equal to zero and when it drops to
zero.
All this stuff must be performed during init/deinit staged for now. All
unreferenced default proxies are removed after the configuration parsing.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MEDIUM: proxy: Warn about ambiguous use of named defaults sections
It is now possible to designate the defaults section to use by adding a name
of the corresponding defaults section and referencing it in the desired
proxy section. However, this introduces an ambiguity. This named defaults
section may still be implicitly used by other proxies if it is the last one
defined. In this case for instance:
default common
...
default frt from common
...
default bck from common
...
frontend fe from frt
...
backend be from bck
...
listen stats
...
Here, it is not really obvious the last section will use the 'bck' defaults
section. And it is probably not the expected behaviour. To help users to
properly configure their haproxy, a warning is now emitted if a defaults
section is explicitly AND implicitly used. The configuration manual was
updated accordingly.
Because this patch adds a warning, it should probably not be backported to
2.4. However, if is is backported, it depends on commit "MINOR: proxy:
Introduce proxy flags to replace disabled bitfield".
MINOR: sample/arg: Be able to resolve args found in defaults sections
It is not yet used but thanks to this patch, it will be possible to resolve
arguments found in defaults sections. However, there is some restrictions:
* For FE (frontend) or BE (backend) arguments, if the proxy is explicity
defined, there is no change. But for implicit proxy (not specified), the
argument points on the default proxy. when a sample fetch using this
kind of argument is evaluated, the default proxy replaced by the current
one.
* For SRV (server) and TAB (stick-table)arguments, the proxy must always
be specified. Otherwise an error is reported.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
MINOR: proxy: Introduce proxy flags to replace disabled bitfield
This change is required to support TCP/HTTP rules in defaults sections. The
'disabled' bitfield in the proxy structure, used to know if a proxy is
disabled or stopped, is replaced a generic bitfield named 'flags'.
PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and
PR_FL_STOPPED respectively. In addition, everywhere there is a test to know
if a proxy is disabled or stopped, there is now a bitwise AND operation on
PR_FL_DISABLED and/or PR_FL_STOPPED flags.
BUG/MINOR: proxy: Use .disabled field as a bitfield as documented
.disabled field in the proxy structure is documented to be a bitfield. So
use it as a bitfield. This change was introduced to the 2.5, by commit 8e765b86f ("MINOR: proxy: disabled takes a stopping and a disabled state").
No backport is needed except if the above commit is backported.
BUG/MINOR: sample: Fix 'fix_tag_value' sample when waiting for more data
The test on the return value of fix_tag_value() function was inverted. To
wait for more data, the return value must be a valid empty string and not
IST_NULL.
BUG/MINOR: http-ana: Don't eval front after-response rules if stopped on back
http-after-response rules evaluation must be stopped after a "allow". It
means the frontend ruleset must not be evaluated if a "allow" was performed
in the backend ruleset. Internally, the evaluation must be stopped if on
HTTP_RULE_RES_STOP return value. Only the "allow" action is concerned by
this change.
Thanks to this patch, http-response and http-after-response behave in the
same way.
Willy Tarreau [Fri, 15 Oct 2021 10:10:24 +0000 (12:10 +0200)]
BUG/MEDIUM: sample/jwt: fix another instance of base64 error detection
This is the same as for commit 468c000db ("BUG/MEDIUM: jwt: fix base64
decoding error detection"), but for function sample_conv_jwt_member_query()
that is used by sample converters jwt_header_query() and jwt_payload_query().
Thanks to Tim for the report. No backport is needed.
Willy Tarreau [Fri, 15 Oct 2021 09:52:41 +0000 (11:52 +0200)]
BUG/MINOR: jwt: use CRYPTO_memcmp() to compare HMACs
As Tim reported in github issue #1414, we ought to use a constant-time
memcmp() when comparing hashes to avoid time-based attacks. Let's use
CRYPTO_memcmp() since this code already depends on openssl.
No backport is needed, this was just merged into 2.5.
Tim reported that a decoding error from the base64 function wouldn't
be matched in case of bad input, and could possibly cause trouble
with -1 being passed in decoded_sig->data. In the case of HMAC+SHA
it is harmless as the comparison is made using memcmp() after checking
for length equality, but in the case of RSA/ECDSA this result is passed
as a size_t to EVP_DigetVerifyFinal() and may depend on the lib's mood.
The fix simply consists in checking the intermediary result before
storing it.
That's precisely what happens with one of the regtests which returned
0 instead of 4 on the intentionally defective token, so the regtest
was fixed as well.
No backport is needed as this is new in this release.
Willy Tarreau [Fri, 15 Oct 2021 06:09:25 +0000 (08:09 +0200)]
BUG/MEDIUM: resolvers: fix truncated TLD consecutive to the API fix
A bug was introduced by commit previous bf9498a31 ("MINOR: resolvers:
fix the resolv_str_to_dn_label() API about trailing zero") as the code
is particularly contrived and hard to test. The output writes the last
char at [i+1] so the trailing zero and return value must be at i+1.
This will have to be backported where the patch above is backported
since it was needed for a fix.
Willy Tarreau [Thu, 14 Oct 2021 20:52:04 +0000 (22:52 +0200)]
MINOR: resolvers: merge address and target into a union "data"
These two fields are exclusive as they depend on the data type.
Let's move them into a union to save some precious bytes. This
reduces the struct resolv_answer_item size from 600 to 576 bytes.
Willy Tarreau [Thu, 14 Oct 2021 20:30:38 +0000 (22:30 +0200)]
BUG/MEDIUM: resolvers: use correct storage for the target address
The struct resolv_answer_item contains an address field of type
"sockaddr" which is only 16 bytes long, but which is used to store
either IPv4 or IPv6. Fortunately, the contents only overlap with
the "target" field that follows it and that is large enough to
absorb the extra bytes needed to store AAAA records. But this is
dangerous as just moving fields around could result in memory
corruption.
The fix uses a union and removes the casts that were used to hide
the problem.
Older versions need to be checked and possibly fixed. This needs
to be backported anyway.
Willy Tarreau [Tue, 12 Oct 2021 13:23:03 +0000 (15:23 +0200)]
MEDIUM: listener: add the "shards" bind keyword
In multi-threaded mode, on operating systems supporting multiple listeners on
the same IP:port, this will automatically create this number of multiple
identical listeners for the same line, all bound to a fair share of the number
of the threads attached to this listener. This can sometimes be useful when
using very large thread counts where the in-kernel locking on a single socket
starts to cause a significant overhead. In this case the incoming traffic is
distributed over multiple sockets and the contention is reduced. Note that
doing this can easily increase the CPU usage by making more threads work a
little bit.
If the number of shards is higher than the number of available threads, it
will automatically be trimmed to the number of threads. A special value
"by-thread" will automatically assign one shard per thread.
Willy Tarreau [Tue, 12 Oct 2021 07:36:10 +0000 (09:36 +0200)]
MINOR: listeners: add clone_listener() to duplicate listeners at boot time
This function's purpose will be to duplicate a listener in INIT state.
This will be used to ease declaration of listeners spanning multiple
groups, which will thus require multiple FDs hence multiple receivers.
Willy Tarreau [Tue, 12 Oct 2021 06:47:54 +0000 (08:47 +0200)]
MEDIUM: listeners: split the thread mask between receiver and bind_conf
With groups at some point we'll have to have distinct masks/groups in the
receiver and the bind_conf, because a single bind_conf might require to
instantiate multiple receivers (one per group).
Let's split the thread mask and group to have one for the bind_conf and
another one for the receiver while it remains easy to do. This will later
allow to use different storage for the bind_conf if needed (e.g. support
multiple groups).
Willy Tarreau [Thu, 14 Oct 2021 06:05:25 +0000 (08:05 +0200)]
MINOR: resolvers: fix the resolv_dn_label_to_str() API about trailing zero
This function suffers from the same API issue as its sibling that does the
opposite direction, it demands that the input string is zero-terminated
*and* that its length *including* the trailing zero is passed on input,
forcing callers to pass length + 1, and itself to use that length - 1
everywhere internally.
This patch addressess this. There is a single caller, which is the
location of the previous bug, so it should probably be backported at
least to keep the code consistent across versions. Note that the
function is called dns_dn_label_to_str() in 2.3 and earlier.
Willy Tarreau [Thu, 14 Oct 2021 06:11:48 +0000 (08:11 +0200)]
BUG/MEDIUM: resolver: make sure to always use the correct hostname length
In issue #1411, @jjiang-stripe reports that do-resolve() sometimes seems
to be trying to resolve crap from random memory contents.
The issue is that action_prepare_for_resolution() tries to measure the
input string by itself using strlen(), while resolv_action_do_resolve()
directly passes it a pointer to the sample, omitting the known length.
Thus of course any other header present after the host in memory are
appended to the host value. It could theoretically crash if really
unlucky, with a buffer that does not contain any zero including in the
index at the end, and if the HTX buffer ends on an allocation boundary.
In practice it should be too low a probability to have ever been observed.
This patch modifies the action_prepare_for_resolution() function to take
the string length on with the host name on input and pass that down the
chain. This should be backported to 2.0 along with commit "MINOR:
resolvers: fix the resolv_str_to_dn_label() API about trailing zero".
Willy Tarreau [Thu, 14 Oct 2021 05:49:49 +0000 (07:49 +0200)]
MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero
This function is bogus at the API level: it demands that the input string
is zero-terminated *and* that its length *including* the trailing zero is
passed on input. While that already looks smelly, the trailing zero is
copied as-is, and is then explicitly replaced with a zero... Not only
all callers have to pass hostname_len+1 everywhere to work around this
absurdity, but this requirement causes a bug in the do-resolve() action
that passes random string lengths on input, and that will be fixed on a
subsequent patch.
Let's fix this API issue for now.
This patch will have to be backported, and in versions 2.3 and older,
the function is in dns.c and is called dns_str_to_dn_label().
Willy Tarreau [Thu, 14 Oct 2021 09:59:15 +0000 (11:59 +0200)]
MINOR: protocol: uniformize protocol errors
Some protocols fail with "error blah [ip:port]" and other fail with
"[ip:port] error blah". All this already appears in a "starting" or
"binding" context after a proxy name. Let's choose a more universal
approach like below where the ip:port remains at the end of the line
prefixed with "for".
[WARNING] (18632) : Binding [binderr.cfg:10] for proxy http: cannot bind receiver to device 'eth2' (No such device) for [0.0.0.0:1080]
[WARNING] (18632) : Starting [binderr.cfg:10] for proxy http: cannot set MSS to 12 for [0.0.0.0:1080]
Willy Tarreau [Thu, 14 Oct 2021 09:55:48 +0000 (11:55 +0200)]
MINOR: protocol: report the file and line number for binding/listening errors
Binding errors and late socket errors provide no information about
the file and line where the problem occurs. These are all done by
protocol_bind_all() and they only report "Starting proxy blah". Let's
change this a little bit so that:
- the file name and line number of the faulty bind line is alwas mentioned
- early binding errors are indicated with "Binding" instead of "Starting".
Now we can for example have this:
[WARNING] (18580) : Binding [binderr.cfg:10] for proxy http: cannot bind receiver to device 'eth2' (No such device) [0.0.0.0:1080]
Willy Tarreau [Thu, 14 Oct 2021 09:39:18 +0000 (11:39 +0200)]
MINOR: proto_tcp: also report the attempted MSS values in error message
The MSS errors are the only ones not indicating what was attempted, let's
report the value that was tried, as it can help users spot them in the
config (particularly if a default value was used).
Bjoern Jacke [Tue, 12 Jan 2021 18:24:43 +0000 (19:24 +0100)]
MINOR: proto_tcp: use chunk_appendf() to ouput socket setup errors
Right now only the last warning or error is reported from
tcp_bind_listener(), but it is useful to report all warnings and no only
the last one, so we now emit them delimited by commas. Previously we used
a fixed buffer of 100 bytes, which was too small to store more than one
message, so let's extend it.
MEDIUM: jwt: Add jwt_verify converter to verify JWT integrity
This new converter takes a JSON Web Token, an algorithm (among the ones
specified for JWS tokens in RFC 7518) and a public key or a secret, and
it returns a verdict about the signature contained in the token. It does
not simply return a boolean because some specific error cases cas be
specified by returning an integer instead, such as unmanaged algorithms
or invalid tokens. This enables to distinguich malformed tokens from
tampered ones, that would be valid format-wise but would have a bad
signature.
This converter does not perform a full JWT validation as decribed in
section 7.2 of RFC 7519. For instance it does not ensure that the header
and payload parts of the token are completely valid JSON objects because
it would need a complete JSON parser. It only focuses on the signature
and checks that it matches the token's contents.
MINOR: jwt: jwt_header_query and jwt_payload_query converters
Those converters allow to extract a JSON value out of a JSON Web Token's
header part or payload part (the two first dot-separated base64url
encoded parts of a JWS in the Compact Serialization format).
They act as a json_query call on the corresponding decoded subpart when
given parameters, and they return the decoded JSON subpart when no
parameter is given.
MINOR: jwt: Insert public certificates into dedicated JWT tree
A JWT signed with the RSXXX or ESXXX algorithm (RSA or ECDSA) requires a
public certificate to be verified and to ensure it is valid. Those
certificates must not be read on disk at runtime so we need a caching
mechanism into which those certificates will be loaded during init.
This is done through a dedicated ebtree that is filled during
configuration parsing. The path to the public certificates will need to
be explicitely mentioned in the configuration so that certificates can
be loaded as early as possible.
This tree is different from the ckch one because ckch entries are much
bigger than the public certificates used in JWT validation process.
This helper function splits a JWT under Compact Serialization format
(dot-separated base64-url encoded strings) into its different sub
strings. Since we do not want to manage more than JWS for now, which can
only have at most three subparts, any JWT that has strictly more than
two dots is considered invalid.
The full list of possible algorithms used to create a JWS signature is
defined in section 3.1 of RFC7518. This patch adds a helper function
that converts the "alg" strings into an enum member.
This fetch can be used to retrieve the data contained in an HTTP
Authorization header when the Bearer scheme is used. This is used when
transmitting JSON Web Tokens for instance.
Amaury Denoyelle [Wed, 13 Oct 2021 14:34:49 +0000 (16:34 +0200)]
MINOR: quic: handle CONNECTION_CLOSE frame
On receiving CONNECTION_CLOSE frame, the mux is flagged for immediate
connection close. A stream is closed even if there is data not ACKed
left if CONNECTION_CLOSE has been received.
Amaury Denoyelle [Tue, 12 Oct 2021 16:14:12 +0000 (18:14 +0200)]
MINOR: mux: remove last occurences of qcc ring buffer
The mux tx buffers have been rewritten with buffers attached to qcs
instances. qc_buf_available and qc_get_buf functions are updated to
manipulates qcs. All occurences of the unused qcc ring buffer are
removed to ease the code maintenance.
MEDIUM: mux-quic: defer stream shut if remaining tx data
Defer the shutting of a qcs if there is still data in its tx buffers. In
this case, the conn_stream is closed but the qcs is kept with a new flag
QC_SF_DETACH.
On ACK reception, the xprt wake up the shut_tl tasklet if the stream is
flagged with QC_SF_DETACH. This tasklet is responsible to free the qcs
and possibly the qcc when all bidirectional streams are removed.
MINOR: mux-quic: implement standard method to detect if qcc is dead
For the moment, a quic connection is considered dead if it has no
bidirectional streams left on it. This test is implemented via
qcc_is_dead function. It can be reused to properly close the connection
when needed.
MEDIUM: h3: properly manage tx buffers for large data
Properly handle tx buffers management in h3 data sending. If there is
not enough contiguous space, the buffer is first realigned. If this is
not enough, the stream is flagged with QC_SF_BLK_MROOM waiting for the
buffer to be emptied.
If a frame on a stream is successfully pushed for sending, the stream is
called if it was flagged with QC_SF_BLK_MROOM.
MEDIUM: mux-quic: rationalize tx buffers between qcc/qcs
Remove the tx mux ring buffers in qcs, which should be in the qcc. For
the moment, use a simple architecture with 2 simple tx buffers in the
qcs.
The first buffer is used by the h3 layer to prepare the data. The mux
send operation transfer these into the 2nd buffer named xprt_buf. This
buffer is only freed when an ACK has been received.
This architecture is functional but not optimal for two reasons :
- it won't limit the buffer usage by connection
- each transfer on a new stream requires an allocation
REGTESTS: ssl: Use mostly TLSv1.2 in ssl_errors test
In order for the test to run with OpenSSL 1.0.2 the test will now mostly
use TLSv1.2 and use TLS 1.3 only on some specific tests (covered by
preconditions).
REGTESTS: ssl: Reenable ssl_errors test for OpenSSL only
The test is strongly dependent on the way the errors are output by the
SSL library so it is not possible to perform the same checks when using
OpenSSL or LibreSSL. It is then reenabled for OpenSSL (whatever the
version) but still disabled for LibreSSL.
This limitation is added thanks to the new ssllib_name_startswith
precondition check.
The OpenSSL error codes for the same errors are not consistent between
OpenSSL versions. The ssl_errors test needs to be modified to only take
into account a fixed part of those error codes.
This patch focuses on the reason part of the error code by applying a
mask on the error code (whose size varies depending on the lib version).
This new ssllib_name_startswith precondition check can be used to
distinguish application linked with OpenSSL from the ones linked with
other SSL libraries (LibreSSL or BoringSSL namely). This check takes a
string as input and returns 1 when the SSL library's name starts with
the given string. It is based on the OpenSSL_version function which
returns the same output as the "openssl version" command.
REGTESTS: ssl: Fix references to removed option in test description
The log-error-via-logformat option was removed in commit 3d6350e1084ca806d9f78c3802665e235e1fbf27 and was replaced by a dedicated
error-log-format option. The references to this option need to be
removed from the test's description.
CLEANUP: stream: Properly indent current_rule line in "show sess all"
This line is not related to the response channel but to the stream. Thus it
must be indented at the same level as stream-interfaces, connections,
channels...
MINOR: stream: report the current filter in "show sess all" when known
Filters can block the stream on pre/post analysis for any reason and it can
be useful to report it in "show sess all". So now, a "current_filter" extra
line is reported for each channel if a filter is blocking the analysis. Note
that this does not catch the TCP/HTTP payload analysis because all
registered filters are always evaluated when more data are received.