Michał Kępień [Tue, 6 Sep 2022 11:36:44 +0000 (13:36 +0200)]
Add tests for CVE-2022-2795
Add a test ensuring that the amount of work fctx_getaddresses() performs
for any encountered delegation is limited: delegate example.net to a set
of 1,000 name servers in the redirect.com zone, the names of which all
resolve to IP addresses that nothing listens on, and query for a name in
the example.net domain, checking the number of times the findname()
function gets executed in the process; fail if that count is excessively
large.
Since the size of the referral response sent by ans3 is about 20 kB, it
cannot be sent back over UDP (EMSGSIZE) on some operating systems in
their default configuration (e.g. FreeBSD - see the
net.inet.udp.maxdgram sysctl). To enable reliable reproduction of
CVE-2022-2795 (retry patterns vary across BIND 9 versions) and avoid
false positives at the same time (thread scheduling - and therefore the
number of fetch context restarts - vary across operating systems and
across test runs), extend bin/tests/system/resolver/ans3/ans.pl so that
it also listens on TCP and make "ns1" in the "resolver" system test
always use TCP when communicating with "ans3".
Also add a test (foo.bar.sub.tld1/TXT) that ensures the new limitations
imposed on the resolution process by the mitigation for CVE-2022-2795 do
not prevent valid, glueless delegation chains from working properly.
Artem Boldariev [Tue, 18 Oct 2022 11:42:10 +0000 (14:42 +0300)]
TLS Stream: handle successful TLS handshake after listener shutdown
It was possible that accept callback can be called after listener
shutdown. In such a case the callback pointer equals NULL, leading to
segmentation fault. This commit fixes that.
Evan Hunt [Wed, 17 Aug 2022 19:46:02 +0000 (12:46 -0700)]
test for growth of compressed pipelined responses
add a test to compare the Content-Length of successive compressed
messages on a single HTTP connection that should contain the same
data; fail if the size grows by more than 100 bytes from one query
to the next.
Matthijs Mekking [Fri, 14 Oct 2022 14:38:25 +0000 (16:38 +0200)]
Change log level when doing rekey
This log happens when BIND checks the parental-agents if the DS has
been published. But if you don't have parental-agents set up, the list
of keys to check will be empty and the result will be ISC_R_NOTFOUND.
This is not an error, so change the log level to debug in this case.
Artem Boldariev [Fri, 14 Oct 2022 17:45:40 +0000 (20:45 +0300)]
Synchronise stop listening operation for multi-layer transports
This commit introduces a primitive isc__nmsocket_stop() which performs
shutting down on a multilayered socket ensuring the proper order of
the operations.
The shared data within the socket object can be destroyed after the
call completed, as it is guaranteed to not be used from within the
context of other worker threads.
Tony Finch [Fri, 14 Oct 2022 16:37:47 +0000 (17:37 +0100)]
Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
Aram Sargsyan [Mon, 17 Oct 2022 08:45:45 +0000 (08:45 +0000)]
Handle large numbers when parsing/printing a duration
The isccfg_duration_fromtext() function is truncating large numbers
to 32 bits instead of capping or rejecting them, i.e. 64424509445,
which is 0xf00000005, gets parsed as 32-bit value 5 (0x00000005).
Fail parsing a duration if any of its components is bigger than
32 bits. Using those kind of big numbers has no practical use case
for a duration.
The isccfg_duration_toseconds() function can overflow the 32 bit
seconds variable when calculating the duration from its component
parts.
To avoid that, use 64-bit calculation and return UINT32_MAX if the
calculated value is bigger than UINT32_MAX. Again, a number this big
has no practical use case anyway.
The buffer for the generated duration string is limited to 64 bytes,
which, in theory, is smaller than the longest possible generated
duration string.
Use 80 bytes instead, calculated by the '7 x (10 + 1) + 3' formula,
where '7' is the count of the duration's parts (year, month, etc.), '10'
is their maximum length when printed as a decimal number, '1' is their
indicator character (Y, M, etc.), and 3 is two more indicators (P and T)
and the terminating NUL character.
Aram Sargsyan [Mon, 17 Oct 2022 08:45:26 +0000 (08:45 +0000)]
Fix an off-by-one error in cfg_print_duration()
The cfg_print_duration() checks added previously in the 'duration_test'
unit test uncovered a bug in cfg_print_duration().
When calculating the current 'str' pointer of the generated text in the
buffer 'buf', it erroneously adds 1 byte to compensate for that part's
indicator character. For example, to add 12 minutes, it needs to add
2 + 1 = 3 characters, where 2 is the length of "12", and 1 is the length
of "M" (for minute). The mistake was that the length of the indicator
is already included in 'durationlen[i]', so there is no need to
calculate it again.
In the result of this mistake the current pointer can advance further
than needed and end up after the zero-byte instead of right on it, which
essentially cuts off any further generated text. For example, for a
5 minutes and 30 seconds duration, instead of having this:
'P', 'T', '5', 'M', '3', '0', 'S', '\0'
The function generates this:
'P', 'T', '5', 'M', '\0', '3', '0', 'S', '\0'
Fix the bug by adding to 'str' just 'durationlen[i]' instead of
'durationlen[i] + 1'.
Aram Sargsyan [Mon, 17 Oct 2022 08:45:09 +0000 (08:45 +0000)]
Fix a logical bug in cfg_print_duration()
The cfg_print_duration() function prints a ISO 8601 duration value
converted from an array of integers, where the parts of the date and
time are stored.
durationlen[6], which holds the "seconds" part of the duration, has
a special case in cfg_print_duration() to ensure that when there are
no values in the duration, the result still can be printed as "PT0S",
instead of just "P", so it can be a valid ISO 8601 duration value.
There is a logical error in one of the two special case code paths,
when it checks that no value from the "date" part is defined, and no
"hour" or "minute" from the "time" part are defined.
Because of the error, durationlen[6] can be used uninitialized, in
which case the second parameter passed to snprintf() (which is the
maximum allowed length) can contain a garbage value.
This can not be exploited because the buffer is still big enough to
hold the maximum possible amount of characters generated by the "%u%c"
format string.
Fix the logical bug, and initialize the 'durationlen' array to zeros
to be a little safer from other similar errors.
Tony Finch [Thu, 30 Jun 2022 16:19:54 +0000 (17:19 +0100)]
CHANGES note for [GL !6517]
[performance] A new algorithm for DNS name compression based on a
hash set of message offsets. Name compression is now
more complete as well as being generally faster, and
the implementation is less complicated and requires
much less memory.
Tony Finch [Fri, 16 Sep 2022 17:55:48 +0000 (18:55 +0100)]
Test compression context hash set collisions
Check that names are correctly added and deleted in the compression
context. Use many names with differing numerical prefixes to make it
relatively easy to identify and debug problems.
Tony Finch [Thu, 23 Jun 2022 20:53:46 +0000 (21:53 +0100)]
Simplify and speed up DNS name compression
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.
This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.
Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.
There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.
A small benchmark suggests this is about 2x faster.
Artem Boldariev [Fri, 14 Oct 2022 18:36:51 +0000 (21:36 +0300)]
Fix isc_nmsocket_set_tlsctx()
During loop manager refactoring isc_nmsocket_set_tlsctx() was not
properly adapted. The function is expected to broadcast the new TLS
context for every worker, but this behaviour was accidentally broken.
Ondřej Surý [Fri, 14 Oct 2022 14:10:41 +0000 (16:10 +0200)]
Improve reporting for pthread_once errors
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/once.h with
PTHEADS_RUNTIME_CHECK(), in order to improve error reporting for any
once-related run-time failures (by augmenting error messages with
file/line/caller information and the error string corresponding to
errno).
Tom Krizek [Mon, 10 Oct 2022 11:28:29 +0000 (13:28 +0200)]
Remove system test delzone
There are multiple reasons to remove this test as obsolete:
- The test may not possibly work for over 2.5 years, since 98b3b93791777218c04a67ddaef22619162249f7 removed the rndc.py python
tool on which this test relies.
- It isn't part of the test suite either in CI or locally unless it is
explicitly enabled. As a result, there are many issues which prevent
the test from being executed caused by various refactoring efforts
accumulated over time.
- Even if the test could be executed, it has no clear failure condition.
If the python script(s) fail, the test still passes.
Ondřej Surý [Thu, 6 Oct 2022 10:56:25 +0000 (12:56 +0200)]
Replace the statschannel truncated tests with two new tests
Now that the artificial limit on the recv buffer has been removed, the
current system test always fails because it tests if the truncation has
happened.
Add test that sending more than 10 headers makes the connection to
closed; and add test that sending huge HTTP request makes the connection
to be closed.
Ondřej Surý [Thu, 6 Oct 2022 10:56:25 +0000 (12:56 +0200)]
Rewrite isc_httpd using picohttpparser and isc_url_parse
Rewrite the isc_httpd to be more robust.
1. Replace the hand-crafted HTTP request parser with picohttpparser for
parsing the whole HTTP/1.0 and HTTP/1.1 requests. Limit the number
of allowed headers to 10 (arbitrary number).
2. Replace the hand-crafted URL parser with isc_url_parse for parsing
the URL from the HTTP request.
3. Increase the receive buffer to match the isc_netmgr buffers, so we
can at least receive two full isc_nm_read()s. This makes the
truncation processing much simpler.
4. Process the received buffer from single isc_nm_read() in a single
loop and schedule the sends to be independent of each other.
The first two changes makes the code simpler and rely on already
existing libraries that we already had (isc_url based on nodejs) or are
used elsewhere (picohttpparser).
The second two changes remove the artificial "truncation" limit on
parsing multiple request. Now only a request that has too many
headers (currently 10) or is too big (so, the receive buffer fills up
without reaching end of the request) will end the connection.
We can be benevolent here with the limites, because the statschannel
channel is by definition private and access must be allowed only to
administrators of the server. There are no timers, no rate-limiting, no
upper limit on the number of requests that can be served, etc.
Ondřej Surý [Fri, 7 Oct 2022 13:48:26 +0000 (15:48 +0200)]
Add picohttpparser.{c.h} from https://github.com/h2o/picohttpparser
PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.
Unlike most parsers, it is stateless and does not allocate memory by
itself. All it does is accept pointer to buffer and the output
structure, and setups the pointers in the latter to point at the
necessary portions of the buffer.
Petr Špaček [Thu, 13 Oct 2022 12:33:52 +0000 (14:33 +0200)]
Add list of meaningless commits to .git-blame-ignore-revs
Works nicely together with:
git config --add blame.ignoreRevsFile .git-blame-ignore-revs
The list was generated by hand-picking from git log --oneline augmented
with:
--author=tbox
--grep=clang-format
--grep=copyright
--grep=reformat
--grep=whitespace
plus
git log --format='commit %H %s' --stat | grep -E 'commit|changed' | grep -B1 '[0-9][0-9][0-9] files changed'
plus some sanity checking.
Comments were added with:
for COMMIT in $(cat .git-blame-ignore-revs)
do git log -1 --format="# %s" "$COMMIT"
echo $COMMIT
done
Petr Špaček [Thu, 13 Oct 2022 08:33:41 +0000 (10:33 +0200)]
Replace #define DNS_NAMEATTR_ with struct of bools
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
Petr Špaček [Thu, 13 Oct 2022 11:08:22 +0000 (13:08 +0200)]
Fix latent bug in RBT node attributes handling
Originally RBT node stored three lowest bits from dns_name_t attributes.
This had a curious side-effect noticed by Tony Finch:
If you create an rbt node from a DYNAMIC name then the flag will be
propagated through dns_rbt_namefromnode() ... if you subsequently call
dns_name_free() it will try to isc_mem_put() a piece of an rbt node ...
but dns_name_free() REQUIRE()s that the name is dynamic so in the usual
case where rbt nodes are created from non-dynamic names, this kind of
code will fail an assertion.
This is a bug it dates back to june 1999 when NAMEATTR_DYNAMIC was
invented.
Apparently it does not happen often :-)
I'm planning to get rid of DNS_NAMEATTR_ definitions and bit operations,
so removal of this "three-bit-subset" assignment is a first step.
We can keep only the ABSOLUTE flag in RBT node and nothing else because
names attached to rbt nodes are always readonly: The internal node_name()
function always sets the NAMEATTR_READONLY when making a dns_name that
refers to the node's name, so the READONLY flag will be set in the name
returned by dns_rbt_namefromnode().
Artem Boldariev [Wed, 24 Aug 2022 17:44:47 +0000 (20:44 +0300)]
doth system test: increase transfers-in/out limits
Sometimes doth test could intermittently fail shortly after start due
to inability to complete a zone transfer in time. As it turned out, it
could happen due to transfers-in/out limits. Initially the defaults
were fine, but over time, especially when adding Strict/Mutual TLS, we
added more than 10 zones so it became possible to hit the limits.
This commit takes care of that by bumping the limits.
Artem Boldariev [Wed, 12 Oct 2022 15:46:54 +0000 (18:46 +0300)]
doth system test - decrease HTTP listener quota size
This commit reduces the size of HTTP listener quota from 300 (default)
to 100 so that it would make hitting any global limits in case of
running multiple tests in parallel in multiple containers unlikely.
This way the need in opening many file descriptors of different
kinds (e.g. client side connections and pipes) gets significantly
reduced while the required code paths are still verified.
Ondřej Surý [Tue, 11 Oct 2022 10:29:48 +0000 (12:29 +0200)]
Gracefully handle ISC_R_SHUTTINGDOWN in udp__send_cb
The ISC_R_SHUTTINGDOWN should be handled the same as ISC_R_CANCELED in
the udp__send_cb(), as we might be sending the data while the
loopmgr/netmgr shutdown has been initiated.
Ondřej Surý [Tue, 11 Oct 2022 10:03:17 +0000 (12:03 +0200)]
Make sure the unit test listening and connecting ports are different
In rare circumstances, the UDP port for the listening socket and the UDP
port for the connecting socket might be the same. Because we use the
"reuse" port socket option, this isn't caught when binding the socket,
and thus the connected client socket could send a datagram to itself,
completely bypassing the server. This doesn't happen under normal
operation mode because `named` is listening on a privileged port (53),
and even if not, it doesn't usually talk to itself as the tests do.
Pick an arbitrary port for listening (9153-9156) that is outside the
ephemeral port range for the network manager related unit tests (except
the `doh_test).
Ondřej Surý [Tue, 11 Oct 2022 07:10:16 +0000 (09:10 +0200)]
Don't set load-balancing socket option on the UDP connect sockets
The isc_nm_udpconnect() erroneously set the reuse port with
load-balancing on the outgoing connected UDP sockets. This socket
option makes only sense for the listening sockets. Don't set the
load-balancing reuse port option on the outgoing UDP sockets.
Ondřej Surý [Wed, 12 Oct 2022 07:43:56 +0000 (09:43 +0200)]
Retry on timeout in the UDP recv_one, recv_two and double_read tests
Since we are testing UDP on the localhost and the same interface, the
UDP datagrams can't get lost. Change the connect read callback, so it
starts reading again on the timeout instead of just getting stuck, and
fail when any other result codes than ISC_R_SUCCESS and ISC_R_TIMEDOUT
are received because we don't expect them to happen in these simple
tests.
Artem Boldariev [Wed, 12 Oct 2022 12:26:06 +0000 (15:26 +0300)]
DoH unit test: remove broken remnants of slowdown logic
This commit removes broken remnants of unit test slowdown logic, which
caused unit test hangs on platforms susceptible to "too many open
files" error, notably OpenBSD.
Artem Boldariev [Tue, 4 Oct 2022 18:03:23 +0000 (21:03 +0300)]
TLS: clear error queue before doing IO or calling SSL_get_error()
Ensure that TLS error is empty before calling SSL_get_error() or doing
SSL I/O so that the result will not get affected by prior error
statuses.
In particular, the improper error handling led to intermittent unit
test failure and, thus, could be responsible for some of the system
test failures and other intermittent TLS-related issues.
Ondřej Surý [Wed, 12 Oct 2022 12:40:49 +0000 (14:40 +0200)]
Ignore additional return codes in the netmgr unit tests
There was inconsistency in which error codes would get accepted and
ignored in the network manager unit test callbacks. Add following
results, so we just detach the handle instead of causing assertion
failure:
* ISC_R_SHUTTINGDOWN - when the network manager is shutting down
* ISC_R_CANCELED - the socket has been shut down
* ISC_R_EOF - the (TCP) communication has ended on the other side
* ISC_R_CONNECTIONRESET - the TCP connection was reset
This should fix some of the spurious unit test failures.
Aram Sargsyan [Wed, 12 Oct 2022 08:21:35 +0000 (08:21 +0000)]
Remove a superfluous check of sock->fd against -1
The check is left from when tcp_connect_direct() called isc__nm_socket()
and it was uncertain whether it had succeeded, but now isc__nm_socket()
is called before tcp_connect_direct(), so sock->fd cannot be -1.
*** CID 357292: (REVERSE_NEGATIVE)
/lib/isc/netmgr/tcp.c: 309 in isc_nm_tcpconnect()
303
304 atomic_store(&sock->active, true);
305
306 result = tcp_connect_direct(sock, req);
307 if (result != ISC_R_SUCCESS) {
308 atomic_store(&sock->active, false);
>>> CID 357292: (REVERSE_NEGATIVE)
>>> You might be using variable "sock->fd" before verifying that it is >= 0.
309 if (sock->fd != (uv_os_sock_t)(-1)) {
310 isc__nm_tcp_close(sock);
311 }
312 isc__nm_connectcb(sock, req, result, true);
313 }
314
Ondřej Surý [Tue, 11 Oct 2022 07:06:37 +0000 (09:06 +0200)]
Handle double timeout in udp_cancel_read test
If sending took too long the isc_nm_read() could timeout twice, leading
to extra 'cread' counter in the udp_cancel_read test. Increase the
cread counter only on ISC_R_EOF (canceled read) and deal with the
multiple ISC_R_TIMEOUTS gracefully.
Michał Kępień [Tue, 11 Oct 2022 09:54:57 +0000 (11:54 +0200)]
Fix startup detection after restart in start.pl
The bin/tests/system/start.pl script waits until a "running" message is
logged by a given name server instance before attempting to send a
version.bind/CH/TXT query to it. The idea behind this was to make the
script wait until named loads all the zones it is configured to serve
before telling the system test framework that a given server is ready to
use; this prevents the need to add boilerplate code that waits for a
specific zone to be loaded to each test expecting that.
The problem is that when it looks for "running" messages, the
bin/tests/system/start.pl script assumes that the existence of any such
message in the named.run file indicates that a given named instance has
already finished loading all zones. Meanwhile, some system tests
restart all the named instances they use throughout their lifetime (some
even do that a few times), for example to run Python-based tests. The
bin/tests/system/start.pl script handles such a scenario incorrectly: as
soon as it finds any "running" message in the named.run file it inspects
and it gets a response to a version.bind/CH/TXT query, it tells the
system test framework that a given server is ready to use, which might
not be true - it is possible that only the "version.bind" zone is loaded
at that point and the "running" message found was logged by a
previously-shutdown named instance. This triggers intermittent failures
for Python-based tests.
Fix by improving the logic that the bin/tests/system/start.pl script
uses to detect server startup: check how many "running" lines are
present in a given named.run file before attempting to start a named
instance and only proceed with version.bind/CH/TXT queries when the
number of "running" lines found in that named.run file increases after
the server is started.
Michał Kępień [Tue, 11 Oct 2022 09:54:57 +0000 (11:54 +0200)]
Do not truncate ns2 logs in the "rrsetorder" test
In the "rrsetorder" system test, the ns2 named instance is restarted
without passing the --restart option to bin/tests/system/start.pl. This
causes the log file for that named instance to be needlessly truncated.
Prevent this from happening by restarting the affected named instance
in the same way as all the other named instances used in system tests.
Ondřej Surý [Tue, 4 Oct 2022 15:07:19 +0000 (17:07 +0200)]
Record the 'edns-udp-size' in the view, not in the resolver
Getting the recorded value of 'edns-udp-size' from the resolver requires
strong attach to the dns_view because we are accessing `view->resolver`.
This is not the case in places (f.e. dns_zone unit) where `.udpsize` is
accessed. By moving the .udpsize field from `struct dns_resolver` to
`struct dns_view`, we can access the value directly even with weakly
attached dns_view without the need to lock the view because `.udpsize`
can be accessed after the dns_view object has been shut down.
Ondřej Surý [Mon, 3 Oct 2022 13:54:12 +0000 (15:54 +0200)]
Resolve violation of weak referencing dns_view
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
Tony Finch [Wed, 5 Oct 2022 11:11:41 +0000 (12:11 +0100)]
Avoid dead code warning when using a constant boolean
The value of `sign_bit` is platform-dependent but constant at compile
time. Use a cast to convert the boolean `sign_bit` to 0 or 1 instead of
ternary `?:` because one branch of the conditional is dead code. (We
could leave out the cast to `size_t` but our style prefers to handle
booleans more explicitly, hence the `?:` that caused the issue.)
*** CID 358310: Possible Control flow issues (DEADCODE)
/lib/isc/resource.c: 118 in isc_resource_setlimit()
112 * rlim_t, and whether rlim_t has a sign bit.
113 */
114 isc_resourcevalue_t rlim_max = UINT64_MAX;
115 size_t wider = sizeof(rlim_max) - sizeof(rlim_t);
116 bool sign_bit = (double)(rlim_t)-1 < 0;
117
>>> CID 358310: Possible Control flow issues (DEADCODE)
>>> Execution cannot reach the expression "1" inside this statement: "rlim_max >>= 8UL * wider + ...".
118 rlim_max >>= CHAR_BIT * wider + (sign_bit ? 1 : 0);
119 rlim_value = ISC_MIN(value, rlim_max);
120 }
121
122 rl.rlim_cur = rl.rlim_max = rlim_value;
123 unixresult = setrlimit(unixresource, &rl);
Ondřej Surý [Fri, 26 Aug 2022 09:58:51 +0000 (11:58 +0200)]
Use isc_mem_regetx() when appropriate
While refactoring the isc_mem_getx(...) usage, couple places were
identified where the memory was resized manually. Use the
isc_mem_reget(...) that was introduced in [GL !5440] to resize the
arrays via function rather than a custom code.
Ondřej Surý [Fri, 26 Aug 2022 09:58:51 +0000 (11:58 +0200)]
Use designated initializers instead of memset()/MEM_ZERO for structs
In several places, the structures were cleaned with memset(...)) and
thus the semantic patch converted the isc_mem_get(...) to
isc_mem_getx(..., ISC_MEM_ZERO). Use the designated initializer to
initialized the structures instead of zeroing the memory with
ISC_MEM_ZERO flag as this better matches the intended purpose.