Mark Andrews [Wed, 29 Sep 2021 03:55:46 +0000 (13:55 +1000)]
Rework rbtdb.c:find_coveringnsec() to use the auxilary nsec rbt
this improves the performance of looking for NSEC and RRSIG(NSEC)
records in the cache by skipping lots of nodes in the main trees
in the cache without these records present. This is a simplified
version of previous_closest_nsec() which uses the same underlying
mechanism to look for NSEC and RRSIG(NSEC) records in authorative
zones.
The auxilary NSEC tree was already being maintained as a side effect
of looking for the covering NSEC in large zones where there can be
lots of glue records that needed to be skipped. Nodes are added
to the tree whenever a NSEC record is added to the primary tree.
They are removed when the corresponding node is removed from the
primary tree.
Having nodes in the NSEC tree w/o NSEC records in the primary tree
should not impact on synth-from-dnssec efficiency as that node would
have held the NSEC we would have been needed to synthesise the
response. Removing the node when the NSEC RRset expires would only
cause rbtdb to return a NSEC which would be rejected at a higher
level.
Ondřej Surý [Wed, 1 Dec 2021 16:41:20 +0000 (17:41 +0100)]
Improve the logging on failed TCP accept
Previously, when TCP accept failed, we have logged a message with
ISC_LOG_ERROR level. One common case, how this could happen is that the
client hits TCP client quota and is put on hold and when resumed, the
client has already given up and closed the TCP connection. In such
case, the named would log:
TCP connection failed: socket is not connected
This message was quite confusing because it actually doesn't say that
it's related to the accepting the TCP connection and also it logs
everything on the ISC_LOG_ERROR level.
Change the log message to "Accepting TCP connection failed" and for
specific error states lower the severity of the log message to
ISC_LOG_INFO.
Ondřej Surý [Wed, 1 Dec 2021 11:22:59 +0000 (12:22 +0100)]
Add TCP connection reset test
The TCP connection reset test starts mock UDP and TCP server which
always returns empty DNS answer with TC bit set over UDP and resets the
TCP connection after five seconds.
When tested without the fix, the DNS query to 10.53.0.2 times out and
the ns2 server hangs at shutdown.
Evan Hunt [Tue, 30 Nov 2021 08:57:27 +0000 (09:57 +0100)]
On non-matching answer, check for missed timeout
A TCP connection may be held open past its proper timeout if it's
receiving a stream of DNS responses that don't match any queries.
In this case, we now check whether the oldest query should have timed
out.
Ondřej Surý [Mon, 29 Nov 2021 08:59:33 +0000 (09:59 +0100)]
Tear down the TCP connection on too many unexpected DNS messages
When the outgoing TCP dispatch times-out active response, we might still
receive the answer during the lifetime of the connection. Previously,
we would just ignore any non-matching DNS answers, which would allow the
server to feed us with otherwise valid DNS answer and keep the
connection open.
Add a counter for timed-out DNS queries over TCP and tear down the whole
TCP connection if we receive unexpected number of DNS answers.
Ondřej Surý [Fri, 26 Nov 2021 08:14:58 +0000 (09:14 +0100)]
Shutdown all TCP connection on invalid DNS message
Previously, when invalid DNS message is received over TCP we throw the
garbage DNS message away and continued looking for valid DNS message
that would match our outgoing queries. This logic makes sense for UDP,
because anyone can send DNS message over UDP.
Change the logic that the TCP connection is closed when we receive
garbage, because the other side is acting malicious.
Ondřej Surý [Fri, 26 Nov 2021 08:14:58 +0000 (09:14 +0100)]
Shutdown all active TCP connections on error
When outgoing TCP connection was prematurely terminated (f.e. with
connection reset), the dispatch code would not cleanup the resources
used by such connection leading to dangling dns_dispentry_t entries.
Ondřej Surý [Wed, 1 Dec 2021 09:20:31 +0000 (10:20 +0100)]
Add an idna test that _ and * characters are preserved
Add a idna that checks whether non-character letters like _ and * are
preserved when IDN is enabled. This wasn't the case when
UseSTD3ASCIIRules were enabled, f.e. _ from _tcp would get mangled to
tcp.
Artem Boldariev [Wed, 1 Dec 2021 10:50:21 +0000 (12:50 +0200)]
Increase startup timeout for servers in system tests
This change is made in particular to address the issue with 'doth'
system tests where servers are unable to iniitalise in time in CI
system under high load (that happened particularly often for Debian
Buster cross32 configuration).
The right solution, is, of course, to (re)use TLS context sparingly,
while right now we create too many of them.
Artem Boldariev [Tue, 30 Nov 2021 08:42:23 +0000 (10:42 +0200)]
TLS context handling code: Fix an abort on ancient OpenSSL version
There was a logical bug when setting a list of enabled TLS protocols,
which may lead to a crash (an abort()) on systems with ancient OpenSSL
versions.
The problem was due to the fact that we were INSIST()ing on supporting
all of the TLS versions, while checking only for mentioned in the
configuration was implied.
Artem Boldariev [Mon, 29 Nov 2021 10:50:35 +0000 (12:50 +0200)]
Add transport-acl system test
This commit adds a new system-test: transport-acl system test. It is
intended to test the new, extended syntax for ACLs, the one where port
or transport protocol can be specified. Currently, it includes the
tests only using allow-transfer statement, as this extended syntax is
used only there, at least for now.
Artem Boldariev [Tue, 23 Nov 2021 13:04:51 +0000 (15:04 +0200)]
Mention that the allow-transfer option has been extended
This commit updates both the reference manual and release notes with
the information that 'allow-transfer' has been extended with
additional "port" and "transport" options.
Artem Boldariev [Mon, 22 Nov 2021 13:31:15 +0000 (15:31 +0200)]
Extend the 'doth' system test to test extended allow-transfer option
This commit extends the 'doth' system test to verify that the new
extended 'allow-transfer' option syntax featuring 'port' and
'transport' parameters is supported and works as expected. That is, it
restricts the primary server to allow zone transfers only via XoT.
Additionally to that, it extends the 'checkonf' test with more
configuration file examples featuring the new syntax.
Artem Boldariev [Fri, 12 Nov 2021 14:53:13 +0000 (16:53 +0200)]
Integrate extended ACLs syntax featuring 'port' and 'transport' opts
This commit completes the integration of the new, extended ACL syntax
featuring 'port' and 'transport' options.
The runtime presentation and ACL loading code are extended to allow
the syntax to be used beyond the 'allow-transfer' option (e.g. in
'acl' definitions and other 'allow-*' options) and can be used to
ultimately extend the ACL support with transport-only
ACLs (e.g. 'transport-acl tls-acl port 853 transport tls'). But, due
to fundamental nature of such a change, it has not been completed as a
part of 9.17.X release series due to it being close to 9.18 stable
release status. That means that we do not have enough time to fully
test it.
The complete integration is planned as a part of 9.19.X release
series.
The code was manually verified to work as expected by temporarily
enabling the extended syntax for 'acl' statements and 'allow-query'
options, including ACL merging, negated ACLs.
Artem Boldariev [Thu, 4 Nov 2021 14:52:49 +0000 (16:52 +0200)]
Extend ACL syntax handling code with 'port' and 'transport' options
This commit extends ACL syntax handling code with 'port' and
'transport' options. Currently, the extended syntax is available only
for allow-transfer options.
Artem Boldariev [Mon, 15 Nov 2021 15:42:15 +0000 (17:42 +0200)]
Add isc_nm_socket_type()
This commit adds an isc_nm_socket_type() function which can be used to
obtain a handle's socket type.
This change obsoletes isc_nm_is_tlsdns_handle() and
isc_nm_is_http_handle(). However, it was decided to keep the latter as
we eventually might end up supporting multiple HTTP versions.
Artem Boldariev [Mon, 29 Nov 2021 08:45:35 +0000 (10:45 +0200)]
Disable unused 'tls' clause options: 'ca-file' and 'hostname'
This commit disables the unused 'tls' clause options. For these some
backing code exists, but their values are not really used anywhere,
nor there are sufficient syntax tests for them.
These options are only disabled temporarily, until TLS certificate
verification gets implemented.
Artem Boldariev [Wed, 24 Nov 2021 12:09:31 +0000 (14:09 +0200)]
TLS stream: disable TLS I/O debug log message by default
This commit makes the TLS stream code to not issue mostly useless
debug log message on error during TLS I/O. This message was cluttering
logs a lot, as it can be generated on (almost) any non-clean TLS
connection termination, even in the cases when the actual query
completed successfully. Nor does it provide much value for end-users,
yet it can occasionally be seen when using dig and quite often when
running BIND over a publicly available network interface.
This commit removes unneeded isc__nmsocket_prep_destroy() call on ALPN
negotiation failure, which was eventually causing the TLS handle to
leak.
This call is not needed, as not attaching to the transport (TLS)
handle should be enough. At this point it seems like a kludge from
earlier days of the TLS code.
Mark Andrews [Thu, 25 Nov 2021 02:16:56 +0000 (13:16 +1100)]
Exercise ISC_R_NOSPACE path in dns_sdlz_putrr
Use relative names when adding SOA record and a long domain
name to create SOA RR where the wire format is longer than
the initial buffer allocation in dns_sdlz_putrr.
Mark Andrews [Wed, 24 Nov 2021 00:03:19 +0000 (11:03 +1100)]
Do not convert ISC_R_NOSPACE to DNS_R_SERVFAIL too early
The parsing loop needs to process ISC_R_NOSPACE to properly
size the buffer. If result is still ISC_R_NOSPACE at the end
of the parsing loop set result to DNS_R_SERVFAIL.
Michal Nowak [Wed, 24 Nov 2021 15:50:57 +0000 (16:50 +0100)]
Fix "array subscript is of type 'char'" on NetBSD 9
In file included from rdata.c:602:
In file included from ./code.h:88:
./rdata/in_1/svcb_64.c:259:9: warning: array subscript is of type 'char' [-Wchar-subscripts]
if (!isdigit(*region->base)) {
^~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/ctype_inline.h:51:44: note: expanded from macro 'isdigit'
#define isdigit(c) ((int)((_ctype_tab_ + 1)[(c)] & _CTYPE_D))
^~~~
Artem Boldariev [Thu, 11 Nov 2021 14:17:02 +0000 (16:17 +0200)]
Fix a crash on unexpected incoming DNS message during XoT xfer
This commit fixes a peculiar corner case in the client-side DoT code
because of which a crash could occur during a zone transfer. A junk
DNS message should be sent at the end of a zone transfer via TLS to
trigger the crash (abort).
This commit, hopefully, fixes that.
Also, this commit adds similar changes to the TCP DNS code, as it
shares the same origin and most of the logic.
Michał Kępień [Tue, 23 Nov 2021 14:35:39 +0000 (15:35 +0100)]
Fix handling of mismatched responses past timeout
When a UDP dispatch receives a mismatched response, it checks whether
there is still enough time to wait for the correct one to arrive before
the timeout fires. If there is not, the result code is set to
ISC_R_TIMEDOUT, but it is not subsequently used anywhere as 'response'
is set to NULL a few lines earlier. This results in the higher-level
read callback (resquery_response() in case of resolver code) not being
called. However, shortly afterwards, a few levels up the call chain,
isc__nm_udp_read_cb() calls isc__nmsocket_timer_stop() on the dispatch
socket, effectively disabling read timeout handling for that socket.
Combined with the fact that reading is not restarted in such a case
(e.g. by calling dispatch_getnext() from udp_recv()), this leads to the
higher-level query structure remaining referenced indefinitely because
the dispatch socket it uses will neither be read from nor closed due to
a timeout. This in turn causes fetch contexts to linger around
indefinitely, which in turn i.a. prevents certain cache nodes (those
containing rdatasets used by fetch contexts, like fctx->nameservers)
from being cleaned.
Fix by making sure the higher-level callback does get invoked with the
ISC_R_TIMEDOUT result code when udp_recv() determines there is no more
time left to receive the correct UDP response before the timeout fires.
This allows the higher-level callback to clean things up, preventing the
reference leak described above.
Aram Sargsyan [Mon, 11 Oct 2021 18:13:39 +0000 (18:13 +0000)]
Fix catalog zone reconfiguration crash
The following scenario triggers a "named" crash:
1. Configure a catalog zone.
2. Start "named".
3. Comment out the "catalog-zone" clause.
4. Run `rndc reconfig`.
5. Uncomment the "catalog-zone" clause.
6. Run `rndc reconfig` again.
Implement the required cleanup of the in-memory catalog zone during
the first `rndc reconfig`, so that the second `rndc reconfig` could
find it in an expected state.
Evan Hunt [Tue, 16 Nov 2021 05:59:37 +0000 (21:59 -0800)]
fix intermittent resolver test error
the resolver test checks that the correct number of fetches have
been sent NS rrsets of a given size, but it formerly did so by
counting queries received by the authoritative server, which could
result in an off-by-one count if one of the queries had been resent
due to a timeout or a port number collision.
this commit changes the test to count fetches initiated by the
resolver, which should prevent the intermittent test failure, and
is the actual datum we were interested in anyway.
Mark Andrews [Fri, 19 Nov 2021 09:54:05 +0000 (10:54 +0100)]
Reject too long ECDSA public keys
opensslecdsa_fromdns() already rejects too short ECDSA public keys.
Make it also reject too long ones. Remove an assignment made redundant
by this change.
Michał Kępień [Fri, 19 Nov 2021 09:32:21 +0000 (10:32 +0100)]
Fix parsing ECDSA keys
raw_key_to_ossl() assumes fixed ECDSA private key sizes (32 bytes for
ECDSAP256SHA256, 48 bytes for ECDSAP384SHA384). Meanwhile, in rare
cases, ECDSAP256SHA256 private keys are representable in 31 bytes or
less (similarly for ECDSAP384SHA384) and that is how they are then
stored in the "PrivateKey" field of the key file. Nevertheless,
raw_key_to_ossl() always calls BN_bin2bn() with a fixed length argument,
which in the cases mentioned above leads to erroneously interpreting
uninitialized memory as a part of the private key. This results in the
latter being malformed and broken signatures being generated. Address
by using the key length provided by the caller rather than a fixed one.
Apply the same change to public key parsing code for consistency, adding
an INSIST() to prevent buffer overruns.
Mark Andrews [Thu, 18 Nov 2021 03:31:52 +0000 (14:31 +1100)]
Don't use 'dnssec-signzone -P' unless necessary
Most of the test zones in the dnssec system test can be verified.
Use -z when only a single key is being used so that the verifier
knows that only a single key is in use.
Mark Andrews [Thu, 18 Nov 2021 03:22:04 +0000 (14:22 +1100)]
Generate test zone with multiple NSEC and NSEC3 chains
The method used to generate a test zone with multiple NSEC and
NSEC3 chains was incorrect. Multiple calls to dnssec-signzone
with multiple parameters is not additive. Extract the chain on
each run then add them to the final signed zone instance.
Evan Hunt [Fri, 19 Nov 2021 03:29:07 +0000 (19:29 -0800)]
fix a use-after-free in resolver
when processing a mismatched response, we call dns_dispatch_getnext().
If that fails, for example because of a timeout, fctx_done() is called,
which cancels all queries. This triggers a crash afterward when
fctx_cancelquery() is called, and is unnecessary since fctx_done()
would have been called later anyway.
Ondřej Surý [Mon, 15 Nov 2021 11:13:22 +0000 (12:13 +0100)]
Fix the data race when shutting down dns_adb
When dns_adb is shutting down, first the adb->shutting_down flag is set
and then task is created that runs shutdown_stage2() that sets the
shutdown flag on names and entries. However, when dns_adb_createfind()
is called, only the individual shutdown flags are being checked, and the
global adb->shutting_down flag was not checked. Because of that it was
possible for a different thread to slip in and create new find between
the dns_adb_shutdown() and dns_adb_detach(), but before the
shutdown_stage2() task is complete. This was detected by
ThreadSanitizer as data race because the zonetable might have been
already detached by dns_view shutdown process and simultaneously
accessed by dns_adb_createfind().
This commit converts the adb->shutting_down to atomic_bool to prevent
the global adb lock when creating the find.