Witold Kręcicki [Thu, 1 Oct 2020 11:18:47 +0000 (13:18 +0200)]
Add DoT support to bind
Parse the configuration of tls objects into SSL_CTX* objects. Listen on
DoT if 'tls' option is setup in listen-on directive. Use DoT/DoH ports
for DoT/DoH.
This commit adds stub parser support and tests for:
- "tls" statement, specifying key and cert.
- an optional "tls" keyvalue in listen-on statements for DoT
configuration.
Documentation for these options has also been added to the ARM, but
needs further work.
Witold Kręcicki [Wed, 13 May 2020 15:37:51 +0000 (17:37 +0200)]
netmgr: server-side TLS support
Add server-side TLS support to netmgr - that includes moving some of the
isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper
tcp or tls function, and a new isc_nm_listentls() function.
Add DoT support to tcpdns - isc_nm_listentlsdns().
Mark Andrews [Fri, 23 Oct 2020 02:33:34 +0000 (13:33 +1100)]
Fixup legacy test to account for not falling back to EDNS 512 lookups.
The SOA lookup for edns512 could succeed if the negative response
for ns.edns512/AAAA completed before all the edns512/SOA query
attempts are made. The ns.edns512/AAAA lookup returns tc=1 and
the SOA record is cached after processing the NODATA response.
Lookup a TXT record at edns512 and look it up instead of the
SOA record.
Removed 'checking that TCP failures do not influence EDNS statistics
in the ADB' as it is no longer appropriate.
Evan Hunt [Mon, 9 Nov 2020 20:33:37 +0000 (12:33 -0800)]
address some possible shutdown races in xfrin
there were two failures during observed in testing, both occurring
when 'rndc halt' was run rather than 'rndc stop' - the latter dumps
zone contents to disk and presumably introduced enough delay to
prevent the races:
- a failure when the zone was shut down and called dns_xfrin_detach()
before the xfrin had finished connecting; the connect timeout
terminated without detaching its handle
- a failure when the tcpdns socket timer fired after the outerhandle
had already been cleared.
this commit incidentally addresses a failure observed in mutexatomic
due to a variable having been initialized incorrectly.
Ondřej Surý [Sat, 10 Oct 2020 05:26:18 +0000 (07:26 +0200)]
Add libssl libraries to Windows build
This commit extends the perl Configure script to also check for libssl
in addition to libcrypto and change the vcxproj source files to link
with both libcrypto and libssl.
Ondřej Surý [Mon, 9 Nov 2020 10:32:55 +0000 (11:32 +0100)]
Refactor the xfrin reference counting
Previously, the xfrin object relied on four different reference counters
(`refs`, `connects`, `sends`, `recvs`) and destroyed the xfrin object
only if all of them were zero. This commit reduces the reference
counting only to the `references` (renamed from `refs`) counter. We
keep the existing `connects`, `sends` and `recvs` as safe guards, but
they are not formally needed.
Evan Hunt [Fri, 9 Oct 2020 23:38:30 +0000 (16:38 -0700)]
remove isc_task from xfrin
since the network manager is now handling timeouts, xfrin doesn't
need an isc_task object.
it may be necessary to revert this later if we find that it's
important for zone_xfrdone() to be executed in the zone task context.
currently things seem to be working well without that, though.
Ondřej Surý [Sat, 7 Nov 2020 19:48:37 +0000 (20:48 +0100)]
netmgr: Don't crash if socket() returns an error in udpconnect
socket() call can return an error - e.g. EMFILE, so we need to handle
this nicely and not crash.
Additionally wrap the socket() call inside a platform independent helper
function as the Socket data type on Windows is unsigned integer:
> This means, for example, that checking for errors when the socket and
> accept functions return should not be done by comparing the return
> value with –1, or seeing if the value is negative (both common and
> legal approaches in UNIX). Instead, an application should use the
> manifest constant INVALID_SOCKET as defined in the Winsock2.h header
> file.
Ondřej Surý [Fri, 6 Nov 2020 15:12:17 +0000 (16:12 +0100)]
dig: Refactor recv_done, so there's less exit paths
The recv_done() callback had many exit paths with different conditions,
and every path had it's own set of destructors. The refactored code now
has unified exit path with descriptive goto labels matching the intent:
The only exception to the rule is check_for_more_data() path, where the
part of the query gets reused, so the query->readhandle and query gets
detached on it's own, and by going to the keep_query, we are just
skipping calling the destructors again.
Ondřej Surý [Fri, 6 Nov 2020 12:11:08 +0000 (13:11 +0100)]
netmgr: Always load the result from async socket
Because we use result earlier for setting the loadbalancing on the
socket, we could be left with a ISC_R_NOTIMPLEMENTED value stored in the
variable and when the UDP connection would succeed, we would
errorneously return this value instead of ISC_R_SUCCESS.
Evan Hunt [Thu, 5 Nov 2020 22:15:14 +0000 (14:15 -0800)]
dig: prevent query from being detached if udpconnect fails on first attempt
FreeBSD sometimes returns spurious errors in UDP connect() attempts,
so we try a few times before giving up. However, each failed attempt
triggers a call to udp_ready() in dighost.c, and that was causing
the query object to be detached prematurely.
Ondřej Surý [Thu, 5 Nov 2020 14:22:38 +0000 (15:22 +0100)]
dig: add reference counter to the dig_lookup_t object
Sometimes, the dig_lookup_t could be destroyed before the final
send_done() callback was be called, leading to dereferencing an
already freed dig_lookup_t object. By making the dig_lookup_t
reference counted, we are ensuring that it won't be freed until
the last reference (from dig_query_t .lookup) is released.
one of the tests in the resolver system test depends on dig
getting no response to its first two query attempts, and SERVFAIL
on the third after resolution times out.
using a 5-second retry timer in dig means the SERVFAIL response
could occur while dig is discarding the second query and preparing
to send the third. in this case the server's response could be
missed. shortening the retry interval to 4 seconds ensures that
dig has already sent the third query when the SERVFAIL response
arrives.
also, the serve-stale system test could fail due to a race in which
it timed out after waiting ten seconds for a file to be written, and
the dig timeout was just a bit longer. this is addressed by extending
the dig timeout to 11 seconds for this test.
Evan Hunt [Tue, 3 Nov 2020 03:58:05 +0000 (19:58 -0800)]
add isc_nmhandle_settimeout() function
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
because dig now uses the netmgr, printing of response messages
happens in a different thread than setup. the IDN output filtering
procedure, which set using dns_name_settotextfilter(), is stored as
thread-local data, and so if it's set during setup, it won't be
accessible when printing. we now set it immediately before printing,
in the same thread, and clear it immedately afterward.
The network manager does not support returning UDP datagrams to
clients from unexpected sources; it is therefore not possible for
dig to accept them. The "+[no]unexpected" option has therefore
been removed from the dig command and its documentation.
Michał Kępień [Thu, 5 Nov 2020 10:45:19 +0000 (11:45 +0100)]
Fix detection of CMake-built libuv on Windows
As of libuv 1.36.0, CMake is the only supported build method for libuv
on Windows. Account for that fact by adjusting the relevant paths and
DLL file names used in the win32utils/Configure script. Update
Windows-specific documentation accordingly.
Michał Kępień [Thu, 5 Nov 2020 10:45:19 +0000 (11:45 +0100)]
Use "image" key in Windows GitLab CI job templates
Our GitLab Runner Custom executor scripts now use the "image" key for
determining the Windows Docker image to use for a given CI job. Update
.gitlab-ci.yml to reflect that change.
Michał Kępień [Thu, 5 Nov 2020 06:53:43 +0000 (07:53 +0100)]
Wait for the "fast-expire" zone to be transferred
In order for a "fast-expire/IN: response-policy zone expired" message to
be logged in ns3/named.run, the "fast-expire" zone must first be
transferred in by that server. However, with unfavorable timing, ns3
may be stopped before it manages to fetch the "fast-expire" zone from
ns5 and after the latter has been reconfigured to no longer serve that
zone. In such a case, the "rpz" system test will report a false
positive for the relevant check. Prevent that from happening by
ensuring ns3 manages to transfer the "fast-expire" zone before getting
shut down.
Some setup scripts uses DEFAULT_ALGORITHM in their dnssec-policy
and/or initial signing. The tests still used the literal values
13, ECDSAP256SHA256, and 256. Replace those occurrences where
appropriate.
Ondřej Surý [Mon, 2 Nov 2020 14:55:12 +0000 (15:55 +0100)]
Put up additional safe guards to not use inactive/closed tcpdns socket
When we are operating on the tcpdns socket, we need to double check
whether the socket or its outerhandle or its listener or its mgr is
still active and when not, bail out early.
Witold Kręcicki [Sat, 31 Oct 2020 20:08:53 +0000 (21:08 +0100)]
Fix improper closed connection handling in tcpdns.
If dnslisten_readcb gets a read callback it needs to verify that the
outer socket wasn't closed in the meantime, and issue a CANCELED callback
if it was.
Ondřej Surý [Tue, 27 Oct 2020 16:12:41 +0000 (17:12 +0100)]
add a netmgr unit test
tests of UDP and TCP cases including:
- sending and receiving
- closure sockets without reading or sending
- closure of sockets at various points while sending and receiving
- since the teste is multithreaded, cmocka now aborts tests on the
first failure, so that failures in subthreads are caught and
reported correctly.
Ondřej Surý [Thu, 29 Oct 2020 11:04:00 +0000 (12:04 +0100)]
Fix more races between connect and shutdown
There were more races that could happen while connecting to a
socket while closing or shutting down the same socket. This
commit introduces a .closing flag to guard the socket from
being closed twice.
Ondřej Surý [Tue, 27 Oct 2020 19:00:08 +0000 (20:00 +0100)]
Fix a race between isc__nm_async_shutdown() and new sends/reads
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.
Ondřej Surý [Mon, 26 Oct 2020 16:31:55 +0000 (17:31 +0100)]
Refactor udp_recv_cb()
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
only when it the socket was actively reading.
Ondřej Surý [Mon, 26 Oct 2020 13:19:37 +0000 (14:19 +0100)]
Fix netmgr read/connect timeout issues
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.
add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
timeout argument to ensure connections time out and are correctly
cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
failure if reading is interrupted by a non-network thread (e.g.
a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
prior to 1.27, when uv_udp_connect() was added
all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
Mark Andrews [Wed, 28 Oct 2020 05:40:36 +0000 (16:40 +1100)]
Check that a zone in the process of being signed resolves
ans10 simulates a local anycast server which has both signed and
unsigned instances of a zone. 'A' queries get answered from the
signed instance. Everything else gets answered from the unsigned
instance. The resulting answer should be insecure.
Mark Andrews [Wed, 28 Oct 2020 00:58:38 +0000 (11:58 +1100)]
Handle DNS_R_NCACHENXRRSET in fetch_callback_{dnskey,validator}()
DNS_R_NCACHENXRRSET can be return when zones are in transition state
from being unsigned to signed and signed to unsigned. The validation
should be resumed and should result in a insecure answer.
Witold Kręcicki [Tue, 27 Oct 2020 09:09:30 +0000 (10:09 +0100)]
Properly handle outer TCP connection closed in TCPDNS.
If the connection is closed while we're processing the request
we might access TCPDNS outerhandle which is already reset. Check
for this condition and call the callback with ISC_R_CANCELED result.
Evan Hunt [Thu, 29 Oct 2020 01:01:49 +0000 (18:01 -0700)]
fix a typo in rpz test
"tcp-only" was not being tested correctly in the RPZ system test
because the option to the "digcmd" function that causes queries to
be sent via TCP was misspelled in one case, and was being interpreted
as a query name.
the "ckresult" function has also been changed to be case sensitive
for consistency with "digcmd".