one of the tests in the resolver system test depends on dig
getting no response to its first two query attempts, and SERVFAIL
on the third after resolution times out.
using a 5-second retry timer in dig means the SERVFAIL response
could occur while dig is discarding the second query and preparing
to send the third. in this case the server's response could be
missed. shortening the retry interval to 4 seconds ensures that
dig has already sent the third query when the SERVFAIL response
arrives.
also, the serve-stale system test could fail due to a race in which
it timed out after waiting ten seconds for a file to be written, and
the dig timeout was just a bit longer. this is addressed by extending
the dig timeout to 11 seconds for this test.
Evan Hunt [Tue, 3 Nov 2020 03:58:05 +0000 (19:58 -0800)]
add isc_nmhandle_settimeout() function
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
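A minimal sketch of the behavior described above. It uses simplified, hypothetical types and functions (sketch_sock_t, sketch_handle_t, handle_settimeout()) rather than the real netmgr structures. The new timeout is stored on the socket, and the timer is restarted only if it is already running. For TCPDNS sockets, the same is done on the outer TCP socket. A counter stands in for the actual libuv timer restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified socket; not BIND's isc_nmsocket_t. */
typedef struct sketch_sock sketch_sock_t;
struct sketch_sock {
	uint32_t read_timeout_ms;    /* timeout applied to reads */
	bool timer_running;          /* is the read timer currently armed? */
	unsigned int timer_restarts; /* counts restarts, for illustration */
	bool is_tcpdns;              /* TCPDNS sockets wrap a TCP socket */
	sketch_sock_t *outer;        /* the wrapped TCP socket, if any */
};

/* Hypothetical stand-in for a netmgr handle. */
typedef struct {
	sketch_sock_t *sock;
} sketch_handle_t;

/* Store the new timeout and, if the timer is armed, restart it. */
static void
sock_settimeout(sketch_sock_t *sock, uint32_t timeout_ms) {
	sock->read_timeout_ms = timeout_ms;
	if (sock->timer_running) {
		/* A real implementation would restart its timer here. */
		sock->timer_restarts++;
	}
}

static void
handle_settimeout(sketch_handle_t *handle, uint32_t timeout_ms) {
	assert(handle != NULL && handle->sock != NULL);
	sketch_sock_t *sock = handle->sock;

	sock_settimeout(sock, timeout_ms);
	/* For TCPDNS, the outer TCP socket has its own timer as well. */
	if (sock->is_tcpdns && sock->outer != NULL) {
		sock_settimeout(sock->outer, timeout_ms);
	}
}
```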
because dig now uses the netmgr, printing of response messages
happens in a different thread than setup. the IDN output filtering
procedure, which is set using dns_name_settotextfilter(), is stored as
thread-local data, so if it's set during setup, it won't be
accessible when printing. we now set it immediately before printing,
in the same thread, and clear it immediately afterward.
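The thread-local pitfall can be shown with a small, self-contained C11 sketch. It does not call dns_name_settotextfilter(). Instead, current_filter, upper_filter() and the printer functions are hypothetical stand-ins for the thread-local IDN filter. A value set in the setup thread is invisible in the printing thread, so the filter has to be set and cleared around printing in that same thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical filter type and thread-local slot. */
typedef int (*text_filter_t)(int c);

static _Thread_local text_filter_t current_filter = NULL;

static int
upper_filter(int c) {
	return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* Apply the calling thread's filter, if it has one. */
static int
print_char(int c) {
	return current_filter != NULL ? current_filter(c) : c;
}

/* Before the fix: the filter was set in the setup thread, so this
 * thread sees its own, still-NULL copy and prints unfiltered. */
static void *
printer_thread(void *arg) {
	int *out = arg;
	*out = print_char('a');
	return NULL;
}

/* After the fix: set the filter in the printing thread, print,
 * then clear it again. */
static void *
printer_thread_fixed(void *arg) {
	int *out = arg;
	current_filter = upper_filter;
	*out = print_char('a');
	current_filter = NULL;
	return NULL;
}

/* Run fn in a new thread and return the character it produced. */
static int
run_in_thread(void *(*fn)(void *)) {
	pthread_t tid;
	int out = 0;
	int rc = pthread_create(&tid, NULL, fn, &out);
	assert(rc == 0);
	(void)rc;
	pthread_join(tid, NULL);
	return out;
}
```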
The network manager does not support returning UDP datagrams to
clients from unexpected sources, so it is not possible for
dig to accept them. The "+[no]unexpected" option has therefore
been removed from the dig command and its documentation.
Michał Kępień [Thu, 5 Nov 2020 10:45:19 +0000 (11:45 +0100)]
Fix detection of CMake-built libuv on Windows
As of libuv 1.36.0, CMake is the only supported build method for libuv
on Windows. Account for that fact by adjusting the relevant paths and
DLL file names used in the win32utils/Configure script. Update
Windows-specific documentation accordingly.
Michał Kępień [Thu, 5 Nov 2020 10:45:19 +0000 (11:45 +0100)]
Use "image" key in Windows GitLab CI job templates
Our GitLab Runner Custom executor scripts now use the "image" key for
determining the Windows Docker image to use for a given CI job. Update
.gitlab-ci.yml to reflect that change.
Michał Kępień [Thu, 5 Nov 2020 06:53:43 +0000 (07:53 +0100)]
Wait for the "fast-expire" zone to be transferred
In order for a "fast-expire/IN: response-policy zone expired" message to
be logged in ns3/named.run, the "fast-expire" zone must first be
transferred in by that server. However, with unfavorable timing, ns3
may be stopped before it manages to fetch the "fast-expire" zone from
ns5 and after the latter has been reconfigured to no longer serve that
zone. In such a case, the "rpz" system test will report a false
positive for the relevant check. Prevent that from happening by
ensuring ns3 manages to transfer the "fast-expire" zone before getting
shut down.
Some setup scripts use DEFAULT_ALGORITHM in their dnssec-policy
and/or initial signing. The tests still used the literal values
13, ECDSAP256SHA256, and 256. Replace those occurrences where
appropriate.
Ondřej Surý [Mon, 2 Nov 2020 14:55:12 +0000 (15:55 +0100)]
Put up additional safeguards against using an inactive/closed tcpdns socket
When we are operating on the tcpdns socket, we need to double-check
whether the socket, its outerhandle, its listener, and its mgr are
still active, and bail out early if any of them is not.
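A rough sketch of the early bail-out pattern. It assumes hypothetical, flattened types (gd_sock_t, gd_mgr_t) in which the outerhandle is represented by a pointer to the outer socket. Every object the operation depends on is checked before any work is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified objects; the real netmgr types differ.
 * Plain bools keep the sketch short; concurrent code would need
 * atomic or locked access to these flags. */
typedef struct {
	bool active;
} gd_mgr_t;

typedef struct gd_sock gd_sock_t;
struct gd_sock {
	bool active;
	gd_mgr_t *mgr;
	gd_sock_t *listener;  /* NULL for sockets without a listener */
	gd_sock_t *outersock; /* socket behind the outerhandle; NULL once reset */
};

/* True only if everything the operation relies on is still usable. */
static bool
tcpdns_sock_usable(const gd_sock_t *sock) {
	if (!sock->active) {
		return false;
	}
	if (sock->outersock == NULL || !sock->outersock->active) {
		return false;
	}
	if (sock->listener != NULL && !sock->listener->active) {
		return false;
	}
	if (sock->mgr == NULL || !sock->mgr->active) {
		return false;
	}
	return true;
}

/* Example operation that bails out early on an unusable socket. */
static int
tcpdns_do_something(gd_sock_t *sock) {
	if (!tcpdns_sock_usable(sock)) {
		return -1;
	}
	/* ...real work on sock->outersock would go here... */
	return 0;
}
```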
Witold Kręcicki [Sat, 31 Oct 2020 20:08:53 +0000 (21:08 +0100)]
Fix improper closed connection handling in tcpdns.
If dnslisten_readcb gets a read callback it needs to verify that the
outer socket wasn't closed in the meantime, and issue a CANCELED callback
if it was.
Ondřej Surý [Tue, 27 Oct 2020 16:12:41 +0000 (17:12 +0100)]
add a netmgr unit test
tests of UDP and TCP cases including:
- sending and receiving
- closure of sockets without reading or sending
- closure of sockets at various points while sending and receiving
- since the test is multithreaded, cmocka now aborts tests on the
first failure, so that failures in subthreads are caught and
reported correctly.
Ondřej Surý [Thu, 29 Oct 2020 11:04:00 +0000 (12:04 +0100)]
Fix more races between connect and shutdown
There were more races that could occur when connecting a socket
while the same socket was being closed or shut down. This
commit introduces a .closing flag to guard the socket from
being closed twice.
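One way to make such a flag race-free is a compare-and-swap, so that exactly one caller wins the right to close the socket. This is a sketch under that assumption using C11 atomics; cl_sock_t and sock_close() are hypothetical and not taken from the netmgr code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket with only the fields this sketch needs. */
typedef struct {
	atomic_bool closing;
	int close_calls; /* how many times the teardown actually ran */
} cl_sock_t;

/* Returns true if this caller won the right to close the socket. */
static bool
sock_begin_close(cl_sock_t *sock) {
	bool expected = false;
	/* Only one caller can flip closing from false to true. */
	return atomic_compare_exchange_strong(&sock->closing, &expected,
					      true);
}

static void
sock_close(cl_sock_t *sock) {
	if (!sock_begin_close(sock)) {
		return; /* already being closed by someone else */
	}
	sock->close_calls++;
	/* ...the real teardown would go here... */
}
```

A mutex-protected flag would work as well; the CAS simply avoids taking a lock on the close path.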
Ondřej Surý [Tue, 27 Oct 2020 19:00:08 +0000 (20:00 +0100)]
Fix a race between isc__nm_async_shutdown() and new sends/reads
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.
Ondřej Surý [Mon, 26 Oct 2020 16:31:55 +0000 (17:31 +0100)]
Refactor udp_recv_cb()
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
only when the socket was actively reading.
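The effect of the 'reading' flag can be sketched as follows. rd_sock_t, start_read() and failed_read_cb() here are simplified, hypothetical versions. A failure is reported only to a caller with an outstanding read, and the flag is cleared first so the callback fires at most once per read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*rd_cb_t)(int result, void *arg);

/* Hypothetical socket with only the read-related fields. */
typedef struct {
	bool reading; /* set while a read is outstanding */
	rd_cb_t recv_cb;
	void *recv_cbarg;
} rd_sock_t;

static void
start_read(rd_sock_t *sock, rd_cb_t cb, void *arg) {
	sock->recv_cb = cb;
	sock->recv_cbarg = arg;
	sock->reading = true;
}

/* Report a failed read, but only to a caller that is waiting. */
static void
failed_read_cb(rd_sock_t *sock, int result) {
	if (!sock->reading) {
		return;
	}
	sock->reading = false; /* clear before calling back */
	sock->recv_cb(result, sock->recv_cbarg);
}

/* Test helper: count how often the callback runs. */
static void
count_cb(int result, void *arg) {
	(void)result;
	(*(int *)arg)++;
}
```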
Ondřej Surý [Mon, 26 Oct 2020 13:19:37 +0000 (14:19 +0100)]
Fix netmgr read/connect timeout issues
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.
add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
timeout argument to ensure connections time out and are correctly
cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
failure if reading is interrupted by a non-network thread (e.g.
a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
prior to 1.27, when uv_udp_connect() was added
all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
Mark Andrews [Wed, 28 Oct 2020 05:40:36 +0000 (16:40 +1100)]
Check that a zone in the process of being signed resolves
ans10 simulates a local anycast server which has both signed and
unsigned instances of a zone. 'A' queries get answered from the
signed instance. Everything else gets answered from the unsigned
instance. The resulting answer should be insecure.
Mark Andrews [Wed, 28 Oct 2020 00:58:38 +0000 (11:58 +1100)]
Handle DNS_R_NCACHENXRRSET in fetch_callback_{dnskey,validator}()
DNS_R_NCACHENXRRSET can be returned when zones are transitioning
from unsigned to signed or from signed to unsigned. Validation
should be resumed and should result in an insecure answer.
Witold Kręcicki [Tue, 27 Oct 2020 09:09:30 +0000 (10:09 +0100)]
Properly handle outer TCP connection closed in TCPDNS.
If the connection is closed while we're processing the request
we might access TCPDNS outerhandle which is already reset. Check
for this condition and call the callback with ISC_R_CANCELED result.
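A minimal sketch of the check. Hypothetical types and result codes (tdns_sock_t, SK_R_CANCELED) stand in for the TCPDNS socket and ISC_R_CANCELED. If the outer handle has already been reset, the request is completed with a canceled result instead of dereferencing the stale handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; BIND's isc_result_t values differ. */
enum { SK_R_SUCCESS = 0, SK_R_CANCELED = 1 };

typedef void (*req_cb_t)(int result, void *arg);

typedef struct {
	int id;
} outer_handle_t;

typedef struct {
	outer_handle_t *outerhandle; /* reset to NULL when TCP closes */
} tdns_sock_t;

static void
process_request(tdns_sock_t *sock, req_cb_t cb, void *arg) {
	if (sock->outerhandle == NULL) {
		/* Outer TCP connection is gone: report and stop. */
		cb(SK_R_CANCELED, arg);
		return;
	}
	/* ...safe to use sock->outerhandle here... */
	cb(SK_R_SUCCESS, arg);
}

/* Test helper: remember the last result. */
static void
record_result(int result, void *arg) {
	*(int *)arg = result;
}
```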
Evan Hunt [Thu, 29 Oct 2020 01:01:49 +0000 (18:01 -0700)]
fix a typo in rpz test
"tcp-only" was not being tested correctly in the RPZ system test
because the option to the "digcmd" function that causes queries to
be sent via TCP was misspelled in one case, and was being interpreted
as a query name.
the "ckresult" function has also been changed to be case sensitive
for consistency with "digcmd".
Ondřej Surý [Tue, 27 Oct 2020 13:18:43 +0000 (14:18 +0100)]
Fix possible NULL dereference in cd->dlz_destroy()
If the call to cd->dlz_create() in dlopen_dlz_create() fails, cd->dbdata
may be NULL when dlopen_dlz_destroy() gets called in the cleanup path
and passing NULL to the cd->dlz_destroy() callback may cause a NULL
dereference. Ensure that does not happen by checking whether cd->dbdata
is non-NULL before calling the cd->dlz_destroy() callback.
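The fix amounts to a NULL check in the shared destroy path. The sketch below assumes a trimmed-down, hypothetical driver structure (dlz_driver_t) whose create and destroy callbacks have simpler signatures than the real DLZ driver interface. It shows that a failed create no longer hands NULL to the destroy callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*dlz_create_t)(void **dbdata);
typedef void (*dlz_destroy_t)(void *dbdata);

/* Hypothetical, trimmed-down driver state. */
typedef struct {
	dlz_create_t dlz_create;
	dlz_destroy_t dlz_destroy;
	void *dbdata;
} dlz_driver_t;

static void
driver_destroy(dlz_driver_t *cd) {
	/* dbdata stays NULL if dlz_create() failed, and the callback
	 * must not be handed a NULL pointer. */
	if (cd->dbdata != NULL && cd->dlz_destroy != NULL) {
		cd->dlz_destroy(cd->dbdata);
		cd->dbdata = NULL;
	}
}

static int
driver_create(dlz_driver_t *cd) {
	int result = cd->dlz_create(&cd->dbdata);
	if (result != 0) {
		driver_destroy(cd); /* shared cleanup path */
	}
	return result;
}

/* Test helpers. */
static int destroy_calls = 0;

static int
failing_create(void **dbdata) {
	(void)dbdata;
	return -1;
}

static int
ok_create(void **dbdata) {
	*dbdata = malloc(1);
	return *dbdata != NULL ? 0 : -1;
}

static void
counting_destroy(void *dbdata) {
	destroy_calls++;
	free(dbdata);
}
```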
Ondřej Surý [Tue, 20 Oct 2020 21:51:08 +0000 (23:51 +0200)]
Use libuv's shared library handling capabilities
While libltdl is a feature-rich library, BIND 9 code only uses its basic
capabilities, which are also provided by libuv and which BIND 9 already
uses for other purposes. As libuv's cross-platform shared library
handling interface is modeled after the POSIX dlopen() interface,
converting code using the latter to the former is simple. Replace
libltdl function calls with their libuv counterparts, refactoring the
code as necessary. Remove all use of libltdl from the BIND 9 source
tree.
Ondřej Surý [Tue, 20 Oct 2020 21:51:08 +0000 (23:51 +0200)]
Refactor the cleanup code in lt_dl code
The cleanup code that cleaned up the object after plugin/dlz/dyndb
loading had failed duplicated the object's destructor, so the extra
code has been removed and the destructor is used instead.
Ondřej Surý [Wed, 28 Oct 2020 14:25:44 +0000 (15:25 +0100)]
Unify lt_dlopen() error handling
Make sure an error gets logged when any lt_dlopen() call in the source
tree fails. Also make sure that NULL values returned by lt_dlerror()
are replaced with a generic error message to prevent passing NULL as an
argument for the %s format specifier.
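The NULL-replacement part can be sketched in a few lines. fake_dlerror() is a hypothetical stand-in for lt_dlerror(). The generic "unknown error" text is an assumption, not necessarily the message BIND uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for lt_dlerror(), which may return NULL. */
static const char *
fake_dlerror(int have_message) {
	return have_message ? "file not found" : NULL;
}

/* Never let NULL reach a %s conversion. */
static const char *
safe_errmsg(const char *msg) {
	return msg != NULL ? msg : "unknown error";
}

static int
format_open_error(char *buf, size_t len, const char *path,
		  int have_message) {
	return snprintf(buf, len, "failed to dlopen '%s': %s", path,
			safe_errmsg(fake_dlerror(have_message)));
}
```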
Ondřej Surý [Mon, 26 Oct 2020 10:14:49 +0000 (11:14 +0100)]
Remove redundant lt_dlerror() calls
The redundant lt_dlerror() calls were taken from the examples to clear
any previous errors from lt_dl...() calls. However, code inspection
revealed that there are no code paths that could cause lt_dlerror()
to return spurious error messages.
Michal Nowak [Tue, 27 Oct 2020 09:20:05 +0000 (10:20 +0100)]
Get rid of bashisms in string comparisons
The double equal sign ('==') is a Bash-specific string comparison
operator. Ensure the single equal sign ('=') is used in all POSIX shell
scripts in the system test suite in order to retain their portability.
Michal Nowak [Tue, 16 Jun 2020 12:19:41 +0000 (14:19 +0200)]
Add "stress" tests to GitLab CI
Run "stress" tests for scheduled pipelines and pipelines created for
tags. These tests were previously only performed manually (as part of
pre-release testing of each new BIND version). Their purpose is to
detect memory leaks and potential performance issues.
As the run time of each "stress" test itself is set to 1 hour, set the
GitLab CI job timeout to 2 hours in order to account for the extra time
needed to set the test up and gather its results.
Ondřej Surý [Thu, 22 Oct 2020 10:32:18 +0000 (12:32 +0200)]
Postpone isc_app_shutdown() until after the rndc response has been sent
When `rndc stop` was received, isc_app_shutdown() was called before
the response to the rndc client had been sent; as isc_app_shutdown()
also tears down the netmgr, the response was never sent and rndc would
complain about the connection being interrupted in the middle of the
transaction. We now postpone the shutdown until after the rndc
response has been sent.
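The reordering can be sketched as a small state machine. In this sketch, the stop command only records the request, and the actual shutdown happens once the response has been sent. All names here (ctl_state_t, handle_command(), response_sent_cb(), app_shutdown()) are hypothetical stand-ins, not BIND functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control-channel state. */
typedef struct {
	bool shutdown_requested; /* set when "stop" is received */
	bool shut_down;          /* application actually torn down */
	bool response_sent;
} ctl_state_t;

static void
app_shutdown(ctl_state_t *st) {
	st->shut_down = true;
}

static void
handle_command(ctl_state_t *st, bool is_stop) {
	if (is_stop) {
		/* Don't tear down the network layer yet; just remember. */
		st->shutdown_requested = true;
	}
	/* ...queue the response; completion is signalled later... */
}

/* Called once the reply has been written to the client. */
static void
response_sent_cb(ctl_state_t *st) {
	st->response_sent = true;
	if (st->shutdown_requested) {
		app_shutdown(st); /* safe now: the reply is out */
	}
}
```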
Ondřej Surý [Wed, 21 Oct 2020 10:52:09 +0000 (12:52 +0200)]
Fix the isc_nm_closedown() to actually close the pending connections
1. isc__nm_tcp_send() and isc__nm_tcp_read() were not checking
whether the socket was still alive and were scheduling reads/sends on
closed sockets.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS).
Ondřej Surý [Wed, 21 Oct 2020 06:56:21 +0000 (08:56 +0200)]
Fix the way tcp_send_direct() is used
There were two problems with how tcp_send_direct() was used:
1. tcp_send_direct() can return ISC_R_CANCELED (or a translated error
from uv_tcp_send()), but isc__nm_async_tcpsend() wasn't checking
the error code or releasing the uvreq in case of an error.
2. In isc__nm_tcp_send(), when the TCP send is already in the right
netthread, it uses tcp_send_direct() to send the TCP packet right
away. When that happened, the uvreq was not freed, and the error code
was returned to the caller. We need to return ISC_R_SUCCESS and
use the callback to report the error instead.
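The resulting pattern, which applies to both tcp_send_direct() and udp_send_direct(), can be sketched with hypothetical names (nm_send(), send_direct(), send_req_t) and a counter standing in for uvreq bookkeeping. The send function always returns success; an immediate failure is reported through the callback, and the request is released so nothing leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result codes. */
enum { SD_R_SUCCESS = 0, SD_R_CANCELED = 1 };

typedef void (*send_cb_t)(int result, void *arg);

typedef struct {
	send_cb_t cb;
	void *cbarg;
} send_req_t;

static int live_reqs = 0; /* tracks leaked requests in this sketch */

static send_req_t *
req_get(send_cb_t cb, void *arg) {
	send_req_t *req = malloc(sizeof(*req));
	assert(req != NULL);
	req->cb = cb;
	req->cbarg = arg;
	live_reqs++;
	return req;
}

static void
req_put(send_req_t *req) {
	free(req);
	live_reqs--;
}

/* Pretend direct send: fails when the socket is closing. */
static int
send_direct(int sock_closing) {
	return sock_closing ? SD_R_CANCELED : SD_R_SUCCESS;
}

static int
nm_send(int sock_closing, send_cb_t cb, void *arg) {
	send_req_t *req = req_get(cb, arg);
	int result = send_direct(sock_closing);

	if (result != SD_R_SUCCESS) {
		/* The send never started, so nothing else will free req:
		 * report the error via the callback and release it. */
		req->cb(result, req->cbarg);
		req_put(req);
		return SD_R_SUCCESS;
	}
	/* On success, a completion callback would later invoke req->cb
	 * and release req; this sketch simulates that immediately. */
	req->cb(SD_R_SUCCESS, req->cbarg);
	req_put(req);
	return SD_R_SUCCESS;
}

/* Test helper: remember the reported result. */
static void
store_result(int result, void *arg) {
	*(int *)arg = result;
}
```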
Ondřej Surý [Tue, 20 Oct 2020 18:57:19 +0000 (20:57 +0200)]
Explicitly stop reading before closing the nmtcpsocket
When closing a socket that is actively reading from the stream,
read_cb() could be called between uv_close() and the close callback,
after the server socket had already been detached, and would then use
sock->statichandle after it had already been freed.
Ondřej Surý [Tue, 20 Oct 2020 06:07:44 +0000 (08:07 +0200)]
Fix the way udp_send_direct() is used
There were two problems with how udp_send_direct() was used:
1. udp_send_direct() can return ISC_R_CANCELED (or a translated error
from uv_udp_send()), but isc__nm_async_udpsend() wasn't checking
the error code or releasing the uvreq in case of an error.
2. In isc__nm_udp_send(), when the UDP send is already in the right
netthread, it uses udp_send_direct() to send the UDP packet right
away. When that happened, the uvreq was not freed, and the error code
was returned to the caller. We need to return ISC_R_SUCCESS and
use the callback to report the error instead.
The reason for running the last two jobs above sequentially rather than
in parallel is that both of them create *.gcda files (containing
coverage data) in the same locations. While some way of merging these
files from different job artifact archives could probably be designed
with the help of additional tools, the simplest thing to do is not to
run unit test and system test jobs in parallel, and to carry *.gcda
files over between jobs instead, as gcov knows how to append coverage
data to existing *.gcda files.
Also note that test coverage will not be visualized if any of the jobs
in the above dependency chain fails (because the gcov job will not be
run).