Michał Kępień [Mon, 2 Nov 2020 11:27:55 +0000 (12:27 +0100)]
Fix cross-compilation
Using AC_RUN_IFELSE() in configure.ac breaks cross-compilation:
configure: error: cannot run test program while cross compiling
Commit 978c7b2e89aa37a7ddfe2f6b6ba12ce73dd04528 caused AC_RUN_IFELSE()
to be used instead of AC_LINK_IFELSE() because the latter had seemingly
been causing the check for --wrap support in the linker to not work as
expected. However, it later turned out that the problem lied elsewhere:
a minus sign ('-') was missing from the LDFLAGS variable used in the
relevant check [1].
Revert to using AC_LINK_IFELSE() for checking whether the linker
supports the --wrap option in order to make cross-compilation possible
again.
Michał Kępień [Fri, 30 Oct 2020 08:12:50 +0000 (09:12 +0100)]
Fix getrbp()
The following compiler warning is emitted for the BACKTRACE_X86STACK
part of lib/isc/backtrace.c:
backtrace.c: In function ‘getrbp’:
backtrace.c:142:1: warning: no return statement in function returning non-void [-Wreturn-type]
While getrbp() stores the value of the RBP register in the RAX register
and thus does attempt to return a value, this is not enough for an
optimizing compiler to always produce the expected result. With -O2,
the following machine code may be generated in isc_backtrace_gettrace():
(Note that this method of grabbing a stack trace is finicky anyway
because in order for RBP to be relied upon, -fno-omit-stack-frame must
be present among CFLAGS.)
Michał Kępień [Fri, 30 Oct 2020 08:12:50 +0000 (09:12 +0100)]
Check for _Unwind_Backtrace() support
Some operating systems (e.g. Linux, FreeBSD) provide the
_Unwind_Backtrace() function in libgcc_s.so, which is automatically
linked into any binary using the functions provided by that library. On
OpenBSD, though, _Unwind_Backtrace() is provided by libc++abi.so, which
is not automatically linked into binaries produced by the stock system C
compiler.
Meanwhile, lib/isc/backtrace.c assumes that any GNU-compatible toolchain
allows _Unwind_Backtrace() to be used without any extra provisions in
the build system. This causes build failures on OpenBSD (and possibly
other systems).
Instead of making assumptions, actually check for _Unwind_Backtrace()
support in the toolchain if the backtrace() function is unavailable.
Michał Kępień [Fri, 30 Oct 2020 07:49:16 +0000 (08:49 +0100)]
Do not test "make depend" for out-of-tree builds
The make/mkdep script does not understand the concept of generated
source files (like lib/dns/dnstap.pb-c.c), which prevents it from
working correctly for out-of-tree builds. As "make depend" is not
required for building BIND and the "depend" make target was removed
altogether in the development branch, just prevent the "make depend"
check from being performed for out-of-tree builds in GitLab CI instead
of trying to add support for handling generated source files to
make/mkdep.
Michał Kępień [Fri, 30 Oct 2020 07:49:16 +0000 (08:49 +0100)]
Fix the "make depend" check in GitLab CI
"make depend" prints errors to stderr, not to stdout. This means that
the check for "make depend" errors currently used in the definition of
every build job in GitLab CI could never fail. Fix that check by
redirecting stderr to stdout. Also employ tee to prevent the output of
"make depend" from being hidden in the job log. (While using tee hides
the exit code of "make depend" itself, the next line still checks for
errors anyway.)
Mark Andrews [Wed, 28 Oct 2020 05:40:36 +0000 (16:40 +1100)]
Check that a zone in the process of being signed resolves
ans10 simulates a local anycast server which has both signed and
unsigned instances of a zone. 'A' queries get answered from the
signed instance. Everything else gets answered from the unsigned
instance. The resulting answer should be insecure.
Mark Andrews [Wed, 28 Oct 2020 00:58:38 +0000 (11:58 +1100)]
Handle DNS_R_NCACHENXRRSET in fetch_callback_{dnskey,validator}()
DNS_R_NCACHENXRRSET can be return when zones are in transition state
from being unsigned to signed and signed to unsigned. The validation
should be resumed and should result in a insecure answer.
Witold Kręcicki [Tue, 27 Oct 2020 09:09:30 +0000 (10:09 +0100)]
Properly handle outer TCP connection closed in TCPDNS.
If the connection is closed while we're processing the request
we might access TCPDNS outerhandle which is already reset. Check
for this condition and call the callback with ISC_R_CANCELED result.
Evan Hunt [Thu, 29 Oct 2020 01:01:49 +0000 (18:01 -0700)]
fix a typo in rpz test
"tcp-only" was not being tested correctly in the RPZ system test
because the option to the "digcmd" function that causes queries to
be sent via TCP was misspelled in one case, and was being interpreted
as a query name.
the "ckresult" function has also been changed to be case sensitive
for consistency with "digcmd".
The reason for running the last two jobs above sequentially rather than
in parallel is that both of them create *.gcda files (containing
coverage data) in the same locations. While some way of merging these
files from different job artifact archives could probably be designed
with the help of additional tools, the simplest thing to do is not to
run unit test and system test jobs in parallel, carrying *.gcda files
over between jobs as gcov knows how to append coverage data to existing
*.gcda files.
Also note that test coverage will not be visualized if any of the jobs
in the above dependency chain fails (because the gcov job will not be
run).
Michal Nowak [Tue, 27 Oct 2020 09:20:05 +0000 (10:20 +0100)]
Get rid of bashisms in string comparisons
The double equal sign ('==') is a Bash-specific string comparison
operator. Ensure the single equal sign ('=') is used in all POSIX shell
scripts in the system test suite in order to retain their portability.
Michal Nowak [Tue, 16 Jun 2020 12:19:41 +0000 (14:19 +0200)]
Add "stress" tests to GitLab CI
Run "stress" tests for scheduled pipelines and pipelines created for
tags. These tests were previously only performed manually (as part of
pre-release testing of each new BIND version). Their purpose is to
detect memory leaks and potential performance issues.
As the run time of each "stress" test itself is set to 1 hour, set the
GitLab CI job timeout to 2 hours in order to account for the extra time
needed to set the test up and gather its results.
Ondřej Surý [Wed, 21 Oct 2020 10:52:09 +0000 (12:52 +0200)]
Fix the isc_nm_closedown() to actually close the pending connections
1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
whether the socket was still alive and scheduling reads/sends on
closed socket.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS).
Ondřej Surý [Wed, 21 Oct 2020 06:56:21 +0000 (08:56 +0200)]
Fix the way tcp_send_direct() is used
There were two problems how tcp_send_direct() was used:
1. The tcp_send_direct() can return ISC_R_CANCELED (or translated error
from uv_tcp_send()), but the isc__nm_async_tcpsend() wasn't checking
the error code and not releasing the uvreq in case of an error.
2. In isc__nm_tcp_send(), when the TCP send is already in the right
netthread, it uses tcp_send_direct() to send the TCP packet right
away. When that happened the uvreq was not freed, and the error code
was returned to the caller. We need to return ISC_R_SUCCESS and
rather use the callback to report an error in such case.
Ondřej Surý [Tue, 20 Oct 2020 18:57:19 +0000 (20:57 +0200)]
Explicitly stop reading before closing the nmtcpsocket
When closing the socket that is actively reading from the stream, the
read_cb() could be called between uv_close() and close callback when the
server socket has been already detached hence using sock->statichandle
after it has been already freed.
Ondřej Surý [Tue, 20 Oct 2020 06:07:44 +0000 (08:07 +0200)]
Fix the way udp_send_direct() is used
There were two problems how udp_send_direct() was used:
1. The udp_send_direct() can return ISC_R_CANCELED (or translated error
from uv_udp_send()), but the isc__nm_async_udpsend() wasn't checking
the error code and not releasing the uvreq in case of an error.
2. In isc__nm_udp_send(), when the UDP send is already in the right
netthread, it uses udp_send_direct() to send the UDP packet right
away. When that happened the uvreq was not freed, and the error code
was returned to the caller. We need to return ISC_R_SUCCESS and
rather use the callback to report an error in such case.
Diego Fronza [Thu, 10 Sep 2020 18:33:15 +0000 (15:33 -0300)]
Added test for the proposed fix
This test is very simple, two nameserver instances are created:
- ns4: master, with 'minimal-responses yes', authoritative
for example. zone
- ns5: slave, stub zone
The first thing verified is the transfer of zone data from master
to slave, which should be saved in ns5/example.db.
After that, a query is issued to ns5 asking for target.example.
TXT, a record present in the master database with the "test" string
as content.
If that query works, it means stub zone successfully request
nameserver addresses from master, ns4.example. A/AAAA
The presence of both A/AAAA records for ns4 is also verified in the
stub zone local file, ns5/example.db.
Diego Fronza [Thu, 10 Sep 2020 18:09:14 +0000 (15:09 -0300)]
Fix transfer of glue records in stub zones if master has minimal-responses set
Stub zones don't make use of AXFR/IXFR for the transfering of zone
data, instead, a single query is issued to the master asking for
their nameserver records (NS).
That works fine unless master is configured with 'minimal-responses'
set to yes, in which case glue records are not provided by master
in the answer with nameservers authoritative for the zone, leaving
stub zones with incomplete databases.
This commit fix this problem in a simple way, when the answer with
the authoritative nameservers is received from master (stub_callback),
for each nameserver listed (save_nsrrset), a A and AAAA records for
the name is verified in the additional section, and if not present
a query is created to resolve the corresponsing missing glue.
A struct 'stub_cb_args' was added to keep relevant information for
performing a query, like TSIG key, udp size, dscp value, etc, this
information is borrowed from, and created within function 'ns_query',
where the resolving of nameserver from master starts.
A new field was added to the struct 'dns_stub', an atomic integer,
namely pending_requests, which is used to keep how many queries are
created when resolving nameserver addresses that were missing in
the glue.
When the value of pending_requests is zero we know we can release
resources, adjust zone timers, dump to zone file, etc.
Matthijs Mekking [Tue, 20 Oct 2020 08:57:16 +0000 (10:57 +0200)]
Don't increment network error stats on UV_EOF
When networking statistics was added to the netmgr (in commit 5234a8e00a6ae1df738020f27544594ccb8d5215), two lines were added that
increment the 'STATID_RECVFAIL' statistic: One if 'uv_read_start'
fails and one at the end of the 'read_cb'. The latter happens
if 'nread < 0'.
According to the libuv documentation, I/O read callbacks (such as for
files and sockets) are passed a parameter 'nread'. If 'nread' is less
than 0, there was an error and 'UV_EOF' is the end of file error, which
you may want to handle differently.
In other words, we should not treat EOF as a RECVFAIL error.
Diego Fronza [Thu, 1 Oct 2020 17:04:05 +0000 (14:04 -0300)]
Fix dnstap system test on FreeBSD
This commit ensures that dnstap output files captured
by fstrm_capture are properly flushed before any attempt
on reading them with dnstap-read is done.
By reading fstrm-capture source code it was noticed that
signal SIGHUP is used to flush the capture file.
Mark Andrews [Wed, 16 Sep 2020 02:40:52 +0000 (12:40 +1000)]
Try to improve rrl timing
Add a +burst option to mdig so that we have a second to setup the
mdig calls then they run at the start of the next second.
RRL uses 'queries in a second' as a approximation to
'queries per second'. Getting the bursts of traffic to all happen in
the same second should prevent false negatives in the system test.
We now have a second to setup the traffic in. Then the traffic should
be sent at the start of the next second. If that still fails we
should move to +burst=<now+2> (further extend mdig) instead of the
implicit <now+1> as the trigger second.
Mark Andrews [Mon, 12 Oct 2020 06:51:09 +0000 (17:51 +1100)]
Complete the isc_nmhandle_detach() in the worker thread.
isc_nmhandle_detach() needs to complete in the same thread
as shutdown_walk_cb() to avoid a race. Clear the caller's
pointer then pass control to the worker if necessary.
WARNING: ThreadSanitizer: data race
Write of size 8 at 0x000000000001 by thread T1:
#0 isc_nmhandle_detach lib/isc/netmgr/netmgr.c:1258:15
#1 control_command bin/named/controlconf.c:388:3
#2 dispatch lib/isc/task.c:1152:7
#3 run lib/isc/task.c:1344:2
Clone the csock in accept_connection(), not in callback
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.
Ondřej Surý [Fri, 2 Oct 2020 07:28:29 +0000 (09:28 +0200)]
Change the isc__nm_tcpdns_stoplistening() to be asynchronous event
The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb()
that would clear the .accept_cb from non-netmgr thread. Change the
tcpdns_stoplistening to enqueue ievent that would get processed in the
right netmgr thread to avoid locking.
Change the default ENDS buffer size to 1232 for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the minimal
required changes by changing the defaults for `edns-udp-size`,
`max-udp-size` and `nocookie-udp-size` to `1232` (the value picked by
DNS Flag Day 2020).