Michał Kępień [Fri, 30 Oct 2020 08:12:50 +0000 (09:12 +0100)]
Fix getrbp()
The following compiler warning is emitted for the BACKTRACE_X86STACK
part of lib/isc/backtrace.c:
backtrace.c: In function ‘getrbp’:
backtrace.c:139:1: warning: no return statement in function returning non-void [-Wreturn-type]
While getrbp() stores the value of the RBP register in the RAX register
and thus does attempt to return a value, this is not enough for an
optimizing compiler to always produce the expected result. With -O2,
the following machine code may be generated in isc_backtrace_gettrace():
(Note that this method of grabbing a stack trace is finicky anyway
because in order for RBP to be relied upon, -fno-omit-stack-frame must
be present among CFLAGS.)
Michał Kępień [Fri, 30 Oct 2020 08:12:50 +0000 (09:12 +0100)]
Check for _Unwind_Backtrace() support
Some operating systems (e.g. Linux, FreeBSD) provide the
_Unwind_Backtrace() function in libgcc_s.so, which is automatically
linked into any binary using the functions provided by that library. On
OpenBSD, though, _Unwind_Backtrace() is provided by libc++abi.so, which
is not automatically linked into binaries produced by the stock system C
compiler.
Meanwhile, lib/isc/backtrace.c assumes that any GNU-compatible toolchain
allows _Unwind_Backtrace() to be used without any extra provisions in
the build system. This causes build failures on OpenBSD (and possibly
other systems).
Instead of making assumptions, actually check for _Unwind_Backtrace()
support in the toolchain if the backtrace() function is unavailable.
Michał Kępień [Fri, 30 Oct 2020 07:49:16 +0000 (08:49 +0100)]
Do not test "make depend" for out-of-tree builds
The make/mkdep script does not understand the concept of generated
source files (like lib/dns/dnstap.pb-c.c), which prevents it from
working correctly for out-of-tree builds. As "make depend" is not
required for building BIND and the "depend" make target was removed
altogether in the development branch, just prevent the "make depend"
check from being performed for out-of-tree builds in GitLab CI instead
of trying to add support for handling generated source files to
make/mkdep.
Michał Kępień [Fri, 30 Oct 2020 07:49:16 +0000 (08:49 +0100)]
Fix the "make depend" check in GitLab CI
"make depend" prints errors to stderr, not to stdout. This means that
the check for "make depend" errors currently used in the definition of
every build job in GitLab CI could never fail. Fix that check by
redirecting stderr to stdout. Also employ tee to prevent the output of
"make depend" from being hidden in the job log. (While using tee hides
the exit code of "make depend" itself, the next line still checks for
errors anyway.)
Mark Andrews [Wed, 28 Oct 2020 05:40:36 +0000 (16:40 +1100)]
Check that a zone in the process of being signed resolves
ans10 simulates a local anycast server which has both signed and
unsigned instances of a zone. 'A' queries get answered from the
signed instance. Everything else gets answered from the unsigned
instance. The resulting answer should be insecure.
Mark Andrews [Wed, 28 Oct 2020 00:58:38 +0000 (11:58 +1100)]
Handle DNS_R_NCACHENXRRSET in fetch_callback_{dnskey,validator}()
DNS_R_NCACHENXRRSET can be return when zones are in transition state
from being unsigned to signed and signed to unsigned. The validation
should be resumed and should result in a insecure answer.
Evan Hunt [Thu, 29 Oct 2020 01:01:49 +0000 (18:01 -0700)]
fix a typo in rpz test
"tcp-only" was not being tested correctly in the RPZ system test
because the option to the "digcmd" function that causes queries to
be sent via TCP was misspelled in one case, and was being interpreted
as a query name.
the "ckresult" function has also been changed to be case sensitive
for consistency with "digcmd".
The reason for running the last two jobs above sequentially rather than
in parallel is that both of them create *.gcda files (containing
coverage data) in the same locations. While some way of merging these
files from different job artifact archives could probably be designed
with the help of additional tools, the simplest thing to do is not to
run unit test and system test jobs in parallel, carrying *.gcda files
over between jobs as gcov knows how to append coverage data to existing
*.gcda files.
Also note that test coverage will not be visualized if any of the jobs
in the above dependency chain fails (because the gcov job will not be
run).
Michal Nowak [Tue, 16 Jun 2020 12:19:41 +0000 (14:19 +0200)]
Add "stress" tests to GitLab CI
Run "stress" tests for scheduled pipelines and pipelines created for
tags. These tests were previously only performed manually (as part of
pre-release testing of each new BIND version). Their purpose is to
detect memory leaks and potential performance issues.
As the run time of each "stress" test itself is set to 1 hour, set the
GitLab CI job timeout to 2 hours in order to account for the extra time
needed to set the test up and gather its results.
Diego Fronza [Thu, 10 Sep 2020 18:33:15 +0000 (15:33 -0300)]
Added test for the proposed fix
This test is very simple, two nameserver instances are created:
- ns4: master, with 'minimal-responses yes', authoritative
for example. zone
- ns5: slave, stub zone
The first thing verified is the transfer of zone data from master
to slave, which should be saved in ns5/example.db.
After that, a query is issued to ns5 asking for target.example.
TXT, a record present in the master database with the "test" string
as content.
If that query works, it means stub zone successfully request
nameserver addresses from master, ns4.example. A/AAAA
The presence of both A/AAAA records for ns4 is also verified in the
stub zone local file, ns5/example.db.
Diego Fronza [Wed, 21 Oct 2020 19:37:59 +0000 (16:37 -0300)]
Fix transfer of glue records in stub zones if master has minimal-responses set
Stub zones don't make use of AXFR/IXFR for the transfering of zone
data, instead, a single query is issued to the master asking for
their nameserver records (NS).
That works fine unless master is configured with 'minimal-responses'
set to yes, in which case glue records are not provided by master
in the answer with nameservers authoritative for the zone, leaving
stub zones with incomplete databases.
This commit fix this problem in a simple way, when the answer with
the authoritative nameservers is received from master (stub_callback),
for each nameserver listed (save_nsrrset), a A and AAAA records for
the name is verified in the additional section, and if not present
a query is created to resolve the corresponsing missing glue.
A struct 'stub_cb_args' was added to keep relevant information for
performing a query, like TSIG key, udp size, dscp value, etc, this
information is borrowed from, and created within function 'ns_query',
where the resolving of nameserver from master starts.
A new field was added to the struct 'dns_stub', an atomic integer,
namely pending_requests, which is used to keep how many queries are
created when resolving nameserver addresses that were missing in
the glue.
When the value of pending_requests is zero we know we can release
resources, adjust zone timers, dump to zone file, etc.
Michał Kępień [Thu, 22 Oct 2020 13:03:31 +0000 (15:03 +0200)]
Test a --disable-atomic build in GitLab CI
Extend GitLab CI with build and test jobs utilizing the --disable-atomic
configure switch as it is used to work around broken atomics support in
certain build toolchains.
Diego Fronza [Thu, 1 Oct 2020 17:04:05 +0000 (14:04 -0300)]
Fix dnstap system test on FreeBSD
This commit ensures that dnstap output files captured
by fstrm_capture are properly flushed before any attempt
on reading them with dnstap-read is done.
By reading fstrm-capture source code it was noticed that
signal SIGHUP is used to flush the capture file.
Mark Andrews [Wed, 16 Sep 2020 02:40:52 +0000 (12:40 +1000)]
Try to improve rrl timing
Add a +burst option to mdig so that we have a second to setup the
mdig calls then they run at the start of the next second.
RRL uses 'queries in a second' as a approximation to
'queries per second'. Getting the bursts of traffic to all happen in
the same second should prevent false negatives in the system test.
We now have a second to setup the traffic in. Then the traffic should
be sent at the start of the next second. If that still fails we
should move to +burst=<now+2> (further extend mdig) instead of the
implicit <now+1> as the trigger second.
Change the default ENDS buffer size to 1232 for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the minimal
required changes by changing the defaults for `edns-udp-size`,
`max-udp-size` and `nocookie-udp-size` to `1232` (the value picked by
DNS Flag Day 2020).
Michał Kępień [Fri, 2 Oct 2020 06:41:43 +0000 (08:41 +0200)]
Rework "rrset-order" documentation
Certain parts of the existing documentation for the "rrset-order"
statement are incorrect, others are ambiguous. Rework the relevant
section of the ARM to make it clear and up-to-date with the source code.
Diego Fronza [Mon, 21 Sep 2020 20:44:29 +0000 (17:44 -0300)]
Properly handling dns_message_t shared references
This commit fix the problems that arose when moving the dns_message_t
object from fetchctx_t to the query structure.
Since the lifetime of query objects are different than that of a
fetchctx and the dns_message_t object held by the query may be being
used by some external module, e.g. validator, even after the query may
have been destroyed, propery handling of the references to the message
were added in this commit to avoid accessing an already destroyed
object.
Specifically, in resquery_response(), a reference to the message is
attached at the beginning of the function and detached at the end, since
a possible call to fctx_cancelquery() would release the dns_message_t
object, and in the next lines of code a call to add_bad() would require
a valid pointer to the same object.
In valcreate() a new reference is attached to the message object, this
ensures that if the corresponding query object is destroyed before the
validator attempts to access it, no invalid pointer access occurs.
In validated() we have to attach a new reference to the message, since
we destroy the validator object at the beginning of the function, and we
need access to the message in the next lines of the same function.
Diego Fronza [Mon, 21 Sep 2020 20:32:39 +0000 (17:32 -0300)]
Fix invalid dns message state in resolver's logic
The assertion failure REQUIRE(msg->state == DNS_SECTION_ANY),
caused by calling dns_message_setclass within function resquery_response()
in resolver.c, was happening due to wrong management of dns message_t
objects used to process responses to the queries issued by the resolver.
Before the fix, a resolver's fetch context (fetchctx_t) would hold
a pointer to the message, this same reference would then be used over all
the attempts to resolve the query, trying next server, etc... for this to work
the message object would have it's state reset between each iteration, marking
it as ready for a new processing.
The problem arose in a scenario with many different forwarders configured,
managing the state of the dns_message_t object was lacking better
synchronization, which have led it to a invalid dns_message_t state in
resquery_response().
Instead of adding unnecessarily complex code to synchronize the object,
the dns_message_t object was moved from fetchctx_t structure to the
query structure, where it better belongs to, since each query will produce
a response, this way whenever a new query is created an associated
dns_messate_t is also created.
This commit deals mainly with moving the dns_message_t object from fetchctx_t
to the query structure.
Michał Kępień [Mon, 28 Sep 2020 07:30:00 +0000 (09:30 +0200)]
Disable OpenSSL hashing when using native PKCS#11
When building with "--enable-native-pkcs11 --with-openssl", OpenSSL
support is automatically disabled in favor of native PKCS#11:
checking for OpenSSL library... use of native PKCS11 instead
However, adding "--enable-openssl-hash" to the above two switches causes
the build to fail:
checking for OpenSSL library... use of native PKCS11 instead
disabled because of native PKCS11
checking for using OpenSSL for hash functions... configure: error: No OpenSSL for hash functions
In other words, "--with-openssl" and "--enable-openssl-hash" are not
behaving consistently when used together with "--enable-native-pkcs11".
Fix by automatically disabling OpenSSL hashing support when native
PKCS#11 support is enabled.
Michał Kępień [Mon, 28 Sep 2020 07:21:59 +0000 (09:21 +0200)]
Make native PKCS#11 require dlopen() support
PKCS#11 support in BIND requires dlopen() support from the operating
system and thus building with "--enable-native-pkcs11 --without-dlopen"
should not be possible. Add an Autoconf check which enforces that
constraint. Adjust the pairwise testing model accordingly.
Mark Andrews [Tue, 22 Sep 2020 05:22:34 +0000 (15:22 +1000)]
Break lock order loop by sending TAT in an event
The dotat() function has been changed to send the TAT
query asynchronously, so there's no lock order loop
because we initialize the data first and then we schedule
the TAT send to happen asynchronously.
This breaks following lock-order loops:
zone->lock (dns_zone_setviewcommit) while holding view->lock
(dns_view_setviewcommit)
keytable->lock (dns_keytable_find) while holding zone->lock
(zone_asyncload)
view->lock (dns_view_findzonecut) while holding keytable->lock
(dns_keytable_forall)
Michal Nowak [Wed, 1 Jul 2020 08:29:36 +0000 (10:29 +0200)]
Add pairwise testing
Pairwise testing is a test case generation technique based on the
observation that most faults are caused by interactions of at most two
factors. For BIND, its configure options can be thought of as such
factors.
Process BIND configure options into a model that is subsequently
processed by the PICT tool in order to find an effective test vector.
That test vector is then used for configuring and building BIND using
various combinations of configure options.