This commit adds support for generating backtraces on Windows and
refactors the isc_backtrace API to match the Linux/BSD API (without
the isc_ prefix)
* isc_backtrace_gettrace() was renamed to isc_backtrace(), the third
argument was removed and the return type was changed to int
* isc_backtrace_symbols() was added
* isc_backtrace_symbols_fd() was added and used as appropriate
Add trampoline around iocompletionport_createthreads()
On Windows, the iocompletionport_createthreads() didn't use
isc_thread_create() to create new threads for processing IO, but just a
simple CreateThread() function that completely circumvent the
isc_trampoline mechanism to initialize global isc_tid_v. This lead to
segmentation fault in isc_hp API because '-1' isn't valid index to the
hazard pointer array.
This commit changes the iocompletionport_createthreads() to use
isc_thread_create() instead of CreateThread() to properly initialize
isc_tid_v.
The nsupdate system test did not record failures from the
'update_test.pl' Perl script. This was because the 'ret' value was
not being saved outside the '{ $PERL ... || ret=1 } cat_i' scope.
Change this piece to store the output in a separate file and then
cat its contents. Now the 'ret' value is being saved.
Also record failures in 'update_test.pl' if sending the update
failed.
Add missing 'n' incrementals to 'nsupdate/test.sh' to keep track of
test numbers.
Add a test case when a dnssec-policy is reconfigured to "none",
without setting it to "insecure" first. This is unsupported behavior,
but we want to make sure the behavior is somewhat expected. The
zone should remain signed (but will go bogus once the signatures
expire).
Update the ARM to mention the new built-in "insecure" policy. Update
the DNSSEC guide recipe "Revert to unsigned" to add the additional
step of reconfiguring the zone to "insecure" (instead of immediately
set it to "none").
Add a new built-in policy "insecure", to be used to gracefully unsign
a zone. Previously you could just remove the 'dnssec-policy'
configuration from your zone statement, or remove it.
The built-in policy "none" (or not configured) now actually means
no DNSSEC maintenance for the corresponding zone. So if you
immediately reconfigure your zone from whatever policy to "none",
your zone will temporarily be seen as bogus by validating resolvers.
This means we can remove the functions 'dns_zone_use_kasp()' and
'dns_zone_secure_to_insecure()' again. We also no longer have to
check for the existence of key state files to figure out if a zone
is transitioning to insecure.
Mark Andrews [Wed, 28 Apr 2021 02:05:02 +0000 (12:05 +1000)]
Update ZONEMD to match RFC 8976
* The location of the digest type field has changed to where the
reserved field was.
* The reserved field is now called scheme and is where the digest
type field was.
* Digest type 2 has been defined (SHA256).
Michal Nowak [Wed, 10 Feb 2021 13:21:08 +0000 (14:21 +0100)]
Suppress TSAN errors from libfstrm.so
dnstap_test produces TSAN errors which originate in libfstrm.so. Unless
libfstrm is TSAN clean or a workaround is placed in libfstrm sources,
suppressing TSAN coming from libfstrm is necessary to test DNSTAP under
TSAN.
Michal Nowak [Tue, 26 Jan 2021 16:57:34 +0000 (17:57 +0100)]
Configure with --enable-dnstap by default
All platforms but OpenBSD have dnstap dependencies readily in their
respective repositories, and dnstap thus can be tested there. Given that
majority of images have dnstap dependencies available, it seems fitting
to make dnstap enabled by default.
Michal Nowak [Thu, 29 Apr 2021 09:19:13 +0000 (11:19 +0200)]
Disable pytest cacheprovider plugin in CI
The pytest "cacheprovider" plugin produces a .cache/v/cache/lastfailed
file, which holds a Python dictionary structure with failed tests.
However, on Ubuntu 16.04 (Xenial) the file is created even though the
test passed and the file contains just an empty dictionary ("{}").
Given that we are not interested in this feature, disabling the
"cacheprovider" plugin globally and removing per-test removals of the
.cache directory seems like the best course of action.
Michał Kępień [Thu, 29 Apr 2021 11:24:21 +0000 (13:24 +0200)]
Add a Sphinx role for linking GitLab issues/MRs
Define a :gl: Sphinx role that takes a GitLab issue/MR number as an
argument and creates a hyperlink to the relevant ISC GitLab URL. This
makes it easy to reach ISC GitLab pages directly from the release notes.
Make all GitLab references in the release notes use the new Sphinx role.
Use SIGABRT instead of SIGKILL to produce cores on failed start
When the `named` would hang on startup it would be killed with SIGKILL
leaving us with no information about the state the process was in.
This commit changes the start.pl script to send SIGABRT instead, so we
can properly collect and process the coredump from the hung named
process.
When reducing the number of NSEC3 iterations to 150, commit aa26cde2aea459d682f6f609a7c902ef9a7a35eb added tests for dnssec-policy
to check that a too high iteration count is a configuration failure.
The test is not sufficient because 151 was always too high for
ECDSAP256SHA256. The test should check for a different algorithm.
There was an existing test case that checks for NSEC3 iterations.
Update the test with the new maximum values.
Update the code in 'kaspconf.c' to allow at most 150 iterations.
Mark Andrews [Mon, 1 Mar 2021 05:46:07 +0000 (16:46 +1100)]
Handle DNAME lookup via itself
When answering a query, named should never attempt to add the same RRset
to the ANSWER section more than once. However, such a situation may
arise when chasing DNAME records: one of the DNAME records placed in the
ANSWER section may turn out to be the final answer to a client query,
but there is no way to know that in advance. Tweak the relevant INSIST
assertion in query_respond() so that it handles this case properly.
qctx->rdataset is freed later anyway, so there is no need to clean it up
in query_respond().
Mark Andrews [Thu, 25 Feb 2021 03:11:05 +0000 (14:11 +1100)]
Unload a zone if a transfer breaks its SOA record
If a zone transfer results in a zone not having any NS records, named
stops serving it because such a zone is broken. Do the same if an
incoming zone transfer results in a zone lacking an SOA record at the
apex or containing more than one SOA record.
Mark Andrews [Wed, 3 Feb 2021 00:10:20 +0000 (11:10 +1100)]
Check SOA owner names in zone transfers
An IXFR containing SOA records with owner names different than the
transferred zone's origin can result in named serving a version of that
zone without an SOA record at the apex. This causes a RUNTIME_CHECK
assertion failure the next time such a zone is refreshed. Fix by
immediately rejecting a zone transfer (either an incremental or
non-incremental one) upon detecting an SOA record not placed at the apex
of the transferred zone.
While working on the serve-stale backports, I noticed the following
oddities:
1. In the serve-stale system test, in one case we keep track of the
time how long it took for dig to complete. In commit aaed7f9d8c2465790d769221dfe8378c7147f5eb, the code removed the
exception to check for result == ISC_R_SUCCESS on stale found
answers, and adjusted the test accordingly. This failed to update
the time tracking accordingly. Move the t1/t2 time track variables
back around the two dig commands to ensure the lookups resolved
faster than the resolver-query-timeout.
2. We can remove the setting of NS_QUERYATTR_STALEOK and
DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)"
code path, because they are added later when we know we have
actually found a stale answer on a stale timeout lookup.
3. We should clear the NS_QUERYATTR_STALEOK flag from the client
query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that
flag is set on the rdataset attributes).
4. In 'bin/named/config.c' we should set the configuration options
in alpabetical order.
5. In the ARM, in the backports we have added "(stale)" between
"cached" and "RRset" to make more clear a stale RRset may be
returned in this scenario.
Michał Kępień [Wed, 28 Apr 2021 05:56:47 +0000 (07:56 +0200)]
Warn when log files grow too big in system tests
Exerting excessive I/O load on the host running system tests should be
avoided in order to limit the number of false positives reported by the
system test suite. In some cases, running named with "-d 99" (which is
the default for system tests) results in a massive amount of logs being
generated, most of which are useless. Implement a log file size check
to draw developers' attention to overly verbose named instances used in
system tests. The warning threshold of 200,000 lines was chosen
arbitrarily.
Michał Kępień [Wed, 28 Apr 2021 05:56:47 +0000 (07:56 +0200)]
Prevent useless logging in the "tcp" system test
The regression test for CVE-2020-8620 causes a lot of useless messages
to be logged. However, globally decreasing the log level for the
affected named instance would be a step too far as debugging information
may be useful for troubleshooting other checks in the "tcp" system test.
Starting a separate named instance for a single check should be avoided
when possible and thus is also not a good solution. As a compromise,
run "rndc trace 1" for the affected named instance before starting the
regression test for CVE-2020-8620.
Michał Kępień [Wed, 28 Apr 2021 05:56:47 +0000 (07:56 +0200)]
Limit logging for verbose system tests
The system test framework starts all named instances with the "-d 99"
command line option (unless it is overridden by a named.args file in a
given instance's working directory). This causes a lot of log messages
to be written to named.run files - currently over 5 million lines for a
single test suite run. While debugging information preserved in the log
files is essential for troubleshooting intermittent test failures, some
system tests involve sending hundreds or even thousands of queries,
which causes the relevant log files to explode in size. When multiple
tests (or even multiple test suites) are run in parallel, excessive
logging contributes considerably to the I/O load on the test host,
increasing the odds of intermittent test failures getting triggered.
Decrease the debug level for the seven most verbose named instances:
- use "-d 3" for ns2 in the "cacheclean" system test (it is the lowest
logging level at which the test still passes without the need to
apply any changes to tests.sh),
- use "-d 1" for the other six named instances.
This roughly halves the number of lines logged by each test suite run
while still leaving enough information in the logs to allow at least
basic troubleshooting in case of test failures.
This approach was chosen as it results in a greater decrease in the
number of lines logged than running all named instances with "-d 3",
without causing any test failures.
Diego Fronza [Thu, 1 Apr 2021 20:03:59 +0000 (17:03 -0300)]
Add malloc attribute to memory allocation functions
The malloc attribute allows compiler to do some optmizations on
functions that behave like malloc/calloc, like assuming that the
returned pointer do not alias other pointers.
Diego Fronza [Thu, 1 Apr 2021 19:48:16 +0000 (16:48 -0300)]
Removed unnecessary check (mpctx->items == NULL)
There is no possibility for mpctx->items to be NULL at the point where
the code was removed, since we enforce that fillcount > 0, if
mpctx->items == NULL when isc_mempool_get is called, then we will
allocate fillcount more items and add to the mpctx->items list.
Diego Fronza [Mon, 29 Mar 2021 17:04:12 +0000 (14:04 -0300)]
Fix following up lookup failure if more resolvers are available
_query_detach function was incorrectly unliking the query object from
the lookup->q query list, this made it impossible to follow a query
lookup failure with the next one in the list (possibly using a separate
resolver), as the link to the next query in the list was dissolved.
Fix by unliking the node only when the query object is about to be
destroyed, i.e. there is no more references to the object.
When the keymgr needs to create new keys, it is possible it needs to
create multiple keys. The keymgr checks for keyid conflicts with
already existing keys, but it should also check against that it just
created.
Michał Kępień [Mon, 26 Apr 2021 05:16:38 +0000 (07:16 +0200)]
Test "--without-gssapi" in GitLab CI
GitLab CI pipelines do not currently include a Linux job that would have
GSSAPI support disabled. Add the "--without-gssapi" option to the
./configure invocation on Debian 9 to address that deficiency and also
to continuously test that build-time switch.
Michał Kępień [Mon, 26 Apr 2021 05:16:38 +0000 (07:16 +0200)]
Test "tkey-gssapi-credential" conditionally
If "tkey-gssapi-credential" is set in the configuration and GSSAPI
support is not available, named will refuse to start. As the test
system framework does not support starting named instances
conditionally, ensure that "tkey-gssapi-credential" is only present in
named.conf if GSSAPI support is available.
as with TLS, the destruction of a client stream on failed read
needs to be conditional: if we reached failed_read_cb() as a
result of a timeout on a timer which has subsequently been
reset, the stream must not be closed.