In dns_rpz_update_from_db we call setup_update which creates the db
iterator and calls dns_dbiterator_first. This unpauses the iterator and
might cause db->tree_lock to be acquired. We then do isc_task_send(...)
on an event to do quantum_update, which (correctly) after each iteration
calls dns_dbiterator_pause, and re-isc_task_sends itself.
That's an obvious bug, as we're holding a lock over an async task send -
if a task requesting write (e.g. prune_tree) is scheduled on the same
workers queue as update_quantum but before it, it will wait for the
write lock indefinitely, resulting in a deadlock.
To fix it we have to pause dbiterator in setup_update.
Michał Kępień [Wed, 3 Apr 2019 10:57:33 +0000 (12:57 +0200)]
Do not rely on default dig options in system tests
Some system tests assume dig's default setings are in effect. While
these defaults may only be silently overridden (because of specific
options set in /etc/resolv.conf) for BIND releases using liblwres for
parsing /etc/resolv.conf (i.e. BIND 9.11 and older), it is arguably
prudent to make sure that tests relying on specific +timeout and +tries
settings specify these explicitly in their dig invocations, in order to
prevent test failures from being triggered by any potential changes to
current defaults.
Witold Kręcicki [Tue, 5 Mar 2019 14:14:08 +0000 (15:14 +0100)]
Fix assertion failure in nslookup/dig/mdig when message has multiple SIG(0) options.
When parsing message with DNS_MESSAGE_BESTEFFORT (used exclusively in
tools, never in named itself) if we hit an invalid SIG(0) in wrong
place we continue parsing the message, and put the sig0 in msg->sig0.
If we then hit another sig0 in a proper place we see that msg->sig0
is already 'taken' and we don't free name and rdataset, and we don't
set seen_problem. This causes an assertion failure.
This fixes that issue by setting seen_problem if we hit second sig0,
tsig or opt, which causes name and rdataset to be always freed.
Michał Kępień [Wed, 20 Mar 2019 21:21:30 +0000 (22:21 +0100)]
Fix key ID extraction in the "dnssec" system test
Simply looking for the key ID surrounded by spaces in the tested
dnssec-signzone output file is not a precise enough method of checking
for signatures prepared using a given key ID: it can be tripped up by
cross-algorithm key ID collisions and certain low key IDs (e.g. 60, the
TTL specified in bin/tests/system/dnssec/signer/example.db.in), which
triggers false positives for the "dnssec" system test. Make key ID
extraction precise by using an awk script which operates on specific
fields.
Michał Kępień [Wed, 20 Mar 2019 08:50:35 +0000 (09:50 +0100)]
Increase dig query timeout to 2 seconds
The "mirror" system test expects all dig queries (including recursive
ones) to be responded to within 1 second, which turns out to be overly
optimistic in certain cases and leads to false positives being
triggered. Increase dig query timeout used throughout the "mirror"
system test to 2 seconds in order to alleviate the issue.
Michał Kępień [Wed, 20 Mar 2019 08:50:35 +0000 (09:50 +0100)]
Increase TAT query interval
Currently, ns3 in the "mirror" system test sends trust anchor telemetry
queries every second as it is started with "-T tat=1". Given the number
of trust anchors configured on ns3 (9), TAT-related traffic clutters up
log files, hindering troubleshooting efforts. Increase TAT query
interval to 3 seconds in order to alleviate the issue.
Note that the interval chosen cannot be much higher if intermittent test
failures are to be avoided: TAT queries are only sent after the
configured number of seconds passes since resolver startup. Quick
experiments show that even on contemporary hardware, ns3 should be
running for at least 5 seconds before it is first shut down, so a
3-second TAT query interval seems to be a reasonable, future-proof
compromise. Ensure the relevant check is performed before ns3 is first
shut down to emphasize this trade-off and make it more clear by what
time TAT queries are expected to be sent.
Michał Kępień [Wed, 20 Mar 2019 07:46:58 +0000 (08:46 +0100)]
Wait until "rndc dumpdb" completes
"rndc dumpdb" works asynchronously, i.e. the requested dump may not yet
be fully written to disk by the time "rndc" returns. Prevent false
positives for the "serve-stale" system test by only checking dump
contents after the line indicating that it is complete is written.
Matthijs Mekking [Tue, 15 Jan 2019 13:12:14 +0000 (14:12 +0100)]
DLV tests unsupported/disabled algorithms
This tests both the cases when the DLV trust anchor is of an
unsupported or disabled algorithm, as well as if the DLV zone
contains a key with an unsupported or disabled algorithm.
Michał Kępień [Thu, 24 Jan 2019 11:50:01 +0000 (12:50 +0100)]
Move code handling key loading errors into a common function
Some values returned by dstkey_fromconfig() indicate that key loading
should be interrupted, others do not. There are also certain subsequent
checks to be made after parsing a key from configuration and the results
of these checks also affect the key loading process. All of this
complicates the key loading logic.
In order to make the relevant parts of the code easier to follow, reduce
the body of the inner for loop in load_view_keys() to a single call to a
new function, process_key(). Move dstkey_fromconfig() error handling to
process_key() as well and add comments to clearly describe the effects
of various key loading errors.
Matthijs Mekking [Tue, 15 Jan 2019 10:32:53 +0000 (11:32 +0100)]
Ignore trust anchors using disabled algorithm
More specifically: ignore configured trusted and managed keys that
match a disabled algorithm. The behavioral change is that
associated responses no longer SERVFAIL, but return insecure.
Michał Kępień [Tue, 19 Mar 2019 09:26:36 +0000 (10:26 +0100)]
Make stop.pl wait for lock file cleanup
bin/tests/system/stop.pl only waits for the PID file to be cleaned up
while named cleans up the lock file after the PID file. Thus, the
aforementioned script may consider a named instance to be fully shut
down when in fact it is not.
Fix by also checking whether the lock file exists when determining a
given instance's shutdown status. This change assumes that if a named
instance uses a lock file, it is called "named.lock".
Also rename clean_pid_file() to pid_file_exists(), so that it is called
more appropriately (it does not clean up the PID file itself, it only
returns the server's identifier if its PID file is not yet cleaned up).
Michał Kępień [Tue, 19 Mar 2019 09:26:36 +0000 (10:26 +0100)]
Correctly invoke stop.pl when start.pl fails
MR !1141 broke the way stop.pl is invoked when start.pl fails:
- start.pl changes the working directory to $testdir/$server before
attempting to start $server,
- commit 27ee629e6b583f60fea0ab78fb3ebd0d1d71d9d2 causes the $testdir
variable in stop.pl to be determined using the $SYSTEMTESTTOP
environment variable, which is set to ".." by all tests.sh scripts,
- commit e227815af51c0656e22e5aebfe99e2399106b31c makes start.pl pass
$test (the test's name) rather than $testdir (the path to the test's
directory) to stop.pl when a given server fails to start.
Thus, when a server is restarted from within a tests.sh script and such
a restart fails, stop.pl attempts to look for the server directory in a
nonexistent location ($testdir/$server/../$test, i.e. $testdir/$test,
instead of $testdir/../$test). Fix the issue by changing the working
directory before stop.pl is invoked in the scenario described above.
Petr Menšík [Thu, 14 Mar 2019 12:40:14 +0000 (13:40 +0100)]
Fix regression in dnstap_test with native pkcs11
Change to cmocka broken initialization of TZ environment. This time,
commit 1cf12540515e4a3fc93ace02b81815209f1e709e is not soon enough. Has
to be moved more forward, before any other tests. It library is not full
reinitialized on each test.