Evan Hunt [Thu, 31 Jan 2019 02:16:07 +0000 (18:16 -0800)]
move the test lists into conf.sh.common
there is now a common list of tests in conf.sh.common, with the
tests that are either unique to windows or to unix, or which are
enabled or disabled by configure or Configure, being listed in
separate variables in conf.sh.in and conf.sh.win32.
Evan Hunt [Sat, 26 Jan 2019 18:36:47 +0000 (10:36 -0800)]
enable parallel system tests on windows
this moves the creation of "parallel.mk" into a separate shell script
instead of bin/tests/system/Makefile. that shell script can now be
executed by runall.sh, allowing us to make use of the cygwin "make"
command, which supports parallel execution.
Michał Kępień [Fri, 26 Apr 2019 18:38:02 +0000 (20:38 +0200)]
Simplify trailing period handling in system tests
Windows systems do not allow a trailing period in file names while Unix
systems do. When BIND system tests are run, the $TP environment
variable is set to an empty string on Windows systems and to "." on Unix
systems. This environment variable is then used by system test scripts
for handling this discrepancy properly.
In multiple system test scripts, a variable holding a zone name is set
to a string with a trailing period while the names of the zone's
corresponding dlvset-* and/or dsset-* files are determined using
numerous sed invocations like the following one:
In order to improve code readability, use zone names without trailing
periods and replace sed invocations with variable substitutions.
To retain local consistency, also remove the trailing period from
certain other zone names used in system tests that are not subsequently
processed using sed.
Michał Kępień [Fri, 26 Apr 2019 09:05:56 +0000 (11:05 +0200)]
Make root zone data match root hints
In the "allow-query" system test, ns3 uses a root hints file which
contains a single entry for a.root-servers.nil (10.53.0.1). This name
is not present in the root zone served by ns1, which means querying it
for that name and any type will yield an NXDOMAIN response. When
combined with unfavorable thread scheduling, this can lead to ns3
caching an NXDOMAIN response for the only root server it is aware of and
thus to false positives for the "allow-query" system test caused by ns3
returning unexpected SERVFAIL responses. Fix by modifying the root zone
served by ns1 so that authoritative responses to a.root-servers.nil
queries match the root hints file used by ns3.
Evan Hunt [Fri, 22 Feb 2019 22:53:30 +0000 (14:53 -0800)]
restore allowance for tcp-clients < interfaces
in the "refactor tcpquota and pipeline refs" commit, the counting
of active interfaces was tightened in such a way that named could
fail to listen on an interface if there were more interfaces than
tcp-clients. when checking the quota to start accepting on an
interface, if the number of active clients was above zero, then
it was presumed that some other client was able to handle accepting
new connections. this, however, ignored the fact that the current client
could be included in that count, so if the quota was already exceeded
before all the interfaces were listening, some interfaces would never
listen.
we now check whether the current client has been marked active; if so,
then the number of active clients on the interface must be greater
than 1, not 0.
Evan Hunt [Wed, 6 Feb 2019 19:27:11 +0000 (11:27 -0800)]
refactor tcpquota and pipeline refs; allow special-case overrun in isc_quota
- if the TCP quota has been exceeded but there are no clients listening
for new connections on the interface, we can now force attachment to the
quota using isc_quota_force(), instead of carrying on with the quota not
attached.
- the TCP client quota is now referenced via a reference-counted
'ns_tcpconn' object, one of which is created whenever a client begins
listening for new connections, and attached to by members of that
client's pipeline group. when the last reference to the tcpconn
object is detached, it is freed and the TCP quota slot is released.
- reduce code duplication by adding mark_tcp_active() function
- convert counters to stdatomic
Evan Hunt [Wed, 6 Feb 2019 19:26:36 +0000 (11:26 -0800)]
better tcpquota accounting and client mortality checks
- ensure that tcpactive is cleaned up correctly when accept() fails.
- set 'client->tcpattached' when the client is attached to the tcpquota.
carry this value on to new clients sharing the same pipeline group.
don't call isc_quota_detach() on the tcpquota unless tcpattached is
set. this way clients that were allowed to accept TCP connections
despite being over quota (and therefore, were never attached to the
quota) will not inadvertently detach from it and mess up the
accounting.
- simplify the code for tcpquota disconnection by using a new function
tcpquota_disconnect().
- before deciding whether to reject a new connection due to quota
exhaustion, check to see whether there are at least two active
clients. previously, this was "at least one", but that could be
insufficient if there was one other client in READING state (waiting
for messages on an open connection) but none in READY (listening
for new connections).
- before deciding whether a TCP client object can to go inactive, we
must ensure there are enough other clients to maintain service
afterward -- both accepting new connections and reading/processing new
queries. A TCP client can't shut down unless at least one
client is accepting new connections and (in the case of pipelined
clients) at least one additional client is waiting to read.
Witold Kręcicki [Fri, 4 Jan 2019 11:50:51 +0000 (12:50 +0100)]
tcp-clients could still be exceeded (v2)
the TCP client quota could still be ineffective under some
circumstances. this change:
- improves quota accounting to ensure that TCP clients are
properly limited, while still guaranteeing that at least one client
is always available to serve TCP connections on each interface.
- uses more descriptive names and removes one (ntcptarget) that
was no longer needed
- adds comments
Witold Kręcicki [Thu, 3 Jan 2019 13:17:43 +0000 (14:17 +0100)]
fix enforcement of tcp-clients (v1)
tcp-clients settings could be exceeded in some cases by
creating more and more active TCP clients that are over
the set quota limit, which in the end could lead to a
DoS attack by e.g. exhaustion of file descriptors.
If TCP client we're closing went over the quota (so it's
not attached to a quota) mark it as mortal - so that it
will be destroyed and not set up to listen for new
connections - unless it's the last client for a specific
interface.
Matthijs Mekking [Tue, 26 Feb 2019 14:55:29 +0000 (15:55 +0100)]
Fix nxdomain-redirect assertion failure
- Always set is_zonep in query_getdb; previously it was only set if
result was ISC_R_SUCCESS or ISC_R_NOTFOUND.
- Don't reset is_zone for redirect.
- Style cleanup.
Key IDs may accidentally match dig output that is not the key ID (for
example the RRSIG inception or expiration time, the query ID, ...).
Search for key ID + signer name should prevent that, as that is what
only should occur in the RRSIG record, and signer name always follows
the key ID.
Remove sleep calls from test, rely on wait_for_log(). Make
wait_for_log() and dnssec_loadkeys_on() fail the test if the
appropriate log line is not found.
Slightly adjust the echo_i() lines to print only the key ID (not the
key name).
Michał Kępień [Tue, 23 Apr 2019 12:59:05 +0000 (14:59 +0200)]
Wait more than 1 second for NSEC3 chain changes
One second may not be enough for an NSEC3 chain change triggered by an
UPDATE message to complete. Wait up to 10 seconds when checking whether
a given NSEC3 chain change is complete in the "nsupdate" system test.
Michał Kępień [Tue, 23 Apr 2019 12:59:05 +0000 (14:59 +0200)]
Remove redundant sleeps
In the "nsupdate" system test, do not sleep before checking results of
changes which are expected to be processed synchronously, i.e. before
nsupdate returns.
force SERVFAIL response in the gotanswer failure case
- named could return FORMERR if parsing iterative responses
ended with a result code such as DNS_R_OPTERR. instead of
computing a response code based on the result, in this case
we now just force the response to be SERVFAIL.
Michał Kępień [Fri, 19 Apr 2019 09:21:43 +0000 (11:21 +0200)]
Update interface lists in ifconfig scripts
Make bin/tests/system/ifconfig.bat also configure addresses ending with
9 and 10, so that the script is in sync with its Unix counterpart.
Update comments listing the interfaces created by ifconfig.{bat,sh} so
that they do not include addresses whose last octet is zero (since an
address like 10.53.1.0/24 is not a valid host address and thus the
aforementioned scripts do not even attempt configuring them).
Michał Kępień [Fri, 19 Apr 2019 09:21:43 +0000 (11:21 +0200)]
Fix the "dnssec" system test on Windows
On Windows, the bin/tests/system/dnssec/signer/example.db.signed file
contains carriage return characters at the end of each line. Remove
them before passing the aforementioned file to the awk script extracting
key IDs so that the latter can work properly.
Michał Kępień [Fri, 19 Apr 2019 09:21:43 +0000 (11:21 +0200)]
Do not wait for lock file cleanup on Windows
As signals are currently not handled by named on Windows, instances
terminated using signals are not able to perform a clean shutdown, which
involves e.g. removing the lock file. Thus, waiting for a given
instance's lock file to be removed beforing assuming it is shut down
is pointless on Windows, so do not even attempt it.
Michał Kępień [Fri, 19 Apr 2019 07:37:51 +0000 (09:37 +0200)]
win32: fix service state reported during shutdown
When a Windows service receives a request to stop, it should not set its
state to SERVICE_STOPPED until it is completely shut down as doing that
allows the operating system to kill that service prematurely, which in
the case of named may e.g. prevent the PID file and/or the lock file
from being cleaned up.
Set service state to SERVICE_STOP_PENDING when named begins its shutdown
and only report the SERVICE_STOPPED state immediately before exiting.
Evan Hunt [Sat, 19 Jan 2019 22:47:58 +0000 (14:47 -0800)]
if recursion is allowed and minimal-responses is no, search other databases
this restores functionality that was removed in commit 03be5a6b4e,
allowing named to search in authoritative zone databases outside the
current zone for additional data, if and only if recursion is allowed
and minimal-responses is disabled.
Matthijs Mekking [Fri, 22 Mar 2019 14:42:10 +0000 (15:42 +0100)]
With update-check-ksk also consider offline keys
The option `update-check-ksk` will look if both KSK and ZSK are
available before signing records. It will make sure the keys are
active and available. However, for operational practices keys may
be offline. This commit relaxes the update-check-ksk check and will
mark a key that is offline to be available when adding signature
tasks.
Matthijs Mekking [Thu, 14 Mar 2019 08:32:20 +0000 (09:32 +0100)]
Add test for ZSK rollover while KSK offline
This commit adds a lengthy test where the ZSK is rolled but the
KSK is offline (except for when the DNSKEY RRset is changed). The
specific scenario has the `dnskey-kskonly` configuration option set
meaning the DNSKEY RRset should only be signed with the KSK.
A new zone `updatecheck-kskonly.secure` is added to test against,
that can be dynamically updated, and that can be controlled with rndc
to load the DNSSEC keys.
There are some pre-checks for this test to make sure everything is
fine before the ZSK roll, after the new ZSK is published, and after
the old ZSK is deleted. Note there are actually two ZSK rolls in
quick succession.
When the latest added ZSK becomes active and its predecessor becomes
inactive, the KSK is offline. However, the DNSKEY RRset did not
change and it has a good signature that is valid for long enough.
The expected behavior is that the DNSKEY RRset stays signed with
the KSK only (signature does not need to change). However, the
test will fail because after reconfiguring the keys for the zone,
it wants to add re-sign tasks for the new active keys (in sign_apex).
Because the KSK is offline, named determines that the only other
active key, the latest ZSK, will be used to resign the DNSKEY RRset,
in addition to keeping the RRSIG of the KSK.
The question is: Why do we need to resign the DNSKEY RRset
immediately when a new key becomes active? This is not required,
only once the next resign task is triggered the new active key
should replace signatures that are in need of refreshing.