Aram Sargsyan [Mon, 29 May 2023 17:34:13 +0000 (17:34 +0000)]
Light refactoring of the fetchlimit system test
Prepare the fetchlimit system test for adding a clients-per-query
check. Change some functions and commands to accept a destination
NS IP address instead of using the hardcoded 10.53.0.3.
Remove the code implementing nonstardard behaviors that were formerly
needed to allow GSS-TSIG to work with Windows 2000, which passed
End-of-Life in 2010.
Deprecate the "oldgsstsig" command and "-o" command line option
to nsupdate; these are now treated as synonyms for "gsstsig" and "-g"
respectively.
Michal Nowak [Fri, 26 May 2023 08:50:58 +0000 (10:50 +0200)]
Change images for TSAN jobs
Fedora 38 and Debian "bullseye" images were "forked" to images used only
for TSAN CI jobs. The new images contain TSAN-aware liburcu that does
not fit well with ASAN CI jobs for which original images were also used.
Also, drop liburcu-related TSAN suppressions because they are
unnecessary with TSAN-aware liburcu.
Michal Nowak [Wed, 26 Apr 2023 09:21:28 +0000 (11:21 +0200)]
Look for core files in $TOP_BUILDDIR
The get_core_dumps.sh script couldn't find and process core files of
out-of-tree configurations because it looked for them in the source
instead of the build directory.
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.
However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.
Add 'answer_found' checks to the serve-stale logic to fix this issue.
Add a test case where when priming the cache with a slow authoritative
resolver, the stale-answer-client-timeout option should not return
a delegation to the client (it should wait until an applicable answer
is found, if no entry is found in the cache).
Since we don't use networking directly but rather via libuv, these
configure options were no-op. Remove the configure checks for epoll
(Linux), kqueue (BSDs) and /dev/poll (Solaris).
Mark Andrews [Fri, 26 May 2023 01:09:33 +0000 (11:09 +1000)]
Add regression test for [GL # 4090]
These insertions are added to produce a radix tree that will trigger
the INSIST reported in [GL #4090]. Due to fixes added since BIND 9.9
an extra insert in needed to ensure node->parent is non NULL.
Mark Andrews [Fri, 26 May 2023 00:28:39 +0000 (10:28 +1000)]
Move isc_mem_put to after node is checked for equality
isc_mem_put NULL's the pointer to the memory being freed. The
equality test 'parent->r == node' was accidentally being turned
into a test against NULL.
This commit ensures that access to the TLS context cache within zone
manager is properly synchronised.
Previously there was a possibility for it to get unexpectedly
NULLified for a brief moment by a call to
dns_zonemgr_set_tlsctx_cache() from one thread, while being accessed
from another (e.g. from got_transfer_quota()). This behaviour could
lead to server abort()ing on configuration reload (under very rare
circumstances).
Tom Krizek [Wed, 24 May 2023 11:58:01 +0000 (13:58 +0200)]
Disable rrl check in slow environments
The check for 'would limit' log message is triggered by sending at least
three messages within one second. However, in extremely slow conditions
(currently when running with clang+TSAN in CI), the individual queries
might take too much time to send enough of them within one second.
Since this is a pretty rare condition, let's just silently skip this
test in environments where a single query takes more than 500 ms, since
there's no way to perform the check under such conditions.
Evan Hunt [Tue, 16 May 2023 22:35:00 +0000 (15:35 -0700)]
fix handling of TCP timeouts
when a TCP dispatch times out, we call tcp_recv() with a result
value of ISC_R_TIMEDOUT; this cancels the oldest dispatch
entry in the dispatch's active queue, plus any additional entries
that have waited longer than their configured timeouts. if, at
that point, there were more dispatch entries still on the active
queue, it resumes reading, but until now it failed to restart
the timer.
this has been corrected: we now calculate a new timeout
based on the oldest dispatch entry still remaining. this
requires us to initialize the start time of each dispatch entry
when it's first added to the queue.
in order to ensure that the handling of timed-out requests is
consistent, we now calculate the runtime of each dispatch
entry based on the same value for 'now'.
incidentally also fixed a compile error that turned up when
DNS_DISPATCH_TRACE was turned on.
Evan Hunt [Sun, 21 May 2023 19:59:38 +0000 (12:59 -0700)]
prevent TSIG keys from being added to multiple rings
it was possible to add a TSIG key to more than one TSIG
keyring at a time, and this was in fact happening with the
session key, which was generated once and then added to the
keyrings for each view as it was configured.
this has been corrected and a REQUIRE added to dns_tsigkeyring_add()
to prevent it from happening again.
Aram Sargsyan [Wed, 24 May 2023 14:26:04 +0000 (14:26 +0000)]
Fix an interfacemgr use-after-free error in zoneconf.c:isself()
The 'named_g_server->interfacemgr' pointer is saved in the zone
structure using dns_zone_setisself(), as a void* argument to be
passed to the isself() callback, so there is no attach/detach,
and when shutting down, the interface manager can be destroyed
by the shutdown_server(), running in exclusive mode, and causing
isself() to crash when trying to use the pointer.
Instead of keeping the interface manager pointer in the zone
structure, just check and use the 'named_g_server->interfacemgr'
itself, as it was implemented originally in the 3aca8e5bf3740bbcc3bb13dde242d7cc369abb27 commit. Later, in the 8eb88aafee951859264e36c315b1289cd8c2088b commit, the code was
changed to pass the interface manager pointer using the additional
void* argument, but the commit message doesn't mention if there
was any practical reason for that.
Additionally, don't pass the interfacemgr pointer to the
ns_interfacemgr_getaclenv() function before it is checked
against NULL.
The 'update-nsec3.example' requires to be DNSSEC maintained via
dynamic update. Commit 03b22983cd20cec51ad8b9f25f2e7d0e472dc79c adds
checks to make sure the raw zone is not signed. So the test case neesd
to be updated to allow for DNSSEC maintenance.
A zone in multisigner model 2 should also be possible to remove
previously added DNSKEY, CDS and CDNSKEY records from the zone operated
by the other provider.
Add a test case where updates are being made against a hidden primary
and two bump in the wire signers (the providers in the multisigner
model) serve the zone.
The test covers the same cases as for two primary providers that is:
- Add DNSKEY
- Remove (previously added) DNSKEY
- Add CDNSKEY
- Remove (previously added) CDNSKEY
- Add CDS
- Remove (previously added) CDS
Mark Andrews [Wed, 12 Oct 2022 06:01:57 +0000 (17:01 +1100)]
Don't sign the raw zone
The raw zone is not supposed to be signed. DNSKEY records in a raw zone
should not trigger zone signing. The update code needs to be able to
identify when it is working on a raw zone. Add dns_zone_israw() and
dns_zone_issecure() enable it to do this. Also, we need to check the
case for 'auto-dnssec maintain'.
A zone in multisigner model 2 should also be possible to publish the
CDS and CDNSKEY records from their KSK into the zone operated by the
other provider.
For inline-signing zones, sometimes kasp was not detected because
the function was called on the raw (unsigned) version of the zone,
but the kasp is only set on the secure (signed) version of the zone.
Fix the dns_zone_getkasp() function to check whether the zone
structure is inline_raw(), and if so, use the kasp from the
secure version.
In zone.c we can access the kasp pointer directly.
When synchronizing the journal or database from the unsigned version of
the zone to the secure version of the zone, allow DNSKEY records to be
synced, because these may be added by the user with the sole intent to
publish the record (not used for signing). This may be the case for
example in the multisigner model 2 (RFC 8901).
Additional code needs to be added to ensure that we do not remove DNSKEY
records that are under our control. Keys under our control are keys that
are used for signing the zone and thus that we have key files for.
Same counts for CDNSKEY and CDS (records that are derived from keys).
Add a new system test to test multisigner model use cases. This
initial test just tests a small part of the model 2, and uses two
providers for the same zone, ns3 and ns4, each with their own unique
key set. This commit tests that each provider can import their ZSK
of the other provider into their DNSKEY RRset, using dynamic update.
Both providers use dnssec-policy, ns3 applies the DNSSEC records
directly, while ns4 uses inline-signing.
Mark Andrews [Wed, 10 May 2023 04:50:59 +0000 (14:50 +1000)]
Silence Coverity USE_AFTER_FREE warning
Use current used pointer - 16 instead of a saved pointer as Coverity
thinks the memory may be freed between assignment and use of 'cp'.
isc_buffer_put{mem,uint{8,16,32}} can theoretically free the memory
if there is a dynamic buffer in use but that is not the case here.
Tom Krizek [Mon, 22 May 2023 12:20:29 +0000 (14:20 +0200)]
Reorder dead primary checks in upforwd test
The check which attempts to forward dynamic update to a dead primary may
trigger a timing issue #4080. For some reason, this has manifested under
the pytest runner, while the test still passes with the legacy runner.
Move the dead primary check closer to the end of the test to avoid
hitting this issue before we have a proper fix.
Tom Krizek [Tue, 9 May 2023 12:20:02 +0000 (14:20 +0200)]
Tear down module logger handler in system tests
The module-level logger has a handler that writes into a temporary
directory. Ensure the logging output is flushed and the handler is
closed before attempting to remove this temporary directory.
Tom Krizek [Tue, 9 May 2023 11:32:13 +0000 (13:32 +0200)]
Rewrite run.sh to invoke pytest in a system test directory
Previously, run.sh tried to use pytest's -k option for test selection.
The downside was that this filter expression matched any test case with
the given substring, rather than executing a system test suite with the
given name.
The run.sh has been rewritten to invoke pytest from a system test
directory instead. This behaves more consistently with the run.sh from
legacy system test framework.
run.sh is now also a shell script to avoid confusion regarding its
file extension.
Tom Krizek [Fri, 14 Apr 2023 12:40:51 +0000 (14:40 +0200)]
Add .log.txt to gitignore
It can be useful to append the .txt extension to logs. When this
extension is used, GitLab is able to set the proper content type on such
artifacts in CI. This makes it possible to display those files directly
in the browser rather than having to download them.
Tom Krizek [Wed, 12 Apr 2023 15:29:26 +0000 (17:29 +0200)]
Remove "which" declaration from env vars in EL8+ tests
EL8+ systems declare "which" function using environment variables in the
/etc/profile.d/which2.sh file. Because of our suboptimal environment
variable detection, which is required in order to support the legacy
runner, these variables are picked up by the pytest runner.
If subprocesses are spawned with these environment variables set, it
will cause the following issue when they spawn yet another subprocess:
/bin/sh: which: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `which'
Tom Krizek [Wed, 12 Apr 2023 12:22:27 +0000 (14:22 +0200)]
Capture log output during pytest runner setup
Instantiate a new logger that is used during pytest initialization /
configuration. This logging isn't handled by pytest itself, since it
happens outside of any tests or fixtures.
Root logger can't be reused for this purpose, because that would
duplicate the logs. Instead, create a conftest-specific logger for this
purpose.
Unfortunately, this introduces another log file,
pytest.conftest.log.txt, which contains only the logging from pytest
initialization. However, unless one is debugging the runner /
environment, there should be no need to investigate this file.
Tom Krizek [Wed, 5 Apr 2023 10:39:56 +0000 (12:39 +0200)]
Execute long running system tests first
In order to take the most advantage of parallel execution of tests,
ensure certain long running tests are scheduled first.
The list of tests considered long-running was created empirically. In
addition to the test run time, its position in the default
(alphabetical) ordering was also taken into account.
Tom Krizek [Tue, 4 Apr 2023 15:44:10 +0000 (17:44 +0200)]
Add test specific logger for pytests
The logger fixture is provided as a test-level logging facility which
can be easily passed to tests to enable capturing and/or displaying
messages from tests written in Python.
While this works optimally with the pytest runner, messages on INFO
level or above will also be visible when using the legacy runner.
Tom Krizek [Tue, 4 Apr 2023 08:34:37 +0000 (10:34 +0200)]
Mark selected statschannel tests as xfail
The test_zone_timers_secondary_json() and
test_zone_timers_secondary_xml() tests are affected by issue #3983. Due
to the way tests are run, they are only affected when executing them
with the pytest runner.
Strict mode is set for pytest runner, as it always fails there. The
strict mode ensures we'll catch the change when the it starts passing
once the underlying issue is fixed. It can't be set for the legacy
runner, since the test (incorrectly) passes there.
Tom Krizek [Mon, 3 Apr 2023 16:05:29 +0000 (18:05 +0200)]
Ensure assertions and exceptions end up in system test log
If a test fails with an assertion failure or exception, its content
along with traceback is displayed in pytest output. This information
should be preserved in the test-specific logger for a given system test
to make it easier to debug test failures.
Petr Špaček [Tue, 28 Mar 2023 09:52:38 +0000 (11:52 +0200)]
Use raw byte format of env variables in pytest
In order to avoid issues with decoding/encoding env variables due to
different encodings on different systems, deal with the environment
variables directly as bytes.
Petr Špaček [Thu, 16 Mar 2023 17:40:59 +0000 (18:40 +0100)]
Enable live logging for non-parallel pytest runs
This provides incremental output when test is running _without xdist_,
just like the old runner did.
With xdist the live output is not available, I believe because of
https://github.com/pytest-dev/pytest-xdist/issues/402
https://github.com/pytest-dev/pytest-xdist/pull/883 might help with
that, but I'm not going to hold my breath until it is available on
distros we use.
Tom Krizek [Tue, 28 Mar 2023 14:52:49 +0000 (16:52 +0200)]
Ensure --dist=loadscope is used when running pytest in parallel
The loadscope setting is required for parallel execution of our system
tests using pytest. The option ensure that all tests within a single
(module) scope will be assigned to the same worker.
This is neccessary because the worker sets up the nameservers for all
the tests within a module scope. If tests from the same module would be
assigned to different workers, then the setup could happen multiple
times, causing a race condition. This happens because each module uses
deterministic port numbers for the nameservers.
Tom Krizek [Fri, 13 Jan 2023 15:32:39 +0000 (16:32 +0100)]
Invoke pytest runner from run.sh
Utilize developers' muscle memory to incentivize using the pytest runner
instead of the legacy one. The script also serves as basic examples of
how to run the pyest command to achieve the same results as the legacy
runner.
Invoking pytest directly should be the end goal, since it offers many
potentially useful options (refer to pytest --help).
Tom Krizek [Tue, 22 Nov 2022 09:47:15 +0000 (10:47 +0100)]
Use pytest system test runner in CI
Replace the legacy system test runner by the pytest system test runner.
Since EL7 and OpenBSD have only ancient versions of pytest / xdist, keep
using the legacy test runner there for now.
Out of tree tests aren't supported by the pytest runner yet. Use the
legacy test runner for that purpose as well.
Use awk to display failures and errors at the end of the log for
convenience, since pytest displays them first, which makes them
difficult to find.
Tom Krizek [Tue, 29 Nov 2022 14:07:02 +0000 (15:07 +0100)]
Add pytest functions for shell system tests
In order to run the shell system tests, the pytest runner has to pick
them up somehow. Adding an extra python file with a single function
for the shell tests for each system test proved to be the most
compatible way of running the shell tests across older pytest/xdist
versions.
Modify the legacy run.sh script to ignore these pytest-runner specific
glue files when executing tests written in pytest.
Tom Krizek [Thu, 12 Jan 2023 17:01:02 +0000 (18:01 +0100)]
Ensure compatiblity with older pytest
Special care needs to be taken to support older pytest / xdist versions.
The target versions are what is available in EL8, since that seems to
have the oldest versions that can be reasonably supported.
Tom Krizek [Thu, 12 Jan 2023 16:59:51 +0000 (17:59 +0100)]
Keep the tempdir in case test setup/teardown fails
When an issue occurs inside a fixture (e.g. servers fail to start/stop),
the test result won't be detected as failed, but rather an error will be
thrown.
To ensure the tempdir is kept even if the test itself passes but the
system_test() fixture throws an error, a different mechanism is needed.
At the start of the critical test setup section, note that the fixture
hasn't finished yet. When this is detected in the system_test_dir()
fixture, it is recognized as error in test setup/teardown and the temp
directory is kept.
This may seem cumbersome, because it is. It's basically a workaround for
the way pytest handles fixtures and test errors in general.
Tom Krizek [Thu, 12 Jan 2023 16:41:35 +0000 (17:41 +0100)]
Ensure tempdir is kept for failed system tests
The temporary directory contains artifacts for the pytest module. That
module may contain multiple individual tests which were executed
sequentially. The artifacts should be kept if even one of these tests
failed.
Since pytest doesn't have any facility to expose test results to
fixtures, customize the pytest_runtest_makereport() hook to enable that.
It stores the test results into a session scope variable which is
available in all fixtures.
When deciding whether to remove the temporary directory, find the
relevant test results for this module and don't remove the tmpdir if any
one the tests failed.
Tom Krizek [Thu, 12 Jan 2023 16:30:22 +0000 (17:30 +0100)]
Run system tests inside temporary directories
For every pytest module, create a copy of its system test directory and
run the test from that directory. This makes it easier to clean up
afterwards and makes it less error-prone when re-running (failed) tests.
Configure the logger to capture the module's log output into a separate
file available from the temporary directory. This is quite convenient
for exploring failures.
Cases where temporary directory should be kept are handled in a
follow-up commits.
Tom Krizek [Thu, 12 Jan 2023 16:01:36 +0000 (17:01 +0100)]
Utility fixtures for pytest runner
Add fixtures for deriving system test name from the directory name and
for a module-specific logger.
Add fixtures to execute shell and perl scripts. Similar to how run.sh
calls commands, this functionality is also needed in pytest. The
fixtures take care of switching to a proper directory, logging
everything and handling errors.
Note that the fixture system_test_dir is not defined yet. It is omitted
now for review readability and it's added in follow-up commits.