Štěpán Balážik [Wed, 18 Jun 2025 10:13:37 +0000 (12:13 +0200)]
Use isctest.asyncserver in the "zero" test
The original `ans.pl` server was based on a copy of the one in
`fetchlimit`, so there are some changes:
- The server now only responds with A replies (which is the only thing
needed).
- The incrementing of the IP address goes beyond the least significant
octet (so, after 192.0.2.255 it will yield 192.0.3.0).
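The carry across octets described above can be sketched in a few lines of Python. This is only an illustration using the standard `ipaddress` module; the helper name `next_address` is hypothetical and is not taken from the test server's code.

```python
import ipaddress


def next_address(current: str) -> str:
    """Return the IPv4 address following `current`, carrying across octets."""
    # IPv4Address supports integer addition, so the carry into the
    # next octet happens automatically.
    return str(ipaddress.IPv4Address(current) + 1)


print(next_address("192.0.2.254"))
print(next_address("192.0.2.255"))
```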
Michał Kępień [Sat, 21 Jun 2025 04:58:59 +0000 (04:58 +0000)]
[9.20] chg: ci: move "stress" test generation script to QA repo
Move the util/generate-stress-test-configs.py script from the BIND 9
source repository to the BIND 9 QA repository. This simplifies the
maintenance of that script by eliminating the need to backport every
change applied to it to multiple branches.
Backport of MR !10585
Merge branch 'backport-michal/move-stress-test-generation-script-to-qa-repo-9.20' into 'bind-9.20'
Michał Kępień [Sat, 21 Jun 2025 04:43:36 +0000 (06:43 +0200)]
Move "stress" test generation script to QA repo
Move the util/generate-stress-test-configs.py script from the BIND 9
source repository to the BIND 9 QA repository. This simplifies the
maintenance of that script by eliminating the need to backport every
change applied to it to multiple branches.
Nicki Křížek [Thu, 19 Jun 2025 14:25:07 +0000 (14:25 +0000)]
[9.20] fix: test: Ignore softhsm2 errors when deleting token in keyfromlabel test
In some rare cases, the softhsm2 utility reports failure to delete the
token directory, despite the token being found. Subsequent attempts to
delete the token again indicate that the token was deleted.
Ignore this cleanup error, as it doesn't prevent our tests from working
properly. There is also an attempt to delete the token before the test
starts which ensures a clean state before the test is executed, in case
there's actually a leftover token.
Closes #5244
Backport of MR !10607
Merge branch 'backport-5244-ignore-softhsm2util-delete-token-error-9.20' into 'bind-9.20'
Nicki Křížek [Thu, 19 Jun 2025 13:09:39 +0000 (15:09 +0200)]
Ignore softhsm2 errors when deleting token in keyfromlabel test
In some rare cases, the softhsm2 utility reports failure to delete the
token directory, despite the token being found. Subsequent attempts to
delete the token again indicate that the token was deleted.
Ignore this cleanup error, as it doesn't prevent our tests from working
properly. There is also an attempt to delete the token before the test
starts which ensures a clean state before the test is executed, in case
there's actually a leftover token.
Nicki Křížek [Thu, 19 Jun 2025 13:41:17 +0000 (13:41 +0000)]
[9.20] chg: test: Improve logging from isctest.run.retry_with_timeout
Allow the use of exceptions (and by extension, assert statements) in the
called function in order to extract essential debug information about
the type of failure that was encountered.
If the called function does not succeed on the last retry and raises
an exception, log it as an error and set it as the assert message to
propagate it through the pytest framework.
Closes #5324
Backport of MR !10580
Merge branch 'backport-5324-pytest-isctest-run-logging-9.20' into 'bind-9.20'
Nicki Křížek [Thu, 19 Jun 2025 12:09:57 +0000 (14:09 +0200)]
Use time.monotonic() for time measurements in pytest
For duration measurements, i.e. deadlines and timeouts, it's more
suitable to use monotonic time as it's guaranteed to only go forward,
unlike time.time() which can be affected by local clock settings.
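A minimal sketch of a deadline computed from the monotonic clock follows. The helper `wait_until` and its parameters are hypothetical and only illustrate the pattern; they are not the test framework's own code.

```python
import time


def wait_until(predicate, timeout: float, delay: float = 0.01) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    # The deadline is derived from time.monotonic(), so adjusting the
    # wall clock while waiting cannot shorten or extend the wait.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)
```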
Nicki Křížek [Fri, 6 Jun 2025 13:11:44 +0000 (15:11 +0200)]
Improve logging from isctest.run.retry_with_timeout
Allow the use of exceptions (and by extension, assert statements) in the
called function in order to extract essential debug information about
the type of failure that was encountered.
If the called function does not succeed on the last retry and raises
an exception, log it as an error and set it as the assert message to
propagate it through the pytest framework.
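The behavior described above might look roughly like the sketch below. It is not the actual isctest.run.retry_with_timeout code: the function name `retry_until`, its signature, and the message format are all assumptions made for illustration.

```python
import logging
import time


def retry_until(func, timeout, delay=0.01):
    """Call `func` until it returns a true value or `timeout` seconds pass.

    Exceptions raised by `func` (including failed asserts) are treated as
    a failed attempt; the last one is logged and put into the final
    assertion message.
    """
    deadline = time.monotonic() + timeout
    while True:
        last_exc = None
        try:
            if func():
                return
        except Exception as exc:  # AssertionError is caught here as well
            last_exc = exc
        if time.monotonic() >= deadline:
            break
        time.sleep(delay)
    message = f"{func.__name__} did not succeed within {timeout} s"
    if last_exc is not None:
        logging.error("%s: %r", message, last_exc)
        message = f"{message}: {last_exc!r}"
    raise AssertionError(message)
```

With this shape, a check function can simply use `assert` with a descriptive message, and that message surfaces in the pytest failure report if the retries run out.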
Matthijs Mekking [Thu, 22 May 2025 09:23:48 +0000 (11:23 +0200)]
Fix spurious missing key files log messages
This happens because an old key is purged by one zone view and the
other view then complains about the missing key file.
Keys that are unused or being purged should not be taken into account
when verifying key files are available.
The keyring is maintained per zone. So in one zone, a key in the
keyring is being purged. The corresponding key file is removed.
The key maintenance is done for the other zone view. The key in that
keyring is not yet set to purge, but its corresponding key file is
removed. This leads to "some keys are missing" log errors.
We should not check the purge variable at this point, but the
current time and purge-keys duration.
Create a test scenario where a signed zone is in multiple views and
then a key may be purged. This is a bug case where the key files are
removed by one view and then the other view starts complaining.
Matthijs Mekking [Wed, 19 Mar 2025 13:37:28 +0000 (14:37 +0100)]
Convert going insecure kasp test to pytest
When going insecure, we publish CDS and CDNSKEY DELETE records. Update
the check_apex function to test this.
Also, skip some tests in the 'check_rollover_step()' function. If
we change the DNSSEC Policy, keys that no longer match the policy will
be retired. When this exactly happens is hard to determine, as it
happens on the reconfigure. So for these tests, we skip the key timing
metadata checks.
Also, the zone becomes unsigned, so don't call 'check_zone_is_signed'
in those cases.
Matthijs Mekking [Wed, 19 Mar 2025 10:35:18 +0000 (11:35 +0100)]
Convert policy changes tests to pytest
These test cases involve a reconfiguration. The first one is a zone
that changes from dynamic to inline-signing. The others test that
key lifetimes are updated correctly after changing them.
Evan Hunt [Fri, 13 Jun 2025 04:39:20 +0000 (04:39 +0000)]
[9.20] fix: usr: Use IPv6 queries in delv +ns
`delv +ns` invokes the same code to perform name resolution as `named`,
but it neglected to set up an IPv6 dispatch object first. Consequently,
it was behaving more like `named -4`. It now sets up dispatch objects
for both address families, and performs resolver queries to both v4 and v6
addresses, except when one of the address families has been suppressed
by using `delv -4` or `delv -6`.
Closes #5352
Backport of MR !10563
Merge branch 'backport-5352-delv-ipv6-9.20' into 'bind-9.20'
Evan Hunt [Thu, 5 Jun 2025 21:10:21 +0000 (14:10 -0700)]
add tests for 'delv +ns -4' and '-6'
check that `delv +ns` sends iterative queries over both address
families when -4 and -6 are not used, and suppresses queries
appropriately when they are.
Evan Hunt [Thu, 5 Jun 2025 18:41:17 +0000 (11:41 -0700)]
Use ipv6 queries in delv +ns
`delv +ns` invokes the same code to perform name resolution as `named`,
but it neglected to set up an IPv6 dispatch object first. Consequently,
it was behaving more like `named -4`.
It now sets up dispatch objects for both address families, and performs
resolver queries to both v4 and v6 addresses, except when one of the
address families has been suppressed by using `delv -4` or `delv -6`.
Aydın Mercan [Tue, 3 Jun 2025 15:05:10 +0000 (15:05 +0000)]
[9.20] rem: pkg: Implement the systemd notification protocol manually to remove dependency on libsystemd.
libsystemd, despite being useful, adds a huge surface area for just
using the sd_notify API. libsystemd's surface has been exploited in the
past [1].
Implement the systemd notification protocol by hand since it is just
sending newline-delimited datagrams to a UNIX socket. The code shouldn't
need more attention in the future since the notification protocol is
covered under systemd's stability promise [2].
We don't need to support VSOCK-backed service notifications since they
are only intended for virtual machine inits.
Aydın Mercan [Sun, 16 Mar 2025 16:54:18 +0000 (19:54 +0300)]
implement the systemd notification protocol manually, drop libsystemd
libsystemd, despite being useful, adds a huge surface area for just
using the sd_notify API. libsystemd's surface has been exploited in the
past [1].
Implement the systemd notification protocol by hand since it is just
sending newline-delimited datagrams to a UNIX socket. The code shouldn't
need more attention in the future since the notification protocol is
covered under systemd's stability promise [2].
We don't need to support VSOCK-backed service notifications since they
are only intended for virtual machine inits.
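To show how small the protocol is, here is a sketch in Python (the change itself is in BIND's C code, which is not reproduced here). It assumes the service manager exports the socket address in the `NOTIFY_SOCKET` environment variable and that a leading `@` denotes a Linux abstract socket; the function name `sd_notify` is reused only for familiarity.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send `state` (e.g. "READY=1") to the socket named by NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is not set, i.e. when the process
    is not supervised by a notification-aware service manager.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract socket: the leading "@" stands for a NUL byte.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

VSOCK addresses are deliberately not handled, matching the scope stated above.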
Evan Hunt [Thu, 29 May 2025 17:55:25 +0000 (10:55 -0700)]
Prevent .hypothesis artifacts in system test directories
The "run.sh" script, used by "make test", changes the working
directory to the system test directory before executing pytest.
If the test drops hypothesis artifacts while running, this
can cause spurious test failures due to an apparent mismatch
between the contents of the system test directory and the
temporary pytest directory. This has been addressed by having
"run.sh" call pytest from the parent directory instead.
Mark Andrews [Tue, 3 Jun 2025 03:04:01 +0000 (03:04 +0000)]
[9.20] fix: nil: Extend named-rrchecker multi-line parsing support
named-rrchecker now parses the braces which support multi-line input
from the beginning of the input rather than only when reading the
data fields of the record.
Closes #5336
Backport of MR !10521
Merge branch 'backport-5336-extend-named-rrchecker-multiline-support-9.20' into 'bind-9.20'
Mark Andrews [Fri, 30 May 2025 03:03:16 +0000 (13:03 +1000)]
Extend named-rrchecker multi-line parsing support
named-rrchecker now parses the braces which support multi-line input
from the beginning of the input rather than only when reading the
data fields of the record.
Mark Andrews [Tue, 3 Jun 2025 00:26:43 +0000 (00:26 +0000)]
[9.20] fix: nil: Silence potential divide by zero warning in qpmulti.c
Coverity flagged a potential divide by zero error in collect in
qpmulti.c when the elapsed time is zero, but that is only called
once the elapsed time is greater than or equal to RUNTIME (1/4
second), so INSIST that this is the case.
Closes #5329
Backport of MR !10519
Merge branch 'backport-5329-potential-divide-by-zero-in-qpmulti-c-9.20' into 'bind-9.20'
Mark Andrews [Fri, 30 May 2025 00:51:21 +0000 (10:51 +1000)]
Silence potential divide by zero warning in qpmulti.c
Coverity flagged a potential divide by zero error in collect in
qpmulti.c when the elapsed time is zero, but that is only called
once the elapsed time is greater than or equal to RUNTIME (1/4
second), so INSIST that this is the case.
Petr Špaček [Mon, 2 Jun 2025 09:59:23 +0000 (11:59 +0200)]
Fix link to TXT RRtype specification
The odd-looking "\ " escape is required to italicize <character-string>
without italicizing the final "s". See reStructuredText Markup
Specification, sections "Inline markup recognition rules" and "Escaping
Mechanism". Most importantly:
Escaped whitespace characters are removed from the output document
together with the escaping backslash. This allows for character-level
inline markup.
Petr Špaček [Wed, 28 May 2025 13:46:14 +0000 (15:46 +0200)]
Run CI danger job even if user canceled it while it was running
Limitation: The after_script is not executed if the job did not start at
all, i.e. if the user canceled the job before it got onto a runner.
See https://gitlab.com/groups/gitlab-org/-/epics/10158
Nicki Křížek [Mon, 26 May 2025 15:10:15 +0000 (17:10 +0200)]
Add dynamic update facility to NamedInstance
Deduplicate the code for dynamic updates and increase code clarity by
using an actual dns.update.UpdateMessage rather than an undefined
intermediary format passed around as a list of arguments.
Matthijs Mekking [Wed, 19 Mar 2025 09:10:13 +0000 (10:10 +0100)]
Convert csk rollover test cases to pytest
Move the 'csk-roll1' and 'csk-roll2' zones to the rollover test dir and
convert CSK rollover tests to pytest.
The DS swap spans multiple steps. Only the first time we should check
if the "CDS is now published" log is there, and only the first time we
should run 'rndc dnssec -checkds' on the keys. Add a new key to the
step dictionary to disable the DS swap checks.
This made me realize that we need to check for "is not None" in case
the value in the dictionary is False. Update check_rollover_step()
accordingly, and also add a log message indicating which step/zone we
are currently checking.
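The pitfall behind the "is not None" change can be illustrated as follows. The key name `check-ds-swap` and both helper functions are hypothetical; they only demonstrate why a truthiness test loses an explicit False.

```python
DEFAULT = True  # hypothetical default: perform the DS swap checks


def option_truthy(step: dict, key: str) -> bool:
    # Buggy: an explicit False is falsy, so it is treated as "not set"
    # and silently replaced by the default.
    return step[key] if step.get(key) else DEFAULT


def option_explicit(step: dict, key: str) -> bool:
    # Correct: only a missing key (None) falls back to the default.
    value = step.get(key)
    return value if value is not None else DEFAULT


step = {"check-ds-swap": False}
print(option_truthy(step, "check-ds-swap"))
print(option_explicit(step, "check-ds-swap"))
```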
Matthijs Mekking [Tue, 18 Mar 2025 13:20:54 +0000 (14:20 +0100)]
Convert ksk rollover test case to pytest
Move the 'ksk-doubleksk' zones to the rollover test dir and convert KSK
rollover test to pytest.
Since the 'ksk-doubleksk' policy publishes different CDNSKEY/CDS RRsets,
update the 'check_rollover_step' to check which CDNSKEY/CDS RRsets should
be published and which should be prohibited. Update 'isctest.kasp'
accordingly.
We are changing the ZSK lifetime to unlimited in this test case as it
is of no importance (this actually discovered a bug in setting the
next time the keymgr should run).
Matthijs Mekking [Tue, 18 Mar 2025 11:18:34 +0000 (12:18 +0100)]
Convert zsk rollover test case to pytest
Move the 'zsk-prepub' zones to the rollover test dir and convert ZSK
rollover test to pytest.
We need a way to signal a smooth rollover is going on. Signatures are
being replaced gradually during a ZSK rollover, so the existing
signatures of the predecessor ZSK are still being used. Add a smooth
operator to set the right expectations on what signatures are being
used.
Setting expected key relationships is a bit crude: a list of two
elements where the first element is the index of the predecessor in
the list of expected keys, and the second element is the index of the
successor in that list.
We are changing the KSK lifetime to unlimited in this test case as it
is of no importance.
Matthijs Mekking [Tue, 18 Mar 2025 09:34:53 +0000 (10:34 +0100)]
Convert enable dnssec test case to pytest
Move the 'enable-dnssec' to the rollover test dir and convert to pytest.
This requires new test functionality to check that "CDS is published"
messages are logged (or prohibited).
The setup part is slightly adapted such that it no longer needs to
set the '-P sync' value in most cases (this is then set by 'named'),
and to adjust for the inappropriate safety intervals fix.
Matthijs Mekking [Tue, 18 Mar 2025 07:41:02 +0000 (08:41 +0100)]
Convert kasp multi-signer tests to pytest
Move the multi-signer test scenarios to the rollover directory and
convert tests to pytest.
- If the KeyProperties set the "legacy" to True, don't set expected
key times, nor check them. Also, when a matching key is found, set
key.external to True.
- External keys don't show up in the 'rndc dnssec -status' output so
skip them in the 'check_dnssecstatus' function. External keys never
sign RRsets, so also skip those keys in the '_check_signatures'
function.
- Key properties strings now can set expected key tag ranges, and if
KeyProperties have tag ranges set, they are checked.
Matthijs Mekking [Fri, 28 Feb 2025 14:52:20 +0000 (15:52 +0100)]
Move rollover test cases to separate test dir
In order to keep the kasp system test somewhat approachable, let's
move all rollover scenarios to their own test directory, starting with
the manual rollover test cases.
A new test function is added to 'isctest.kasp', to verify that the
relationship metadata (Predecessor, Successor) is set correctly.
The configuration and setup for the zone 'manual-rollover.kasp' are
copied almost verbatim; the only exception is the key times. Similar
to the kasp test cases, we no longer set "SyncPublish/PublishCDS" in
the setup script. In addition to that, the offset is changed from one
day ago to one week ago, so that the key states match the timing
metadata (one day is too short to move a key from "hidden" to
"omnipresent").
Michał Kępień [Fri, 30 May 2025 19:19:56 +0000 (19:19 +0000)]
[9.20] chg: test: Use isctest.asyncserver in the "chain" test
Replace the custom DNS servers used in the "chain" system test with
new code based on the isctest.asyncserver module.
For ans3, replace the sequence of logical conditions present in Perl
code with zone files and a limited amount of custom logic applied on top
of them where necessary.
For ans4, replace the ctl_channel() and create_response() functions with
a custom control command handler coupled with a dynamically instantiated
response handler, making the code more robust and readable.
Migrate sendcmd() and its uses to the new way of sending control queries
to custom servers used in system tests.
Depends on !10409
Backport of MR !10410
Merge branch 'backport-michal/chain-asyncserver-9.20' into 'bind-9.20'
Michał Kępień [Fri, 30 May 2025 16:23:21 +0000 (18:23 +0200)]
Use isctest.asyncserver in the "chain" test
Replace the custom DNS servers used in the "chain" system test with
new code based on the isctest.asyncserver module.
For ans3, replace the sequence of logical conditions present in Perl
code with zone files and a limited amount of custom logic applied on top
of them where necessary.
For ans4, replace the ctl_channel() and create_response() functions with
a custom control command handler coupled with a dynamically instantiated
response handler, making the code more robust and readable.
Migrate sendcmd() and its uses to the new way of sending control queries
to custom servers used in system tests.
Michał Kępień [Fri, 30 May 2025 16:23:21 +0000 (18:23 +0200)]
Improve readability of sendcmd() calls
To improve readability of sendcmd() calls used for controlling
isctest.asyncserver-based custom DNS servers, pass the command's name
and arguments as separate parameters.
Michał Kępień [Fri, 30 May 2025 16:22:54 +0000 (16:22 +0000)]
[9.20] new: test: Handle alias records in zone files loaded by AsyncDnsServer
dnspython does not treat CNAME records in zone files in any special way;
they are just RRsets belonging to zone nodes. Process CNAMEs when
preparing zone-based responses just like a normal authoritative DNS
server would.
Adding proper DNAME support to AsyncDnsServer would add complexity to
its code for little gain: DNAME use in custom system test servers is
limited to crafting responses that attempt to trigger bugs in named.
This fact will not be obvious to AsyncDnsServer users as it
automatically loads all zone files it finds and handles CNAME records
like a normal authoritative DNS server would.
Therefore, to prevent surprises:
- raise an exception whenever DNAME records are found in any of the
zone files loaded by AsyncDnsServer,
- add a new optional argument to the AsyncDnsServer constructor that
enables suppressing this new behavior, enabling zones with DNAME
records to be loaded anyway.
This enables response handlers to use the DNAME records present in zone
files in arbitrary ways without complicating the "base" code.
Backport of MR !10409
Merge branch 'backport-michal/asyncserver-alias-records-9.20' into 'bind-9.20'
Michał Kępień [Fri, 30 May 2025 16:08:54 +0000 (18:08 +0200)]
Force manual DNAME handling to be acknowledged
Adding proper DNAME support to AsyncDnsServer would add complexity to
its code for little gain: DNAME use in custom system test servers is
limited to crafting responses that attempt to trigger bugs in named.
This fact will not be obvious to AsyncDnsServer users as it
automatically loads all zone files it finds and handles CNAME records
like a normal authoritative DNS server would.
Therefore, to prevent surprises:
- raise an exception whenever DNAME records are found in any of the
zone files loaded by AsyncDnsServer,
- add a new optional argument to the AsyncDnsServer constructor that
enables suppressing this new behavior, enabling zones with DNAME
records to be loaded anyway.
This enables response handlers to use the DNAME records present in zone
files in arbitrary ways without complicating the "base" code.
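The opt-in guard described above could be sketched like this. The data model (a dict of record tuples), the function name `check_zones`, and the `allow_dname` parameter are all assumptions for illustration and do not reflect the real AsyncDnsServer constructor.

```python
def check_zones(zones: dict, allow_dname: bool = False) -> None:
    """Raise ValueError if any zone contains DNAME records, unless the
    caller explicitly acknowledges that it handles them manually."""
    if allow_dname:
        return
    for origin, records in zones.items():
        for owner, rrtype, _rdata in records:
            if rrtype == "DNAME":
                raise ValueError(
                    f"zone {origin} has a DNAME record at {owner}; "
                    "pass allow_dname=True to load it anyway"
                )
```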
Michał Kępień [Fri, 30 May 2025 16:08:54 +0000 (18:08 +0200)]
Drop unused AsyncDnsServer constructor argument
The constructor for the AsyncDnsServer class takes a 'load_zones'
argument that is not used anywhere and is not expected to be useful in
the future: zone files are not required for an AsyncDnsServer instance
to start and, if necessary, zone-based answers can be suppressed or
modified by installing a custom response handler.
Michał Kępień [Fri, 30 May 2025 16:08:54 +0000 (18:08 +0200)]
Properly handle CNAMEs when preparing responses
dnspython does not treat CNAME records in zone files in any special way;
they are just RRsets belonging to zone nodes. Process CNAMEs when
preparing zone-based responses just like a normal authoritative DNS
server would.
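What "processing CNAMEs like an authoritative server" means can be sketched with a toy zone. The nested-dict layout below is an assumption chosen to keep the example dependency-free; it is not dnspython's data model, and `answer` is a hypothetical helper.

```python
# Toy zone: owner name -> {rrtype: [rdata, ...]}
ZONE = {
    "www.example.": {"CNAME": ["host.example."]},
    "host.example.": {"A": ["192.0.2.1"]},
}


def answer(zone, qname, qtype, max_chain=8):
    """Return (owner, rrtype, rdata) tuples for the answer section."""
    records = []
    name = qname
    for _ in range(max_chain):  # bound the chain to avoid CNAME loops
        node = zone.get(name, {})
        if qtype in node:
            # Exact match (this also covers queries for CNAME itself).
            records += [(name, qtype, rdata) for rdata in node[qtype]]
            break
        if "CNAME" not in node:
            break
        # Include the alias and keep resolving at its target.
        target = node["CNAME"][0]
        records.append((name, "CNAME", target))
        name = target
    return records


print(answer(ZONE, "www.example.", "A"))
```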
The `pytest` cases check if a zone is signed by looking at the `NSEC` record at the apex. If that has an RRSIG record, the zone is considered signed. But `named` signs zones incrementally (in batches), so the zone may still lack some signatures. In other words, the tests may consider a zone signed while in fact signing is not yet complete, and then perform additional checks, such as whether a subdomain is signed with the right key. If such a check happens before the zone is actually fully signed, the check will fail.
Fix this by using `check_dnssec_verify` instead of `check_is_zone_signed`. We were already doing this check, but we now move it up. This will transfer the zone and then run `dnssec-verify` on the response. If the zone is partially signed, the check will fail, and it will be retried up to ten times.
Closes #5303
Backport of MR !10445
Merge branch 'backport-5303-kasp-pytest-intermittent-test-failures-9.20' into 'bind-9.20'
The pytest cases check if a zone is signed by looking at the NSEC
record at the apex. If that has an RRSIG record, the zone is considered
signed. But 'named' signs zones incrementally (in batches), so
the zone may still lack some signatures. In other words, the tests
may consider a zone signed while in fact signing is not yet complete,
and then perform additional checks, such as whether a subdomain is
signed with the right key. If such a check happens before the zone is
actually fully signed, the check will fail.
Fix this by using 'check_dnssec_verify' instead of
'check_is_zone_signed'. We were already doing this check, but we now
move it up. This will transfer the zone and then run 'dnssec-verify'
on the response. If the zone is partially signed, the check will fail,
and it will be retried up to ten times.
Nicki Křížek [Thu, 29 May 2025 11:18:23 +0000 (11:18 +0000)]
[9.20] chg: test: Add utility module to import correct version of hypothesis
On FIPS-enabled platforms, we need to ensure a minimal version of
hypothesis which no longer uses MD5. This doesn't need to be enforced
for other platforms.
Move the import magic to a utility module to avoid copy-pasting the
boilerplate code around.
Backport of MR !10442
Merge branch 'backport-nicki/pytest-import-hypothesis-9.20' into 'bind-9.20'
Nicki Křížek [Mon, 5 May 2025 16:00:07 +0000 (18:00 +0200)]
Ensure supported version of hypothesis is available
On FIPS-enabled platforms, we need to ensure a minimal version of
hypothesis which no longer uses MD5. This doesn't need to be enforced
for other platforms.
Move the import magic to a utility module to avoid copy-pasting the
boilerplate code around.
Mark Andrews [Thu, 29 May 2025 08:01:23 +0000 (08:01 +0000)]
[9.20] fix: nil: silence tainted scalar in client.c
Coverity detected that 'optlen' was not being checked in 'process_opt'.
This is actually already done when the OPT record was initially
parsed. Add an INSIST to silence Coverity as is done in message.c.
Closes #5330
Backport of MR !10500
Merge branch 'backport-5330-tainted-scalar-in-client-c-9.20' into 'bind-9.20'
Mark Andrews [Wed, 28 May 2025 23:42:08 +0000 (09:42 +1000)]
Silence tainted scalar in client.c
Coverity detected that 'optlen' was not being checked in 'process_opt'.
This is actually already done when the OPT record was initially
parsed. Add an INSIST to silence Coverity as is done in message.c.
The memory contexts for managers and dlz_dlopen_driver units had no name
and that was causing trouble with the statistics channel output. Set
the name for the two memory contexts that were missing a proper name.
Ondřej Surý [Wed, 28 May 2025 19:04:49 +0000 (19:04 +0000)]
[9.20] fix: usr: Fix zone deletion issue
A secondary zone could initiate a new zone transfer from the
primary server after it had been already deleted from the
secondary server, and before the internal garbage collection
was activated to clean it up completely. This has been fixed.
Closes #5291
Backport of MR !10449
Merge branch 'backport-5291-zone-delete-bug-9.20' into 'bind-9.20'
Aram Sargsyan [Mon, 12 May 2025 13:58:38 +0000 (13:58 +0000)]
Prepare a zone for shutting down when deleting it from a view
After b171cacf4f0123ba96bef6eedfc92dfb608db6b7, a zone object can
remain in the memory for a while, until garbage collection is run.
Setting the DNS_ZONEFLG_EXITING flag should prevent the zone
maintenance function from running while it's in that state.
Otherwise, a secondary zone could initiate a zone transfer after
it had been deleted.
Ondřej Surý [Wed, 28 May 2025 17:53:22 +0000 (17:53 +0000)]
[9.20] fix: usr: Fix a zone refresh bug
A secondary zone could fail to further refresh with new
versions of the zone from a primary server if named was
reconfigured during the SOA request step of an ongoing
zone transfer. This has been fixed.
Closes #5307
Backport of MR !10468
Merge branch 'backport-5307-zone-refresh-stuck-after-reconfiguration-fix-9.20' into 'bind-9.20'