git.ipfire.org Git - thirdparty/bind9.git/log

Define DEVELOPER_MODE in developer-mode builds

So that build-time consumers (e.g. feature-test) can detect developer
mode through a single dedicated symbol rather than proxying through
implementation-detail defines like ISC_MEM_TRACKLINES.

Assisted-by: Claude:claude-opus-4-7

fix: usr: Fix nxdomain-redirect combined with dns64

When a resolver was configured with both `nxdomain-redirect` and `dns64`
in the same view, an AAAA query for a nonexistent name could abort
`named`. The combination failed whenever the redirect zone held A
records but no AAAA records. The server now serves the empty AAAA
response from the redirect zone as-is, instead of attempting DNS64
synthesis on top of it.

Closes #5789

Merge branch '5789-fix-nxdomain-redirect-dns64-assert' into 'main'

See merge request isc-projects/bind9!12059

Skip DNS64 synthesis when answering a redirected response

redirect2() swaps qctx->db to the redirect zone before
query_nodata() runs. The DNS64 fallback there issues an A lookup
for the original query name, which is out of zone for the
redirect db, and the resulting query_notfound() trips
INSIST(!is_zone). The cached NCACHENXRRSET variant trips a
REQUIRE in dns_rdataset_first() on a disassociated rdataset.
The synth-from-dnssec entry reaches the same fallback via
query_coveringnsec(). Guarding the fallback with
!qctx->redirected leaves the nxdomain-redirect NXRRSET answer to
be served as-is.

System test for nxdomain-redirect combined with dns64

An AAAA query for a non-existent name into a view that combines
nxdomain-redirect with dns64 used to abort named via the DNS64
fallback in query_nodata(). The new module exercises all three
documented entry paths into query_redirect(): the authoritative
NXDOMAIN path (ns7, tripping INSIST(!is_zone) in
query_notfound()), the recursive NCACHENXRRSET path (ns8,
tripping REQUIRE in dns_rdataset_first() on a disassociated
rdataset), and the synth-from-dnssec path (ns10 validating
against ns9's signed root, with a primer A query so the second
AAAA reaches query_redirect() via query_coveringnsec()). ns9
serves as a neutral upstream so the cached and synthesized
negatives land real NXRRSETs.

Assisted-by: Claude:claude-opus-4-7

rem: dev: Remove useless PR-Agent jobs

The experiment was a failure: PR-Agent doesn't send full context to the
AI agents, and the results are abysmal because of that.

Merge branch 'ondrej/remove-useless-pr-agent' into 'main'

See merge request isc-projects/bind9!12119

Remove useless PR-Agent jobs

The experiment was a failure: PR-Agent doesn't send full context to the
AI agents, and the results are abysmal because of that.

chg: test: Improve pytest jinja2 templates

- Enable rendering ns-specific data in jinja2 templates using the `ns` variable.
- Add common zone/config snippets as `_common` templates.
- Allow jinja2 imports from `_common`.
- Improve the `_common/controls.conf.j2` snippet to render the ns-specific IP rather than a hardcoded one.

Merge branch 'nicki/pytest-template-improvements' into 'main'

See merge request isc-projects/bind9!11805

Restrict cross-test jinja2 includes to _common/

The previous loader was a FileSystemLoader rooted at $srcdir, which
allowed any system test to include any other test's templates -- a
wider scope than intended. Every existing cross-test include already
targets _common/, so make that the only path.

ChoiceLoader + PrefixLoader keeps the existing '_common/foo.j2' path
convention working without changes to call sites. The '_common/'
prefix is deliberately kept rather than dropping it by rooting the
FileSystemLoader at _common/ directly:

  - It signals at the include site that the file is a shared
    template, not a sibling of the current test; readers don't need
    to know the loader configuration to understand where the file
    lives.
  - It prevents shadowing: a test-local 'controls.conf.j2' would
    not collide with the shared one, and the unqualified name keeps
    its test-local meaning.
  - It makes the dependency greppable: 'grep -rl _common/'
    identifies every test that consumes shared snippets.

Assisted-by: Claude:claude-opus-4-7

Create common templates for test zones

Add commonly used zone-related data (config snippet and zone file
snippets) as templates which can be reused by filling in different data.

Adjust the isctest.template.Zone to use filepath argument rather than
filename for clarity.

Include controls.conf as jinja2 template

Rather than using named.conf include, render the controls directly into
the config using jinja2 template include.

Add _common dir to jinja2 template loader

This allows include of template snippets from _common/ directory.

Reduce whitespace in jinja2 templates

Omit extra newlines when combining and including templates.

Adjust the xfer/ns8/small.db.j2 so it doesn't trim the endline twice
(as that would join the two subsequent records on the same line).

Allow instantiating template dataclasses in jinja2 templates

In some cases, the template data might need to be set directly in the
jinja2 templates using `{% set %}`. Expose the template dataclasses to
the templates so we can use these existing classes, rather than creating
ad-hoc data containers.

Add directory-specific nameserver data to templates

If a template is being rendered into a directory that represents a
nameserver (e.g. "ns1"), include nameserver-specific information in
the data: a variable called "ns" which holds information about the
nameserver this file belongs to.

Ensure the "ns" variable is only exposed to the template when rendered,
without affecting the environment variables (always work with a copy of
the env_vars).
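
The copy-then-extend approach described above can be sketched as follows. This is a minimal illustration, not the isctest implementation: the function name `render_data` and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def render_data(env_vars: Dict[str, Any], ns: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical helper: build per-render template data from env_vars."""
    # Work on a shallow copy so the caller's mapping is never modified.
    data = dict(env_vars)
    if ns is not None:
        # "ns" exists only in the data handed to this one render.
        data["ns"] = ns
    return data


env = {"PORT": "5300"}
data = render_data(env, ns={"name": "ns1"})
print("ns" in data, "ns" in env)
```

The shared mapping stays free of "ns", so a template rendered outside a nameserver directory never sees a stale value.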

Improve isctest.template dataclasses' defaults

Extend the Nameserver to generate the default IPv4/IPv6 values, add NSX
values for the predefined nameservers (there are 11 of them, as per
bin/tests/system/ifconfig.sh.in max value). Add the missing ns11
fixture.

Extend the Zone to derive the zone filename by default, unless
specified.

Adjust the existing uses of these classes to utilize the simplified
defaults.

fix: usr: Fix crash on badly configured secondary signer

A badly configured secondary signer that was missing the 'file' entry caused the server to crash, rather than to reject the configuration. This has been fixed.

Closes #5993

Merge branch '5993-fix-bump-in-the-wire-crash' into 'main'

See merge request isc-projects/bind9!12045

Fix startup crash on bump in the wire signer

A secondary server that is configured as a bump in the wire signer
with inline-signing implicitly enabled via dnssec-policy requires
a 'file' entry.

Check conf dnssec-policy inline-signing secondary

Add a variant of checking configuration where inline-signing is
enabled on the secondary, requiring the 'file' entry. This time,
inline-signing is implicitly enabled via dnssec-policy.

fix: doc: Ignore gitlab.gnome.org links in Sphinx linkcheck

Merge branch 'mnowak/linkcheck-fix' into 'main'

See merge request isc-projects/bind9!12109

Clean up OpenSSL/BoringSSL/LibreSSL reference URLs in changelog

Drop the #Lxxx-Lyyy fragments (replaced with prose line numbers) and
unwrap the line-broken URLs so Sphinx linkcheck can validate them.

Assisted-by: Claude:claude-opus-4-7

Ignore gitlab.gnome.org links in Sphinx linkcheck

GNOME GitLab returns HTTP 406 to Sphinx's linkcheck requests, the
same behavior already worked around for gitlab.isc.org.

Assisted-by: Claude:claude-opus-4-7

chg: ci: Add rule for stable tags in CI and use it in the update-stable-tag job

Add a rule to match open source stable tags in CI and apply it to the
update-stable-tag job.

Merge branch 'andoni/show-update-stable-tag-job-in-stable-versions' into 'main'

See merge request isc-projects/bind9!11646

Add rule for the stable tags in CI and use for job update-stable-tag

The update-stable-tag job should only be run for the stable tag, which
is used by Read the Docs to build the docs for the "stable" version.

A new rule called rule_tag_open_source_stable is introduced, in order to
prevent the job from appearing in the pipeline for non-stable versions.
Having this rule in YAML is necessary, because if it were in the script
itself, the job would show up in the pipeline.

Besides, the new rule allows other jobs to be run only for the stable
tag in the future, without modifying their internal logic.

The CI variable STABLE_VERSION contains a regular expression in the
GitLab CI sense[1]: it uses the RE2 syntax[2] and must be enclosed by
slashes (i.e. /.../). It must be updated every time the minor version
is changed: releasing v9.22 will require changing STABLE_VERSION from
"/v9.20/" to "/v9.22/".

The variable is imported from the common GitLab CI YAML in the project
isc-projects/bind9-qa, so as to maintain it in a central place.
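
To illustrate the slash-delimited format, here is a minimal Python sketch of how such a value could be matched against a tag name. Python's `re` module stands in for RE2 (the two agree for a pattern this simple). The helper name `matches_stable`, the sample tags, and the assumption that the comparison is an unanchored search are all illustrative, not taken from the CI configuration.

```python
import re


def matches_stable(tag: str, stable_version: str) -> bool:
    """Hypothetical helper: match a tag against a /.../-delimited pattern."""
    if len(stable_version) < 2 or not (
        stable_version.startswith("/") and stable_version.endswith("/")
    ):
        raise ValueError("pattern must be enclosed in slashes")
    pattern = stable_version[1:-1]
    # Assumption: the comparison behaves like an unanchored search.
    return re.search(pattern, tag) is not None


print(matches_stable("v9.20.5", "/v9.20/"))
print(matches_stable("v9.22.0", "/v9.20/"))
```

Note that an unescaped "." matches any character, so in this sketch "/v9.20/" would also accept a string such as "v9x20".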

[1]: https://docs.gitlab.com/ci/jobs/job_rules/#compare-a-variable-to-a-regular-expression
[2]: https://github.com/google/re2/wiki/Syntax

Include common GitLab CI YAML from isc-projects/bind9-qa

The template file .gitlab-ci-common.yml is to be used across ISC
projects, while it is maintained in the isc-projects/bind9-qa project.

chg: test: Move requirement checks to `pytest_configure` hook

This leads to nicer logging if requirements aren't met.

Merge branch 'stepan/dont-run-system-tests-without-requirements' into 'main'

See merge request isc-projects/bind9!11551

Move pytest requirements check to pytest_configure hook

Logging from a pytest hook looks better.

Reorder the check for presence of `featuretest` before `init_vars` to
produce more sensible errors.

chg: ci: Run unit tests with PKCS#11-aware OpenSSL

Closes isc-projects/bind9#4958

Closes isc-projects/bind9#4957

Merge branch 'mnowak/pkcs11-aware-unit-gcc-ossl3-amd64' into 'main'

See merge request isc-projects/bind9!9543

Call tzset() after setenv("TZ", ...) in unit tests

POSIX does not require localtime_r() to behave as if tzset() was called,
so the TZ environment change isn't picked up if some library has already
primed libc's tz cache. Loading pkcs11-provider during OpenSSL init
does exactly that, causing the time and dnstap cmocka tests to format
timestamps in UTC instead of the requested zone.
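
The same pattern can be shown in Python, whose `time.tzset()` wraps the C function on Unix. This is only an analogy to the C unit tests: the helper name `format_local` is hypothetical, and the zones are plain POSIX TZ strings chosen so that no zoneinfo files are needed.

```python
import os
import time


def format_local(epoch: int, tz: str) -> str:
    """Format epoch seconds in the given zone (Unix only: uses time.tzset())."""
    os.environ["TZ"] = tz
    # Without this call, an already-initialized libc may keep the old zone.
    time.tzset()
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch))


print(format_local(0, "UTC0"))
print(format_local(0, "EST5"))
```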

Assisted-by: Claude:claude-opus-4-7

Run unit tests with PKCS#11-aware OpenSSL

fix: test: Handle large query IDs in xfer/ans5 properly

Previously, the server would crash if it received a query with an ID
close to 65535 in the badmessageid case, as adding 50 to it would not
fit in uint16.

This was an oversight in porting it from Perl to Python in
f9ed3650acdc2c5b38d8b36729b045ca63f983ef.

Fixes #6025.

Merge branch 'stepan/fix-xfer-large-qid' into 'main'

See merge request isc-projects/bind9!12097

Handle large query IDs in xfer/ans5 properly

Previously, the server would crash if it received a query with an ID
close to 65535 in the badmessageid case, as adding 50 to it would not
fit in uint16.

This was an oversight in porting it from Perl to Python in
f9ed3650acdc2c5b38d8b36729b045ca63f983ef.
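
The failure mode is easy to reproduce with the standard `struct` module. The sketch below is not the code from xfer/ans5, and the actual fix is not shown in this log. Masking to 16 bits is one possible way to keep the altered ID in range, and the function name `bad_message_id` is hypothetical.

```python
import struct


def bad_message_id(qid: int, offset: int = 50) -> int:
    """Return an ID that differs from qid but still fits in 16 bits."""
    return (qid + offset) & 0xFFFF


# The unmasked sum does not fit in an unsigned 16-bit field.
try:
    struct.pack("!H", 65530 + 50)
except struct.error as exc:
    print("unmasked:", exc)

# The masked value wraps around and packs cleanly.
print(bad_message_id(65530), struct.pack("!H", bad_message_id(65530)))
```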

fix: usr: Enable Edwards curves with PKCS#11

Ed25519 and Ed448 curves did not work in PKCS#11. This has been fixed.

Closes isc-projects/bind9#5762

Merge branch 'mnowak/pkcs11-enable-edwards-curves' into 'main'

See merge request isc-projects/bind9!11591

Generate Ed25519/Ed448 keys via PKCS#11 when a label is set

When a dst_key_t carries a PKCS#11 URI in key->label (as named
does for dnssec-policy zones backed by a key-store "hsm"), key
generation must happen inside the HSM, not in software.
opensslecdsa_generate already branches on key->label and calls
the matching pkcs11 wrapper; the EDDSA generator silently ignored
the label and produced a software key, which named then wrote to
the .private file with both a Label: line and the raw PrivateKey:
bytes -- a corrupt hybrid record that prevented zone signing.

Add the missing wrapper:

  - lib/isc/ossl_wrap/ossl3.c gains generate_pkcs11_eddsa_key()
    and the public isc_ossl_wrap_generate_pkcs11_ed25519_key() /
    isc_ossl_wrap_generate_pkcs11_ed448_key() entry points.  They
    use EVP_PKEY_CTX_new_from_name(NULL, "ED25519" or "ED448",
    "provider=pkcs11") with the pkcs11_uri and pkcs11_key_usage
    parameters, mirroring the existing EC wrapper.
  - lib/isc/ossl_wrap/ossl1_1.c provides stubs returning
    ISC_R_NOTIMPLEMENTED for the new EDDSA wrappers; the
    pkcs11-provider stack requires OpenSSL 3.  The pre-existing
    isc_ossl_wrap_generate_pkcs11_rsa_key() stub used to silently
    delegate to software keygen -- that hid the same "HSM label
    on a software key" hazard for RSA on OpenSSL 1.1 builds, so
    align it with the EDDSA stubs and return ISC_R_NOTIMPLEMENTED
    too.
  - lib/isc/include/isc/ossl_wrap.h declares the new wrappers.
  - lib/dns/openssleddsa_link.c routes openssleddsa_generate()
    through the new wrappers when key->label is non-NULL, leaving
    the existing EVP_PKEY_keygen() path untouched for software
    keys.  The Ed448 case is guarded by HAVE_OPENSSL_ED448 to
    match the surrounding code.

Assisted-by: Claude:claude-opus-4-7

Tolerate non-extractable Ed25519/Ed448 private keys in tofile

openssleddsa_tofile() called EVP_PKEY_get_raw_private_key()
unconditionally whenever the dst_key_t had a private EVP_PKEY
attached and aborted with ISC_R_FAILURE on any error. That is
wrong for keys whose private material lives in a hardware token
(PKCS#11): the provider deliberately refuses to export the raw
bytes, but the keypair is still valid and the .private file
should be written containing only the PKCS#11 label, with no raw
key material. Without this, "dnssec-keyfromlabel -a ed25519 -l
pkcs11:..." fails with "failed to write key ...: failure" even
though pkcs11-tool has generated a valid Ed25519 key in SoftHSM.

Mirror the behaviour already implemented in opensslecdsa_tofile():
if the raw private key cannot be retrieved AND the key has a
PKCS#11 label to fall back on, clear the OpenSSL error queue and
fall through to writing just the Label element.

If extraction fails and there is no label to fall back on, return
the OpenSSL failure rather than silently producing a .private
file with neither raw key material nor a label, which would be
unusable on the next load.

Consolidate buffer cleanup into a single cleanup: path, freeing
with the original allocation size (alginfo->key_size) rather than
the potentially-modified len output parameter.

Assisted-by: Claude:claude-opus-4-7

Enable Edwards curves with PKCS#11

Ed25519 and Ed448 support (PKCS#11 v3.2) was added to libp11-0.4.17.

fix: ci: Misc .gitlab-ci.yml fixes

The meson-format job had two `rules:` blocks. YAML silently overwrites
the first with the second, so the job only ran on MRs with meson.build
changes. The intended rules (tags, schedules, other pipeline sources)
were silently discarded. Remove the duplicate to restore the original
intent.

Assisted-by: Claude:claude-opus-4-7

Merge branch 'mnowak/fix-gitlab-ci' into 'main'

See merge request isc-projects/bind9!12093

Add missing autorebase rule to meson-format CI job

The clang-format and coccinelle jobs both include
*rule_branch_after_autorebase in their rules, but meson-format
did not. This meant meson-format would not run after autorebase
operations push new commits.

Assisted-by: Claude:claude-opus-4-7

Remove duplicate <<: *build_job merge in clang:tsan CI job

The clang:tsan job merged *build_job twice (lines 1723 and 1729).
The second merge is redundant, a copy-paste artifact.

Assisted-by: Claude:claude-opus-4-7

Add missing TSAN_SYMBOLIZER_PATH in system:gcc:tsan CI job

The system:gcc:tsan job had an empty variables: block, unlike its
three sibling TSAN jobs (unit:gcc:tsan, system:clang:tsan,
unit:clang:tsan) which all set TSAN_SYMBOLIZER_PATH.

Assisted-by: Claude:claude-opus-4-7

Fix duplicate rules: key in meson-format CI job

The meson-format job had two `rules:` blocks. YAML silently overwrites
the first with the second, so the job only ran on MRs with meson.build
changes. The intended rules (tags, schedules, other pipeline sources)
were silently discarded. Remove the duplicate to restore the original
intent.
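
The last-key-wins behavior can be reproduced with Python's standard `json` module, used here only as an analogy for a YAML loader (the parser used by the CI is not shown in this log). The rule names in the sample document are placeholders, and `reject_duplicates` is a hypothetical helper showing how a loader could flag the problem instead of hiding it.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"rules": ["tags", "schedules"], "rules": ["merge_requests"]}'

# Default behavior: the second "rules" silently replaces the first.
silent = json.loads(doc)
print(silent)

# Strict behavior: the duplicate is reported.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)
```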

Assisted-by: Claude:claude-opus-4-7

fix: nil: Fix mypy var-annotated error on FEATURE_VARS

Mypy reports 'Need type annotation for "FEATURE_VARS"'; init_features()
populates it with str->str entries.

Assisted-by: Claude:claude-opus-4-7

Merge branch 'nicki/fix-isctest-vars-mypy-annotation' into 'main'

See merge request isc-projects/bind9!12086

Fix mypy var-annotated error on FEATURE_VARS

Mypy reports 'Need type annotation for "FEATURE_VARS"'; init_features()
populates it with str->str entries.
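
An annotation of the following shape would satisfy mypy for a module-level mapping that starts empty. The exact annotation used in the commit is not shown here, so treat this as a sketch. The key and value inserted by this stand-in `init_features()` are placeholders.

```python
from typing import Dict

# Without the annotation, mypy cannot infer the value type of an empty dict.
FEATURE_VARS: Dict[str, str] = {}


def init_features() -> None:
    """Populate the mapping; the key and value here are placeholders."""
    FEATURE_VARS["FEATURE_EXAMPLE"] = "1"


init_features()
print(FEATURE_VARS)
```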

Assisted-by: Claude:claude-opus-4-7

new: ci: Add Debian "trixie" (386)

Merge branch 'mnowak/add-debian-trixie-386' into 'main'

See merge request isc-projects/bind9!12079

Tolerate dnspython post-2038 timestamp overflow on 32-bit

dnspython's RRSIG.to_text() converts the signature inception/expiration
fields by calling time.gmtime(), which on 32-bit platforms raises
OverflowError for values past 2038-01-19 (INT32_MAX). Several DNSSEC
test fixtures use far-future expirations: the precomputed RRSIGs in
the dnssec test's rsasha1.example.db.in zone expire in 2093, ans4 of
the chain test hardcodes 2090, and ans10 of the dnssec test uses
2**32-1 (year 2106). Whenever a response carrying such an RRSIG is
formatted with str()/to_text() the overflow propagates out and either
fails the test (when triggered in isctest.query's debug logging) or
kills the asyncserver-based ans* server (when triggered in its
response logger), which in turn cascades into "Failed to stop
servers" teardown errors and SERVFAIL responses for subsequent tests.

Wrap the to_text() calls in isctest/query.py and the str(response)
call in asyncserver's _log_response() with try/except OverflowError,
falling back to a placeholder message. The conversions are only used
for debug logging, so losing the human-readable form there does not
affect what the tests actually validate.
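
The guard described above can be sketched without dnspython. `FarFutureRecord` is a hypothetical stand-in for an RRSIG whose text conversion overflows on a 32-bit platform, and the helper name and placeholder wording are illustrative rather than taken from isctest.

```python
def safe_text(obj, placeholder: str = "<unprintable: timestamp overflow>") -> str:
    """Return str(obj), or a placeholder if formatting overflows."""
    try:
        return str(obj)
    except OverflowError:
        return placeholder


class FarFutureRecord:
    """Hypothetical stand-in for a record that cannot be formatted on 32-bit."""

    def __str__(self) -> str:
        raise OverflowError("timestamp out of range for platform time_t")


print(safe_text("example. 300 IN A 10.53.0.1"))
print(safe_text(FarFutureRecord()))
```

Only OverflowError is caught, so any other formatting failure would still surface in the test output.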

Assisted-by: Claude:claude-opus-4-7

Fix stack corruption in copy_initfile() on 32-bit

copy_initfile() declared a size_t local variable to receive the size of
the initial file and passed it to isc_file_getsizefd() with an explicit
(off_t *) cast. On 32-bit platforms with _FILE_OFFSET_BITS=64, off_t is
8 bytes while size_t is only 4 bytes, so isc_file_getsizefd()'s
"*size = stats.st_size;" writes 8 bytes into the 4-byte slot and
clobbers the adjacent "output" FILE * on the stack. The next iteration
of the read/write loop then calls clearerr() through a NULL pointer and
named crashes with SIGSEGV.

This is triggered whenever a zone with an initial-file (e.g. one
configured via a template) is loaded for the first time, so on 32-bit
the addzone and masterfile system tests crash named in ns2 with cores.

Declare "len" as off_t to match the API and drop the unsafe cast.
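
The overwrite can be modeled with a byte buffer. The layout below is an assumption made for illustration (a 4-byte "len" slot immediately followed by a 4-byte "output" pointer, little-endian). The real stack layout depends on the compiler and target.

```python
import struct

# Assumed layout (hypothetical): 4-byte "len", then 4-byte "output" pointer.
stack = bytearray(struct.pack("<II", 0, 0xDEADBEEF))

# The callee believes it was given an off_t * and stores 8 bytes at offset 0.
struct.pack_into("<q", stack, 0, 1234)

length, output = struct.unpack("<II", stack)
print(length, hex(output))
```

In this model the low half of the 64-bit store lands in "len" and the zero high half replaces the pointer, which is consistent with the NULL dereference described above.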

Assisted-by: Claude:claude-opus-4-7

Add Debian "trixie" (386)

fix: dev: fix possible NULL dereference in `cfg_map_findclause()`

`cfg_map_findclause()` did not check whether a clause existed before dereferencing it, which could lead to a NULL dereference. Add the missing check to prevent this.

In practice, this was not triggering any known bug, since `cfg_map_findclause()` is only called in contexts where the clause is known to exist.

Closes #5997

Merge branch '5997-findclause' into 'main'

See merge request isc-projects/bind9!12052

rename `cfg_map_addclone()` to `cfg_map_add()`

Since there is no `cfg_map_add()` anymore, and the name
`cfg_map_addclone()` wasn't quite clear, rename `cfg_map_addclone()` to
`cfg_map_add()`, as this is fundamentally what this function is doing.

add test for `cfg_map_findclause()`

Add a test for `cfg_map_findclause()` ensuring its correct behaviour
(returning NULL) if a clause does not exist.

fix possible NULL dereference in `cfg_map_findclause()`

`cfg_map_findclause()` did not check whether a clause existed before
dereferencing it, which could lead to a NULL dereference. Add the
missing check to prevent this.

In practice, this was not triggering any known bug, since
`cfg_map_findclause()` is only called in contexts where the clause is
known to exist.

remove unused `cfg_map_add()` function

The `cfg_map_add()` function was unused; remove it.

fix: dev: Move last_purge declaration under the same #ifdef as its user

The static atomic last_purge is only read and written from mem_purge(),
which is compiled only when JEMALLOC_API_SUPPORTED or __GLIBC__ is
defined. This used to fail on OpenBSD:

    ../lib/isc/mem.c:405:31: error: unused variable 'last_purge' [-Werror,-Wunused-variable]
      405 | static _Atomic(isc_stdtime_t) last_purge = 0;
          |                               ^~~~~~~~~~

Merge branch 'mnowak/move-last_purge-to-ifdef' into 'main'

See merge request isc-projects/bind9!12058

Move last_purge declaration under the same #ifdef as its user

The static atomic last_purge is only read and written from mem_purge(),
which is compiled only when JEMALLOC_API_SUPPORTED or __GLIBC__ is
defined. This used to fail on OpenBSD:

    ../lib/isc/mem.c:405:31: error: unused variable 'last_purge' [-Werror,-Wunused-variable]
      405 | static _Atomic(isc_stdtime_t) last_purge = 0;
          |                               ^~~~~~~~~~

fix: usr: Clear REDIRECT flag when it isn't needed

When `nxdomain-redirect` is in use, and a recursive query is used to get the redirected answer, a flag is set to distinguish it from a normal recursive response. Previously, that flag was left set afterward, which could trigger an assertion if a normal recursive query was sent later on behalf of the same client: for example, because the `filter-aaaa` plugin was in use. This has been fixed.

Closes #5936

Merge branch '5936-clear-redirect-flag' into 'main'

See merge request isc-projects/bind9!12073

Clear REDIRECT flag when it isn't needed

The NS_QUERYATTR_REDIRECT flag is set when processing a recursive
NXDOMAIN redirection lookup, so that if that lookup also returns
NXDOMAIN we don't end up looping.

Previously, the flag was left active after use, but if the
same client triggered a subsequent recursive lookup (for example,
in the filter-aaaa plugin), then the wrong branch could be reached
in query_resume(), potentially leading to an assertion failure. This
has been fixed.

fix: dev: Validate nsec3hash arguments instead of relying on atoi()

The nsec3hash tool parsed its algorithm, flags, and iterations
arguments with atoi(), then range-checked the result. For values
that overflow int during digit-by-digit accumulation, atoi() is
undefined; in practice on musl libc the modular wrap leaves
n == 0, which silently passes the "iterations > 0xffffU" check.
On Alpine Linux this made nsec3hash succeed with iterations
treated as 0 for inputs like 4294967296 (2^32).

The latent bug only surfaced when the recent image rebuild pulled
in Hypothesis 6.152.9 (2026-05-19), which unified the distribution
used for bounded and unbounded integers() strategies. The new
smoother distribution explores the 2^32 boundary on unbounded
ranges like integers(min_value=65536); earlier versions did not
reach there, so test_nsec3hash_too_many_iterations only started
failing on Alpine after the image refresh.

Replace the three atoi() calls with isc_parse_uint8 /
isc_parse_uint16, which uniformly reject overflow, trailing
garbage, leading sign, and non-numeric input across libc
implementations. As a side effect, error messages now include
the offending argument and a specific reason ("out of range" vs
"not a valid number").

Assisted-by: Claude:claude-opus-4-7
Closes #6013

Merge branch '6013-nsec3hash-iterations-overflow' into 'main'

See merge request isc-projects/bind9!12062

Validate nsec3hash arguments instead of relying on atoi()

The nsec3hash tool parsed its algorithm, flags, and iterations
arguments with atoi(), then range-checked the result. For values
that overflow int during digit-by-digit accumulation, atoi() is
undefined; in practice on musl libc the modular wrap leaves
n == 0, which silently passes the "iterations > 0xffffU" check.
On Alpine Linux this made nsec3hash succeed with iterations
treated as 0 for inputs like 4294967296 (2^32).
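
The wraparound can be reproduced in a few lines. `wrapping_atoi` emulates one possible outcome of the undefined behavior (32-bit modular accumulation), and `parse_uint16` is a hypothetical strict parser in the spirit of a range-checked replacement. Neither is the libc or libisc code.

```python
def wrapping_atoi(text: str) -> int:
    """Emulate a 32-bit accumulate-with-wraparound atoi (one outcome of UB)."""
    n = 0
    for ch in text:
        if not "0" <= ch <= "9":
            break
        n = (n * 10 + int(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed int.
    return n - 2**32 if n >= 2**31 else n


def parse_uint16(text: str) -> int:
    """Hypothetical strict parser: digits only, value must fit in 16 bits."""
    if not text or any(not "0" <= ch <= "9" for ch in text):
        raise ValueError("not a valid number")
    value = int(text)
    if value > 0xFFFF:
        raise ValueError("out of range")
    return value


iterations = wrapping_atoi("4294967296")
print(iterations, iterations > 0xFFFF)
```

With the emulated wrap, 2^32 becomes 0 and slips past the upper-bound check, whereas `parse_uint16("4294967296")` raises ValueError.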

The latent bug only surfaced when the recent image rebuild pulled
in Hypothesis 6.152.9 (2026-05-19), which unified the distribution
used for bounded and unbounded integers() strategies. The new
smoother distribution explores the 2^32 boundary on unbounded
ranges like integers(min_value=65536); earlier versions did not
reach there, so test_nsec3hash_too_many_iterations only started
failing on Alpine after the image refresh.

Replace the three atoi() calls with isc_parse_uint8 /
isc_parse_uint16, which uniformly reject overflow, trailing
garbage, leading sign, and non-numeric input across libc
implementations. As a side effect, error messages now include
the offending argument and a specific reason ("out of range" vs
"not a valid number").

Assisted-by: Claude:claude-opus-4-7

chg: test: Clean up custom server code in the "resend_loop" system test

Apply assorted cleanups to `bin/tests/system/resend_loop/ans3/ans.py`.

Merge branch 'michal/resend_loop-test-ans3-cleanup' into 'main'

See merge request isc-projects/bind9!12063

Follow common naming and coding conventions

Make the handlers defined in bin/tests/system/resend_loop/ans3/ans.py
follow canonical naming conventions used in other system tests. Keep
all server initialization code in the main() function.

Turn _get_cookie() into a method

Since the _get_cookie() function is only used by the CookieHandler
class, make the former a method of the latter to keep related logic
close in the source code.

Tweak the _get_cookie() method

The "len(cookie.server) == 0" condition is superfluous for the
"resend_loop" system test, so remove it. Add a return type annotation
to the _get_cookie() function.

Remove workarounds for dnspython < 2.7.0

dnspython 2.7.0 is now required to run the BIND 9 system test suite.
Drop the workarounds for older dnspython versions as they are now
redundant.

Fix flawed response logic for COOKIE-less queries

The "yield" keyword does not cause a function to return.  By design,
get_responses() may yield multiple DNS responses in a single call.  As
currently implemented, CookieHandler.get_responses() sends two responses
to each client query that does not contain a COOKIE option.  Make the
logic in that method consistent with code comments by only sending one
response to every query - either SERVFAIL or BADCOOKIE, never both.
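
The pitfall is generic to Python generators and can be shown without the test framework. The two functions below are hypothetical reductions of a handler: the strings stand in for DNS responses, and the condition is a placeholder for the real COOKIE check.

```python
def buggy_responses(has_cookie: bool):
    """After the first yield, execution simply continues to the next one."""
    if not has_cookie:
        yield "SERVFAIL"
    yield "BADCOOKIE"


def fixed_responses(has_cookie: bool):
    """An explicit return ends the generator after the first response."""
    if not has_cookie:
        yield "SERVFAIL"
        return
    yield "BADCOOKIE"


print(list(buggy_responses(False)))
print(list(fixed_responses(False)))
```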

Drop redundant uses of authoritative=True

The ans3 custom server instance is created with default_aa=True. Do not
pass the authoritative=True keyword argument to the DnsResponseSend
constructor in CookieHandler.get_responses() as it is redundant.

Drop unnecessary qctx.prepare_new_response() call

The ans3 custom server does not have any zones defined, so the responses
passed to its handlers by core isctest.asyncserver code are guaranteed
to be empty. Remove a call to qctx.prepare_new_response() from
CookieHandler.get_responses() as it is redundant.

Remove NoErrorHandler

The NoErrorHandler class does not get matched to any query sent by ns4
in the "resend_loop" test. Remove it as it is redundant.

Simplify match criteria for CookieHandler

The CookieHandler class handles all traffic for the "example." domain.
Make it a subclass of DomainHandler to simplify its definition.

Make static response handlers more specific

The RootNSHandler and ExampleNSHandler classes are only equipped to
respond to specific QNAME/QTYPE tuples, not all queries for a specific
QNAME. Turn them into subclasses of QnameQtypeHandler and make them
only respond to QTYPE=NS queries to prevent sending NS responses for
non-NS queries.

chg: ci: Various autorebase improvements

  - Rewrite cherry-pick references during autorebases
  - Fix autorebase error reporting
  - Limit post-push pipelines for autorebased branches
  - Only autorebase when there is anything to rebase
  - Conflate missing commit reference notifications
  - Support autorebasing backported security MRs

Merge branch 'michal/autorebase-improvements' into 'main'

See merge request isc-projects/bind9!12024

Support autorebasing backported security MRs

Autorebasing a backported security fix enables convenient refreshing of
cherry-pick references, which makes it trivial for developers to satisfy
Danger rules just before the merge request is merged. Add a manual CI
job that is only created for backported merge requests targeting
security-* branches.

Conflate missing commit reference notifications

Instead of creating a separate (potentially lengthy) Danger notification
for every missing commit reference in a backport, produce a single
notification with a list of all unreferenced commit hashes. This makes
Danger output more concise while retaining all the relevant feedback for
the developer.

Only autorebase when there is anything to rebase

In an optimistic future, security-* branches will become empty, at least
intermittently.  When that happens, there will be nothing left to rebase
on those branches, so when something gets merged into their base
branches, an autorebase will effectively be a fast-forward.  While the
existing autorebase logic would handle such a case perfectly fine, it is
prudent to avoid creating a test pipeline after pushing such a
fast-forward update as the code revision getting pushed will have
already been tested by other pipelines.  However, the push should still
happen as non-empty downstream autorebased branches may exist and those
will still need to be rebased.  Achieve both of these objectives by
checking early whether there is anything to rebase and pushing the
fast-forwarded version of the branch without setting the AUTOREBASE CI
variable if there is not.

Limit post-push pipelines for autorebased branches

Current CI job triggering rules cause a full pipeline to be started
after every push to security-* branches.  In this context, "push" means
"branch update", which covers both "git push" invocations and merging a
merge request.  Meanwhile, running a test pipeline is only desired after
a rebase; if a branch is fast-forwarded, it means that a merge request
has been merged into it and a pipeline should have already been run for
that merge request itself.  Limit resource use by only triggering
pipelines for security-* branches when they are pushed to with a "magic"
CI variable that is only set in autorebase jobs.  Leave all the other
triggering rules (for scheduled/manual pipelines) intact.

Fix autorebase error reporting

The logic used for detecting the commit breaking an autorebase does not
work correctly if the offending commit is not the first one applied
during the "reverse rebase". Fix by using REBASE_HEAD instead of
processing the output of "git status" in a convoluted way.

Furthermore, the approach used for identifying the first offending merge
request in the case of a successful autorebase followed by a failed
build only works correctly if the base branch is not autorebased itself.
Since a solution that would work correctly for a branch autorebased on
top of a branch that only moves forward does not work correctly for a
branch autorebased on top of another autorebased branch and vice versa,
accurately identifying the most likely culprit after a successful
autorebase is a very complicated and brittle task. Since reporting no
details at all is arguably better than reporting false details, only
produce a minimal error notification if the build fails after a
successful autorebase.

Rewrite cherry-pick references during autorebases

Use a custom rebasing script instead of "git rebase" to enable rewriting
cherry-pick references during autorebases.

new: ci: Add Fedora 44

Merge branch 'mnowak/fedora-44' into 'main'

See merge request isc-projects/bind9!12064

Add Fedora 44

fix: test: Make deleg cleanuptests memory assertions 32-bit-safe

Each address entry stored by dns_delegset_addaddr() is an
isc_netaddrlink_t, whose size depends on sizeof(void *) via the
ISC_LINK macro (24 bytes of address + two prev/next pointers): 40
bytes on 64-bit, 32 bytes on 32-bit. The hardcoded 4 MB / 8 MB
ranges only held on 64-bit, so dns_deleg_cleanuptests failed on
armv7l with isc_mem_inuse() returning ~3.2 MB.

Express the expected ranges in terms of sizeof(isc_netaddrlink_t)
so they scale with pointer width, and pull the 99999 entry count
out into a NENTRIES macro.

Close isc-projects/bind9#6012

Merge branch 'mnowak/armv7l-fix-dns_deleg_cleanuptests' into 'main'

See merge request isc-projects/bind9!12061

Make deleg cleanuptests memory assertions 32-bit-safe

Each address entry stored by dns_delegset_addaddr() is an
isc_netaddrlink_t, whose size depends on sizeof(void *) via the
ISC_LINK macro (24 bytes of address + two prev/next pointers): 40
bytes on 64-bit, 32 bytes on 32-bit. The hardcoded 4 MB / 8 MB
ranges only held on 64-bit, so dns_deleg_cleanuptests failed on
armv7l with isc_mem_inuse() returning ~3.2 MB.

Express the expected ranges in terms of sizeof(isc_netaddrlink_t)
so they scale with pointer width, and pull the 99999 entry count
out into a NENTRIES macro.

Assisted-by: Claude:claude-opus-4-7
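
The pointer-width arithmetic described above can be sketched as follows. This is a minimal illustration, not the test's actual code: `addrlink_sketch_t` and `expected_bytes()` are hypothetical names, and the struct only mimics the layout the message describes (24 bytes of address data plus the two link pointers).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for isc_netaddrlink_t: 24 bytes of address
 * data followed by the prev/next pointers that ISC_LINK contributes.
 */
typedef struct addrlink_sketch {
	unsigned char addr[24];
	struct addrlink_sketch *prev;
	struct addrlink_sketch *next;
} addrlink_sketch_t;

#define NENTRIES 99999

/* Bytes occupied by `count` address entries on the current platform. */
static size_t
expected_bytes(size_t count) {
	return count * sizeof(addrlink_sketch_t);
}
```

With 8-byte pointers `expected_bytes(NENTRIES)` is 3,999,960 (about 4 MB); with 4-byte pointers it is 3,199,968 (about 3.2 MB), which matches the figure reported on armv7l.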

Merge tag 'v9.21.22'

fix: nil: Properly handle BN_num_bits() return value

BN_num_bits() returns 0 on NULL input and a negative value on internal
error. The error return value is now properly handled.

Merge branch 'ondrej/fix-BN_num_bits-return-value' into 'main'

See merge request isc-projects/bind9!12057

Properly handle the return value of BN_num_bits()

BN_num_bits() returns 0 when passed NULL and a negative value on
internal error. The OpenSSL wrappers stored the result in a size_t,
so a 0 return falsely satisfied the bit-length check and a negative
return wrapped to a huge value. Capture the int return, reject
non-positive values, then compare against the limit.
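
The check order described above can be sketched without linking against OpenSSL. In this illustration `bits` stands for the int returned by BN_num_bits(); the helper names and the `limit` parameter are hypothetical and do not come from the BIND wrappers.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Corrected shape: keep the value as an int, reject non-positive
 * results, and only then compare against the limit.
 */
static bool
bits_within_limit(int bits, size_t limit) {
	if (bits <= 0) {
		return false;
	}
	return (size_t)bits <= limit;
}

/*
 * Flawed shape, for contrast: converting to size_t first means 0
 * passes the check and a negative value wraps to a huge number.
 */
static bool
bits_within_limit_flawed(int bits, size_t limit) {
	size_t n = (size_t)bits;
	return n <= limit;
}
```

The flawed variant accepts a 0 return as a valid bit length, which is the failure mode the message describes for NULL input.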

fix: usr: Reject RRSIG records covering meta-types

A recursive resolver could accept and cache an RRSIG record whose
Type-Covered field names a meta-type (ANY, AXFR, IXFR, MAILA, MAILB),
even though no real RRset of those types ever exists. Such records
are now rejected by the DNS message parser.

Closes #6002

Merge branch '6002-reject-rrsig-covering-meta-types' into 'main'

See merge request isc-projects/bind9!12048

Reject malformed RRSIG records

A signature cannot cover a meta-type (NONE, ANY, AXFR, IXFR, MAILB,
MAILA, OPT, TSIG, TKEY); previously such records were cached by the
recursive resolver and collided with negative-cache entries on the
same owner name, corrupting the QP-trie cache.

Assisted-by: Claude:claude-opus-4-7
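
A minimal sketch of such a check, assuming the parser has the raw RRSIG RDATA at hand. The function names are hypothetical and this is not the code added to BIND; the numeric type codes are the IANA-assigned values for the types listed above, and Type Covered is the first two octets of RRSIG RDATA in network byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `covered` is one of the meta-types named in the message. */
static bool
covers_metatype(uint16_t covered) {
	switch (covered) {
	case 0:   /* NONE (reserved) */
	case 41:  /* OPT */
	case 249: /* TKEY */
	case 250: /* TSIG */
	case 251: /* IXFR */
	case 252: /* AXFR */
	case 253: /* MAILB */
	case 254: /* MAILA */
	case 255: /* ANY */
		return true;
	default:
		return false;
	}
}

/* Reject RDATA that is too short or whose Type Covered is a meta-type. */
static bool
rrsig_rdata_acceptable(const uint8_t *rdata, size_t len) {
	if (len < 2) {
		return false;
	}
	uint16_t covered = (uint16_t)((rdata[0] << 8) | rdata[1]);
	return !covers_metatype(covered);
}
```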

fix: dev: Don't remove corresponding RRSIG in the same loop

Calling `dns_db_deleterdataset()` from inside the rdataset iterator to remove the corresponding signature is wrong, because it mutates an rdataset other than the one the iterator is currently on. This has been fixed.

Merge branch 'matthijs-fix-evict-cname-other' into 'main'

See merge request isc-projects/bind9!12047

Don't remove corresponding RRSIG in the same loop

Calling dns_db_deleterdataset() from inside the rdataset iterator to
remove the corresponding signature is wrong, because it mutates an
rdataset other than the one the iterator is currently on.
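
One generic way to avoid touching a non-current entry during iteration is to defer the signature removal until the loop has finished. The sketch below illustrates that idea on a plain array. It is not the code from this commit, and `entry_t` and `evict_type()` are hypothetical names; a flat array would not actually break here, so the example only shows the ordering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_RRSIG 46

/* Hypothetical, simplified stand-in for one rdataset stored at a node. */
typedef struct {
	uint16_t type;   /* RR type of this rdataset */
	uint16_t covers; /* for RRSIG entries, the signed type; else 0 */
	bool live;       /* false once the entry has been removed */
} entry_t;

/* Remove all entries of type `victim` and the RRSIGs covering them. */
static size_t
evict_type(entry_t *entries, size_t n, uint16_t victim) {
	size_t removed = 0;
	bool drop_sig = false;

	/* Pass 1: only the entry under the cursor is ever modified. */
	for (size_t i = 0; i < n; i++) {
		if (entries[i].live && entries[i].type == victim) {
			entries[i].live = false;
			removed++;
			drop_sig = true;
		}
	}

	/* Pass 2: the first iteration is over; now drop the signatures. */
	for (size_t i = 0; drop_sig && i < n; i++) {
		if (entries[i].live && entries[i].type == TYPE_RRSIG &&
		    entries[i].covers == victim)
		{
			entries[i].live = false;
			removed++;
		}
	}
	return removed;
}
```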

fix: usr: Fix TCP fallback after repeated UDP timeouts

When an authoritative server failed to respond to two consecutive
UDP queries in a fetch, named was supposed to retry the next attempt
over TCP but in fact still sent it over UDP. The resolver now
properly switches the transport to TCP on the third attempt to
the same server.

Closes #5529

Merge branch '5529-fix-tcp-fallback-after-udp-timeouts' into 'main'

See merge request isc-projects/bind9!12022

Skip EDNS UDP-size hint on TCP retries

The hint feeds the EDNS OPT UDP-size field, which has no effect on TCP
transport. Avoid the dns_adb_getudpsize() lookup when the query is
already pinned to TCP.

Assisted-by: Claude:claude-opus-4-7

Raise the per-server recursive-clients ceiling in fetchlimit

With the resolver now legitimately escalating to TCP after repeated
UDP timeouts to the same authoritative, each lame-server lookup
takes ~50% longer to fail. The recursive-client backlog therefore
peaks a little higher before the fetches-per-server auto-tune drops
the quota below 200.

Bump the upper bound for the burst-against-lame-server and recovery
steps from 200 to 250 to absorb that extra latency. The lower bound
and the final post-recovery target (clients <= 20) are unchanged.

Assisted-by: Claude:claude-opus-4-7

Add pytest serve_stale TCP-fallback regression tests

The serve_stale shell suite uses a UDP-only perl mock as its
authoritative server. Now that the resolver escalates to TCP after
repeated UDP timeouts, three steps in serve_stale/tests.sh that
exercise resolver-query-timeout behaviour no longer reach the
timeout — the TCP fallback short-circuits to SERVFAIL via
`connection refused` on the perl mock.

Move those scenarios to a new system test directory
`bin/tests/system/serve_stale_tcp/` that uses a
ControllableAsyncDnsServer mock listening on both UDP and TCP, so
the resolver's TCP path is exercised end-to-end and the original
timing semantics are preserved. Remove the corresponding shell
steps from serve_stale/tests.sh.

Assisted-by: Claude:claude-opus-4-7

Allow either UDP or TCP queries in flight in statistics test

The "active sockets" and "queries in progress" assertions previously
required exactly one extra UDP/IPv4 socket and exactly one UDP query in
progress, with no TCP counterpart. That shape held only because the
broken TCP-fallback path left the resolver retrying UDP indefinitely.

With the fix in place, after two UDP timeouts to the same authority the
resolver legitimately escalates to TCP, and a stats snapshot taken
during recursion may catch the in-flight query on either transport.
Count the UDP and TCP counters together so the test reflects the new
correct behaviour.

Assisted-by: Claude:claude-opus-4-7

Tighten serve_stale dig timeouts and inter-step sleeps

With the TCP fallback now actually firing after repeated UDP timeouts,
the resolver covers more retry transitions in the same wall-clock
window, and the original 3-second budgets in two steps of the
serve_stale test left no margin: the dig client at +timeout=3 and the
"sleep 3" before re-enabling the upstream both straddled the moment at
which the resolver switched transport, making the asserted outcome
race-prone.

Drop the dig timeout to 2s and the sleep to 1s so each step lands
firmly on one side of the transport switch.

Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7

Emit EDE 22 when the resolver runs out of usable addresses

Two exits from fctx_try() landed at DNS_R_SERVFAIL without attaching
DNS_EDE_NOREACHABLEAUTH: when fctx_getaddresses() returned a non-success,
non-wait status, and when every candidate addrinfo was unusable
(over-quota or filtered) after a restart.

With the new TCP fallback actually firing, those paths are now reached
by serve-stale and similar scenarios in which the auth is unreachable.
Attach the EDE so SERVFAIL responses keep carrying the same operator
signal that the timeout-based exit paths already produce.

Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7

Open the stale-refresh-time window on any resolver failure

The TCP-fallback fix in the previous commits means a query that would
previously have timed out on UDP now actually escalates to TCP, and a
TCP-side failure surfaces a non-ISC_R_TIMEDOUT result code to
query_usestale(). The trigger for DNS_DBFIND_STALESTART was previously
narrowed to ISC_R_TIMEDOUT, so the stale-refresh-time window stopped
opening for those clients.

Broaden the condition to any failure that has already cleared the
upstream DUPLICATE/DROP filtering in query_usestale() — the spirit of
the window is "the resolver tried and could not get a fresh answer",
not "the resolver timed out specifically".

Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7

Force TCP after repeated UDP timeouts to the same authoritative

Make the decision in fctx_query() before the dispatch is bound so the
chosen transport and the DNS_FETCHOPT_TCP flag agree. The previous
location in resquery_send() ran after the UDP dispatch had already been
attached, so the flag flip had no effect on the wire.

Moving the decision earlier also means FCTX_ADDRINFO_NOEDNS0 servers,
previously exempt, now escalate to TCP too. TCP works regardless of
EDNS state, so this is the intended behaviour.

Assisted-by: Claude:claude-opus-4-7
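
The ordering issue can be illustrated with a small sketch in which the transport and the TCP flag are derived together, before anything is bound. All names here are hypothetical stand-ins (`tcp_flag` plays the role of DNS_FETCHOPT_TCP), and the threshold of two timeouts is taken from the user-facing description of the fix.

```c
#include <stdbool.h>

typedef enum { TRANSPORT_UDP, TRANSPORT_TCP } transport_t;

/* Hypothetical, simplified view of a query about to be sent. */
typedef struct {
	transport_t transport; /* what the dispatch would be bound as */
	bool tcp_flag;         /* stands in for DNS_FETCHOPT_TCP */
} query_sketch_t;

#define UDP_TIMEOUTS_BEFORE_TCP 2

/*
 * Decide flag and transport in one place so they cannot disagree.
 * `udp_timeouts` counts consecutive UDP timeouts to this server.
 */
static query_sketch_t
prepare_query(unsigned int udp_timeouts, bool tcp_requested) {
	query_sketch_t q;

	q.tcp_flag = tcp_requested ||
		     udp_timeouts >= UDP_TIMEOUTS_BEFORE_TCP;
	q.transport = q.tcp_flag ? TRANSPORT_TCP : TRANSPORT_UDP;
	return q;
}
```

Flipping the flag after the transport has been chosen, as the old resquery_send() location did, would leave `tcp_flag` set while `transport` stayed UDP.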

Temporarily remove TCP fallback after UDP timeouts

The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).

The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.

Assisted-by: Claude:claude-opus-4-7

chg: usr: named could crash on concurrent TKEY DELETE for the same key

On a server configured with tkey-gssapi-keytab (or tkey-gssapi-credential),
an authenticated peer could crash named by sending two TKEY DELETE requests
for the same dynamic key in rapid succession. This has been fixed.

Closes #6001

Merge branch '6001-tsig-tkey-delete-uaf' into 'main'

See merge request isc-projects/bind9!12041

Fix use-after-free in concurrent dns_tsigkey_delete()

Two TSIG-authenticated TKEY DELETE queries for the same dynamic key,
arriving on different worker loops, could each enter
dns_tsigkey_delete() and over-decrement the key refcount.

This has been fixed by making dns_tsigkey_delete() idempotent.
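
One common way to make a delete operation idempotent is an atomic "already deleted" flag, so that only the first caller drops the owner's reference. The sketch below uses C11 atomics to show that pattern. It is an assumption about the general technique, not the actual change to dns_tsigkey_delete(), and every name in it is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified key object. */
typedef struct {
	atomic_int references; /* outstanding references to the key */
	atomic_bool deleted;   /* set once by the first delete call */
} key_sketch_t;

static void
key_init(key_sketch_t *key, int references) {
	atomic_init(&key->references, references);
	atomic_init(&key->deleted, false);
}

/*
 * Returns true if this call performed the deletion. A second call,
 * possibly racing from another thread, sees the flag already set
 * and leaves the reference count alone.
 */
static bool
key_delete(key_sketch_t *key) {
	if (atomic_exchange(&key->deleted, true)) {
		return false;
	}
	atomic_fetch_sub(&key->references, 1);
	return true;
}
```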

fix: usr: The resolver now removes other RRsets at the same name when caching a CNAME

When an RRset was in the stale cache and the authoritative server changed the record type to CNAME, the resolver failed to refresh the stale cache. This has been fixed.

Closes #5302

Merge branch '5302-serve-stale-cname-to-a' into 'main'

See merge request isc-projects/bind9!11758

When caching names, check for CNAME RRsets

CNAME and other record types cannot coexist. DNSSEC records are the
exceptions to this rule.

If the answer contains a name with a CNAME, remove existing RRsets at
the same name from the cache.

If the answer contains a name without a CNAME, remove the CNAME RRset
at the same name from the cache.
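
The two rules above can be sketched as a single predicate. This is a simplified illustration with hypothetical names, not the resolver's code. It assumes RRSIG and NSEC are the DNSSEC types exempt from the rule, and it ignores the question of signatures that cover an evicted type.

```c
#include <stdbool.h>
#include <stdint.h>

enum { T_CNAME = 5, T_RRSIG = 46, T_NSEC = 47 };

/*
 * Should an RRset of `cached_type`, already cached at the name, be
 * removed given what the incoming answer holds for that name?
 */
static bool
must_evict(uint16_t cached_type, bool answer_has_cname) {
	/* Assumed exemption: DNSSEC records may sit beside a CNAME. */
	if (cached_type == T_RRSIG || cached_type == T_NSEC) {
		return false;
	}
	if (answer_has_cname) {
		/* A CNAME arrived: everything else at the name goes. */
		return cached_type != T_CNAME;
	}
	/* Other data arrived: a cached CNAME at the name goes. */
	return cached_type == T_CNAME;
}
```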