Michal Nowak [Thu, 28 May 2026 15:44:27 +0000 (17:44 +0200)]
[9.18] chg: test: Prioritize the 10 slowest system test scopes
Update PRIORITY_TESTS with the 10 longest-running test
scopes measured from CI (job 7468217). These get scheduled
first so that with --dist=loadscope they land on separate
workers instead of piling up at the end.
Also fix "serve-stale/" to "serve_stale/" to match the
actual directory name, and add a startup check that fails
if any PRIORITY_TESTS entry does not match an existing
directory.
Assisted-by: Claude:claude-opus-4-7
Backport of MR !12104
Merge branch 'backport-mnowak/prioritize-slow-system-tests-9.18' into 'bind-9.18'
Michal Nowak [Tue, 26 May 2026 16:40:13 +0000 (16:40 +0000)]
Prioritize the 10 slowest system test scopes
Update PRIORITY_TESTS with the 10 longest-running test
scopes measured from CI (job 7468217). These get scheduled
first so that with --dist=loadscope they land on separate
workers instead of piling up at the end.
Also fix "serve-stale/" to "serve_stale/" to match the
actual directory name, and add a startup check that fails
if any PRIORITY_TESTS entry does not match an existing
directory.
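The startup check described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the function names and the way the system test root is passed in are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def missing_priority_dirs(
    priority_tests: Iterable[str], system_test_root: Path
) -> List[str]:
    """Return the entries that do not name an existing directory."""
    return [
        entry
        for entry in priority_tests
        # Entries are written with a trailing slash, e.g. "serve_stale/".
        if not (system_test_root / entry.rstrip("/")).is_dir()
    ]


def check_priority_tests(
    priority_tests: Iterable[str], system_test_root: Path
) -> None:
    """Fail early if any priority entry has no matching directory."""
    missing = missing_priority_dirs(priority_tests, system_test_root)
    if missing:
        raise RuntimeError(
            f"PRIORITY_TESTS entries without a directory: {missing}"
        )
```

With a check of this shape, a stale entry such as "serve-stale/" is reported at startup instead of silently never matching any test scope.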
Nicki Křížek [Thu, 28 May 2026 14:52:56 +0000 (16:52 +0200)]
[9.18] chg: test: Improve pytest jinja2 templates
- Enable rendering ns-specific data in jinja2 templates using the `ns` variable.
- Add common zone/config snippets as `_common` templates.
- Allow jinja2 imports from `_common`.
- Improve the `_common/controls.conf.j2` snippet to render the ns-specific IP rather than a hardcoded one.
Backport of MR !11805
Merge branch 'backport-nicki/pytest-template-improvements-9.18' into 'bind-9.18'
Nicki Křížek [Wed, 20 May 2026 14:34:02 +0000 (14:34 +0000)]
Restrict cross-test jinja2 includes to _common/
The previous loader was a FileSystemLoader rooted at $srcdir, which
allowed any system test to include any other test's templates -- a
wider scope than intended. Every existing cross-test include already
targets _common/, so make that the only path.
ChoiceLoader + PrefixLoader keeps the existing '_common/foo.j2' path
convention working without changes to call sites. The '_common/'
prefix is deliberately kept rather than dropping it by rooting the
FileSystemLoader at _common/ directly:
- It signals at the include site that the file is a shared
template, not a sibling of the current test; readers don't need
to know the loader configuration to understand where the file
lives.
- It prevents shadowing: a test-local 'controls.conf.j2' would
not collide with the shared one, and the unqualified name keeps
its test-local meaning.
- It makes the dependency greppable: 'grep -rl _common/'
identifies every test that consumes shared snippets.
Allow instantiating template dataclasses in jinja2 templates
In some cases, the template data might need to be set directly in the
jinja2 templates using `{% set %}`. Expose the template dataclasses to
the templates so we can use these existing classes, rather than creating
ad-hoc data containers.
Add directory-specific nameserver data to templates
If a template is being rendered into a directory that represents a
nameserver (e.g. "ns1"), include nameserver-specific information in
the data: a variable called "ns" which has information about the
nameserver this file belongs to.
Ensure the "ns" variable is only exposed to the template when rendered,
without affecting the environment variables (always work with a copy of
the env_vars).
Extend the Nameserver to generate the default IPv4/IPv6 values, add NSX
values for the predefined nameservers (there are 11 of them, as per
bin/tests/system/ifconfig.sh.in max value). Add the missing ns11
fixture.
Extend the Zone to derive the zone filename by default, unless
specified.
Adjust the existing uses of these classes to utilize the simplified
defaults.
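The defaults and the per-directory "ns" variable described in these commits can be sketched as follows. Everything here is illustrative: the class fields, the template_data() helper, the address scheme and the ".db" filename suffix are assumptions, not the actual isctest definitions.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class Nameserver:
    """Hypothetical stand-in for the nameserver template dataclass."""

    num: int
    ip4: Optional[str] = None
    ip6: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed address scheme; the real defaults may differ.
        if self.ip4 is None:
            self.ip4 = f"10.53.0.{self.num}"
        if self.ip6 is None:
            self.ip6 = f"fd92:7065:b8e:ffff::{self.num}"


@dataclass
class Zone:
    """Hypothetical zone dataclass that derives its filename."""

    name: str
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        if self.filename is None:
            # Assumed naming rule: zone name without the trailing dot.
            self.filename = f"{self.name.rstrip('.') or 'root'}.db"


def template_data(env_vars: Dict[str, Any], output_dir: str) -> Dict[str, Any]:
    """Return render data, adding "ns" only for nsN directories."""
    data = dict(env_vars)  # work on a copy; env_vars stays untouched
    match = re.fullmatch(r"ns(\d+)", Path(output_dir).name)
    if match:
        data["ns"] = Nameserver(int(match.group(1)))
    return data
```

Copying the mapping before adding "ns" is what keeps the variable scoped to a single render.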
Michal Nowak [Thu, 21 May 2026 07:31:15 +0000 (07:31 +0000)]
Tolerate dnspython post-2038 timestamp overflow on 32-bit
dnspython's RRSIG.to_text() converts the signature inception/expiration
fields by calling time.gmtime(), which on 32-bit platforms raises
OverflowError for values past 2038-01-19 (INT32_MAX). Several DNSSEC
test fixtures use far-future expirations: the precomputed RRSIGs in
the dnssec test's rsasha1.example.db.in zone expire in 2093, ans4 of
the chain test hardcodes 2090, and ans10 of the dnssec test uses
2**32-1 (year 2106). Whenever a response carrying such an RRSIG is
formatted with str()/to_text() the overflow propagates out and either
fails the test (when triggered in isctest.query's debug logging) or
kills the asyncserver-based ans* server (when triggered in its
response logger), which in turn cascades into "Failed to stop
servers" teardown errors and SERVFAIL responses for subsequent tests.
Wrap the to_text() calls in isctest/query.py and the str(response)
call in asyncserver's _log_response() with try/except OverflowError,
falling back to a placeholder message. The conversions are only used
for debug logging, so losing the human-readable form there does not
affect what the tests actually validate.
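The fallback pattern can be sketched without dnspython. In this illustration, safe_text() and FarFutureRecord are hypothetical names; the stand-in class only simulates a text conversion that overflows, as RRSIG.to_text() is described to do on 32-bit platforms.

```python
PLACEHOLDER = "<text form unavailable: timestamp overflow>"


def safe_text(obj: object, placeholder: str = PLACEHOLDER) -> str:
    """Format obj for debug logging, tolerating OverflowError."""
    try:
        return str(obj)
    except OverflowError:
        # Only the human-readable form is lost; the object is unchanged.
        return placeholder


class FarFutureRecord:
    """Stand-in for a record whose text conversion overflows."""

    def __str__(self) -> str:
        raise OverflowError("timestamp out of range for platform time_t")
```

A logging call such as logger.debug(safe_text(response)) then cannot take down the server or fail the test because of a far-future expiration time.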
Andoni Duarte [Thu, 21 May 2026 13:55:47 +0000 (13:55 +0000)]
[9.18] fix: doc: Remove 9.21-only release note from May 2026 release notes
Issue #5826 has two different fixes: one released in April 2026 that
applies to 9.20 and 9.18, and another released in May 2026 that applies
to 9.21. The 9.21 release note was mistakenly included in the release
notes for 9.20 and 9.18. This commit removes it.
Backport of MR !12067
Merge branch 'backport-andoni/fix-doc-duplicate-note-5826-9.18' into 'bind-9.18'
Remove 9.21-only release note from May 2026 release notes
Issue #5826 has two different fixes: one released in April 2026 that
applies to 9.20 and 9.18, and another released in May 2026 that applies
to 9.21. The 9.21 release note was mistakenly included in the release
notes for 9.20 and 9.18. This commit removes it.
Michał Kępień [Thu, 21 May 2026 13:14:29 +0000 (15:14 +0200)]
[9.18] fix: usr: Clear REDIRECT flag when it isn't needed
When `nxdomain-redirect` is in use, and a recursive query is used to get the redirected answer, a flag is set to distinguish it from a normal recursive response. Previously, that flag was left set afterward, which could trigger an assertion if a normal recursive query was sent later on behalf of the same client: for example, because the `filter-aaaa` plugin was in use. This has been fixed.
Closes #5936
Backport of MR !12073
Merge branch 'backport-5936-clear-redirect-flag-9.18' into 'bind-9.18'
Evan Hunt [Tue, 5 May 2026 00:05:11 +0000 (17:05 -0700)]
Clear REDIRECT flag when it isn't needed
The NS_QUERYATTR_REDIRECT flag is set when processing a recursive
NXDOMAIN redirection lookup, so that if that lookup also returns
NXDOMAIN we don't end up looping.
Previously, the flag was left active after use, but if the
same client triggered a subsequent recursive lookup (for example,
in the filter-aaaa plugin), then the wrong branch could be reached
in query_resume(), potentially leading to an assertion failure. This
has been fixed.
Michal Nowak [Thu, 21 May 2026 12:50:59 +0000 (14:50 +0200)]
[9.18] fix: dev: Validate nsec3hash arguments instead of relying on atoi()
The nsec3hash tool parsed its algorithm, flags, and iterations
arguments with atoi(), then range-checked the result. For values
that overflow int during digit-by-digit accumulation, atoi() is
undefined; in practice on musl libc the modular wrap leaves
n == 0, which silently passes the "iterations > 0xffffU" check.
On Alpine Linux this made nsec3hash succeed with iterations
treated as 0 for inputs like 4294967296 (2^32).
The latent bug only surfaced when the recent image rebuild pulled
in Hypothesis 6.152.9 (2026-05-19), which unified the distribution
used for bounded and unbounded integers() strategies. The new
smoother distribution explores the 2^32 boundary on unbounded
ranges like integers(min_value=65536); earlier versions did not
reach there, so test_nsec3hash_too_many_iterations only started
failing on Alpine after the image refresh.
Replace the three atoi() calls with isc_parse_uint8 /
isc_parse_uint16, which uniformly reject overflow, trailing
garbage, leading sign, and non-numeric input across libc
implementations. As a side effect, error messages now include
the offending argument and a specific reason ("out of range" vs
"not a valid number").
Assisted-by: Claude:claude-opus-4-7
Closes #6013
Backport of MR !12062
Merge branch 'backport-6013-nsec3hash-iterations-overflow-9.18' into 'bind-9.18'
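The difference between wrap-around accumulation and strict parsing can be illustrated in Python. This is not the C implementation of isc_parse_uint8 / isc_parse_uint16; parse_uint() and wrapped_32bit() are hypothetical helpers, and the sketch assumes decimal input only.

```python
def wrapped_32bit(text: str) -> int:
    """Model a 32-bit modular wrap of the accumulated value."""
    return int(text) % (1 << 32)


def parse_uint(text: str, bits: int) -> int:
    """Parse an unsigned decimal integer that must fit in `bits` bits."""
    # Rejects empty input, signs, whitespace and trailing garbage.
    if not text or not text.isascii() or not text.isdigit():
        raise ValueError(f"'{text}': not a valid number")
    value = int(text)
    if value >= 1 << bits:
        raise ValueError(f"'{text}': out of range")
    return value
```

Here wrapped_32bit("4294967296") evaluates to 0, which is how an oversized iterations value could slip past a "> 0xffff" check, while parse_uint("4294967296", 16) raises ValueError instead.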
Michal Nowak [Wed, 20 May 2026 17:58:41 +0000 (17:58 +0000)]
Validate nsec3hash arguments instead of relying on atoi()
The nsec3hash tool parsed its algorithm, flags, and iterations
arguments with atoi(), then range-checked the result. For values
that overflow int during digit-by-digit accumulation, atoi() is
undefined; in practice on musl libc the modular wrap leaves
n == 0, which silently passes the "iterations > 0xffffU" check.
On Alpine Linux this made nsec3hash succeed with iterations
treated as 0 for inputs like 4294967296 (2^32).
The latent bug only surfaced when the recent image rebuild pulled
in Hypothesis 6.152.9 (2026-05-19), which unified the distribution
used for bounded and unbounded integers() strategies. The new
smoother distribution explores the 2^32 boundary on unbounded
ranges like integers(min_value=65536); earlier versions did not
reach there, so test_nsec3hash_too_many_iterations only started
failing on Alpine after the image refresh.
Replace the three atoi() calls with isc_parse_uint8 /
isc_parse_uint16, which uniformly reject overflow, trailing
garbage, leading sign, and non-numeric input across libc
implementations. As a side effect, error messages now include
the offending argument and a specific reason ("out of range" vs
"not a valid number").
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Follow common naming and coding conventions
Make the handlers defined in bin/tests/system/resend_loop/ans3/ans.py
follow canonical naming conventions used in other system tests. Keep
all server initialization code in the main() function.
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Turn _get_cookie() into a method
Since the _get_cookie() function is only used by the CookieHandler
class, make the former a method of the latter to keep related logic
close in the source code.
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Tweak the _get_cookie() method
The "len(cookie.server) == 0" condition is superfluous for the
"resend_loop" system test, so remove it. Add a return type annotation
to the _get_cookie() function.
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Fix flawed response logic for COOKIE-less queries
The "yield" keyword does not cause a function to return. By design,
get_responses() may yield multiple DNS responses in a single call. As
currently implemented, CookieHandler.get_responses() sends two responses
to each client query that does not contain a COOKIE option. Make the
logic in that method consistent with code comments by only sending one
response to every query - either SERVFAIL or BADCOOKIE, never both.
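The pitfall is easy to reproduce with plain generators. In this sketch the strings stand in for DNS responses and the has_cookie condition is a simplified assumption; it is not the actual CookieHandler code.

```python
from typing import Iterator


def flawed_responses(has_cookie: bool) -> Iterator[str]:
    """Yields two responses when the query has no COOKIE option."""
    if not has_cookie:
        yield "SERVFAIL"
        # No return here, so execution continues past the yield.
    yield "BADCOOKIE"


def fixed_responses(has_cookie: bool) -> Iterator[str]:
    """Yields exactly one response per query."""
    if not has_cookie:
        yield "SERVFAIL"
        return  # end the generator after the first response
    yield "BADCOOKIE"
```

Collecting list(flawed_responses(False)) gives both "SERVFAIL" and "BADCOOKIE", while list(fixed_responses(False)) gives only "SERVFAIL".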
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Drop redundant uses of authoritative=True
The ans3 custom server instance is created with default_aa=True. Do not
pass the authoritative=True keyword argument to the DnsResponseSend
constructor in CookieHandler.get_responses() as it is redundant.
Michał Kępień [Thu, 21 May 2026 09:52:56 +0000 (11:52 +0200)]
Drop unnecessary qctx.prepare_new_response() call
The ans3 custom server does not have any zones defined, so the responses
passed to its handlers by core isctest.asyncserver code are guaranteed
to be empty. Remove a call to qctx.prepare_new_response() from
CookieHandler.get_responses() as it is redundant.
Michał Kępień [Thu, 21 May 2026 09:36:49 +0000 (11:36 +0200)]
[9.18] chg: ci: Various autorebase improvements
- Rewrite cherry-pick references during autorebases
- Fix autorebase error reporting
- Limit post-push pipelines for autorebased branches
- Only autorebase when there is anything to rebase
- Conflate missing commit reference notifications
- Support autorebasing backported security MRs
Backport of MR !12024
Merge branch 'backport-michal/autorebase-improvements-9.18' into 'bind-9.18'
Michał Kępień [Thu, 21 May 2026 09:13:30 +0000 (11:13 +0200)]
Support autorebasing backported security MRs
Autorebasing a backported security fix enables convenient refreshing of
cherry-pick references, which makes it trivial for developers to satisfy
Danger rules just before the merge request is merged. Add a manual CI
job that is only created for backported merge requests targeting
security-* branches.
Michał Kępień [Thu, 21 May 2026 09:13:30 +0000 (11:13 +0200)]
Conflate missing commit reference notifications
Instead of creating a separate (potentially lengthy) Danger notification
for every missing commit reference in a backport, produce a single
notification with a list of all unreferenced commit hashes. This makes
Danger output more concise while retaining all the relevant feedback for
the developer.
Michał Kępień [Thu, 21 May 2026 09:13:30 +0000 (11:13 +0200)]
Only autorebase when there is anything to rebase
In an optimistic future, security-* branches will become empty, at least
intermittently. When that happens, there will be nothing left to rebase
on those branches, so when something gets merged into their base
branches, an autorebase will effectively be a fast-forward. While the
existing autorebase logic would handle such a case perfectly fine, it is
prudent to avoid creating a test pipeline after pushing such a
fast-forward update as the code revision getting pushed will have
already been tested by other pipelines. However, the push should still
happen as non-empty downstream autorebased branches may exist and those
will still need to be rebased. Achieve both of these objectives by
checking early whether there is anything to rebase and pushing the
fast-forwarded version of the branch without setting the AUTOREBASE CI
variable if there is not.
Michał Kępień [Thu, 21 May 2026 09:13:30 +0000 (11:13 +0200)]
Limit post-push pipelines for autorebased branches
Current CI job triggering rules cause a full pipeline to be started
after every push to security-* branches. In this context, "push" means
"branch update", which covers both "git push" invocations and merging a
merge request. Meanwhile, running a test pipeline is only desired after
a rebase; if a branch is fast-forwarded, it means that a merge request
has been merged into it and a pipeline should have already been run for
that merge request itself. Limit resource use by only triggering
pipelines for security-* branches when they are pushed to with a "magic"
CI variable that is only set in autorebase jobs. Leave all the other
triggering rules (for scheduled/manual pipelines) intact.
Michał Kępień [Thu, 21 May 2026 09:13:30 +0000 (11:13 +0200)]
Fix autorebase error reporting
The logic used for detecting the commit breaking an autorebase does not
work correctly if the offending commit is not the first one applied
during the "reverse rebase". Fix by using REBASE_HEAD instead of
processing the output of "git status" in a convoluted way.
Furthermore, the approach used for identifying the first offending merge
request in the case of a successful autorebase followed by a failed
build only works correctly if the base branch is not autorebased itself.
Since a solution that would work correctly for a branch autorebased on
top of a branch that only moves forward does not work correctly for a
branch autorebased on top of another autorebased branch and vice versa,
accurately identifying the most likely culprit after a successful
autorebase is a very complicated and brittle task. Since reporting no
details at all is arguably better than reporting false details, only
produce a minimal error notification if the build fails after a
successful autorebase.
When an authoritative server failed to respond to two consecutive
UDP queries, named marked the next retry as TCP but still sent it
over UDP, producing misleading dnstap records. The ineffective
retry path has been removed; a corrected TCP fallback will be
restored in future BIND 9 versions.
Closes #5529
Backport of MR !12022
Merge branch 'backport-5529-fix-tcp-fallback-after-udp-timeouts-9.18' into 'bind-9.18'
Ondřej Surý [Thu, 14 May 2026 08:04:20 +0000 (10:04 +0200)]
Temporarily remove TCP fallback after UDP timeouts
The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).
The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.
Ondřej Surý [Sat, 16 May 2026 13:08:29 +0000 (15:08 +0200)]
[9.18] new: dev: Enable PR-Agent reviews on merge requests
Adds a CI job that runs PR-Agent against each merge request opened from the canonical repository, posting an automated review and code-improvement suggestions as MR comments. The job is gated to same-project source branches so the OpenAI key and personal access token are not exposed to fork pipelines.
Backport of MR !12032, MR !12033 and MR !12035
Merge branch 'ondrej/add-pr-agent-9.18' into 'bind-9.18'
Ondřej Surý [Sat, 16 May 2026 06:23:50 +0000 (08:23 +0200)]
Add PR-Agent job to GitLab CI for merge-request review
Run PR-Agent's `review` and `improve` commands against each merge
request from the canonical repository, posting an automated review
and code-improvement suggestions as MR comments. The rule restricts
the job to MRs whose source project matches CI_PROJECT_PATH so the
OpenAI key and GitLab personal access token are never exposed to
fork pipelines.
Ondřej Surý [Fri, 15 May 2026 07:51:18 +0000 (09:51 +0200)]
[9.18] fix: test: Fix flaky reclimit test
The max-types-per-name cache eviction tests were flaky because two test steps were missing a sleep between queries, causing TTL-based cache verification to fail when both queries completed within the same second.
Backport of MR !11782
Merge branch 'backport-ondrej/fix-flaky-reclimit-9.18' into 'bind-9.18'
The cache verification in steps 11 and 15 checks that the TTL has
decreased from its initial value to confirm the response was served
from cache, but the sleep between the two queries was missing. Both
queries could complete within the same second, leaving the TTL
unchanged and causing the test to incorrectly conclude the entry was
not cached.
Ondřej Surý [Fri, 15 May 2026 07:50:52 +0000 (09:50 +0200)]
[9.18] chg: usr: Fall back to TCP on a UDP response with a mismatched query id
BIND used to wait silently for the correct DNS message id on a UDP fetch
even after receiving a response from the expected server with the wrong
id, leaving room for off-path spoofing attempts to keep guessing within
that window. The resolver now retries the fetch over TCP on the first
such response, and a new MismatchTCP statistics counter tracks how
often the fallback fires.
Closes #5449
Backport of MR !12023
Merge branch 'backport-5449-immediate-tcp-fallback-on-id-mismatch-9.18' into 'bind-9.18'
Ondřej Surý [Thu, 14 May 2026 10:20:19 +0000 (12:20 +0200)]
Switch UDP fetches to TCP on the first response with a wrong query id
Until now, the dispatcher silently dropped UDP responses from the
expected peer that carried the wrong DNS message id and kept listening
for the correct id to arrive within the read timeout. An off-path
attacker who knows the destination address and source port of an
outgoing fetch could exploit that quiet retry window to flood the
resolver with guessed responses; with a gigabit link the per-query
success probability grows linearly with the number of guesses that
arrive before the legitimate answer or the timeout.
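The linear growth can be made concrete with a simplified model. The sketch below assumes the attacker already knows the address and port, guesses distinct 16-bit message ids, and gets all guesses in before the window closes; the function name is hypothetical.

```python
def spoof_success_probability(guesses: int, id_space: int = 1 << 16) -> float:
    """Chance that one of `guesses` distinct ids matches the real one."""
    if guesses < 0:
        raise ValueError("guesses must be non-negative")
    # Each distinct guess covers one id, so the chance grows linearly
    # until the whole id space has been tried.
    return min(guesses, id_space) / id_space
```

Under this model 16384 guesses within the window give a 25% chance per query, which is why closing the window on the first mismatch matters.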
Treat any such mismatch as a possible spoofing attempt and let the
resolver immediately retry the same query over TCP, the same control
path the truncation handler already uses.
Add a resolver statistics counter - exposed as 'queries retried over TCP
after a response with mismatched query id' in rndc stats and
'MismatchTCP' in the statistics channel
The global RUNNER_SCRIPT_TIMEOUT: 55m in the parent pipeline was being
forwarded to the stress and tsan:stress child pipelines, where forwarded
yaml variables outrank job-level variables. That caused stress jobs with
BIND_STRESS_TESTS_RUN_TIME >= 60 to be killed at 55 minutes, regardless
of the per-job RUNNER_SCRIPT_TIMEOUT set in the generated child config.
Set forward:yaml_variables: false on both trigger jobs; the generated
configs already declare every variable they need.
Assisted-by: Claude:claude-opus-4-7
Backport of MR !12012
Merge branch 'backport-mnowak/fix-stress-test-script-timeout-9.18' into 'bind-9.18'
Michal Nowak [Wed, 13 May 2026 09:44:26 +0000 (11:44 +0200)]
Selectively inherit yaml vars in stress trigger jobs
The parent's global RUNNER_SCRIPT_TIMEOUT: 55m was reaching the stress
and tsan:stress child pipelines via inherited yaml variables, where
inherited values outrank the child's job-level variables. That caused
stress jobs with BIND_STRESS_TESTS_RUN_TIME >= 60 to be killed at 55
minutes, regardless of the per-job RUNNER_SCRIPT_TIMEOUT set in the
generated child config.
Use inherit:variables with a positive list on both trigger jobs:
inherit only CI_REGISTRY_IMAGE so the parent's registry override
(needed for image pulls in the child) flows through, while keeping
RUNNER_SCRIPT_TIMEOUT (and other globals) out of the child pipeline's
variable scope. The per-job RUNNER_SCRIPT_TIMEOUT values set by the
generated child config now take effect.
Michal Nowak [Wed, 25 Mar 2026 12:31:49 +0000 (13:31 +0100)]
Set RUNNER_SCRIPT_TIMEOUTs
Sometimes jobs can get stuck and be terminated by GitLab, leaving us
without artefacts that could contain useful information about why the
job got stuck.
Mark Andrews [Wed, 17 Aug 2022 01:13:41 +0000 (11:13 +1000)]
tsiggss: regenerate kerberos credentials
The existing set of kerberos credentials used deprecated algorithms
which are not supported by some implementations in FIPS mode.
Regenerate the saved credentials using more modern algorithms.
Added tsiggss/krb/setup.sh which sets up a test KDC with the required
principals for the system test to work. The tsiggss system test
needs to be run once with this active and KRB5_CONFIG set
appropriately. See tsiggss/tests.sh for an example of how to do this.
Michał Kępień [Mon, 11 May 2026 15:46:55 +0000 (17:46 +0200)]
[9.18] chg: ci: Add commit link and diff to RPM build job logs
The output of update_rpms.py is terse, making it difficult to verify its
actions. Add a commit link and "git show" output to the log of every CI
job running the update_rpms.py script in "build" mode to facilitate
double-checking its actions.
Backport of MR !11828
Merge branch 'backport-michal/add-commit-link-and-diff-to-rpm-build-job-logs-9.18' into 'bind-9.18'
Michał Kępień [Mon, 11 May 2026 15:41:50 +0000 (17:41 +0200)]
Add commit link and diff to RPM build job logs
The output of update_rpms.py is terse, making it difficult to verify its
actions. Add a commit link and "git show" output to the log of every CI
job running the update_rpms.py script in "build" mode to facilitate
double-checking its actions.
Michał Kępień [Mon, 11 May 2026 14:27:39 +0000 (16:27 +0200)]
[9.18] fix: ci: Increase GIT_DEPTH for the "assign-milestones" job
Cloning tags with the default GIT_DEPTH of 1 prevents the milestone
assignment script from identifying any merge requests that are included
in a given release. Fix by increasing GIT_DEPTH to an arbitrary value
that is high enough for practical purposes.
The GIT_DEPTH CI variable defaults to 1 for all jobs through the
top-level "variables" key. Explicitly setting it to 1 in job
definitions is unnecessary and may cause confusion. Remove these
redundant assignments.
Backport of MR !11996
Merge branch 'backport-michal/fix-assign-milestones-job-9.18' into 'bind-9.18'
Michał Kępień [Mon, 11 May 2026 14:07:47 +0000 (16:07 +0200)]
Remove redundant "GIT_DEPTH: 1" assignments
The GIT_DEPTH CI variable defaults to 1 for all jobs through the
top-level "variables" key. Explicitly setting it to 1 in job
definitions is unnecessary and may cause confusion. Remove these
redundant assignments.
Michał Kępień [Mon, 11 May 2026 14:07:47 +0000 (16:07 +0200)]
Increase GIT_DEPTH for the "assign-milestones" job
Cloning tags with the default GIT_DEPTH of 1 prevents the milestone
assignment script from identifying any merge requests that are included
in a given release. Fix by increasing GIT_DEPTH to an arbitrary value
that is high enough for practical purposes.
Michał Kępień [Mon, 11 May 2026 08:14:13 +0000 (10:14 +0200)]
[9.18] fix: ci: Fix triggering rules for the "publish-cleanup" job
The "publish-cleanup" tag pipeline job is currently created for all
security releases, including BIND -S releases, but it depends on the
"publish" job, which is only created for open source releases. This
breaks CI configuration for BIND -S tags, preventing pipelines from
getting created for such tags altogether. Fix by only creating the
"publish-cleanup" job in tag pipelines for open source security
releases.
Backport of MR !11992
Merge branch 'backport-michal/fix-triggering-rules-for-the-publish-cleanup-job-9.18' into 'bind-9.18'
Michał Kępień [Mon, 11 May 2026 08:07:38 +0000 (10:07 +0200)]
Fix triggering rules for the "publish-cleanup" job
The "publish-cleanup" tag pipeline job is currently created for all
security releases, including BIND -S releases, but it depends on the
"publish" job, which is only created for open source releases. This
breaks CI configuration for BIND -S tags, preventing pipelines from
getting created for such tags altogether. Fix by only creating the
"publish-cleanup" job in tag pipelines for open source security
releases.
Michał Kępień [Thu, 7 May 2026 16:08:34 +0000 (18:08 +0200)]
[9.18] chg: ci: Mark merged security fixes as "Not released yet"
Adjust the triggering rules for the "merged-metadata" CI job so that
merge requests merged into security-* branches are automatically
assigned to the "Not released yet" milestone, just like merge requests
targeting public branches. This enables merge requests containing
security fixes to be correctly processed by release automation scripts.
Backport of MR !11984
Merge branch 'backport-pspacek/extend-not-released-yet-milestone-9.18' into 'bind-9.18'
Petr Špaček [Tue, 5 May 2026 13:04:36 +0000 (15:04 +0200)]
Mark merged security fixes as "Not released yet"
Adjust the triggering rules for the "merged-metadata" CI job so that
merge requests merged into security-* branches are automatically
assigned to the "Not released yet" milestone, just like merge requests
targeting public branches. This enables merge requests containing
security fixes to be correctly processed by release automation scripts.
Michał Kępień [Thu, 7 May 2026 15:55:32 +0000 (17:55 +0200)]
[9.18] chg: ci: Enable automatic backports for security fixes
Ensure the "backports" CI job is created when new changes are merged
into security-* branches. This enables using backport automation for
security fixes.
Backport of MR !11938
Merge branch 'backport-michal/extend-automatic-backports-9.18' into 'bind-9.18'
Michał Kępień [Thu, 7 May 2026 15:45:35 +0000 (17:45 +0200)]
Enable automatic backports for security fixes
Ensure the "backports" CI job is created when new changes are merged
into security-* branches. This enables using backport automation for
security fixes.
Ondřej Surý [Tue, 5 May 2026 13:20:43 +0000 (15:20 +0200)]
[9.18] chg: usr: Fix CPU spikes and slow queries when cache approaches memory limit
When the cache grew close to the configured max-cache-size, every subsequent
entry triggered all worker threads to run cache cleanup at once, causing CPU
spikes and a drop in query throughput. Cleanup is now spread probabilistically
across inserts as memory approaches the limit, so the work is distributed evenly
instead of piling up at the threshold.
Backport of MR !1002
Merge branch '5891-improve-overmem-cleaning-9.18' into 'security-bind-9.18'
Ondřej Surý [Wed, 6 May 2026 08:12:35 +0000 (10:12 +0200)]
Pass empty string instead of NULL to ns_client_dumpmessage()
The two new call sites added by the CLASS-validation work passed NULL
as the reason, but ns_client_dumpmessage() bails out early on a NULL
reason — so the message dump never happened. The intent was to dump
the message and let the follow-up ns_client_log() carry the reason
text, so pass "" to suppress the prefix without short-circuiting the
dump.
Evan Hunt [Mon, 4 May 2026 22:51:22 +0000 (22:51 +0000)]
[9.18] [CVE-2026-5946] sec: usr: Disable recursion, UPDATE, and NOTIFY for non-IN views
Recursion, dynamic updates (UPDATE), and zone change notifications
(NOTIFY) are now disabled for views with a class other than IN
(such as CHAOS or HESIOD); authoritative service for non-IN zones
(e.g. version.bind in class CHAOS) continues to work as before.
Servers configured with recursion yes in a non-IN view will log a
warning at startup, and named-checkconf flags the same condition.
UPDATE and NOTIFY messages that specify the meta-classes ANY or NONE
in the question section are now rejected with FORMERR.
This addresses a set of closely related security issues collectively
identified as CVE-2026-5946. ISC would like to thank Mcsky23 for
bringing these issues to our attention.
Backport of https://gitlab.isc.org/isc-private/bind9/-/merge_requests/936
Merge branch 'each-security-disable-chaos-recursion-security-bind-9.18' into 'security-bind-9.18'
Replace the hysteretic hi_water/lo_water switch with a stochastic
check: always false below lo_water, always true at or above hi_water,
linearly ramped probability in between. This spreads cache cleaning
across many inserts instead of triggering a thundering herd once the
hi_water mark is crossed (which causes every addrdataset to enter the
LRU purge path simultaneously and serializes lookups behind the node
write locks).
The is_overmem atomic and its stores are no longer needed and are
removed. The existing tests that asserted specific hysteretic state
transitions are simplified to check only the deterministic boundaries.
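The stochastic check can be sketched in a few lines. This is an illustration of the ramp described above, not the C code from the commit; the function names and the injectable rng parameter are assumptions.

```python
import random
from typing import Callable


def overmem_probability(inuse: int, lo_water: int, hi_water: int) -> float:
    """Probability that an insert should trigger cache cleaning."""
    if inuse < lo_water:
        return 0.0
    if inuse >= hi_water:
        return 1.0
    # Linear ramp between the two water marks.
    return (inuse - lo_water) / (hi_water - lo_water)


def should_clean(
    inuse: int,
    lo_water: int,
    hi_water: int,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide per insert whether to run cleaning; rng returns [0, 1)."""
    return rng() < overmem_probability(inuse, lo_water, hi_water)
```

Because each insert decides independently, cleaning work starts gradually as memory use rises instead of every thread switching into the purge path at the same moment.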
Fixed a memory leak where each GSS-API TKEY negotiation leaked a security context inside the GSS library. An unauthenticated attacker could exhaust server memory by sending repeated TKEY queries to a server with tkey-gssapi-keytab configured. The leaked memory was allocated by the GSS library, bypassing BIND's memory accounting.
Multi-round GSS-API negotiation (GSS_S_CONTINUE_NEEDED) is now rejected, as BIND never supported it correctly and Kerberos/SPNEGO completes in a single round.
Also implemented a missing RFC 3645 requirement: the client now verifies that mutual authentication and integrity flags are granted by the GSS-API mechanism (Section 3.1.1).
Closes: https://gitlab.isc.org/isc-projects/bind9/-/issues/5752
Backport of !965
Merge branch 'backport-5752-fix-memory-leak-in-TKEY-negotiation-9.18' into 'security-bind-9.18'
Colin Vidal [Thu, 30 Apr 2026 18:49:35 +0000 (20:49 +0200)]
[9.18] [CVE-2026-3592] sec: usr: Limit resolver server list size
When resolving a domain with many nameservers that share overlapping IP addresses (e.g., 10 NS records all pointing at the same set of addresses), BIND could previously waste time querying duplicate addresses and build up excessively large server lists. Addresses in the resolver's server list are now deduplicated so that each unique IP is only queried once per resolution attempt, regardless of how many NS records point to it. The number of addresses stored per nameserver name is also capped at 6 (combined A and AAAA), preventing memory and CPU overhead from domains with unusually large NS/glue sets.
Closes isc-projects/bind9#5641
Backport of !909
Merge branch 'backport-5641-selfpointedglue-9.18' into 'security-bind-9.18'
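The two limits can be illustrated with a short sketch. The build_server_list() helper is hypothetical, and applying the per-name cap before deduplication is an assumption about ordering rather than a description of the resolver code.

```python
from typing import Iterable, List, Set, Tuple

MAX_ADDRS_PER_NAME = 6  # combined A and AAAA, as stated above


def build_server_list(
    ns_addresses: Iterable[Tuple[str, Iterable[str]]],
    cap: int = MAX_ADDRS_PER_NAME,
) -> List[str]:
    """Return unique addresses, keeping at most `cap` per NS name."""
    seen: Set[str] = set()
    servers: List[str] = []
    for _ns_name, addresses in ns_addresses:
        for address in list(addresses)[:cap]:
            if address not in seen:  # query each unique IP only once
                seen.add(address)
                servers.append(address)
    return servers
```

For ten NS names that all point at the same eight addresses, this yields six entries rather than eighty.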
Fix output token and GSS context leaks in TKEY/GSS-API error paths
In dst_gssapi_acceptctx(), rename outtoken to outtokenp (matching BIND
convention for output pointer parameters) and free the allocated output
token buffer on error in the cleanup path.
In process_gsstkey(), route the empty-principal error path through
cleanup via CLEANUP() instead of returning early, so that the output
token, GSS context, and TSIG key are all freed consistently by the
existing cleanup block.
Evan Hunt [Mon, 9 Mar 2026 04:50:04 +0000 (15:50 +1100)]
Test server behavior when sending various UPDATE requests
Send update messages for zones with CLASS0, ANY and NONE. The class
ANY UPDATE also attempts to delete a KX record in an existing IN
class zone to trigger a REQUIRE.
A bug during bad server handling could cause the resolver to enter an infinite loop, continuously sending queries to an upstream server with no exit condition, until the resolver query timeout was hit. This has been fixed.
ISC would like to thank Billy Baraja (BielraX) for bringing this issue to our attention.
Closes isc-projects/bind9#5804
Backport of !985
Merge branch 'backport-5804-incr-query-counters-9.18' into 'security-bind-9.18'
Colin Vidal [Thu, 30 Apr 2026 18:02:47 +0000 (19:02 +0100)]
Fix `resend_loop` system test
Commit `c78016ff91ed33221831b4723108d69639430913` backported asyncserver
features to 9.18 branches, but the `resend_loop` test was still using
the previous API to install handlers (passing a list of handlers rather
than varargs). This is now fixed.
Ondřej Surý [Fri, 20 Mar 2026 07:43:28 +0000 (08:43 +0100)]
Add regression test for GSS-API context leak via TKEY CONTINUE
Send crafted SPNEGO NegTokenInit tokens that propose the krb5
mechanism without a mechToken. This causes gss_accept_sec_context()
to return GSS_S_CONTINUE_NEEDED, which on unfixed code leaks the
GSS context handle (~520 bytes per query).
The test verifies that the server rejects the negotiation (TKEY
error != 0, no continuation token) rather than returning a CONTINUE
response (error=0 with output token).