Michał Kępień [Fri, 29 Mar 2024 07:27:49 +0000 (08:27 +0100)]
Restore consistency in YAML anchor names
Commit 42b75e7759d30618ae038c96ca99630d79dd06b9 added "pipelines" to CI
job trigger lists without updating the names of the YAML anchors
containing those lists accordingly. Update YAML anchor names so that
they are consistent with their own contents.
Michał Kępień [Fri, 29 Mar 2024 07:27:49 +0000 (08:27 +0100)]
Do not check CHANGES in pre-release pipelines
Since pre-release testing is usually carried out for branches in which
CHANGES entries are intentionally malformed to prevent entry numbering
conflicts down the road, do not run the "changes" GitLab CI job in
pipelines that are triggered by a parent pipeline (which can currently
only be a pre-release testing pipeline) to prevent triggering job
failures that would be meaningless anyway.
Michał Kępień [Fri, 29 Mar 2024 07:27:49 +0000 (08:27 +0100)]
Extract CHANGES checks to a separate GitLab CI job
Checking the contents of the CHANGES file currently requires invoking
multiple shell scripts. These invocations are conflated with those for
other test scripts in the "misc" GitLab CI job. Extract the commands
checking the contents of the CHANGES file to a separate GitLab CI job,
"changes", to improve readability. Remove similar checks for the
CHANGES.SE file altogether as they are only relevant for BIND -S and
therefore should not be present in an open source branch.
Michał Kępień [Thu, 28 Mar 2024 17:56:35 +0000 (18:56 +0100)]
Fix check interaction in the "serve-stale" test
Commit f351c210344c4ce0b69a307ae8e0e22efd107097 modified the
"serve-stale" system test by adding the ns3/named9.conf.in configuration
file and making the ns3 named instance load that file near the end of
the test. However, ns3/named9.conf.in changes the
stale-answer-client-timeout setting to a very low value, which affects
all subsequent checks in tests.sh (rather than just the check that needs
the low value to be set) and may cause false positives. Fix by
reloading configuration from ns3/named8.conf.in as soon as the check
using a very low stale-answer-client-timeout value is finished.
Mark Andrews [Thu, 12 Oct 2023 04:25:57 +0000 (15:25 +1100)]
Check dns64 + server-stale short timeout
Check that named correctly returns a synthesized DNS64 answer when the
serve-stale timer triggers for the A lookup. Use a small value for
stale-answer-client-timeout (2ms) and delay the A response by 1 second.
Mark Andrews [Mon, 9 Oct 2023 23:54:16 +0000 (10:54 +1100)]
Checking nxdomain-redirect against built-in RFC-1918 zone
Check that RFC 1918 leak detection does not trigger an assertion
when nxdomain redirection is enabled in the server but not for the
RFC 1918 reverse namespace.
Tom Krizek [Fri, 12 Jan 2024 14:03:53 +0000 (15:03 +0100)]
Export variable in resolver system test
Variable assignment when calling subroutines might not be portable.
Notably, it doesn't work with FreeBSD shell, where the value of HOME
would be ignored in this case.
Since the commands are already executed in a subshell, export the HOME
variable to ensure it is properly handled in all shells.
Michał Kępień [Thu, 21 Mar 2024 11:29:21 +0000 (12:29 +0100)]
Add "pipelines" to CI job trigger lists
To enable GitLab CI jobs in other projects to trigger pipelines in the
BIND 9 project using their CI_JOB_TOKEN, add "pipelines" to the relevant
GitLab CI job trigger lists.
Michał Kępień [Thu, 21 Mar 2024 05:47:29 +0000 (06:47 +0100)]
Work around a TSAN issue with newer kernels
The ThreadSanitizer version currently available from Fedora 39
repositories is unable to cope with very high ASLR entropy, which is the
default in some recent Linux distributions [1]. This causes all
TSAN-enabled builds to fail on the affected systems with an error like:
Matthijs Mekking [Mon, 11 Mar 2024 10:52:03 +0000 (11:52 +0100)]
Test secure chain that includes inactive KSK
Add a regression test case for the scenario where a secure chain of
trust includes an inactive KSK, that is a KSK that is not signing the
DNSKEY RRset.
Michał Kępień [Thu, 7 Mar 2024 08:42:38 +0000 (09:42 +0100)]
Account for changes to struct dns_rbtnode
Commit eba7fb5f9f4925bbfd0d85847117847586b1ee9e modified the definition
of struct dns_rbtnode. Doing that changes the layout of map-format zone
files. Bump MAPAPI and update the offsets used in map-format zone file
checks in the "masterformat" system test, as these changes were
inadvertently omitted from the aforementioned change.
Michał Kępień [Thu, 7 Mar 2024 08:42:38 +0000 (09:42 +0100)]
Account for changes to struct dns_rbtnode
Commit 540a5b5a2c82170acc5c08d2c2ef74a700c7236f modified the definition
of struct dns_rbtnode. Doing that changes the layout of map-format zone
files. Bump MAPAPI and update the offsets used in map-format zone file
checks in the "masterformat" system test, as these changes were
inadvertently omitted from the aforementioned change.
Ondřej Surý [Wed, 6 Mar 2024 12:26:04 +0000 (13:26 +0100)]
Move the task creation into cache_create_db()
The dns_cache_flush() function drops the old database and creates a new
one, but it forgets to create the task(s) that run the node pruning and
clean the rbtdb when flushing it next time. This causes the cleaning to
skip the parent nodes (with .down == NULL), leading to increased memory
usage over time until the database is unable to keep up and stays
overmem all the time.
Ondřej Surý [Fri, 1 Mar 2024 11:43:15 +0000 (12:43 +0100)]
Create a second pruning task for rbtdb with unlimited quantum
Previously, rbtdb->task had a quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized). However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.
To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.
This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.
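The effect of the task quantum described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the Task class, backlog_after() helper, and arrival rates are hypothetical and do not reflect the actual isc_task API; UNLIMITED merely stands in for UINT_MAX (assumed to be 32 bits wide).

```python
from collections import deque

UNLIMITED = 2**32 - 1  # stands in for UINT_MAX (assumed 32-bit)


class Task:
    """Toy event-processing task with a per-run quantum (hypothetical)."""

    def __init__(self, quantum):
        self.quantum = quantum
        self.events = deque()

    def send(self, event):
        self.events.append(event)

    def run_once(self):
        """Process at most `quantum` queued events; return how many ran."""
        handled = 0
        while self.events and handled < self.quantum:
            self.events.popleft()
            handled += 1
        return handled


def backlog_after(quantum, rounds, arrivals_per_round):
    """Queue some events, then run the task once, for each round."""
    task = Task(quantum)
    for i in range(rounds):
        for j in range(arrivals_per_round):
            task.send((i, j))
        task.run_once()
    return len(task.events)


print(backlog_after(1, rounds=100, arrivals_per_round=3))          # prints 200
print(backlog_after(UNLIMITED, rounds=100, arrivals_per_round=3))  # prints 0
```

With a quantum of 1, the model falls behind whenever more than one event arrives per run, which mirrors the backlog described above; with a virtually unlimited quantum, every queued event is handled before the next batch arrives.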
Ondřej Surý [Mon, 4 Mar 2024 06:34:34 +0000 (07:34 +0100)]
Restore the parent cleaning logic in prune_tree()
Reconstruct the variant of the prune_tree() parent cleaning to consider
all eligible parents in a single loop as we were doing before all the
changes that led to this commit.
Update code comments so that they more precisely describe what the
relevant bits of code actually do.
Michał Kępień [Sat, 2 Mar 2024 05:36:37 +0000 (06:36 +0100)]
Check the prunelink member of the correct node
Commit 37101c7c8abbacaf07c30d5094bc6880cf4f7ca0 checks the prunelink
member of the node that was just pruned, not its parent node that was
intended to be examined. Fix by checking the prunelink member of the
parent node, so that adding the latter to its relevant prunenodes list
twice is properly guarded against.
Michał Kępień [Sat, 2 Mar 2024 05:36:37 +0000 (06:36 +0100)]
Check the prunelink member of the correct node
Commit 4b6fc97af6f936616a12e733b14ffc450af6df87 checks the prunelink
member of the node that was just pruned, not its parent node that was
intended to be examined. Fix by checking the prunelink member of the
parent node, so that adding the latter to its relevant prunenodes list
twice is properly guarded against.
Evan Hunt [Tue, 6 Feb 2024 21:33:21 +0000 (13:33 -0800)]
Move RRL broken-config check to checkconf
The RRL test included a test case that tried to start named with
a broken configuration. The same error could be found with
named-checkconf, so it should have been tested in the checkconf
system test.
Michał Kępień [Fri, 1 Mar 2024 17:12:37 +0000 (18:12 +0100)]
Do not re-add a node to the same prunenodes list
If a node cleaned up by prune_tree() happens to belong to the same node
bucket as its parent, the latter is directly appended to the prunenodes
list currently processed by prune_tree(). However, the relevant code
branch does not account for the fact that the parent might already be on
the list it is trying to append it to. Fix by only calling
ISC_LIST_APPEND() for parent nodes not yet added to their relevant
prunenodes list.
Michał Kępień [Thu, 29 Feb 2024 16:38:52 +0000 (17:38 +0100)]
Gracefully handle resending a node to prune_tree()
Commit 801e888d03e0ae34c5ecf00385defa77844f4023 made the prune_tree()
function use send_to_prune_tree() for triggering pruning of deleted leaf
nodes' parents. This enabled the following sequence of events to
happen:
1. Node A, which is a leaf node, is passed to send_to_prune_tree() and
its pruning is queued.
2. Node B is added to the RBTDB as a child of node A before the latter
gets pruned.
3. Node B, which is now a leaf node itself (and is likely to belong to
a different node bucket than node A), is passed to
send_to_prune_tree() and its pruning gets queued.
4. Node B gets pruned. Its parent, node A, now becomes a leaf again
and therefore the prune_tree() call that handled node B calls
send_to_prune_tree() for node A.
5. Since node A was already queued for pruning in step 1 (but not yet
pruned), the INSIST(!ISC_LINK_LINKED(node, prunelink)); assertion
fails for node A in send_to_prune_tree().
The above sequence of events is not a sign of pathological behavior.
Replace the assertion check with a conditional early return from
send_to_prune_tree().
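The sequence of events above, and the early return that replaces the assertion, can be replayed in a minimal model. This is a sketch, not the BIND 9 code: the Node class, the `queued` flag (which plays the role of ISC_LINK_LINKED(node, prunelink)), and the single `prune_queue` are hypothetical simplifications.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.queued = False  # stands in for ISC_LINK_LINKED(node, prunelink)


prune_queue = deque()


def send_to_prune_tree(node):
    """Queue a node for pruning; return False if it is already queued."""
    if node.queued:
        return False  # conditional early return instead of INSIST()
    node.queued = True
    prune_queue.append(node)
    return True


node_a = Node("A")
node_b = Node("B")

first = send_to_prune_tree(node_a)  # step 1: A is queued
send_to_prune_tree(node_b)          # step 3: B (child of A) is queued

# Step 4: B gets pruned before A is processed.
prune_queue.remove(node_b)
node_b.queued = False

again = send_to_prune_tree(node_a)  # steps 4-5: A is sent a second time
```

In this model the second send for node A is simply a no-op, so A remains on the queue exactly once.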
Michał Kępień [Thu, 29 Feb 2024 16:38:52 +0000 (17:38 +0100)]
Gracefully handle resending a node to prune_tree()
Commit 2df147cb1264b30c7f26c1d75310a010615687bc made the prune_tree()
function use send_to_prune_tree() for triggering pruning of deleted leaf
nodes' parents. This enabled the following sequence of events to
happen:
1. Node A, which is a leaf node, is passed to send_to_prune_tree() and
its pruning is queued.
2. Node B is added to the RBTDB as a child of node A before the latter
gets pruned.
3. Node B, which is now a leaf node itself (and is likely to belong to
a different node bucket than node A), is passed to
send_to_prune_tree() and its pruning gets queued.
4. Node B gets pruned. Its parent, node A, now becomes a leaf again
and therefore the prune_tree() call that handled node B calls
send_to_prune_tree() for node A.
5. Since node A was already queued for pruning in step 1 (but not yet
pruned), the INSIST(!ISC_LINK_LINKED(node, prunelink)); assertion
fails for node A in send_to_prune_tree().
The above sequence of events is not a sign of pathological behavior.
Replace the assertion check with a conditional early return from
send_to_prune_tree().
Ondřej Surý [Tue, 20 Feb 2024 07:50:58 +0000 (08:50 +0100)]
Make the TTL-based cleaning more aggressive
It was discovered that the TTL-based cleaning could build up
a significant backlog of rdataset headers during periods when
the top of the TTL heap isn't expired yet. Make the TTL-based cleaning
more aggressive by cleaning more headers from the heap when we are
adding a new header into the RBTDB.
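The general idea of cleaning several expired entries from a TTL heap whenever a new entry is added can be sketched as follows. This is an illustration only: the TTLHeap class, its method names, and the max_clean_per_add bound are assumptions made for the example and are not taken from the RBTDB code.

```python
import heapq


class TTLHeap:
    """Toy min-heap of (expiry time, header name) pairs (hypothetical)."""

    def __init__(self, max_clean_per_add=10):
        self._heap = []
        # Assumed upper bound on headers cleaned per insertion.
        self.max_clean_per_add = max_clean_per_add

    def add(self, expires_at, name, now):
        """Insert a header, then clean expired headers from the heap top."""
        heapq.heappush(self._heap, (expires_at, name))
        return self.clean(now)

    def clean(self, now):
        """Pop expired headers, up to the per-insertion bound."""
        removed = []
        while (self._heap
               and len(removed) < self.max_clean_per_add
               and self._heap[0][0] <= now):
            removed.append(heapq.heappop(self._heap)[1])
        return removed

    def __len__(self):
        return len(self._heap)


heap = TTLHeap(max_clean_per_add=2)
heap.add(5, "a", now=0)
heap.add(6, "b", now=0)
heap.add(7, "c", now=0)
cleaned = heap.add(100, "d", now=10)  # cleans "a" and "b", leaves "c"
```

Cleaning more than one header per insertion lets the heap drain faster than it fills once the entries at the top have expired.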
Ondřej Surý [Tue, 20 Feb 2024 07:50:58 +0000 (08:50 +0100)]
Remove expired rdataset headers from the heap
It was discovered that an expired header could sit on top of the heap
a little longer than desirable. Remove expired headers (headers with
rdh_ttl set to 0) from the heap completely, so they don't block the next
TTL-based cleaning.
Instead of issuing a separate isc_task_send() call for every RBTDB node
that triggers tree pruning, maintain a list of nodes from which tree
pruning can be started and only issue an isc_task_send() call if
pruning has not yet been triggered by another RBTDB node.
The extra queuing overhead eliminated by this change could be remotely
exploited to cause excessive memory use.
However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.
This commit makes the prunenodes list bucketed, adds a quantum of 10
items per prune_tree() run, and simplifies parent node cleaning in the
prune_tree() logic.
Instead of juggling node locks in a cycle, only clean up the node
currently being pruned and queue its parent (if it is also eligible) for
pruning in the same way (by sending an event).
This simplifies the code and also spreads the pruning load across more
task loop ticks, which is better for lock contention as fewer things run
in a tight loop.
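A bucketed prune list with a per-run quantum, where an eligible parent is queued rather than cleaned in place, might look roughly like the sketch below. All names here (Node, PruneQueues, NUM_BUCKETS) are hypothetical, the bucket assignment is passed in explicitly instead of being derived from a hash, and locking is omitted entirely; only the quantum of 10 comes from the text above.

```python
from collections import deque

NUM_BUCKETS = 4      # hypothetical number of node buckets
PRUNE_QUANTUM = 10   # nodes handled per prune_tree() run, per the text above


class Node:
    def __init__(self, name, bucket, parent=None):
        self.name = name
        self.bucket = bucket
        self.parent = parent
        self.children = 0
        self.queued = False
        if parent is not None:
            parent.children += 1


class PruneQueues:
    def __init__(self):
        self.buckets = [deque() for _ in range(NUM_BUCKETS)]
        self.pruned = []

    def send(self, node):
        """Queue a node on its own bucket's list, at most once."""
        if node.queued:
            return
        node.queued = True
        self.buckets[node.bucket].append(node)

    def prune_tree(self, bucket):
        """Handle up to PRUNE_QUANTUM nodes from one bucket."""
        queue = self.buckets[bucket]
        handled = 0
        while queue and handled < PRUNE_QUANTUM:
            node = queue.popleft()
            node.queued = False
            handled += 1
            if node.children:
                continue  # gained a child since it was queued; keep it
            self.pruned.append(node.name)
            parent = node.parent
            if parent is not None:
                parent.children -= 1
                # Queue the parent instead of cleaning it in this run.
                # The root (no parent of its own) is never pruned here.
                if parent.children == 0 and parent.parent is not None:
                    self.send(parent)
        return handled
```

For example, with a chain root -> a -> b, sending b and then running prune_tree() on b's bucket prunes b and queues a on its own bucket; a is only cleaned when its bucket is processed in a later run.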
Mark Andrews [Thu, 22 Feb 2024 23:12:47 +0000 (10:12 +1100)]
Do not use header_prev in expire_lru_headers
dns__cacherbt_expireheader can unlink / free header_prev underneath
it. Use ISC_LIST_TAIL after calling dns__cacherbt_expireheader
instead to get the next pointer to be processed.
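The pattern of re-reading the list tail after every expiry, rather than caching a "previous" pointer that the expiry routine may invalidate, can be sketched in a simplified form. This is not the BIND 9 implementation: the list-of-dicts LRU, the expire_header() helper, and its side effect of removing every header that belongs to the same node are assumptions chosen only to make a neighbouring entry disappear during expiry.

```python
def expire_header(lru, header):
    """Remove `header` and, as a hypothetical side effect, every other
    header of the same node. Returns the total size removed."""
    node = header["node"]
    removed = [h for h in lru if h["node"] == node]
    lru[:] = [h for h in lru if h["node"] != node]
    return sum(h["size"] for h in removed)


def expire_lru_headers(lru, purge_size):
    """Expire headers from the tail until `purge_size` has been purged."""
    purged = 0
    while lru and purged < purge_size:
        # Re-read the tail on every iteration (analogous to using
        # ISC_LIST_TAIL) instead of trusting a cached previous entry.
        header = lru[-1]
        purged += expire_header(lru, header)
    return purged


lru = [
    {"name": "h1", "node": "n1", "size": 10},
    {"name": "h2", "node": "n2", "size": 10},
    {"name": "h3", "node": "n2", "size": 10},
]
purged = expire_lru_headers(lru, 15)
```

Here expiring h3 also removes h2, its predecessor in the list; because the loop never holds on to a reference to h2, it cannot touch an entry that no longer exists.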
Michał Kępień [Wed, 14 Feb 2024 13:49:49 +0000 (14:49 +0100)]
Mention CVE-2023-50868 in CHANGES entry 6322
Since CVE-2023-50868 does not have a dedicated fix in BIND 9, mention
its CVE identifier in the CHANGES entry for CVE-2023-50387 (KeyTrap),
which accompanied the code change that addresses both of these
vulnerabilities.