Evan Hunt [Sun, 17 Aug 2025 20:53:04 +0000 (13:53 -0700)]
simplify iterator_active()
the if statements calling iterator_active() checked the EXISTS
flag on the header and then iterator_active() checked it again.
simplify so only the caller checks it.
Ondřej Surý [Wed, 13 Aug 2025 07:30:45 +0000 (09:30 +0200)]
Move SIEVE-LRU to dns_slabtop_t structure
As the qpcache has only one active header at a time, we can move the
SIEVE-LRU members from dns_slabheader_t to dns_slabtop_t structure thus
saving a little bit of memory in each slabheader and using it only once
per type.
Ondřej Surý [Tue, 5 Aug 2025 16:05:52 +0000 (18:05 +0200)]
Split the top level slab header hierarchy and the headers
The code that combines the top-level hierarchy (per-typepair) and
individual slab headers (per-version) saves a little bit of memory, but
makes the code convoluted, hard to read and hard to modify. Change the
top level hierarchy to be of a different type, with individual slabheaders
"hanging" from the per-typepair dns_slabtop_t structure.
This change makes future enhancements (changing the top level data
structure for faster lookups; coupling type + sig(type) into a single
slabtop) much easier.
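A minimal sketch of the layout described above, assuming simplified and
hypothetical types (slabtop_t, slabheader_t, the 'visited' flag and both
helper functions are illustrative only, not the real dns_slabtop_t or
dns_slabheader_t definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the real BIND 9 definitions. */
typedef uint32_t typepair_t;

typedef struct slabheader slabheader_t;
struct slabheader {
	uint32_t      serial; /* version this header belongs to */
	slabheader_t *down;   /* next older version of the same typepair */
};

typedef struct slabtop slabtop_t;
struct slabtop {
	typepair_t    typepair; /* one slabtop per (type, covers) pair */
	slabtop_t    *next;     /* next typepair on the same node */
	slabheader_t *header;   /* newest version; older ones via ->down */
	bool          visited;  /* assumed SIEVE state, kept once per typepair */
};

/* Find the slabtop for a typepair on a node's list, or NULL. */
static slabtop_t *
slabtop_find(slabtop_t *head, typepair_t typepair) {
	for (slabtop_t *top = head; top != NULL; top = top->next) {
		if (top->typepair == typepair) {
			return top;
		}
	}
	return NULL;
}

/* Push a new version on top of the per-typepair header chain. */
static void
slabtop_push(slabtop_t *top, slabheader_t *header) {
	assert(top != NULL && header != NULL);
	header->down = top->header;
	top->header = header;
}
```

With this shape, per-typepair state lives in one place and the header
chain only carries per-version data.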
Ondřej Surý [Tue, 5 Aug 2025 16:05:52 +0000 (18:05 +0200)]
Pass 'mctx' instead of 'db' to dns_slabheader_new()
The slabheader doesn't directly attach or link to 'db' anymore. Pass
only the memory context needed to create the slab header to make the
lack of relationship more prominent.
Also don't call dns_slabheader_reset() from dns_slabheader_new(); it
adds no value.
Ondřej Surý [Wed, 13 Aug 2025 06:45:45 +0000 (08:45 +0200)]
Always return DNS_R_UNCHANGED when new slabheader was not added
Change the add() function in the dns_qpcache to properly return
DNS_R_UNCHANGED if the newheader was not actually consumed, and move
the dns_slabheader_destroy() call outside of the add() function.
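The ownership rule can be sketched roughly as follows. All names here
(result_t, slot_t, add_and_cleanup() and the trust comparison) are
hypothetical stand-ins, not the dns_qpcache code; the point is only that
the caller frees the new header when add() reports "unchanged":

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result codes and types, standing in for the real ones. */
typedef enum { R_SUCCESS, R_UNCHANGED } result_t;

typedef struct header {
	int trust;
} header_t;

typedef struct slot {
	header_t *current;
} slot_t;

/*
 * Store newheader unless the current header is more trusted.
 * On R_SUCCESS the slot owns newheader (and the old header is freed);
 * on R_UNCHANGED newheader was not consumed and the caller still owns it.
 */
static result_t
add(slot_t *slot, header_t *newheader) {
	if (slot->current != NULL && slot->current->trust > newheader->trust) {
		return R_UNCHANGED;
	}
	free(slot->current);
	slot->current = newheader;
	return R_SUCCESS;
}

/* Caller-side pattern: destroy the header only when add() did not take it. */
static result_t
add_and_cleanup(slot_t *slot, int trust) {
	header_t *newheader = malloc(sizeof(*newheader));
	assert(newheader != NULL);
	newheader->trust = trust;

	result_t result = add(slot, newheader);
	if (result == R_UNCHANGED) {
		free(newheader);
	}
	return result;
}
```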
Ondřej Surý [Fri, 15 Aug 2025 05:36:13 +0000 (07:36 +0200)]
chg: dev: Remove locking from rdataslab_getownercase()
Under normal circumstances, the case bitfield in the slabheader should
be set only once. By actually (soft-)enforcing this, the read locking
can be completely removed from rdataslab_getownercase(), as we can
check whether the case has already been set and treat everything as
immutable once the case has been set.
Merge branch 'ondrej/remove-locking-from-slabheader-ownercase' into 'main'
Ondřej Surý [Tue, 12 Aug 2025 10:10:24 +0000 (12:10 +0200)]
Remove locking from rdataslab_getownercase()
Under normal circumstances, the case bitfield in the slabheader should
be set only once. By actually (soft-)enforcing this, the read locking
can be completely removed from rdataslab_getownercase(), as we can
check whether the case has already been set and treat everything as
immutable once the case has been set.
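One way to picture the set-once idea is the sketch below. It assumes a
hypothetical header layout (caseheader_t, a 32-byte bitfield and a
single writer) and uses C11 atomics; the actual slabheader code may be
organised differently:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; the real slabheader differs. */
typedef struct caseheader {
	atomic_bool case_set;  /* published once the bitfield is final */
	uint8_t     upper[32]; /* one bit per owner-name byte: 1 = uppercase */
} caseheader_t;

static void
caseheader_init(caseheader_t *h) {
	assert(h != NULL);
	atomic_init(&h->case_set, false);
	memset(h->upper, 0, sizeof(h->upper));
}

/*
 * Record the case bitfield. Only the first call has any effect; this
 * sketch assumes a single writer (e.g. the thread creating the header).
 */
static bool
setownercase(caseheader_t *h, const uint8_t upper[32]) {
	if (atomic_load_explicit(&h->case_set, memory_order_acquire)) {
		return false; /* already set: treat the bitfield as immutable */
	}
	memcpy(h->upper, upper, sizeof(h->upper));
	atomic_store_explicit(&h->case_set, true, memory_order_release);
	return true;
}

/* Lock-free read: returns false if no case has been recorded yet. */
static bool
getownercase(caseheader_t *h, uint8_t upper[32]) {
	if (!atomic_load_explicit(&h->case_set, memory_order_acquire)) {
		return false;
	}
	memcpy(upper, h->upper, sizeof(h->upper));
	return true;
}
```

The release store paired with the acquire load is what lets readers
copy the bitfield without taking a lock once the flag is visible.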
Ondřej Surý [Tue, 12 Aug 2025 10:21:56 +0000 (12:21 +0200)]
Move the slabheader attribute helpers to private header
The slabheader.c, qpzone.c and qpcache.c had a couple of shared macros
that were copied and pasted between the units. Move these common
attribute access macros into a private header, so they can be shared
among the three compilation units.
Ondřej Surý [Tue, 12 Aug 2025 09:31:57 +0000 (11:31 +0200)]
Unify the NONEXISTENT() macro in qpzone to EXISTS()
In the dns_qpcache unit, we use the EXISTS() macro, but in dns_qpzone
there's a NONEXISTENT() macro for the same slabheader attribute. Unify
the macro to be EXISTS() in dns_qpzone as well.
Ondřej Surý [Mon, 11 Aug 2025 07:39:13 +0000 (09:39 +0200)]
Remove the negative type logic from qpcache
Previously, when a negative header was stored in the cache, it would be
stored in the dns_typepair_t as .type = 0, .covers = <negative type>.
When searching the cache internally, we would have to look for both the
positive and the negative typepair, and the slabheader .down list could
be a mix of positive and negative types.
Remove the extra representation of the negative type and simply use the
negative attribute on the slabheader. Other units (namely dns_ncache)
can still insert the (0, type) negative rdatasets into the cache, but
internally, those will be converted into (type, 0) slabheaders, and vice
versa - when binding the rdatasets, the negative (type, 0) slabheader
will be converted to (0, type) rdataset. A simple DNS_TYPEPAIR() helper
macro was added to simplify converting a single rdatatype to a typepair
value.
As a side-effect, the search logic in all places can exit early if
there's a negative header for the type we are looking for, e.g. when
searching for the zone cut, we don't have to walk through all the
slabheaders if there's a stored negative slabheader.
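The conversion between the two representations might look roughly like
this. The macros, the stored_t struct and both functions are
hypothetical illustrations with an assumed bit layout; they are not the
DNS_TYPEPAIR*() macros from the tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings; the real macros live in BIND 9's headers. */
typedef uint16_t rdatatype_t;
typedef uint32_t typepair_t;

#define RDATATYPE_NONE ((rdatatype_t)0)
#define TYPEPAIR_VALUE(base, covers) \
	((typepair_t)(((uint32_t)(covers) << 16) | ((uint32_t)(base) & 0xffff)))
#define TYPEPAIR_TYPE(pair)   ((rdatatype_t)((pair) & 0xffff))
#define TYPEPAIR_COVERS(pair) ((rdatatype_t)((pair) >> 16))

/* What the cache keeps internally in this sketch. */
typedef struct stored {
	typepair_t typepair;
	bool       negative;
} stored_t;

/* Rdataset -> cache: (0, type) becomes (type, 0) plus a negative flag. */
static stored_t
to_stored(typepair_t rdataset_pair) {
	stored_t s = { .typepair = rdataset_pair, .negative = false };
	if (TYPEPAIR_TYPE(rdataset_pair) == RDATATYPE_NONE) {
		s.typepair = TYPEPAIR_VALUE(TYPEPAIR_COVERS(rdataset_pair),
					    RDATATYPE_NONE);
		s.negative = true;
	}
	return s;
}

/* Cache -> rdataset: the inverse mapping, applied when binding. */
static typepair_t
to_rdataset(stored_t s) {
	if (s.negative) {
		return TYPEPAIR_VALUE(RDATATYPE_NONE, TYPEPAIR_TYPE(s.typepair));
	}
	return s.typepair;
}
```

Because positive and negative entries then share one typepair key, a
single lookup is enough to find either kind.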
Ondřej Surý [Mon, 11 Aug 2025 05:17:43 +0000 (07:17 +0200)]
Use dns_rdatatype_none more consistently
Use dns_rdatatype_none instead of plain '0' for dns_rdatatype_t and
dns_typepair_t manipulation. While plain '0' is technically ok, it
doesn't carry the required semantic meaning, and using the named
dns_rdatatype_none constant makes the code more readable.
Ondřej Surý [Thu, 7 Aug 2025 06:12:36 +0000 (08:12 +0200)]
Add strict checks on typepair values in the developer's mode
When in developer's mode, make the DNS_TYPEPAIR_* macros stricter
about the contents of 'base' and 'covers', so we can catch
invalid use of the API.
Ondřej Surý [Thu, 7 Aug 2025 06:08:24 +0000 (08:08 +0200)]
Disallow TYPE0 to be queried or inserted into the database
The RR type 0 is a reserved type for SIG[1] resource record. It should
never be inserted into the database nor queried. Add special
handling to bail out quickly with DNS_R_DISALLOWED when inserting and
ISC_R_NOTFOUND when looking up TYPE0. This is also a prerequisite for
stricter checks in the follow-up commit.
Ondřej Surý [Mon, 11 Aug 2025 14:22:03 +0000 (16:22 +0200)]
Fix typo in nsupdate where covers would be equal to type
There was an apparent typo where rdatalist->covers would be assigned the
same value as rdatalist->type. As nsupdate can't update signatures, the
covers must be dns_rdatatype_none.
Ondřej Surý [Wed, 6 Aug 2025 17:34:35 +0000 (19:34 +0200)]
Unify the dns_typepair_t variable naming and usage
The dns_typepair_t and dns_rdatatype_t variables were both named 'type'
in multiple places. Rename all dns_typepair_t variables to include the
word 'pair' in the variable name, so that the distinction between the
two types is clearer.
Ondřej Surý [Fri, 15 Aug 2025 05:06:11 +0000 (07:06 +0200)]
fix: dev: Simplify the DNS_R_UNCHANGED handling in dns_resolver unit
Instead of catching the DNS_R_UNCHANGED from dns_db_addrdataset() (via
cache_rrset() and dns_ncache_add()) individually, mask it properly as
soon as possible by moving the sigrdataset caching logic inside the
cache_rrset() and returning ISC_R_SUCCESS from cache_rrset() and
dns_ncache_add() when the database was unchanged.
Closes #5473
Merge branch '5473-fix-crash-in-validated' into 'main'
Ondřej Surý [Thu, 14 Aug 2025 06:35:05 +0000 (08:35 +0200)]
Simplify the DNS_R_UNCHANGED handling in dns_resolver unit
Instead of catching the DNS_R_UNCHANGED from dns_db_addrdataset() (via
cache_rrset() and dns_ncache_add()) individually, mask it properly as
soon as possible, by moving the sigrdataset caching logic inside
cache_rrset() and returning ISC_R_SUCCESS from cache_rrset() and
dns_ncache_add() when the database was unchanged.
Ondřej Surý [Fri, 15 Aug 2025 04:25:23 +0000 (06:25 +0200)]
fix: dev: result could be set incorrectly in validated()
During a recent refactoring of `validated()`, a line was
removed, causing `result` to be left unchanged. This
caused time to be wasted continuing to try to validate when a
non-recoverable error had occurred, and also caused the wrong
reason to be logged in `add_bad()`.
Ondřej Surý [Thu, 14 Aug 2025 06:41:05 +0000 (08:41 +0200)]
Always delete the cached results on broken chain
The logic to delete records from the cache was relying on the contents
of the validation answer. Change the logic to always delete the
contents of the cache on the broken chain result.
Evan Hunt [Thu, 14 Aug 2025 06:11:29 +0000 (23:11 -0700)]
result could be set incorrectly in validated()
during a recent refactoring of validated(), a line was
removed, causing 'result' to be left unchanged. this
wasted time continuing to try to validate when a
non-recoverable error had occurred, and caused the wrong
reason to be logged in add_bad().
Mark Andrews [Thu, 14 Aug 2025 22:07:33 +0000 (08:07 +1000)]
fix: dev: Use DNS_RDATACOMMON_INIT to hide branch differences
Initialization of the common members of rdata type structures varies
across branches. Standardize it by using the `DNS_RDATACOMMON_INIT`
macro for all types, so that new types are more likely to use it,
and hence backport more cleanly.
Closes #5467
Merge branch '5467-use-dns_rdatacommon_init-to-hide-branch-differences' into 'main'
Mark Andrews [Wed, 6 Aug 2025 05:28:39 +0000 (15:28 +1000)]
Use DNS_RDATACOMMON_INIT to hide branch differences
Initialization of the common members of rdata type structures varies
across branches. Standardize it by using the DNS_RDATACOMMON_INIT
macro for all types, so that new types are more likely to use it,
and hence backport more cleanly.
Nicki Křížek [Thu, 14 Aug 2025 18:57:03 +0000 (20:57 +0200)]
fix: ci: Update DNS Shotgun parameters for an updated dataset
We've switched to an updated dataset for shotgun jobs. The change in
underlying traffic caused the more sensitive doh-get (and partially dot)
jobs to overload the resolver, making the jobs unstable and unreliable,
due to an increased number of timeouts.
Readjust the load parameters slightly to avoid exceeding ~2 % of
timeouts in the cold cache scenario to stabilize the job results.
Merge branch 'nicki/ci-shotgun-load-new-dataset' into 'main'
Nicki Křížek [Mon, 11 Aug 2025 13:04:50 +0000 (15:04 +0200)]
Update DNS Shotgun parameters for an updated dataset
We've switched to an updated dataset for shotgun jobs. The change in
underlying traffic caused the more sensitive doh-get (and partially dot)
jobs to overload the resolver, making the jobs unstable and unreliable,
due to an increased number of timeouts.
Readjust the load parameters slightly to avoid exceeding ~2 % of
timeouts in the cold cache scenario to stabilize the job results.
Alessio Podda [Thu, 14 Aug 2025 10:10:21 +0000 (10:10 +0000)]
chg: dev: Split dbmethods into node and db vtable
All databases in the codebase follow the same structure: a database is
an associative container from DNS names to nodes, and each node is an
associative container from RR types to RR data.
Each database implementation (qpzone, qpcache, sdlz, builtin, dyndb) has
its own corresponding node type (qpznode, qpcnode, etc). However, some
code needs to work with nodes generically regardless of their specific
type - for example, to acquire locks, manage references, or
register/unregister slabs from the heap.
Before this MR, these generic node operations were implemented as methods in
a `dns_dbmethods_t` vtable. This created a coupling between the database
and node lifetimes. If a node were to outlive its parent database, the node
destructor would destroy all RR data, and each RR data destructor would
try to unregister from heaps by calling a virtual function from the
database vtable. Since the database was already freed, this would cause a
crash.
This MR breaks the coupling by standardizing the layout of all
database nodes, adding a `dns_dbnode_methods_t` vtable for node
operations, and moving node-specific methods from the database vtable to
the node vtable.
Alessio Podda [Thu, 5 Jun 2025 09:51:29 +0000 (11:51 +0200)]
Decouple database and node lifetimes by adding node-specific vtables
All databases in the codebase follow the same structure: a database is
an associative container from DNS names to nodes, and each node is an
associative container from RR types to RR data.
Each database implementation (qpzone, qpcache, sdlz, builtin, dyndb) has
its own corresponding node type (qpznode, qpcnode, etc). However, some
code needs to work with nodes generically regardless of their specific
type - for example, to acquire locks, manage references, or
register/unregister slabs from the heap.
Currently, these generic node operations are implemented as methods in
the database vtable, which creates problematic coupling between database
and node lifetimes. If a node outlives its parent database, the node
destructor will destroy all RR data, and each RR data destructor will
try to unregister from heaps by calling a virtual function from the
database vtable. Since the database was already freed, this causes a
crash.
This commit breaks the coupling by standardizing the layout of all
database nodes, adding a dedicated vtable for node operations, and
moving node-specific methods from the database vtable to the node
vtable.
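A rough sketch of the idea, assuming simplified and hypothetical types
(dbnode_t, dbnode_methods_t, cachenode_t and the 'destroyed' test hook
are illustrative, and the reference counting here is not thread-safe):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; the real structures differ. */
typedef struct dbnode dbnode_t;

typedef struct dbnode_methods {
	void (*destroy)(dbnode_t *node);
} dbnode_methods_t;

/* Common layout shared by every node implementation. */
struct dbnode {
	const dbnode_methods_t *methods;
	unsigned int            references;
};

/* One concrete node type embedding the common layout as its first member. */
typedef struct cachenode {
	dbnode_t node;
	int     *destroyed; /* test hook: set to 1 when the node is freed */
} cachenode_t;

static void
cachenode_destroy(dbnode_t *node) {
	cachenode_t *cnode = (cachenode_t *)node;
	*cnode->destroyed = 1;
	free(cnode);
}

static const dbnode_methods_t cachenode_methods = {
	.destroy = cachenode_destroy,
};

static dbnode_t *
cachenode_new(int *destroyed) {
	cachenode_t *cnode = calloc(1, sizeof(*cnode));
	assert(cnode != NULL);
	cnode->node.methods = &cachenode_methods;
	cnode->node.references = 1;
	cnode->destroyed = destroyed;
	return &cnode->node;
}

/* Generic detach: dispatches through the node, never through a database. */
static void
dbnode_detach(dbnode_t **nodep) {
	dbnode_t *node = *nodep;
	*nodep = NULL;
	if (--node->references == 0) {
		node->methods->destroy(node);
	}
}
```

Since the node carries its own method table, generic code such as
dbnode_detach() needs no pointer to the owning database at all.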
Refactor sdlz to use name instead of pointer to name
Right now dns_sdlzlookup has a slight difference from other dbnode
implementations in that it stores a pointer to a dns name instead of
the dns name itself.
This commit harmonizes dns_sdlzlookup with other dbnode
implementations, facilitating further refactoring.
Each run of `meson test` overwrites the default log file testlog.txt,
so the backtraces of the previous run are lost. This commit assigns
a different log file to each run.
Nicki Křížek [Wed, 6 Aug 2025 10:38:32 +0000 (12:38 +0200)]
fix: ci: Allow unit tests to fail on AlmaLinux 8
The doh unit test has been timing out recently and we don't have a fix
for it yet. Mark it as a warning rather than a hard failure, since it's a
known issue.
Related #5448
Merge branch '5448-allow-failure-unit-almalinux8-doh' into 'main'
Nicki Křížek [Tue, 5 Aug 2025 15:28:52 +0000 (17:28 +0200)]
Allow unit tests to fail on AlmaLinux 8
The doh unit test has been timing out recently and we don't have a fix
for it yet. Mark it as a warning rather than a hard failure, since it's a
known issue.
Nicki Křížek [Mon, 4 Aug 2025 14:30:41 +0000 (16:30 +0200)]
Use full path for shared test code imports in rollover tests
Previously, symlinks and relative directory imports were used in test
modules. This caused a name clash when a shared code module "common.py"
was introduced for a different test. To avoid the issue, use full paths
in imports.
cut down the number of identical lines in the filter-aaaa test:
- replace identical test cases with small check functions
(check_aaaa_only, check_any, check_nodata, etc).
- group those together into large check functions (check_filter,
check_filter_other_family) that have options for recursive and
break_dnssec, then run those for each combination of options
on servers configured with filter-aaaa-on-v4 and filter-aaaa-on-v6.
Ondřej Surý [Tue, 5 Aug 2025 11:27:59 +0000 (13:27 +0200)]
fix: dev: Refactor resolver cache_name() and validated() functions
These functions were excessive in length and complexity, with McCabe complexity values of 110 and 105 respectively, and also included some dead code. They have been cleaned up and split into smaller functions, with a maximum complexity of 27. A few minor coding errors were discovered and fixed along the way.
Merge branch 'each-refactor-cache-name' into 'main'
Evan Hunt [Wed, 26 Feb 2025 21:59:19 +0000 (13:59 -0800)]
refactor validated()
- there was special-case code in validated() to handle the results
of a validator started by a CD=1 query. since that never happens,
the code has been removed.
- the section of code that handles opportunistic caching of
validated SOA, NS and NSEC data has been split out to a separate
function.
- the number of goto statements has been reduced considerably.
Evan Hunt [Sun, 2 Mar 2025 09:24:08 +0000 (01:24 -0800)]
split out helper functions
- fctx_setresult() sets the event result in a fetch response
according to the rdataset being returned - DNS_R_NCACHENXDOMAIN or
DNS_R_NXRRSET for negative responses, ISC_R_SUCCESS, DNS_R_CNAME,
or DNS_R_DNAME for positive ones.
- cache_rrset() looks up a node and adds an rdataset.
- delete_rrset() looks up a node and removes rdatasets of a specified
type and, optionally, the associated signatures.
- gettrust() returns the trust level of an rdataset, or dns_trust_none
if the rdataset is NULL or not associated.
- getrrsig() scans the rdatasets associated with a name for the
RRSIG covering a given type.
Evan Hunt [Sun, 2 Mar 2025 06:15:11 +0000 (22:15 -0800)]
further subdivide caching functions
rctx_cacherdataset() has been split into two functions:
- rctx_cache_secure() starts validation for rdatasets
that need it; they are then cached by the validator
completion callback validated()
- rctx_cache_insecure() caches rdatasets immediately; it
is called when validation is disabled or the data
to be cached is glue.
Evan Hunt [Sun, 2 Mar 2025 04:04:18 +0000 (20:04 -0800)]
rename and refactor cache_name() and related functions
- renamed cache_message() to rctx_cachemessage()
- renamed cache_name() to rctx_cachename()
- merged ncache_message() into rctx_ncache()
- split out a new function, rctx_cacherdataset(), which is
called by rctx_cachename() in a loop to process each of
the rdatasets associated with the name.
Evan Hunt [Sun, 2 Mar 2025 05:38:34 +0000 (21:38 -0800)]
reduce code duplication around findnoqname()
every call to findnoqname() was followed by a call to
dns_rdataset_addnoqname(). we can move that call into
findnoqname() itself, and simplify the calling functions
a bit.
Evan Hunt [Sat, 1 Mar 2025 23:40:07 +0000 (15:40 -0800)]
set ANSWERSIG flag when processing ANY responses
previously, rctx_answer_any() set the ANSWER flag for all
rdatasets in the answer section; it now sets ANSWERSIG for
RRSIG/SIG rdatasets and ANSWER for everything else. this
error didn't cause any harm in the current code, but it
could have led to unexpected behavior in the future.
Evan Hunt [Thu, 27 Feb 2025 22:28:37 +0000 (14:28 -0800)]
split out some functionality in cache_name()
there are now separate functions to check the cacheability of
an rdataset or to normalize TTLs, and the code to determine
whether validation is necessary has been simplified.
Evan Hunt [Fri, 28 Feb 2025 01:10:21 +0000 (17:10 -0800)]
add functions to match rdataset types
- dns_rdataset_issigtype() returns true if the rdataset is
of type RRSIG and covers a specified type
- dns_rdataset_matchestype() returns true if the rdataset
is of the specified type *or* the RRSIG covering it.
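The two predicates can be illustrated with a small sketch. The
rdataset_t struct and function names below are hypothetical stand-ins
for dns_rdataset_t and the new functions; only the RR type numbers
(A = 1, AAAA = 28, RRSIG = 46) are real:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for dns_rdatatype_t and dns_rdataset_t. */
typedef uint16_t rdatatype_t;
enum { TYPE_A = 1, TYPE_AAAA = 28, TYPE_RRSIG = 46 };

typedef struct rdataset {
	rdatatype_t type;
	rdatatype_t covers; /* only meaningful for signature rdatasets */
} rdataset_t;

/* True if the rdataset is an RRSIG covering 'type'. */
static bool
rdataset_issigtype(const rdataset_t *rds, rdatatype_t type) {
	assert(rds != NULL);
	return rds->type == TYPE_RRSIG && rds->covers == type;
}

/* True if the rdataset is of 'type' or is the RRSIG covering it. */
static bool
rdataset_matchestype(const rdataset_t *rds, rdatatype_t type) {
	assert(rds != NULL);
	return rds->type == type || rdataset_issigtype(rds, type);
}
```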
Evan Hunt [Thu, 27 Feb 2025 20:43:52 +0000 (12:43 -0800)]
reduce steps for negative caching
whenever ncache_adderesult() was called, some preparatory code
was run first; this has now been moved into a single function
negcache() to reduce code duplication.
Evan Hunt [Thu, 27 Feb 2025 06:06:40 +0000 (22:06 -0800)]
change issecuredomain() functions to bool
dns_keytable_issecuredomain() and dns_view_issecuredomain()
previously returned a result code to inform the caller of
unexpected database failures when looking up names in the
keytable and/or NTA table. such failures are not actually
possible. both functions now return a simple bool.
also, dns_view_issecuredomain() now returns false if
view->enablevalidation is false, so the caller no longer
has to check for that.
Ondřej Surý [Tue, 5 Aug 2025 09:24:35 +0000 (11:24 +0200)]
fix: test: Add support for small stack size for threads
When running the isc_quota unit test with less RAM than
usual (e.g. in a CI for architectures with 32 bits of
address space), the pthread_create() function fails with
the "Resource temporarily unavailable (11)" error code.
Add functions to get and set the thread stack size (if requested),
and use these to set the thread stack size to a smaller value in the
isc_quota unit test.
Merge branch 'aram/isc-thread-stack-size-small' into 'main'
Ondřej Surý [Tue, 5 Aug 2025 05:34:15 +0000 (07:34 +0200)]
Document the current default stack sizes on different systems
The default stack size varies between operating systems and between
different system libc libraries, from 128 kB (Alpine Linux with musl) to
8 MB (Linux with glibc). Document the different values used to justify
the value of THREAD_MINSTACKSIZE (currently set to 1 MB).
Ondřej Surý [Mon, 4 Aug 2025 15:03:42 +0000 (17:03 +0200)]
Add support for setting thread stack size
When running the isc_quota unit test with less RAM than usual
(e.g. in a CI for architectures with 32 bits of address space),
the pthread_create() function fails with the "Resource temporarily
unavailable (11)" error code.
Add functions to get and set the thread stack size (if requested),
and use these to set the thread stack size to a smaller value in the
isc_quota unit test.
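For illustration, requesting a smaller stack with plain POSIX threads
can be sketched as below. thread_create_with_stack() and set_flag() are
hypothetical helpers, not the isc_thread functions added here; only the
pthread_attr_*() and pthread_create() calls are standard POSIX:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper; not the isc_thread API added by this commit. */
static int
thread_create_with_stack(pthread_t *thread, size_t stacksize,
			 void *(*func)(void *), void *arg) {
	pthread_attr_t attr;
	int ret;

	assert(thread != NULL && func != NULL);

	ret = pthread_attr_init(&attr);
	if (ret != 0) {
		return ret;
	}
	if (stacksize != 0) {
		/* 0 means "keep the platform default". */
		ret = pthread_attr_setstacksize(&attr, stacksize);
	}
	if (ret == 0) {
		ret = pthread_create(thread, &attr, func, arg);
	}
	(void)pthread_attr_destroy(&attr);
	return ret;
}

/* Trivial thread body used to show that the thread actually ran. */
static void *
set_flag(void *arg) {
	*(int *)arg = 1;
	return NULL;
}
```

The requested size must be at least PTHREAD_STACK_MIN, otherwise
pthread_attr_setstacksize() returns EINVAL.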
fix: usr: Add RPZ extended DNS error for zones with a CNAME override policy configured
When the zone is configured with a CNAME override policy, or the response policy zone contains a wildcard CNAME, the extended DNS error code was not added. This has been fixed.
Closes #5342
Merge branch '5342-rpz-cname-override-ede-not-added' into 'main'
Ondřej Surý [Mon, 4 Aug 2025 10:11:01 +0000 (12:11 +0200)]
chg: dev: Add and use global memory context called isc_g_mctx
Instead of having individual memory contexts scattered across different
files and called different names, add a single memory context called
isc_g_mctx that replaces named_g_mctx and various other global memory
contexts in various utilities and tests.
Merge branch 'ondrej/add-global-isc_g_mctx-instance' into 'main'
There is a data race when QP is reclaiming chunks on the call_rcu
threads and it tries to log the number of reclaimed chunks while the
server is shutting down. Work around this by adding rcu_barrier() before
shutting down the global log context.
This required a couple of internal changes to isc_mem_debugging.
The isc_mem_debugging is now internal to the isc_mem unit and there are
three new functions:
1. isc_mem_setdebugging() can change the debugging setting for an
individual memory context. This is needed for the memory contexts used
for OpenSSL, libxml and libuv accounting as recording and tracing
memory is broken there.
2. isc_mem_debugon() / isc_mem_debugoff() can be used to change default
memory debugging flags as well as debugging flags for isc_g_mctx.
Additionally, memory debugging is inconsistent across the code base.
For now, we are keeping the existing flags, but three new environment
variables have been added, 'ISC_MEM_DEBUGRECORD', 'ISC_MEM_DEBUGTRACE'
and 'ISC_MEM_DEBUGUSAGE', to set the global debugging flags in any
program using the memory contexts.
Add and use global memory context called isc_g_mctx
Instead of having individual memory contexts scattered across different
files and called different names, add a single memory context called
isc_g_mctx that replaces named_g_mctx and various other global memory
contexts in various utilities and tests.
Mark Andrews [Wed, 17 Nov 2021 02:09:03 +0000 (13:09 +1100)]
validator.c:check_signer now clones val->sigrdataset
Spurious validation failures were traced back to check_signer looping
over val->sigrdataset directly. Cloning val->sigrdataset prevents
check_signer from interacting with callers that are also looping
over val->sigrdataset.
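The kind of interference described above can be shown with a toy
rdataset that carries its own cursor. Everything here (rdataset_t and
its functions, contains()) is a hypothetical model, not the
dns_rdataset API; it only demonstrates why an inner loop should iterate
over a clone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rdataset with an embedded iteration cursor. */
typedef struct rdataset {
	const int *items;
	size_t     count;
	size_t     pos;
} rdataset_t;

static bool
rdataset_first(rdataset_t *rds) {
	rds->pos = 0;
	return rds->count > 0;
}

static bool
rdataset_next(rdataset_t *rds) {
	if (rds->pos + 1 >= rds->count) {
		return false;
	}
	rds->pos++;
	return true;
}

static int
rdataset_current(const rdataset_t *rds) {
	assert(rds->pos < rds->count);
	return rds->items[rds->pos];
}

/* A clone shares the data but has an independent cursor. */
static rdataset_t
rdataset_clone(const rdataset_t *rds) {
	return *rds;
}

/* Inner scan over a clone, so the caller's cursor is left untouched. */
static bool
contains(const rdataset_t *rds, int wanted) {
	rdataset_t clone = rdataset_clone(rds);
	for (bool ok = rdataset_first(&clone); ok; ok = rdataset_next(&clone)) {
		if (rdataset_current(&clone) == wanted) {
			return true;
		}
	}
	return false;
}
```

If contains() iterated over the caller's rdataset directly, it would
reset and advance the shared cursor and the outer loop would skip or
repeat entries.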
Most of the shell-based tests in the `dnssec` system test have been converted to python. The only exceptions are the test cases that exercised the `dnssec-*` command line tools, and did not interact with a name server; those have been relocated into a new `dnssectools` system test.
Merge branch 'each-convert-dnssec-test' into 'main'
If nsX.reconfigure() is used in a way that might affect other tests
within the same module, it's best to split up the tests which need the
reconfig to a separate module. This ensures the reconfigure() won't
interfere with test results in case the tests are executed separately,
or in a different order.
many of the zones in the dnssec system test were identical or
had only trivial differences, and it would be easier to keep track
of them if they were sourced from template files.
also, the extra_artifacts have been simplified and restored to
the test files.
the shell tests that queried servers to check correct signing
behavior (using dnssec-signzone, dnssec-policy and nsupdate),
as well as "rndc signing", private-type records, rndc zonestatus,
offline keys, etc, have been moved to tests_signing.py.
the minimal update test in the dnssec_update_test.pl script
was also moved here and the perl script has been removed.