There's currently an issue with the shotgun workflow that's being
investigated. Until it's resolved, there's no point in creating the
shotgun jobs as they'll just fail.
Merge branch 'nicki/ci-temporarily-disable-shotgun-jobs' into 'main'
There's currently an issue with the shotgun workflow that's being
investigated. Until it's resolved, there's no point in creating the
shotgun jobs as they'll just fail.
chg: dev: Add option to compile named with static linking and LTO
Statically linking lib{isc,dns,ns,cfg,isccc} and enabling LTO shows over 10% improvements on all almost measurements in perflab. That said, we can't use Meson's option for LTO since it would result in every binary being compiled with LTO and a great increase in compile time.
To work around it, we add a configuration option that enables LTO and static linking only for the `named` binary.
Add named-lto option to meson build to named with LTO
Enabling LTO yields substantial performance gains on both authoritative
and resolver benchmarks.
But since LTO defers many optimization passes to link time, enabling LTO
across the board would cause an increase in compilation time, as passes
that would be run only once would need to be run for each executable.
As a compromise, this commit adds a named-lto build option, that
compiles the individual object files with the -ffat-lto-object option
and then enables LTO only for the named executable. Object files are
reused between lib*.so and the named executable.
Enabling LTO in the subsequent commit requires the file names to be
unique and having same probes.d in each of the libraries breaks this
requirement. Rename probes.d to probes-{isc,dns,ns}.d files and adjust
the includes.
Colin Vidal [Wed, 24 Sep 2025 09:46:38 +0000 (11:46 +0200)]
chg: dev: refactor view creation/configuration loops in dedicated functions
Refactor a bit of `apply_configuration` by extracting (into respective dedicated function) the logic to build the keystores list, the KASP list as well as creating the view/zones and configuring those. This is the next step of MR !10895 and !10901
While the code is extracted, some global variables has been changed into a function parameters which enable to have a clear view of the dependency of the function, typically, to know if it depends on local configuration object or runtime "production" object. The end goal (not in this MR, but later on) is to move as much as possible initialization logic outside of the exclusive mode.
As a first step, latest commits move the keystores list, KASP list and view/zones creation outside of the exclusive mode. (The view/zone configuration remain in exclusive mode for now, because of a dependency to the runtime "cachelist". This is the target of a next MR.
For the record; while moving the keystores list, KASP list and view/zone creation doesn't have a significant impact on the time the exclusive mode is taken (from my experiment on a 1M small zones instance); moving `configure_views` did have a _massive_ impact (basically, the time spend in the exclusive mode is then non calculable). Configuring views outside the exclusive mode needs more work, which will be done in future MRs.
See #4673
Merge branch 'colin/refactor-applyconfig' into 'main'
Colin Vidal [Wed, 10 Sep 2025 13:17:11 +0000 (15:17 +0200)]
apply_configuration: log subroutines for tests
In order to have a (minimal) test ensuring we don't move back
`apply_configuration` subroutines which can be done before the exclusive
lock is taken, `APPLY_CONFIGURATION_SUBROUTINE_LOG` macro is added and
used for the few subroutines already extracted from the exclusive mode.
Those expected logs are added in `configloading` system test checks.
Colin Vidal [Tue, 9 Sep 2025 13:41:17 +0000 (15:41 +0200)]
creation of client TLS ctx before exclusive mode
When the server is configured (inside `apply_configuration`) a client
TLS context cache is created and attached to the global server object.
It is then used by `configure_view` flow (and also during runtime though
the zone manager).
It is now created before the exclusive mode, and the swap of the
previous TLS cache ctx is done at the end of the exclusive mode, if
everything went well.
This allows us (among other follow-up changes) to move the
`configure_views` function outside of the exclusive mode.
Colin Vidal [Mon, 8 Sep 2025 13:57:47 +0000 (15:57 +0200)]
move creation of keystores, kasp list and view outside of exclusive mode
The keystores initialization, the KASP list initialization as well as
the initialization of the view no longer depends of any data shared by
running "production" objects during re-configuration of the server. This
allows us to move those outside (before) the exclusive mode is taken.
Colin Vidal [Mon, 8 Sep 2025 12:58:47 +0000 (14:58 +0200)]
cfg_aclconfctx_t object is part of named_server
`named_g_actconfctx` is a global variable holding the ACL configuration
context alive (in particular, to dynamically load zones). However, this
object is build once per configuration (early) and is used only inside
server.c `apply_configuration` flow. (Two exceptions: the shutdown flow,
still in server.c and plugin check flow, which doesn't need it, so it's
NULL in such case).
Instead of leaving this global publicly exposed, it is now part of the
`named_server_t` object. This allows us to clearly see that, when
reconfigureing the server, the new instance of the ACL context is known
only by the newly built object and not currently used by "production"
object; and will help to move move logic before the exclusive mode is
taken.
The other advantage is that the ACL configuration context can now be
built before the exclusive lock as well.
Colin Vidal [Thu, 28 Aug 2025 15:29:23 +0000 (17:29 +0200)]
apply_configuration: add configure_kasplist
The kasplist (dnssec-policy defined in the builtin and global
configuration options) was built inside apply_configuration. This
commit extracts this logic into its separate function.
In order to make the view configuration independent of the global
`server` object, the newly built kasplist is now passed as parameter.
(This eventually will help to be able to configure the views outside of
the exclusive mode by limiting its dependency to the global
`server`/`named_g_server`).
Colin Vidal [Tue, 26 Aug 2025 11:19:10 +0000 (13:19 +0200)]
apply_configuration: remove builtin_viewlist
When creating/configuring the view, the user-defined views are built and
set into the viewlist, then builtin-view inside the builtin_viewlist.
But there is no seperate logic applied to those two lists, and they are
immediately merged into viewlist right after. This commit removes this
intermediate list and add builtin-views directly into the main viewlist
instead.
Colin Vidal [Tue, 26 Aug 2025 10:35:56 +0000 (12:35 +0200)]
refactor view creation/config in apply_configuration
In order to help splitting apply_configuration, the inline loops and bit
of logic around it for views creation and configuration, each of those
are now in a dedicatated function.
chg: dev: Use lock-free hashtable for storing resolver fetch contexts
Replace the locked hashmap with the lock-free hashtable from the RCU
library and protect the fetch contexts against reuse by replacing the
libisc reference counting with urcu_ref that can soft-fail in situation
where the reference count is already zero. This allows us to easily
skip re-using the fetch context if it is already in process of being
destroyed.
Merge branch 'ondrej/use-urcu-lfht-for-resolver-tables' into 'main'
Use lock-free hashtable for storing resolver fetch contexts
Previously, the fetch contexts were stored inside rwlocked hashmap
table. This was one of the most contended places for the resolver,
especially in the cold cache situation.
Replace the locked hashmap with the lock-free hashtable from the RCU
library and protect the fetch contexts against reuse by replacing the
libisc reference counting with urcu_ref that can soft-fail in situation
where the reference count is already zero. This allows us to easily
skip re-using the fetch context if it is already in process of being
destroyed.
chg: dev: Add a circular reference between slabtops for type and RRSIG(type)
Previously, the slabtops for "type" and its signature was only loosely
coupled and the headers could expire at different time (both TTL and LRU
based expiry). Add a .related member to the slabtop that allows us to
expire the headers in both related headers and also optimize the lookups
because now both slabtops are looked up at the same time.
Closes #3396
Merge branch '3396-bind-rrsigs-to-records' into 'main'
Previously, the slabtops for "type" and its signature was only loosely
coupled and the headers could expire at different time (both TTL and LRU
based expiry). This commit expires the headers in both related
headers.
Add a circular reference between slabtops for type and RRSIG(type)
Previously, the slabtops for "type" and its signature was only loosely
coupled. Add a .related member to the slabtop that allows us to
optimize the lookups because now both slabtops are looked up at the
same time.
There was a pattern where first the header was checked for NULL
and then for being stale. In both cases the code path is the same
so it makes sense to put them in a separate function.
chg: dev: Convert slabtop and slabheader to use the cds list
This is the first MR in series that aims to reduce the node locking
by replacing the single-linked list of slabtop(s) and slabheader(s)
with CDS linked list. This commit doesn't do anything else beyond
replacing .next and .down links with the cds_list_head. The RCU
semantics will be added later.
Merge branch 'ondrej/use-rcu-list-for-slabtop' into 'main'
This is the second commit in series that aims to reduce the node locking
by replacing the single-linked list of slabheader(s) with CDS linked list.
This commit doesn't do anything else beyond replacing .next link with
the cds_list_head. RCU semantics is going to be added in the subsequent
commits.
This is the first commit in series that aims to reduce the node locking
by replacing the single-linked list of slabtop(s) with CDS linked list.
This commit doesn't do anything else beyond replacing .next link with
the cds_list_head. RCU semantics is going to be added in the subsequent
commits.
fix: dev: Fix datarace between unlocking fctx lock and shuttingdown fctx
There was a data race where new fetch response could be added to the
fetch context after we unlock the fetch context and before we shut it
down. This could cause assertion failure when fctx__done() was called
with ISC_R_SUCCESS because there was originally no fetch response, but
new fetch response without associated dataset was added before we had a
chance to shutdown the fetch context. This manifested in the
validated() callback, where cache_rrset() now returns ISC_R_SUCCESS
instead of DNS_R_UNCHANGED when cache was not changed. However the data
race was wrong on a general level.
Add new argument to fctx__done() that allows to call it with fctx->lock
already acquired to prevent these data races.
Closes #5507
Merge branch '5507-dont-release-fctx-lock-on-done' into 'main'
Fix datarace between unlocking fctx lock and shuttingdown fctx
There was a data race where new fetch response could be added to the
fetch context after we unlock the fetch context and before we shut it
down. This could cause assertion failure when fctx__done() was called
with ISC_R_SUCCESS because there was originally no fetch response, but
new fetch response without associated dataset was added before we had a
chance to shutdown the fetch context. This manifested in the
validated() callback, where cache_rrset() now returns ISC_R_SUCCESS
instead of DNS_R_UNCHANGED when cache was not changed. However the data
race was wrong on a general level.
When the fctx__done() is called with ISC_R_SUCCESS as result is expects
the fctx->lock to be already acquired to prevent these data races.
Split the fctx_done() into success and failure variants
The split will allow us to call fctx__done() with fctx->lock acquired
when it is called with ISC_R_SUCESS to prevent data races when finishing
the fetch context.
chg: ci: Only run relevant CI jobs based on the changes
Trigger selected CI jobs on MR automatically only if there are related
code changes. Otherwise, offer an option to run the jobs manually in
MRs. For other sources, like schedules, tags etc., execute the jobs as
usual.
Merge branch 'nicki/ci-restrict-rules-changes' into 'main'
Trigger selected CI jobs on MR automatically only if there are related
code changes. Otherwise, offer an option to run the jobs manually in
MRs. For other sources, like schedules, tags etc., execute the jobs as
usual.
Colin Vidal [Wed, 17 Sep 2025 15:38:54 +0000 (17:38 +0200)]
fix: usr: preserve cache when reload fails and reload the server again
Fixes an issue where failing to reconfigure/reload the server would prevent to preserved the views caches on the subsequent server reconfiguration/reload.
Colin Vidal [Tue, 16 Sep 2025 15:14:33 +0000 (17:14 +0200)]
preserve cache when reload fails
If the server is reloaded, new views are created and preexisting cache
is attached to those _but_ something goes wrong later, the previous
views are restored but the previous cache list is destroyed. This makes
the subsequent reload to drop the existing cache. This fixes it by
avoiding a mutation of the old cache list.
Colin Vidal [Tue, 16 Sep 2025 15:14:46 +0000 (17:14 +0200)]
test that cache is preserved on reconfing failure
A named bug scrap the cache on a second reload after an initial reload
failure. Adds a test checking that the cache is preserved between server
reconfiguration/reloads even if it fails at some point (after attempting
to re-use the cache) and the server is re-loaded later.
The dns_qpcache already had all the namespace changes needed to put the
normal data and auxiliary NSEC data into a single tree. Remove the
extra nsec QP trie and use the single QP trie for all the cache data.
Merge branch 'ondrej/use-qp-namespace-in-cache' into 'main'
As we removed the ability to count nodes in the auxiliary trees (because
there are no auxiliary trees), we can also cleanup the API and
associated enum type (dns_dbtree_t).
The dns_qpcache already had all the namespace changes needed to put the
normal data and auxiliary NSEC data into a single tree. Remove the
extra nsec QP trie and use the single QP trie for all the cache data.
Remove the dbiterator_{last,prev} from the qpcache
The dbiterator_{last,prev} functions are not used in the cache, and the
implementation would get quite complicated when we squash the main and
nsec trees together. It's easier to just not implement these.
Petr Špaček [Fri, 11 Jul 2025 19:10:19 +0000 (21:10 +0200)]
Add ability to load root zone into AsyncServer
We would prefer if explicit $ORIGIN is used only for root zone and
nothing else, solely to avoid zone files named "..db". For all other
zones the file name should match zone name.
Some of the API calls in `dns_db` were obsolete, and have been removed. Others were more complicated than necessary, and have been refactored to simplify.
Evan Hunt [Thu, 14 Aug 2025 23:28:11 +0000 (16:28 -0700)]
remove dns_db_{un,}locknode
remove the dns_db_locknode() and _unlocknode() calls, so that callers no
longer have the ability to directly manipulate the internal locking of
cache and zone databases.
Evan Hunt [Mon, 11 Aug 2025 19:13:53 +0000 (12:13 -0700)]
make getoriginnode implementation optional
if the dns_db_getoriginnode() call is not implemented, we can
fall back to running dns_db_findnode() on the database origin.
we now only implement getoriginnode directly in databases where
it's clearly faster than the fallback implementation would be.
Evan Hunt [Thu, 7 Aug 2025 21:24:01 +0000 (14:24 -0700)]
minor cleanup in sdlz.c
dns_db_issecure() and dns_db_nodecount() return false and 0,
respectively, if they are not implemented, so there's no need to
have implementation functions that only return false and 0.
Evan Hunt [Mon, 4 Aug 2025 23:54:08 +0000 (16:54 -0700)]
merge dns_db_find/findext and dns_db_findnode/findnodeext
the dns_db_findext and _findnodeext calls are extended versions
of dns_db_find and _findnode, which take additional arguments for
client information in order to support ECS. previously, database
implementations could support either API call, with cross-compatibility
so that, for example, dns_db_findext() could call a find implementation
if findext was not implemented, and dns_db_find() could call findext
if find was not implemented.
this has now been simplified. the find and findnodeext implementations
now support client info. all database implementations will now provide
these calls. implementations which do not support ECS will simply
ignore the clientinfo and clientinfomethods parameters.
this only affects the underlying implementation; callers will still
use the same interface. dns_db_find() and dns_db_findnode() are now
macros which pass NULL to the clientinfo parameters, so that callers
don't have to do so explicitly. dns_db_findext() and dns_db_findnodeext()
are still available for callers that do wish to pass clientinfo pointers.
Evan Hunt [Mon, 4 Aug 2025 23:24:09 +0000 (16:24 -0700)]
remove obsolete dns_db_hashsize()
this function's purpose was to populate the "CacheBuckets" statistic,
but there are no databases left that implemented it, so the return
value was always 0. "CacheBuckets" has now been removed from the
statistics, and the dns_db_hashsize() API call has been removed.
Evan Hunt [Mon, 4 Aug 2025 23:06:23 +0000 (16:06 -0700)]
dns_rdatalist functions are not for general use
the rdataset method implementation functions in dns/rdatalist.c (i.e.,
dns_rdatalist_first, _next, etc) are not meant to be called directly;
they're called via dns_rdataset_first(), dns_rdataset_next(), etc.
in dnssec-ksr.c, a list-based rdataset was iterated using these
functions. this has been fixed, and the functions have been renamed
to use the `dns__` prefix as a signal that they aren't meant to be
used outside the rdataset implementation.
fix: dev: Fix detection of whether node is active in find_wildcard()
The current code would fail during the write transaction. The first
header would not match the search->serial and the node might be
incorrectly detected as inactive.
Merge branch 'ondrej/fix-find-wildnode-active-node-detection' into 'main'
Fix detection whether node is active in find_wildcard()
The current code would fail during the write transaction. The first
header would not match the search->serial and the node might be
incorrectly detected as inactive.
chg: dev: Make the database ownercase modifiable only via addrdataset()
Simplify the implementation around the database ownercase. Remove the
dns_rdataset_setownercase() implementation for the slabheaders and only
allow setting ownercase on rdatalists and rdatasets. The ownercase in
the database can now be set only with dns_db_addrdataset() by passing
rdataset with correctly set ownercase.
Merge branch 'ondrej/make-database-ownercase-mostly-constant' into 'main'
Make the database ownercase modifiable only via addrdataset()
Simplify the implementation around the database ownercase. Remove the
dns_rdataset_setownercase() implementation for the slabheaders and only
allow setting ownercase on rdatalists and rdatasets. The ownercase in
the database can now be set only with dns_db_addrdataset() by passing
rdataset with correctly set ownercase.
Colin Vidal [Thu, 11 Sep 2025 12:21:24 +0000 (14:21 +0200)]
fix: dev: do not inline dns_zone_gethooktable
Since !10959 `dns_zone_gethooktable()` is only called once per query,
and the suspicion (from perflab analysis) that this (simple, as just
returning a pointer) call was slowing things down (perhaps because of
code locality reasons?) doesn't matter anymore. So even if !10959
inlined it, it shouldn't matter anymore.
Merge branch 'colin/minimize-hooktable-lookup-followup' into 'main'
Colin Vidal [Thu, 11 Sep 2025 09:51:06 +0000 (11:51 +0200)]
do not inline dns_zone_gethooktable
Since !10959 `dns_zone_gethooktable()` is only called once per query,
and the suspicion (from perflab analysis) that this (simple, as just
returning a pointer) call was slowing things down (perhaps because of
code locality reasons?) doesn't matter anymore. So even if !10959
inlined it, it shouldn't matter anymore.
Petr Špaček [Thu, 11 Sep 2025 09:06:21 +0000 (11:06 +0200)]
Prevent Sphinx from messing up syntax with "smartquotes" feature
Sphinx's smartquotes feature was rewriting -- to en-dash, "" to proper
English quotes etc. This was messing up syntax at unpredictable places.
Disable this feature instead of attempting to escape all the places in
the manual.
Colin Vidal [Thu, 11 Sep 2025 06:37:03 +0000 (08:37 +0200)]
fix: dev: minimize zone hooktable lookups
Merging !10483 caused a performance regression because the zone hooktable had to be looked up every time a hook point was reached, even if no zone plugins were configured. We now look up the zone hooktable when a zone is attached to the query context, and keep a pointer to it until the qctx is destroyed.
Merge branch 'each-zoneplugin-zonehook-once' into 'main'
query_reset() is called during query initialization, but the only
time the NS_QUERY_SETUP hook runs is when it's called from
query_cleanup(). it makes more sense to move the hook point to
there and rename it to NS_QUERY_CLEANUP.
this change caused a crash in the unit tests due to the view being
unnecessarily detached before ns__client_reset_cb() was called.
this has also been fixed.
Colin Vidal [Wed, 10 Sep 2025 20:40:44 +0000 (22:40 +0200)]
don't call hooks when a query hasn't started
guard the call to the NS_QUERY_RESET hook so it's called only if
the view has been set. If the view is NULL, it means the client has
been reset _before_ the query even started, and no other hook could
have been called, so it doesn't make sense to call this one.
this also enables us to avoid a NULL-check on the qctx->view in the
CALL_HOOK macros.
add a 'zhooks' member to the query_ctx structure, so that we only
need to look up the hook table for the zone once when iniitalizing
a qctx, and not once for every hook point.
chg: nil: Reduce the code duplication around getting slabheaders from slabtop
There was a lot of duplicated code around getting the first header that
exists, is active, and matches the version header from the qpzonedb.
Move the duplicate code into a helper function and unify the same
approach for the qpcache too even though the code is much simpler there.
It should come handy when top->header is something more complicated than
a pointer to first slabheader.
Merge branch 'ondrej/refactor-getting-the-first-slabheader-from-slabtop' into 'main'
Reduce the code duplication around getting slabheaders from slabtop
There was a lot of duplicated code around getting the first header that
exists, is active, and matches the version header from the qpzonedb.
Move the duplicate code into a helper function and unify the same
approach for the qpcache too even though the code is much simpler there.
It should come handy when top->header is something more complicated than
a pointer to first slabheader.
Mark Andrews [Wed, 3 Sep 2025 23:57:10 +0000 (09:57 +1000)]
Fix missing RRSIGs for "glue" lookups with CD=1
The code to test whether to store the RRSIGs on DNS_R_UNCHANGED
with CD=1 was failing because the comparison methods of the two
rdatatset instances were not compatible. Move the testing into
dns_db_addrdataset(), and request it by setting the DNS_ADD_EQUALOK
option. If the option is set and the old and new rrsets compare
as equal, dns_db_addrdataset() returns ISC_R_SUCCESS instead of
DNS_R_UNCHANGED.
When the `dnssec-signzone` tests were moved to the `dnssectools`
system test, an unused copy of the `signer` directory was left in
the `dnssec` test. This has been removed.
Merge branch 'each-cleanup-dnssec-files' into 'main'
Evan Hunt [Wed, 20 Aug 2025 18:28:32 +0000 (11:28 -0700)]
remove 'signer' files from dnssec test
when the dnssec-signzone tests were moved to the dnssectools
system test, a unused copy of the 'signer' directory was left in
the dnssec test. This has been removed.