Petr Špaček [Wed, 11 May 2022 13:45:57 +0000 (15:45 +0200)]
Add link anchors to statements and blocks in the ARM
All statements now use .. namedconf:statement:: or
.. rndcconf:statement:: syntax provided by our Sphinx extension.
This has several consequences:
- It changes how statement headings are rendered
- Statements are indexed and show up as separate items in doc
search results (in the HTML version)
- Statements can be linked to using either :any:`statement` or
:namedconf:ref:`statement` syntax (not used in this commit)
- Statements can be categorized and printed using ..
namedconf:statementlist:: syntax (not used in this commit)
Michał Kępień [Wed, 22 Jun 2022 13:09:43 +0000 (15:09 +0200)]
Add a note to the ARM on dnstap & resolver traffic
Warn users that server-side IP addresses are not stored in dnstap
captures of resolver traffic unless "query-source(-v6)" is explicitly
set, explaining why it is so.
Michał Kępień [Wed, 22 Jun 2022 11:45:46 +0000 (13:45 +0200)]
Fix destination port extraction for client queries
The current logic for determining the address of the socket to which a
client sent its query is:
1. Get the address:port tuple from the netmgr handle using
isc_nmhandle_localaddr().
2. Convert the address:port tuple from step 1 into an isc_netaddr_t
using isc_netaddr_fromsockaddr().
3. Convert the address from step 2 back into a socket address with the
port set to 0 using isc_sockaddr_fromnetaddr().
Note that the port number (readily available in the netmgr handle) is
needlessly lost in the process, preventing it from being recorded in
dnstap captures of client traffic produced by named.
Fix by first storing the address:port tuple returned by
isc_nmhandle_localaddr() in client->destsockaddr and then creating an
isc_netaddr_t from that structure. This allows the port number to be
retained in client->destsockaddr, which is what subsequently gets passed
to dns_dt_send().
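
The port loss described above can be modeled in a few lines. This is a
Python sketch, not BIND code: SockAddr, to_netaddr() and from_netaddr()
are hypothetical stand-ins for the socket address structure,
isc_netaddr_fromsockaddr() and isc_sockaddr_fromnetaddr().

```python
from typing import NamedTuple, Tuple


class SockAddr(NamedTuple):
    """Hypothetical model of an address:port tuple."""
    address: str
    port: int


def to_netaddr(sockaddr: SockAddr) -> str:
    # Stand-in for isc_netaddr_fromsockaddr(): only the address survives.
    return sockaddr.address


def from_netaddr(netaddr: str, port: int) -> SockAddr:
    # Stand-in for isc_sockaddr_fromnetaddr().
    return SockAddr(netaddr, port)


def destination_old(local: SockAddr) -> SockAddr:
    # Old logic: round-trip through the address-only form with port 0.
    return from_netaddr(to_netaddr(local), 0)


def destination_new(local: SockAddr) -> Tuple[SockAddr, str]:
    # New logic: keep the original tuple, derive the netaddr from it.
    destsockaddr = local
    return destsockaddr, to_netaddr(destsockaddr)
```

With a local address of 192.0.2.1 port 53, destination_old() yields
port 0, while destination_new() keeps port 53 in the stored tuple.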
Petr Špaček [Thu, 16 Jun 2022 12:03:45 +0000 (14:03 +0200)]
Deduplicate Manual Signing between DNSSEC chapter and DNSSEC Guide
The two procedures were essentially the same, but each instance was
missing some details from the other. They are now combined into one text
in the DNSSEC Guide and linked from DNSSEC chapter.
Petr Špaček [Thu, 16 Jun 2022 10:56:04 +0000 (12:56 +0200)]
Move Private Type Records in DNSSEC chapter to higher level
Private Type Records are not specific to manual signing, so it is
better to move them to the end of the "Zone Signing" section shared by
all three methods.
Petr Špaček [Thu, 9 Jun 2022 09:53:13 +0000 (11:53 +0200)]
Rewrite DNSSEC Validation subchapter in the ARM
Mostly deduplicating and linking information across the ARM.
Generally people should not touch it unless they know what they are doing, so
let's try to discourage them a bit.
Matthijs Mekking [Thu, 12 May 2022 08:26:25 +0000 (10:26 +0200)]
Rewrite Dynamic Zones section
Restructure the section about dynamic zones and automatic signing:
- Focus on dynamic zones with 'auto-dnssec allow;'.
- Add a section about multi-signer models.
- Move NSEC3 related topics into one section.
- Remove any text that does not concern dynamic zones (mostly duplicate
text anyway).
Matthijs Mekking [Wed, 11 May 2022 10:09:43 +0000 (12:09 +0200)]
Add a section about Denial of Existence
Move bits from the "DNSSEC, Dynamic Zones, and Automatic Signing"
about denial of existence to a separate section below the "Key and
Signing Policy" section.
Add a brief introduction about denial of existence to this section.
Matthijs Mekking [Wed, 11 May 2022 09:04:47 +0000 (11:04 +0200)]
Rewrite DNSSEC chapter - signing
Restructure the first part of the DNSSEC chapter that deals with zone
signing. Put dnssec-policy first. Mention Key and Signing Policy.
Only then talk about the DNSSEC tools.
Michał Kępień [Wed, 22 Jun 2022 10:59:33 +0000 (12:59 +0200)]
Clean up convert-trs-to-junit.py invocations
- Use absolute paths when invoking the convert-trs-to-junit.py script
so that it also works correctly for out-of-tree and tarball-based
test jobs.
- Quote the variables used in convert-trs-to-junit.py invocations to
future-proof the code.
- Use "&&" instead of ";" in shell pipelines invoking the
convert-trs-to-junit.py script in order to prevent "source" errors
from being silently ignored.
- Ensure convert-trs-to-junit.py is invoked from the correct directory
for out-of-tree and tarball-based unit test jobs by adding
appropriate "cd" invocations.
- Ensure the convert-trs-to-junit.py invocations are always the last
step in each 'after_script', in order to run that script from the
correct directory for out-of-tree and tarball-based system test jobs
and to ensure that any potential errors in that script do not
prevent more important steps in the 'after_script' from being
executed.
Michał Kępień [Wed, 22 Jun 2022 10:59:33 +0000 (12:59 +0200)]
Move out-of-tree workspace back to $CI_PROJECT_DIR
Out-of-tree build & test jobs currently defined in GitLab CI use
/tmp/out_of_tree_workspace as the working directory. This requires
juggling that directory around as it gets passed from the build job to
the test jobs and then again after the test jobs are finished, so that
artifacts can be collected for the purpose of investigating test
failures. The original intention of doing this was to ensure that
bin/tests/system/run.sh does not rely on being executed from within a
Git working copy (which happens e.g. if the out-of-tree workspace is a
subdirectory of $CI_PROJECT_DIR, i.e. the path into which GitLab
Runner clones the project in each job).
However, even with these complications in place, not all possible
scenarios that should be handled properly by the system test framework
(e.g. invoking a given test one time after another from the same
out-of-tree build directory) are tested in GitLab CI anyway. Meanwhile,
the requirement for moving the out-of-tree workspace into
$CI_PROJECT_DIR in the 'after_script' for each out-of-tree job makes
these jobs less robust than they could be; for example, if any step in
the 'after_script' returns a non-zero exit code, the job's artifacts
will not include the out-of-tree workspace, hindering troubleshooting.
Simplify job definitions in .gitlab-ci.yml by moving the workspace used
by out-of-tree build & test jobs back to a subdirectory of
$CI_PROJECT_DIR. Whether the out-of-tree workspace exists within a Git
working copy or not does not matter for Autotools, so this is considered
to be a reasonable trade-off in terms of test coverage.
Michal Nowak [Wed, 15 Jun 2022 14:06:48 +0000 (16:06 +0200)]
Do not run Ubuntu 18.04 jobs in MR-triggered pipelines
With the addition of Ubuntu 22.04 three more CI jobs were added. To
compensate for that, move Ubuntu 18.04 jobs out of MR-triggered
pipelines to schedule-triggered ones.
Also, move --disable-geoip ./configure options from Ubuntu 18.04 to
Ubuntu 20.04 jobs to keep these options in the more frequent
MR-triggered pipelines.
Michal Nowak [Thu, 16 Jun 2022 09:25:43 +0000 (11:25 +0200)]
Fix implicit string concatenation in tests-checkds.py
pylint 2.14.2 reports the following warnings:
bin/tests/system/checkds/tests-checkds.py:265:0: W1404: Implicit string concatenation found in call (implicit-str-concat)
bin/tests/system/checkds/tests-checkds.py:273:0: W1404: Implicit string concatenation found in call (implicit-str-concat)
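
For context, W1404 flags adjacent string literals that Python silently
joins into one string, which often hides a missing comma. The snippet
below is a generic illustration with made-up strings and a hypothetical
join_args() helper; it is not the code from tests-checkds.py and does
not show how this commit resolved the warnings.

```python
def join_args(*args: str) -> str:
    """Hypothetical helper that makes the argument boundaries visible."""
    return "|".join(args)


# Adjacent literals are concatenated into ONE argument (what W1404 flags).
implicit = join_args("checkds" "warning")

# With the comma in place, two separate arguments are passed.
separate = join_args("checkds", "warning")

# If a single string is intended, an explicit "+" states that intent.
explicit = join_args("checkds" + "warning")
```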
Tom Krizek [Wed, 15 Jun 2022 13:00:27 +0000 (15:00 +0200)]
Report reasons for skipped/xfailed system pytests
If skip/xfail is used in pytest, it can have a reason string associated
with it. When evaluating these tests, it can be useful to be able to
differentiate the reason why the test was skipped/xfailed/xpassed,
because there might be multiple possible reasons for that.
The extra options passed to pytest ensure that the string with the
reason appears in the test summary and thus we're able to find the
string with the reason in the log output.
See https://docs.pytest.org/en/7.1.x/how-to/skipping.html for more info
Petr Špaček [Thu, 9 Jun 2022 17:26:40 +0000 (19:26 +0200)]
Update NSEC3 guidance to match draft-ietf-dnsop-nsec3-guidance-10
https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-nsec3-guidance-10
is on its way to becoming an RFC, so let's update our recommendations in the
docs to be in line with it.
Artem Boldariev [Wed, 15 Jun 2022 10:57:52 +0000 (13:57 +0300)]
TLS DNS: do not call accept callback twice
Before the changes from this commit were introduced, the accept
callback function was called twice when accepting a connection, at
these two stages:
* when accepting the TCP connection;
* when handshake has completed.
That is clearly an error, as it should have been called only once. As
far as I understand it the mistake is a result of TLS DNS transport
being essentially a fork of TCP transport, where calling the accept
callback immediately after accepting TCP connection makes sense.
This commit fixes this mistake. It did not have any very serious
consequences because in BIND the accept callback only checks an ACL
and updates stats.
Petr Špaček [Mon, 13 Jun 2022 15:34:37 +0000 (17:34 +0200)]
Update Authoritative Server Hardware requirements in DNSSEC Guide
Based on measurements done on BIND v9_19_2 using the bank. TLD and a
synthetic fully signed zone, using RSASHA256 and ECDSAP256SHA256
algorithms with NSEC and NSEC3 without opt-out.
Petr Špaček [Fri, 10 Jun 2022 12:40:17 +0000 (14:40 +0200)]
Rewrite Recursive Server Hardware requirements in DNSSEC Guide
This section was completely out of date. Current measurements on dataset
Telco EU 2022-02 and BIND 9.19.1 indicate completely different results
than described in the old version of the text.
Petr Špaček [Fri, 10 Jun 2022 11:43:14 +0000 (13:43 +0200)]
Remove outdated software requirements from DNSSEC Guide
The Guide in this repo is tied to the latest version anyway, so let's
not even mention ancient versions of BIND.
This also solves the OpenSSL question because it is now mandatory for
the build, which in turn removes the entropy problem, so let's not
mention it either.
Aram Sargsyan [Tue, 14 Jun 2022 10:49:04 +0000 (10:49 +0000)]
Fix a race condition between shutdown and route_connected()
When shutting down, the interface manager can be destroyed
before the `route_connected()` callback is called, which is
unexpected for the latter and can cause a crash.
Move the interface manager attachment code from the callback
to the place before the callback is registered using
`isc_nm_routeconnect()` function, which will make sure that
the interface manager will live at least until the callback
is called.
Make sure to detach the interface manager if the
`isc_nm_routeconnect()` function is not implemented, or when
the callback is called with a result value which differs from
`ISC_R_SUCCESS`.
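
The ordering described above can be sketched with a toy reference
counter. This is a Python model under stated assumptions, not BIND
code: the InterfaceManager class, start_route_monitoring() and the
boolean passed to the callback are all hypothetical.

```python
from typing import Callable, Optional


class InterfaceManager:
    """Toy refcounted object; destroyed when the last reference goes."""

    def __init__(self) -> None:
        self.references = 1
        self.destroyed = False

    def attach(self) -> None:
        assert not self.destroyed
        self.references += 1

    def detach(self) -> None:
        self.references -= 1
        if self.references == 0:
            self.destroyed = True


def route_connected(mgr: InterfaceManager, success: bool) -> None:
    # The reference taken before registration keeps mgr alive here.
    assert not mgr.destroyed
    if not success:
        mgr.detach()


def start_route_monitoring(
    mgr: InterfaceManager,
    route_connect: Optional[Callable[[Callable[[bool], None]], None]],
) -> None:
    mgr.attach()  # attach BEFORE the callback is registered
    if route_connect is None:  # models "not implemented"
        mgr.detach()
        return
    route_connect(lambda success: route_connected(mgr, success))
```

In this model, a shutdown that drops the owner's reference before the
callback fires no longer destroys the manager early.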
Aram Sargsyan [Tue, 14 Jun 2022 10:42:28 +0000 (10:42 +0000)]
Do not use the interface manager until it is ready
The `ns_interfacemgr_create()` function, when calling
`isc_nm_routeconnect()`, uses the newly created `ns_interfacemgr_t`
instance before initializing its reference count and the magic value.
Defer the `isc_nm_routeconnect()` call until the initializations
are complete.
Aram Sargsyan [Thu, 19 May 2022 20:44:32 +0000 (20:44 +0000)]
Fix a crash in dig NS search mode
In the special NS search mode, after the initial lookup, dig starts the
followup lookup with discovered NS servers in the queries list. If one
of those queries then fails, dig, as usual, tries to start the next
query in the list, which results in a crash: in NS search mode the
queries run in parallel, so the next query has usually already been
started.
Apply some special logic in `recv_done()` function to deal with the
described situation when handling the query result for the NS search
mode. Particularly, print a warning message for the failed query,
and do not try to start the next query in the list. Also, set a non-zero
exit code if all the queries in the followup lookup fail.
Michal Nowak [Thu, 10 Feb 2022 10:05:46 +0000 (11:05 +0100)]
Capture scripts for Coverity Scan analysis
With the recent Coverity Scan 2021.12 version, Python 3 scripts are
being analyzed in addition to C files. The --fs-capture-search option
should be added to leverage this feature.
Michal Nowak [Tue, 15 Feb 2022 10:24:01 +0000 (11:24 +0100)]
Download Coverity Scan analysis tool to /tmp
Downloading and unpacking Coverity Scan analysis tool tarball
(cov-analysis-linux64.tgz) to $CI_PROJECT_DIR interferes with the
execution of the analysis tool when the --fs-capture-search option is
used because the tool starts to analyze some of its JavaScript files.
(There's the --fs-capture-search-exclude-regex <path> option, but I
failed to find a way to make it work.)
Michal Nowak [Mon, 14 Feb 2022 20:06:31 +0000 (21:06 +0100)]
Drop coverity cache feature
The coverity CI job cache feature is used to ensure that the 1 GB
cov-analysis-linux64.tgz file is being cached on GitLab CI runner, where
it was downloaded in the past. This feature does not seem to work
anymore; given that a proper distributed cache is complicated to set
up, it is better to drop the feature altogether.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Assert on unknown isc_quota_attach() return values
The only values that the isc_quota_attach() function (called from
check_recursionquota() via recursionquotatype_attach_soft()) can
currently return are: ISC_R_SUCCESS, ISC_R_SOFTQUOTA, and ISC_R_QUOTA.
Instead of just propagating any other (unexpected) error up the call
stack, assert immediately, so that if the isc_quota_* API gets updated
in the future to return values currently matching the "default"
statement, check_recursionquota() can be promptly updated to handle such
new return values as desired.
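
A minimal sketch of the "assert on the default branch" idea, written in
Python rather than C. The Result enum and check_quota_result() are
hypothetical; treating the hard quota as a refusal is an assumption,
and UNEXPECTED only exists to exercise the default branch.

```python
from enum import Enum, auto


class Result(Enum):
    SUCCESS = auto()
    SOFTQUOTA = auto()
    QUOTA = auto()
    UNEXPECTED = auto()  # hypothetical value a future API might add


def check_quota_result(result: Result) -> bool:
    """Return True if recursion may proceed (simplified model)."""
    if result is Result.SUCCESS:
        return True
    if result is Result.SOFTQUOTA:
        return True  # soft quota is not treated as an error here
    if result is Result.QUOTA:
        return False  # assumption: hard quota refuses recursion
    # Default branch: fail loudly instead of propagating the value.
    raise AssertionError(f"unhandled quota result: {result!r}")
```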
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Ensure ns_query_cancel() handles all recursions
Previously, multiple code paths reused client->query.fetch, so it was
enough for ns_query_cancel() to issue a single call to
dns_resolver_cancelfetch() with that fetch as an argument. Now, since
each slot in the 'recursions' array can hold a reference to a separate
resolver fetch, ns_query_cancel() needs to handle all of them, so that
all recursion callbacks get a chance to clean up the associated
resources when a query is canceled.
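
The change amounts to looping over every slot instead of canceling one
shared fetch. A Python sketch under stated assumptions: the Fetch class
and cancel_all() are hypothetical, and None models an unused slot in
the 'recursions' array.

```python
from typing import List, Optional


class Fetch:
    """Hypothetical stand-in for a resolver fetch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.canceled = False

    def cancel(self) -> None:
        self.canceled = True


def cancel_all(recursions: List[Optional[Fetch]]) -> int:
    """Cancel every active fetch; return how many were canceled."""
    canceled = 0
    for fetch in recursions:
        if fetch is not None:  # skip slots with no fetch in progress
            fetch.cancel()
            canceled += 1
    return canceled
```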
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Separate prefetch handling from RPZ fetch handling
Both prefetch code and RPZ code ignore recursion results (caching the
response notwithstanding). RPZ code has been (ab)using that fact since
commit 08e36aa5a5c7697a839f83831fccf8fb3f792848 by employing
prefetch_done() as the fetch completion callback. This is only
seemingly a simplification as it makes the code harder to follow ("why
is prefetch code used for handling RPZ-triggered recursion?").
Turn prefetch_done() into a new function whose name clearly conveys its
purpose. Add a parameter to its prototype in order to allow callers to
specify which slot in the 'recursions' array it should use. Reintroduce
prefetch_done() as a wrapper for that function. Add rpzfetch_done(), an
RPZ-exclusive wrapper for that function (using a distinct recursion
type).
Since each slot in the 'recursions' array needs to be initialized before
getting cleaned up when recursion completes, rework fetch_and_forget()
so that it takes recursion type rather than extra fetch options as the
last parameter and make it use the requested slot in the 'recursions'
array rather than a fixed slot (RECTYPE_PREFETCH) for all callers. This
makes fetch_and_forget() a logical complement of cleanup_after_fetch().
Collectively, these changes make prefetch and RPZ code logically
separate (except for reusing client->recursionquota, which will be
refactored later).
Ondřej Surý [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Add recursionquota_attach*()
Add a set of new helper functions for attaching to the recursion quota
in order to reduce code duplication and to ensure that the recursive
clients counter is always adjusted properly. Since some callers
(query_prefetch(), query_rpzfetch()) treat exceeding the soft quota as
an error while others (check_recursionquota()) do not, also add two
wrapper functions whose names help convey their purpose, in order to
improve code readability.
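
The two wrapper flavors can be illustrated with a simplified quota
model. This Python sketch is not the BIND implementation: the Quota
class, its thresholds, the string results, and the names
attach_allow_soft() and attach_strict() are all hypothetical.

```python
class Quota:
    """Simplified quota with a soft and a hard limit."""

    def __init__(self, soft: int, hard: int) -> None:
        self.soft = soft
        self.hard = hard
        self.used = 0

    def attach(self) -> str:
        if self.used >= self.hard:
            return "QUOTA"  # nothing attached
        self.used += 1
        return "SOFTQUOTA" if self.used > self.soft else "SUCCESS"

    def detach(self) -> None:
        assert self.used > 0
        self.used -= 1


def attach_allow_soft(quota: Quota) -> str:
    # Caller tolerates the soft quota: stay attached, report the result.
    return quota.attach()


def attach_strict(quota: Quota) -> str:
    # Caller treats the soft quota as an error: undo the attachment.
    result = quota.attach()
    if result == "SOFTQUOTA":
        quota.detach()
    return result
```

Keeping the detach inside the strict wrapper is what lets the usage
counter stay consistent regardless of which caller is involved.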
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Remove redundant recursion quota pointer checks
When the client->recursionquota pointer was overloaded by different
features, each of those features had to be aware of that fact and handle
any updates of that pointer gracefully. Example: prefetch code
initiates recursion, attaching to client->recursionquota, then query
processing restarts due to a CNAME being encountered, then that CNAME is
not found in the cache, so another recursion is triggered, but
client->recursionquota is already attached to; even though it is not
CNAME chaining code that attached to that pointer, that code still has
to handle such a situation gracefully.
However, all features that can initiate recursion have now been updated
to use separate slots in the 'recursions' array, so keeping the old
checks in place means masking future programming errors that could
otherwise be caught - and should be caught because each feature needs to
properly maintain its own quota reference.
Remove outdated recursion quota pointer checks to enable the assertions
in isc_quota_*() functions to detect programming errors in code paths
that can start recursion. Remove an outdated comment to prevent
confusion.