Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
Add a job creating a release tarball to GitLab CI
Add a GitLab CI job (which is run only if all other jobs in a pipeline
succeed) that builds a BIND release tarball, i.e. fetches the source
tarball from the tarball building job, creates Windows zips, puts
certain parts of BIND documentation into the appropriate places, and
packs it all up into a single tarball whose contents can be subsequently
signed and published.
Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
Add a Windows debug system test job to GitLab CI
Add a system test job for binaries created by Visual Studio in the
"Debug" build configuration to GitLab CI so that they can be tested
along their "Release" counterparts when necessary.
Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
Add a Windows debug build job to GitLab CI
Add a Visual Studio build job using the "Debug" build configuration to
GitLab CI without enabling it for every pipeline as it takes about twice
as long to complete as its "Release" counterpart.
Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
Create and test BIND source tarballs in GitLab CI
Add a set of jobs to GitLab CI that create a BIND source tarball and
then build and test its contents. Run those extra jobs only when a tag
is pushed to the Git repository as they are only meant to be sanity
checks of BIND source tarball contents.
Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
Include prepare-softhsm2.sh in source tarballs
The util/prepare-softhsm2.sh script is useful for initializing a working
SoftHSM environment which can be used by unit tests and system tests.
However, since it is a test-specific script, it does not really belong
in the util/ subdirectory which is mostly pruned during the BIND source
tarball creation process. Move the prepare-softhsm2.sh script to
bin/tests/ so that its location is more appropriate for its purpose and
also so that it does not get removed during the BIND source tarball
creation process, allowing it to be used for setting up test
environments for tarball-based builds.
Michał Kępień [Wed, 11 Dec 2019 11:04:29 +0000 (12:04 +0100)]
List paths which should be excluded from tarballs
Convert the logic (currently present in the form of "rm -rf" calls in
util/kit.sh) for removing files and directories which are tracked by Git
but redundant in release tarballs into a set of .gitattributes rules
which allow the same effect to be achieved using "git archive".
Michał Kępień [Tue, 10 Dec 2019 09:31:33 +0000 (10:31 +0100)]
Only use LC_ALL=C where intended
The LC_ALL=C assignments in the "idna" system test, which were only
meant to affect a certain subset of checks, in fact persist throughout
all the subsequent checks in that system test. That affects the test's
behavior and is misleading.
When the "VARIABLE=value command ..." syntax is used in a shell script,
in order for the variable assignment to only apply to "command", the
latter must be an external binary; otherwise, the VARIABLE=value
assignment persists for all subsequent commands in a script:
$ /bin/sh foo.sh
bar: BAR=baz1
foo: BAR=baz0
bar: BAR=baz2
foo: BAR=baz2
$
Fix by saving the value of LC_ALL before the relevant set of checks in
the "idna" system test, restoring it afterwards, and dropping the
"LC_ALL=C command ..." syntax.
The autosign test has a test case where a DNSSEC maintaiend zone
has a set of DNSSEC keys without any timing metadata set. It
tests if named picks up the key for publication and signing if a
delayed dnssec-settime/loadkeys event has occured.
The test failed intermittently despite the fact it sleeps for 5
seconds but the triggered key reconfigure action should happen after
3 seconds.
However, the test output showed that the test query came in before
the key reconfigure action was complete (see excerpts below).
The loadkeys command is received:
15:38:36 received control channel command 'loadkeys delay.example.'
The reconfiguring zone keys action is triggered after 3 seconds:
15:38:39 zone delay.example/IN: reconfiguring zone keys
15:38:39 DNSKEY delay.example/NSEC3RSASHA1/7484 (ZSK) is now published
15:38:39 DNSKEY delay.example/NSEC3RSASHA1/7455 (KSK) is now published
15:38:39 writing to journal
And 6 more seconds later the reconfigure keys action is complete:
15:38:47 zone delay.example/IN: next key event: 05-Dec-2019 15:48:39
This commit fixes the test by checking the "next key event" log has
been seen before executing the test query, making sure that the
reconfigure keys action has been complete.
This commit however does not fix, nor explain why it took such a long
time (8 seconds) to reconfigure the keys.
Michał Kępień [Fri, 6 Dec 2019 13:11:01 +0000 (14:11 +0100)]
Automatically run clean.sh from run.sh
The first step in all existing setup.sh scripts is to call clean.sh. To
reduce code duplication and ensure all system tests added in the future
behave consistently with existing ones, invoke clean.sh from run.sh
before calling setup.sh.
Michał Kępień [Fri, 6 Dec 2019 13:11:01 +0000 (14:11 +0100)]
Remove bin/tests/system/clean.sh
Since the role of the bin/tests/system/clean.sh script has now been
reduced to calling a given system test's clean.sh script, remove the
former altogether and replace its only use with a direct invocation of
the latter.
Michał Kępień [Fri, 6 Dec 2019 13:11:01 +0000 (14:11 +0100)]
Remove the -r switch from system test scripts
Since files containing system test output are no longer stored in test
subdirectories, bin/tests/system/clean.sh no longer needs to take care
of removing the test.output file for a given test as testsummary.sh
already takes care of that and even if a test suite terminates
abnormally and another one is started, tee invoked without the -a
command line switch overwrites the destination file if it exists, so
leftover test.output.* files from previous test suite runs are not a
concern. Remove the -r command line switch and the code associated with
it from the relevant scripts.
Michał Kępień [Fri, 6 Dec 2019 13:11:01 +0000 (14:11 +0100)]
Store system test output in bin/tests/system/
Some clean.sh scripts contain overly broad file deletion wildcards which
cause the test.output file (used by the system test framework for
collecting output) in a given system test's directory to be erroneously
removed immediately after the test is started (due to setup.sh scripts
calling clean.sh at the beginning). This prevents the test's output
from being placed in bin/tests/system/systests.output at the end of a
test suite run and thus can lead to test failures being ignored. Fix by
storing each test's output in a test.output.<test-name> file in
bin/tests/system/, which prevents clean.sh scripts from removing it (as
they should only ever affect files contained in a given system test's
directory).
Michał Kępień [Fri, 6 Dec 2019 13:11:01 +0000 (14:11 +0100)]
Detect missing system test results
At the end of each system test suite run, the system test framework
collects all existing test.output files from system test subdirectories
and produces bin/tests/system/systests.output from those files.
However, it does not check whether a test.output file was found for
every executed test. Thus, if the test.output file is accidentally
deleted by the system test itself (e.g. due to an overly broad file
removal wildcard present in clean.sh), its output will not be included
in bin/tests/system/systests.output. Since the result of each system
test suite run is determined by bin/tests/system/testsummary.sh, which
only operates on the contents of bin/tests/system/systests.output, this
can lead to test failures being ignored. Fix by ensuring the number of
test results found in bin/tests/system/systests.output is equal to the
number of tests run and triggering a system test suite failure in case
of a discrepancy between these two values.
Mark Andrews [Thu, 28 Nov 2019 22:59:03 +0000 (09:59 +1100)]
Assign fctx->client when fctx is created rather when the join happens.
This prevents races on fctx->client whenever a new fetch joins a existing
fetch (by calling fctx_join) as it is now invariant for the active life of
fctx.
Mark Andrews [Mon, 2 Dec 2019 04:11:50 +0000 (15:11 +1100)]
Make fctx->attributes atomic.
FCTX_ATTR_SHUTTINGDOWN needs to be set and tested while holding the node
lock but the rest of the attributes don't as they are task locked. Making
fctx->attributes atomic allows both behaviours without races.
Michał Kępień [Mon, 2 Dec 2019 15:03:23 +0000 (16:03 +0100)]
Move xmlInitThreads()/xmlCleanupThreads() calls
xmlInitThreads() and xmlCleanupThreads() are called from within
named_statschannels_configure() and named_statschannels_shutdown(),
respectively. Both of these functions are executed by worker threads,
not the main named thread. This causes ASAN to report memory leaks like
the following one upon shutdown (as long as named is asked to produce
any XML output over its configured statistics channels during its
lifetime):
Direct leak of 968 byte(s) in 1 object(s) allocated from:
#0 0x7f677c249cd8 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x7f677bc1838f in xmlGetGlobalState (/usr/lib/libxml2.so.2+0xa838f)
The data mentioned in the above report is a libxml2 state structure
stored as thread-specific data. Such chunks of memory are automatically
released (by a destructor passed to pthread_key_create() by libxml2)
whenever a thread that allocated a given chunk exits. However, if
xmlCleanupThreads() is called by a given thread before it exits, the
destructor will not be invoked (due to xmlCleanupThreads() calling
pthread_key_delete()) and ASAN will report a memory leak. Thus,
xmlInitThreads() and xmlCleanupThreads() must not be called from worker
threads. Since xmlInitThreads() must be called on Windows in order for
libxml2 to work at all, move xmlInitThreads() and xmlCleanupThreads()
calls to the main named thread (which does not produce any XML output
itself) in order to prevent the memory leak from being reported by ASAN.
Michał Kępień [Mon, 2 Dec 2019 14:15:06 +0000 (15:15 +0100)]
Fix GeoIP2 memory leak upon reconfiguration
Loaded GeoIP2 databases are only released when named is shut down, but
not during server reconfiguration. This causes memory to be leaked
every time "rndc reconfig" or "rndc reload" is used, as long as any
GeoIP2 database is in use. Fix by releasing any loaded GeoIP2 databases
before reloading them. Do not call dns_geoip_shutdown() until server
shutdown as that function releases the memory context used for caching
GeoIP2 lookup results.
Ondřej Surý [Fri, 22 Nov 2019 10:39:57 +0000 (11:39 +0100)]
Request exclusive access when crashing via fatal()
When loading the configuration fails, there might be already other tasks
running and calling OpenSSL library functions. The OpenSSL on_exit
handler is called when exiting the main process and there's a timing
race between the on_exit function that destroys OpenSSL allocated
resources (threads, locks, ...) and other tasks accessing the very same
resources leading to a crash in the system threading library. Therefore,
the fatal() function needs to request exlusive access to the task
manager to finish the already running tasks and exit only when no other
tasks are running.