Tom Krizek [Thu, 12 Jan 2023 15:26:25 +0000 (16:26 +0100)]
Collect existing tests in pytest runner
Ensure pytest picks up our python test modules, since we're using
tests_*.py convention, which is different from the default.
Configure pytest logging to display the output from all tests (even the
ones that passed). This ensures we have sufficient amount of information
to debug test post-mortem just from the artifacts.
Tom Krizek [Thu, 12 Jan 2023 14:48:55 +0000 (15:48 +0100)]
Support both pytest and legacy system test runner
The legacy system test framework uses pytest to execute some tests.
Since it'd be quite difficult to convince pytest to decide whether to
include conftest.py (or which ones to include when launching from
subdir), it makes more sense to have a shared conftest.py which is used
by both the legacy test runner invocations of pytest and the new pytest
system test runner. It is ugly, but once we drop support for the legacy
runner, we'll get rid of it.
Properly scope the *port fixtures in order to ensure they'll work as
expected with the new pytest runner. Instead of using "session" (which
means the fixture is only evaluated once for the entire execution of
pytest), use "module" scope, which is evaluated separately for each
module. The legacy runner invoked pytest for each system test
separately, while the new pytest runner is invoked once for all system
tests -- therefore it requires the more fine-grained "module" scope to
for the fixtures to work properly.
Remove python shebang, as conftest.py isn't supposed to be an executable
script.
Tony Finch [Wed, 17 May 2023 09:57:16 +0000 (10:57 +0100)]
Acquire qpmulti->mutex during destruction
Thread sanitizer warns that parts of the qp-trie are accessed
both with and without the mutex; the unlocked accesses happen during
destruction, so they should be benign, but there's no harm locking
anyway to convince tsan it is clean.
Also, ensure .tsan-suppress and .tsan-suppress-extra are in sync.
Michał Kępień [Thu, 18 May 2023 13:12:23 +0000 (15:12 +0200)]
Remove <isc/cmocka.h>
The last use of the cmocka_add_test_byname() helper macro was removed in
commit 63fe9312ff8f54bc79e399bdbd5aaa15cd3e5459. Remove the
<isc/cmocka.h> header that defines it.
Michał Kępień [Thu, 18 May 2023 13:12:23 +0000 (15:12 +0200)]
Fix cmocka-related compiler warnings in ht_test
tests/isc/ht_test.c triggers the following compiler warnings when built
against development versions of cmocka:
In file included from ht_test.c:24:
ht_test.c: In function ‘test_ht_full’:
ht_test.c:68:45: warning: passing argument 2 of ‘_assert_ptr_equal’ makes pointer from integer without a cast [-Wint-conversion]
68 | assert_ptr_equal((void *)i, (uintptr_t)f);
/usr/include/cmocka.h:1513:56: note: in definition of macro ‘assert_ptr_equal’
1513 | #define assert_ptr_equal(a, b) _assert_ptr_equal((a), (b), __FILE__, __LINE__)
| ^
/usr/include/cmocka.h:2907:36: note: expected ‘const void *’ but argument is of type ‘long unsigned int’
2907 | const void *b,
| ~~~~~~~~~~~~^
ht_test.c:163:45: warning: passing argument 2 of ‘_assert_ptr_equal’ makes pointer from integer without a cast [-Wint-conversion]
163 | assert_ptr_equal((void *)i, (uintptr_t)f);
/usr/include/cmocka.h:1513:56: note: in definition of macro ‘assert_ptr_equal’
1513 | #define assert_ptr_equal(a, b) _assert_ptr_equal((a), (b), __FILE__, __LINE__)
| ^
/usr/include/cmocka.h:2907:36: note: expected ‘const void *’ but argument is of type ‘long unsigned int’
2907 | const void *b,
| ~~~~~~~~~~~~^
These are caused by a change to the definitions of pointer assert
functions in cmocka's development branch [1]. Fix by casting the
affected variables to (void *) instead of (uintptr_t).
Michał Kępień [Thu, 18 May 2023 13:12:23 +0000 (15:12 +0200)]
Include <inttypes.h> whenever including <cmocka.h>
Development versions of cmocka require the intmax_t and uintmax_t types
to be defined by the time the test code includes the <cmocka.h> header.
These types are defined in the <stdint.h> header, which is included by
the <inttypes.h> header, which in turn is already explicitly included by
some of the programs in the tests/ directory. Ensure all programs in
that directory that include the <cmocka.h> header also include the
<inttypes.h> header to future-proof the code while keeping the change
set minimal and the resulting code consistent. Also prevent explicitly
including the <stdint.h> header in those programs as it is included by
the <inttypes.h> header.
Mark Andrews [Tue, 16 May 2023 00:15:00 +0000 (10:15 +1000)]
Let RSASHA1 signing keys be ignored in FIPS mode
When the FIPS provider is available, RSASHA1 signing keys for zone
"example.com." are ignored if the zone is attempted to be signed with
the dnssec-signzone "-F" (FIPS mode) option:
Evan Hunt [Mon, 15 May 2023 20:22:24 +0000 (13:22 -0700)]
Match UQ and UR stats to domain name
The upforwd test for forwarding updates to a dead primary can continue
running a little bit past its end, causing update replies to be
recorded during a subsequent test case. Correct this by only looking
for update requests and replies for the specific domain name being
tested at any given time.
Tony Finch [Mon, 15 May 2023 19:04:14 +0000 (20:04 +0100)]
Fix the `upforwd` system test
After the RCU changes were merged, the `upforwd` test started
consistenly failing when run under thread sanitizer. After some
investigation, it turned out that retry attempts were continuing after
the "update forwarding to dead primary" test. This caused mismatches
in the DNSTAP message counts for the subsequent tests, because they
were also counting retries.
Fix this problem by `wait`ing for the `nsupdate` processes to exit.
While investigating the bug, I replaced several fixed 15 second delays
with `wait_for_log`, so the test runs faster.
Tony Finch [Mon, 15 May 2023 10:42:33 +0000 (11:42 +0100)]
Fixes for liburcu-qsbr
Move registration and deregistration of the main thread from
`isc_loopmgr_run()` into `isc__initialize()` / `isc__shutdown()`:
liburcu-qsbr fails an assertion if we try to use it from an
unregistered thread, and we need to be able to use it when the
event loops are not running.
Use `rcu_assign_pointer()` and `rcu_dereference()` in qp-trie
transactions so that they properly mark threads as online. The
RCU-protected pointer is no longer declared atomic because
liburcu does not (yet) use standard C atomics.
Fix the definition of `isc_qsbr_rcu_dereference()` to return
the referenced value, and to call the right function inside
liburcu.
Change the thread sanitizer suppressions to match any variant of
`rcu_*_barrier()`
Tony Finch [Mon, 15 May 2023 14:35:57 +0000 (15:35 +0100)]
Check the return value from uv_async_send()
An omission pointed out by the following report from Coverity:
/lib/isc/loop.c: 483 in isc_loopmgr_pause()
>>> CID 455002: Error handling issues (CHECKED_RETURN)
>>> Calling "uv_async_send" without checking return value (as is done elsewhere 5 out of 6 times).
483 uv_async_send(&loop->pause_trigger);
Evan Hunt [Sun, 14 May 2023 06:31:45 +0000 (23:31 -0700)]
allow streamdns read to resume after timeout
when reading on a streamdns socket failed due to timeout, but
the dispatch was still waiting for other responses, it would
resume reading by calling isc_nm_read() again. this caused
an assertion because the socket was already reading.
we now check that either the socket is reading, or that it was
already reading on the same handle.
Tony Finch [Fri, 5 May 2023 07:12:35 +0000 (08:12 +0100)]
Use per-CPU RCU helper threads
Create and free per-CPU helper threads from the main thread and tell
thread sanitizer to suppress leaking threads. (We are not leaking
threads ourselves and we can safely ignore the Userspace-RCU thread
leaks.)
Tony Finch [Thu, 4 May 2023 14:26:13 +0000 (15:26 +0100)]
Help thread sanitizer to cope with liburcu
All the places the qp-trie code was using `call_rcu()` needed
`__tsan_release()` and `__tsan_acquire()` annotations, so
add a couple of wrappers to encapsulate this pattern.
With these wrappers, the tests run almost clean under thread
sanitizer. The remaining problems are due to `rcu_barrier()`
which can be suppressed using `.tsan-suppress`. It does not
suppress the whole of `liburcu`, because we would like thread
sanitizer to detect problems in `call_rcu()` callbacks, which
are called from `liburcu`.
The CI jobs have been updated to use `.tsan-suppress` by
default, except for a special-case job that needs the
additional suppressions in `.tsan-suppress-extra`.
We might be able to get rid of some of this after liburcu gains
support for thread sanitizer.
Note: the `rcu_barrier()` suppression is not entirely effective:
tsan sometimes reports races that originate inside `rcu_barrier()`
but tsan has discarded the stack so it does not have the
information required to suppress the report. These "races" can
be made much easier to reproduce by adding `atexit_sleep_ms=1000`
to `TSAN_OPTIONS`. The problem with tsan's short memory can be
addressed by increasing `history_size`: when it is large enough
(6 or 7) the `rcu_barrier()` stack usually survives long enough
for suppression to work.
Tony Finch [Wed, 3 May 2023 14:29:43 +0000 (15:29 +0100)]
Avoid using the zone timer after its loop has gone
Shutdown and cleanup of zones is more asynchronous with the qp-trie
zone table. As a result it's possible that some activity is delayed
until after a zone has been released from its zonemanager.
Previously, the dns_zone code was not very strict in the way it
refers to the loop it is running on: The loop pointer was stashed when
dns_zonemgr_managezone() was called and never cleared. Now, zones
properly attach to and detach from their loops.
The zone timer depends on its loop. The shutdown crashes occurred
when asynchronous calls tried to modify the zone timer after
dns_zonemgr_releasezone() has been called and the loop was
invalidated. In these cases the attempt to set the timer is now
ignored, with a debug log message.
Tony Finch [Wed, 8 Mar 2023 14:28:06 +0000 (14:28 +0000)]
Refactor the core qp-trie code to use liburcu
A `dns_qmpulti_t` no longer needs to know about its loopmgr. We no
longer keep a linked list of `dns_qpmulti_t` that have reclamation
work, and we no longer mark chunks with the phase in which they are to
be reclaimed. Instead, empty chunks are listed in an array in a
`qp_rcu_t`, which is passed to call_rcu().
Tony Finch [Fri, 28 Apr 2023 13:10:03 +0000 (14:10 +0100)]
Wait for RCU to finish before destroying a memory context
Memory reclamation by `call_rcu()` is asynchronous, so during shutdown
it can lose a race with the destruction of its memory context. When we
defer memory reclamation, we need to attach to the memory context to
indicate that it is still in use, but that is not enough to delay its
destruction. So, call `rcu_barrier()` in `isc_mem_destroy()` to wait
for pending RCU work to finish before proceeding to destroy the memory
context.
Tony Finch [Thu, 9 Mar 2023 10:43:53 +0000 (10:43 +0000)]
A macro for the size of a struct with a flexible array member
It can be fairly long-winded to allocate space for a struct with a
flexible array member: in general we need the size of the struct, the
size of the member, and the number of elements. Wrap them all up in a
STRUCT_FLEX_SIZE() macro, and use the new macro for the flexible
arrays in isc_ht and dns_qp.
Evan Hunt [Wed, 3 May 2023 21:28:37 +0000 (14:28 -0700)]
add 'rndc -t' option to set timeout
Allow an arbitrary TCP timeout value to be specified when running
rndc, so that commands that take a long time to execute (for example,
reloading a very large configuration) can be given time to do so.
Aram Sargsyan [Thu, 11 May 2023 12:08:13 +0000 (12:08 +0000)]
Check whether zone->db is a valid pointer before attaching
The zone_resigninc() function does not check the validity of
'zone->db', which can crash named if the zone was unloaded earlier,
for example with "rndc delete".
Check that 'zone->db' is not 'NULL' before attaching to it, like
it is done in zone_sign() and zone_nsec3chain() functions, which
can similarly be called by zone maintenance.
Ondřej Surý [Tue, 9 May 2023 11:38:37 +0000 (13:38 +0200)]
Add Userspace-RCU to global CFLAGS and LIBS
The Userspace-RCU headers are now needed for more parts of the libisc
and libdns, thus we need to add it globally to prevent compilation
failures on systems with non-standard Userspace-RCU installation path.
Ondřej Surý [Mon, 8 May 2023 21:31:54 +0000 (23:31 +0200)]
Change the isc_quota API to use cds_wfcqueue internally
The isc_quota API was using locked list of isc_job_t objects to keep the
waiting TCP accepts. Change the isc_quota implementation to use
cds_wfcqueue internally - the enqueue is wait-free and only dequeue
needs to be locked.
Ondřej Surý [Tue, 9 May 2023 08:24:37 +0000 (10:24 +0200)]
Adjust the udp_shutdown_connect to delay the check
The teardown jobs are not executed immediately, so we need to delay the
check for ISC_R_SHUTTINGDOWN even more (as the UDP connect is
synchronous, it makes it harder to test it).
Ondřej Surý [Mon, 8 May 2023 21:31:54 +0000 (23:31 +0200)]
Change the isc_async API to use cds_wfcqueue internally
The isc_async API was using lock-free stack (where enqueue operation was
not wait-free). Change the isc_async to use cds_wfcqueue internally -
enqueue and splice (move the queue members from one list to another) is
nonblocking and wait-free.
Ondřej Surý [Wed, 3 May 2023 17:38:05 +0000 (19:38 +0200)]
Replace glue_cache hashtable with direct link in rdatasetheader
Instead of having a global hashtable with a global rwlock for the GLUE
cache, move the glue_list directly into rdatasetheader and use
Userspace-RCU to update the pointer when the glue_list is empty.
Additionally, the cached glue_lists needs to be stored in the RBTDB
version for early cleaning, otherwise the circular dependencies between
nodes and glue_lists will prevent nodes to be ever cleaned up.
Michal Nowak [Thu, 27 Apr 2023 08:30:54 +0000 (10:30 +0200)]
Disable ASAN in nsupdate for fatal cases
Clang 16 LeakSanitizer reports a memory leak when dns_request_create()
returned a TLS error in the nsupdate system test. While technically a
memory leak on error handling, it's not a problem because the program is
immediately terminated; nsupdate is not expected to run for a prolonged
time.
Tony Finch [Wed, 10 May 2023 16:15:21 +0000 (17:15 +0100)]
Avoid lossage from <stdnoreturn.h>
A few of the source files in `tests/ns` included `<isc/util.h>`
before `<cmocka.h>`. This could cause compile failures because the
`CMOCKA_NORETURN` macro is defined as `__attribute__((noreturn))`
and `<stdnoreturn.h>` defines `noreturn` as `_Noreturn` which does
not work as a gcc-style attribute.
Mark Andrews [Mon, 8 May 2023 07:39:51 +0000 (17:39 +1000)]
Handle FORMERR on unknown EDNS option that are echoed
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.
The check_if_done() function can pass control back out to
dighost_shutdown() (which is part of dig.c, host.c, or nslookup.c),
and calling that twice can cause unexpected problems, if it is not
designed to be idempotent.
Since cancel_lookup() calls check_if_done() implicitly, don't call
check_if_done() again when 'next' is NULL.
Tom Krizek [Thu, 6 Apr 2023 12:05:30 +0000 (14:05 +0200)]
Ensure named always terminates in the shutdown test
Previously, if an exception would happen inside the `with` block, the
error handler would wait indefinitely for the process to end. That would
never happen, since the termination signal was never sent to named and
the test would get stuck.
Using the try-finally block ensures that the named process is always
killed and any exception or errors will be handled gracefully.
Tom Krizek [Thu, 6 Apr 2023 12:01:43 +0000 (14:01 +0200)]
Refactor shutdown test into more helper functions
Improve code readability by splitting the test into more functions. Some
could be re-used later on for more general-purpose subprocess handling
or named checks.