Otto Moerbeek [Tue, 20 Dec 2022 11:30:54 +0000 (12:30 +0100)]
Use correct logic for isEntryUsable()
Existing code was correct but used the wrong name for the method:
isEntryUsable() actually tested for isUnusable, but the caller
compensated for it. Reverse logic to make it more clear.
Otto Moerbeek [Mon, 19 Dec 2022 08:08:14 +0000 (09:08 +0100)]
Some systems have a low-resolution nanosleep(2), calling it will
sleep for at least a few ms. Compensate for that by running fewer
loops with longer sleeps.
Also use dns_random and make sure it is initlized properly for all tests.
Otto Moerbeek [Wed, 14 Dec 2022 13:06:36 +0000 (14:06 +0100)]
Generate EDE in more cases, specifically on unreachable auths or synthesized results.
As there is no specific EDE for synthesised, use noError with a text.
We have to be careful here: a single client query can lead to
multiple beginResolve calls. Some of these are done after the main
result has been looked up, for example to validate the result. These
subsequent calls can generate EDE's but we do not want to copy the
EDE to the main result in those cases. A typical example would be
an absent DS for an Insecure domain. Nothing wrong with these but
we do not want the potential absent DS EDE (which could be synthesize)
to be returned with the main query,
To solve this, mimic the processing of validation state and add
an extra argument to a few methods.
I am not terribly happy with the extra argument. Maybe we should
move to an object holding the parameters and result status of the
nested or subsequent calls. This would also avoid some of the saveX,
setX, beginResolve, restore X sequences.
Remi Gacogne [Wed, 14 Dec 2022 15:19:49 +0000 (16:19 +0100)]
dnsdist: Stop the responders more quickly during the tests
We use `SO_REUSEPORT` in these tests so if the old responder is
still around when the next test starts, it is quite likely that
it might get one of the new queries. This is usually fine because
responders with a different behaviour listen on different ports,
but if a query is queued to an old responder socket right during
the time that responder is checking whether it should stop and
the actual exit, the query will be lost.
Otto Moerbeek [Tue, 13 Dec 2022 11:25:12 +0000 (12:25 +0100)]
rec: make response stats a tcounter object
This allows for the packet cache hit path to record response stats without performance impact.
The qtype and rcode counters are capped, as i ran into trouble with
the thread stack sizes on macOS and OpenBSD. See the source comment
for explanation.
Remi Gacogne [Mon, 12 Dec 2022 14:42:57 +0000 (15:42 +0100)]
dnsdist: Disable the send wrappers in our CI
The way the send wrappers are implemented, reading the data _after_
it has been sent, cause them to report a data race that does not
exist with existing implementations:
- we call `send()` from thread 1 to send a query to a backend, never
touching the data or associated metadata again from that thread
- we get a response from the backend in a different thread, thread 2,
which will then access the metadata and sometimes (truncated UDP
answers following a DoH query) even modify the data itself
- ASAN and TSAN complain because the wrapper might still be reading
the data after the UDP datagram has been sent, which is effectively
a race, but it does not really make any sense for an actual
implementation of `send()` to do that.
We work around that by disabling the `send()` wrappers in our CI,
for the dnsdist regression tests only, via `intercept_send=0`.
Otto Moerbeek [Tue, 6 Dec 2022 10:18:56 +0000 (11:18 +0100)]
Close a race: some test immediately query metrics when a query result is received.
to avoid ordering issues, update metrics snap before answers are sent out.
Otto Moerbeek [Tue, 8 Nov 2022 10:10:54 +0000 (11:10 +0100)]
Introducing TCounters
This is a mostly lockless (and not using atomics) way to keep track of
counters and other metrics.
Atomic value are more expensive than you would think (especially if
your platform has no native atomic support for your data type), and
using locking all the time for often updated counters is very
expensive as well.
The idea for `TCounters` is based on
https://github.com/ahupowerdns/mcounter
But addresses the issues raised in
https://github.com/ahupowerdns/mcounter/issues/3
Templates are used, the application has to provide a specific class to
hold the values and enums to index these values. The application
specific class also has to provide a `merge()` method to merge two
instances of the application specific data. For counters that is
simple: just add them. Averages (or histogrfam) requires a bit more
work. This is demonstrated in `rec-tcounters.{cc,hh}`
At the end of a body of work the application's threads should call the
`updateSnap()` function. If a certain amount of time has passed since
the last time, a thread local snapshot of the thread local data will
be created in a thread-safe way.
The class that collects the aggregated values reads (also in a thread
safe way) from the snapshot values in each thread.
Updates of individual counters are done on thread-local data,
potentially many times per second. The snaps contain a consistent set
of the values and are taken by default once per 100ms, so reletively
seldom.
By using the snap mnechanism the aggragate values computed are based
on internally consistent counter values (as long as related counters
are updated from the same thread). A (small) drawback is that the
values computed might be a bit out of date.
The snapshot approach was suggested by @wojas.
This PR de demonstrates `TCounters` for a few Recursor metrics: simple
counters and double typed average values. For the latter weights are
kept, so that the average of averages can be computed in a proper way.