Otto Moerbeek [Wed, 25 Aug 2021 09:32:02 +0000 (09:32 +0000)]
Improve the rec bulk test script
- Exit if rec did not start up
- Failures of status-requesting commands (rec_control and kill -USR1) are
non-fatal, except for the last 'ping' command.
- Increase timeout of rec_control command (to help investigate issues on buildbot)
The script is run with -e, so for now a failure leads to an exit without
killing the running recursor.
Remi Gacogne [Tue, 24 Aug 2021 09:23:54 +0000 (11:23 +0200)]
dnsdist: Cache based on the DNS flags of the query after applying the rules
The tentative fix in dbadb4d272a3317407e6bc934f55c2d41a87c0ac actually
introduced an issue, because the backend might not perfectly echo the
RD and CD flags as they were in the query.
We can't use the "original" (before applying rules) flags either, so
we need to store the flags as they were sent to the backend to be
able to correctly store them in the cache.
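A minimal sketch of the idea, not dnsdist's actual code: the per-query state remembers the flags both as received and as sent to the backend, and the RD/CD bits echoed by the backend are ignored. The `QueryState` struct and both helper names are hypothetical; only the RD (0x0100) and CD (0x0010) bit positions come from the DNS header layout.

```cpp
#include <cstdint>

// RD and CD bit positions in the 16-bit DNS header flags word.
constexpr uint16_t kRDFlag = 0x0100;
constexpr uint16_t kCDFlag = 0x0010;
constexpr uint16_t kRDCDMask = kRDFlag | kCDFlag;

// Hypothetical per-query state, for illustration only.
struct QueryState
{
  uint16_t origFlags; // flags as received from the client
  uint16_t sentFlags; // flags after the rules ran, as sent to the backend
};

// Flags to use for the cache insertion: RD/CD as sent to the backend,
// regardless of what the backend echoed in its response.
inline uint16_t flagsForCacheInsert(const QueryState& state, uint16_t responseFlags)
{
  return static_cast<uint16_t>((responseFlags & ~kRDCDMask) | (state.sentFlags & kRDCDMask));
}

// Flags to use for the answer sent to the client: RD/CD as in the original query.
inline uint16_t flagsForClient(const QueryState& state, uint16_t responseFlags)
{
  return static_cast<uint16_t>((responseFlags & ~kRDCDMask) | (state.origFlags & kRDCDMask));
}
```

With this split, a backend that sets or clears RD or CD on its own cannot change which cache entry the response ends up in.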
Otto [Thu, 19 Aug 2021 07:08:53 +0000 (09:08 +0200)]
Prometheus help texts and general cleanup. Example output:
pdns_recursor_policy_hits 10
pdns_recursor_policy_hits{type="filter"} 3
pdns_recursor_policy_hits{type="rpz",policyname="rpz.local"} 5
pdns_recursor_policy_hits{type="rpz",policyname="rpzFile"} 2
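As an illustration only, lines in the format shown above could be produced by a helper like the following. `formatMetric` is a hypothetical name, not the recursor's actual code, and the sketch assumes label values need no escaping.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: renders one sample line in the Prometheus text format.
// Label values are assumed not to contain quotes, backslashes or newlines.
inline std::string formatMetric(const std::string& name,
                                const std::vector<std::pair<std::string, std::string>>& labels,
                                uint64_t value)
{
  std::string line = name;
  if (!labels.empty()) {
    line += '{';
    bool first = true;
    for (const auto& label : labels) {
      if (!first) {
        line += ',';
      }
      line += label.first + "=\"" + label.second + "\"";
      first = false;
    }
    line += '}';
  }
  line += ' ';
  line += std::to_string(value);
  return line;
}
```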
Otto [Tue, 20 Jul 2021 12:07:20 +0000 (14:07 +0200)]
Keep a per-RPZ (or filter) hit count, by default only exported via
Prometheus. After #10554 is merged the Prometheus help info should be added
to this branch.
The general idea has been borrowed from Rust's locks: instead of
defining two objects, the one to be protected, T, and the lock, we
define a single LockGuarded<T> object which contains the object.
That provides two big advantages:
- it is immediately clear which data is protected by the lock
- that data simply can't be accessed without holding the lock.
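A simplified sketch of such a wrapper, assuming a plain std::mutex. It is not the actual implementation: the holder class, the `lock()` method and the `addHit` helper are illustrative names.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative holder: keeps the mutex locked for as long as it is alive
// and is the only way to reach the protected value.
template <typename T>
class LockGuardedHolder
{
public:
  LockGuardedHolder(std::mutex& mutex, T& value) :
    d_lock(mutex), d_value(value)
  {
  }
  T& operator*() const { return d_value; }
  T* operator->() const { return &d_value; }

private:
  std::unique_lock<std::mutex> d_lock;
  T& d_value;
};

// Illustrative wrapper: the data and its lock live in a single object.
template <typename T>
class LockGuarded
{
public:
  LockGuarded() = default;
  explicit LockGuarded(T value) :
    d_value(std::move(value))
  {
  }
  // Acquires the mutex; it is released when the returned holder is destroyed.
  LockGuardedHolder<T> lock() { return LockGuardedHolder<T>(d_mutex, d_value); }

private:
  std::mutex d_mutex;
  T d_value;
};

// Usage: the vector cannot be touched without going through lock().
inline std::size_t addHit(LockGuarded<std::vector<int>>& hits, int value)
{
  auto holder = hits.lock();
  holder->push_back(value);
  return holder->size();
}
```

Since `d_value` is private, code that forgets to take the lock does not compile, which is the second advantage listed above.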
Remi Gacogne [Mon, 16 Aug 2021 10:51:15 +0000 (12:51 +0200)]
dnsdist: Fix the wrong RD and CD flags being cached, causing misses
We used to restore the RD and CD flags from the initial query before
inserting the response into the cache. That would cause an issue
if the flags had been altered, for example via SetNoRecurseAction,
as the cache lookup is done _after_ the actions have been applied
and thus after the flags have been altered.
If the initial query had the RD bit set, and that bit was then cleared by the
rule, the response would have been inserted with the RD bit restored,
and no lookup would then succeed because it would be done with the
bit cleared.
This commit fixes the insertion to use the RD and CD bits as set in
the response before restoring them, and restores the RD and CD bits
after a cache hit as well, to ensure that:
- cache lookups are done after the rules are applied
- cache insertions are done before the flags are restored
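The miss described above can be reproduced with a toy model in which the RD and CD bits are part of the cache key. This is a hypothetical sketch, not dnsdist's packet cache: `ToyCache`, `Flags` and `applyNoRecurseRule` are made-up names, with the last one standing in for a rule such as SetNoRecurseAction.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical toy model: only the two flags relevant here.
struct Flags
{
  bool rd;
  bool cd;
};

class ToyCache
{
public:
  void insert(const std::string& qname, Flags flags, const std::string& response)
  {
    d_entries[std::make_tuple(qname, flags.rd, flags.cd)] = response;
  }

  bool lookup(const std::string& qname, Flags flags, std::string& response) const
  {
    auto it = d_entries.find(std::make_tuple(qname, flags.rd, flags.cd));
    if (it == d_entries.end()) {
      return false;
    }
    response = it->second;
    return true;
  }

private:
  // RD and CD are part of the key, so they must match between insert and lookup.
  std::map<std::tuple<std::string, bool, bool>, std::string> d_entries;
};

// Stand-in for a rule that clears the RD bit.
inline Flags applyNoRecurseRule(Flags flags)
{
  flags.rd = false;
  return flags;
}
```

Inserting with the post-rule flags and looking up with the post-rule flags yields a hit; inserting with the restored original flags would store the entry under a key that no later lookup uses.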
Remi Gacogne [Mon, 16 Aug 2021 08:01:04 +0000 (10:01 +0200)]
Fix a warning about catching a polymorphic exception type by value
```
decafsigners.cc: In member function ‘virtual bool DecafED25519DNSCryptoKeyEngine::verify(const string&, const string&) const’:
decafsigners.cc:140:11: warning: catching polymorphic type ‘class decaf::CryptoException’ by value [-Wcatch-value=]
  140 |   } catch(CryptoException) {
      |           ^~~~~~~~~~~~~~~
decafsigners.cc: In member function ‘virtual bool DecafED448DNSCryptoKeyEngine::verify(const string&, const string&) const’:
decafsigners.cc:276:11: warning: catching polymorphic type ‘class decaf::CryptoException’ by value [-Wcatch-value=]
  276 |   } catch(CryptoException) {
      |           ^~~~~~~~~~~~~~~
```
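The usual way to silence this warning is to catch by const reference, which also avoids copying and slicing the exception object. A self-contained sketch, where `CryptoError` is a hypothetical stand-in for the library's exception type:

```cpp
#include <stdexcept>

// Hypothetical stand-in for a polymorphic exception type.
struct CryptoError : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Returns false instead of propagating the exception, as a verify() might.
inline bool verifyOrFalse(bool shouldThrow)
{
  try {
    if (shouldThrow) {
      throw CryptoError("bad signature");
    }
    return true;
  }
  catch (const CryptoError&) { // by const reference: no copy, no slicing, no -Wcatch-value warning
    return false;
  }
}
```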
Otto [Wed, 11 Aug 2021 11:14:37 +0000 (13:14 +0200)]
If we get an NS from the cache, it could still be one that forwarding applies to.
Take that into account when determining the dont-query status. Should fix #10638.
Remi Gacogne [Thu, 5 Aug 2021 06:50:55 +0000 (08:50 +0200)]
Consistently return the number of ready events, not descriptors
We might have two events for the same descriptor, readable AND
writable. It was already counted as two separate events by the
kqueue multiplexer but not by the other ones.
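For a poll()-based multiplexer, counting events rather than descriptors could look like the sketch below. `countReadyEvents` is a hypothetical helper, not the code from this commit.

```cpp
#include <poll.h>
#include <vector>

// Hypothetical helper: counts ready events, so a descriptor that is both
// readable and writable contributes two events, not one.
inline int countReadyEvents(const std::vector<pollfd>& fds)
{
  int events = 0;
  for (const auto& pfd : fds) {
    if (pfd.revents & POLLIN) {
      ++events;
    }
    if (pfd.revents & POLLOUT) {
      ++events;
    }
  }
  return events;
}
```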
Remi Gacogne [Wed, 4 Aug 2021 12:35:53 +0000 (14:35 +0200)]
Handle waiting for a descriptor to become readable OR writable
This commit refactors our multiplexers to be able to wait for a
descriptor to become readable OR writable at the same time.
I kept the two separate maps for easier handling of the separate
TTD and to limit the number of changes, but we might want to merge
them into a single map in the future.
The accounting is moved into the parent class instead of being dealt
with by the multiplexers themselves.
I noticed that the poll multiplexer allocates and fills a vector of
pollfd for every call to run(), which seems wasteful, but I did not
want to touch that in this commit.
I did not compile or test the kqueue, ports and /dev/poll multiplexers
yet, so don't merge this without testing them first.
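A rough sketch of the two-map approach for a poll()-based backend, under the assumption that a descriptor present in both maps is polled for both conditions at once. `ToyMultiplexer` and its members are illustrative names, not the refactored classes themselves.

```cpp
#include <poll.h>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Illustrative multiplexer keeping separate read and write maps.
class ToyMultiplexer
{
public:
  using Callback = std::function<void(int)>;

  void addReadFD(int fd, Callback cb) { d_readCallbacks[fd] = std::move(cb); }
  void addWriteFD(int fd, Callback cb) { d_writeCallbacks[fd] = std::move(cb); }

  // Builds one pollfd per descriptor; a descriptor present in both maps
  // gets POLLIN | POLLOUT so it can become readable OR writable.
  std::vector<pollfd> buildPollFDs() const
  {
    std::map<int, short> wanted;
    for (const auto& entry : d_readCallbacks) {
      wanted[entry.first] |= POLLIN;
    }
    for (const auto& entry : d_writeCallbacks) {
      wanted[entry.first] |= POLLOUT;
    }
    std::vector<pollfd> fds;
    fds.reserve(wanted.size());
    for (const auto& entry : wanted) {
      pollfd pfd{};
      pfd.fd = entry.first;
      pfd.events = entry.second;
      fds.push_back(pfd);
    }
    return fds;
  }

private:
  std::map<int, Callback> d_readCallbacks;
  std::map<int, Callback> d_writeCallbacks;
};
```

Like the poll multiplexer mentioned above, this sketch rebuilds the pollfd vector on every call; caching it between calls would be a separate change.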