Vladimír Čunát [Fri, 12 Feb 2021 09:06:25 +0000 (10:06 +0100)]
daemon/udp_queue: drop the error logging
We should do this for all transports and probably just in verbose mode.
We were printing lots of these on Turris OS (for one user at least):
https://forum.turris.cz/t/5-1-8-kresd-throwing-many-errors-in-var-log-messages/14775
EACCES in particular apparently may happen (on Linux) when the network
is "unavailable", EPERM because of firewall/netfilter:
https://stackoverflow.com/a/23869102
Vladimír Čunát [Wed, 10 Feb 2021 11:56:14 +0000 (12:56 +0100)]
modules/{http,watchdog}: fix stability problems
As first noted in commit d1a229ae9, in some cases we do call chains that
are not supported for JIT in LuaJIT.
I'm not 100% sure all of these are needed to comply, but the functions
here are really small and probably not to be that heavily used,
so I don't think it will be costly to interpret them
(and avoiding crashes is more important).
In my tests this fixed occasional crashes when using http://*/trace/*
Vladimír Čunát [Thu, 28 Jan 2021 10:37:05 +0000 (11:37 +0100)]
policy.ANSWER: minor fixes, mainly around NODATA answers
- return SOA in NODATA answers and allow customizing it
- only call ensure_answer() if really generating an answer
(otherwise we might e.g. deplete XDP buffers, in extreme cases)
Vladimír Čunát [Mon, 1 Feb 2021 09:09:16 +0000 (10:09 +0100)]
when FORMERR comes, differentiate based on OPT
In particular, non-support of EDNS is implied iff FORMERR without OPT
comes. If OPT is there, one possibility is that there was something
wrong in the OPT that *we* sent, but it seems much more likely that
this particular server is just bad and we want to try another one.
https://tools.ietf.org/html/rfc6891#section-7
In particular, we would be in trouble if we dropped OPT in a zone
that is covered by DNSSEC.
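A minimal sketch of the decision described above. The enum and function names are hypothetical, chosen for illustration only; they are not the identifiers used in lib/selection.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the resolver's actual API. */
enum formerr_action {
	FORMERR_RETRY_WITHOUT_EDNS, /* no OPT in reply: server likely lacks EDNS support */
	FORMERR_TRY_ANOTHER_SERVER, /* OPT in reply: server knows EDNS but misbehaves */
};

/* Decide how to react to a FORMERR reply, based only on whether
 * the reply itself carried an OPT record. */
static enum formerr_action classify_formerr(bool reply_has_opt)
{
	return reply_has_opt ? FORMERR_TRY_ANOTHER_SERVER
	                     : FORMERR_RETRY_WITHOUT_EDNS;
}
```

Keeping OPT whenever the server evidently understands EDNS matters because the DO bit travels in OPT, so dropping it would lose DNSSEC records.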
Vladimír Čunát [Mon, 1 Feb 2021 08:57:46 +0000 (09:57 +0100)]
lib/selection: rename to *_FORMERR for consistency
It's now consistent with KNOT_RCODE_FORMERR and the official name
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
Vladimír Čunát [Tue, 26 Jan 2021 11:25:09 +0000 (12:25 +0100)]
lib/selection: refactor kr_selection_error_str()
This way leaves less room for mistakes, etc. It's just the idea from:
https://gitlab.nic.cz/knot/knot-resolver/-/commit/dd0c99bdb6332ba3628833a8543a5f9f33141ddd#note_191580
Štěpán Balážik [Wed, 20 Jan 2021 18:33:14 +0000 (19:33 +0100)]
selection_iter: relax NSNXAttack mitigation
Previously the mitigation would stop some longer benign resolutions.
We can safely zero the subquery counter when we choose a concrete transport
for the query (i.e. NS name with known IP address).
Štěpán Balážik [Wed, 20 Jan 2021 15:19:18 +0000 (16:19 +0100)]
selection: force resolution of new NS name after lame delegation
Lame delegations are weird: they breed more lame delegations on broken
zones, since trying another server from the same set usually doesn't help.
We force resolution of another NS name in hope of getting somewhere.
Štěpán Balážik [Tue, 19 Jan 2021 12:39:04 +0000 (13:39 +0100)]
iterate.c: don't copy NO_MINIMIZE when following a CNAME
Instead copy it from the request's options.
Reasoning: Minimization might have been turned off as a workaround for
broken authoritative servers which don't support it. There is no
reason to drop minimization when switching zones when following a CNAME.
Štěpán Balážik [Thu, 14 Jan 2021 14:39:31 +0000 (15:39 +0100)]
iterate: rework error handling from iterate.c
Previously there were resolve_badmsg and resolve_error functions used
to apply workarounds. This is now moved to selection.c and iterate.c
just provides feedback using the server selection API. Errors are now
handled centrally in selection.c:error.
Vladimír Čunát [Mon, 18 Jan 2021 08:16:52 +0000 (09:16 +0100)]
ci lint:scan-build: work around changes in meson
In 1f7678ea24 meson was updated and that broke our scan-build.
Now we work around that. Quick analysis of why:
https://github.com/mesonbuild/meson/pull/5918#issuecomment-762064902
Vladimír Čunát [Tue, 5 Jan 2021 15:59:48 +0000 (16:59 +0100)]
tests/dnstap: fix, refactor and integrate into meson and CI
They need one go package which I can't find even in Debian,
so it probably can't work without network access.
The new dnstap in extra_tests runs if dnstap is built and go is found.
It also tries to keep the source tree clean.
Now both query and reply messages are tested.
In CI (after caching go deps in image) this test only takes slightly
more time than the longest config.* tests, so that seems OK.
Even so, it's not added into the valgrind variant, as compilation
of the test still isn't split away from the run itself.
Vladimír Čunát [Mon, 4 Jan 2021 09:50:18 +0000 (10:50 +0100)]
treewide: avoid memset where it's trivial
More idiomatic code seems better:
- for variable initialization we have = { 0 }
- (mm_)calloc for heap allocations
sizeof: use variable instead of type (where suitable; not sure why)
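The three idioms above could look roughly like this. The struct and function names are made up for the example, and plain calloc() stands in for the pool-aware mm_calloc().

```c
#include <stdlib.h>

/* Hypothetical struct, used only to illustrate the idioms. */
struct sample {
	int srtt;
	int variance;
};

/* Stack variable: zero-initialize with = { 0 } instead of
 * a declaration followed by memset(). */
static struct sample sample_init(void)
{
	struct sample s = { 0 };
	return s;
}

/* Heap allocation: calloc() instead of malloc() + memset().
 * sizeof(*s) names the variable, so it stays correct if the type changes. */
static struct sample *sample_new(void)
{
	struct sample *s = calloc(1, sizeof(*s));
	return s; /* NULL on allocation failure; caller releases with free() */
}
```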
Vladimír Čunát [Tue, 22 Dec 2020 11:44:39 +0000 (12:44 +0100)]
copy mempattern files from Knot DNS as they are
It seems just easier than having the copies in the current way.
I don't think the `static inline` were helping us anyway,
except for avoiding KR_EXPORT in some cases.
Still, differences when copying:
- we use plain memset() in the implementation
(no motivation here to use the complex memzero() approach)
- we expose mm_malloc(), as we've been referring to it
- we KR_EXPORT some of the functions (for lua and modules)
Vladimír Čunát [Mon, 4 Jan 2021 15:28:52 +0000 (16:28 +0100)]
dnstap: represent DoT and DoH
(instead of marking them as TCP)
This includes latest dnstap.proto, except for keeping our local changes
of the licensing comment.
https://github.com/dnstap/dnstap.pb/blob/master/dnstap.proto
Pavel Dolezal [Tue, 15 Dec 2020 12:27:50 +0000 (13:27 +0100)]
dnstap: multiple changes
- log queries and responses as separate dnstap messages
- use "query" instead of "request" to mirror dnstap specification
- don't export "query_zone" field in "CLIENT_*" messages
Jakub Ružička [Fri, 8 Jan 2021 16:23:17 +0000 (17:23 +0100)]
distro: introduce upstream cznic release prefix
Using the cznic.1 release string for upstream packages ensures they are
preferred over downstream ones and that their versions don't collide,
which was causing issues for users with both downstream and upstream
packaging repos enabled.
The following statements are true according to `dpkg --compare-versions`:
1.2.3-1 < 1.2.3-cznic.1
1.2.3-999 < 1.2.3-cznic.1
So upstream packages should always take precedence over downstream
packages of the same version.
Tomas Krizek [Tue, 5 Jan 2021 12:08:35 +0000 (13:08 +0100)]
daemon/http: fix memleak
The http_data structure is allocated in http_write_pkt() and the last
callback that uses it is on_pkt_write(), so it should be responsible for
freeing the memory.
This used to leak a small amount of memory on every DoH response.
Štěpán Balážik [Wed, 6 Jan 2021 12:06:25 +0000 (13:06 +0100)]
daemon/worker.c: fix warning from compilation without asserts
I kept the changes (especially the one in qr_task_on_send) as local as
possible while hopefully preserving the invariants other functions in
worker rely upon.
Vladimír Čunát [Tue, 22 Dec 2020 10:29:39 +0000 (11:29 +0100)]
lib/selection: minor refactorings and comments
Small things I've noticed while reading it all.
- line breaks: I believe <90 is OK, as usually the attempts to reduce
lengths impair readability
- avoid unnecessary casts; usually the type was visible
on the same line anyway
- avoid `|` on booleans
- one block gets de-indented (often badly shown in diffs)
- no need for UNRECOVERABLE_ERRORS in a header (and a weird one, too)
- recoverability from failed assertions (in case they're turned off)
Vladimír Čunát [Tue, 29 Dec 2020 14:51:50 +0000 (15:51 +0100)]
lib/selection: tweak computation of RTT estimates
- fix switched \alpha and \beta from the RFC (no big deal, I think)
- use the same order as in the RFC (perhaps that caused the switch?)
- avoid floating-point arithmetic (it's simple with these formulas)
- simplify the backoff formula (MINs instead of branches)
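For illustration, an integer-only update step, assuming the RFC in question is RFC 6298 with its recommended alpha = 1/8 and beta = 1/4. The struct and function names are hypothetical, and the backoff formula is not covered here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical estimator state; values in milliseconds. */
struct rtt_est {
	int32_t srtt;   /* smoothed round-trip time */
	int32_t rttvar; /* round-trip time variation */
};

/* Fold a new RTT sample into the estimate, in the order the RFC gives:
 * RTTVAR first (it uses the old SRTT), then SRTT. */
static void rtt_update(struct rtt_est *e, int32_t sample)
{
	/* RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|, beta = 1/4 */
	e->rttvar = (3 * e->rttvar + abs(e->srtt - sample)) / 4;
	/* SRTT <- (1 - alpha) * SRTT + alpha * R', alpha = 1/8 */
	e->srtt = (7 * e->srtt + sample) / 8;
}
```

Starting from srtt = 80 and rttvar = 40, a sample of 160 yields rttvar = 50 and srtt = 90. Integer division truncates, which loses at most one millisecond per update.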
Vladimír Čunát [Tue, 29 Dec 2020 08:28:16 +0000 (09:28 +0100)]
lib/selection: be more careful around rtt_state.dead_since
It's all because the timestamp that we're using isn't (guaranteed to be)
meaningful across reboots or different machines, whereas our cache even
persists by default.
Vladimír Čunát [Mon, 28 Dec 2020 09:09:18 +0000 (10:09 +0100)]
lib/selection: tweak how cache is used
- standardize cache key choice and ensure impossibility of collisions
- comment on interaction with GC; it would be better to give RTT
priority over most of other records
- be more robust wrt. value in cache