Marek Vavruša [Tue, 28 Aug 2018 17:32:45 +0000 (10:32 -0700)]
allow the delegation change if NS has both zones and child is in bailiwick
Previous change required the new delegation to contain QNAME, but
that's not always the case with chasing CNAME chains. This relaxes
the requirement to QNAME from the response packet, so it always
follows the next chased name.
Marek Vavruša [Thu, 23 Aug 2018 00:37:01 +0000 (17:37 -0700)]
lib/defines: added configurable SRTT limits and lowered probing rate
This adds configurable SRTT limits for various network environments.
The probing rate is reduced from 10% to 1%, so that badly connected
nameservers are not selected as often. This is apparent on zone cuts
with a lot of nameservers, e.g. .com. With the probing rate set to
1% the average response time for 300 queries is almost 50% better:
Marek Vavruša [Thu, 23 Aug 2018 00:34:57 +0000 (17:34 -0700)]
lib/cache: add LMDB_NO_DROP to always unlink file without mdb_drop
The mdb_drop() can get slow when the cache size is several GB,
so that the init daemon times out when waiting for the daemon to start,
and keeps restarting it in a loop.
Marek Vavruša [Tue, 21 Aug 2018 06:59:24 +0000 (23:59 -0700)]
cache/api: prevent deadlock on kr_cache_remove with multiple processes
Only the `kr_cache_remove_subtree` called `kr_cache_sync` to commit
the write transaction after cache removal operation. This wasn't
done in the `kr_cache_remove` so the write transaction could be
long-lived.
With two or more processes, if one held the write transaction open,
no other process could open it. If the process holding the transaction
would call IPC to other processes and wait, it would never release
it and the other processes could never acquire it, and deadlock
would occur.
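A minimal sketch of the problem and the fix, assuming a single lock stands in for the database write transaction. `ToyCache`, `begin_write`, `sync` and `can_write` are hypothetical names for illustration, not the resolver's API:

```python
import threading

class ToyCache:
    """Toy model: one shared lock plays the role of the write transaction."""

    def __init__(self, write_lock, store):
        self.write_lock = write_lock   # shared between all "processes"
        self.store = store             # shared key/value storage
        self.txn_open = False

    def begin_write(self):
        # Open the write transaction lazily and keep it open until sync().
        if not self.txn_open:
            self.write_lock.acquire()
            self.txn_open = True

    def sync(self):
        # Commit: release the write transaction so that others can write.
        if self.txn_open:
            self.txn_open = False
            self.write_lock.release()

    def remove(self, key, commit=True):
        self.begin_write()
        removed = self.store.pop(key, None) is not None
        if commit:
            self.sync()  # the fix: commit right after the removal
        return removed

    def can_write(self):
        # Probe whether this "process" could open a write transaction now.
        if self.txn_open:
            return True
        got = self.write_lock.acquire(blocking=False)
        if got:
            self.write_lock.release()
        return got
```

With `commit=False` (the old behaviour in this model) a second `ToyCache` sharing the lock cannot write until the first one calls `sync()`.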
Marek Vavruša [Fri, 17 Aug 2018 07:43:36 +0000 (00:43 -0700)]
daemon/worker: this fixes connect bug, and error handling from TLS writes
Previously, when the chosen protocol for the next message was SOCK_STREAM,
the TLS upgrade checked next selected address instead of retry address.
The error handling loop for uncorking TLS data was wrong, as the
underlying push function is asynchronous and there's no relationship
between completed DNS packet writes and number of TLS message writes.
In case of the asynchronous function, the buffered data must be valid
until the write is complete. Currently this is not guaranteed, and
loading the resolver with pipelined requests results in memory errors:
```
$ getdns_query @127.0.0.1#853 -s -a -s -l L -B -F queries -q
...
==47111==ERROR: AddressSanitizer: heap-use-after-free on address 0x6290040a1253 at pc 0x00010da960d3 bp 0x7ffee2628b30 sp 0x7ffee26282e0
READ of size 499 at 0x6290040a1253 thread T0
#0 0x10da960d2 in wrap_write (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x1f0d2)
#1 0x10d855971 in uv__write (libuv.1.dylib:x86_64+0xf971)
#2 0x10d85422e in uv__stream_io (libuv.1.dylib:x86_64+0xe22e)
#3 0x10d85b35a in uv__io_poll (libuv.1.dylib:x86_64+0x1535a)
#4 0x10d84c644 in uv_run (libuv.1.dylib:x86_64+0x6644)
#5 0x10d602ddf in main main.c:422
#6 0x7fff6a28a014 in start (libdyld.dylib:x86_64+0x1014)
0x6290040a1253 is located 83 bytes inside of 16895-byte region [0x6290040a1200,0x6290040a53ff)
freed by thread T0 here:
#0 0x10dacdfdd in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x56fdd)
#1 0x10d913c2e in _mbuffer_head_remove_bytes (libgnutls.30.dylib:x86_64+0xbc2e)
#2 0x10d915080 in _gnutls_io_write_flush (libgnutls.30.dylib:x86_64+0xd080)
#3 0x10d90ca18 in _gnutls_send_tlen_int (libgnutls.30.dylib:x86_64+0x4a18)
#4 0x10d90edde in gnutls_record_send2 (libgnutls.30.dylib:x86_64+0x6dde)
#5 0x10d90f085 in gnutls_record_uncork (libgnutls.30.dylib:x86_64+0x7085)
#6 0x10d5f6569 in tls_push tls.c:238
#7 0x10d5e5b2a in qr_task_send worker.c:1002
#8 0x10d5e2ea6 in qr_task_finalize worker.c:1562
#9 0x10d5dab99 in qr_task_step worker.c
#10 0x10d5e12fe in worker_process_tcp worker.c:2410
```
The current implementation adds opportunistic uv_try_write which
either writes the requested data, or returns UV_EAGAIN or an error,
which then falls back to slower asynchronous write that copies the buffered data.
The function signature is changed from simple write to vectorized write.
This also enables TLS False Start to save 1RTT when possible.
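The write strategy can be sketched as follows. This is an illustration only: `ToyStream` is a hypothetical stand-in for a non-blocking stream, not libuv, and the handling of a partial write is an assumption of the sketch:

```python
import errno

class ToyStream:
    """Stand-in stream; `capacity` is how many bytes it accepts right now."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sent = bytearray()
        self.pending = []  # owned copies queued for a later asynchronous write

    def try_write(self, bufs):
        # Write immediately if possible, otherwise report EAGAIN.
        data = b"".join(bufs)
        if self.capacity == 0:
            return -errno.EAGAIN
        n = min(len(data), self.capacity)
        self.sent += data[:n]
        self.capacity -= n
        return n

    def write_async(self, data):
        # The queued data must outlive the caller's buffers, so copy it.
        self.pending.append(bytes(data))

def push(stream, bufs):
    """Vectorized push: fast path first, copying slow path as a fallback."""
    total = sum(len(b) for b in bufs)
    n = stream.try_write(bufs)
    if n == -errno.EAGAIN:
        n = 0                      # nothing written, queue everything
    elif n < 0:
        return n                   # hard error, propagate to the caller
    if n < total:
        stream.write_async(b"".join(bufs)[n:])
    return total
```

The copy is made only on the slow path, so the common case of a writable socket avoids both the allocation and the lifetime problem.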
Anbang Wen [Tue, 14 Aug 2018 23:10:10 +0000 (16:10 -0700)]
support multiple addresses in daf src/dst filter
This enables using syntax like "src { CIDR-a CIDR-b } deny" to specify
multiple addresses to filter. All the conditions are ORed together
like qname/ns.
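The ORed matching can be sketched with Python's `ipaddress` module. `src_matches` is a hypothetical helper for illustration, not part of the daf module:

```python
import ipaddress

def src_matches(client, cidrs):
    """Return True if the client address falls into any of the listed CIDRs."""
    addr = ipaddress.ip_address(client)
    # Conditions are ORed: one matching network is enough.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

For example, `src_matches("10.1.2.3", ["192.168.0.0/16", "10.0.0.0/8"])` matches on the second network.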
Marek Vavruša [Wed, 15 Aug 2018 22:12:37 +0000 (15:12 -0700)]
lib/nsrep: flag nameservers not supporting TCP
The previous behavior was to flag a nameserver that doesn't respond
over TCP as dead, but this doesn't work when the nameserver
only supports UDP (e.g. duckdns.org).
The new behavior flags nameservers that don't support TCP separately
in reputation cache. The effect of that is that when UDP times out,
another nameserver is elected instead of retrying over TCP and blacklisting
the nameserver.
Marek Vavruša [Thu, 9 Aug 2018 18:54:45 +0000 (11:54 -0700)]
daemon/tls: TLS parameters are refcounted by client sessions
The TLS parameters are shared between client sessions, but they
can be removed from the server during runtime, so care must be
taken so that the parameters are not freed while sessions use them.
This commit adds reference counting to TLS parameters, so that they
remain valid until the last session using them is closed.
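A minimal sketch of the ownership rule, assuming the server configuration holds one reference and every session takes another. `ToyTLSParams` and its methods are hypothetical names:

```python
class ToyTLSParams:
    """Shared parameters that are released only when nobody uses them."""

    def __init__(self):
        self.refs = 1        # reference held by the server configuration
        self.freed = False

    def ref(self):
        # Called when a client session starts using the parameters.
        self.refs += 1
        return self

    def unref(self):
        # Called on session close, or when the server drops the entry.
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # stands in for the actual deallocation
```

Removing the entry from the server while a session is open only drops the count to one, so the session keeps working until it closes.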
Marek Vavruša [Fri, 3 Aug 2018 22:04:45 +0000 (15:04 -0700)]
daemon/worker: fixes for handling of some non-fatal TLS errors, metrics
The handshake now properly deals with GNUTLS_E_INTERRUPTED to retry,
and GNUTLS_E_WARNING_ALERT_RECEIVED and GNUTLS_E_GOT_APPLICATION_DATA
during session resumption.
Added a metric for monitoring TLS handshake errors.
Added `net.tls_handshake_timeout([milliseconds])` for configurable
TLS handshake timeout (default is 6000ms), and documentation for
`net.tcp_in_idle([milliseconds])`.
Marek Vavruša [Sun, 5 Aug 2018 02:38:18 +0000 (19:38 -0700)]
cache: cache RRSIGs in packet cache
This will enable caching of RRSIG queries in packet cache.
The RRSIGs are cached as insecure as they don't have a signature.
Bogus RRSIGs won't be cached as they have to first pass the validator.
Marek Vavruša [Wed, 1 Aug 2018 20:38:10 +0000 (13:38 -0700)]
daemon/worker: always invalidate upstream address list
This makes sure that the whole upstream address list is invalidated
on address selection, and retransmit doesn't send query to invalid
upstream in case all other choices are exhausted.
Marek Vavruša [Tue, 31 Jul 2018 22:12:25 +0000 (15:12 -0700)]
daemon: allow opportunistic DNS over TLS to origins
This commit allows opportunistic DNS over TLS to origins configured
as supporting DoT on port 853. It also adds interface for clearing
configured TLS clients to allow runtime reconfiguration.
The general mode of operation is as follows:
1. Produce a new outgoing query
2. Check if the selected upstream address has configured TLS support on port 853
2a. If it does: upgrade to DNS over TLS, it cannot be downgraded from this point
2b. If not: continue with preferred protocol
This allows further automatic discovery as in [1], but right now it has to be configured
manually.
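The steps above can be sketched as a small selection function. The names (`pick_transport`, `tls_clients`) and the data layout are assumptions made for illustration:

```python
DOT_PORT = 853

def pick_transport(addr, port, preferred, tls_clients):
    """Choose the transport for one outgoing query.

    tls_clients: set of (address, port) pairs configured as supporting DoT.
    """
    if (addr, DOT_PORT) in tls_clients:
        # Upgrade; the caller must not downgrade from this point.
        return ("tls", DOT_PORT)
    # No TLS configured for this upstream: keep the preferred protocol.
    return (preferred, port)
```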
Marek Vavruša [Tue, 31 Jul 2018 22:07:58 +0000 (15:07 -0700)]
daemon/worker: don't include connection setup for TCP and TLS in RTT
Currently the handshake time is included in the RTT, so TCP and TLS
retries/forwards make upstreams look bad compared to UDP, and
discourage connection reuse as other "faster" origins end up
with lower score, so they would be preferred.
This commit excludes wait and handshake time, so only the actual
message exchange time is included in the RTT calculation.
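As an illustration, with hypothetical timestamps in milliseconds, the measured RTT starts at the moment the message is actually sent rather than when the task was created:

```python
def exchange_rtt(created_at, sent_at, answered_at):
    """RTT that ignores wait and handshake time before the message was sent."""
    assert created_at <= sent_at <= answered_at
    return answered_at - sent_at

def naive_rtt(created_at, sent_at, answered_at):
    """Old behaviour in this model: connection setup is charged to the upstream."""
    return answered_at - created_at
```

For a query created at 0 ms, sent after a handshake at 120 ms and answered at 150 ms, the first function yields 30 and the second 150.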
Marek Vavruša [Mon, 18 Jun 2018 23:17:53 +0000 (16:17 -0700)]
validate: fix when NS is both parent and child and child is insecure
When NS is both parent and child, it would respond to the final query
without signature and resolver is supposed to ask for DS to prove the
transition to insecure. Previously, this was only checked for NS queries
(made during referral chasing), so it would work for intermediate
nameservers, but not for final.
Marek Vavruša [Mon, 11 Jun 2018 00:12:00 +0000 (17:12 -0700)]
lib/cache/api: generalize ECS code to support longest prefix match
The current semantics is to try to look up an exact match, or
look into global scope. This commit changes that to longest prefix
match, so that wider cache scopes can be used to match against
multiple prefixes. In order to do this, the key scope format needs to
be changed from:
```
u8[] address
u8 scope_len
```
To:
```
u8[4 or 16] address
u8 scope_len
```
The fixed address length is necessary to be able to use lexicographic
lesser-or-equal scan supported by the database. For example, if the
search key is 192.168.0/24 (`\192\168\0\0\24`), any wider prefix
must be lexicographically smaller. The `\192\168\16` wouldn't be,
but the `\192\168\0\0\16` is, hence the key format change.
The fixed key size is also necessary to separate IPv4 scopes from
IPv6 scopes and vice versa. This is checked by comparing the
length of the found lesser-or-equal key - if the length is the same,
and key (without the scope part) is equal, the scope must be the same address
family.
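A sketch of the lookup over a sorted key list, with `bisect` standing in for the database's lesser-or-equal scan. The helper names are hypothetical, only the scope part of the key is modelled, and walking backwards over non-covering keys is an assumption of this sketch rather than a description of the implementation:

```python
import bisect
import ipaddress

def scope_key(network):
    """Fixed-length key: packed address (4 or 16 bytes) followed by scope_len."""
    net = ipaddress.ip_network(network, strict=False)
    return net.network_address.packed + bytes([net.prefixlen])

def covers(key, addr, scope_len):
    """True if `key` is a same-family prefix that contains the search prefix."""
    if len(key) != len(addr) + 1:      # different address family
        return False
    bits = key[-1]
    if bits > scope_len:               # narrower than what we search for
        return False
    shift = len(addr) * 8 - bits
    return (int.from_bytes(key[:-1], "big") >> shift
            == int.from_bytes(addr, "big") >> shift)

def longest_match(sorted_keys, client, scope_len):
    """Find the longest stored prefix covering client/scope_len, or None."""
    search = scope_key(f"{client}/{scope_len}")
    addr = search[:-1]
    # Start at the greatest key <= search key, then walk towards wider scopes.
    i = bisect.bisect_right(sorted_keys, search)
    for key in reversed(sorted_keys[:i]):
        if covers(key, addr, scope_len):
            return key
    return None
```

Because the address part has a fixed length, every covering prefix sorts at or before the search key, and longer covering prefixes sort later than shorter ones, so the first hit while walking backwards is the longest match.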
Marek Vavruša [Sat, 9 Jun 2018 04:06:01 +0000 (21:06 -0700)]
iterate: follow CNAMEs in stub mode
There are two forwarder modes in the resolver - full forwarder,
and a stub mode. The full forwarder expects upstream to fully
solve the request (so the upstream must be a recursive resolver).
The stub forwarder mode is primarily useful for directing traffic
to a trusted resolver or authoritative (e.g. forward queries for
an internal zone). The upstream may not know the full answer to
the query, and may answer only from its authority. In that case
the resolver should follow the partially solved CNAME instead
of serving the partial answer.
Marek Vavruša [Wed, 6 Jun 2018 05:23:43 +0000 (22:23 -0700)]
policy: set NS set, support insecure forward in stub
This allows policy filter to modify NS set in the checkout layer.
It also fixes a bug in which invalid peer address would be used
if the first UDP retransmit fails (`choice` would be set to memory
past the `task->addrlist` in `qr_task_step`).
Marek Vavruša [Thu, 31 May 2018 02:06:22 +0000 (19:06 -0700)]
modules/policy: fixed NYIs (vararg function call)
* fixed NYI with vararg calls in policy filter
* fixed NYI with nil returns (incompatible with type pointer returned otherwise)
* fixed tail call returns exceeding trace loop counts
Marek Vavruša [Thu, 24 May 2018 23:08:07 +0000 (16:08 -0700)]
nsrep: verbose probe message, cap timeout value, less aggressive retry
The retry probe of a timed-out NS is now logged when tracing.
Long response RTT is capped to KR_NS_TIMEOUT to smooth out transient errors.
The retry timer minimum interval is increased from 250ms to 500ms, as NSs taking
typically longer than 1s would just waste time retrying.
Marek Vavruša [Sat, 12 May 2018 01:39:12 +0000 (18:39 -0700)]
don't rewrite cached SOA records from negative answers
Currently there's only an exception to avoid rewriting secure NS records.
Most of the negative answers provide SOA record, so it's undesirable
to keep rewriting it for every negative answer.
Marek Vavruša [Tue, 1 May 2018 06:20:27 +0000 (23:20 -0700)]
daemon/worker: always try multiple upstreams even if sending fails
Previously, no other upstreams were tried if qr_task_send or kr_resolve_checkout
failed, which isn't correct, as it doesn't allow blocking of outbound requests.
Marek Vavruša [Fri, 27 Apr 2018 06:27:33 +0000 (23:27 -0700)]
modules/daf,renumber: fixed the modules and added tests
This fixes most of the rules in DAF that were broken in 2.0 and adds tests.
It also allows policy filter to evaluate policies in the checkout layer,
before the subrequest is sent to authoritative. This is used primarily for
negotiating features between resolver and authoritatives, or disabling transports.
The policy filter can now match on:
* NS suffix - to apply policies on any zone on given nameservers
* Query type
New actions:
* REFUSE - block query with an RCODE=REFUSED, fixes #337
The DAF can now toggle features between resolver and authoritatives.
Marek Vavruša [Fri, 27 Apr 2018 06:21:31 +0000 (23:21 -0700)]
daemon/worker: move checkout layer before connect, catch checkout errors
The checkout layer was moved to where upstream address is known, but
before outbound message is sent (or connected to upstream).
The reason is to allow checkout layer to block outbound queries
without wasting time waiting for connect.
Marek Vavruša [Fri, 20 Apr 2018 03:15:19 +0000 (20:15 -0700)]
lib/generic/pack: fix operations on empty pack
Several operations were not safe to call on empty pack and would
return invalid memory. If the pack would have reserved space, but
would be empty (length = 0), its head would be NULL but tail would
be array address (pack->at + 0). This is mostly checked by caller,
but it wasn't in several places (object deletion).
Marek Vavruša [Thu, 12 Apr 2018 17:35:57 +0000 (10:35 -0700)]
iterate: fix minimisation downgrade when encountering authoritative referrals
This fixes turning off minimisation when there's an authoritative referral
answer on the resolution path. This happens when there's a nameserver,
which is authoritative for both parent and child side of the delegation,
so it answers from the child side with AA=1. Such answer will be mistakenly
processed as authoritative, and QNAME minimisation will be turned off
(assuming this is the final zone cut).
Marek Vavruša [Thu, 12 Apr 2018 08:35:50 +0000 (01:35 -0700)]
nsrep: never blacklist NSs because of SERVFAIL/REFUSED
The SERVFAIL is a soft-failure, and REFUSED isn't something the server
is really in control of. It is easy to trick the resolver into blacklisting
a NS by creating a bad delegation and pointing it at the victim NS.
This changes the scoring function to degrade server score on these rcodes,
but cap it to a really bad score. It should be treated as timed out only
if it really times out or is unreachable.
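A sketch of such a scoring rule. The constants and the doubling penalty are hypothetical placeholders, not the values or the formula used in lib/nsrep:

```python
TIMEOUT_SCORE = 10000               # hypothetical: score meaning "timed out"
MAX_BAD_SCORE = TIMEOUT_SCORE - 1   # hypothetical: worst score short of timeout

def update_score(score, event):
    """Return the new score of a nameserver after an event (lower is better)."""
    if event in ("timeout", "unreachable"):
        return TIMEOUT_SCORE        # only real failures count as timed out
    if event in ("servfail", "refused"):
        # Degrade, but never far enough to blacklist the server.
        return min(max(score, 1) * 2, MAX_BAD_SCORE)
    return score
```

However many SERVFAIL or REFUSED answers an attacker provokes, the score stays below the timeout threshold in this model.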
Marek Vavruša [Thu, 12 Apr 2018 08:32:34 +0000 (01:32 -0700)]
iterate: do not treat REFUSED as soft fail with retries
REFUSED means the NS isn't authoritative for given zone, so it
shouldn't be treated like SERVFAIL. This fixes the case where a server
is not authoritative for given zone (failed transfer, bad delegation),
and the resolver enters a retry loop and eventually runs out of time
instead of trying different servers.
Marek Vavruša [Tue, 10 Apr 2018 06:11:16 +0000 (23:11 -0700)]
implement basic infrastructure for scoped cache
This commit adds support for scoped cache, e.g. keys can be tagged
with a scope, so that the same key can exist in multiple scopes and
return the value based on the scope set.
This is practically required for scoping by subnet in ECS, but
it doesn't implement ECS completely. This is just a framework
to make something like ECS possible in a module.
The scope search is currently non-exhaustive, it either returns
a value bound to given scope or looks into global scope, nothing
in between.
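The lookup semantics can be sketched with a plain dictionary. `ToyScopedCache` and the use of `None` for the global scope are assumptions of this sketch, not the cache API:

```python
GLOBAL = None  # hypothetical marker for the global scope

class ToyScopedCache:
    def __init__(self):
        self.entries = {}

    def insert(self, key, value, scope=GLOBAL):
        # The same key may exist once per scope.
        self.entries[(scope, key)] = value

    def peek(self, key, scope=GLOBAL):
        # Exact scope first, then the global scope, nothing in between.
        if (scope, key) in self.entries:
            return self.entries[(scope, key)]
        return self.entries.get((GLOBAL, key))
```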
Marek Vavruša [Fri, 6 Apr 2018 05:43:57 +0000 (22:43 -0700)]
check per-query flags instead of global options, getter for NS name
Checking query flags instead of global context option allows setting
overrides on individual queries. The effect is otherwise the same, as query
flags start by copying request flags, which start by copying context options.
Marek Vavruša [Fri, 6 Apr 2018 05:48:51 +0000 (22:48 -0700)]
add bindings for the checkout layer
This one was missing from the current bindings. The checkout layer
runs when the worker attempts to send a DNS query to given upstream
when the address is already determined. The layer can add EDNS options
or update outbound query, or block particular addresses / protocol.
Marek Vavruša [Tue, 3 Apr 2018 21:04:32 +0000 (14:04 -0700)]
lib/resolve: don't append EDNS to garbage packets
The current handler will try to construct the compression table
starting with query name in question. If there's no query name,
it's going to construct it with garbage bytes.
Marek Vavruša [Mon, 2 Apr 2018 23:42:42 +0000 (16:42 -0700)]
modules/http: added an error handler to HTTP streams
Instead of throwing an error in the HTTP handler, server should log it.
This covers errors like client disconnecting before reading the response
body etc.
Marek Vavruša [Sat, 24 Mar 2018 04:00:37 +0000 (21:00 -0700)]
resolve: always update QNAME after zone cut update
Previously the code didn't update query if the minimization was turned off,
but that broke resolution for deep zones (like in-addr.arpa) when part of
the chain fell out of cache, and nearest zone cut was longer than
current query name. The condition is not necessary, since kr_make_query
already checks for query name minimisation flag.
Marek Vavruša [Mon, 12 Mar 2018 04:04:19 +0000 (21:04 -0700)]
cache: restored kr_cache_insert_rr API
This commit abstracts out stash_rrset from stash_rrarray_entry,
and fixes incrementing metrics on actual record insertion.
It then resurfaces kr_cache_insert_rr that was deleted in 2.0
using the extracted function.