Marek Vavrusa [Mon, 23 May 2016 00:56:50 +0000 (17:56 -0700)]
daemon: support event.socket(fd, cb) for I/O events
this allows embedding other event loops, or simply
reacting to asynchronous events triggered by socket
activity. this is required for things like a
cooperative HTTP server, a monitoring endpoint or a
remote configuration daemon/controller
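a minimal sketch of the "call cb when fd becomes readable" idea, assuming a plain poll(2) loop. kresd itself is built on libuv, so this is not its implementation; event_socket(), event_run_once() and count_readable() are hypothetical names used only for illustration.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch, not the kresd implementation (which sits on
 * libuv): a tiny registry of (fd, callback) pairs dispatched via poll(2).
 * event_socket() and event_run_once() are illustrative names. */
typedef void (*socket_cb)(int fd, void *baton);

#define MAX_WATCHERS 16

struct watcher {
    int fd;
    socket_cb cb;
    void *baton;
};

static struct watcher watchers[MAX_WATCHERS];
static size_t watcher_count;

/* Register cb to be called whenever fd is readable; 0 on success. */
int event_socket(int fd, socket_cb cb, void *baton)
{
    if (fd < 0 || cb == NULL || watcher_count == MAX_WATCHERS)
        return -1;
    watchers[watcher_count++] = (struct watcher){ fd, cb, baton };
    return 0;
}

/* One loop iteration: wait up to timeout_ms, then run callbacks for
 * ready fds. Returns the number of callbacks fired, -1 on poll error. */
int event_run_once(int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHERS];
    for (size_t i = 0; i < watcher_count; ++i)
        pfds[i] = (struct pollfd){ .fd = watchers[i].fd, .events = POLLIN };
    if (poll(pfds, (nfds_t)watcher_count, timeout_ms) < 0)
        return -1;
    int fired = 0;
    for (size_t i = 0; i < watcher_count; ++i) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            watchers[i].cb(watchers[i].fd, watchers[i].baton);
            ++fired;
        }
    }
    return fired;
}

/* Example callback: consume one byte and count the wake-up. */
void count_readable(int fd, void *baton)
{
    char byte;
    if (read(fd, &byte, 1) == 1)
        ++*(int *)baton;
}
```

registering the read end of a pipe and writing one byte into it makes the next event_run_once() fire the callback once; an embedded event loop would register its own backend fd the same way.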
Marek Vavrusa [Sun, 22 May 2016 03:58:11 +0000 (20:58 -0700)]
worker: fixed corruption when follower times out, early free
* when an enqueued task terminated earlier than the
leader task because of a timeout, it wasn't dequeued
from the waitlist immediately, although it no longer
had any outstanding outbound queries. when the leader
task terminated, it removed this task and updated its
outbound query, which didn't exist. this triggered
a 16B write to an undefined location
* fixed the timeout timer being scheduled for closing
without holding a reference to the parent task
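the first fix can be modelled as "a follower that finishes early leaves the waitlist immediately". this is a simplified, hypothetical model: struct task, struct leader and the waitlist_* / follower_timeout / leader_finish names are illustrative and do not mirror the actual worker structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (names are illustrative, not kresd's): a leader task
 * owns the outbound query, followers asking the same question wait on
 * its waitlist. */
#define WAITLIST_MAX 8

struct task {
    int id;
    bool finished;
};

struct leader {
    struct task *waiting[WAITLIST_MAX];
    size_t len;
};

int waitlist_add(struct leader *l, struct task *t)
{
    if (l->len == WAITLIST_MAX)
        return -1;
    l->waiting[l->len++] = t;
    return 0;
}

/* Remove t from the waitlist; returns true if it was there. */
bool waitlist_remove(struct leader *l, struct task *t)
{
    for (size_t i = 0; i < l->len; ++i) {
        if (l->waiting[i] == t) {
            l->waiting[i] = l->waiting[--l->len]; /* order is irrelevant */
            return true;
        }
    }
    return false;
}

/* The fix in this model: a follower that times out before the leader
 * dequeues itself immediately, so the leader never touches it later. */
void follower_timeout(struct leader *l, struct task *t)
{
    t->finished = true;
    waitlist_remove(l, t);
}

/* Leader completion: finish only the followers still waiting. */
size_t leader_finish(struct leader *l)
{
    size_t n = l->len;
    for (size_t i = 0; i < n; ++i)
        l->waiting[i]->finished = true;
    l->len = 0;
    return n;
}
```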
Marek Vavrusa [Sun, 15 May 2016 21:14:53 +0000 (14:14 -0700)]
lib: cache api v2, removed dep on libknot db.h
this change introduces a new API for cache backends
that is a subset of knot_db_api_t from libknot,
extended with several cache-specific operations
major changes are:
* merged 'cachectl' module into 'cache' as it is
99% default-on and it simplifies things
* not transaction-oriented; transactions may be
reused and cached for higher performance
* scatter/gather API; this is important for
latency and performance of non-local backends
like Redis
* faster and more reliable cache clearing
* cache-specific operations (prefix scan, ...) are
part of the API, not hacked in
* simpler code for both backends and caller
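the shape of such an API can be sketched as a vtable with batched (scatter/gather) read/write plus a prefix match. this is a hypothetical illustration, not the real header: struct cdb_val, struct cdb_api and the toy in-memory backend are assumptions made only to exercise the interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend-neutral cache API (not the real kresd header). */
struct cdb_val {
    const void *data;
    size_t len;
};

struct cdb_api {
    const char *name;
    int (*clear)(void *db);
    /* Scatter/gather: look up `count` keys in one call; returns hits. */
    int (*read)(void *db, const struct cdb_val *keys, struct cdb_val *vals, int count);
    int (*write)(void *db, const struct cdb_val *keys, const struct cdb_val *vals, int count);
    /* Cache-specific: count entries whose key starts with `prefix`. */
    int (*match)(void *db, const struct cdb_val *prefix);
};

/* Toy in-memory backend; stores the caller's pointers (no copies). */
#define MEM_SLOTS 16
struct mem_db {
    struct cdb_val keys[MEM_SLOTS], vals[MEM_SLOTS];
    int used;
};

static bool val_eq(const struct cdb_val *a, const struct cdb_val *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

static int mem_clear(void *db) { ((struct mem_db *)db)->used = 0; return 0; }

static int mem_read(void *db, const struct cdb_val *keys, struct cdb_val *vals, int count)
{
    struct mem_db *m = db;
    int hits = 0;
    for (int i = 0; i < count; ++i) {
        vals[i] = (struct cdb_val){ NULL, 0 };
        for (int j = 0; j < m->used; ++j) {
            if (val_eq(&keys[i], &m->keys[j])) {
                vals[i] = m->vals[j];
                ++hits;
                break;
            }
        }
    }
    return hits;
}

static int mem_write(void *db, const struct cdb_val *keys, const struct cdb_val *vals, int count)
{
    struct mem_db *m = db;
    for (int i = 0; i < count; ++i) {
        int j = 0;
        while (j < m->used && !val_eq(&keys[i], &m->keys[j]))
            ++j;
        if (j == m->used) { /* new key */
            if (m->used == MEM_SLOTS)
                return -1; /* full; a real backend would evict */
            m->keys[m->used++] = keys[i];
        }
        m->vals[j] = vals[i];
    }
    return 0;
}

static int mem_match(void *db, const struct cdb_val *prefix)
{
    struct mem_db *m = db;
    int n = 0;
    for (int j = 0; j < m->used; ++j)
        if (m->keys[j].len >= prefix->len &&
            memcmp(m->keys[j].data, prefix->data, prefix->len) == 0)
            ++n;
    return n;
}

const struct cdb_api mem_api = { "mem", mem_clear, mem_read, mem_write, mem_match };
```

the batched read is what matters for a remote backend: one call (one round trip, for something like Redis) can serve several keys instead of one per lookup.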
Marek Vavrusa [Sun, 15 May 2016 21:08:45 +0000 (14:08 -0700)]
contrib/lmdb: imported LMDB 0.9.18, built-in
by default, the build system attempts to use LMDB
from the system. however, if it's not found or
its version is too old, it uses the built-in
snapshot in contrib
Marek Vavrusa [Wed, 11 May 2016 07:40:35 +0000 (00:40 -0700)]
daemon/worker: deduplicate inbound queries
many clients retransmit queries frequently to
cope with network losses and get better service,
but then fail to work properly when a resolver
answers some of the copies with SERVFAIL (because
of the time limit) and others with NOERROR.
deduplication also avoids wasting time tracking
multiple pending tasks that solve the same thing.
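a minimal sketch of the duplicate check, assuming a retransmit is recognised by the same client address, message ID and question (qname/qtype/qclass). the exact key kresd uses is an assumption here, and struct pending_query and find_duplicate() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

/* Hypothetical inbound deduplication: a retransmit is a query from the
 * same client with the same message ID and question (assumed key). */
struct pending_query {
    char client[48];   /* printable client address, e.g. "192.0.2.1#5353" */
    uint16_t msgid;
    char qname[256];
    uint16_t qtype, qclass;
};

static bool same_query(const struct pending_query *a, const struct pending_query *b)
{
    return a->msgid == b->msgid && a->qtype == b->qtype &&
           a->qclass == b->qclass &&
           strcmp(a->client, b->client) == 0 &&
           strcasecmp(a->qname, b->qname) == 0; /* DNS names are case-insensitive */
}

/* Returns the already-pending query matching q, or NULL if q is new. */
const struct pending_query *find_duplicate(const struct pending_query *pending,
                                           size_t count,
                                           const struct pending_query *q)
{
    for (size_t i = 0; i < count; ++i)
        if (same_query(&pending[i], q))
            return &pending[i];
    return NULL;
}
```

when a duplicate is found, the copy can simply be dropped: the pending task will send a single, consistent answer instead of racing SERVFAIL against NOERROR.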
Marek Vavrusa [Wed, 11 May 2016 00:45:12 +0000 (17:45 -0700)]
daemon: do not modify task for outgoing queries
if the upstream TCP query timed out or the
connection was severed, the handle would be
dissociated from the original query, so the query
would be resolved but the requestor wouldn't see
the answer unless it re-queried
Marek Vavrusa [Fri, 6 May 2016 06:40:28 +0000 (23:40 -0700)]
lib: cleanup servfail soft-fails
* simplified the per-NS soft-fail limit to a
per-query limit; each query gets 4 tries at resolving
* instead of locking onto a single SERVFAILing NS,
penalise it and rerun the election; this may or
may not try other servers, but it avoids the
pathological case where a single NS is SERVFAILing
while others are good but never probed
* added a new nsrep update mode (addition)
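the "penalise and re-elect within a per-query budget of 4 tries" policy can be sketched as follows. the scoring (lower is better) and the penalty value are illustrative assumptions, and ns_elect() / resolve_with_budget() are hypothetical names, not the nsrep API.

```c
#include <stddef.h>

/* Hypothetical sketch: a fixed per-query try budget; a SERVFAILing server
 * is penalised and the election re-run instead of retrying it blindly.
 * Scores (lower is better) and the penalty are illustrative. */
#define QUERY_MAX_TRIES 4
#define SERVFAIL_PENALTY 500

struct ns_entry {
    const char *name;
    unsigned score;
};

/* Elect the server with the lowest score. */
size_t ns_elect(const struct ns_entry *ns, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; ++i)
        if (ns[i].score < ns[best].score)
            best = i;
    return best;
}

/* servfails(i) reports whether server i answers SERVFAIL.
 * Returns the index of the server that answered, or -1. */
int resolve_with_budget(struct ns_entry *ns, size_t count, int (*servfails)(size_t))
{
    for (int tries = 0; tries < QUERY_MAX_TRIES; ++tries) {
        size_t i = ns_elect(ns, count);
        if (!servfails(i))
            return (int)i;
        ns[i].score += SERVFAIL_PENALTY; /* penalise, then re-elect */
    }
    return -1;
}

/* Example predicates: only server 0 is broken / every server is broken. */
int only_first_fails(size_t i) { return i == 0; }
int all_fail(size_t i) { (void)i; return 1; }
```

with two servers scored 10 and 100, a SERVFAIL from the first pushes it to 510, so the second is elected on the next try; if both keep failing, the query gives up after 4 tries rather than looping.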
Marek Vavrusa [Wed, 4 May 2016 00:17:53 +0000 (17:17 -0700)]
lib/validate: fixed memory bug
this code used the memory pool of the source packet
instead of the answer's, which could result in a
read of invalidated memory if the memory occupied
by the source packet was rewritten
Marek Vavrusa [Tue, 3 May 2016 06:56:20 +0000 (23:56 -0700)]
daemon: out-of-order processing for TCP
* the daemon now processes messages on a TCP stream
out of order and concurrently
* support for TCP_DEFER_ACCEPT
* support for TCP Fast Open
* idle/slow TCP streams now have deadlines and are
pruned (to prevent slowloris)
* there is now a per-request limit on timeouts
(each request is allowed 4 timeouts before bailing)
* faster request closing, unified retry/timeout timers
* fixed a rare race condition in timer closing
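on Linux, the two listener options mentioned above are plain setsockopt(2) calls at the IPPROTO_TCP level. a sketch, assuming the options are treated as best-effort optimisations; the 3 s defer timeout and the Fast Open queue length of 16 are illustrative values, not kresd defaults, and tcp_tune_listener() is a hypothetical name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch (Linux): enable TCP_DEFER_ACCEPT and TCP_FASTOPEN on a listening
 * socket. Returns a bitmask of the options that were applied; failures
 * are tolerated because both are only optimisations. */
#define TCP_OPT_DEFER_ACCEPT 0x1
#define TCP_OPT_FASTOPEN     0x2

int tcp_tune_listener(int fd)
{
    int applied = 0;
#ifdef TCP_DEFER_ACCEPT
    /* Don't wake accept() until data arrives (value: seconds to wait). */
    int defer_secs = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &defer_secs, sizeof(defer_secs)) == 0)
        applied |= TCP_OPT_DEFER_ACCEPT;
#endif
#ifdef TCP_FASTOPEN
    /* Accept data in the SYN; value is the pending TFO request queue length. */
    int qlen = 16;
    if (setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) == 0)
        applied |= TCP_OPT_FASTOPEN;
#endif
    return applied;
}
```

both options are set before listen(); deferred accept also helps against idle connections, since a client that connects and never sends a query does not occupy an accepted stream.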
Marek Vavrusa [Mon, 18 Apr 2016 03:34:31 +0000 (20:34 -0700)]
daemon: mode(strict|normal|permissive)
the daemon now has three modes of strictness
checking, from strict to permissive.
they reflect the trade-off between resolving the
query in as few steps as possible and security
for insecure zones
Marek Vavrusa [Mon, 18 Apr 2016 00:32:17 +0000 (17:32 -0700)]
engine: clear bad scorers from RTT every 5 minutes
an internal timer periodically walks the RTT
entries and clears those with bad results every
5 minutes. this means that the penalty of a
timed-out entry is capped to that interval, making
sure that a bad reputation doesn't last forever
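a minimal sketch of the periodic sweep, assuming a flat table of RTT entries and a fixed "bad" threshold. the threshold and the struct layout are hypothetical; only the 5-minute interval comes from the message above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RTT table sweep; only the 5-minute interval is from the
 * description, the threshold and layout are illustrative. */
#define RTT_SWEEP_INTERVAL 300 /* seconds (5 minutes) */
#define RTT_BAD_THRESHOLD  800 /* ms; entries at or above count as bad */

struct rtt_entry {
    char addr[48];
    unsigned rtt_ms;
    int used;
};

struct rtt_table {
    struct rtt_entry e[32];
    size_t len;
    uint64_t last_sweep; /* seconds, monotonic clock assumed */
};

/* Call from a periodic timer; returns the number of entries cleared. */
size_t rtt_sweep(struct rtt_table *t, uint64_t now)
{
    if (now - t->last_sweep < RTT_SWEEP_INTERVAL)
        return 0;
    t->last_sweep = now;
    size_t cleared = 0;
    for (size_t i = 0; i < t->len; ++i) {
        if (t->e[i].used && t->e[i].rtt_ms >= RTT_BAD_THRESHOLD) {
            t->e[i].used = 0; /* forget it: next contact starts fresh */
            ++cleared;
        }
    }
    return cleared;
}
```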
Marek Vavrusa [Mon, 18 Apr 2016 00:29:41 +0000 (17:29 -0700)]
engine: throttle outbound queries only when busy
when it's not busy, the resolver will always
attempt to contact upstreams known to be bad.
this fixes a problem on low-volume resolvers
where a short connection outage could make them
refuse to resolve queries even after the
connection was restored
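the rule boils down to a two-condition predicate. a sketch, assuming "bad" is a score threshold and "busy" is a count of pending requests; both thresholds and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: skip a known-bad upstream only when the resolver is busy.
 * The bad-score threshold is illustrative. */
#define NS_BAD_SCORE 1000

bool should_skip_upstream(unsigned score, unsigned pending_requests,
                          unsigned busy_threshold)
{
    bool known_bad = score >= NS_BAD_SCORE;
    bool busy = pending_requests >= busy_threshold;
    return known_bad && busy; /* idle resolver: always give it another try */
}
```

on an idle resolver the predicate is always false, so a server marked bad during a short outage is retried as soon as queries arrive again.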
Marek Vavrusa [Fri, 15 Apr 2016 07:03:13 +0000 (00:03 -0700)]
lib/iterate: QUERY_PERMISSIVE mode
in permissive mode, the resolver is free to use
(but not cache) non-mandatory glue records even
if they're not resolvable. this is great as a
workaround for broken child-side zones, but
not great for the security of, well, insecure
delegations. it's off by default.
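a simplified sketch of the glue decision across the strictness modes, assuming "mandatory" glue means the NS name lies inside the delegated zone. treating strict like normal for glue, and marking mandatory glue as cacheable, are assumptions of this sketch; enum resolver_mode and glue_policy() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical names; strict is treated like normal for glue here. */
enum resolver_mode { MODE_STRICT, MODE_NORMAL, MODE_PERMISSIVE };

/* true if name equals zone or lies below it; absolute presentation-format
 * names ending in '.', ASCII case-insensitive, no escape handling. */
static bool name_in_zone(const char *name, const char *zone)
{
    size_t nl = strlen(name), zl = strlen(zone);
    if (strcmp(zone, ".") == 0)
        return true;
    if (nl < zl || strcasecmp(name + nl - zl, zone) != 0)
        return false;
    return nl == zl || name[nl - zl - 1] == '.';
}

struct glue_decision {
    bool use;
    bool cache;
};

struct glue_decision glue_policy(enum resolver_mode mode,
                                 const char *ns_name, const char *child_zone)
{
    /* Mandatory glue: the NS is inside the child zone, no other way in
     * (assumed cacheable in this sketch). */
    if (name_in_zone(ns_name, child_zone))
        return (struct glue_decision){ true, true };
    /* Non-mandatory glue: permissive mode uses it but never caches it. */
    if (mode == MODE_PERMISSIVE)
        return (struct glue_decision){ true, false };
    /* Otherwise ignore it and resolve the NS name separately. */
    return (struct glue_decision){ false, false };
}
```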
Marek Vavrusa [Tue, 8 Mar 2016 17:26:19 +0000 (17:26 +0000)]
daemon: track case when all upstreams fail
previously, a full timeout led to a reset of the
evaluated address list and no upstream server was
penalised for not answering the query; now all of
the tried servers are penalised with TIMEOUT
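a sketch of the new behaviour, assuming each candidate address carries a "tried" mark and a score. NS_TIMEOUT_SCORE, struct upstream and penalise_tried() are hypothetical names standing in for the real TIMEOUT penalty.

```c
#include <stddef.h>

/* Hypothetical "timed out" score standing in for the TIMEOUT penalty. */
#define NS_TIMEOUT_SCORE 10000

struct upstream {
    const char *addr;
    unsigned score;
    int tried;
};

/* On a full query timeout, penalise every server that was tried instead
 * of just resetting the candidate list. Returns how many were penalised. */
size_t penalise_tried(struct upstream *u, size_t count)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        if (u[i].tried) {
            u[i].score = NS_TIMEOUT_SCORE;
            u[i].tried = 0; /* ready for the next evaluation round */
            ++n;
        }
    }
    return n;
}
```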
Marek Vavrusa [Wed, 24 Feb 2016 06:40:17 +0000 (22:40 -0800)]
modules/graphite: support for Graphite/TCP
the graphite module now supports sending over TCP;
if the connection is severed, it will attempt to
reconnect periodically. the stats module is now
optional; if it's not loaded, only the core
built-in stats will be transmitted
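Graphite's plaintext protocol is one line per sample, "<path> <value> <unix timestamp>\n", so the same lines work over UDP or TCP. a sketch of formatting such a line plus a rate-limited reconnect check; the "kresd" prefix, metric name and 5 s reconnect interval are illustrative assumptions, not the module's settings.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Graphite plaintext line: "<prefix>.<metric> <value> <timestamp>\n".
 * Returns the line length, or -1 if it doesn't fit in buf. */
int graphite_format(char *buf, size_t size, const char *prefix,
                    const char *metric, double value, time_t ts)
{
    int n = snprintf(buf, size, "%s.%s %g %lld\n",
                     prefix, metric, value, (long long)ts);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Illustrative interval; after the TCP connection drops, retry at most
 * once per interval. */
#define GRAPHITE_RECONNECT_SECS 5

bool graphite_should_reconnect(bool connected, time_t last_attempt, time_t now)
{
    return !connected && now - last_attempt >= GRAPHITE_RECONNECT_SECS;
}
```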
Marek Vavrusa [Mon, 8 Feb 2016 01:36:48 +0000 (01:36 +0000)]
lib/validate: scrubbed extra rrs in NS were checked
the validator module should ignore any data that
will be scrubbed, which includes non-authoritative
data outside the current bailiwick. previously, the
validator attempted to ignore these records only
for the answer section and had a special case for
NS records.
cache: non-authoritative NS records are always
unchecked and must be treated as insecure
affected: www.iana.org, which tries to provide
delegation information for its CNAME target; this
is moot with the CNAME target explicit-fetch policy
unless the resolver already knows a DNSKEY with
which it could verify the records
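the bailiwick test behind "ignore what will be scrubbed" is a label-aware suffix match on owner names. a sketch, assuming absolute presentation-format names (trailing dot), ASCII case-insensitive comparison and no escaped dots; in_bailiwick() and select_for_validation() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Does name equal zone or lie below it? Matches whole labels only, so
 * "ianawww.org." is NOT inside "iana.org.". No escape handling. */
bool in_bailiwick(const char *name, const char *zone)
{
    size_t nl = strlen(name), zl = strlen(zone);
    if (strcmp(zone, ".") == 0)
        return true;
    if (nl < zl || strcasecmp(name + nl - zl, zone) != 0)
        return false;
    return nl == zl || name[nl - zl - 1] == '.';
}

/* Mark which records the validator should check (records outside the
 * bailiwick would be scrubbed anyway); returns how many are checked. */
size_t select_for_validation(const char *const *owners, size_t count,
                             const char *bailiwick, bool *check)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        check[i] = in_bailiwick(owners[i], bailiwick);
        n += check[i];
    }
    return n;
}
```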
Marek Vavrusa [Fri, 22 Jan 2016 23:59:40 +0000 (15:59 -0800)]
daemon/lua: rrset printing, new flags
this is a temporary change until luajit-kdns is
merged in with complete functionality;
it will break the API later and will require a
couple of changes in several modules and trust anchors
Marek Vavrusa [Fri, 22 Jan 2016 07:48:58 +0000 (23:48 -0800)]
scripts: kresd-query.lua (new)
this is boilerplate for a CLI utility that resolves
names and executes a script on the query response,
in other words, "a jq for resolver answers".
this is scaffolding for alternative tools like
'host' or a plug-in part for scripting around it.
it basically starts a kresd instance, but doesn't
bind to any interface or read configuration;
then a query + callback is sent to kresd's standard
input, and it quits after the execution
Marek Vavrusa [Fri, 22 Jan 2016 07:42:17 +0000 (23:42 -0800)]
lib/resolve: new flag ALWAYS_CUT
when raised, a response zone cut will be recovered
even if the response came from cache. this is
normally not needed (and incurs additional cache
lookups), but it may be useful for
inspection