Grigorii Demidov [Mon, 22 Oct 2018 08:48:51 +0000 (10:48 +0200)]
daemon: TCP connection timeout handling has changed; previously a connection was closed after the peer's inactivity, now it is closed after inactivity in both directions (peer->kresd, kresd->peer); this prevents the connection from closing before the answer is sent to the client
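A minimal sketch of the idea, assuming a single "last activity" timestamp that is refreshed on traffic in either direction. The struct and function names are hypothetical and are not kresd's actual session API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session record; not the real struct session. */
struct sketch_session {
	uint64_t last_activity_ms; /* last read from OR write to the peer */
	uint64_t idle_timeout_ms;  /* allowed inactivity before closing */
};

static inline void sketch_session_init(struct sketch_session *s,
				       uint64_t now_ms, uint64_t timeout_ms)
{
	assert(s != NULL);
	s->last_activity_ms = now_ms;
	s->idle_timeout_ms = timeout_ms;
}

/* Call on every read from the peer and on every write to the peer. */
static inline void sketch_session_touch(struct sketch_session *s, uint64_t now_ms)
{
	s->last_activity_ms = now_ms;
}

/* True once neither direction has seen traffic for the whole timeout. */
static inline bool sketch_session_expired(const struct sketch_session *s,
					  uint64_t now_ms)
{
	return now_ms - s->last_activity_ms >= s->idle_timeout_ms;
}
```

Because sending an answer also refreshes the timestamp, a slow upstream resolution followed by a reply does not count as inactivity on the connection.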
Vladimír Čunát [Wed, 10 Oct 2018 15:00:07 +0000 (17:00 +0200)]
cache: avoid some crashes
It's actually possible to start queries without any cache open,
e.g. add `resolve('.', kres.type.DNSKEY)` into your configuration.
If that happens, avoid the cache module dereferencing a NULL pointer.
Vladimír Čunát [Wed, 19 Sep 2018 14:57:24 +0000 (16:57 +0200)]
trie_it nitpick: reduce the initial stack size
Using 2kB as start is a bit too much, and it was showing as 0.5% CPU
on malloc() called during trie_it_begin(). Let's start at 0.5 kB,
as it can grow anyway (only negligible in profile now).
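As an illustration only, here is a generic growable stack that starts small and doubles on demand. It is not the trie_it code; the names and the initial capacity constant are assumptions chosen so that the first allocation is 0.5 kB with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical initial capacity: 64 pointers = 512 B on a 64-bit platform. */
#define SKETCH_STACK_INIT 64

struct sketch_stack {
	void **items;
	size_t len;
	size_t cap;
};

/* Allocate the small initial stack; returns 0 on success, -1 on ENOMEM. */
static inline int sketch_stack_init(struct sketch_stack *st)
{
	st->items = malloc(SKETCH_STACK_INIT * sizeof(*st->items));
	if (!st->items)
		return -1;
	st->len = 0;
	st->cap = SKETCH_STACK_INIT;
	return 0;
}

/* Push one item, doubling the capacity when the stack is full. */
static inline int sketch_stack_push(struct sketch_stack *st, void *item)
{
	if (st->len == st->cap) {
		size_t new_cap = st->cap * 2;
		void **grown = realloc(st->items, new_cap * sizeof(*grown));
		if (!grown)
			return -1; /* the old stack stays valid */
		st->items = grown;
		st->cap = new_cap;
	}
	st->items[st->len++] = item;
	return 0;
}

static inline void sketch_stack_free(struct sketch_stack *st)
{
	free(st->items);
	st->items = NULL;
	st->len = st->cap = 0;
}
```

With doubling, the cost of the occasional realloc() is amortized, so a smaller start mostly trades a cheaper first allocation for a few extra growth steps on deep iterations.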
Vladimír Čunát [Mon, 15 Oct 2018 13:08:38 +0000 (13:08 +0000)]
Merge !675: daemon: first part of refactoring
- mainly the daemon/session.* files are separated,
moving lots of logic from daemon/worker.*;
- lib/generic/queue.* are added;
- verbose logging gets different IDs;
- various minor changes around.
Vladimír Čunát [Thu, 4 Oct 2018 12:43:54 +0000 (14:43 +0200)]
daemon/session nitpick: avoid a warning
lint:clang-scan-build reported:
> warning: The code calls sizeof() on a pointer type.
> This can produce an unexpected result
but in our case it's intentional.
(Yes, using pointers as keys in trie isn't very pretty.)
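A small sketch of what "pointers as keys" can look like, and why sizeof() on a pointer type is the intended length there. The key struct and helper are hypothetical and only stand in for whatever byte-string key the trie takes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-string key, as a trie with binary keys might accept. */
struct sketch_key {
	const char *data;
	size_t len;
};

/* The key is the pointer VALUE itself, i.e. the bytes stored in *ptr_slot,
 * so its length really is sizeof(a pointer), not the size of the pointee. */
static inline struct sketch_key sketch_ptr_key(void *const *ptr_slot)
{
	struct sketch_key key = { (const char *)ptr_slot, sizeof(*ptr_slot) };
	return key;
}
```

Two slots holding the same address yield byte-identical keys, while different addresses yield different keys, which is all a set of pointers needs.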
Vladimír Čunát [Wed, 19 Sep 2018 18:50:29 +0000 (20:50 +0200)]
daemon nitpick cleanups
- Some (potentially) unused vars were left behind.
- The two on_* functions are identical except for the uv types passed,
and those are surely the same in the part we use, but it's not worth
deduplicating when there are only two such functions and they are so simple.
- lint:c was complaining about the uv_tcp_t malloc().
Vladimír Čunát [Wed, 19 Sep 2018 18:05:08 +0000 (20:05 +0200)]
daemon: drop RECVMMSG_BATCH
The support hasn't landed in libuv over all the years,
and we've still been reserving memory for it in advance.
Also comment on the singleton buffer usage.
Vladimír Čunát [Wed, 19 Sep 2018 17:39:26 +0000 (19:39 +0200)]
worker: safer code around the mempool freelist
I did NOT remove this one: in a quick profile, removing it would increase
the time spent in malloc by roughly 0.5%, so keeping it is possibly justifiable.
(And this one obstructs splitting the worker code much less.)
Vladimír Čunát [Wed, 19 Sep 2018 16:39:17 +0000 (18:39 +0200)]
worker: remove freelists for iohandle and iorequest
A quick profiling showed no change in performance,
and in particular no change in time spent in malloc/free.
Some of the types in the union differed in size by a multiple.
If malloc/free performance turns out to be unsatisfactory, replacements
(e.g. jemalloc) should be considered first, before rolling our own stuff.
daemon: logic around struct session was moved to a separate module; the input data buffering scheme (libuv) was changed; an attempt was made to simplify stream processing
Vladimír Čunát [Fri, 14 Sep 2018 08:21:43 +0000 (10:21 +0200)]
misc nitpicks
- \param family, esp. don't rely on AF_UNSPEC being zero
- kres_gnutls_vec_push(): don't uv_write() if ENOMEM
- tls_client_params_clear(): remove unused function
Marek Vavruša [Fri, 17 Aug 2018 07:43:36 +0000 (00:43 -0700)]
daemon/worker: fixes error handling from TLS writes
The error handling loop for uncorking TLS data was wrong, as the
underlying push function is asynchronous and there's no relationship
between completed DNS packet writes and number of TLS message writes.
With an asynchronous function, the buffered data must stay valid
until the write completes. Currently this is not guaranteed, and
loading the resolver with pipelined requests results in memory errors:
```
$ getdns_query @127.0.0.1#853 -s -a -s -l L -B -F queries -q
...
==47111==ERROR: AddressSanitizer: heap-use-after-free on address 0x6290040a1253 at pc 0x00010da960d3 bp 0x7ffee2628b30 sp 0x7ffee26282e0
READ of size 499 at 0x6290040a1253 thread T0
#0 0x10da960d2 in wrap_write (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x1f0d2)
#1 0x10d855971 in uv__write (libuv.1.dylib:x86_64+0xf971)
#2 0x10d85422e in uv__stream_io (libuv.1.dylib:x86_64+0xe22e)
#3 0x10d85b35a in uv__io_poll (libuv.1.dylib:x86_64+0x1535a)
#4 0x10d84c644 in uv_run (libuv.1.dylib:x86_64+0x6644)
#5 0x10d602ddf in main main.c:422
#6 0x7fff6a28a014 in start (libdyld.dylib:x86_64+0x1014)
0x6290040a1253 is located 83 bytes inside of 16895-byte region [0x6290040a1200,0x6290040a53ff)
freed by thread T0 here:
#0 0x10dacdfdd in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x56fdd)
#1 0x10d913c2e in _mbuffer_head_remove_bytes (libgnutls.30.dylib:x86_64+0xbc2e)
#2 0x10d915080 in _gnutls_io_write_flush (libgnutls.30.dylib:x86_64+0xd080)
#3 0x10d90ca18 in _gnutls_send_tlen_int (libgnutls.30.dylib:x86_64+0x4a18)
#4 0x10d90edde in gnutls_record_send2 (libgnutls.30.dylib:x86_64+0x6dde)
#5 0x10d90f085 in gnutls_record_uncork (libgnutls.30.dylib:x86_64+0x7085)
#6 0x10d5f6569 in tls_push tls.c:238
#7 0x10d5e5b2a in qr_task_send worker.c:1002
#8 0x10d5e2ea6 in qr_task_finalize worker.c:1562
#9 0x10d5dab99 in qr_task_step worker.c
#10 0x10d5e12fe in worker_process_tcp worker.c:2410
```
The current implementation adds an opportunistic uv_try_write, which
either writes the requested data or returns UV_EAGAIN or an error;
the code then falls back to a slower asynchronous write that copies the buffered data.
The function signature is changed from a simple write to a vectorized write.
This also enables TLS False Start to save one round trip (1 RTT) when possible.
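The try-first, copy-on-fallback pattern can be sketched without libuv as follows. Everything here is hypothetical: sketch_try_write_fn stands in for a non-blocking writer such as uv_try_write, sketch_buf for a buffer vector element, and the stub writer exists only to exercise the code. The sketch also assumes a partial write is possible and copies only the unwritten tail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf { const char *base; size_t len; };

/* Returns bytes written (>= 0), -EAGAIN if nothing can be written now,
 * or another negative errno value on a hard error. */
typedef int (*sketch_try_write_fn)(void *ctx, const struct sketch_buf *bufs, int nbufs);

/* Heap copy of the bytes that still need an asynchronous write. */
struct sketch_pending { char *data; size_t len; };

static inline int sketch_push(sketch_try_write_fn try_write, void *ctx,
			      const struct sketch_buf *bufs, int nbufs,
			      struct sketch_pending *pending)
{
	size_t total = 0;
	pending->data = NULL;
	pending->len = 0;
	for (int i = 0; i < nbufs; ++i)
		total += bufs[i].len;

	/* Fast path: try to write straight from the caller's buffers. */
	int ret = try_write(ctx, bufs, nbufs);
	if (ret < 0 && ret != -EAGAIN)
		return ret;
	size_t written = ret > 0 ? (size_t)ret : 0;
	if (written >= total)
		return 0;

	/* Slow path: copy the unwritten tail, so the caller's buffers may be
	 * freed or reused before the asynchronous write completes. */
	size_t rest = total - written, skip = written, off = 0;
	char *copy = malloc(rest);
	if (!copy)
		return -ENOMEM;
	for (int i = 0; i < nbufs; ++i) {
		if (skip >= bufs[i].len) {
			skip -= bufs[i].len;
			continue;
		}
		memcpy(copy + off, bufs[i].base + skip, bufs[i].len - skip);
		off += bufs[i].len - skip;
		skip = 0;
	}
	pending->data = copy; /* owned by the async write; free on completion */
	pending->len = rest;
	return 0;
}

/* Demonstration stub: accepts at most *budget bytes, then reports -EAGAIN. */
static inline int sketch_stub_writer(void *ctx, const struct sketch_buf *bufs, int nbufs)
{
	size_t *budget = ctx, total = 0;
	if (*budget == 0)
		return -EAGAIN;
	for (int i = 0; i < nbufs; ++i)
		total += bufs[i].len;
	size_t n = total < *budget ? total : *budget;
	*budget -= n;
	return (int)n;
}
```

The copy is what removes the use-after-free shown in the trace: the TLS library may release its own buffer right after the push callback returns, while the pending copy lives until the asynchronous write finishes.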
Vladimír Čunát [Wed, 12 Sep 2018 12:59:46 +0000 (14:59 +0200)]
cache: improve out-of-disk condition
When a suspicious SIGBUS happens, print a helpful error message and try to remove
the cache, so that the service might work again if auto-restarted.
Theoretically we could longjmp() out of the SIGBUS handler,
but that would be rather messy, so let the process die.
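A rough sketch of such a handler, restricted to async-signal-safe calls (write, unlink, _exit). The function names, the message text, and the assumption that the cache is a single file at a known path are all hypothetical; the actual handler may remove the cache differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: path of the cache data file, set before any SIGBUS can occur. */
static char sketch_cache_path[256];

static void sketch_sigbus_handler(int signum)
{
	static const char msg[] =
		"SIGBUS received, possibly out of disk space for the cache; "
		"removing the cache file\n";
	(void)signum;
	if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
		/* Nothing sensible can be done about a failed write here. */
	}
	if (sketch_cache_path[0] != '\0')
		unlink(sketch_cache_path);
	_exit(1); /* no longjmp(): let the process die and be restarted */
}

/* Remember the path and install the handler; returns 0 on success. */
static inline int sketch_install_sigbus_handler(const char *cache_path)
{
	size_t len = strlen(cache_path);
	if (len >= sizeof(sketch_cache_path))
		return -1;
	memcpy(sketch_cache_path, cache_path, len + 1);

	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_sigbus_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Exiting from the handler leaves recovery to the service manager: after an automatic restart the resolver would start with an empty cache instead of faulting again on the same mapped pages.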
Petr Špaček [Thu, 23 Aug 2018 08:16:50 +0000 (10:16 +0200)]
ci: update Deckard in an attempt to make CI more reliable
Changes related to monotonic fake time and detection logic for overload
should make CI a little bit more reliable. It should be even better once
we combine overload-detection with some kind of auto-retry.