Daniel Mack [Tue, 22 Dec 2015 10:37:09 +0000 (11:37 +0100)]
core: re-sync bus name list after deserializing during daemon-reload
When the daemon reloads, it doesn not actually give up its DBus connection,
as wrongly stated in an earlier commit. However, even though the bus
connection stays open, the daemon flushes out all its internal state.
Hence, if there is a NameOwnerChanged signal after the flush and before the
deserialization, it cannot be matched against any pending unit.
To fix this, rename bus_list_names() to manager_sync_bus_names() and call
it explicitly at the end of the daemon reload operation.
Daniel Mack [Fri, 18 Dec 2015 16:28:15 +0000 (17:28 +0100)]
core: fix bus name synchronization after daemon-reload
During daemon-reload, PID1 temporarly loses its DBus connection, so there's
a small window in which all signals sent by dbus-daemon are lost.
This is a problem, since we rely on the NameOwnerChanged signals in order to
consider a service with Type=dbus fully started or terminated, respectively.
In order to fix this, a rewrite of bus_list_names() is necessary. We used
to walk the current list of names on the bus, and blindly triggered the
bus_name_owner_change() callback on each service, providing the actual name
as current owner. This implementation has a number of problems:
* We cannot detect if the the name was moved from one owner to the other
while we were reloading
* We don't notify services which missed the name loss signal
* Providing the actual name as current owner is a hack, as the comment also
admits.
To fix this, this patch carries the following changes:
* Track the name of the current bus name owner, and (de-)serialize it
during reload. This way, we can detect changes.
* In bus_list_names(), walk the list of bus names we're interested in
first, and then see if the name is active on the bus. If it is,
check it it's still the same as it used to be, and synthesize
NameOwnerChanged signals for the name add and/or loss.
This should fully synchronize the current name list with the internal
state of all services.
Michael Scherer [Sun, 20 Dec 2015 12:23:33 +0000 (13:23 +0100)]
Add Seal option in the configuration file for journald-remote
While journal received remotely can be sealed, it can only be done
on the command line using --seal, so for consistency, we will
also permit to set it in the configuration file.
resolved: propagate DNSSEC validation status from auxiliary transactions
Let's make sure we propagate the DNSSEC validation status from an
auxiliary DNSSEC transaction back to the originating transaction, to
improve the error messages we generate.
We have many types of failure for a transaction, and
DNS_TRANSACTION_FAILURE was just one specific one of them, if the server
responded with a non-zero RCODE. Hence let's rename this, to indicate
which kind of failure this actually refers to.
This adds a new DnsAnswer item flag "DNS_ANSWER_SHARED_OWNER" which is
set for mDNS RRs that lack the cache-flush bit. The cache-flush bit is
removed from the DnsResourceRecord object in favour of this.
This also splits out the code that removes previous entries when adding
new positive ones into a new separate call dns_cache_remove_previous().
Let's use dns_cache_remove() rather than
dns_cache_item_remove_and_free() to destroy the cache, since the former
requires far fewer hash table lookups.
resolved: when receiving a TTL=0 RR, only flush that specific RR
When we receieve a TTL=0 RR, then let's only flush that specific RR and
not the whole RRset.
On mDNS with RRsets that a shared-owner this is how specific RRs are
removed from the set, hence support this. And on non-mDNS the whole
RRset will already be removed much earlier in dns_cache_put() hence
there's no reason remove it again.
resolved: move DNS class utilities to dns-type.c and add more helpers
Let's make DNS class helpers more like DNS type helpers, let's move them
from resolved-dns-rr.[ch] into dns-type.[ch].
This also adds two new calls dns_class_is_pseudo() and
dns_class_is_valid_rr() which operate similar to dns_type_is_pseudo()
and dns_type_is_valid_rr() but for classes instead of types.
This should hopefully make handling of DNS classes and DNS types more
alike.
resolved: add support NSEC3 proofs, as well as proofs for domains that are OK to be unsigned
This large patch adds a couple of mechanisms to ensure we get NSEC3 and
proof-of-unsigned support into place. Specifically:
- Each item in an DnsAnswer gets two bit flags now:
DNS_ANSWER_AUTHENTICATED and DNS_ANSWER_CACHEABLE. The former is
necessary since DNS responses might contain signed as well as unsigned
RRsets in one, and we need to remember which ones are signed and which
ones aren't. The latter is necessary, since not we need to keep track
which RRsets may be cached and which ones may not be, even while
manipulating DnsAnswer objects.
- The .n_answer_cachable of DnsTransaction is dropped now (it used to
store how many of the first DnsAnswer entries are cachable), and
replaced by the DNS_ANSWER_CACHABLE flag instead.
- NSEC3 proofs are implemented now (lacking support for the wildcard
part, to be added in a later commit).
- Support for the "AD" bit has been dropped. It's unsafe, and now that
we have end-to-end authentication we don't need it anymore.
- An auxiliary DnsTransaction of a DnsTransactions is now kept around as
least as long as the latter stays around. We no longer remove the
auxiliary DnsTransaction as soon as it completed. THis is necessary,
as we now are interested not only in the RRsets it acquired but also
in its authentication status.
resolved: make sure we don't get confused when notifying transactions while they are destroyed
A failing transaction might cause other transactions to fail too, and
thus the set of transactions to notify for a transaction might change
while we are notifying them. Protect against that.
resolved: cache stringified transaction key once per transaction
We end up needing the stringified transaction key in many log messages,
hence let's simplify the logic and cache it inside of the transaction:
generate it the first time we need it, and reuse it afterwards. Free it
when the transaction goes away.
This also updated a couple of log messages to make use of this.
This was the case that caused various problems that were fixed in
preceding patches, so it is good to add a test that uses it directly.
In "may_fail" test cases try again with a bigger buffer.
Instead of allocating various buffers on the stack, malloc them.
This is more reliable in case of big buffers, and allows tools like
valgrind and address sanitizer to find overflows more easily.
journal: add "xfail" test for partial lz4 decompression
Add a test that LZ4_decompress_safe_partial does (not) work as
expected, so that if it starts to work at some point, we'll catch
this and adjust our code.
journal: fix reporting of output size in compres_stream_lz4
The header is 7 bytes, and this size was not accounted for in
total_out. This means that we could create a file that was 7 bytes
longer than requested, and the debug output was also inconsistent.
journal: add dst_allocated_size parameter for compress_blob
compress_blob took src, src_size, dst and *dst_size, but dst_size
wasn't used as an input parameter with the size of dst, but only as an
output parameter. dst was implicitly assumed to be at least src_size-1.
This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as input buffer. So add a seperate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow test to proceed with different output buffer
sizes.
journal: in some cases we have to decompress the full lz4 field
lz4 has to decompress a whole "sequence" at a time. When the compressed
data is composed of a repeating pattern, the whole set of repeats has
do be docompressed, and the output buffer has to be big enough.
This is unfortunate, because potentially the slowdown is very big. We
are only interested in the field name, but we might have to decompress
the whole thing. But the full cost will be borne out only when the
full entry is a repeating pattern. In practice this shouldn't happen
(apart from tests and the like). Hopefully lz4 will be fixed to avoid
this problem, or it will grow a new function which we can use [1], so
this fix should be remporary.
journal: decompress_startswith can return an error
The return value was used directly in an if, so an error was treated
as success; we need to bail out instead. An error should not happen,
unless we have a compression/decompression mismatch, so output a debug
line.
journal: properly handle an unexpectedly missing field
parse_field() checks if the field has the expected format, and returns
0 if it doesn't. In that case, value and size are not
set. Nevertheless, we would try to continue, and hit an assert in
safe_atou64. This case shouldn't happen, unless sd_j_get_data is borked,
so cleanly assert that we got the expected field.
Also, oom is the only way that parse_field can fail, which we log
already. Instead of outputting a debug statement and carrying on,
treat oom as fatal.
Output the same message when a request to change the log level is
received over dbus and through a signal. From the user point of view
those two operations are very similar and it's easy to think that the
dbus operation didn't work when the expected message is not emitted.
Also "downgrade" the message level to info, since this is a normal
user initiated action.
manager: move status output change debug messages to set function
This way we can only print the debug message when the status actually
changes. We also means we don't print anything when running in --user
mode, where status output is always disabled.
This changes answer validation to be more accepting to unordered RRs in
responses. The agorithm we now implement goes something like this:
1. populate validated keys list for this transaction from DS RRs
2. as long as the following changes the unvalidated answer list:
2a. try to validate the first RRset we find in unvalidated answer
list
2b. if that worked: add to validated answer; if DNSKEY also add to
validated keys list; remove from unvalidated answer.
2c. continue at 2a, with the next RRset, or restart from the
beginning when we hit the end
3. as long as the following changes the unvalidated answer list:
3a. try to validate the first RRset again. This will necessarily
fail, but we learn the precise error
3b. If this was a "primary" response to the question, fail the
entire transaction. "Primary" in this context means that it is
directly a response to the query, or a CNAME/DNAME for it.
3c. Otherwise, remove the RRset from the unvalidated answer list.
Note that we the too loops in 2 + 3 are actually coded as a single one,
but the dnskeys_finalized bool indicates which loop we are currently
processing.
Note that loop 2 does not drop any invalidated RRsets yet, that's
something only loop 3 does. This is because loop 2 might still encounter
additional DNSKEYS which might validate more stuff, and if we'd already
have dropped those RRsets we couldn't validate those anymore. The first
loop is hence a "constructive" loop, the second loop a "destructive"
one: the first one validates whatever is possible, the second one then
deletes whatever still isn't.
This adds a new validation result DNSSEC_UNSUPPORTED_ALGORITHM which is
returned when we encounter an unsupported crypto algorithm when trying
to validate RRSIG/DNSKEY combinations. Previously we'd return ENOTSUPP
in this case, but it's better to consider this a non-error DNSSEC
validation result, since our reaction to this case needs to be more like
in cases such as expired or missing keys: we need to keep continue
validation looking for another RRSIG/DNSKEY combination that works
better for us.
This also reworks how dnssec_validate_rrsig_search() propagates errors
from dnssec_validate_rrsig(). Previously, errors such as unsupported
algorithms or expired signatures would not be propagated, but simply be
returned as "missing-key".
resolved: rework how and when the number of answer RRs to cache is determined
Instead of figuring out how many RRs to cache right before we do so,
determine this at the time we install the answer RRs, so that we can
still alter this as we manipulate the answer during validation.
The primary purpose of this is to pave the way so that we can drop
unsigned RRsets from the answer and invalidate the number of RRs to
cache at the same time.