resolved: when we receive an reply which is OPT-less or RRSIG-less, downgrade what we verified
If we receive a reply that lacks the OPT RR, then this is reason to downgrade what was verified before, as it's
apparently no longer true, and the previous OPT RR we saw was only superficially OK.
Similar, if we realize that RRSIGs are not augmented, then also downgrade the feature level that was verified, as
DNSSEC is after all not supported. This check is in particular necessary, as we might notice the fact that RRSIG is not
augmented only very late, when verifying the root domain.
Also, when verifying a successful response, actually take in consideration that it might have been reported already
that RRSIG or OPT are missing in the response.
resolved: downgrade server feature level more aggressively when we have reason to
This adds logic to downgrade the feature level more aggressively when we have reason to. Specifically:
- When we get a response packet that lacks an OPT RR for a query that had it. If so, downgrade immediately to UDP mode,
i.e. don't generate EDNS0 packets anymore.
- When we get a response which we are sure should be signed, but lacks RRSIG RRs, we downgrade to EDNS0 mode, i.e.
below DO mode, since DO is apparently not really supported.
This should increase compatibility with servers that generate non-sensical responses if they messages with OPT RRs and
suchlike, for example the situation described here:
resolved: ignore invalid OPT RRs in incoming packets
This validates OPT RRs more rigorously, before honouring them: if we any of the following condition holds, we'll ignore
them:
a) Multiple OPT RRs in the same message
b) OPT RR not owned by the root domain
c) OPT RR in the wrong section (Belkin routers do this)
d) OPT RR contain rfc6975 algorithm data (Belkin routers do this)
e) OPT version is not 0
f) OPT payload doesn't add up with the lengths
Note that d) may be an indication that the server just blindly copied OPT data from the response into the reply.
RFC6975 data is only supposed to be included in queries, and we do so. It's not supposed to be included in responses
(and the RFC is very clear on that). Hence if we get it back in a reply, then the server probably just copied the OPT
RR.
This new test case tries to resolve a couple of known domains, to verify the validation results. It talks to resolved
via the bus, thus comprehensively testing the whole shebang.
Of course, it requires network connectivity and a DNSSEC capable DNS server, hence this is a manual test.
- When checking if a domain is non-existing, also check that no wildcard for it exists
- Ensure we don't base "covering" tests on NSEC RRs from a parent zone
- Refuse to accept expanded wildcard NSEC RRs for absence proofs.
resolved: on negative NODATA replies, properly deal with empty non-terminals
empty non-terminals generally lack NSEC RRs, which means we can deduce their existance only from the fact that there
are other RRs that contain them in their suffix. Specifically, the NSEC proof for NODATA on ENTs works by sending the
NSEC whose next name is a suffix of the queried name to the client. Use this information properly.
resolved: when validating an RRset, store information about the synthesizing source and zone in each RR
Having this information available is useful when we need to check whether various RRs are suitable for proofs. This
information is stored in the RRs as number of labels to skip from the beginning of the owner name to reach the
synthesizing source/signer. Simple accessor calls are then added to retrieve the signer/source from the RR using this
information.
This also moves validation of a a number of RRSIG parameters into a new call dnssec_rrsig_prepare() that as side-effect
initializes the two numeric values.
core: fix memory leak on set-default, enable, disable etc
Fixes:
==1== by 0x23E44C: remove_marked_symlinks_fd (install.c:453)
==1== by 0x23E256: remove_marked_symlinks_fd (install.c:405)
==1== by 0x23E630: remove_marked_symlinks (install.c:494)
==1== by 0x2427A0: unit_file_disable (install.c:1876)
==1== by 0x18A633: method_disable_unit_files_generic (dbus-manager.c:1760)
==1== by 0x18A6CA: method_disable_unit_files (dbus-manager.c:1768)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1== by 0x1DB01A: bus_process_object (bus-objects.c:1373)
==1==
==1== 228 (48 direct, 180 indirect) bytes in 2 blocks are definitely lost in loss record 8 of 14
==1== at 0x4C2BBCF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x4C2DE2F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x23DA60: unit_file_changes_add (install.c:233)
==1== by 0x23DDB2: create_symlink (install.c:298)
==1== by 0x240C5C: install_info_symlink_wants (install.c:1328)
==1== by 0x240FC8: install_info_apply (install.c:1384)
==1== by 0x241211: install_context_apply (install.c:1439)
==1== by 0x242563: unit_file_enable (install.c:1830)
==1== by 0x18A06E: method_enable_unit_files_generic (dbus-manager.c:1650)
==1== by 0x18A141: method_enable_unit_files (dbus-manager.c:1660)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1==
==1== 467 (144 direct, 323 indirect) bytes in 3 blocks are definitely lost in loss record 9 of 14
==1== at 0x4C2DD9F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x23DA60: unit_file_changes_add (install.c:233)
==1== by 0x23DE97: create_symlink (install.c:320)
==1== by 0x242CFC: unit_file_set_default (install.c:1951)
==1== by 0x18A881: method_set_default_target (dbus-manager.c:1802)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1== by 0x1DB01A: bus_process_object (bus-objects.c:1373)
==1== by 0x259143: process_message (sd-bus.c:2567)
==1== by 0x259326: process_running (sd-bus.c:2609)
==1== by 0x259BDC: bus_process_internal (sd-bus.c:2798)
==1== by 0x259CAD: sd_bus_process (sd-bus.c:2817)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 216 bytes in 6 blocks
==1== indirectly lost: 560 bytes in 14 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 65,536 bytes in 5 blocks
==1== suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1==
Fixes:
==1== HEAP SUMMARY:
==1== in use at exit: 67,182 bytes in 91 blocks
==1== total heap usage: 70,485 allocs, 70,394 frees, 42,184,635 bytes
allocated
==1==
==1== 5,742 (696 direct, 5,046 indirect) bytes in 29 blocks are
definitely lost in loss record 4 of 7
==1== at 0x4C2DD9F: realloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x21ADDD: realloc_multiply (alloc-util.h:67)
==1== by 0x21BFB0: strv_push (strv.c:448)
==1== by 0x21C245: strv_consume (strv.c:520)
==1== by 0x21C33C: strv_extend (strv.c:559)
==1== by 0x278AD7: unit_write_drop_in (unit.c:3352)
==1== by 0x278EEB: unit_write_drop_in_private (unit.c:3403)
==1== by 0x190C21: bus_service_set_transient_property
(dbus-service.c:254)
==1== by 0x190DBC: bus_service_set_property (dbus-service.c:284)
==1== by 0x18F00E: bus_unit_set_properties (dbus-unit.c:1226)
==1== by 0x186F6A: transient_unit_from_message (dbus-manager.c:683)
==1== by 0x1872B7: method_start_transient_unit (dbus-manager.c:763)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 696 bytes in 29 blocks
==1== indirectly lost: 5,046 bytes in 58 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 61,440 bytes in 4 blocks
==1== suppressed: 0 bytes in 0 blocks
Franck Bui [Thu, 14 Jan 2016 08:25:18 +0000 (09:25 +0100)]
transaction: downgrade warnings about wanted unit which are not found
If a unit was pulled by a Wants= dependency but its unit file was not
present then we logged this as an error.
However Wants= might be used to configure a soft/optional dependency
on another unit, ie. start an optional service only if it's installed
otherwise simply skip it. In this case emitting an error doesn't look
appropriate.
But it's still an error if the optional dependency exists but its
activation fails for any reasons.
==1== HEAP SUMMARY:
==1== in use at exit: 61,728 bytes in 22 blocks
==1== total heap usage: 258,122 allocs, 258,100 frees, 78,219,628
bytes allocated
==1==
==1== 16 bytes in 1 blocks are definitely lost in loss record 1 of 6
==1== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
==1== by 0x1E350E: memdup (alloc-util.c:34)
==1== by 0x135AFB: memdup_multiply (alloc-util.h:74)
==1== by 0x140F97: manager_set_default_rlimits (manager.c:2929)
==1== by 0x1303DA: manager_set_defaults (main.c:737)
==1== by 0x133A02: main (main.c:1718)
==1==
==1== 272 bytes in 17 blocks are definitely lost in loss record 2 of 6
==1== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
==1== by 0x1E350E: memdup (alloc-util.c:34)
==1== by 0x135AFB: memdup_multiply (alloc-util.h:74)
==1== by 0x140F97: manager_set_default_rlimits (manager.c:2929)
==1== by 0x1303DA: manager_set_defaults (main.c:737)
==1== by 0x13480D: main (main.c:1828)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 288 bytes in 18 blocks
==1== indirectly lost: 0 bytes in 0 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 61,440 bytes in 4 blocks
==1== suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not
shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
WaLyong Cho [Tue, 29 Dec 2015 05:15:04 +0000 (14:15 +0900)]
bus-util: print "systemctl --user" on user service manager
When a unit was started with "systemctl --user" and it failed, error
messages is printed as "systemctl status". But it should be "systemctl
--user status".
In contrast to ascii_strcasecmp_nn() it takes two character buffers with their individual length. It will then compare
the buffers up the smaller size of the two buffers, and finally the length themselves.
resolved: properly handles RRs in domains beginning in an asterisk label
Properly handle RRs that begin with an asterisk label. These are the unexpanded forms of wildcard domains and appear in
NSEC RRs for example. We need to make sure we handle the signatures of these RRs properly, since they mostly are
considered normal RRs, except that the RRSIG labels counter is one off for them, as the asterisk label is always
excluded of the signature.
Vito Caputo [Mon, 7 Dec 2015 19:28:18 +0000 (11:28 -0800)]
sd-event: instrument sd_event_run() for profiling delays
Set SD_EVENT_PROFILE_DELAYS to activate accounting and periodic logging
of the distribution of delays between sd_event_run() calls.
Time spent in dispatching as well as time spent outside of
sd_event_run() is measured and accounted for. Every 5 seconds a
logarithmic histogram loop iteration delays since 5 seconds previous is
logged.
This is useful in identifying the frequency and magnitude of latencies
affecting the event loop, which should be kept to a minimum.
Ismo Puustinen [Thu, 31 Dec 2015 12:54:44 +0000 (14:54 +0200)]
capabilities: added support for ambient capabilities.
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.
You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.
An example system service file might look like this:
Ismo Puustinen [Thu, 7 Jan 2016 22:00:04 +0000 (00:00 +0200)]
capabilities: keep bounding set in non-inverted format.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
Fix miscalculated buffer size and uses of size-unlimited sprintf()
function.
Not sure if this results in an exploitable buffer overflow, probably not
since the the int value is likely sanitized somewhere earlier and it's
being put through a bit mask shortly before being used.
resolved: refuse doing queries for known-obsolete RR types
Given how fragile DNS servers are with some DNS types, and given that we really should avoid confusing them with
known-weird lookups, refuse doing lookups for known-obsolete RR types.
resolved: rework how and when we detect whether our chosen DNS server knows DNSSEC
Move detection into a set of new functions, that check whether one specific server can do DNSSEC, whether a server and
a specific transaction can do DNSSEC, or whether a transaction and all its auxiliary transactions could do so.
Also, do these checks both before we acquire additional RRs for the validation (so that we can skip them if the server
doesn't do DNSSEC anyway), and after we acquired them all (to see if any of the lookups changed our opinion about the
servers).
THis also tightens the checks a bit: a server that lacks TCP support is considered incompatible with DNSSEC too.
This changes the DnsServer logic to count failed UDP and TCP failures separately. This is useful so that we don't end
up downgrading the feature level from one UDP level to a lower UDP level just because a TCP connection we did because
of a TC response failed.
This also adds accounting of truncated packets. If we detect incoming truncated packets, and count too many failed TCP
connections (which is the normal fall back if we get a trucnated UDP packet) we downgrade the feature level, given that
the responses at the current levels don't get through, and we somehow need to make sure they become smaller, which they
will do if we don't request DNSSEC or EDNS support.
This makes resolved work much better with crappy DNS servers that do not implement TCP and only limited UDP packet
sizes, but otherwise support DNSSEC RRs. They end up choking on the generally larger DNSSEC RRs and there's no way to
retrieve the full data.
resolved: don't attempt to send queries for DNSSEC RR types to servers not supporting them
If we already degraded the feature level below DO don't bother with sending requests for DS, DNSKEY, RRSIG, NSEC, NSEC3
or NSEC3PARAM RRs. After all, we cannot do DNSSEC validation then anyway, and we better not press a legacy server like
this with such modern concepts.
This also has the benefit that when we try to validate a response we received using DNSSEC, and we detect a limited
server support level while doing so, all further auxiliary DNSSEC queries will fail right-away.
resolved: when we get a packet failure from a server, don't downgrade UDP to TCP or vice versa
Under the assumption that packet failures (i.e. FORMERR, SERVFAIL, NOTIMP) are caused by packet contents, not used
transport, we shouldn't switch between UDP and TCP when we get them, but only downgrade the higher levels down to UDP.
resolved: properly handle UDP ICMP errors as lost packets
UDP ICMP errors are reported to us via recvmsg() when we read a reply. Handle this properly, and consider this a lost
packet, and retry the connection.
This also adds some additional logging for invalid incoming packets.
resolved: when we get a TCP connection failure, try again
Previously, when we couldn't connect to a DNS server via TCP we'd abort the whole transaction using a
"connection-failure" state. This change removes that, and counts failed connections as "lost packet" events, so that
we switch back to the UDP protocol again.
resolved: when DNS/TCP doesn't work, try DNS/UDP again
If we failed to contact a DNS server via TCP, bump of the feature level to UDP again. This way we'll switch back
between UDP and TCP if we fail to contact a host.
Generally, we prefer UDP over TCP, which is why UDP is a higher feature level. But some servers only support UDP but
not TCP hence when reaching the lowest feature level of TCP and want to downgrade from there, pick UDP again. We this
keep downgrading until we reach TCP and then we cycle through UDP and TCP.
Let's be a bit more precise with the editor configuration and specify a higher fill column of 119. This isn't as emacs'
default of 70, but also not particularly high on today's screens.
While we are at it, also set a couple of other emacs C coding style variables.
resolved: properly look for NSEC/NSEC3 RRs when getting a positive wildcard response
This implements RFC 5155, Section 8.8 and RFC 4035, Section 5.3.4:
When we receive a response with an RRset generated from a wildcard we
need to look for one NSEC/NSEC3 RR that proves that there's no explicit RR
around before we accept the wildcard RRset as response.
This patch does a couple of things: the validation calls will now
identify wildcard signatures for us, and let us know the RRSIG used (so
that the RRSIG's signer field let's us know what the wildcard was that
generate the entry). Moreover, when iterating trough the RRsets of a
response we now employ three phases instead of just two.
a) in the first phase we only look for DNSKEYs RRs
b) in the second phase we only look for NSEC RRs
c) in the third phase we look for all kinds of RRs
Phase a) is necessary, since DNSKEYs "unlock" more signatures for us,
hence we shouldn't assume a key is missing until all DNSKEY RRs have
been processed.
Phase b) is necessary since NSECs need to be validated before we can
validate wildcard RRs due to the logic explained above.
Phase c) validates everything else. This phase also handles RRsets that
cannot be fully validated and removes them or lets the transaction fail.
resolved: split up nsec3_hashed_domain() into two calls
There's now nsec3_hashed_domain_format() and nsec3_hashed_domain_make().
The former takes a hash value and formats it as domain, the latter takes
a domain name, hashes it and then invokes nsec3_hashed_domain_format().
This way we can reuse more code, as the formatting logic can be unified
between this call and another place.
resolved: when validating, first strip revoked trust anchor keys from validated keys list
When validating a transaction we initially collect DNSKEY, DS, SOA RRs
in the "validated_keys" list, that we need for the proofs. This includes
DNSKEY and DS data from our trust anchor database. Quite possibly we
learn that some of these DNSKEY/DS RRs have been revoked between the
time we request and collect those additional RRs and we begin the
validation step. In this case we need to make sure that the respective
DS/DNSKEY RRs are removed again from our list. This patch adds that, and
strips known revoked trust anchor RRs from the validated list before we
begin the actual validation proof, and each time we add more DNSKEY
material to it while we are doing the proof.
Instead of first iterating through all DNSKEYs in the DnsAnswer in
dns_transaction_check_revoked_trust_anchors(), and
then doing that a second time in dns_trust_anchor_check_revoked(), do so
only once in the former, and pass the dnskey we found directly to the
latter.