tree-wide: use assert_se() for signal operations with constants
Continuation of a3ebe5eb620e49f0d24082876cafc7579261e64f:
in other places we sometimes use assert_se(), and sometimes normal error
handling. sigfillset and sigaddset can only fail if mask is NULL (which cannot
happen if we are passing in a reference), or if the signal number is invalid
(which really shouldn't happen when we are using a constant like SIGCHLD. If
SIGCHLD is invalid, we have a bigger problem). So let's simplify things and
always use assert_se() in those cases.
In sigset_add_many() we could conceivably pass an invalid signal, so let's keep
normal error handling here. The caller can do assert_se() around the
sigprocmask_many() call if appropriate.
'>= 0' is used for consistency with the rest of the codebase.
The kernel explicitly supports resuming with a different kernel than the one
used before hibernation. If this is something that shouldn't be supported, the
place to change this is in the kernel. We shouldn't censor something that this
exclusively in the kernel's domain.
People might be using this to switch kernels without restaring programs, and
we'd break this functionality for them.
Also, even if resuming with a different kernel was a bad idea, we don't really
prevent that with this check, since most users have more than one kernel and
can freely pick a different one from the menu. So this only affected the corner
case where the kernel has been removed, but there is no reason to single it
out.
tree-wide: make new/new0/malloc_multiply/reallocarray safe for size 0
All underlying glibc calls are free to return NULL if the size argument
is 0. We most often call those functions with a fixed argument, or at least
something which obviously cannot be zero, but it's too easy to forget.
E.g. coverity complains about "rows = new0(JsonVariant*, n_rows-1);" in
format-table.c There is an assert that n_rows > 0, so we could hit this
corner case here. Let's simplify callers and make those functions "safe".
The compiler is mostly able to optimize this away:
$ size build{,-opt}/src/shared/libsystemd-shared-239.so
(before)
text data bss dec hex filename 2643329 580940 3112 3227381 313ef5 build/src/shared/libsystemd-shared-239.so (-O0 -g) 2170013 578588 3089 2751690 29fcca build-opt/src/shared/libsystemd-shared-239.so (-03 -flto -g)
(after)
text data bss dec hex filename 2644017 580940 3112 3228069 3141a5 build/src/shared/libsystemd-shared-239.so 2170765 578588 3057 2752410 29ff9a build-opt/src/shared/libsystemd-shared-239.so
Chris Down [Wed, 19 Dec 2018 03:33:53 +0000 (03:33 +0000)]
cgroup: Imply systemd.unified_cgroup_hierarchy=1 on cgroup_no_v1=all
cgroup_no_v1=all doesn't make a whole lot of sense with legacy hierarchy
(where we use v1 hierarchy for everything), or hybrid hierarchy (where
we still use v1 hierarchy for resource control).
Right now we have to tell people to add both cgroup_no_v1=all and
systemd.unified_cgroup_hierarchy=1 to get the desired behaviour,
however in reality it's hard to imagine any situation where someone
passes cgroup_no_v1=all but *doesn't* want to use the unified cgroup
hierarchy.
Make it so that cgroup_no_v1=all produces intuitive behaviour in systemd
by default, although it can still be disabled by passing
systemd.unified_cgroup_hierarchy=0 explicitly.
resolved: add an explicit way to configure whether a link is useful as default route
Previously, we'd use a link as "default" route depending on whether
there are route-only domains defined on it or not. (If there are, it
would not be used as default route, if there aren't it would.)
Let's make this explicit and add a link variable controlling this. The
variable is not changeable from the outside yet, but subsequent commits
are supposed to add that.
Note that making this configurable adds a certain amount of redundancy,
as there are now two ways to ensure a link does not receive "default"
lookup (i.e. DNS queries matching no configured route):
1. By ensuring that at least one other link configures a route on it
(for example by add "." to its search list)
2. By setting this new boolean to false.
But this is exactly what is intended with this patch: that there is an
explicit way to configure on the link itself whether it receives
'default' traffic, rather than require this to be configured on other
links.
The variable added is a tri-state: if true, the link is suitable for
recieving "default" traffic. If false, the link is not suitable for it.
If unset (i.e. negative) the original logic of "has this route-only
routes" is used, to ensure compatibility with the status quo ante.
resolved: rework dns_server_limited_domains(), replace by dns_scope_has_route_only_domains()
The function dns_server_limited_domains() was very strange as it
enumerate the domains associated with a DnsScope object to determine
whether any "route-only" domains, but did so as a function associated
with a DnsServer object.
Let's clear this up, and replace it by a function associated with a
DnsScope instead. This makes more sense philosphically and allows us to
reduce the loops through which we need to jump to determine whether a
scope is suitable for default routing a bit.
resolved: bind .local domains to mDNS with DNS_SCOPE_YES, similar LLMNR
Previously, we'd return DNS_SCOPE_MAYBE for all domain lookups matching
LLMNR or mDNS. Let's upgrade this to DNS_SCOPE_YES, to make the binding
stronger.
The effect of this is that even if "local" is defined as routing domain
on some iface, we'll still lookup domains in local via mDNS — if mDNS is
turned on. This should not be limiting, as people who don't want such
lookups should turn off mDNS altogether, as it is useless if nothing is
routed to it.
This also has the nice benefit that mDNS/LLMR continue to work if people
use "~." as routing domain on some interface.
Similar for LLMNR and single label names.
Similar also for the link local IPv4 and IPv6 reverse lookups.
I looked over the diff, and it seems it's only additions and fixes, no removals.
The diff for the source files is much bigger, but it seems that the sorting
code is working well.
Thomas Haller [Thu, 20 Dec 2018 10:56:02 +0000 (11:56 +0100)]
dhcp6: don't enforce DUID content for sd_dhcp6_client_set_duid()
There are various functions to set the DUID of a DHCPv6 client.
However, none of them allows to set arbitrary data. The closest is
sd_dhcp6_client_set_duid(), which would still do validation of the
DUID's content via dhcp_validate_duid_len().
Relax the validation and only log a debug message if the DUID
does not validate.
Note that dhcp_validate_duid_len() already is not very strict. For example
with DUID_TYPE_LLT it only ensures that the length is suitable to contain
hwtype and time. It does not further check that the length of hwaddr is non-zero
or suitable for hwtype. Also, non-well-known DUID types are accepted for
extensibility. Why reject certain DUIDs but allowing clearly wrong formats
otherwise?
The validation and failure should happen earlier, when accepting the
unsuitable DUID. At that point, there is more context of what is wrong,
and a better failure reason (or warning) can be reported to the user. Rejecting
the DUID when setting up the DHCPv6 client seems not optimal, in particular
because the DHCPv6 client does not care about actual content of the
DUID and treats it as opaque blob.
Also, NetworkManager (which uses this code) allows to configure the entire
binary DUID in binary. It intentionally does not validate the binary
content any further. Hence, it needs to be able to set _invalid_ DUIDs,
provided that some basic constraints are satisfied (like the maximum length).
sd_dhcp6_client_set_duid() has two callers: both set the DUID obtained
from link_get_duid(), which comes from configuration.
`man networkd.conf` says: "The configured DHCP DUID should conform to
the specification in RFC 3315, RFC 6355.". It does not not state that
it MUST conform.
Note that dhcp_validate_duid_len() has another caller: DHCPv4's
dhcp_client_set_iaid_duid_internal(). In this case, continue with
strict validation, as the callers are more controlled. Also, there is
already sd_dhcp_client_set_client_id() which can be used to bypass
this check and set arbitrary client identifiers.
Thomas Haller [Wed, 19 Dec 2018 09:05:37 +0000 (10:05 +0100)]
dhcp: don't enforce hardware address length for sd_dhcp_client_set_client_id()
sd_dhcp_client_set_client_id() is the only API for setting a raw client-id.
All other setters are more restricted and only allow to set a type 255 DUID.
Also, dhcp4_set_client_identifier() is the only caller, which already
does:
r = sd_dhcp_client_set_client_id(link->dhcp_client,
ARPHRD_ETHER,
(const uint8_t *) &link->mac,
sizeof(link->mac));
and hence ensures that the data length is indeed ETH_ALEN.
Drop additional input validation from sd_dhcp_client_set_client_id(). The client-id
is an opaque blob, and if a caller wishes to set type 1 (ethernet) or type 32
(infiniband) with unexpected address length, it should be allowed. The actual
client-id is not relevant to the DHCP client, and it's the responsibility of the
caller to generate a suitable client-id.
For example, in NetworkManager you can configure all the bytes of the
client-id, including such _invalid_ settings. I think it makes sense,
to allow the user to fully configure the identifier. Even if such configuration
would be rejected, it would be the responsibility of the higher layers (including
a sensible error message to the user) and not fail later during
sd_dhcp_client_set_client_id().
Still log a debug message if the length is unexpected.
Thomas Haller [Thu, 20 Dec 2018 12:05:13 +0000 (13:05 +0100)]
dhcp: fix sd_dhcp_client_set_client_id() for infiniband addresses
Infiniband addresses are 20 bytes (INFINIBAND_ALEN), but only the last
8 bytes are suitable for putting into the client-id.
This bug had no effect for networkd, because sd_dhcp_client_set_client_id()
has only one caller which always uses ARPHRD_ETHER type.
I was unable to find good references for why this is correct ([1]). Fedora/RHEL
has patches for ISC dhclient that also only use the last 8 bytes ([2], [3]).
RFC 4390 (Dynamic Host Configuration Protocol (DHCP) over InfiniBand) [4] does
not discuss the content of the client-id either.
tmpfiles: fix crash with NULL in arg_root and other fixes and tests
The function to replacement paths into the configuration file list was borked.
Apart from the crash with empty root prefix, it would incorrectly handle the
case where root *was* set, and the replacement file was supposed to override
an existing file.
prefix_root is used instead of path_join because prefix_root removes duplicate
slashes (when --root=dir/ is used).
Generators run in a context where waiting for udev is not an option,
simply because it's not running there yet. Hence, let's not wait for it
in this case.
This is generally OK to do as we are operating on the root disk only
here, which should have been probed already by the time we come this
far.
An alternative fix might be to remove the udev dependency from image
dissection again in the long run (and thus replace reliance on
/dev/block/x:y somehow with something else).
This patch causes various problems during boot, where a "mount storm" occurs
naturally. Current approach is flakey, and it seems very risky to push a
feature like this which impacts boot right before a release. So let's revert
for now, and consider a more robust solution after later.
NeilBrown [Sun, 16 Dec 2018 22:32:58 +0000 (09:32 +1100)]
mount: disable mount-storm protection while mount unit is starting.
The starting of mount units requires that changes to
/proc/self/mountinfo be processed before the SIGCHILD from the
completion of /sbin/mount is processed, as described by the comment
/* Note that due to the io event priority logic, we can be sure the new mountinfo is loaded
* before we process the SIGCHLD for the mount command. */
The recently-added mount-storm protection can defeat this as it
will sometimes deliberately delay processing of /proc/self/mountinfo.
So we need to disable mount-storm protection when a mount unit is starting.
We do this by keeping a counter of the number of pending
mounts, and disabling the protection when this is non-zero.
Thanks to @asavah for finding and reporting this problem.
Apparently this originated in PHP, so the json output could be directly
embedded in HTML script tags.
See https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped.
Since the output of our tools is not intended directly for web page generation,
let's not do this unescaping. If needed, the consumer can always do escaping as
appropriate for the target format.
fileio: replace read_nul_string() by read_line() with a special flag
read_line() is a lot more careful and optimized than read_nul_string()
but does mostly the same thing. let's replace the latter by the former,
just with a special flag that toggles between the slightly different EOL
rules if both.
gpt-auto: propagate gpt partition ro/rw flag into root mount
This ensures that the read/write state of the root mount matches the
read/write flag in the GPT partition table entry.
This is only used as fallback in case no ro/rw flag is specified on the
kernel cmdline, and there's no entry for the root partition in
/etc/fstab.
This is missing functionality of the GPT auto logic, as without this the
root partition was always mounted read-only — when booting with zero
configuration in /etc/fstab and /proc/cmdline —, as we defaulted to
read-only behaviour for all mounts. Moreover we honoured the r/o flag in
the partition table for all other partition types, except for the root
partition.
units: set NoNewPrivileges= for all long-running services
Previously, setting this option by default was problematic due to
SELinux (as this would also prohibit the transition from PID1's label to
the service's label). However, this restriction has since been lifted,
hence let's start making use of this universally in our services.
On SELinux system this change should be synchronized with a policy
update that ensures that NNP-ful transitions from init_t to service
labels is permitted.
Let's split the commit in two: the sorting and the changes.
Because there's a requirement to update selinux policy, this change is
incompatible, strictly speaking. I expect that distributions might want to
revert this particular change. Let's make it easy.
meson: rename two more variables from _c to _sources
_c is misleading because .h files should be included in those lists too
(this tells meson that the build outputs should be rebuilt if the header
files change).
test-hashmap: add test to compare hashmap_free performance
The point here is to compare speed of hashmap_destroy with free and a different
freeing function, to the implementation details of hashmap_clear can be
evaluated.
Results:
current code:
/* test_hashmap_free (slow, 1048576 entries) */
string_hash_ops test took 2.494499s
custom_free_hash_ops test took 2.640449s
string_hash_ops test took 2.287734s
custom_free_hash_ops test took 2.557632s
string_hash_ops test took 2.299791s
custom_free_hash_ops test took 2.586975s
string_hash_ops test took 2.314099s
custom_free_hash_ops test took 2.589327s
string_hash_ops test took 2.319137s
custom_free_hash_ops test took 2.584038s
code with a patch which restores the "fast path" using:
for (idx = skip_free_buckets(h, 0); idx != IDX_NIL; idx = skip_free_buckets(h, idx + 1))
in the case where both free_key and free_value are either free or NULL:
/* test_hashmap_free (slow, 1048576 entries) */
string_hash_ops test took 2.347013s
custom_free_hash_ops test took 2.585104s
string_hash_ops test took 2.311583s
custom_free_hash_ops test took 2.578388s
string_hash_ops test took 2.283658s
custom_free_hash_ops test took 2.621675s
string_hash_ops test took 2.334675s
custom_free_hash_ops test took 2.601568s
So the test is noisy, but there clearly is no significant difference with the
"fast path" restored. I'm surprised by this, but it shows that the current
"safe" implementation does not cause a performance loss.
When the code is compiled with optimization, those times are significantly
lower (e.g. 1.1s and 1.4s), but again, there is no difference with the "fast
path" restored.
The difference between string_hash_ops and custom_free_hash_ops is the
additional cost of global modification and the extra function call.
lldp: add test coverage for sd_lldp_get_neighbors() with multiple neighbors
In particular, check that the order of the results is consistent.
This test coverage will be useful in order to refactor the compare_func
used while sorting the results.
When introduced, this test also uncovered a memory leak in sd_lldp_stop(),
which was then fixed by a separate commit using a specialized function
as destructor of the LLDP Hashmap.
Tested:
$ ninja -C build/ test
$ valgrind --leak-check=full build/test-lldp
hashmap: rework hashmap_clear() to be more defensive
Let's first remove an item from the hashmap and only then destroy it.
This makes sure that destructor functions can mdoify the hashtables in
their own codee and we won't be confused by that.
resolved: only attempt non-answer SOA RRs if they are parents of our query
There's no value in authenticating SOA RRs that are neither answer to
our question nor parent of our question (the latter being relevant so
that we have a TTL from the SOA field for negative caching of the actual
query).
By being to eager here, and trying to authenticate too much we run the
risk of creating cyclic deps between our transactions which then causes
the over-all authentication to fail.