Yu Watanabe [Sat, 12 Jun 2021 20:12:03 +0000 (05:12 +0900)]
network: try to bring down before setting MAC address
Most real network devices refuse to have their MAC address set while their operstate
is not down. So, if setting the MAC address fails, let's bring the interface
down and retry.
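A minimal sketch of that fallback flow, using plain ioctls rather than the rtnetlink calls networkd actually uses (the function name, descriptor handling and error paths here are illustrative assumptions):
    #include <errno.h>
    #include <net/if.h>
    #include <net/if_arp.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Sketch: try to set the MAC address; if the device refuses while the
     * link is up, bring it down and retry. */
    static int set_mac_with_fallback(int fd, const char *ifname, const unsigned char mac[6]) {
        struct ifreq ifr = {};

        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
        memcpy(ifr.ifr_hwaddr.sa_data, mac, 6);

        if (ioctl(fd, SIOCSIFHWADDR, &ifr) >= 0)
            return 0;                    /* worked on the first try */

        /* Bring the interface down ... */
        struct ifreq fl = {};
        strncpy(fl.ifr_name, ifname, IFNAMSIZ - 1);
        if (ioctl(fd, SIOCGIFFLAGS, &fl) < 0)
            return -errno;
        fl.ifr_flags &= ~IFF_UP;
        if (ioctl(fd, SIOCSIFFLAGS, &fl) < 0)
            return -errno;

        /* ... and retry setting the MAC address. */
        if (ioctl(fd, SIOCSIFHWADDR, &ifr) < 0)
            return -errno;
        return 0;
    }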
Michal Koutný [Thu, 10 Jun 2021 13:58:43 +0000 (15:58 +0200)]
core: Avoid spurious realization of unit cgroups
Cgroups may be realized even when they are not needed. This
happens, e.g., for mount units parsed from /proc/$PID/mountinfo; check:
    touch /run/ns_mount
    unshare -n sh -c "mount --bind /proc/self/ns/net /run/ns_mount"
    # no cgroup exists
    file /sys/fs/cgroup/system.slice/run-ns_mount.mount
    systemctl daemon-reload
    # the vain cgroup exists
    file /sys/fs/cgroup/system.slice/run-ns_mount.mount
(Such cgroups can add up to a large number when there are many similar mounts.)
The code already accounts for "lazy" realization (see the various checks of
Unit.cgroup_realized), but unit_deserialize() in the reload/reexec
path performs unconditional realization.
Invalidate (and queue) units for realization only if we know that
they were already realized in the past. This is safe to do even
when the reload brings some new cgroup setting (controllers, BPF),
because units that aren't realized yet will pick up the updated setting when
the time for their realization comes. (No code comment is needed, since the
new formulation already suggests the changed behavior.)
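In code terms the gist of the change is a guard like the following (a hedged sketch with stand-in types and helper names, not the actual Unit machinery):
    #include <stdbool.h>

    /* Stand-in for the real Unit: only the fields needed for the sketch. */
    typedef struct Unit {
        bool cgroup_realized;      /* was the cgroup created before the reload? */
        bool cgroup_invalidated;
    } Unit;

    static void unit_invalidate_cgroup_stub(Unit *u) {
        u->cgroup_invalidated = true;  /* would also queue the unit for realization */
    }

    static void unit_after_deserialization(Unit *u) {
        /* Previously the invalidation (and hence realization) happened
         * unconditionally here; now it is only queued for units whose
         * cgroup had already been realized before the reload. */
        if (u->cgroup_realized)
            unit_invalidate_cgroup_stub(u);
    }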
I wanted to see what is_path_read_only_fs() and is_path_temporary_fs() return
in a chroot, and various tests would fail. For most of our codebase, we can
assume that /proc and such are mounted, and it doesn't make sense to make the
tests work in a chroot. But let's do it here. (In general, it would be useful
for most stuff in src/basic/, since it's linked into libraries which might be
invoked in incorrectly set up environments and should not fail too badly.)
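A hedged sketch of the sort of guard that lets such a test degrade gracefully in a chroot (the exact check and message are assumptions, not the actual test code):
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* If the API filesystems are not mounted (typical for a bare
         * chroot), skip instead of failing assertions. */
        if (access("/proc/self/mountinfo", F_OK) < 0) {
            fprintf(stderr, "/proc is not mounted, skipping test.\n");
            return 77;     /* conventional "test skipped" exit code */
        }

        /* ... the actual checks on is_path_read_only_fs() and
         * is_path_temporary_fs() would go here ... */
        return 0;
    }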
core/serialize: drop bogus deserialization of ipcns sockets
a70581ffb5c13c91c76ff73ba6f5f3ff59c5a915 added ExecRuntime.ipcns_storage_socket[], with
serialization in exec_runtime_serialize() and deserialization in exec_runtime_deserialize_one(),
but also deserialization in exec_runtime_deserialize_compat(). exec_runtime_deserialize_compat()
is for deserializing an ExecRuntime that was serialized as part of the unit, i.e. before
e8a565cb660a7a11f76180fe441ba8e4f9383771. There was never any code which would serialize
ExecRuntime.ipcns_storage_socket[] this way, so the deserialization attempts are pointless.
core/serialization: drop misleadingly-named unit_can_serialize()
All unit types can be serialized. This function was really checking whether the
unit type has custom serialization/deserialization code. But we don't need a
function for this.
Also, the check that both .serialize() and .deserialize_item() are defined is
better written as an assert. Otherwise we'd have a function which would silently
skip serialization/deserialization for the unit type if we forgot to set either of the
fields.
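The paired-definition check then becomes an assertion, roughly like this (a sketch with a stand-in vtable type; only the two field names follow the real ones):
    #include <assert.h>
    #include <stdio.h>

    /* Stand-in vtable: either both custom hooks are set, or neither is. */
    typedef struct UnitVTable {
        int (*serialize)(void *u, FILE *f);
        int (*deserialize_item)(void *u, const char *key, const char *value);
    } UnitVTable;

    static void unit_vtable_check(const UnitVTable *vt) {
        /* Forgetting one of the two hooks now trips an assert instead of
         * silently skipping (de)serialization for the whole unit type. */
        assert(!vt->serialize == !vt->deserialize_item);
    }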
fileio: bump limit for read_full_file() and friends to 64M
Apparently people use such large key files. Specifically, people used 4M
key files, and we lowered the limit from 4M to 4M-1 back in 248.
This raises the limit to 64M for read_full_file() to avoid these
specific issues and give some non-trivial room beyond the 4M files seen
IRL.
Note that a 64M allocation in glibc is always immediately done via
mmap(), and is thus a lot slower than smaller allocations. This means
read_virtual_file() would become ridiculously slow if we used the large
limit, since we use it all the time for reading /proc and /sys metadata,
and read_virtual_file() typically allocates the full size with malloc()
in advance. In fact it becomes so slow that test-process-util kept
timing out on me once I blindly raised the limit.
This patch hence introduces two distinct limits for read_full_file() and
read_virtual_file(): the former is much larger than the latter, and the
latter remains where it is. This is safe since the former uses an
exponentially growing realloc() loop while the latter uses the
aforementioned ahead-of-time full-limit allocation.
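The difference between the two paths boils down to how the buffer grows; here is a hedged sketch of an exponentially growing read loop in the spirit of read_full_file() (simplified, with illustrative constants and error conventions, not the actual implementation):
    #include <errno.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Sketch: grow the buffer geometrically while reading, so a large limit
     * (64M) only costs memory proportional to the actual file size. */
    static ssize_t read_full_fd_sketch(int fd, char **ret, size_t limit) {
        size_t allocated = 4096, used = 0;
        char *buf = malloc(allocated);

        if (!buf)
            return -ENOMEM;

        for (;;) {
            if (used == allocated) {
                /* Grow geometrically, capped at limit + 1: the extra
                 * byte lets us detect an over-long file below. */
                allocated = allocated * 2 < limit + 1 ? allocated * 2 : limit + 1;
                char *p = realloc(buf, allocated);
                if (!p) {
                    free(buf);
                    return -ENOMEM;
                }
                buf = p;
            }

            ssize_t n = read(fd, buf + used, allocated - used);
            if (n < 0) {
                free(buf);
                return -errno;
            }
            if (n == 0)
                break;                 /* EOF */

            used += (size_t) n;
            if (used > limit) {
                free(buf);
                return -E2BIG;         /* file exceeds the limit */
            }
        }

        *ret = buf;
        return (ssize_t) used;
    }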
sd-event: change ordering of pending/ratelimited events
Instead of ordering non-pending before pending we should order
"non-pending OR ratelimited" before "pending AND not-ratelimited".
This fixes a bug where ratelimited events were ordered at the end of the
priority queue and could be stuck there for an indeterminate amount of
time.
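Expressed as a comparator, the new ordering looks roughly like this (a sketch with simplified stand-in fields, not the actual sd-event source structure):
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct SourceSketch {
        bool pending;
        bool ratelimited;
        int64_t priority;
    } SourceSketch;

    /* Sources that are "pending AND not ratelimited" sort last; everything
     * else ("non-pending OR ratelimited") sorts first, so ratelimited sources
     * can no longer get stuck behind pending ones at the tail of the queue. */
    static int source_compare_sketch(const SourceSketch *x, const SourceSketch *y) {
        bool x_last = x->pending && !x->ratelimited;
        bool y_last = y->pending && !y->ratelimited;

        if (x_last != y_last)
            return x_last ? 1 : -1;

        if (x->priority != y->priority)
            return x->priority < y->priority ? -1 : 1;

        return 0;
    }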
Since those workarounds were added, work has been done to tighten
up the log_*() return values. It seems we get no warnings with
gcc-11.1.1-1.fc34.x86_64 at -O0/-O2.
shared/install: improve message about template mismatch
$ systemctl enable --root=/ serial-getty@.service
Failed to enable unit, unit getty.target is a non-template unit.
↓
Failed to enable serial-getty@.service, destination unit getty.target is a non-template unit.
shared/install: remove custom error handling in unit_file_preset_all()
This had some purpose back in the day, but right now I cannot see what
difference this makes. It's hard to keep the list of all possible errors up to
date. So let's remove this, hopefully nothing breaks.
Anita Zhang [Tue, 8 Jun 2021 07:04:35 +0000 (00:04 -0700)]
test: add extended test for triggering mount rate limit
It's hard to trigger the failure to exit the rate-limit state in
isolation, as it needs multiple event sources to show that it
gets stuck in the queue. Hence this is an extended test.
core/serialization: call exec_runtime_deserialize_compat() independently of whether .serialize is defined
There is no reason to tie the two together: in principle we may have
in the future a unit type which does not define .serialize/.deserialize_item,
but we would still want to call the compat deserialization code for it.
journal: don't try to reuse already calculated hash between files with keyed hash feature
When suppressing duplicate fields between files we so far tried to reuse
the already known hash value of the data fields between files. This was
fine as long as we used the same hash function everywhere. However,
since the addition of the keyed hash feature for journal files this no longer
works, as the hashes will be different for different files.
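A toy illustration of why the cached value is useless across files (the seeded FNV-1a here is merely a stand-in for the journal's keyed hash, and the file keys are made up):
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-in keyed hash: the result depends on a per-file key. */
    static uint64_t keyed_hash_sketch(uint64_t file_key, const char *data, size_t size) {
        uint64_t h = 0xcbf29ce484222325ULL ^ file_key;
        for (size_t i = 0; i < size; i++)
            h = (h ^ (uint8_t) data[i]) * 0x100000001b3ULL;
        return h;
    }

    int main(void) {
        const char *field = "MESSAGE=hello";

        /* The same field data hashes differently in each file, so a hash
         * computed while reading the source file cannot be reused to look
         * the data up in the destination file. */
        printf("file A: %016" PRIx64 "\n", keyed_hash_sketch(0x1111, field, strlen(field)));
        printf("file B: %016" PRIx64 "\n", keyed_hash_sketch(0x2222, field, strlen(field)));
        return 0;
    }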
Yu Watanabe [Wed, 9 Jun 2021 04:30:16 +0000 (13:30 +0900)]
tmpfile: several minor coding style fixes
This makes the following changes:
- reduce the scope of variables,
- drop unnecessary 'else' branches,
- use the CLOSE_AND_REPLACE() macro,
- use strnull() for possibly NULL strings.
bpf-firewall: close gap when updating the firewall
If we have BPF_F_ALLOW_MULTI support we can install the new program
before we drop the old one (because we can have two programs installed at the
same time). Let's do that, and thus fully close the firewall
gap.
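A hedged sketch of the gap-free switch-over, here expressed with libbpf as a stand-in for the project's own BPF helpers (the attach point and file descriptors are placeholders):
    #include <bpf/bpf.h>
    #include <linux/bpf.h>

    /* Sketch: with BPF_F_ALLOW_MULTI both programs can be attached to the
     * cgroup at once, so the new one goes in before the old one is dropped
     * and there is never a moment with no firewall program attached. */
    static int replace_firewall_prog(int cgroup_fd, int old_prog_fd, int new_prog_fd) {
        int r;

        r = bpf_prog_attach(new_prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, BPF_F_ALLOW_MULTI);
        if (r < 0)
            return r;

        /* Only once the new program is live, detach the old one. */
        return bpf_prog_detach2(old_prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS);
    }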
Yu Watanabe [Mon, 7 Jun 2021 07:26:10 +0000 (16:26 +0900)]
network: do not process requests that depend on link flags while the flags are being updated
E.g. nexthops require the IFF_UP flag, but the currently stored flags may be
outdated if we called link_down(). This keeps such requests pending while
at least one of the flags is being updated.
Yu Watanabe [Mon, 7 Jun 2021 06:54:48 +0000 (15:54 +0900)]
network: do not drop requests on carrier lost
When the carrier is lost, all requests which require a carrier will simply not be
processed; they will be processed again once the interface regains its
carrier. So it is not necessary to drop the requests here.
Previously, when IPv6LinkLocalAddressGenerationMode= was not set, we
determined the address generation mode based on the result of reading the
stable_secret sysctl value. This change makes the mode determined by whether
a secret address is specified in the new setting.
Mostly logging related: let's downgrade logging in dlopen_bpf(), for
example, and remove duplicate logging in various places. Add %m to log
messages, and so on.
bpf-firewall: move destruction of IP firewall objects to bpf-firewall.c
These are quite a few runtime objects; let's add a bpf_firewall_close()
helper that destroys them all, and call it from unit_free(), simply as
an exercise in encapsulating more BPF code in bpf-firewall.c.
This also brings the destruction order and variable declaration order in
struct Unit into the same systematic order.
No change in behaviour, just some minor refactoring.
dns_server_unlink_marked() and dns_server_mark_all() were written recursively.
People might have dozens of servers defined, and it's better to avoid recursion
when a simple loop suffices.
dns_server_unlink_marked() would only unmark the first marked server.
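A sketch of the shape of that change with a stand-in list type (the real code walks the DnsServer list; the names and fields here are simplified assumptions):
    #include <stdbool.h>
    #include <stddef.h>

    /* Stand-in doubly linked list entry. */
    typedef struct Server {
        struct Server *prev, *next;
        bool marked;
    } Server;

    static void unlink_server(Server **head, Server *s) {
        if (s->prev)
            s->prev->next = s->next;
        else
            *head = s->next;
        if (s->next)
            s->next->prev = s->prev;
        s->prev = s->next = NULL;
    }

    /* The recursive walk becomes a plain loop: no stack frame per server,
     * and the iteration continues past the first marked entry. */
    static void unlink_marked(Server **head) {
        for (Server *s = *head; s; ) {
            Server *next = s->next;   /* save before a possible unlink */

            if (s->marked)
                unlink_server(head, s);

            s = next;
        }
    }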