Hardware addresses come in various shapes and sizes, these new functions
and accomapying data structures account for that instead of hard-coding
a hardware address to the 6 bytes of an ethernet MAC.
To make things simple and robust when debugging journald, we'll leave
the SO_TIMESTAMP invocations in the C code in place, even if they are
now typically redundant, given that the sockets are already passed into
the process with SO_TIMESTAMP turned on now.
Let's not have #ifdeffery both in the consumers and the providers of the
selinux glue code. Unless the code is particularly complex, let's do the
ifdeffery only in the provider of the selinux glue code, and let's keep
the consumers simple and just invoke it.
Anita Zhang [Fri, 23 Oct 2020 05:44:22 +0000 (22:44 -0700)]
core: clean up inactive/failed {service|scope}'s cgroups when the last process exits
If processes remain in the unit's cgroup after the final SIGKILL is
sent and the unit has exceeded stop timeout, don't release the unit's
cgroup information. Pid1 will have failed to `rmdir` the cgroup path due
to processes remaining in the cgroup and releasing would leave the cgroup
path on the file system with no tracking for pid1 to clean it up.
Instead, keep the information around until the last process exits and pid1
sends the cgroup empty notification. The service/scope can then prune
the cgroup if the unit is inactive/failed.
Dan Streetman [Fri, 23 Oct 2020 19:50:28 +0000 (15:50 -0400)]
test: ignore ENOMEDIUM error from sd_pid_get_cgroup()
Ubuntu builds on the Launchpad infrastructure run inside a chroot that does
not have the sysfs cgroup dirs mounted, so this call will return ENOMEDIUM
from cg_unified_cached() during the build-time testing, for example when
building the package in a Launchpad PPA.
The function `asynchronous_close()` confuses valgrind. Before this commit,
valgrind may report the following:
```
HEAP SUMMARY:
in use at exit: 384 bytes in 1 blocks
total heap usage: 4,787 allocs, 4,786 frees, 1,379,191 bytes allocated
384 bytes in 1 blocks are possibly lost in loss record 1 of 1
at 0x483CAE9: calloc (vg_replace_malloc.c:760)
by 0x401456A: _dl_allocate_tls (in /usr/lib64/ld-2.31.so)
by 0x4BD212E: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.31.so)
by 0x499B662: asynchronous_job (async.c:47)
by 0x499B7DC: asynchronous_close (async.c:102)
by 0x4CFA8B: client_initialize (sd-dhcp-client.c:696)
by 0x4CFC5E: client_stop (sd-dhcp-client.c:725)
by 0x4D4589: sd_dhcp_client_stop (sd-dhcp-client.c:2134)
by 0x493C2F: link_stop_clients (networkd-link.c:620)
by 0x4126DB: manager_free (networkd-manager.c:867)
by 0x40D193: manager_freep (networkd-manager.h:97)
by 0x40DAFC: run (networkd.c:20)
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 384 bytes in 1 blocks
still reachable: 0 bytes in 0 blocks
suppressed: 0 bytes in 0 blocks
For lists of detected and suppressed errors, rerun with: -s
ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
I think the idea was generally sound, but didn't take into account the
limitations of show-environment and how it is used. People expect to be able to
eval systemctl show-environment output in bash, and no escaping syntax is
defined for environment *names* (we only do escaping for *values*). We could
skip such problematic variables in 'systemctl show-environment', and only allow
them to be inherited directly. But this would be confusing and ugly.
The original motivation for this change was that various import operations
would fail. a4ccce22d9552dc74b6916cc5ec57f2a0b686b4f changed systemctl to filter
invalid variables in import-environment.
https://gitlab.gnome.org/GNOME/gnome-session/-/issues/71 does a similar change
in GNOME. So those problematic variables should not cause failures, but just
be silently ignored.
Finally, the environment block is becoming a dumping ground. In my gnome
session 'systemctl show-environment --user' includes stuff like PWD, FPATH
(from zsh), SHLVL=0 (no idea what that is). This is not directly related to
variable names (since all those are allowed under the stricter rules too), but
I think we should start pushing people away from running import-environment and
towards importing only select variables.
Dan Streetman [Wed, 17 Jun 2020 20:28:39 +0000 (16:28 -0400)]
network: move set-MAC and set-nomaster operations out of link_up()
These should not be bundled into the link_up() operation, as that is
not (currently) called during interface configuration if the interface
already is IFF_UP, which is unrelated to the need to change the mac
to a user-defined value, or set 'nomaster' on the interface.
Additionally, there is no need to re-set the mac or re-assert nomaster
every time the interface is brought up; those should be only part of
normal initial interface configuration.
Let's try to make collisions when multiple clients want to use the same
device less likely, by sleeping a random time on collision.
The loop device allocation protocol is inherently collision prone:
first, a program asks which is the next free loop device, then it tries
to acquire it, in a separate, unsynchronized setp. If many peers do this
all at the same time, they'll likely all collide when trying to
acquire the device, so that they need to ask for a free device again and
again.
Let's make this a little less prone to collisions, reducing the number
of failing attempts: whenever we notice a collision we'll now wait
short and randomized time, making it more likely another peer succeeds.
(This also adds a similar logic when retrying LOOP_SET_STATUS64, but
with a slightly altered calculation, since there we definitely want to
wait a bit, under all cases)
On systems that have a udev before a7fdc6cbd399acdb1a975a7f72b9be4504a38c7c uevents would sometimes be
eaten because of device node collisions that caused the ruleset to fail.
Let's add an (ugly) work-around for this, so that we can even work with
such an older udev.
loop-util: if a loopback device we want to use still has partitions, do something about it
On current kernels (5.8 for example) under some conditions I don't fully
grok it might happen that a detached loopback block device still has
partition block devices around. Accessing these partition block devices
results in EIO errors (that also fill up dmesg). These devices cannot be
claned up with LOOP_CLR_FD (since the main device already is officially
detached), nor with LOOP_CTL_DELETE (returns EBUSY as long as the
partitions still exist). This is a kernel bug. But it appears to apply
to all recent kernels. I cannot really pin down what triggers this,
suffice to say our heavy-duty test can trigger it.
Either way, let's do something about it: when we notice this state we'll
attach an empty file to it, which is guaranteed to have to part table.
This makes the partitions go away. After closing/reoping the device we
hence are good to go again. ugly workaround, but I think OK enough to
use.
The net result is: with this commit, we'll guarantee that by the time we
attach a file to the loopback device we have zero kernel partitions
associated with it. Thus if we then wait for the kernel partitions we
need to appear we should have entirely reliable behaviour even if
loopback devices by the name are heavily recycled and udev events reach
us very late.
Previously, we'd just wait for the first moment where the kernel exposes
the same numbre of partitions as libblkid tells us. After that point we
enumerate kernel partitions and look for matching libblkid partitions.
With this change we'll instead enumerate with libblkid only, and then
wait for each kernel partition to show up with the exact parameters we
expect them to show up. Once that happens we are happy.
Care is taken to use the udev device notification messages only as hint
to recheck what the kernel actually says. That's because we are
otherwise subject to a race: we might see udev events from an earlier
use of a loopback device. After all these devices are heavily recycled.
Under the assumption that we'll get udev events for *at least* all
partitions we care about (but possibly more) we can fix the race
entirely with one more fix coming in a later commit: if we make sure
that a loopback block device has zero kernel partitions when we take
possession of it, it doesn't matter anymore if we get spurious udev
events from a previous use. All we have to do is notice when the devices
we need all popped up.
loop-util: LOOP_CLR_FD is async, don't retry to reuse a device right after issuing it
When we fall back to classic LOOP_SET_FD logic in case LOOP_CONFIGURE
didn't work we issue LOOP_CLR_FD first. But that call turns out to be
potentially async in the kernel: if something else (let's say
udev/blkid) is accessing the device the ioctl just sets the autoclear
flag and exits. Hence quite often the LOOP_SET_FD will subsequently
fail. Let's avoid the trouble, and immediately exit with EBUSY if
LOOP_CONFIGURE fails, and but remember that LOOP_CONFIGURE is not
available so that on the next iteration we go directly for LOOP_SET_FD
instead.
Since
https://github.com/torvalds/linux/commit/5db470e229e22b7eda6e23b5566e532c96fb5bc3 (i.e. kernel 5.0)
changing the .lo_offset field via LOOP_SET_STATUS64 might result in
EAGAIN. Let's handle that.
test-env-util: Verify that \r is disallowed in env var values
This adds tests to make sure that basic/env-util considers environment
variables containing \r characters invalid, and that it removes such
variables during environment cleanup in strv_env_clean*().
test-env-util has not verified this behaviour before.
As \r characters can be used to hide information, disallowing them
helps with systemd's security barrier role, even when the \r
character comes as part of a DOS style (\r\n) line ending.
The idea is that we have strvs like list of server names or addresses, where
the majority of strings is rather short, but some are long and there can
potentially be many strings. So formattting them either all on one line or all
in separate lines leads to output that is either hard to read or uses way too
many rows. We want to wrap them, but relying on the pager to do the wrapping is
not nice. Normal text has a lot of redundancy, so when the pager wraps a line
in the middle of a word the read can understand what is going on without any
trouble. But for a high-density zero-redundancy text like an IP address it is
much nicer to wrap between words. This also makes c&p easier.
This adds a variant of TABLE_STRV which is wrapped on output (with line breaks
inserted between different strv entries).
The change table_print() is quite ugly. A second pass is added to re-calculate
column widths. Since column size is now "soft", i.e. it can adjust based on
available columns, we need to two passes:
- first we figure out how much space we want
- in the second pass we figure out what the actual wrapped columns
widths will be.
To avoid unnessary work, the second pass is only done when we actually have
wrappable fields.
man/systemd-resolved: reword the description of query a bit
The phrase "routing domains" is used to mean both route-only domains and search
domains. Route-only domains are always called like that, and not just "route domains".
Some paragraphs are reordered to describe synthetisized records first, then
LLMNR, then various ways quries are routed.
The original intention to enabled this option by default is that
it only affects kernel post-panic behavior, so should have no harm.
But this is not true if the user wants a reliable kdump.
crash_kexec_post_notifiers is known to cause problem with kdump,
and it's documented in kernel. It's not easy to fix the problem
because of how kdump works. Kdump expects the crashed kernel to
jump to an pre-loaded crash kernel, so doing any extra job before
the jump will increase the risk.
It depends on the user to choose between having a reliable kdump or
some other post-panic debug mechanic.
So it's better to keep this config untouched by default, or it may put
kdump at higher risk of failing silently. User should enable it by
uncommenting the config line manually if pstore is always needed.
Also add a inline comment inform user about the potential issue.
test-path: start infinite sleep instead of a short command
The test sometimes fails, e.g. in bionic-s390x ci. I think it might be because
the service binary exits before we get a chance to notice that it is running:
13:59:31 --- Listing only the last 100 lines from a long log. ---
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4639845)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4539608)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4439376)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4338946)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4238702)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4138424)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 4038116)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3937835)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3837553)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3737250)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3636934)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3536622)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3436318)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3336021)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3235730)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3135468)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 3035158)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2934855)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2834541)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2732511)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2632255)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2532014)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2431746)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2331438)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2231213)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2130952)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 2030663)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1930428)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1830172)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1729906)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1629652)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1529368)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1429110)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1328852)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1228593)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1128320)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 1028083)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 927824)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 827564)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 724935)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 624664)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 524411)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 424124)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 323853)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 223585)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 120356)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: 18053)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 line 293: path-unit.path: state = running; result = success (left: -82385)
13:59:31 line 293: path-mycustomunit.service: state = exited; result = success
13:59:31 Test timeout when testing path-unit.path
It seems test/test-path/path-service.service wasn't actually used for anything.
We check that the binary exists before writing the service file, but
let's also not consider the service started until the fork has happened.
This is still relatively new stuff, so we're can change the implementation
details like this.
test-path: do not fail the test if we fail to start a service because of cgroup setup
The test was failing because it couldn't start the service:
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-modified.service: Failed to create cgroup /system.slice/kojid.service/path-modified.service: Permission denied
path-modified.service: Failed to attach to cgroup /system.slice/kojid.service/path-modified.service: No such file or directory
path-modified.service: Failed at step CGROUP spawning /bin/true: No such file or directory
path-modified.service: Main process exited, code=exited, status=219/CGROUP
path-modified.service: Failed with result 'exit-code'.
Test timeout when testing path-modified.path
In fact any of the services that we try to start may fail, especially
considering that we're doing some rogue cgroup operations. See
https://github.com/systemd/systemd/pull/16603#issuecomment-679133641.
Just to make it easier to grok what happens when test-path fails.
Change printf→log_info so that output is interleaved and not split in two
independent parts in log files.
williamvds [Wed, 21 Oct 2020 16:19:05 +0000 (17:19 +0100)]
systemctl: show original contents when editing unit
A comment indicates the start of the new contents of the override file,
and another indicates that lines following it will be discarded once
editing is finished.
The contents of the unit file and drop-ins are listed out after this
last marker.
Adds WRITE_STRING_FILE_TRUNCATE to set O_TRUNC when opening a file.
Thanks to cgzones for providing the required SELinux function calls.
Co-authored-by: Christian Göttsche <cgzones@googlemail.com>
Jonathan Lebon [Tue, 20 Oct 2020 20:30:20 +0000 (16:30 -0400)]
units: add initrd-cryptsetup.target
For encrypted block devices that we need to unlock from the initramfs,
we currently rely on dracut shipping `cryptsetup.target`. This works,
but doesn't cover the case where the encrypted block device requires
networking (i.e. the `remote-cryptsetup.target` version). That target
however is traditionally dynamically enabled.
Instead, let's rework things here by adding a `initrd-cryptsetup.target`
specifically for initramfs encrypted block device setup. This plays the
role of both `cryptsetup.target` and `remote-cryptsetup.target` in the
initramfs.
Then, adapt `systemd-cryptsetup-generator` to hook all generated
services to this new unit when running from the initrd. This is
analogous to `systemd-fstab-generator` hooking all mounts to
`initrd-fs.target`, regardless of whether they're network-backed or not.
Chandradeep Dey [Sun, 18 Oct 2020 09:59:40 +0000 (15:29 +0530)]
homed: remove PAM_USER_UNKNOWN test in pam_sm_acct_mgmt
Why this change
---------------
Assumption - PAM's auth stack is properly configured.
Currently account pam_systemd_home.so returns PAM_SUCCESS for non
systemd-homed users, and a variety of return values (including
PAM_SUCCESS) for homed users.
account pam_unix returns PAM_AUTHINFO_UNAVAIL for systemd-homed
users, and a variety of return values (including PAM_AUTHINFO_UNAVAIL)
for normal users.
No possible combination in the pam stack can let us preserve the
various return values of the modules. For example, the configuration
mentioned in the manpage causes account pam_unix to never be reached
since pam_systemd_home just returns a success for ordinary users. Users
with expired passwords are allowed to log in because a check cannot be
made.
More configuration examples and why they don't work are mentioned
in #16906 and the downstream discussion linked there.
After this change
-----------------
account pam_unix will continue to return wrong value for homed users.
But we can skip the module conditionally using the return value from
account pam_systemd_home. We can already do this with the auth and
password modules.