Roland Hieber [Mon, 18 Sep 2023 09:52:06 +0000 (11:52 +0200)]
udev: generate system-unique storage symlinks using device path
When the same disk image is written to multiple storage units, for
example an external SD card and an internal eMMC, the symlinks in
/dev/disk/by-{label,uuid,partlabel,partuuid}/ are no longer unique, and
will point to the device that is probed last.
Adressing partitions via labels and UUIDs is nice to work with, and
depending on the use case, it might also be more robust than using the
symlinks in /dev/disk/by-path/ containing the partition number. Combine
the two approaches to create unique symlinks containing both the device
path as well as the respective UUIDs or labels, and throw in a symlink
using the devpath and the partition number for the sake of completeness.
For an exemplary GPT-partitioned disk at "platform-2198000.mmc" with a
partition containing an ext4 file system, this might create symlinks of
the following form:
Mike Yuan [Sat, 25 Nov 2023 14:10:16 +0000 (22:10 +0800)]
systemctl-whoami: use pidfd to refer to processes
While at it, rephrase the output a bit. Before this commit, if
the pid doesn't exist, we output something hard to interpret -
"Failed to get unit for ourselves".
mime: register confext/sysext images in shared-mime-info
This make them recognized by file managers and stuff. Maybe one day we
should properly register mime types in the "vnd." namespace with IANA,
but I am too lazy to deal with the bureaucracy for that, hence let's
stick with the x. namespace for now.
userdbctl: enable ssh-authorized-keys logic by default
sshd now supports config file drop-ins, hence let's install one to hook
up "userdb ssh-authorized-keys", so that things just work.
We put the drop-in relatively early, so that other drop-ins generally
will override this.
Ideally sshd would support such drop-ins in /usr/ rather than /etc/, but
let's take what we can get. It's not that sshd's upstream was
particularly open to weird ideas from Linux people.
pid1: add ProtectSystem= as system-wide configuration, and default it to true in the initrd
This adds a new ProtectSystem= setting that mirrors the option of the
same of services, but in a more restrictive way. If enabled will remount
/usr/ to read-only, very early at boot. Takes a special value "auto"
(which is the default) which is equivalent to true in the initrd, and
false otherwise.
Unlike the per-service option we don't support full/strict modes, but
the door is open to eventually support that too if it makes sense. It's
not entirely trivial though as we have very little mounted this early,
and hence the mechanism might not apply 1:1. Hence in this PR is a
conservative first step.
My primary goal with this is to lock down initrds a bit, since they
conceptually are mostly immutable, but they are unpacked into a mutable
tmpfs. let's tighten the screws a bit on that, and at least make /usr/
immutable.
This is particularly nice on USIs (i.e. Unified System Images, that pack
a whole OS into a UKI without transitioning out of it), such as
diskomator.
We have two mechanisms that remove old coredumps: systemd-coredump has
parameters based on disk use / remaining disk free, and systemd-tmpfiles does
cleanup based on time. The first mechanism should prevent us from using too much
disk space in case something is crashing continuously or there are very large
core files.
The limit of 3 days makes it likely that the core file will be gone by the time
the admin looks at the issue. E.g. if something crashes on Friday, the coredump
would likely be gone before people are back on Monday to look at it.
We'll typically send signals to all remaining processes in the following
cases:
1. pid1 (in initrd) when transitioning from initrd to sysroot: SIGTERM
2. pid1 (in sysroot) before transitioning back to initrd (exitrd): SIGTERM + SIGKILL
3. systemd-shutdown (in exitrd): SIGTERM + SIGKILL
'warn_rootfs' is set to true only when we're not in initrd and we're
sending SIGKILL, which means the second case. So, we want to emit the
warning when the root of the storage daemon IS the same as that of pid1,
rather than the other way around.
The condition is spuriously reversed in the offending commit.
loginctl: show a nicer error message when no session/seat is available
When calling loginctl {seat,session}-status without arguments, show a nicer
error message in case there's no suitable session/seat attached to the calling
tty.
Before:
~# loginctl seat-status
Could not get properties: Unknown object '/org/freedesktop/login1/seat/auto'.
~# systemd-run -q -t loginctl seat-status
Could not get properties: Unknown object '/org/freedesktop/login1/seat/auto'.
~# systemd-run -q -t loginctl session-status
Could not get properties: Unknown object '/org/freedesktop/login1/session/auto'.
After:
~# build/loginctl seat-status
Failed to get path for seat 'auto': Session '1' has no seat.
~# systemd-run -q -t build/loginctl seat-status
Failed to get path for seat 'auto': Caller does not belong to any known session and doesn't own any suitable session.
~# systemd-run -q -t build/loginctl session-status
Failed to get path for session 'auto': Caller does not belong to any known session and doesn't own any suitable session.
Daan De Meyer [Tue, 5 Dec 2023 13:56:15 +0000 (14:56 +0100)]
repart: Add Minimize=best to --make-ddi= partition definitions
Otherwise, repart won't calculate the minimal size of the partition
automatically and things will fail once the partitions exceed the
minimal partition size (10M).
ukify: raise error if genkey is called with no output arguments
The idea is that genkey is called with either
--secureboot-private-key= + --secureboot-certificate=, and then it
writes those, or with --pcr-private-key + optionally --pcr-public-key
and then it writes those, or both. But when called with no arguments
whatsover, it did nothing.
There is no implicit value for any of those parameters as input (unlike in
mkosi), so we also don't want to have implicit values when used as output.
But we shouldn't return success if no work was done, this is quite confusing.
Roland Singer [Wed, 6 Dec 2023 09:49:47 +0000 (10:49 +0100)]
ukify: fix handling of --secureboot-certificate-validity= (#30315)
Before:
$ python src/ukify/ukify.py genkey --secureboot-private-key=sb2.key --secureboot-certificate=sb2.cert --secureboot-certificate-validity=111
Traceback (most recent call last):
File "/home/zbyszek/src/systemd-work/src/ukify/ukify.py", line 1660, in <module>
main()
File "/home/zbyszek/src/systemd-work/src/ukify/ukify.py", line 1652, in main
generate_keys(opts)
File "/home/zbyszek/src/systemd-work/src/ukify/ukify.py", line 943, in generate_keys
key_pem, cert_pem = generate_key_cert_pair(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zbyszek/src/systemd-work/src/ukify/ukify.py", line 891, in generate_key_cert_pair
now + ONE_DAY * valid_days
~~~~~~~~^~~~~~~~~~~~
TypeError: can't multiply sequence by non-int of type 'datetime.timedelta'
Now:
$ python src/ukify/ukify.py genkey --secureboot-private-key=sb2.key --secureboot-certificate=sb2.cert --secureboot-certificate-validity=111
Writing SecureBoot private key to sb2.key
Writing SecureBoot certificate to sb2.cert
The man page doesn't even mention errno. It just says that ferror() should
be used to check for errors. Those writes are unlikely to fail, but if they
do, errno might even be 0. Also, we have fflush_and_check() which does
additional paranoia around errno, because we apparently do not trust that
errno will always be set correctly.
This is a fancy wrapper around "cat <<EOF", but:
- the user doesn't need to figure out the file name,
- parent directories are created automatically,
- daemon-reload is implied,
so it's a convenient way to create units or drop-ins.
Luca Boccassi [Tue, 5 Dec 2023 15:43:12 +0000 (15:43 +0000)]
switch-root: also check that mount IDs are the same, not just inodes
If /run/nextroot/ has been set up, use it, even if the inodes are
the same. It could be a verity device that is reused, but with
different sub-mounts or other differences. Or the same / tmpfs with
different /usr/ mounts. If it was explicitly set up we should use it.
Use the new helper to check that the mount IDs are also the same,
not just the inodes.
If the user edits result in an empty file (after stripping), we would silently
not do anything, aborting the edit. This is rather confusing, let's emit a
notice:
$ build/systemctl --user edit asdf.service --full --stdin </dev/null
/home/zbyszek/.config/systemd/user/asdf.service: after editing, new contents are empty, not writing file.
(This also works with an editor, instead of --stdin. The message is printed on
the console and is visible after the editor exits.)
While at it, fix the condition to skip writing file after stripping. We had
"old_contents", but we modified that string. In some code flows, we would
compare the stripped old contents (i.e. a string which by defintion doesn't
have a newline at the end) with a string to which we just appended a newline
(i.e. a string which by defintion has a newline at the end).
shared/edit-util: split out function to populate temp file
In preparation for future changes: I want to add a mode where interactive
editing is not done, and when this preparation is moved to a helper, it's much
easier to skip it.
e->line is initialized to 1 and overwritten even if sync fails. Theoretically
this is against our style, but the alternative is to propagate a temporary
value of line through the layers, which adds a lot of noise. If we fail, this
EditFile object will not be used for anything, to the changed value of .line
has no effect.
test: set correct group for systemd-journal-upload tests
We can't use the systemd-journal-upload user here, since it's created
dynamically by DynamicUser=yes. However, we can use the group specified
in SupplementaryGroups=, so do exactly that.
Revert "sysusers.d: create the user for systemd-journal-upload.service"
I have no idea what was my reasoning that led to this change, but it is
simply wrong: systemd-journal-upload.service uses
User=systemd-journal-upload together with DynamicUser=yes, so the user
doesn't have to (and shouldn't) exist before starting the service.
The original reason for deny-listing it was that it's flaky there. I'm
not sure if that's still the case, but the Ubuntu CI jobs for i*86 are
gone, so this file shouldn't be needed anymore anyway.
The regexp only worked if the sections were small enough for the size to
start with "0". I have an initrd that is 0x1078ec7e bytes, so the tests
would spuriously fail.
Note, currently, the function is always called with SCMP_ACT_ALLOW as
the default action, except for the test. So, this should not change
anything in the runtime code.
run: fix bad escaping and memory ownership confusion
arg_description was either set to arg_unit (i.e. a const char*), or to
char *description, the result of allocation in run(). But description
was decorated with _cleanup_, so it would be freed when going out of the
function. Nothing bad would happen, because the program would exit after
exiting from run(), but this is just all too messy.
Also, strv_join(" ") + shell_escape() is not a good way to escape command
lines. In particular, one the join has happened, we cannot distinguish
empty arguments, or arguments with whitespace, etc. We have a helper
function to do the escaping properly, so let's use that.
core: when applying syscall filters, use ENOSYS for unknown calls
glibc starting using fchmodat2 to implement fchmod with flags [1], but
current version of libseccomp does not support fchmodat2 [2]. This is
causing problems with programs sandboxed by systemd. libseccomp needs to know
a syscall to be able to set any kind of filter for it, so for syscalls unknown
by libseccomp we would always do the default action, i.e. either return the
errno set by SystemCallErrorNumber or send a fatal signal. For glibc to ignore
the unknown syscall and gracefully fall back to the older implementation,
we need to return ENOSYS. In particular, tar now fails with the default
SystemCallFilter="@system-service" sandbox [3].
This is of course a wider problem: any time the kernel gains new syscalls,
before libseccomp and systemd have caught up, we'd behave incorrectly. Let's
do the same as we already were doing in nspawn since 3573e032f26724949e86626eace058d006b8bf70, and do the "default action" only
for syscalls which are known by us and libseccomp, and return ENOSYS for
anything else. This means that users can start using a sandbox with the new
syscalls only after libseccomp and systemd have been updated, but before that
happens they behaviour that is backwards-compatible.
In seccomp_restrict_sxid() there's a chunk conditionalized with
'#if defined(__SNR_fchmodat2)'. We need to kep that because seccomp_restrict_sxid()
seccomp_restrict_suid_sgid() uses SCMP_ACT_ALLOW as the default action.