Dan Streetman [Wed, 7 Dec 2022 16:23:59 +0000 (11:23 -0500)]
tpm2: move policy building out of policy session creation
This retains the use of policy sessions instead of trial sessions
in most cases, based on the code comment that some TPMs do not
implement trial sessions correctly. However, it's likely that the
issue was not the TPMs, but our code's incorrect use of PolicyPCR
inside a trial session; we are not providing expected PCR values
with our call to PolicyPCR inside a trial session, but the spec
indicates that in a trial session, the TPM *may* return error if
the expected PCR value(s) are not provided. That may have been the
source of the original confusion about trial sessions.
More details:
https://github.com/systemd/systemd/pull/26357#pullrequestreview-1409983694
Also, future commits will replace the use of trial sessions with
policy calculations, which avoids the problem entirely.
Ronan Pigott [Fri, 5 May 2023 19:33:29 +0000 (12:33 -0700)]
zsh: remove usage of PREFIX in _systemctl
The usage of PREFIX in this completion is mostly counter to the intended
usage of compsys in zsh. It is generally expected that completion code
provide the available completions and tags in that word position so that
compsys, with user configuration, can filter them to the appropriate set.
One egregious error caused by the usage of PREFIX here is the caching of
SYS_ALL_UNITS, which stored only the unit names prematurely filtered by
the completion prefix, affecting all future completions. For example,
$ systemctl cat nonsense<TAB>
might find no matching units if nonsense* has no matches, but now
$ systemctl cat <TAB>
will fail in all future completions even though every unit file
is a valid match, because the cached set has been erroneously filtered
by the last prefix.
Nick Rosbrook [Tue, 2 May 2023 16:30:31 +0000 (12:30 -0400)]
basic/audit-util: make a test request before enabling use of audit
If a container manager does not follow the guidance in
https://systemd.io/CONTAINER_INTERFACE/ regarding audit capabilities,
then the current check may not be sufficient to determine that audit
will function properly. In particular, when calling bind() on the audit
fd, we will get EPERM if running in a user-namespaced container.
Expand the check to make an AUDIT_GET_FEATURE request on the audit fd to
test if it is working. If this fails with ECONNREFUSED, we know it is
because the kernel does not support the use of audit outside of the
initial user namespace.
Note that the approach of this patch was suggested here:
https://github.com/systemd/systemd/pull/19443#issuecomment-829566659
As in mkosi(1), let's describe the config file and commandline options
together. This is nice for us, because we don't need to duplicate descriptions
and we're less likely to forget to update one place or the other. This is also
nice for users, because they can easily figure out what can be configured
where.
The options are now ordered by config file section.
test_ukify: rework how --flakes argument is appended
The usual approach is to put 'addopts = --flakes' in setup.cfg. Unfortunately
this fails badly when pytest-flakes is not installed:
ERROR: usage: test_ukify.py [options] [file_or_dir] [file_or_dir] [...]
test_ukify.py: error: unrecognized arguments: --flakes
pytest-flakes is not packaged everywhere, and this test is not very important,
so let's just do it only if pytest-flakes is available. We now detect if
pytest-flakes is available and only add '--flakes' conditionally. This
unfortunately means that when invoked via 'pytest' or directly as
'src/ukify/test/test_ukify.py', '--flakes' will not be appended automatically.
But I don't see a nice way to achieve previous automatic behaviour.
(I first considered making 'setup.cfg' templated. But then it is created
in the build directory, but we would need it in the source directory for
pytest to load it automatically. So to load the file, we'd need to give an
argument to pytest anyway, so we don't gain anything with this more complex
approach.)
Note to self: PEP 585 introduced using collection types as types,
and is available since 3.9. PEP 604 allows writing unions with "|",
but is only available since 3.10, so not yet here because we maintain
compat with 3.9.
test-kernel-install: test 60-ukify.install and 90-uki-copy.install
We install a kernel with layout=uki and uki_generator=ukify, and test
that a UKI gets installed in the expected place. The two plugins cooperate,
so it's easiest to test them together.
60-ukify: kernel-install plugin that calls ukify to create a UKI
60-ukify.install calls ukify with a config file, so singing and policies and
splash will be done through the ukify config file, without 60-ukify.install
knowing anything directly.
In meson.py, the variable for loaderentry.install.in is used just once, let's
drop it. (I guess this approach was copied from kernel_install_in, which is
used in another file.)
The general idea is based on cvlc12's #27119, but now in Python instead of
bash.
ukify: rework option parsing to support a config file
In some ways this is similar to mkosi: we have a argparse.ArgumentParser()
with a bunch of options, and a configparser.ConfigParser() with an
overlapping set of options. Many options are settable in both places, but
not all. In mkosi, we define this in three places (a dataclass, and a
function for argparse, and a function for configparser). Here, we have one
huge list of ConfigItem instances. Each instance specifies the full metadata
for both parsers. Argparse generates a --help string for all the options,
and we also append a config file sample to --help based on the ConfigItem
data:
With --summary, existence of input paths is not checked. I think we'll
want to show them, instead of throwing an error, but in red, similarly to
'bootctl list'.
This also fixes tests which were failing with e.g.
E FileNotFoundError: [Errno 2] No such file or directory: '/ARG1'
=========================== short test summary info ============================
FAILED ../src/ukify/test/test_ukify.py::test_parse_args_minimal - FileNotFoun...
FAILED ../src/ukify/test/test_ukify.py::test_parse_args_many - FileNotFoundEr...
FAILED ../src/ukify/test/test_ukify.py::test_parse_sections - FileNotFoundErr...
=================== 3 failed, 10 passed, 3 skipped in 1.51s ====================
This is closely related to the previous commit: if the credentials dir
is empty and nothing mounted on it, let's remove it again.
This will in particular happen if we decided to not actually install the
mount we prepared for the credentials because it is empty. In that case
the mount point inode is already there, and with this we'll remove it.
Primary effect, users will see ENOENT rather than EACCESS when trying to
access it, which should be preferable, given we already handle that
nicely in our credential consumption code.
This should also be useful on systems where we lack any privs to create
mounts, and thus operate on a regular dir anyway.
Let's avoid creating another mount in the system if it's empty anyway.
This is mostl a cosmetic thing in one (pretty common) special case: if
creds settings are used in a unit but no creds actually available to be
passed.
(While we are at it this also does one more minor optimization: it
adjusts the MS_RDONLY/MS_NOSUID/… flags of the source mount we are about
to MS_MOVE into the right place only if we actually really move it, and
if we instead unmount it again we won't bother with the flags either)
There's no need to fchdir() out of the rootfs and back into it around
the umount2(), hence don't.
This brings the logic closer to what the pivot_root() man page suggests.
While we are at it, always operate based on fds, once we opened the
original dir, and pass the path string along only for generating
messages (i.e. as "decoration").
Add tests for both code paths: the pivot_root() one and the MS_MOUNT.
notify: don't send EXIT_STATUS= notify message from systemd-notify
In 623a00020f116d8e9c70608a9e4f7cc978342441 code was added that our
various programs send a notification message with their exit status on
exit. This is great, but it becomes utterly confusing in systemd-notify,
whose primary purpose is to send such messages after all, and sending an
implicit one in addition to the primary one is particularly confusing,
when debugging things.
Let's hence just drop the implicit message. systemd-notify's exit status
is after all indicative primarily because sd_notify() failed, and hence
it's pretty pointless to then send that fact as another sd_notify()
message.
(Primary reason for this patch is simply that it confused the hell out
of me, when debugging sd_notify() issues)
base-filesystem: add new helper base_filesystem_create_fd() that operates on an fd, instead of a path
This also changes the open flags from
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW to
O_DIRECTORY|O_CLOEXEC. O_RDONLY is redundant, since O_RDONLY is zero
anyway, and O_DIRECTORY pins the acces mode enough: it doesn't allow
read()/write() anyway when specified. O_NONBLOCK is also pointless given
that O_DIRECTORY is specified, it has no meaning on directories. (It is
useful if we don't know much about the inode we are opening, and could
be a device node or fifo, but the O_DIRECTORY excludes that case.)
O_NOFOLLOW is dropped since there's really no point in blocking out the
initial entrypoint being a symlink. Once we pinned the the root of the
tree it might make sense to restrict symlink use below it, but for the
entrypoint itself it doesn't matter.
switch-root: don't require /mnt/ when switching root into host OS
So far, we invoked pivot_root() specifying /mnt/ as second argument,
which then unmounted right-after. We'd create /mnt/ if needed. This
sucks, because it means /mnt/ must strictly be pre-created on immutable
images.
Remove this limitation, by using pivot_root() with "." as source and
target, which will result in two stacked mounts afterwards: the new one
underneath, the old one ontop. We can then simply unmount the top one,
and have what we want without needing any extra /mnt/ dir.
Since we don't need /mnt/ anymore we can get rid of the extra
unmount_old_root parameter and simply specify it as NULL if we don't
want the old mount to stick around.
We can always build the standalone version whenever we build the normal version
(the dependencies are the same). In most builds standalone binaries would be
disabled. But it is occasionally useful to have them for testing, so move the
conditional to install:, so the binaries can be build by giving the explicit
target name.
The default of 'build_by_default' for executable() is sadly true (since meson
0.38.0), so need to specify build_by_default: too.
Also add systemd-shutdown.standalone to public_programs for additional testing.
meson: avoid building executables that won't be installed
When executable() or custom_target() has install: that is conditional as is
false (i.e. not install:true), it won't be built by default. (build_by_default:
defaults to install:). But if that program is added to public_programs, it will
be build by default because it is pulled in by the test, effectively defeating
the disablement.
While at it, make 'ukify' follow the same pattern as 'kernel-install'.
They will be used later together.
We generally nowadays use UPPERCASE for parameters in variuos help text.
Let's be consistent here too, and also drop duplicated 'usage:':
$ ukify -h
usage: ukify [options…] LINUX INITRD…
ukify -h | --help
Jan Janssen [Tue, 2 May 2023 17:41:58 +0000 (19:41 +0200)]
boot: Use correct memory type for allocations
We were using the wrong memory type when allocating pool memory. This
does not seem to cause a problem on x86, but the kernel will fail to
boot at least on ARM in QEMU.
This is caused by mixing different allocation types which ended up
breaking the kernel or EDK2 during boot services exit. Commit 2f3c3b0bee5534f2338439f04b0aa517479f8b76 appears to fix this boot
failure because it was replacing the gnu-efi xpool_print with xasprintf
thereby unifying the allocation type.
But this same issue can also happen without this fix somehow when the
random-seed logic is in use.
msizanoen1 [Tue, 2 May 2023 09:59:07 +0000 (16:59 +0700)]
core: check for SERVICE_RELOAD_NOTIFY in manager_dbus_is_running
This ensures that systemd won't erronously disconnect from the system
bus in case a bus recheck is triggered immediately after the bus service
emits `RELOADING=1`.
This fixes an issue where systemd-logind sometimes randomly stops
receiving `UnitRemoved` after a system update.
This also handles SERVICE_RELOAD_SIGNAL just in case somebody ever
creates a D-Bus broker implementation that uses `Type=notify-reload`.
generators: skip private tmpfs if /tmp does not exist
When spawning generators within a sandbox we want a private /tmp, but it
might not exist, and on some systems we might be unable to create it
because users want a BTRFS subvolume instead.
base-filesystem: create /proc, /sys, /dev mount points as 0555
These inodes are going to be overmounted anyway, hence let's create them
with access mode 555, so that they are as close to being immutable as
regular UNIX access modes allow them to be. In other words: this takes
the "w" mode away for root. This of course usually has little effect --
unless CAP_DAC_OVERRIDE is dropped. But at the very least it makes the
point clear that inodes should be considered immutable.
(I intended to make this 0000 originally, but that doesn't work, as many
tools – including our own – have fallback paths that when they see
ENOENT in /proc/ they can handle this gracefully. But changing the mode
to 000 would turn this to EACCES - something they usually have no
fallback path for)
This slightly extends the symbol file test and checks which symbols are
listed in one list but missing in the other. This is tremendously useful
to quickly determine which symbols wheren't exposed properly but should
have been.
(This is is implemented in pure C, no systemd helpers, to ensure we see
libsystemd.so API as any other tool would.)
mkosi: Switch to use mkosi presets with prebuilt initrds
Instead of building the initrds for the mkosi images with dracut,
let's switch to using mkosi presets to build the initrd with mkosi
as well.
This commit splits up our single image build into three separate
mkosi presets:
1. The "base" preset. This image contains systemd and all its runtime
dependencies. The sole purpose of this image is to serve as a base image
for the initrd and the final image. It's also responsible for building
systemd from source with the build script. The results are installed into
the base image. Note that we install the systemd and udev packages into this
image as well to prevent package managers from overriding the systemd we built
from source with the distro packaged systemd if it's pulled in as a dependency
by another package from the initrd or final profiles.
2. The "initrd" preset. This image provides the initrd. It's trivial and does
nothing more than packaging the base image up as a zstd compressed initramfs and
adds /init and /etc/initrd-release symlinks to the image.
3. The "final" preset. This image builds on top of the base image and adds
a kernel and extra packages that are useful for testing and debugging.
We also split out the optional kernel build into a separate set of config files
that are only included if a kernel to build is actually provided.
Note that this commit doesn't really change anything about how mkosi is used.
The commands remain the same, except that mkosi will now build all the presets
in order. "mkosi summary" will show the summary of all the presets. "mkosi qemu,
boot, shell" will always boot the final preset. With "-f", all presets will be
built and the final one is booted. "-i" makes a cache of each preset.
The only thing to keep in mind is that specifying config via the mkosi CLI will
apply to each of the presets. e.g. any extra packages added with "-p" will be
installed in both the initrd and the final image. To apply local configuration
to a single preset, create a file 00-local.conf in
mkosi.presets/<profile>/mkosi.conf.d and put all the preset specific configuration
in there.
Last month I monkey-patched journald to produce a small (64K) but valid
journal and used that as an input to four AFL fuzzers. After a month it
generated quite a nice corpora (4738 test cases) and after filtering
and minimizing it I was left with 619 unique journals with various
levels of corruption that probe the journal code.
It seems to detect past issues like systemd#26567, etc.