dissect-image: tighten checks on root + /usr/ combinations
Our code logic doesn't support images with two verity partitions at the
moment, hence refuse this early (with ENOTUNIQ)
Also, go even further and refuse any combinations of verity enabled root
with verity-less /usr, simplify because that is unsafe and defeats the
point of verity. (i.e. we want to give the guarantee that for
auto-discovered verity magic we guarantee that the data afterwards
available in /usr is safe).
We already check whether we discovered a /usr verity partition without a
/usr partition when initially mangling the partitions, a bunch of lines
further up, no need to repeat this here.
dissect-image: mangle discovered /usr/ partition data, even if we found a root partition
Previously, we'd clean up discovered /usr/ partition data only if we did
not find a root partition. Given that we allow combinations of root and
/usr partitions clean things up in both cases however.
dissect-image: refuse external verity data in partitioned mode
Our code doesn't support setting up verity with an external verity data
file unless we operate in non-partitioned mode. Let's refuse this
clearly and early if attempted anyway.
dissect-image: also derive read-only mode from fstype in non-partitioned mode
For the GPT partitioned logic we also consult the fstype to determine whether
a partition is read-only (i.e. squashfs is already read-only). For the
non-partitioned mode we didn't do that so far. Fix that.
Let's also pick more precise names for these helpers that are used for
the tabular output: one checks whether a partition is candidate for
verity at all, and the other checks if it is ready to be used for it.
Let's make this clearer in the name.
Let's make the booleans indicating verity state a bit more descriptive.
Let's rename:
can_verity → has_verity: because that's really what this about
whether verity data is included in the image. Whether we actually
can use it is a different story.
verity → verity_ready: this one should tell us if we have everything
need to actually set it up, hence explicitly say "ready to use" in
the name.
The Bootloader Specification says "devicetree refers to the binary
device tree to use when executing the kernel..", but systemd-boot
didn't actually do anything when encountering this stanza until now.
Add support for loading, applying fixups if relevant, and installing the
new device tree before executing the kernel.
in order to uppercase the first letter of the user's real name. This is
a glib bug, because there is a different codepath that gets the pwd from
vanilla getpwnam instead of getpwnam_r as shown here. When the pwd
struct is returned by getpwnam, its fields point to static data owned by
glibc/NSS, and so it must not be modified by the caller. After much
debugging, Jamie Bainbridge has fixed this in https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2244
by making a copy of the data before modifying it, and that resolves all
problems for glib. Yay!
However, glib is crashing even when getpwnam_r is used instead of
getpwnam! According to getpwnam_r(3), the strings in the pwd struct are
supposed to be pointers into the buffer passed by the caller, so glib
should be able to safely edit it directly in this case, so long as it
doesn't try to increase the size of any of the strings.
Problem is various functions throughout nss-systemd.c return synthesized
records declared at the top of the file. These records are returned
directly and so contain pointers to static strings owned by
libsystemd-nss. systemd must instead copy all the strings into the
provided buffer.
This crash is reproducible if nss-systemd is listed first on the passwd
line in /etc/nsswitch.conf, and the application looks up one of the
synthesized user accounts "root" or "nobody", and finally the
application attempts to edit one of the strings in the returned struct.
All our synthesized records for the other struct types have the same
problem, so this commit fixes them all at once.
nss-systemd: pack pw_passwd result into supplied buffer
getpwnam_r() guarantees that the strings in the struct passwd that it
returns are pointers into the buffer allocated by the application and
passed to getpwnam_r(). This means applications may choose to modify the
strings in place, as long as the length of the strings is not increased.
So it's wrong for us to return a static string here, we really do have
to copy it into the application-provided buffer like we do for all the
other strings.
This is only a theoretical problem since it would be very weird for an
application to modify the pw_passwd field, but I spotted this when
investigating a similar crash caused by glib editing a different field.
See also:
Michal Sekletar [Wed, 8 Sep 2021 13:42:11 +0000 (15:42 +0200)]
sd-event: take ref on event loop object before dispatching event sources
Idea is that all public APIs should take reference on objects that get
exposed to user-provided callbacks. We take the reference as a
protection from callbacks dropping it. We used to do this also here in
sd_event_loop(). However, in cleanup portion of f814c871e6 this was
accidentally dropped.
The `dracut_install` is a misnomer, since the systemd integration test
suite is based on the original dracut's test suite, and not all the
references to dracut has been edited out. Let's fix that.
FIDO2 device access is serialised by libfido2 using flock().
Therefore, make sure to close a FIDO2 device once we are done
with it, or we risk opening it again at a later point and
deadlocking. Fixes #20664.
Maanya Goenka [Thu, 26 Aug 2021 07:17:32 +0000 (00:17 -0700)]
systemd-analyze: add new option to generate JSON output of security analysis table
The new option --json= works with the 'security' verb and takes in one of three format flags.
These are off which is the default, pretty and short which use JSON format flags for output.
When set to true, it generates a JSON formatted output of the security analysis table. The
format is a JSON array with objects containing the following fields: set which indicates if
the id has been set or not, name which is what is used to refer to the id, json_field
which is the equivalent JSON formatted id name only used for JSON outputs, description which
is an outline of the id state, and exposure which is an unsigned integer in the range 0.0..10.0,
where a higher value corresponds to a higher security threat. The JSON version of the table is
printed on the standard output file.
Example Run:
The unit file testfile.service was created to test the --json= option
/usr/lib/systemd/system/plymouth-start.service:15: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's
process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'.
Support for KillMode=none is deprecated and will eventually be removed.
/usr/lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating
/var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
/usr/lib/systemd/system/gdm.service:30: Standard output type syslog is obsolete, automatically updating to journal. Please update your
unit file, and consider removing the setting altogether.
/home/maanya-goenka/systemd/foo.service:2: Unknown key name 'foo' in section 'Unit', ignoring.
NAME DESCRIPTION EXPOSURE
✓ PrivateNetwork= Service has no access to the host's network
✗ User=/DynamicUser= Service runs as root user 0.4
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP) Service may change UID/GID identities/capabilities 0.3
✗ CapabilityBoundingSet=~CAP_NET_ADMIN Service has administrator privileges 0.3
→ Overall exposure level for testfile.service: 8.3 EXPOSED 🙁
/usr/lib/systemd/system/plymouth-start.service:15: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's
process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'.
Support for KillMode=none is deprecated and will eventually be removed.
/usr/lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating
/var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
/usr/lib/systemd/system/gdm.service:30: Standard output type syslog is obsolete, automatically updating to journal. Please update your
unit file, and consider removing the setting altogether.
/home/maanya-goenka/systemd/foo.service:2: Unknown key name 'foo' in section 'Unit', ignoring.
/usr/lib/systemd/system/plymouth-start.service:15: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's
process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'.
Support for KillMode=none is deprecated and will eventually be removed.
/usr/lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating
/var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
/usr/lib/systemd/system/gdm.service:30: Standard output type syslog is obsolete, automatically updating to journal. Please update your
unit file, and consider removing the setting altogether.
/home/maanya-goenka/systemd/foo.service:2: Unknown key name 'foo' in section 'Unit', ignoring.
[{"set":true,"name":"PrivateNetwork=", "json_field":"PrivateNetwork", "description":"Service has no access to the host's network","exposure":null}, ...]
systemd-analyze: use config value in RestrictNamespaces id (#20645)
For most fields, the text shown by `.id` is the value that should be set
in the unit file; however, for RestrictNamespaces, it is not. Changing
this to show the actual text makes it more clear to a user what the
actual change that needs to be made to the unit file is.
Luca Boccassi [Wed, 4 Aug 2021 14:00:06 +0000 (15:00 +0100)]
portabled: validate SYSEXT_LEVEL when attaching
When attaching a portable service with extensions, immediately validate
that the os-release and extension-release metadata values match, rather
than letting it fail when the units are started
format-table: allow to explicitly override JSON field names
In some cases it's useful to explicitly generate the JSON field names to
generate for table columns, instead of auto-mangling them from table
header names that are intended for human consumption.
This adds the infra and a test for it.
It's intended to be used by #20544, for the first column, which in text
mode should have an empty header field, but have an explicit name in
json output mode.
With this change, "mkosi build" will automatically build systemd for the
current distro without any further configuration. If people want to do a
cross-distro build by default, they can still create mkosi.default, but I
assume that this is relatively rare.
If people have symlinked mkosi.default to one of the files in .mkosi/, they'll
need to adjust the symlink.
(Building without configuration would always fail, since systemd has many many
required dependencies. I think it's nicer to do the most commonly expected
thing by default, i.e. rebuild for the current distro.)
Mkosi is nowadays packaged for most distros, so recommend installing of distro
packages as the primary installation mechanism.
gitignore: only ignore *local*.conf" under mkosi.default.d/
The pattern was added in 6242cda99d9194efec20997697d703c0c005dbd4, with the
idea that users will have local configuration files for mkosi and git should
not bother them about those. But let's make this narrower, and only match
files with "local". This way we reduce the risk that some unrelated file
will be ignored by accident.
.gitignore in the parent directory is used, because mkosi apparently tries
to load all files under mkosi.default.d/, without looking at the extension.
This is probably something to fix in mkosi too.
mkosi: drop the code to determine nobody user name
The comments were outdated: at least "nfsnobody" is not used in Fedora since a
few years. So I hope we don't need this anymore. The meson build scripts do
autodetection on their own.
Bastien Nocera [Tue, 24 Aug 2021 11:54:02 +0000 (13:54 +0200)]
hwdb: Allow end-users root-less access to USB analyzers
Procotol analyzers are external devices used to capture traffic over a
wire so that it could be analysed. End-users at the console should be
able to access those devices without requiring root access.
This change obsoletes the need to install Total Phase's "Linux drivers",
which are really just udev rules and hotplug usermap files to do that:
https://www.totalphase.com/products/usb-drivers-linux/
Vito Caputo [Tue, 31 Aug 2021 01:20:53 +0000 (18:20 -0700)]
sd-journal: use FILE streams to buffer write_uint64()
journal_file_verify() uses a set of tmpfs files to create lists
of object positions by type.
The existing code used a bare write() call for every object
position written, incurring a syscall per listed object.
This commit encapsulates the bare file descriptors in FILE *'s
and replaces the bare write with fwrite, buffering the writes so
there's less syscalls.
Cached `journalctl --verify` tests showed a ~8% faster runtime
with this change on a release build, verifying 1.3GiB of
production journals across 16 files.
udev-node: drop redundant trial of devlink creation
Previously, the devlink was created based on the priority saved in udev
database. So, we needed to reevaluate devlinks after database is saved.
But now the priority is stored in the symlink under /run/udev/links, and
the loop of devlink creation is controlled with the timestamp of the
directory. So, the double evaluation is not necessary anymore.
Yu Watanabe [Tue, 31 Aug 2021 17:20:33 +0000 (02:20 +0900)]
udev-node: always atomically create symlink to device node
By the previous commit, it is not necessary to distinguish if the devlink
already exists. Also, I cannot find any significant advantages of the
previous complecated logic, that is, first try to create directly, and then
fallback to atomically creation. Moreover, such logic increases the chance
of conflicts between multiple udev workers.
This makes devlinks always created atomically. Hopefully, this reduces the
conflicts between the workers.
udev-node: assume no new claim to a symlink if /run/udev/links is not updated
During creating a symlink to a device node, if another device node which
requests the same symlink is added/removed, `stat_inode_unmodified()`
should always detects that. We do not need to continue the loop
unconditionally.
Yu Watanabe [Tue, 31 Aug 2021 19:16:21 +0000 (04:16 +0900)]
udev-node: save information about device node and priority in symlink
Previously, we only store device IDs in /run/udev/links, and when
creating/removing device node symlink, we create sd_device object
corresponds to the IDs and read device node and priority from the
object. That requires parsing uevent and udev database files.
This makes link_find_prioritized() get the most prioritzed device node
without parsing the files.
When pulling in the SHA256 implementation from glibc, only some of the
coding style was adjusted to ours, other was not. Let's make things a
bit more consistent.
The GUIDs we usually deal with should be considered constant. Hence make
them so. Unfortunately the prototypes for various functions doesn't mark
them as const (but still decorates them with "IN", clarifying they are
input-only), hence we need to cast things at various places. We already
cast in similar fashion in many other cases, hence unify things here in
one style.
Making the EFI_GUID constant (and in particular so when specified in C99
compound literal style) allows compilers to merge multiple instances of
them.
repart: Support volatile-root for finding the root partition
The automatic logic can't always find the original root partition (ex:
if the rootfs is copied to a ext4 fs backed by zram in the initramfs),
so we want to support "/run/systemd/volatile-root" which is a symlink to
the original root partition.
Bastien Nocera [Mon, 30 Aug 2021 12:08:06 +0000 (14:08 +0200)]
udev: Import hwdb matches for USB devices
Import hwdb matches for USB devices (not interfaces) which don't usually
have a modalias so that it's possible to, for example, make them
available for unprivileged users.