Daan De Meyer [Thu, 17 Nov 2022 13:12:48 +0000 (14:12 +0100)]
tmpfile-util: Introduce fopen_temporary_child()
Instead of having fopen_temporary() create the file either next
to an existing file or in tmp/, let's split this up clearly into
two different functions, one for creating temporary files next to
existing files, and one for creating a temporary file in a directory.
Marcus Schäfer [Wed, 16 Nov 2022 15:25:08 +0000 (16:25 +0100)]
Handle MACHINE_ID=uninitialized
systemd supports setting /etc/machine-id to the string 'uninitialized'.
In this case the expectation is that systemd creates a new
machine ID and replaces the value 'uninitialized' with the
effective machine ID. In the scope of kernel-install we
should also enforce the creation of a new machine ID in this
condition.
ERROR:esys:src/tss2-esys/esys_iutil.c:394:iesys_handle_to_tpm_handle() Error: Esys invalid ESAPI handle (40000001).
WARNING:esys:src/tss2-esys/esys_iutil.c:415:iesys_is_platform_handle() Convert handle from TPM2_RH to ESYS_TR, got: 0x40000001
ERROR:esys:src/tss2-esys/esys_iutil.c:394:iesys_handle_to_tpm_handle() Error: Esys invalid ESAPI handle (40000001).
WARNING:esys:src/tss2-esys/esys_iutil.c:415:iesys_is_platform_handle() Convert handle from TPM2_RH to ESYS_TR, got: 0x4000000
New TPM2 token enrolled as key slot 1.
The problem seems to be that Esys_LoadExternal() function from tpm2-tss
expects a 'ESYS_TR_RH*' constant specifying the requested hierarchy and not
a 'TPM2_RH_*' one (see Esys_LoadExternal() -> Esys_LoadExternal_Async() ->
iesys_handle_to_tpm_handle() call chain).
It all works because Esys_LoadExternal_Async() falls back to using the
supplied values when iesys_handle_to_tpm_handle() fails:
    r = iesys_handle_to_tpm_handle(hierarchy, &tpm_hierarchy);
    if (r != TSS2_RC_SUCCESS) {
        ...
        tpm_hierarchy = hierarchy;
    }
Note, TPM2_RH_OWNER was used on purpose to support older tpm2-tss versions
(pre https://github.com/tpm2-software/tpm2-tss/pull/1531), use meson magic
to preserve compatibility.
Daan De Meyer [Sun, 5 Jun 2022 12:25:22 +0000 (14:25 +0200)]
crash-handler: Make sure we propagate the original siginfo
If we call raise(), we lose the information from the original signal.
If we use rt_sigqueueinfo(), the original siginfo gets reused which
is helpful when debugging crashes.
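As an illustration of the difference, here is a minimal sketch of a handler that re-queues the signal with the siginfo it received. The helper names (requeue_with_siginfo(), crash_handler(), install_crash_handler()) are made up for this example and are not the functions touched by the commit. It assumes Linux, where rt_sigqueueinfo() has no glibc wrapper and is reached through syscall().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: queue `sig` to our own process, reusing the siginfo
 * that was handed to the handler. Returns 0 on success, -errno on failure. */
int requeue_with_siginfo(int sig, siginfo_t *si) {
        if (syscall(SYS_rt_sigqueueinfo, getpid(), sig, si) < 0)
                return -errno;

        return 0;
}

/* Sketch of a crash handler: restore the default disposition, then queue
 * the signal again. The signal is blocked while the handler runs, so the
 * queued copy is delivered, with the default action, once we return. */
void crash_handler(int sig, siginfo_t *si, void *context) {
        (void) context;

        signal(sig, SIG_DFL);

        if (requeue_with_siginfo(sig, si) < 0)
                raise(sig); /* fallback: the original siginfo is lost */
}

/* Hypothetical helper: install the handler above for one signal. */
int install_crash_handler(int sig) {
        struct sigaction sa = {
                .sa_sigaction = crash_handler,
                .sa_flags = SA_SIGINFO,
        };

        assert(sig > 0);

        sigemptyset(&sa.sa_mask);

        if (sigaction(sig, &sa, NULL) < 0)
                return -errno;

        return 0;
}
```

With raise() in place of the re-queue, a debugger or core dump would see a signal sent by the process itself rather than, say, the faulting address of the original SIGSEGV.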
Daan De Meyer [Wed, 16 Nov 2022 10:17:52 +0000 (11:17 +0100)]
mkfs-util: Add support for rootless xfs population
We use mkfs.xfs's protofile (-p) support to achieve this. The
protofile is a description of the files that should be copied into
the filesystem. The format is described in the manpage of mkfs.xfs.
systemd-boot expects to be loaded from the ESP and is quite unhappy if
the loaded image device path is something else. When running on qemu
this can easily happen though. Case one is direct kernel boot, i.e.
loading via 'qemu -kernel systemd-bootx64.efi'. Case two is sd-boot
being added to the ovmf firmware image and being loaded from there.
This patch detects both cases and goes on to inspect all file systems
known to the firmware, trying to find the ESP. When present, the
VMMBootOrderNNNN variables are used to inspect the file systems in the
given order.
Marcus Schäfer [Tue, 15 Nov 2022 23:17:19 +0000 (00:17 +0100)]
Fix reading /etc/machine-id in kernel-install (#25388)
* Fix reading /etc/machine-id in kernel-install
The kernel-install script has code to read the contents of
/etc/machine-id into the MACHINE_ID variable. Depending
on the variable content kernel-install either logs the
value or creates a new machine id via 'systemd-id128 new'.
In that logic there is one issue. If the file /etc/machine-id
exists but is empty, the script tries to call read on an
empty file which returns with an exit code != 0. As the
script code also uses 'set -e', kernel-install will exit at
this point which is unexpected.
The condition of an empty /etc/machine-id file exists for
example when building OS images, which should initialize the
system id on first boot but not statically inside of the image.
AFAIK an empty /etc/machine-id is also a common approach
to make systemd indicate that it should create a new system
id. Because of this, the commit makes sure the reading of
/etc/machine-id does not fail in any case such that the
handling of the MACHINE_ID variable takes place.
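kernel-install itself is a shell script, so the following is only a sketch of the decision it has to make, written in C for illustration. The function name machine_id_is_usable() is hypothetical. It assumes that a usable ID is exactly 32 hexadecimal digits on the first line, so that an empty file and the literal string 'uninitialized' both fall through to the "generate a new ID" path.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: `content` is the raw text read from the machine-id
 * file (possibly empty). Returns true only if the first line consists of
 * exactly 32 hex digits. Anything else means a new ID must be generated. */
bool machine_id_is_usable(const char *content) {
        size_t n;

        assert(content);

        /* Length of the first line, without the trailing newline. */
        n = strcspn(content, "\n");
        if (n != 32)
                return false; /* covers "" and "uninitialized" */

        for (size_t i = 0; i < n; i++)
                if (!isxdigit((unsigned char) content[i]))
                        return false;

        return true;
}
```

The point is that the empty case is an ordinary input to the decision and not an error that aborts the caller.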
tpm2: add some extra validation of device string before using it
Let's add some extra validation before constructing and using the .so
name to load. This isn't really security sensitive, given that we
used secure_getenv() to get the device string (and it thus should have
come from a trusted source) but let's better be safe than sorry.
Daan De Meyer [Thu, 10 Nov 2022 14:40:00 +0000 (15:40 +0100)]
repart: Run most repart integration tests without root privileges
To make sure rootless mode keeps working, let's run all repart
integration tests that we can without root privileges. The only ones
we need to keep running with root privileges are the tests that operate
on a block/loop device and those that use --image=.
Daan De Meyer [Sun, 9 Oct 2022 22:14:17 +0000 (00:14 +0200)]
repart: Don't use loop devices when we're not operating on a block device
When repart is not operating on a block device, if we avoid using
any loop devices at all, it becomes possible to run repart without
needing root privileges.
Note that this also depends on the filesystems in use to support
population without needing root privileges (specifically, squashfs,
ext4 or btrfs).
Daan De Meyer [Mon, 10 Oct 2022 21:34:04 +0000 (23:34 +0200)]
repart: Ensure files end up owned by root in generated filesystems
By forking off a user namespace before running mkfs and ID mapping
the user running repart to root in the user namespace, we can make
sure that files in the generated filesystems are all owned by root
instead of the user running repart.
To make this work we have to make sure that all the files in the
root directory that's passed to the mkfs binary are owned by the
user running repart, so we have to drop the shortcut for only a
single root directory in partition_populate_directory().
Daan De Meyer [Sun, 9 Oct 2022 18:46:59 +0000 (20:46 +0200)]
repart: Do offline encryption instead of online
Offline encryption can be done without mounting the luks device. For
now we still use loop devices to split out the partition we want to
write to but in a later commit we'll replace this with a regular file.
For offline encryption, we need to keep 2x the luks header size space
free at the end of the partition, so this means our encrypted partitions
will be 16M larger than before.
Daan De Meyer [Tue, 11 Oct 2022 08:50:58 +0000 (10:50 +0200)]
mkfs-util: Add support to populate vfat without mounting using mcopy
mkfs.vfat doesn't support specifying a root directory to bootstrap
the filesystem from (see https://github.com/dosfstools/dosfstools/issues/183).
Instead, we can use the mcopy tool from the mtools package to copy
files into the vfat filesystem after creating it without needing to
mount the vfat filesystem.
Daan De Meyer [Fri, 14 Oct 2022 10:06:55 +0000 (12:06 +0200)]
repart: Add --include/--exclude-partitions
Let's allow filtering the partitions to operate on by partition
type UUID. This is necessary when building bootable images with a
verity protected root/usr partition as we can only build the UKI
image when we have the verity roothash which means we cannot populate
the EFI partition yet when we run repart initially to determine the
verity roothash.
Daan De Meyer [Fri, 14 Oct 2022 10:40:28 +0000 (12:40 +0200)]
repart: Use first unused partition number for new partitions
If we skip some partition types in a first run of systemd-repart,
we don't want their partition numbers to be different than usual,
so let's change the allocation of partition numbers to account for
that.
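A minimal sketch of "first unused number" allocation, assuming the numbers already taken are available as a plain array. The function name and the array representation are hypothetical, and counting from 0 is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the lowest partition number that does not
 * occur in `used`. At most n_used candidates can be taken, so the outer
 * loop ends after at most n_used + 1 rounds. */
uint64_t first_unused_partno(const uint64_t *used, size_t n_used) {
        assert(used || n_used == 0);

        for (uint64_t candidate = 0;; candidate++) {
                bool taken = false;

                for (size_t i = 0; i < n_used; i++)
                        if (used[i] == candidate) {
                                taken = true;
                                break;
                        }

                if (!taken)
                        return candidate;
        }
}
```

With numbers 0, 1 and 3 in use, a new partition gets 2 instead of 4, so a gap left by a partition that was filtered out earlier is filled first.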
Daan De Meyer [Thu, 13 Oct 2022 19:26:16 +0000 (21:26 +0200)]
gpt: Expose GptPartitionType and get rid of SECONDARY/OTHER
Instead of exposing just the partition type UUID, let's expose the
GptPartitionType struct, which has a lot more information available
in a much more accessible way.
Also, let's get rid of SECONDARY/OTHER in PartitionDesignator. These
were only there to support preferred architectures in dissect-image.c,
but we can easily handle that by comparing architectures when we decide
whether to override a partition. This is done in a new function
compare_arch().
Jeremy Linton [Tue, 8 Nov 2022 05:31:30 +0000 (23:31 -0600)]
acpi-fpdt: Use kernel fpdt parsing
On some kernels/distros (RHEL/aarch64) /dev/mem is
turned off. This means that the ACPI FPDT data is
missing from systemd-analyze output when /dev/mem
fails to provide the boot times.
Instead, recent kernels can export that data from
/sys/firmware/acpi/fpdt/boot/ entries. Use that
information first if it is available.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
random-seed: refresh EFI boot seed when writing a new seed
Since this runs at shutdown to write a new seed, we should also keep the
bootloader's seed maximally fresh by doing the same. So we follow the
same pattern - hash some new random bytes with the old seed to make a
new seed. We let this fail without warning, because it's just an
opportunistic thing. If the user happens to have set up the random seed
with bootctl, and the RNG is initialized, then things should be fine. If
not, we create a new seed if systemd-boot is in use. And if not, then we
just don't do anything.
boot: implement kernel EFI RNG seed protocol with proper hashing
Rather than passing seeds up to userspace via EFI variables, pass seeds
directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID.
EFI variables can potentially leak and suffer from forward secrecy
issues, and processing these with userspace means that they are
initialized much too late in boot to be useful. In contrast,
LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so
is hidden from userspace entirely, and is parsed extremely early on by
the kernel, so that every single call to get_random_bytes() by the
kernel is seeded.
In order to do this properly, we use a bit more robust hashing scheme,
and make sure that each input is properly memzeroed out after use. The
scheme is:
The various inputs are:
- LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
- 256 bits of seed from EFI's RNG
- The (immutable) system token, from its EFI variable
- The prior on-disk seed
- The UEFI monotonic counter
- A timestamp
This also adjusts the secure boot semantics, so that the operation is
only aborted if it's not possible to get random bytes from EFI's RNG or
a prior boot stage. With the proper hashing scheme, this should make
boot seeds safe even on secure boot.
There is currently a bug in Linux's EFI stub in which if the EFI stub
manages to generate random bytes on its own using EFI's RNG, it will
ignore what the bootloader passes. That's annoying, but it means that
either way, via systemd-boot or via EFI stub's mechanism, the RNG *does*
get initialized in a good safe way. And this bug is now fixed in the
efi.git tree, and will hopefully be backported to older kernels.
As the kernel recommends, the resultant seeds are 256 bits and are
allocated using pool memory of type EfiACPIReclaimMemory, so that it
gets freed at the right moment in boot.
Vitaly Kuznetsov [Fri, 11 Nov 2022 16:15:55 +0000 (17:15 +0100)]
measure: fix section names in 'objcopy' example in systemd-measure man
A copy-paste error has crept into the objcopy example in the 'systemd-measure'
manual, "--change-section-vma" should reference the section being added,
not ".splash". When used as-is, the resulting UKI is unbootable.
These are allowed to fail, for example on a read-only filesystem. But they still
log at error level, which is annoying and gets flagged. Tune those specific errors
down to info.
There are likely more that could be tuned down, but the important thing is to cover
the tmpfiles.d that we ship right now.
Before:
$ echo -e "d- /root :0700 root :root - \nd- /root/.ssh :0700 root :root -" | SYSTEMD_LOG_LEVEL=err build/systemd-tmpfiles --root=/tmp/img --create -
Failed to create directory or subvolume "/tmp/img/root": Read-only file system
Failed to open path '/tmp/img/root': No such file or directory
$
Mike Yuan [Fri, 11 Nov 2022 18:52:38 +0000 (02:52 +0800)]
module-util: use the blacklist from module_blacklist= in cmdline
When a module is blacklisted using module_blacklist=
we shouldn't fail with 'Operation not permitted'.
Instead we check for it and skip it if this is the case.
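The check can be pictured as follows. This is a sketch under simplifying assumptions, not the code added by the commit: the function name is hypothetical, the kernel command line is passed in as a string, and quoting as well as any normalization of module names are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `module` appears in the
 * comma-separated list of a module_blacklist= word in `cmdline`. */
bool module_is_blacklisted(const char *cmdline, const char *module) {
        static const char key[] = "module_blacklist=";
        const size_t klen = sizeof(key) - 1;
        size_t mlen;

        assert(cmdline);
        assert(module);

        mlen = strlen(module);

        for (const char *p = cmdline; *p;) {
                size_t wlen;

                p += strspn(p, " \t\n");    /* skip whitespace between words */
                wlen = strcspn(p, " \t\n"); /* length of the current word */

                if (wlen >= klen && strncmp(p, key, klen) == 0) {
                        const char *v = p + klen, *end = p + wlen;

                        while (v < end) {
                                /* One list entry ends at a comma or at the end of the word. */
                                size_t elen = strcspn(v, ", \t\n");

                                if (elen == mlen && strncmp(v, module, mlen) == 0)
                                        return true;

                                v += elen;
                                if (v < end && *v == ',')
                                        v++;
                        }
                }

                p += wlen;
        }

        return false;
}
```

A caller would skip the module load when this returns true, instead of attempting it and reporting 'Operation not permitted'.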
bootctl,bootspec: make use of CHASE_PROHIBIT_SYMLINKS whenever we access the ESP/XBOOTLDR
Let's make use of the new flag whenever we access the ESP or XBOOTLDR.
The resources we make use of in these partitions can't possibly use
symlinks (because UEFI knows no symlink concept), and they are untrusted
territory, hence under no circumstances we should be tricked into
following symlinks that shouldn't be there in the first place.
Of course, you might argue that ESP/XBOOTLDR are VFAT and thus don't
know symlinks. But the thing is, they don't have to be. Firmware can
support other file systems too, and people can use efifs to gain access
to arbitrary Linux file systems from EFI. Hence, let's better be safe
than sorry.
chase-symlinks: add new flag for prohibiting any following of symlinks
This is useful when operating in the ESP, which is untrusted territory,
and where under no circumstances we should be tricked by symlinks into
doing anything we don't want to.
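To illustrate the idea (not the chase_symlinks() implementation), here is a sketch that resolves a path one component at a time and refuses any component that is a symlink. The function name open_no_symlinks_at() is hypothetical. The sketch relies on Linux reporting ELOOP for O_NOFOLLOW on a symlink, needs read permission on every component, treats a leading '/' as relative to dir_fd, and does not reject ".." components.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open `path` below `dir_fd`, failing with -ELOOP if
 * any component is a symlink. Returns a new fd on success, -errno on error. */
int open_no_symlinks_at(int dir_fd, const char *path) {
        char *copy, *save = NULL;
        int fd, r = 0;

        assert(path);

        copy = strdup(path);
        if (!copy)
                return -ENOMEM;

        /* Work on our own fd so that the caller's dir_fd stays untouched. */
        fd = openat(dir_fd, ".", O_RDONLY|O_DIRECTORY|O_CLOEXEC);
        if (fd < 0) {
                r = -errno;
                free(copy);
                return r;
        }

        for (char *c = strtok_r(copy, "/", &save); c; c = strtok_r(NULL, "/", &save)) {
                /* O_NOFOLLOW makes the open fail if this component is a symlink. */
                int next = openat(fd, c, O_RDONLY|O_NOFOLLOW|O_CLOEXEC);

                r = next < 0 ? -errno : 0;
                close(fd);
                fd = next;
                if (r < 0)
                        break;
        }

        free(copy);
        return r < 0 ? r : fd;
}
```

Checking each component separately is what makes the difference: O_NOFOLLOW on a single open() call only covers the last component of the path.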
nulstr-util: fix corner cases of strv_make_nulstr()
Let's change the return semantics of strv_make_nulstr() so that we can
properly distinguish the case where we have no entries in the nulstr
from the case where we have a single empty string in a nulstr.
Previously we couldn't distinguish those, we'd in both cases return a
size of zero, and a buffer with two NUL bytes.
With this change, we'll still return a buffer with two NUL bytes, but
for the case where no entries are defined we'll return a size of zero,
and where we have a single empty string a size of one.
This is a good idea, as it makes sure we can properly handle all corner
cases.
Nowadays the function is used by one place only: ask-password-api.c. The
corner case never mattered there, since it was used to serialize
passwords, and it was known that there was exactly one password, not
less. But let's clean this up. This means the subtraction of the final
NUL byte now happens in ask-password-api.c instead.
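The new size semantics can be sketched as below. This is an illustrative reimplementation with a hypothetical name (make_nulstr()), not the systemd function. It assumes the input is a NULL-terminated string array and that the returned size counts each entry plus its own NUL, but not the extra NUL that ends the list.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: build a nulstr from the NULL-terminated array `l`.
 * On success stores a newly allocated buffer in *ret and its size in
 * *ret_size, and returns 0. Returns -ENOMEM on allocation failure. */
int make_nulstr(char *const *l, char **ret, size_t *ret_size) {
        size_t n = 0;
        char *m;

        assert(ret);
        assert(ret_size);

        /* Every entry contributes its bytes plus its own terminating NUL. */
        for (char *const *i = l; i && *i; i++)
                n += strlen(*i) + 1;

        /* One more NUL ends the list. For an empty list allocate two bytes,
         * so the buffer ends in two NUL bytes in every case. */
        m = calloc(n > 0 ? n + 1 : 2, 1);
        if (!m)
                return -ENOMEM;

        n = 0;
        for (char *const *i = l; i && *i; i++) {
                size_t z = strlen(*i) + 1;

                memcpy(m + n, *i, z);
                n += z;
        }

        *ret = m;
        *ret_size = n;
        return 0;
}
```

An empty list and a list holding one empty string both produce the bytes "\0\0", and only the returned size (0 versus 1) tells them apart.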
nulstr-util: use memdup_suffix0() where appropriate
If the nulstr is not nul-terminated, we shouldn't use strndup() but
memdup_suffix0(), to not trip up static analyzers which assume we are
duping a string here.
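For readers unfamiliar with the helper, the idea is a memdup that appends one NUL byte. The following is a sketch with a hypothetical name (memdup_with_nul()), not the systemd implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: copy exactly `l` bytes from `p`, embedded NUL bytes
 * included, and append a single NUL. Returns NULL on allocation failure. */
void *memdup_with_nul(const void *p, size_t l) {
        char *ret;

        assert(p || l == 0);

        if (l == SIZE_MAX) /* l + 1 would overflow */
                return NULL;

        ret = malloc(l + 1);
        if (!ret)
                return NULL;

        if (l > 0)
                memcpy(ret, p, l);
        ret[l] = '\0';

        return ret;
}
```

Unlike strndup(), this never scans the source for a NUL byte, so it is well defined for buffers that are not strings and it keeps embedded NUL bytes such as those inside a nulstr.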
This reworks the logic for handling the "header" cells a bit. Instead of
special casing the first row in regards to uppercasing/coloring let's
just introduce a proper cell type TABLE_HEADER which is in most ways
identical to TABLE_STRING except that it defaults to uppercase output
and underlined coloring.
This is mostly refactoring, but I think it makes a ton of sense as it
makes the first row less special and you could in fact insert
TABLE_HEADER (and in fact TABLE_FIELD) cells wherever you like and
something sensible would happen (i.e. a string cell is displayed with
a specific formatting).
Yu Watanabe [Fri, 11 Nov 2022 04:54:03 +0000 (13:54 +0900)]
ac-power: check battery existence and status
If a battery is not present or its status is not discharging, then
the battery should not be used as a power source.
Let's count batteries currently discharging.
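The counting rule can be sketched like this. The struct and function names are hypothetical, and the sketch assumes the battery state has already been read from the 'present' and 'status' attributes under /sys/class/power_supply/, where a battery in use reports the status string "Discharging".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of one battery's sysfs attributes. */
struct battery {
        bool present;       /* content of the 'present' attribute */
        const char *status; /* content of the 'status' attribute, may be NULL */
};

/* Count batteries that are present and currently discharging. Absent
 * batteries and those charging or full are not counted. */
size_t count_discharging(const struct battery *b, size_t n) {
        size_t c = 0;

        assert(b || n == 0);

        for (size_t i = 0; i < n; i++)
                if (b[i].present && b[i].status && strcmp(b[i].status, "Discharging") == 0)
                        c++;

        return c;
}
```

A result of zero means no battery is acting as a power source, which is the condition the commit message describes.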