shared: split out ESP/XBOOTLDR search stuff from bootspec.c
The code is quite different from the rest of bootspec.c, with different
deps and stuff. There's even a /***/ line to separate the two parts.
Given how large the file already is, let's just split it into two.
shared/install: do not print aliases longer than UNIT_NAME_MAX
065364920281e1cf59cab989e17aff21790505c4 did the conversion to install_path_printf().
But IIUC, here we are just looking at a unit file name, not the full
path.
shared/install: consistently use 'lp' as the name for the LookupPaths instance
Most of the codebase does this. Here we were using 'p' or 'paths'
instead. Those names are very generic and not good for a "global-like"
object like the LookupPaths instance. And we also have 'path' variable,
and it's confusing to have 'path' and 'paths' in the same function that
are unrelated.
Also pass down LookupPaths* lower in the call stack, in preparation for
future changes.
homed: permit inodes owned by UID_MAPPED_ROOT to be created in $HOME
If people use nspawn in their $HOME we should allow inodes owned by
this special UID to be created temporarily, so that UID mapped nspawn
containers just work.
nspawn: make sure host root can write to the uidmapped mounts we prepare for the container payload
When using user namespaces in conjunction with uidmapped mounts, nspawn
so far set up two uidmappings:
1. One that is used for the uidmapped mount and that maps the UID range
0…65535 on the backing fs to some high UID range X…X+65535 on the
uidmapped fs. (Let's call this mapping the "mount mapping")
2. One that is used for the userns namespace the container payload
processes run in, that maps X…X+65535 back to 0…65535. (Let's call
this one the "process mapping").
These mappings hence are pretty much identical, one just moves things up
and one back down. (Reminder: we do all this so that the processes can
run under high UIDs while running off file systems that require no
recursive chown()ing, i.e. we want processes with high UID range but
files with low UID range.)
This creates one problem, i.e. issue #20989: if nspawn (which runs as
host root, i.e. host UID 0) wants to add inodes to the uidmapped mount
it can't do that, since host UID 0 is not defined in the mount mapping
(only the X…X+65535 range is, after all, and X > 0), and processes whose
UID is not mapped in a uidmapped fs cannot create inodes in it since
those would be owned by an unmapped UID, which then triggers
the famous EOVERFLOW error.
Let's fix this, by explicitly including an entry for the host UID 0 in
the mount mapping. Specifically, we'll extend the mount mapping to map
UID 2147483646 (which is INT32_MAX-1, see code for an explanation why I
picked this one) of the backing fs to UID 0 on the uidmapped fs. This
way nspawn can create inodes on the uidmapped fs as it likes (which will
then actually be owned by UID 2147483646 on the backing fs), and as it
always did. Note that we do *not* create a similar entry in the process
mapping. Thus any files created by nspawn that way (and not chown()ed to
something better) will appear as unmapped (i.e. as overflowuid/"nobody")
in the container payload. And that's good. Of course, the latter is
mostly theoretic, as nspawn should generally chown() the inodes it
creates to UID ranges that actually make sense for the container (and we
generally already do this correctly), but it's good to know that we are
safe here, given we might accidentally forget to chown() some inodes we
create.
Net effect: the two mappings will not be identical anymore. The mount
mapping has one entry more, and the only reason it exists is so that
nspawn can access the uidmapped fs reasonably independently from any
process mapping.
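
The two mappings described above can be modelled with a small sketch. This
is not nspawn code: the base X, the helper names and the overflow UID of
65534 (the usual kernel default) are assumptions made for illustration.
A lookup that returns None stands in for the EOVERFLOW case.

```python
from typing import List, NamedTuple, Optional


class IdRange(NamedTuple):
    source: int  # first UID on the "from" side
    target: int  # first UID on the "to" side
    count: int   # number of consecutive UIDs covered


def translate(ranges: List[IdRange], uid: int) -> Optional[int]:
    """Map uid through the ranges, or return None if it is unmapped."""
    for r in ranges:
        if r.source <= uid < r.source + r.count:
            return r.target + (uid - r.source)
    return None


def invert(ranges: List[IdRange]) -> List[IdRange]:
    """Swap the direction of a mapping."""
    return [IdRange(r.target, r.source, r.count) for r in ranges]


X = 524288                    # hypothetical base of the high UID range
UID_MAPPED_ROOT = 2147483646  # INT32_MAX-1, as named in the commit message
OVERFLOW_UID = 65534          # assumption: default kernel overflowuid

# Mount mapping: backing fs UID -> UID on the uidmapped fs.
old_mount_mapping = [IdRange(0, X, 65536)]
mount_mapping = old_mount_mapping + [IdRange(UID_MAPPED_ROOT, 0, 1)]

# Process mapping: UID on the uidmapped fs -> UID seen by the payload.
process_mapping = [IdRange(X, 0, 65536)]


def backing_owner(mapping: List[IdRange], creator_uid: int) -> Optional[int]:
    """UID stored on the backing fs for an inode created by creator_uid."""
    return translate(invert(mapping), creator_uid)


def seen_in_container(fs_uid: int) -> int:
    """UID the payload sees; unmapped UIDs show up as the overflow UID."""
    uid = translate(process_mapping, fs_uid)
    return OVERFLOW_UID if uid is None else uid
```

With the old mount mapping, host UID 0 has no entry and the lookup fails.
With the extra entry, host root can create inodes, and because the process
mapping has no matching entry, the payload sees them as "nobody".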
Yu Watanabe [Wed, 16 Mar 2022 11:46:49 +0000 (20:46 +0900)]
udev: run the main process, workers, and spawned commands in /udev subcgroup
And enable cgroup delegation for udevd.
Then, processes invoked through ExecReload= are assigned to the .control
subcgroup, and they are not killed by cg_kill().
varlink_error(...) expects a json object as the third parameter. Passing a string variant causes
parameter sanitization to fail, and it returns -EINVAL. Pass object variant instead.
Grigori Goronzy [Sat, 26 Feb 2022 09:41:16 +0000 (10:41 +0100)]
tpm2: enable parameter encryption
Use a salted, unbound HMAC session with the primary key used as tpmKey,
which means that the random salt will be encrypted with the primary
key while in transit. Decrypt/encrypt flags are set on the new session
with AES in CFB mode. There is no fallback to XOR mode.
This provides confidentiality and replay protection, both when sealing
and unsealing. There is no protection against man in the middle
attacks since we have no way to authenticate the TPM at the moment.
The exception is unsealing with PIN, as an attacker will be unable
to generate the proper HMAC digest.
Conceptually the feature is great and should exist, but in its current
form should be reworked to be generic (i.e. not specific to
Windows/BitLocker, but applicable to any boot entry), not be global (but
be a per-entry thing), not require a BootXXXX entry to exist, and not
check for the BitLocker signature (as TPMs are not just used for
BitLocker).
Since we want to get 251 released, mark it in the documentation, in NEWS
and in code as experimental and make clear it will be reworked in a
future release. Also, make it opt-in to make it less likely people come
to rely on it without reading up on it, and understanding that it will
likely change sooner or later.
sd-boot: measure kernel cmdline into PCR 12 rather than 8
Apparently Grub is measuring all kinds of garbage into PCR 8. Since people
apparently chainload sd-boot from grub, let's thus stay away from PCR 8,
and use PCR 12 instead for the kernel command line.
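
Why sharing a PCR with another component is a problem can be seen from the
way a PCR is extended. The sketch below models only the generic extend
operation for a SHA-256 bank, new = SHA256(old || SHA256(data)); the
function name is hypothetical and the exact encoding sd-boot measures is
not reproduced here.

```python
import hashlib


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """Return the PCR value after measuring data into it (SHA-256 bank)."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


# A PCR starts out as all zeroes after reset.
initial = bytes(32)

# Only the kernel command line is measured into the register.
alone = pcr_extend(initial, b"example kernel command line")

# Another component measured something into the same register first.
shared = pcr_extend(pcr_extend(initial, b"unrelated measurement"),
                    b"example kernel command line")
```

The two final values differ, so a policy bound to the register cannot
predict its value unless it also knows everything else measured into it.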
boot: drop const from EFI_PHYSICAL_ADDRESS parameter
It's not a pointer after all, but a numeric value. As such the const
applies to the value and not the target, but we generally don't do that
for value parameters. Hence drop the const.
cgroup: also indicate cgroup delegation state in user-accessible xattr
So far we set the "trusted.delegate" xattr on cgroups where delegation
is on. This duplicates this behaviour with the "user.delegate" xattr.
This has two benefits:
1. unprivileged clients can *read* the xattr. "systemd-cgls" can thus
show delegated cgroups as such properly, even when invoked without
privs
2. unprivileged systemd instances can set the xattr, i.e. when systemd
--user delegates a cgroup to further payloads.
This weakens security a tiny bit, given that code that got a cgroup
delegated can manipulate the xattr, but I think that's OK, given they
have a higher trust level regarding cgroups anyway, if they got a
subtree delegated, and access controls on the cgroup itself are still
enforced. Moreover PID 1 as the cgroup manager only sets these xattrs,
never reads them — the xattr is primarily a way to tell payloads about
the delegation, and it's strictly this one way.
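
A minimal sketch of how an unprivileged client might check the xattr. The
function name is hypothetical, and the assumption that the xattr value is
the string "1" is not stated in this commit message.

```python
import os


def is_delegated(cgroup_path: str) -> bool:
    """Best-effort check whether a cgroup directory is marked as delegated."""
    getxattr = getattr(os, "getxattr", None)  # only available on Linux
    if getxattr is None:
        return False
    # Try the user-accessible xattr first, then the privileged one.
    for name in ("user.delegate", "trusted.delegate"):
        try:
            value = getxattr(cgroup_path, name)
        except OSError:
            # Missing xattr, missing path, or insufficient privileges.
            continue
        return value.strip() == b"1"  # assumption: "1" marks delegation
    return False
```

Reading "trusted.delegate" requires privileges, which is why the fallback
is usually not enough for a tool run by a regular user.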
Grigori Goronzy [Thu, 24 Feb 2022 00:28:29 +0000 (01:28 +0100)]
cryptenroll: add tests for TPM2 unlocking
Add tests for enrolling and unlocking. Various cases are tested:
- Default PCR 7 policy w/o PIN, good and bad cases (wrong PCR)
- PCR 7 + PIN policy, good and bad cases (wrong PCR, wrong PIN)
- Non-default PCR 0+7 policy w/o PIN, good and bad cases (wrong PCR 0)
Grigori Goronzy [Fri, 18 Feb 2022 20:13:41 +0000 (21:13 +0100)]
cryptsetup: add manual TPM2 PIN configuration
Handle the case where TPM2 metadata is not available and is instead
explicitly provided in crypttab. This adds a new "tpm2-pin" option to
crypttab options for this purpose.
Grigori Goronzy [Wed, 16 Feb 2022 21:13:42 +0000 (22:13 +0100)]
tpm2: support policies with PIN
Modify TPM2 authentication policy to optionally include an authValue, i.e.
a password/PIN. We use the "PIN" terminology since it's used by other
systems such as Windows, even though the PIN is not necessarily numeric.
The PIN is hashed via SHA256 to allow for arbitrary length PINs.
v2: fix tpm2_seal in sd-repart
v3: applied review feedback
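
The effect of hashing the PIN can be sketched as follows. This is only an
illustration of the idea, with a hypothetical function name and an assumed
UTF-8 encoding of the PIN; it is not the tpm2 code itself.

```python
import hashlib


def pin_to_auth_value(pin: str) -> bytes:
    """Reduce a PIN of any length to a fixed-size SHA256 digest."""
    # Assumption: the PIN is encoded as UTF-8 before hashing.
    return hashlib.sha256(pin.encode("utf-8")).digest()


short = pin_to_auth_value("1234")
long_ = pin_to_auth_value("a much longer, non-numeric passphrase " * 20)
```

Both results are 32 bytes long, so the length of the PIN the user types no
longer matters for the size of the authValue.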
Yu Watanabe [Mon, 14 Mar 2022 13:02:37 +0000 (22:02 +0900)]
test: wait for loopback device being actually created
It seems there is a short period after `losetup` finishes during which
the loopback device is not yet visible:
```
testsuite-58.sh[367]: ++ losetup -b 1024 -P --show -f /tmp/testsuite-58-sector-1024.img
kernel: loop1: detected capacity change from 0 to 204800
testsuite-58.sh[285]: + LOOP=/dev/loop1
testsuite-58.sh[285]: + systemd-repart --pretty=yes --definitions=/tmp/testsuite-58-sector/ --seed=750b6cd5c4ae4012a15e7be3c29e6a47 --empty=require --dry-run=no /dev/loop1
testsuite-58.sh[368]: Device '/dev/loop1' has no dm-crypt/dm-verity device, no need to look for underlying block device.
testsuite-58.sh[368]: Failed to determine canonical path for '/dev/loop1': No such file or directory
testsuite-58.sh[368]: Failed to open file or determine backing device of /dev/loop1: No such file or directory
```
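
The general idea of waiting for the device node can be sketched as a
bounded polling loop. This is not the change made to the test suite; the
function name, timeout and interval are assumptions for illustration.

```python
import os
import time


def wait_for_path(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until path exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run something like `wait_for_path("/dev/loop1")` after
`losetup` returns and before handing the device to the next tool.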
Yu Watanabe [Sun, 13 Mar 2022 12:38:10 +0000 (21:38 +0900)]
test: use /var/tmp for storing disk images
The Ubuntu CI on ppc64el seems to have an issue on tmpfs, and files
may not be fsynced. See c10caebb98803b812ebc4dd6cdeaab2ca17826d7.
For safety, let's use /var/tmp to store disk images.
Vivien Didelot [Mon, 14 Mar 2022 20:34:57 +0000 (16:34 -0400)]
units: fix factory-reset.target description
The current description for the factory reset target does not add any
value and doesn't respect the definition of the related property as
described in systemd.unit(5).
Starting the target currently results in the following log:
[ 11.139174] systemd[1]: Reached target Target that triggers factory reset. Does nothing by default..
[ OK ] Reached target Target that…set. Does nothing by default..
Simply update the target description to "Factory Reset".
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
/dev/urandom is seeded with RDRAND. Calling genuine_random_bytes(...,
..., 0) will use /dev/urandom as a last resort. Hence, we gain nothing
here by having our own RDRAND wrapper, because /dev/urandom already is
based on RDRAND output, even before /dev/urandom has fully initialized.
Furthermore, RDRAND is not actually fast! And on each successive
generation of new x86 CPUs, from both AMD and Intel, it just gets
slower.
This commit simplifies things by just using /dev/urandom in cases where
we previously might have used RDRAND, since /dev/urandom will always have
RDRAND mixed in as part of it.
And above where I say "/dev/urandom", what I actually mean is
GRND_INSECURE, which is the same thing but won't generate warnings in
dmesg.
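
A minimal sketch of the GRND_INSECURE idea. The function name is
hypothetical; the flag value 0x0004 is taken from <linux/random.h> and is
defined here by hand on the assumption that the runtime does not export
it. Kernels that do not know the flag reject it, hence the fallback.

```python
import os

GRND_INSECURE = 0x0004  # from <linux/random.h>; defined here by hand


def random_bytes_insecure(n: int) -> bytes:
    """Return n random bytes without waiting for the pool to initialize."""
    getrandom = getattr(os, "getrandom", None)  # only available on Linux
    if getrandom is not None:
        try:
            buf = getrandom(n, GRND_INSECURE)
            if len(buf) == n:
                return buf
        except OSError:
            # The kernel does not support the flag; fall through.
            pass
    # Fallback: ask the OS through the generic interface instead.
    return os.urandom(n)
```

As the commit message says, this is only suitable where best-effort
randomness is acceptable, not for generating key material.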