homed: permit inodes owned by UID_MAPPED_ROOT to be created in $HOME
If people use nspawn in their $HOME we should allow this inodes owned by
this special UID to be created temporarily, so that UID mapped nspawn
containers just work.
nspawn: make sure host root can write to the uidmapped mounts we prepare for the container payload
When using user namespaces in conjunction with uidmapped mounts, nspawn
so far set up two uidmappings:
1. One that is used for the uidmapped mount and that maps the UID range
0…65535 on the backing fs to some high UID range X…X+65535 on the
uidmapped fs. (Let's call this mapping the "mount mapping")
2. One that is used for the userns namespace the container payload
processes run in, that maps X…X+65535 back to 0…65535. (Let's call
this one the "process mapping").
These mappings hence are pretty much identical, one just moves things up
and one back down. (Reminder: we do all this so that the processes can
run under high UIDs while running off file systems that require no
recursive chown()ing, i.e. we want processes with high UID range but
files with low UID range.)
This creates one problem, i.e. issue #20989: if nspawn (which runs as
host root, i.e. host UID 0) wants to add inodes to the uidmapped mount
it can't do that, since host UID 0 is not defined in the mount mapping
(only the X…X+65536 range is, after all, and X > 0), and processes whose
UID is not mapped in a uidmapped fs cannot create inodes in it since
those would be owned by an unmapped UID, which then triggers
the famous EOVERFLOW error.
Let's fix this, by explicitly including an entry for the host UID 0 in
the mount mapping. Specifically, we'll extend the mount mapping to map
UID 2147483646 (which is INT32_MAX-1, see code for an explanation why I
picked this one) of the backing fs to UID 0 on the uidmapped fs. This
way nspawn can creates inode on the uidmapped as it likes (which will
then actually be owned by UID 2147483646 on the backing fs), and as it
always did. Note that we do *not* create a similar entry in the process
mapping. Thus any files created by nspawn that way (and not chown()ed to
something better) will appear as unmapped (i.e. as overflowuid/"nobody")
in the container payload. And that's good. Of course, the latter is
mostly theoretic, as nspawn should generally chown() the inodes it
creates to UID ranges that actually make sense for the container (and we
generally already do this correctly), but it#s good to know that we are
safe here, given we might accidentally forget to chown() some inodes we
create.
Net effect: the two mappings will not be identical anymore. The mount
mapping has one entry more, and the only reason it exists is so that
nspawn can access the uidmapped fs reasonably independently from any
process mapping.
Yu Watanabe [Wed, 16 Mar 2022 11:46:49 +0000 (20:46 +0900)]
udev: run the main process, workers, and spawned commands in /udev subcgroup
And enable cgroup delegation for udevd.
Then, processes invoked through ExecReload= are assigned .control
subcgroup, and they are not killed by cg_kill().
varlink_error(...) expects a json object as the third parameter. Passing a string variant causes
parameter sanitization to fail, and it returns -EINVAL. Pass object variant instead.
Grigori Goronzy [Sat, 26 Feb 2022 09:41:16 +0000 (10:41 +0100)]
tpm2: enable parameter encryption
Use a salted, unbound HMAC session with the primary key used as tpmKey,
which mean that the random salt will be encrypted with the primary
key while in transit. Decrypt/encrypt flags are set on the new session
with AES in CFB mode. There is no fallback to XOR mode.
This provides confidentiality and replay protection, both when sealing
and unsealing. There is no protection against man in the middle
attacks since we have no way to authenticate the TPM at the moment.
The exception is unsealing with PIN, as an attacker will be unable
to generate the proper HMAC digest.
Conceptually the feature is great and should exist, but in its current
form should be worked to be generic (i.e. not specific to
Windows/Bitlocker, but appliable to any boot entry), not be global (but
be a per-entry thing), not require a BootXXXX entry to exist, and not
check for the BitLocker signature (as TPMs are not just used for
BitLocker).
Since we want to get 251 released, mark it in the documentation, in NEWS
and in code as experimental and make clear it will be reworked in a
future release. Also, make it opt-in to make it less likely people come
to rely on it without reading up on it, and understanding that it will
likely change sooner or later.
sd-boot: measure kernel cmdline into PCR 12 rather than 8
Apparently Grub is measuring all kinds of garbage into PCR 8. Since people
apparently chainload sd-boot from grub, let's thus stay away from PCR 8,
and use PCR 12 instead for the kernel command line.
boot: drop const from EFI_PHYSICAL_ADDRESS parameter
It's not a pointer after all, but a numeric value. As such the const
applies to the value and not the target, but we genreally don#t do that
for value parameters. Hence drop the const.
cgroup: also indicate cgroup delegation state in user-accessible xattr
So far we set the "trusted.delegate" xattr on cgroups where delegation
is on. This duplicates this behaviour with the "user.delegate" xattr.
This has two benefits:
1. unprivileged clients can *read* the xattr. "systemd-cgls" can thus
show delegated cgroups as such properly, even when invoked without
privs
2. unprivileged systemd instances can set the xattr, i.e. when systemd
--user delegates a cgroup to further payloads.
This weakens security a tiny bit, given that code that got a cgroup
delegated can manipulate the xattr, but I think that's OK, given they
have a higher trust level regarding cgroups anyway, if they got a
subtree delegated, and access controls on the cgroup itself are still
enforced. Moreover PID 1 as the cgroup manager only sets these xattrs,
never reads them — the xattr is primarily a way to tell payloads about
the delegation, and it's strictly this one way.
Grigori Goronzy [Thu, 24 Feb 2022 00:28:29 +0000 (01:28 +0100)]
cryptenroll: add tests for TPM2 unlocking
Add tests for enrolling and unlocking. Various cases are tested:
- Default PCR 7 policy w/o PIN, good and bad cases (wrong PCR)
- PCR 7 + PIN policy, good and bad cases (wrong PCR, wrong PIN)
- Non-default PCR 0+7 policy w/o PIN, good and bad cases (wrong PCR 0)
Grigori Goronzy [Fri, 18 Feb 2022 20:13:41 +0000 (21:13 +0100)]
cryptsetup: add manual TPM2 PIN configuration
Handle the case where TPM2 metadata is not available and explicitly
provided in crypttab. This adds a new "tpm2-pin" option to crypttab
options for this purpose.
Grigori Goronzy [Wed, 16 Feb 2022 21:13:42 +0000 (22:13 +0100)]
tpm2: support policies with PIN
Modify TPM2 authentication policy to optionally include an authValue, i.e.
a password/PIN. We use the "PIN" terminology since it's used by other
systems such as Windows, even though the PIN is not necessarily numeric.
The pin is hashed via SHA256 to allow for arbitrary length PINs.
v2: fix tpm2_seal in sd-repart
v3: applied review feedback
Yu Watanabe [Mon, 14 Mar 2022 13:02:37 +0000 (22:02 +0900)]
test: wait for loopback device being actually created
It seems there exists a short time period that we cannot see the
loopback device after `losetup` is finished:
```
testsuite-58.sh[367]: ++ losetup -b 1024 -P --show -f /tmp/testsuite-58-sector-1024.img
kernel: loop1: detected capacity change from 0 to 204800
testsuite-58.sh[285]: + LOOP=/dev/loop1
testsuite-58.sh[285]: + systemd-repart --pretty=yes --definitions=/tmp/testsuite-58-sector/ --seed=750b6cd5c4ae4012a15e7be3c29e6a47 --empty=require --dry-run=no /dev/loop1
testsuite-58.sh[368]: Device '/dev/loop1' has no dm-crypt/dm-verity device, no need to look for underlying block device.
testsuite-58.sh[368]: Failed to determine canonical path for '/dev/loop1': No such file or directory
testsuite-58.sh[368]: Failed to open file or determine backing device of /dev/loop1: No such file or directory
```
Yu Watanabe [Sun, 13 Mar 2022 12:38:10 +0000 (21:38 +0900)]
test: use /var/tmp for storing disk images
The Ubuntu CI on ppc64el seems to have a issue on tmpfs, and files
may not be fsynced. See c10caebb98803b812ebc4dd6cdeaab2ca17826d7.
For safety, let's use /var/tmp to store disk images.
Vivien Didelot [Mon, 14 Mar 2022 20:34:57 +0000 (16:34 -0400)]
units: fix factory-reset.target description
The current description for the factory reset target does not add any
value and doesn't respect the definition of the related property as
described in systemd.unit(5).
Starting the target currently results in the following log:
[ 11.139174] systemd[1]: Reached target Target that triggers factory reset. Does nothing by default..
[ OK ] Reached target Target that…set. Does nothing by default..
Simply update the target description to "Factory Reset".
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
/dev/urandom is seeded with RDRAND. Calling genuine_random_bytes(...,
..., 0) will use /dev/urandom as a last resort. Hence, we gain nothing
here by having our own RDRAND wrapper, because /dev/urandom already is
based on RDRAND output, even before /dev/urandom has fully initialized.
Furthermore, RDRAND is not actually fast! And on each successive
generation of new x86 CPUs, from both AMD and Intel, it just gets
slower.
This commit simplifies things by just using /dev/urandom in cases where
we before might use RDRAND, since /dev/urandom will always have RDRAND
mixed in as part of it.
And above where I say "/dev/urandom", what I actually mean is
GRND_INSECURE, which is the same thing but won't generate warnings in
dmesg.
macro: handle DECIMAL_STR_MAX() special cases more accurately
So far DECIMAL_STR_MAX() overestimated the types in two ways: it would
also adds space for a "-" for unsigned types.
And it would always return the same size for 64bit values regardless of
signedness, even though the longest maximum numbers for signed and
unsigned differ in length by one digit. i.e. 2^64-1 (i.e. UINT64_MAX) is
one decimal digit longer than -2^63 (INT64_MIN) - for the other integer
widths the number of digits in the "longest" decimal value is always the
same, regardless of signedness. by example: strlen("65535") ==
strlen("32768") (i.e. the relevant 16 bit limits) holds — and similar
for 8bit and 32bit integer width limits — but
strlen("18446744073709551615") > strlen("9223372036854775808") (i.e. the
relevant 64 bit limits).
Franck Bui [Mon, 14 Mar 2022 17:03:02 +0000 (18:03 +0100)]
journal: preserve acls when rotating user journals with NOCOW attribute set
When restoring the COW flag for journals on BTRFS, the full journal contents
are copied into new files. But during these operations, the acls of the
previous files were lost and users were not able to access to their old
journal contents anymore.
Frantisek Sumsal [Sun, 13 Mar 2022 13:45:03 +0000 (14:45 +0100)]
macro: account for negative values in DECIMAL_STR_WIDTH()
With negative numbers we wouldn't account for the minus sign, thus
returning a string with one character too short, triggering buffer
overflows in certain situations.