homework: add new helper call that can shift home dir UID/GID ranges
This new helper is not used yet, but it's useful for apply UID/GID
shifts so that the underlying home dir can use an arbitrary UID (for
example "nobody") and we'll still make it appear as owned by the target
UID.
This operates roughly like this:
1. The relevant underlying UID is mapped to the target UID
2. Everything in the homed UID range except for the target UID is left
unmapped (and thus will appear as "nobody")
3. Everything in the 16bit UID range outside of the homed UID
range/target UID/nobody user is mapped to itself
4. Everything else is left unmapped (in particular everything outside of
the 16 bit range).
Why do it like this?
The 2nd rule done to ensure that any files from homed's managed UID
range that do not match the user's own UID will be shown as "unmapped"
basically. Of course, IRL this should never happen, except if people
managed to manipulate the underlying fs directly.
The 3rd rule is to allow that if devs untar an OS image it more or
less just works as before: 16bit UIDs outside of the homed range will
be mapped onto themselves: you can untar things and tar it back up and
things will just work.
homework: rework directory backend to set up mounts in /run/systemd/user-home-mount before moving them to /home
This does what we already do for the LUKS backend: instead of mounting
the source directory directly to the final home dir, we instead bind
mount it to /run/systemd/user-home-mount (where /run/ is unshared and
specific to our own mount namespace), then adjust its mount flags and
then bind mount it in a single atomic operation into the final
destination, fully set up.
This doesn't improve much on its own, but it makes things a tiny bit
more correct: this way MS_NODEV/MS_NOEXEC/MS_NOSUID will already be
applied when the bind mount appears in the host mount namespace, instead
of being adjusted after the fact.
Doing things this way also makes things work more like the LUKS backend,
reducing surprises. Most importantly it's preparation for doing
uidmapping for directory homes, added in a later commit.
homework: when activating a directory, include info about it in resulting record
For the other backends we synthesize a "binding" section in the json
record of the user that stores meta info how a user record is "bound" to
the local host. It declares storage info and such. Let's do the same for
the directory/subvolume backends.
bootctl: refuse parsing unknown special '@' entry ids
Let's make sure '@' is never written as entry ID into any EFI variable,
as we want the ability to add new ids like this later on, with them
resulting in a clear error on older implementations.
Daan De Meyer [Wed, 27 Oct 2021 09:54:53 +0000 (10:54 +0100)]
docs: Remove mkosi symlink instruction from HACKING
mkosi automatically builds for the host distro which seems a much
better default to encourage since dnf won't be installed on any host
system that's not Fedora anyway.
With Oracle's physical servers, both `/sys/class/dmi/id/sys_vendor` and
`/sys/class/dmi/id/board_vendor` contain `Oracle Corporation`, so those
servers are detected as `oracle` (VirtualBox).
VirtualBox has the following values in the latest versions:
Presumably the existing check for `innotek GmbH` is meant to detect
older versions of VirtualBox, while changing the second checked value
from `Oracle Corporation` to `VirtualBox` will reliably detect later and future
versions.
json: do something remotely reasonable when we see NaN/infinity
JSON doesn't have NaN/infinity/-infinity concepts in the spec.
Implementations vary what they do with it. JSON5 + Python simply
generate special words "NAN" and "Inifinity" from it. Others generate
"null" for it.
At this point we never actually want to output this, so let's be
conservative and generate RFC compliant JSON, i.e. convert to null.
One day should JSON5 actually become a thing we can revisit this, but in
that case we should implement things via a flag, and only optinally
process nan/infinity/-infinity.
This patch is extremely simple: whenever accepting a
nan/infinity/-infinity from outside it converts it to NULL. I.e. we
convert on input, not output.
The setting is completely meaningless, as WithoutRA= and UseDelegatedPrefix=
in [DHCPv6] section, and DHCPv6Client= in [IPv6AcceptRA] section control
the behavior.
Yu Watanabe [Wed, 13 Oct 2021 07:26:09 +0000 (16:26 +0900)]
network: dhcp6: introduce UseDelegatedPrefix= setting and enable by default
Previously, the prefix delegation is enabled when at least one
downstream interfaces request it. But, when the DHCPv6 client on the
upstream interface is configured, some downstream interfaces may not
exist yet, nor have .network file assigned.
Also, if a system has thousands of interfaces, then the previous logic
introduce O(n^2) search.
This makes the prefix delegation is always enabled, except when it is
explicitly disabled. Hopefully, that should not break anything, as the
DHCPv6 server should ignore the prefix delegation request if the server
do not have any prefix to delegate.
Bastien Nocera [Tue, 26 Oct 2021 09:57:30 +0000 (11:57 +0200)]
hwdb: Allow console users access to media* nodes
Newer webcams and video devices have controls only available through
/dev/media* nodes. Make sure they're accessible in the same way as
/dev/video* nodes.
Yu Watanabe [Mon, 25 Oct 2021 17:29:09 +0000 (02:29 +0900)]
network: delay dropping addresses or so on reloading .network files
When a .network file is updated but its change is not so big, it is not
necessary to first drop all configs and then reassign later again.
This slightly optimize such situation. First foreignize all configs, and
then drop later when it is not requested by the updated .network file.
Apparently memory sanitizer doesn't grok getdents64() properly. Let's
address that by explicitly marken memory initialized by getdents64() as
unpoisoned.
stat-util: optimize dir_is_empty_at() a bit, by using getdents64()
That way we have a single syscall only for it, instead of the multiple
readdir() and friends do. And we can operate entirely on the stack, no
malloc() implicit.
Frantisek Sumsal [Mon, 25 Oct 2021 09:02:22 +0000 (11:02 +0200)]
test: tweak TriggerLimitIntervalSec= when built with coverage
Collecting coverage causes a significant slowdown in general, but since
this test requires certain timing, we need to tweak the defaults to make
it reliably pass.
varlink: don't try to talk to oomd from unit tests
Talking to external daemons we ourselves maintain is a job for the
integration tests, not the unit tests. This communication is likely to
fail hence don#t even bother.
mount-util: move opening of /proc/self/mountinfo into bind_remount_one_with_mountinfo()
Let's move things around a bit, and open /proc/self/mountinfo if needed
inside of bind_remount_one_with_mountinfo(). That way bind_remount_one()
can become a superthin inline wrapper around
bind_remount_one_with_mountinfo(). Main benefit is that we don't even
have to open /p/s/mi in case mount_setattr() actually worked for us.
We should drop caches if we are configured to do so in all cases where
we are done with home dir operations: except if that operation is
activation, because in that case we are not destroying anything, but
leaving it on.
Hence, turn off the flag that reminds us that we should drop caches
before exiting, once activation completed fully,
nspawn: bump RLIMIT_NOFILE for nspawn payload similar to how host PID 1 does it for its payload
We try to pass containers roughly the same rlimits as the host gets from
the kernel. However, this means we'd set the RLIMIT_NOFILE to 4K. Which
is quite limiting though, and is something we actually departed from in
PID1: since 52d620757817bc0fa7de3ddbe43024544ced7ea0 we raise the limit
substantially for all userspace.
Given that nspawn is quite often invoked without proper PID1, let's raise the
limits for container payloads the same way as we do from the real PID1
to its service payloads.
Jan Janssen [Wed, 20 Oct 2021 10:15:03 +0000 (12:15 +0200)]
sd-boot: Add keys to reboot into firmware interface
This is useful if the auto-firmware setting has been disabled. The
keys used here are based on what the majority of firmware employ in
the wild.
This also ensures there's a chance for the user to discover this in
case they were too slow during POST or simply used the wrong ones.