Topi Miettinen [Sat, 30 Oct 2021 16:58:41 +0000 (19:58 +0300)]
execute: respect selinux_context_ignore
When `SELinuxContext=` parameter is prefixed with `-`, the documentation states
that any errors determining or changing context should be ignored, but this
doesn't actually happen and the service may fail with `229/SELINUX_CONTEXT`.
Fix by adding checks to `context->selinux_context_ignore`.
Daan De Meyer [Sat, 30 Oct 2021 10:15:22 +0000 (11:15 +0100)]
mkosi: Install less in the mkosi Fedora image
We're actually falling back to `more` in the mkosi image which doesn't
behave quite the same as less which is somewhat annoying. Let's make
sure `less` is installed so systemd can use it as the pager.
The checks for finding a new available address in the pool were broken in two
ways: not using UINT32_TO_PTR() on hashmap lookups resulted in false negatives,
and the check wasn't skipping the server address if that was part of the pool.
Move the check for available addresses to a small helper function and fix both
issues, and also add a check to the REQUEST code for the server address.
sd-dhcp-server: clear out expired leases when processing requests
The DHCP server configuration supports setting a maximum lease time, but old
leases are never actually cleared out if the client doesn't send a RELEASE.
This causes the pool to run out of addresses on networks where clients just
disappear, which is a fairly common occurrence on wireless networks.
Fix this by cleaning up expired leases before processing client requests, so
addresses can be reused for new clients.
homework,repart: turn on cryptsetup logging before we have a context
Otherwise we'll miss the log message from allocation of the context. We
already made this change in most of our tools that interface with
libcryptsetup, but we forgot two.
nspawn: make sure to chown() implicit source dirs for --bind= to container root UID
This makes sure that a switch like --bind=:/foo does the right thing if
user namespacing is one: the backing dir should be owned by the
container's root UID not the host's. Thus, whenever the source path is
left empty and we automatically generate a source dir as temporary
directory, ensure it's owned by the right UID.
systemctl: make sure "systemctl -M status" shows cgroup tree of container not host
This shows the cgroup tree of the root slice of the container now, by
querying the cgroup pid tree via the bus instead of going directly to
the cgroupfs.
A fallback is kept for really old systemd versions where querying the
PID tree was not available.
systemctl: only fall back to local cgroup display if we talk to local systemd
Otherwise we likely show rubbish because even in local containers we
nowadays have cgroup namespacing, hence we likely can't access the
cgroup tree from the host at the same place as inside the container.
Tony Asleson [Wed, 27 Oct 2021 17:00:59 +0000 (12:00 -0500)]
integritysetup: Check args to prevent assert
The utility function parse_integrity_options is used to both validate
integritytab options or validate and return values. In the case where
we are validating only and we have specific value options we will
assert.
core: Try to prevent infinite recursive template instantiation
To prevent situations like in #17602 from happening, let's drop
direct recursive template dependencies. These will almost certainly
lead to infinite recursion so let's drop them immediately to avoid
instantiating potentially thousands of irrelevant units.
Example of a template that would lead to infinite recursion which
is caught by this check:
core: add [State|Runtime|Cache|Logs]Directory symlink as second parameter
When combined with a tmpfs on /run or /var/lib, allows to create
arbitrary and ephemeral symlinks for StateDirectory or RuntimeDirectory.
This is especially useful when sharing these directories between
different services, to make the same state/runtime directory 'backend'
appear as different names to each service, so that they can be added/removed
to a sharing agreement transparently, without code changes.
foo and bar use respectively /var/lib/foo and /var/lib/bar. Then
the orchestration layer decides to stop this sharing, the drop-in
can be removed. The services won't need any update and will keep
working and being able to store state, transparently.
To keep backward compatibility, new DBUS messages are added.
This teachs the LUKS backend UID mapping, similar to the existing
logic for the "directory", "subvolume" and "fscrypt" backends: the files
will be owned by "nobody" on the fs itself, but will be mapped to
logging in user via uidmapped mounts.
This way LUKS home dirs become truly portable: no local UID info will
leak onto the images anymore, and the need to recursively chown them on
activation goes away. This means activation is always as performant as
it should be.
homework: get rid of manual error path in home_create_luks()
Now that all objects we need to destroy are managed by the HomeSetup
object we can drop our manual destruction path and just use the normal
clean-up logic implemented for HomeSetup anyway. More unification, yay!
homework: move destruction of temporary image file into HomeSetup
Let's simplify things further a bit and move the destruction of the
temporary image file we operate on when creating a LUKS home into
HomeSetup, like all our other resources.
homework: get rid of manual clean up path in home_setup_luks()
Now that we stored all our different objects inside the HomeSetup
structure, we can get rid of our manual clean-up path, since
home_setup_done() will clean up everything stored therein anyway, in the
right order.
This is the main reason we moved everything into HomeSetup in the
previous commits: so that we can share clean-up paths for these objects
with everything else.
homework: move all DM detachment/freeing into HomeSetup
We actually already detach/free the LUKS DM devices for most operations
via HomeSetup, let's move the creation logic to also do this, in order
to unify behaviour between operations.
homework: teach home_lock() + home_unlock() + home_deactivate() to use HomeSetup, too
This is just some minor refactoring, to make these two operations work
like the rest.
home_lock_luks() will now use the root_fd field of HomeSetup already,
but for home_unlock_luks() + home_deactivate() this change has no effect for now. (But a
later commit will change this.)
core: make DynamicUser=1 and StateDirectory= work with TemporaryFileSystem=/var/lib
The /var/lib/private/foo -> /var/lib/foo symlink for StateDirectory and
DynamicUser is set up on the host filesystem, before the mount namespacing
is brought up. If an empty /var/lib is used, to ensure the service does not
see other services data, the symlink is then not available despite
/var/lib/private being set up as expected.
Make a list of symlinks that need to be set up, and create them after all
the namespaced filesystems have been created, but before any eventual
read-only switch is flipped.
core: normalize 'r' variable handling in unit_attach_pids_to_cgroup() a bit
The 'r' variable is our "go-to" variable for error return codes, all
across our codebase. In unit_attach_pids_to_cgroup() it was so far used
in a strange way for most of the function: instead of directly storing
the error codes of functions we call we'd store it in a local variable
'q' instead, and propagate it to 'r' only in some cases finally we'd
return the ultimate result of 'r'.
Let's normalize this a bit: let's always store error return values in
'r', and then use 'ret' as the variable to sometimes propagate errors
to, and then return that.
This also allows us to get rid of one local variable.
No actual codeflow changes, just some renaming of variables that allows
us to remove one.
homework: allow specifying explicit additional mount options when using CIFS backend
This is useful since certain shares can only be mounted with additional
mount flags. For example the SMB share in modern AVM Fritz!Boxes
requires "noserverino" to be set to work from Linux.
Unfortunately mount.cifs doesn't really let us know much about the
reason for the failure. Hence, assume it's caused by a bad password, and
retry on any failure with additional passwords that we might have.
A loop to do this was always in place, but none of the possible
codepaths actually allowed to iterate more than once. Fix that.
homework: allow specifying a dir component in CIFS services
Allow specifying CIFS services in the format //host/service/subdir/… to
allow multiple homedirs on the same share, and not in the main dir of
the share.
All other backends allow placing the data store at arbitrary places,
let's allow this too for the CIFS backend. This is particularly useful
for testing.
homework: make home_move_mount() a bit more generic by renaming first parameter
No actual code change, let's just rename the first parameter, to make it
more generically useful in case the first argument is an arbitrary path,
not necessarily a username/realm.
homework: make sure fscrypt backend takes a HomeSetup object for all calls
Similar to the same chage we did for the directory backend. Let's always
path the setup context object, i.e. HomeSetup, and store whatever we set
up in there.
test-fd-util: extend close_all_fds() test to trigger all fallback codepaths
This extends the close_all_fds() logic to overmount /proc with an empty
tmpfs, and/or to block close_range() via seccomp, so that we run the
test case for the function with the fallback paths.
This should make sure that we don't regress in limited environments or
older kernels.
fd-util: special case invocation of close_all_fds() with single exception fd
Add special case optimization for a single exception fd. It's a
pretty common case in our codebase, and the optimization is simple
and means we don't need to copy/sort the exception array, so do it.
homework: add new helper call that can shift home dir UID/GID ranges
This new helper is not used yet, but it's useful for apply UID/GID
shifts so that the underlying home dir can use an arbitrary UID (for
example "nobody") and we'll still make it appear as owned by the target
UID.
This operates roughly like this:
1. The relevant underlying UID is mapped to the target UID
2. Everything in the homed UID range except for the target UID is left
unmapped (and thus will appear as "nobody")
3. Everything in the 16bit UID range outside of the homed UID
range/target UID/nobody user is mapped to itself
4. Everything else is left unmapped (in particular everything outside of
the 16 bit range).
Why do it like this?
The 2nd rule done to ensure that any files from homed's managed UID
range that do not match the user's own UID will be shown as "unmapped"
basically. Of course, IRL this should never happen, except if people
managed to manipulate the underlying fs directly.
The 3rd rule is to allow that if devs untar an OS image it more or
less just works as before: 16bit UIDs outside of the homed range will
be mapped onto themselves: you can untar things and tar it back up and
things will just work.
homework: rework directory backend to set up mounts in /run/systemd/user-home-mount before moving them to /home
This does what we already do for the LUKS backend: instead of mounting
the source directory directly to the final home dir, we instead bind
mount it to /run/systemd/user-home-mount (where /run/ is unshared and
specific to our own mount namespace), then adjust its mount flags and
then bind mount it in a single atomic operation into the final
destination, fully set up.
This doesn't improve much on its own, but it makes things a tiny bit
more correct: this way MS_NODEV/MS_NOEXEC/MS_NOSUID will already be
applied when the bind mount appears in the host mount namespace, instead
of being adjusted after the fact.
Doing things this way also makes things work more like the LUKS backend,
reducing surprises. Most importantly it's preparation for doing
uidmapping for directory homes, added in a later commit.
homework: when activating a directory, include info about it in resulting record
For the other backends we synthesize a "binding" section in the json
record of the user that stores meta info how a user record is "bound" to
the local host. It declares storage info and such. Let's do the same for
the directory/subvolume backends.