update-done: create /etc and /var if they didn't exist
Previously, we would fail. But this doesn't seem useful: we may want to
mark the update as done even if no updates were necessary and there was
no need to create /etc/ or /var/ yet.
The idea is to use this when building an image to mark the image as not
needing updates after the reboot. In general it is impossible to say if
any of the early boot update services can be safely skipped, except when
the creator of the image knows all the contents there and has made sure
that all the updates have been processed. (This is in fact what happens
in a typical package-based installation: the packages have scriptlets which
implement the changes during or after the installation process.)
With this patch, the image build process can do 'systemd-update-done --root=…'
at the appropriate point to avoid triggering of ldconfig.service,
systemd-hwdb-update.service, etc.
I didn't write --image=, because it doesn't seem immediately useful. The
approach with --root is most useful when we're building the image "offline",
which means that we have a directory we're working on.
The man page was right, but the comment in the generated file was wrong. The
timestamp is *not* the timestamp when the update is being done. While at it,
say to what directory the message applies. This makes it easier for a casual
reader to figure out what is happening.
Also rename the function to better reflect what it does.
Inspired by https://github.com/systemd/systemd/issues/36045.
vmspawn: allow TPM state to be persistent + rework runtime dir logic
When using vmspawn on a particleos image we really want the TPM state
to be retained between invocations, since the encryption key is locked
to the TPM after all. Hence let's support that.
This adds --tpm-state= which can be used to configure a path to store
the TPM state in. It can also be used to force TPM state to be transient
or to let vmspawn pick the path automatically.
While we are at it, let's also revamp the runtime dir handling in
vmspawn: let's no longer place the sockets the auxiliary services listen
on within their own runtime directories. Instead, just drop the runtime
directories for them entirely (since neither virtiofsd, nor swtpm
actually use them). Also, let systemd clean up the sockets
automatically.
Daan De Meyer [Wed, 19 Mar 2025 13:08:49 +0000 (14:08 +0100)]
fmf: Use mkosi -f together with ToolsTreePackageDirectories=
There's no need to build various systemd tools from source again to
build the mkosi image when we can just install the packages that were
already built from source into the tools tree so let's do that to avoid
unnecessary compiling.
Yu Watanabe [Wed, 19 Mar 2025 21:28:18 +0000 (06:28 +0900)]
core: Make DelegateNamespaces= work for user managers with CAP_SYS_ADMIN (#36771)
Currently DelegateNamespaces= only works for services spawned by the
system manager. User managers will always unshare the user namespace
first even if they're running with CAP_SYS_ADMIN.
Let's add support for DelegateNamespaces= for user managers if they're
running with CAP_SYS_ADMIN. By default, we'll still delegate all namespaces
for user managers, but this can now be overridden by explicitly passing
DelegateNamespaces=.
If a user manager is running without CAP_SYS_ADMIN, the user namespace is
still always unshared first, just like before.
tpm2-util: return better errors if we try to unlock a tpm key on the wrong tpm
Let's improve error handling for the case where one tries to unlock a
TPM2-locked volume via the TPM of a different machine than the one it was
originally enrolled on. Let's recognize this case and print a clearer
error message.
sd-event: make pidfd copy in event_add_child_pidref()
So far we'd directly use the pidfd passed into event_add_child_pidref(),
hoping it would not be closed by the caller before we are done. This was
violated by vmspawn however.
Let's make this safe, and simply duplicate the fd, and make us
independent of the caller.
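The ownership rule described above can be illustrated with a minimal Python sketch. This is not the sd-event code (which is C): the class `ChildWatcher` is hypothetical, and an ordinary pipe descriptor stands in for a pidfd so that the snippet runs without special kernel support.

```python
import os


class ChildWatcher:
    """Hypothetical stand-in for an event source that tracks a process by fd."""

    def __init__(self, fd: int) -> None:
        # Duplicate the descriptor instead of borrowing the caller's copy,
        # so the caller may close theirs at any time.
        self.fd = os.dup(fd)

    def close(self) -> None:
        os.close(self.fd)


# A pipe read end stands in for a pidfd here (assumption for portability).
read_end, write_end = os.pipe()
watcher = ChildWatcher(read_end)

os.close(read_end)              # the caller drops its descriptor early
os.write(write_end, b"x")
data = os.read(watcher.fd, 1)   # the duplicate still refers to the same pipe

watcher.close()
os.close(write_end)
```

The watcher keeps working after the caller closes its own descriptor, which is the property the borrowed-fd approach could not guarantee.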
Daan De Meyer [Wed, 19 Mar 2025 11:36:20 +0000 (12:36 +0100)]
fmf: Drop support for dist-git-source: true
In preparation for moving the fmf stuff to the Fedora spec repository
instead of maintaining it upstream, let's drop support for
dist-git-source: true, which won't be needed anymore after the move.
Daan De Meyer [Wed, 19 Mar 2025 09:54:51 +0000 (10:54 +0100)]
packit: Enable use_target_repo_for_fmf_url option
Currently this is picked up from the main branch of the fork which is
suboptimal. The packit folks implemented this new option for us which
should fix the problem.
Daan De Meyer [Wed, 19 Mar 2025 09:30:52 +0000 (10:30 +0100)]
userdb: Add userdb.user.* and userdb.group.* credentials (#36740)
Let's allow providing extra userdb users and groups via credentials.
Similarly to systemd-udev-load-credentials.service, we ship
systemd-userdb-load-credentials.service, which transforms the JSON
user/group records provided via the corresponding credentials to static
userdb dropins in /run/userdb.
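As a rough illustration of the transformation, here is a minimal Python sketch. It is not the service's implementation: the function `load_userdb_credentials` is hypothetical, the `<name>.user` / `<name>.group` file naming is an assumption, temporary directories stand in for the credential and drop-in directories, and anything else the real service does is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_userdb_credentials(cred_dir: Path, dropin_dir: Path) -> List[str]:
    """Write one drop-in per userdb.user.* / userdb.group.* credential."""
    written = []
    for cred in sorted(cred_dir.iterdir()):
        for prefix, suffix, key in (
            ("userdb.user.", ".user", "userName"),
            ("userdb.group.", ".group", "groupName"),
        ):
            if not cred.name.startswith(prefix):
                continue
            record = json.loads(cred.read_text())
            name = record[key]  # the record itself names the user or group
            target = dropin_dir / f"{name}{suffix}"  # assumed naming scheme
            target.write_text(json.dumps(record))
            written.append(target.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    creds = Path(tmp, "credentials")
    dropins = Path(tmp, "userdb")
    creds.mkdir()
    dropins.mkdir()
    (creds / "userdb.user.alice").write_text(
        json.dumps({"userName": "alice", "uid": 1000, "gid": 1000}))
    (creds / "userdb.group.staff").write_text(
        json.dumps({"groupName": "staff", "gid": 2000}))
    (creds / "unrelated").write_text("ignored")
    written = load_userdb_credentials(creds, dropins)
```

Credentials that do not carry one of the two prefixes are skipped, so unrelated credentials passed to the system are left alone.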
Daan De Meyer [Mon, 17 Mar 2025 10:35:23 +0000 (11:35 +0100)]
core: Make DelegateNamespaces= work for user managers with CAP_SYS_ADMIN
Currently DelegateNamespaces= only works for services spawned by the
system manager. User managers will always unshare the user namespace
first even if they're running with CAP_SYS_ADMIN.
Let's add support for DelegateNamespaces= for user managers if they're
running with CAP_SYS_ADMIN. By default, we'll still delegate all namespaces
for user managers, but this can now be overridden by explicitly passing
DelegateNamespaces=.
If a user manager is running without CAP_SYS_ADMIN, the user namespace is
still always unshared first, just like before.
Daan De Meyer [Mon, 17 Mar 2025 11:26:46 +0000 (12:26 +0100)]
capability-util: Ignore unknown capabilities instead of aborting
capability_ambient_set_apply() can be called with capability sets
containing unknown capabilities. Let's not crash when this is the
case but instead ignore the unknown capabilities.
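The idea of ignoring unknown capabilities can be sketched in Python as a bitmask operation. This is only an illustration, not the C code in capability-util: the helper `drop_unknown_capabilities` is hypothetical, and the value of `CAP_LAST_CAP` is an assumed example rather than something queried from the running kernel.

```python
# Assumed value for illustration: the highest capability number the running
# kernel knows about (on Linux it is exposed in /proc/sys/kernel/cap_last_cap).
CAP_LAST_CAP = 40


def drop_unknown_capabilities(cap_set: int, last_cap: int = CAP_LAST_CAP) -> int:
    """Return cap_set with every bit above last_cap cleared."""
    known_mask = (1 << (last_cap + 1)) - 1
    return cap_set & known_mask


# CAP_SYS_ADMIN (bit 21) plus a bit no kernel capability corresponds to.
requested = (1 << 21) | (1 << 63)
applied = drop_unknown_capabilities(requested)
```

Only the known bit survives, so the caller can go on to apply the remaining set instead of aborting.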
This fixes a crash when running the following command:
Make sure the test has its own /proc and skip it in containers as
MountAPIVFS=yes in a container always results in a read-only /proc/sys
which means the test can't write to /proc/sys/kernel/ns_last_pid.
Daan De Meyer [Mon, 17 Mar 2025 15:20:00 +0000 (16:20 +0100)]
TEST-07-PID1.delegate-namespaces: Make sure fully visible procfs is available
To be able to mount /proc inside an unprivileged user namespace, we have
to make sure a fully visible procfs is available on the host, so let's make
sure that's the case.
Daan De Meyer [Mon, 17 Mar 2025 15:17:25 +0000 (16:17 +0100)]
core: Also check if we can mount /proc if pid namespace is delegated
If the pid namespace is delegated, it doesn't matter if we have CAP_SYS_ADMIN:
we'll still fail to mount /proc if part of it is masked on the host. So also
check if we can mount /proc if the pid namespace is delegated.
Daan De Meyer [Thu, 13 Mar 2025 14:22:34 +0000 (15:22 +0100)]
userdb: Add userdb.user.* and userdb.group.* credentials
Let's allow providing extra userdb users and groups via credentials.
Similarly to systemd-udev-load-credentials.service, we ship
systemd-userdb-load-credentials.service, which transforms the JSON
user/group records provided via the corresponding credentials to static
userdb dropins in /etc/userdb.
Daan De Meyer [Tue, 18 Mar 2025 19:35:59 +0000 (20:35 +0100)]
mkosi: Bump to Fedora 42
Beta was just released, let's switch to Fedora 42 which coincidentally
also has a crucial fix for its nsswitch.conf to make the next commits
actually work.
Add a new condition which checks against the systemd version.
Change condition_test_kernel_version() into a generic condition_test_version()
so most of the code can be reused.
Yu Watanabe [Mon, 17 Mar 2025 01:36:33 +0000 (10:36 +0900)]
getty-generator: unify add_serial_getty() and add_container_getty()
This also makes the generator not trigger an assertion added by
1cd3c49d09bf78a2a2e4cf25cb3d388e1f08a709. If the getty.ttys.container
credential contains a line prefixed with '/dev/', then the assertion
assert(!path_startswith(tty, "/dev/"));
was triggered. This drops the offending assertion, and such lines
are handled gracefully now.
Also, an empty string, "/dev/", and "/dev/pts/" (that is, a directory
without tty name) are gracefully skipped now.
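The handling described above can be sketched in Python as follows. This is an illustration only, not the generator's C code: `normalize_tty` is a hypothetical helper, and plain string prefix matching stands in for the path-aware comparison the real code uses.

```python
from typing import Optional


def normalize_tty(line: str) -> Optional[str]:
    """Return the tty name without a leading /dev/, or None to skip the line."""
    tty = line.strip()
    if tty.startswith("/dev/"):
        tty = tty[len("/dev/"):]
    # Empty names and bare directories such as "pts/" carry no tty name.
    if not tty or tty.endswith("/"):
        return None
    return tty


lines = ["", "/dev/", "/dev/pts/", "/dev/pts/0", "ttyS0"]
ttys = [t for t in map(normalize_tty, lines) if t is not None]
```

Lines with a '/dev/' prefix are accepted after stripping the prefix, while lines that name no tty are dropped instead of tripping an assertion.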
Let's return the size in a return parameter instead of the return value.
If NULL is passed for that parameter, the caller doesn't care about the
size and expects a NUL-terminated string. In that case, look for an
embedded NUL byte and refuse the data if one is found.
This should lock things down a bit, as we'll systematically refuse
embedded NUL strings now when we expect strings.
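The calling convention can be sketched in Python, with the caveat that Python has no out-parameters. In this hypothetical helper `finish_read`, the flag `want_size` plays the role of passing a non-NULL size pointer, and raising `ValueError` stands in for returning an error.

```python
from typing import Optional, Tuple


def finish_read(data: bytes, want_size: bool) -> Tuple[bytes, Optional[int]]:
    """Return (data, size); size is None when the caller expects a string."""
    if want_size:
        # The caller handles binary data and gets the exact length back.
        return data, len(data)
    # The caller expects a string, so an embedded NUL would silently truncate it.
    if b"\0" in data:
        raise ValueError("embedded NUL byte in data expected to be a string")
    return data, None


blob, size = finish_read(b"a\0b", want_size=True)
text, no_size = finish_read(b"abc", want_size=False)
```

Calling `finish_read(b"a\0b", want_size=False)` raises `ValueError`, mirroring the refusal described above.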
Sonia Zorba [Tue, 18 Mar 2025 00:25:51 +0000 (02:25 +0200)]
hwdb: fix backspace not working on HP Pavilion laptop (#36777)
PR #34685 moved the handling of keys 66/65 from specific models to
generic HP laptops.
Key 66 has been linked to the `pickup_phone` function; however, this
action key is not available on all HP laptop models, particularly older
versions. On my HP Pavilion laptop, key 66 is mapped to the `backspace`
function, which caused the backspace key to stop working after the
change.
This PR fixes the issue on my **HP Pavilion Laptop 15-eg0xxx**.
I have placed the modifications under the Pavilion section, but I cannot
guarantee that this solution will apply to all Pavilion models.
Additionally, I have included a line that checks for "HP" instead of
solely searching for "Hewlett-Packard," as my model is simply labeled as
HP.
Currently, the unit is only referenced in transient_unit_set_properties()
via AddRef(); that reference, however, is dropped if a reconnection
is attempted. Make sure to explicitly re-add the reference in that case.
Yu Watanabe [Mon, 17 Mar 2025 22:34:03 +0000 (07:34 +0900)]
nsresourced,vmspawn: allow unpriv "tap" based networking in vmspawn (#36688)
This extends nsresourced to also allow delegation of a network tap
device (in addition to veth) to unpriv clients, with a strictly enforced
naming scheme.
This also tightens security on a couple of things:
* enforces polkit on all nsresourced ops too (though by default still
everything is allowed)
* puts a limit on delegated network devices
* forcibly cleans up delegated network devices when the userns goes away
tree-wide: refuse user/group records lacking UID or GID
userdb allows user/group records without UID/GID (it only really
requires a name), in order to permit "unfixated" records. But that means
we cannot just rely on the field to be valid. And we mostly got that
right, but not everywhere. Fix that.
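The kind of check this implies can be sketched in Python. This is a minimal illustration, not the tree-wide C change: `require_id` is a hypothetical helper, and the records are plain dictionaries using the `userName`, `uid` and `gid` field names of JSON user records.

```python
from typing import Any, Mapping


def require_id(record: Mapping[str, Any], field: str) -> int:
    """Return record[field] as a numeric ID, or raise if it is missing or invalid."""
    value = record.get(field)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value < 0:
        raise ValueError(f"record lacks a valid {field!r} field")
    return value


fixated = {"userName": "alice", "uid": 1000, "gid": 1000}
unfixated = {"userName": "bob"}  # permitted by userdb, but carries no UID yet

uid = require_id(fixated, "uid")
```

Passing `unfixated` to `require_id` raises `ValueError` rather than handing back a meaningless ID, which is the behaviour callers that need a UID or GID should get.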
nsresourced: check polkit before executing our operations
Let's tighten rules on namespace operations: let's always ask PK for
permission before doing anything.
Note that if polkit is absent we'll still allow things, and the default
PK policy will also still allow things, but there's now a clear way for
people to disallow things if they want, by modifying the PK policy.