This makes sure we now use Varlink by default as the transport for
allocating sessions.
This reduces the time it takes to do one run0 cycle by roughly 10% on my
completely synthetic test setup (assuming the target user's service
manager is already started).
The D-Bus codepaths are kept in place for two reasons:
* To make upgrades easy
* If the user actually sets resource properties on the PAM session, we
  fall back to the D-Bus codepaths: we currently have no way to encode
  the scope properties in JSON, as this is only supported for D-Bus
  serialization.
The latter should be revisited once it is possible to allocate a scope
unit from PID1 via varlink.
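The fallback rule described above can be sketched as follows. This is only an illustration of the decision, not the actual PAM module code (which is C); SessionRequest, scope_properties and pick_transport() are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRequest:
    """Hypothetical stand-in for the parameters of a PAM session request."""
    user: str
    # Resource properties destined for the session's scope unit.
    scope_properties: dict = field(default_factory=dict)


def pick_transport(request: SessionRequest, varlink_available: bool = True) -> str:
    """Return which transport would be used to allocate the session."""
    # Scope properties can only be serialized for D-Bus, so fall back.
    if request.scope_properties:
        return "dbus"
    # During upgrades the Varlink service might not be there yet.
    if not varlink_available:
        return "dbus"
    return "varlink"
```

A request without resource properties goes over Varlink; one that sets, say, a memory limit takes the D-Bus path.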
Daan De Meyer [Wed, 15 Jan 2025 10:32:34 +0000 (11:32 +0100)]
test: Drop sandbox() from integration test wrapper (#36009)
With the latest changes, this is not required anymore as mkosi sandbox
will set up the proper $PATH to make sure the executables from the build
directory are used.
sysusers: emit audit events for user and group creation (#35957)
Background: Fedora/RHEL are switching to sysusers.d metadata for
creation of users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to act on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must
emit audit events so that it provides a drop-in replacement.
systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP
when adding users and groups. The operation "names" are copied from
shadow-utils, so the format of the events that are generated on success
should be identical. On failure, things are more complicated. We write
the whole file at once, so we first generate "success" messages
for each entry, then we try to write the files, and if things fail, we
generate failure messages for all entries that we failed to write.
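The ordering of the messages can be sketched like this. It is a simplified model under stated assumptions, not the sysusers implementation: add_entries(), write_files and emit are hypothetical names, and a single write failure is treated as affecting every pending entry.

```python
def add_entries(names, write_files, emit):
    """Report audit results for a batch of users/groups written in one go."""
    # Success is reported up front for every entry ...
    for name in names:
        emit(name, "success")
    try:
        # ... then all files are written at once.
        write_files(names)
    except OSError:
        # If the write fails, each entry we failed to write gets a
        # failure message on top of the earlier success message.
        for name in names:
            emit(name, "failure")
        return False
    return True
```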
sysusers: emit audit events for user and group creation
Background: Fedora/RHEL are switching to sysusers.d metadata for creation of
users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to act on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must emit
audit events so that it provides a drop-in replacement.
systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP when
adding users and groups. The operation "names" are copied from shadow-utils in
Fedora (which has a patch to change them from the upstream version), so the
format of the events that are generated on success should be identical.
The helper code is shared between sysusers and utmp-wtmp. I changed the
audit_fd variable to be unconditional. This way we can avoid ugly ifdeffery
every time the variable would be used. The cost is that 4 bytes of unused
storage might be present. This is negligible, and the compiler might even be
able to optimize that away if it inlines things.
Daan De Meyer [Wed, 15 Jan 2025 09:21:33 +0000 (10:21 +0100)]
test: Drop sandbox() from integration test wrapper
With the latest changes, this is not required anymore as mkosi sandbox
will set up the proper $PATH to make sure the executables from the build
directory are used.
Jeremy Linton [Fri, 10 Jan 2025 03:24:07 +0000 (21:24 -0600)]
confidential-virt: add detection for aarch64 CCA
The arm confidential compute architecture (CCA) provides a platform design for
confidential VMs running in a new realm context.
This can be detected by the existence of a platform device exported for the
arm-cca-guest driver, which provides attestation services via the realm
services interface (RSI) to the Realm Management Monitor (RMM).
Like the other methods systemd uses to detect Confidential VMs, checking
the sysfs entry only suggests that this is a confidential VM, so the result
should only be used for informative purposes, or to trigger further attestation.
Like the s390 detection logic, the sysfs path being checked is not labeled
as ABI, and may change in the future. It was chosen because it's
directly tied to the kernel's detection of the realm service interface rather
than to the Trusted Security Module (TSM), which is what is being triggered by
the device entry. The TSM module has a provider string of 'arm-cca-guest' which
could also be used, but that (IMHO) doesn't currently provide any additional
benefit, and it can fail if the module isn't loaded.
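As a rough sketch of the kind of check involved: the detection boils down to testing whether a sysfs entry exists. The function name is hypothetical, and the path is left as a parameter because this message does not spell it out and it is not ABI anyway.

```python
import os


def looks_like_cca_guest(platform_device_path: str) -> bool:
    """Return True if the given sysfs platform device entry exists.

    A positive result is only a hint that we run in a realm; it is not
    proof and must not replace attestation.
    """
    return os.path.exists(platform_device_path)
```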
More information can be found here:
https://developer.arm.com/documentation/den0125/0300
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Yu Watanabe [Fri, 10 Jan 2025 21:07:55 +0000 (06:07 +0900)]
udev-rules: do not change maximum log level when running in test mode
When udev rules are being tested, the log level specified by the
SYSTEMD_LOG_LEVEL= environment variable should be honored, and should not be
overridden by
the rules.
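In other words, the behaviour is roughly the following. This is a toy model, not the udev code; effective_max_log_level() and its parameters are hypothetical.

```python
def effective_max_log_level(current: int, requested_by_rule: int, test_mode: bool) -> int:
    """Return the maximum log level in effect after a rule asks to change it."""
    if test_mode:
        # Keep whatever was configured, e.g. via SYSTEMD_LOG_LEVEL=.
        return current
    return requested_by_rule
```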
Daan De Meyer [Tue, 14 Jan 2025 15:05:33 +0000 (16:05 +0100)]
man: Clarify systemd-notify and sd_notify() PID documentation
Let's clarify more explicitly that privileged calls to
systemd-notify --pid= and sd_pid_notify() effectively override any
configured NotifyAccess=main|exec for a service.
Daan De Meyer [Tue, 14 Jan 2025 12:53:26 +0000 (13:53 +0100)]
mkosi: Install libxslt on CentOS/Fedora instead of xsltproc
Same package, but xsltproc is a very recently introduced Provides
for libxslt, and isn't available on CentOS Stream 9, so let's install
the package directly instead.
Daan De Meyer [Mon, 13 Jan 2025 15:18:33 +0000 (16:18 +0100)]
mkosi: update fedora commit reference
* fd36e4c562 Rebuilt for the bin-sbin merge (2nd attempt)
* cddeca136f Rebuilt for the bin-sbin merge (2nd attempt)
* 20cc578e59 Enable signing systemd-boot on OBS builds
* b1bd57ecce Revert use of PrivateTmp=disconnected
* 30f50b1870 Drop patch numbers
* 1814bfe794 remove STI test
* 3a9c32b8a9 Version 257.2
* 4df2711a9f Add bcond for OBS-specific quirks
* e570cd53df spec: drop trailing whitespace
* c7379c9460 Replace 'udevadm hwdb' with systemd-hwdb
* 3386f5d704 Rename source .abignore file
* fd860fd12d Drop a build dependency on a linter package: pytest-flakes
* 133ae30e33 Drop patches based on %upstream macro instead of patch number
* e157552c6c Always build in release mode
* fc47a92e4a Re-enable upstream behaviour of systemd-tmpfiles --purge
* 62abb21906 Version 257.1
* 35e6814ef4 Add patch for test-time-util
* bd8339bf00 sysusers: support new ! line flag for creating fully locked accounts
* c2f5f4a68a Version 257
* 31aaef8e17 Enable slow tests during build
Daan De Meyer [Mon, 13 Jan 2025 15:11:07 +0000 (16:11 +0100)]
mkosi: update arch commit reference
* 8160e63e52 Limit logic required for building locally
* 3a62443e41 OBS build: add support for xz and zst compression formats
* 9667464ad7 Get rid of _tag variable
* 73dc492b5e upgpkg: 257.2-2: rebuild with changes for service restart
* 6b7355b5bb do not restart any templated units
* 332718f955 exclude vmspawn units from restart as well
* 5a749a6716 exclude systemd-nspawn@* services from restart
* 8a10796f8b upgpkg: 257.2-1: new upstream release
* 16294a0b44 Add support for building from git in OBS
* 38b664eed4 upgpkg: 257.1-1: new upstream release
* e26158dda9 upgpkg: 257-1: new upstream release
* c984b75c3f restart services after upgrade...
* 27fae2c192 upgpkg: 256.9-1: new upstream release
* 1afdd08a60 upgpkg: 256.8-2: apply: shutdown: close DM block device before issuing DM_DEV_REMOVE ioctl
In containers securityfs is typically not mounted. Our lsm-bpf code
so far detected this situation and claimed the kernel was lacking
lsm-bpf support. Which isn't quite true though, it might very well
support it. This made boots of systemd in systemd-nspawn a bit ugly,
because of the misleading log message at boot.
Let's improve things, and make clearer what is going on.
While at it, turn the retval check for sd_bus_track_count_name()
into an assertion, given we're working with already established tracks
(service_name_is_valid() should never yield false in this case).
Luca Boccassi [Fri, 10 Jan 2025 21:02:55 +0000 (21:02 +0000)]
stub: drop PE sections parsing cap
This was added originally as it was thought that Windows applied
the same cap. Nowadays the specs do not mention it, and it is
believed Windows no longer applies it either, so drop it in order
to allow an arbitrary number of DTBs to be included.
Daan De Meyer [Fri, 10 Jan 2025 14:26:54 +0000 (15:26 +0100)]
fmf: Use different heuristic on beefy systems
If we save journals in /tmp, we can run a larger number of tests in
parallel so let's make use of the larger number of CPUs if the tests
run on a beefy machine.
Daan De Meyer [Fri, 10 Jan 2025 13:51:24 +0000 (14:51 +0100)]
test: Move StateDirectory= directive into dropin
The integration-test-setup calls require StateDirectory= but some
tests override the test unit used which then won't have StateDirectory=
so let's move StateDirectory= into the dropin as well to avoid this
issue.
Daan De Meyer [Fri, 10 Jan 2025 13:27:33 +0000 (14:27 +0100)]
test: Add option to save in progress test journals to /tmp
The journal isn't the best at being fast, especially when writing
to disk and not to memory, which can cause integration tests to
grind to a halt on beefy systems due to all the systemd-journal-remote
instances not being able to write journal entries to disk fast enough.
Let's introduce an option to write in-progress test journals to /tmp,
which can be used on beefy systems with lots of memory to
speed things up.
varlink: send linux errno name along with errno number in generic system error replies (#35912)
Let's make things a bit less Linux specific, and more debuggable, by
including not just the error number but also the error name in the
generic io.systemd.System errors we generate when all we have is an
"errno".
process-util: make pidref_safe_fork_full() work with FORK_WAIT
(This is useful for the test case added in the next commit, where it's
kinda nice being able to use pidref_safe_fork_full() and acquiring a
pidref of the child in the child in one go. There's no other value in
this than a bit of syntactic sugar for that test. But otoh there's no
good reason to prohibit FORK_WAIT use like this, hence either way, this
commit should be a good thing.)
varlink: tweak what we include in "system error" messages
We so far only included the numeric Linux errno. That's pretty Linux
specific however. Hence, let's improve things and include an origin
string, that clearly marks Linux as origin. Also, include the string
name of the error.
Take these two fields into account when translating back, too. So that
we prefer going by symbolic name rather than by numeric id.
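A minimal sketch of both directions, assuming the reply parameters are called "origin", "errno" and "errnoName" (these field names and both function names are assumptions of this sketch, not taken from the message above):

```python
import errno


def system_error_parameters(err: int) -> dict:
    """Build the parameters of a generic system error reply."""
    params = {"origin": "linux", "errno": err}
    name = errno.errorcode.get(err)
    if name is not None:
        params["errnoName"] = name
    return params


def errno_from_parameters(params: dict):
    """Translate reply parameters back into a local errno, or None."""
    # Numbers and names from a different origin mean nothing to us.
    if params.get("origin", "linux") != "linux":
        return None
    # Prefer the symbolic name over the numeric id.
    name = params.get("errnoName")
    if isinstance(name, str):
        value = getattr(errno, name, None)
        if isinstance(value, int):
            return value
    number = params.get("errno")
    return number if isinstance(number, int) else None
```

Preferring the name means a peer whose numbering differs from ours still gets mapped to the right local error, as long as the symbolic name is known.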
sd-json: make it safe to call sd_json_dispatch_full() with a NULL table
This is useful for generating good errors when dispatching varlink
methods that take no parameters, as we'll still generate precise errors
in that case, treating a NULL table as equivalent to one with no
entries.
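The idea can be modelled in a few lines. This is a toy dispatcher, not the sd-json API: dispatch() is a hypothetical function, and the table is simply a mapping from field name to a parser callable.

```python
def dispatch(obj: dict, table=None) -> dict:
    """Parse the fields of obj according to table, rejecting unknown ones."""
    # A missing table behaves exactly like an empty one.
    table = table or {}
    parsed = {}
    for key, value in obj.items():
        if key not in table:
            # Name the offending field, so the caller gets a precise error.
            raise ValueError(f"Unexpected field '{key}'")
        parsed[key] = table[key](value)
    return parsed
```

A method that takes no parameters can then call dispatch(params) without a table and still reject stray fields with a useful message.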
logind: rework session creation logic, to be more reusable for varlink codepaths
This separates the preparatory checks that generate D-Bus errors from
the code that actually allocates the session. This makes the logic easier
to follow and prepares ground so that we can reuse the 2nd part later
when exposing session creation via Varlink.
userdb: define new 64K "foreign UID" range (#35932)
This establishes the basic concepts for #35685, in the hope of getting this
merged first.
This defines a special, fixed 64K UID range that is supposed to be used
by directory container images on disk, that is mapped to a dynamic UID
range at runtime (via idmapped mounts).
This enables a world where each container can run with a dynamic UID
range, without that range leaking onto the disk, which would make
supposedly dynamic, transient UID range assignments persistent.
This is infrastructure later used for the primary part of #35685: unpriv
container execution with directory images inside user's home dirs, that
are assigned to this special "foreign UID range".
This PR only defines the ranges, synthesizes NSS records for them via
userdb, and then exposes them in a new "systemd-dissect --shift" command
that can re-chown a container directory tree into this range (and in
fact any range).
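The arithmetic behind shifting an ID from one 64K range into another is simple. The sketch below only shows that calculation; shift_uid() is a hypothetical helper, and the base values are supplied by the caller because this description does not state them.

```python
RANGE_SIZE = 0x10000  # 64K IDs per range


def shift_uid(uid: int, source_base: int, target_base: int) -> int:
    """Map uid from the range at source_base to the range at target_base."""
    offset = uid - source_base
    if not 0 <= offset < RANGE_SIZE:
        raise ValueError(f"UID {uid} is outside the source range")
    # The offset within the range is preserved, only the base changes.
    return target_base + offset
```

Re-chowning a directory tree would apply this to the owner UID and GID of every inode; at runtime an idmapped mount performs the equivalent mapping without touching the disk.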
This comes with docs. But no tests. There are tests in #35685 that cover
all this, but they are more comprehensive and also test nspawn's hook-up
with this, hence are excluded from this PR.
Daan De Meyer [Thu, 9 Jan 2025 14:45:41 +0000 (15:45 +0100)]
fmf: Use one fewer than number of available CPUs again
This effectively reverts b8582198ca1e6fe390f7169e623a9130b68a6b36
as I can not get the testing farm bare metal machines working
downstream and even if I managed to, without also using the testing
farm bare metal machines upstream (for which there is no capacity),
the setup would very quickly bitrot anyway so we'll just run the
container based tests for now.
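The heuristic named in the subject amounts to something like the following. The function name is hypothetical, and the floor of one job is an assumption of this sketch so that single-CPU machines still run tests.

```python
import os


def parallel_jobs(available_cpus=None) -> int:
    """Return one fewer than the number of available CPUs, but at least 1."""
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    return max(1, available_cpus - 1)
```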