While at it, turn the retval check for sd_bus_track_count_name()
into assertion, given we're working with already established tracks
(service_name_is_valid() should never yield false in this case).
Luca Boccassi [Fri, 10 Jan 2025 21:02:55 +0000 (21:02 +0000)]
stub: drop PE sections parsing cap
This was added originally as it was thought that Windows applied
the same cap. Nowadays the specs do not mention it, and it is
believed Windows no longer applies it either, so drop it in order
to allow an arbitrary number of DTBs to be included
Daan De Meyer [Fri, 10 Jan 2025 14:26:54 +0000 (15:26 +0100)]
fmf: Use different heuristic on beefy systems
If we save journals in /tmp, we can run a larger number of tests in
parallel so let's make use of the larger number of CPUs if the tests
run on a beefy machine.
Daan De Meyer [Fri, 10 Jan 2025 13:51:24 +0000 (14:51 +0100)]
test: Move StateDirectory= directive into dropin
The integration-test-setup calls require StateDirectory= but some
tests override the test unit used which then won't have StateDirectory=
so let's move StateDirectory= into the dropin as well to avoid this
issue.
Daan De Meyer [Fri, 10 Jan 2025 13:27:33 +0000 (14:27 +0100)]
test: Add option to save in progress test journals to /tmp
The journal isn't the best at being fast, especially when writing
to disk and not to memory, which can cause integration tests to
grind to a halt on beefy systems due to all the systemd-journal-remote
instances not being able to write journal entries to disk fast enough.
Let's introduce an option to allow writing in progress test journals
to use /tmp which can be used on beefy systems with lots of memory to
speed things up.
varlink: send linux errno name along with errno number in generic system error replies (#35912)
Let's make things a bit less Linux specific, and more debuggable, by
including not just the error number but also the error name in the
generic io.systemd.System errors we generate when all we have is an
"errno".
process-util: make pidref_safe_fork_full() work with FORK_WAIT
(This is useful for the test case added in the next commit, where it's
kinda nice being able to use pidref_safe_fork_full() and acquiring a
pidref of the child in the child in one go. There's no other value in
this than a bit of synctactic sugar for that test. But otoh thre's no
good reason to prohibit FORK_WAIT use like this, hence either way, this
commit should be a good thing.)
varlink: tweak what we include in "system error" messages
We so far only included the numeric Linux errno. That's pretty Linux
specific however. Hence, let's improve things and include an origin
string, that clearly marks Linux as origin. Also, include the string
name of the error.
Take these two fields into account when translating back, too. So that
we prefer going by symbolic name rather than by numeric id.
sd-json: make it safe to call sd_json_dispatch_full() with a NULL table
This is useful for generating good errors when dispatching varlink
methods that take no parameters, as we'll still generate precise errors
in that case, taking a NULL table as equivalent as one with no
entries.
logind: rework session creation logic, to be more reusable for varlink codepaths
This separates the preparatory checks that generate D-Bus errors from
the code that actually allocates the session. This make the logic easier
to follow and prepares ground so that we can reuse the 2nd part later
when exposing session creation via Varlink.
userdb: define new 64K "foreign UID" range (#35932)
This is establish the basic concepts for #35685, in the hope to get this
merged first.
This defines a special, fixed 64K UID range that is supposed to be used
by directory container images on disk, that is mapped to a dynamic UID
range at runtime (via idmapped mounts).
This enables a world where each container can run with a dynamic UID
range, but this in no way leaks onto the disk, thus making supposedly
dynamic, transient UID range assignments persistent.
This is infrastructure later used for the primary part of #35685: unpriv
container execution with directory images inside user's home dirs, that
are assigned to this special "foreign UID range".
This PR only defines the ranges, synthesizes NSS records for them via
userdb, and then exposes them in a new "systemd-dissect --shift" command
that can re-chown a container directory tree into this range (and in
fact any range).
This comes with docs. But no tests. There are tests in #35685 that cover
all this, but they are more comprehensive and also test nspawn's hook-up
with this, hence are excluded from this PR.
Daan De Meyer [Thu, 9 Jan 2025 14:45:41 +0000 (15:45 +0100)]
fmf: Use one fewer than number of available CPUs again
This effectively reverts b8582198ca1e6fe390f7169e623a9130b68a6b36
as I can not get the testing farm bare metal machines working
downstream and even if I managed to, without also using the testing
farm bare metal machines upstream (for which there is no capacity),
the setup would very quickly bitrot anyway so we'll just run the
container based tests for now.
Stash the subscriber list when we disconenct from the bus (#35406)
If we unexpectly disconnect from the bus, systemd would end up dropping
the list of subscribers, which breaks the ability of clients like logind
to monitor the state of units.
Stash the list of subscribers into the deserialized state in the event
of a disconnect so that when we recover we can renew the broken
subscriptions.
pam: add session class "none" to disable logind sessions (#35171)
pam_systemd is used to create logind sessions and to apply extended
attributes from json user records. Not every application that creates a
pam session expects a login scope, but may be interested in the extended
attributes of json user records. Session class "none" implements this
service by disabling logind for this session altogether.
Daan De Meyer [Thu, 9 Jan 2025 10:59:58 +0000 (11:59 +0100)]
TEST-06-SELINUX: Add knob to allow checking for AVCs (#35921)
When running the integration tests downstream, it's useful to be able to
test that a new systemd version doesn't introduce any AVC denials, so
let's add a knob to make that possible.
Daan De Meyer [Wed, 8 Jan 2025 12:31:11 +0000 (13:31 +0100)]
TEST-06-SELINUX: Add knob to allow checking for AVCs
When running the integration tests downstream, it's useful to be
able to test that a new systemd version doesn't introduce any AVC
denials, so let's add a knob to make that possible.
Daan De Meyer [Thu, 9 Jan 2025 10:28:15 +0000 (11:28 +0100)]
test: Only plug in integration-test-setup.sh in interactive mode
If we're not running interactively, there's no point in the features
from integration-test-setup.sh which are intended for interactive
development and debugging so lets skip adding it in that case.
Yu Watanabe [Tue, 7 Jan 2025 18:23:29 +0000 (03:23 +0900)]
sd-device: make sd_device_new_from_path() accept relative path to device node
Even though udevadm accepts relative syspath, previously, udevadm
could not use relative path to device node:
===
$ cd /dev
$ udevadm info sda
Bad argument "sda", expected an absolute path in /dev/ or /sys/ or a unit name: Invalid argument
$ udevadm info /usr/../dev/sda
Unknown device "/usr/../dev/sda": No such device
===
With this change, both the above cases work fine.
Note, still sd_device_new_from_devname() requires absolute path starts
with /dev/, for safety.
Daan De Meyer [Wed, 8 Jan 2025 21:20:42 +0000 (22:20 +0100)]
fmf: Use different heuristic for number of process with many CPUs
Downstream we sometimes end up with machines with lots of CPUs which
leads to running out of memory when trying to run the tests in VMs.
So let's switch to a different heuristic when we have lots of CPUs to
avoid running out of memory.
Ronan Pigott [Thu, 28 Nov 2024 19:53:32 +0000 (12:53 -0700)]
dbus: stash the subscriber list when we disconenct from the bus
If we unexpectly disconnect from the bus, systemd would end up dropping
the list of subscribers, which breaks the ability of clients like logind
to monitor the state of units.
Stash the list of subscribers into the deserialized state in the event
of a disconnect so that when we recover we can renew the broken
subscriptions.
hwids: add a new efi firmware type of device entry (#35747)
This change adds a new firmware type device entry for the .hwids
section.
It also adds compile time validations and appropriate unit tests for
them.
chid_match() and related helpers have been updated accordingly.
Duplicate of https://github.com/systemd/systemd/pull/35281
Last review feedback's from this above PR has been incorporated and
merged.
Remove no longer needed login-options override. Fixes agetty autologin.
The need for -o was introduced in db6aeda to set the -p flag for login.
Setting -o overrides agettys built-in handling of arguments, so "-- \\u" was needed to mimic it.
This broke the autologin-feature, since the -f (noauth) flag is not passed to login [1].
But with 3d2157e, the -p flag is dropped, but the full change wasn't reverted,
leaving autologin still broken - But for no reason since agetty does the right thing.
This makes the UID range configurable via build time options, but of
course it really shouldn't be changed. The default range I picked is
outside even of IPAs current (ridiculously large) allocation ranges,
hence hopefully minimizes conflicts.
nsresource: optionally mangle userns names passed to nsresourced (#35900)
We enforce quite strict rules on naming userns we assign uid ranges to
for users. So strict that they are hard to get right for clients. hence,
let's optionally mangle provided strings so that they work for us.
This should make it much easier to work with the API, as something
reasonable happens regarldess what kind of garbage a client sets as
name.
mangling the name is opt-in for clients, so that there's tight control
for the client on the name, but also "fire and forget".
pid1: allow removal of foreign-owned subcgroups of cgroups owned by some user (#35922)
This improves operation in unprivileged userns environments, where
unpriv user code might invoke a container with a delegated userns UID
range, and thus ends up with a subcgroup owned by another UID. With this
patch any user is always allowed to remove their own cgroups even if it
has subcgroups owned by other users.
This removes a DoS of sorts, and enforces the rule that users strictly
own everything below cgroups they own.
test: add testcase that verifies we can safely delete subcgroups owned by other users if we own the parent
This is a test for the previous commits: we create an unpriv, delegated cgroup in
--user mode, then create a subcgroup that is owned by some other user
(to mimic the case where an unpriv user got a userns with delegated UIDs
assigned), and then try to stop the unit. traditionally this would fail,
because our unpriv systemd --user instance can't remove the subcrroup
owned by someone else. With the earlier patches this is addressed.
pid1: add D-Bus API for removing delegated subcgroups
When running unprivileged containers, we run into a scenario where an
unpriv owned cgroup has a subcgroup delegated to another user (i.e. the
container's own UIDs). When the owner of that cgroup dies without
cleaning it up then the unpriv service manager might encounter a cgroup
it cannot delete anymore.
Let's address that: let's expose a method call on the service manager
(primarly in PID1) that can be used to delete a subcgroup of a unit one
owns. This would then allow the unpriv service manager to ask the priv
service manager to get rid of such a cgroup.
This commit only adds the method call, the next commit then adds the
code that makes use of this.
pid1: allow moving processes in a userns owned by the user, too
Let's liberalize process migration a bit. Previously, PID 1 would only
allow you to move processes into your own cgroups, if those processes
are owned by you too. This is now slightly relaxed: it's now also OK if
the processes are in a userns owned by you.
This makes process migration more useful in context of unpriv userns.
nl6720 [Wed, 8 Jan 2025 14:19:33 +0000 (16:19 +0200)]
dissect-image: mount the ESP with fmask=0177 (#35871)
Avoid showing the files on the ESP (i.e. a FAT formatted volume) as
executable by removing the execute permission from them.
IMO this makes the colored output of `ls` more sensible since the file
system will be mounted with `noexec` anyway.
Add a `fstype_can_fmask_dmask` function that checks if a file system
type can use the `fmask` and `dmask` mount options.
This replaces `fstype_can_umask` since it was only used in
`partition_pick_mount_options` which only cares about the file system
support for fmask & dmask now.
It somewhat reduces the coverage of the feature since there are more file
systems that support umask as opposed to those supporting dmask & dmask,
but it should not be much of an issue since fmask & dmask are supported
by vfat, exfat and ntfs3.
nsresourced: add ability to mangle specified name if necessary
Let's optionally mangle any passed name on the server side so that it is
useful for identifying a userns, if it isn't suitable for that
right-away. This mostly means truncating it if too long.
It's just too nasty to leave this to the client side, since they'd have
to understand the precise rules for naming userns then.
While we are at it, add full Varlink IDL comments.