anonymix007 [Tue, 22 Oct 2024 11:38:00 +0000 (14:38 +0300)]
uki: add new .dtbauto PE section type
.dtbauto section contains DT blobs, just like .dtb, the difference is
that multiple .dtbauto sections are allowed to be in a UKI and only one
is selected automatically
Temporarily drop an assert_cc() check in systemd-measure to make it compilable before the next commit
resolved: log error messages for openssl/gnutls context creation
In https://bugzilla.redhat.com/show_bug.cgi?id=2322937 we're getting
an error message:
Okt 29 22:21:03 fedora systemd-resolved[29311]: Could not create manager: Cannot allocate memory
I expect that this actually comes from dnstls_manager_init(), the
openssl version. But without real logs it's hard to know for sure.
Use EIO instead of ENOMEM, because the problem is unlikely to be actually
related to memory.
Daan De Meyer [Mon, 4 Nov 2024 11:21:21 +0000 (12:21 +0100)]
tmpfiles: Implement L? to only create symlinks if source exists
This allows a single tmpfiles snippet with lines to symlink directories
from /usr/share/factory to be shared across many different configurations
while making sure symlinks only get created if the source actually exists.
Yu Watanabe [Fri, 1 Nov 2024 21:31:25 +0000 (06:31 +0900)]
network: check if interface is initialized after enumeration completed
We enumerate interfaces at first, then enumerate other configurations
like addresses and so on. If we are running on a container, previously
we started to configure the enumerated interfaces before enumerating other
configurations.
Let's configure interfaces after all configurations are enumerated.
sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
We fucked that up in the original sd_listen() calls, and then we fixed
that on the newer flavours. But pour internal common implementation
should of course use the full range size_t, as it should be.
This then allows us to drop a redundant range check.
This cleans up the handling of the "unset_environment" parameter to
sd_listen() and related calls: the man pages claim we operate on it on
error too. Hence, actually do so in strictly all error paths. Previously
we'd miss out on some, because wrapper functions mishandled them.
This establishes a common pattern: a function to unset the relevant env
vars, that is called from a goto section at the botom on both success
and failure.
Martin Wilck [Wed, 30 Oct 2024 15:57:39 +0000 (16:57 +0100)]
udev-builtin-path_id: SAS wide ports must have num_phys > 1
Some kernel SAS drivers (e.g. smartpqi) expose ports with num_phys = 0. udev
shouldn't treat these ports as wide ports. SAS wide ports always have
num_phys > 1. See comments for sas_port_add_phy() in the kernel sources.
Sample data from a smartpqi system to illustrate the issue below.
Here the phy device is attached to port 0:0, which has no end devices attached
and the SAS end device (where sda is attached) is associated with SAS
port 0:1, which has no associated phy device. Thus num_phys for port-0:1 is 0.
This is arguably wrong, but it's how smartpqi has always set up its devices in
sysfs.
Daan De Meyer [Sun, 3 Nov 2024 11:54:20 +0000 (12:54 +0100)]
openssl-util: Query engine/provider pin via ask-password (#34948)
In mkosi, we want to support signing via a hardware token. We already
support this in systemd-repart and systemd-measure. However, if the
hardware token is protected by a pin, the pin is asked as many as 20
times when building an image as the pin is not cached and thus requested
again for every operation.
Let's introduce a custom openssl ui when we use engines and providers
and plug systemd-ask-password into the process. With
systemd-ask-password, the pin can be cached in the kernel keyring,
allowing us to reuse it without querying the user again every time to
enter the pin.
We use the private key URI as the keyring identifier so that the cached
pin can be shared across multiple tools.
Daan De Meyer [Wed, 30 Oct 2024 14:47:58 +0000 (15:47 +0100)]
openssl-util: Query engine/provider pin via ask-password
In mkosi, we want to support signing via a hardware token. We already
support this in systemd-repart and systemd-measure. However, if the
hardware token is protected by a pin, the pin is asked as many as 20
times when building an image as the pin is not cached and thus requested
again for every operation.
Let's introduce a custom openssl ui when we use engines and providers
and plug systemd-ask-password into the process. With systemd-ask-password,
the pin can be cached in the kernel keyring, allowing us to reuse it without
querying the user again every time to enter the pin.
We use the private key URI as the keyring identifier so that the cached pin
can be shared across multiple tools.
Note that if the private key is pin protected, openssl will prompt both when
loading the private key using the pkcs11 engine and when actually signing the
roothash. To make sure our custom UI is used when signing the roothash, we have
to also configure it with ENGINE_ctrl() which takes a non-owning pointer to
the UI_METHOD object and its userdata object which we have to keep alive so we
introduce a new AskPasswordUserInterface struct which we use to keep both objects
alive together with the EVP_PKEY object.
Because the AskPasswordRequest struct stores non-owning pointers to its fields,
we change repart to store the private key URI as a global variable again instead
of the EVP_PKEY object so that we can use the private key argument as the keyring
field of the AskPasswordRequest instance without running into lifetime issues.
Yu Watanabe [Sat, 2 Nov 2024 20:07:55 +0000 (05:07 +0900)]
network: free DHCP client and friends in link_free()
No functional change, at least now. Preparation for later commits.
But we are planning to extend KeepConfiguration= and also keep
addresses and so on assigned by other dynamic configuration protocol
like DHCPv6 or NDisc.
However, when link_free_engines() is called here, acquired addresses so
on by NDisc will be removed, even if link_stop_engines() handles
restarting networkd or KeepConfiguration= gracefully.
So, let's not free engines here, but free them later in link_free().
It is not necessary to be called here anyway.
Daan De Meyer [Thu, 31 Oct 2024 12:54:33 +0000 (13:54 +0100)]
efivars: Remove STRINGIFY() helper macros
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
Currently ask_password_auto() will always try to store the password into
the user keyring. Let's make this configurable so that we can configure
ask_password_auto() into the session keyring. This is required when working
with user namespaces, as the user keyring is namespaced by user namespaces
which makes it impossible to share cached keys across user namespaces by using
the user namespace while this is possible with the session keyring.
Daan De Meyer [Sat, 2 Nov 2024 21:13:31 +0000 (22:13 +0100)]
mkosi: Add extra tools tree packages required to run integration tests
With https://github.com/systemd/mkosi/pull/3164, we'll be able to run
arbitrary commands in the mkosi sandbox, which has /usr from the tools
tree if one is configured. Let's add the required packages to be able to
run meson to setup the integration tests. This allows running the integration
tests without having to install meson or other build dependencies on the
host system.
Luca Boccassi [Sat, 2 Nov 2024 12:04:49 +0000 (12:04 +0000)]
Add support for id-mapped mounts to Exec directories (#34078)
Currently, bind-mounted directories within a user/mount namespace get
the uid/gid stored on their files. If the host creates a file in the
source directory, it will still show as root in the namespace.
Id-mapping is a filesystem feature that allows a mount namespace to show
a different uid than what is actually stored on a file. Add support for
id-mappings to exec directories, so that the files within the mount
namespace are owned by the unprivileged uid/gid.
In the host namespace, creating a file "test":
```
root@abeltran-test:/var/lib/andresstatedir# ls -lah
total 8.0K
drwxr-xr-x 2 root root 4.0K Aug 21 23:48 .
drwx------ 3 root root 4.0K Aug 21 23:47 ..
-rw-r--r-- 1 root root 0 Aug 21 23:48 test
```
Within the unit namespace:
```
root@abeltran-test:/var/lib/sampleservice# ls -lah
total 4.0K
drwxr-xr-x 2 63750 63750 4.0K Aug 21 23:48 .
drwxr-xr-x 3 root root 60 Aug 21 23:47 ..
-rw-r--r-- 1 63750 63750 0 Aug 21 23:48 test
```
```
root@abeltran-test:/# mount | grep and
/dev/sda1 on /var/lib/private/andresstatedir type ext4 (rw,nosuid,noexec,relatime,idmapped,discard,errors=remount-ro,commit=30)
```
Luca Boccassi [Sat, 2 Nov 2024 11:27:28 +0000 (11:27 +0000)]
logind: respect SD_LOGIND_ROOT_CHECK_INHIBITORS with weak blockers (#34969)
The check for the old flag was not restored when the weak blocker was
added, add it back. Also skip polkit check for root for the weak
blocker, to keep compatibility with the previous behaviour.
Luca Boccassi [Thu, 31 Oct 2024 16:02:38 +0000 (16:02 +0000)]
logind: respect SD_LOGIND_ROOT_CHECK_INHIBITORS with weak blockers
The check for the old flag was not restored when the weak
blocker was added, add it back. Also skip polkit check for
root for the weak blocker, to keep compatibility with the
previous behaviour.
Daan De Meyer [Fri, 1 Nov 2024 12:05:46 +0000 (13:05 +0100)]
mkosi: Set BuildSourcesEphemeral=no in mkosi.clangd
We're just running a language server so no need to put a writable
overlay on top of the build sources to prevent modifications. This
hopefully helps the language server track modifications to the source
files better.
Luca Boccassi [Fri, 1 Nov 2024 12:25:35 +0000 (12:25 +0000)]
coredump: lock down EnterNamespace= mount even more (#34975)
Let's disable symlink following if we attach a container's mount tree to
our own mount namespace. We afte rall mount the tree to a different
location in the mount tree than where it was inside the container, hence
symlinks (if they exist) will all point to the wrong places (even if
relative, some might point to other places). And since symlink attacks
are a thing, and we let libdw operate on the tree, let's lock this down
as much as we can and simply disable symlink traversal entirely.
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. And also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace again.
Luca Boccassi [Mon, 28 Oct 2024 19:58:58 +0000 (19:58 +0000)]
core: add read-only flag for exec directories
When an exec directory is shared between services, this allows one of the
service to be the producer of files, and the other the consumer, without
letting the consumer modify the shared files.
This will be especially useful in conjunction with id-mapped exec directories
so that fully sandboxed services can share directories in one direction, safely.
Adrian Vovk [Fri, 2 Feb 2024 03:53:09 +0000 (22:53 -0500)]
homed: Allow user to change parts of their record
This allows an unprivileged user that is active at the console to change
the fields that are in the selfModifiable allowlists (introduced in a
previous commit) without authenticating as a system administrator.
Administrators can disable this behavior per-user by setting the
relevant selfModifiable allowlists, or system-wide by changing the
policy of the org.freedesktop.home1.update-home-by-owner Polkit action.
coredump: lock down EnterNamespace= mount even more
Let's disable symlink following if we attach a container's mount tree to
our own mount namespace. We afte rall mount the tree to a different
location in the mount tree than where it was inside the container, hence
symlinks (if they exist) will all point to the wrong places (even if
relative, some might point to other places). And since symlink attacks
are a thing, and we let libdw operate on the tree, let's lock this down
as much as we can and simply disable symlink traversal entirely.
coredump: rework protocol between coredump pattern handler and processing service (#34970)
In
https://github.com/systemd/systemd/commit/68511cebe58977ea68ae4f57c6462e979efd1cff
the ability to pass the
coredump's mount namespace fd from the coredump patter handler was added
to systemd-coredump. For this the protocol was augmented, in attempt to
provide both forward and backward compatibility.
The protocol as of v256: one or more datagrams with journal log fields
about the coredump are sent via an SOCK_SEQPACKET connection. It is
finished with a zero length datagram which carries the coredump fd (this
last datagram is called "sentinel" sometimes).
The protocol after
https://github.com/systemd/systemd/commit/68511cebe58977ea68ae4f57c6462e979efd1cff
is extended
so that after the sentinal a 2nd sentinel is sent, with a pair of fds:
the coredump fd *again* and a mount fd (acquired via open_tree()) of the
container's mount tree. It's a bit ugly to send the coredump fd a 2nd
time, but what's more important the implementation didn't work: since on
SOCK_SEQPACKET a zero sized datagram cannot be distinguished from EOF
(which is a Linux API design mistake), an early EOF would be
misunderstood as a zero size datagram lacking any fd, which resulted in
protocol termination.
Moreover, I think if we touch the protocol we should make the move to
pidfs at the same time.
All of the above is what this protocol rework addresses.
1. A pidfd is now sent as well
2. The protocol is now payload, followed by the coredump fd datagram (as
before). But now followed by a second empty datagram with a pidfd,
and a third empty datagram with the mount tree fd. Of this the latter
two or last are optional. Thus, it's now a stream of payload
datagrams with one, two or three fd-laden datagrams as sentinel. If
we read the 2nd or 3rd sentinel without an attached fd we assume this
is actually an EOF (whether it actually is one or not doesn't matter
here). This should provide nice up and down compatibility.
3. The mount_tree_fd is moved into the Context object. The pidfd is
placed there too, as a PidRef. Thus the data we pass around is now
the coredump fd plus the context, which is simpler and makes a lot
more semantical sense I think.
4. The "first" boolean is replaced by an explicit state engine enum