Follow-up for dbef4dd4f23517abfc73b35f0bdf004d2f8f4805. Everything that that
commit says is true, but — at least for me — it wasn't obvious why the code is
correct and we can do fixed-size allocations like new(struct inotify_data, 1).
In sd_event_source.child, we have 5 bools. If we make them each take one byte,
the structure size increases. So let's do that for three of them, and leave
the other two (less frequently used) squished into the last byte. This allows
more efficient code to be generated, without changing the size of the struct:
sd-event: drop some bitfield specifiers from struct sd_event_source
This does not change the size of the structure, because the size is determined
by .child, which has a 128-byte siginfo_t field. But by dropping the specifiers
we let the compiler generate code that operates on full bytes instead of having
to play with bitmasks, see second diff below.
Also move the bools in .memory_pressure into a gap to save a few bytes on
initialization.
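A minimal sketch of the layout argument, using hypothetical stand-in structs rather than the real struct sd_event_source: a large member fixes the size, and trailing padding absorbs the extra bytes used by full-byte bools. The struct and function names are assumptions for illustration, and bitfield layout is implementation-defined, so the size check only reflects common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: one large 128-byte member followed by five flags,
 * all squeezed into bitfields. */
struct flags_bitfield {
        uint64_t payload[16];
        bool a:1;
        bool b:1;
        bool c:1;
        bool d:1;
        bool e:1;
};

/* Same flags, but the three frequently used ones take a full byte each. */
struct flags_mixed {
        uint64_t payload[16];
        bool a;
        bool b;
        bool c;
        bool d:1;
        bool e:1;
};

/* On common ABIs the trailing padding absorbs the three extra bytes. */
static_assert(sizeof(struct flags_bitfield) == sizeof(struct flags_mixed),
              "expected identical struct sizes");

/* Bitfield store: the compiler has to read, mask and write back the byte. */
void set_a_bitfield(struct flags_bitfield *s, bool v) {
        assert(s);
        s->a = v;
}

/* Full-byte store: a single byte write is enough. */
void set_a_mixed(struct flags_mixed *s, bool v) {
        assert(s);
        s->a = v;
}
```

Comparing the generated assembly of the two setters (for example with `gcc -O2 -S`) is one way to see the difference the commit message describes.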
Yu Watanabe [Fri, 6 Jun 2025 15:09:37 +0000 (00:09 +0900)]
test: extend timeout and enable generating debugging logs
Not sure why the test failed, but maybe the test environment is too
slow? Even if this does not fix the failure, enabling debugging logs
should provide more useful information for debugging.
Yu Watanabe [Fri, 6 Jun 2025 16:55:21 +0000 (01:55 +0900)]
run: ignore bus connection error in acquiring invocation ID (#37763)
This introduces bus_error_is_connection() and uses it where applicable.
This also makes systemd-run handle connection errors gracefully when
acquiring the invocation ID, like we already do in other places.
Yu Watanabe [Fri, 6 Jun 2025 12:14:20 +0000 (21:14 +0900)]
sd-device: replace '!' with '/' before calling sd_device_new_from_subsystem_sysname()
The device ID uses the device directory name as is, hence may contain '!',
but sd_device_new_from_subsystem_sysname() expects a sysname as input.
So, we need to replace '!' with '/'.
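The character replacement described here can be sketched as follows. The helper name dirname_to_sysname() is hypothetical and not part of sd-device. It only illustrates the in-place conversion that has to happen before the string is handed to sd_device_new_from_subsystem_sysname().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: turn a sysfs directory name into a sysname by
 * replacing every '!' with '/', in place. Returns the same buffer. */
char *dirname_to_sysname(char *s) {
        assert(s);

        for (char *p = s; (p = strchr(p, '!')); p++)
                *p = '/';

        return s;
}
```

For example, a buffer holding "cciss!c0d0" ends up holding "cciss/c0d0".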
repart: try harder to find verity-sig partitions for CopyBlocks=auto
verity-sig partitions are not kernel concepts, hence dm-verity won't
link them for us from the slaves/ subdir in sysfs. Hence let's instead
look up the partition via udev's database.
Hence: when we search for the data+verity+verity-sig partitions, we
search for the first two as usual, but find the third by looking up the
udev props on the first two and then following the paths provided
therein.
udev: add udev properties that point to verity/verity sig metadata partitions from data partitions
This extends the dissect_image builtin to actually add device node
references to the device nodes where the associated data is placed, if
we can find it.
This is kept very generic, and independent from the roothash properties
and suchlike, since it makes sense to make it possible to set these
properties also independently of the dissect-image builtin.
The device path is a /dev/disk/by-diskseq/ symlink, so that we have
stable references that are not subject to dev_t reuse.
And rework partition_designator_is_verity_sig() to be based on
partition_verity_sig_to_data(), so that we don't have to maintain two
lists of verity sig partition types.
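The idea of deriving the predicate from the mapping function, so that only one list needs maintaining, might look roughly like this. The enum is a reduced, hypothetical stand-in for the real designator list, and the function names are shortened on purpose to make clear this is not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced designator list. */
typedef enum {
        DESIGNATOR_ROOT,
        DESIGNATOR_USR,
        DESIGNATOR_ROOT_VERITY,
        DESIGNATOR_USR_VERITY,
        DESIGNATOR_ROOT_VERITY_SIG,
        DESIGNATOR_USR_VERITY_SIG,
        DESIGNATOR_INVALID = -1,
} Designator;

static_assert(DESIGNATOR_INVALID < 0, "invalid marker must not collide with a real designator");

/* The single list: maps a verity signature designator to its data
 * designator, or returns DESIGNATOR_INVALID for anything else. */
Designator verity_sig_to_data(Designator d) {
        switch (d) {
        case DESIGNATOR_ROOT_VERITY_SIG:
                return DESIGNATOR_ROOT;
        case DESIGNATOR_USR_VERITY_SIG:
                return DESIGNATOR_USR;
        default:
                return DESIGNATOR_INVALID;
        }
}

/* The predicate is derived from the mapping, so there is no second list. */
bool designator_is_verity_sig(Designator d) {
        return verity_sig_to_data(d) != DESIGNATOR_INVALID;
}
```

Adding a new signature partition type then only requires touching the switch statement.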
Yu Watanabe [Fri, 6 Jun 2025 10:12:48 +0000 (19:12 +0900)]
sd-lldp-rx: add VLAN ID parsing (#37725)
While the `port_vlan_id` field was already present in
`sd_lldp_neighbor`, it was not yet parsed from the LLDP packet.
This adds support for that, as well as a small parsing test.
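For illustration, parsing a Port VLAN ID out of an LLDP TLV could be sketched as below. This is not the sd-lldp-rx code: the function name and error codes are assumptions. It relies on the IEEE 802.1 organizationally specific TLV layout (OUI 00-80-C2, subtype 1, followed by a 16-bit big-endian VLAN ID) and expects the TLV information string as input.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser. 'value' is the information string of an
 * organizationally specific TLV: OUI (3 bytes), subtype (1 byte), data. */
int parse_port_vlan_id(const uint8_t *value, size_t len, uint16_t *ret) {
        static const uint8_t oui_802_1[3] = { 0x00, 0x80, 0xc2 };

        assert(value);
        assert(ret);

        if (len < 4)
                return -EBADMSG;  /* too short to carry OUI and subtype */

        if (memcmp(value, oui_802_1, sizeof oui_802_1) != 0 || value[3] != 1)
                return -ENOENT;   /* some other TLV, not a Port VLAN ID */

        if (len != 6)
                return -EBADMSG;  /* Port VLAN ID payload is exactly 2 bytes */

        /* The VLAN ID is transmitted in network byte order. */
        *ret = (uint16_t) ((value[4] << 8) | value[5]);
        return 0;
}
```

The output parameter is only written on success, so callers can keep a previous value on error.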
journal: replace a bunch of assert() with friendlier checks
We should not rely on data stored in the journal files remaining
entirely untouched at all times. Because we deallocate files, data might
go away at any time. Hence, never assert() on any expectations about what
the file contains. Instead, handle it more gracefully as a corruption
issue, and return EBADMSG.
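The pattern can be sketched as follows: assert() stays reserved for programming errors in our own callers, while anything read from the mapped file is validated and reported as -EBADMSG. The object header layout and the function name are hypothetical simplifications, and endianness conversion is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified object header as it might sit in a mapped file. */
struct object_header {
        uint8_t type;
        uint8_t flags;
        uint8_t reserved[6];
        uint64_t size;          /* total object size, header included */
};

int object_payload_size(const void *map, size_t map_size, uint64_t offset, uint64_t *ret) {
        struct object_header h;

        /* Wrong parameters are a bug in our own code: assert() is fine. */
        assert(map);
        assert(ret);

        /* Everything below depends on file contents: never assert(). */
        if (offset > map_size || map_size - offset < sizeof h)
                return -EBADMSG;

        /* Copy the header out so that later checks see one consistent value. */
        memcpy(&h, (const uint8_t*) map + offset, sizeof h);

        if (h.size < sizeof h || h.size > map_size - offset)
                return -EBADMSG;

        *ret = h.size - sizeof h;
        return 0;
}
```

Callers can then treat -EBADMSG as a corruption signal and skip or rotate the file instead of aborting.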
This is just paranoia: let's determine the compression to use once,
instead of twice. After all, the data is in journal files which might be
corrupted at any time, and it would be weird if we came to different
results here each time.
journal: use EBADMSG for invalid data in file mmap
We must assume that any data in the mmap can change anytime because the
file is deallocated or similar. Let's strictly use EBADMSG for reporting
invalid file contents though (as opposed to using EINVAL if our own code
passes a wrong parameter somewhere).
Daan De Meyer [Thu, 5 Jun 2025 10:14:45 +0000 (12:14 +0200)]
meson: Don't fail install script if file doesn't exist
Depending on which optional features are enabled, the NSS module
might not have been built, which means the custom install script
will fail to remove the file. Let's pass -f so it succeeds regardless
of whether the file exists or not.
We allow omission of the part before and the part after the @. But so
far we didn't allow omitting both. There's no real reason for
disallowing that, hence be systematic and allow it.
journalctl: politely refuse if non-root usernames are specified for --machine=
We currently cannot support that (supporting that would probably require
some active component in the machine, or alternatively idmapped mounts
or so), hence politely refuse it.
run: chop off username from --machine= argument before calling OpenMachinePTY()
Let's be compatible with sd-bus' logic to talk to machines, and support
the usual user@host syntax. We only want the host part, hence chop the
username off before passing it to OpenMachinePTY().
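Chopping off the user part of a user@host specification can be sketched like this. The function name is hypothetical, and splitting at the last '@' is an assumption made for this example, not necessarily what systemd-run does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the host part of a "[user@]host" string.
 * The result points into the input, nothing is allocated.
 * Assumption: the split happens at the last '@'. */
const char *machine_spec_host(const char *spec) {
        const char *at;

        assert(spec);

        at = strrchr(spec, '@');
        return at ? at + 1 : spec;
}
```

With this sketch, "user@mycontainer" and "@mycontainer" both yield "mycontainer", and a string without '@' is returned unchanged.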
machined: open up OpenMachinePTY() for unpriv clients
The method call already does a PK check; we just forgot to allowlist it
in the dbus policy and in the dbus vtable entry for OpenMachinePTY().
(It was allowlisted in the per-machine
vtable…)
core: Make sure we handle DelegateSubgroup= in combo with cgroupns
Currently, if we use a cgroup namespace together with DelegateSubgroup=,
the subgroup becomes the root of the cgroup namespace because we move the
service process to the subgroup before we unshare the cgroup namespace, and
the current cgroup becomes the root of the cgroup namespace when we unshare
the cgroup namespace.
Let's fix the problem by not moving the service process to the subgroup until
we've unshared the cgroup namespace. Note that this doesn't break the primary use
case of CLONE_INTO_CGROUP since we still use it to immediately clone into the service
main cgroup, just not anymore into the subgroup, but this shouldn't matter in practice.
Additionally, we need special handling for control processes, as those *do*
need to get spawned into the subcgroup immediately if delegation is configured to
avoid violating the cgroupsv2 "no inner processes" rule.
Effectively, this leaves us with the following logic:
- In exec_spawn(), spawn into subgroup if we're spawning a control process
that needs to be spawned into a subgroup immediately. Otherwise, spawn into
main service cgroup.
- In exec_invoke(), move into subgroup early if we don't need a cgroup namespace.
Otherwise, move into subgroup after we've unshared the cgroup namespace.
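The two decisions listed above can be written down as a small sketch. All type and function names are hypothetical, and the real exec_spawn() and exec_invoke() do far more than this. The sketch only captures where the process starts out and when it is moved into the subgroup.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
        SPAWN_INTO_MAIN_CGROUP,
        SPAWN_INTO_SUBGROUP,
} SpawnTarget;

typedef enum {
        MOVE_NONE,              /* nothing to do, or already in the subgroup */
        MOVE_BEFORE_UNSHARE,    /* no cgroup namespace needed: move early */
        MOVE_AFTER_UNSHARE,     /* move only once the namespace is unshared */
} SubgroupMove;

/* Spawn step: only control processes that must start in the subgroup
 * are cloned into it directly. */
SpawnTarget spawn_target(bool control_needs_subgroup) {
        return control_needs_subgroup ? SPAWN_INTO_SUBGROUP : SPAWN_INTO_MAIN_CGROUP;
}

/* Invoke step: decide when the remaining processes enter the subgroup. */
SubgroupMove invoke_subgroup_move(SpawnTarget spawned, bool subgroup_configured, bool needs_cgroup_namespace) {
        assert(spawned == SPAWN_INTO_MAIN_CGROUP || spawned == SPAWN_INTO_SUBGROUP);

        if (!subgroup_configured || spawned == SPAWN_INTO_SUBGROUP)
                return MOVE_NONE;

        return needs_cgroup_namespace ? MOVE_AFTER_UNSHARE : MOVE_BEFORE_UNSHARE;
}
```

Moving after the unshare keeps the main service cgroup, rather than the subgroup, as the root of the new cgroup namespace.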
Daan De Meyer [Thu, 5 Jun 2025 09:04:06 +0000 (11:04 +0200)]
meson: Remove unnecessary deps from libsystemd-static build
blkid, libmount and openssl are not used in src/basic or src/libsystemd,
and so shouldn't be required as deps of libsystemd static, so let's drop
them.
Mike Yuan [Sun, 1 Jun 2025 07:12:13 +0000 (09:12 +0200)]
sd-daemon: add sd_pidfd_get_inode_id()
We nowadays expose pidfdid at various places, e.g. envvars
and dbus properties. Also the sd_notify() MAINPID= message
has been complemented with MAINPIDFDID=. But acquiring
pidfdid is actually non-trivial especially considering
the 32-bit case, hence let's introduce a public helper
in sd-daemon specifically for that purpose.
Mike Yuan [Sun, 1 Jun 2025 06:55:50 +0000 (08:55 +0200)]
pidfd-util: open an internal pidfd if none is passed in pidfd_check_pidfs()
I'd like to introduce a libsystemd helper for acquiring pidfd
inode id, which however means the fd passed to pidfd_check_pidfs()
can no longer be trusted. Let's add back the logic of opening
a genuine pidfd internally, which was removed in 5dc9d5b4eacbe32f58ad6ca18d70931ab89ea409.
Currently, almost all cgroup attr getters check cgroup_path for whether
cgroup is around. This is actually great, because we never want to expose
a non-existent cgroup path via IPC and such. However, it is spuriously
initialized at places where it shouldn't be, e.g. in unit_warn_leftover_processes().
This matters especially to units that *may* carry processes to run, but
not *always*, notably socket units. unit_warn_leftover_processes() is supposed
to be informative only and not try to set cgroup tracking to realized in
a half-assed way.
Hence, let's kill cgroup_realized field, and make sure cgroup_path is set
only if cgroup has been created. Be extra careful with deserialization
though, since the previous versions don't follow this rule and we need
to patch cgroup_path manually based on cgroup_realized we got from deserialization.
Calls to unit_watch_cgroup*() are dropped in cgroup_runtime_deserialize_one(),
because unit_deserialize_state() will invalidate cgroup realized state and
reapply later.
Mike Yuan [Wed, 4 Jun 2025 19:59:23 +0000 (21:59 +0200)]
core/unit-serialize: drop deserialization compat for state_change_timestamp
This was from v228, i.e. before cgroup v2 got introduced.
Nowadays cgroup v1 is outright rejected during initialization,
i.e. upgrading isn't possible whatsoever. Remove the compat glue there.
journal-file: let's make journal_file_copy_entry() robust against concurrent writing of the source
As usual, we need to protect ourselves against concurrent modification
of journal files. We are pretty good at that these days when reading
journal files. But journal_file_copy_entry() so far wasn't too good with
that. journal_file_append_data() so far returned EINVAL when you pass
invalid data to it. Since we pass the source data as-is in there, it's
going to fail if the journal source file is slightly invalid due to a
concurrent update.
Hence, we need to validate data gracefully here that we think comes from
a safe place, because actually it doesn't, it's directly copied from an
unsafe journal file.
Hence, let's introduce a clear error code here, and look for it in
journal_file_copy_entry(), and handle it gracefully.
Pretty sure this fixes #33372, but it's a race, so I don't know for
sure. If this remains reproducible we need to look at this again.
Let's rename the return parameters as "ret_xyz" systematically in
sd-login.
Also, let's make the return parameters systematically optional, like we
typically do these days. So far some were optional, others weren't.
Let's clean this up.
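The convention of "ret_"-prefixed, optional return parameters can be sketched as follows. The lookup function and its table are made up for illustration and are not an sd-login API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct entry {
        const char *name;
        unsigned uid;
        const char *home;
};

/* Hypothetical data, standing in for whatever the real call queries. */
static const struct entry table[] = {
        { "root", 0,    "/root"      },
        { "demo", 1000, "/home/demo" },
};

/* Both return parameters are optional: callers pass NULL for whatever
 * they do not care about. Outputs are only written on success. */
int lookup_user(const char *name, unsigned *ret_uid, const char **ret_home) {
        assert(name);

        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
                if (strcmp(table[i].name, name) != 0)
                        continue;

                if (ret_uid)
                        *ret_uid = table[i].uid;
                if (ret_home)
                        *ret_home = table[i].home;
                return 0;
        }

        return -ENOENT;
}
```

A caller that only wants to know whether an entry exists can call lookup_user(name, NULL, NULL) and check the return value.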