Markus Boehme [Wed, 27 Aug 2025 20:49:29 +0000 (22:49 +0200)]
pkgconf: expose variables for system-alloc-{uid,gid}-min
Expose variables for system-alloc-uid-min and system-alloc-gid-min
similar to the UID/GID ranges already exposed for the respective
maximums, and other UID/GID ranges.
Christian Hesse [Fri, 19 Sep 2025 15:04:53 +0000 (17:04 +0200)]
man/systemd-notify: add a note on return value
The options `--booted` is compared with the command `systemctl
is-system-running`, but the return values have differnt meanings and it
is not a drop-in.
Let's do a "soft" reset of the TTY when a ptyfwd session ends. This is a
good idea, in order to reset changes to the scrolling window that code
inside the session might have made. A "soft" reset will undo this.
While we are at it, make sure to output the ansi sequences for this
*after* terminating any half-written line, as that is still somewhat
contents of the session, even if it's augmented.
UID entry in the machine state file is introduced in v258,
hence when a host is upgraded to v258, the field does not exist in the
file, thus the variable 'uid' is NULL.
For the case that sys/stat.h is not included indirectly by other headers.
Fixes the following error:
```
../src/run/run.c: In function 'fchown_to_capsule':
../src/run/run.c:2128:21: error: storage size of 'st' isn't known
2128 | struct stat st;
| ^~
```
We update BOOTX64.EFI explicitly once (because we know that it's the
main entry point of UEFI) and then a second time when we update
everything in $ESP/EFI/*.EFI. That's redundant and pretty ugly/confusing
in the log output. Hence exclude the file we already updated explicitly
from the 2nd run.
bootctl: downgrade messages about foreign EFI files
Given that we iterate through $ESP/EFI/BOOT/*.EFI these days this is a
pretty common case, hence it's not really noteworthy, hence downgrade
these log messages from LOG_NOTICE to LOG_INFO.
bootctl: switch a few getenv() calls to secure_getenv()
Following the rule that we should always prefer the secure flavour over
the regular one unless there's a clear reason for the regular one, let's
switch this over. Better safe than sorry.
This fixes two things: first of all it ensures we take the override
status output field properly into account, instead of going directly to
the regular one.
Moreover, it ensures that we bypass auto for both notice + emergency,
since both have the same "impact", and, don't limit this for notice
only.
So far, when outputing information about copy progress we'd suppress the
digit after the dot if it is zero. That makes the progress bar a bit
"jumpy", because sometimes there are two more character cells used than
other times. Let's just always output one digit after the dot here
hence, to avoid this.
core: Use oom_group_kill attribute if OOMPolicy=kill
For managed oom kills, we check the user.oomd_ooms property which
reports how many times systemd-oomd recursively killed the entire
cgroup. For kernel OOM kills, we check the oom_kill property from
memory.events which reports how many processes were killed by the
kernel OOM killer in the corresponding cgroup and its child cgroups.
For units with Delegate=yes, this is problematic, becase OOM kills
in child cgroups that were handled by the delegated unit will still
be treated as unit OOM kills by systemd.
Specifically, if systemd is managing the delegated cgroup and
memory.oom.group=1 is set on both the service cgroup and the child
cgroup, if the child cgroup is OOM killed and this is handled by systemd
running inside the delegated units, when the unit exits later, it will
still be treated as oom-killed because oom_kill in memory.events will
contain the OOM kills that happened in the child cgroup.
To allow addressing this, the oom_group_kill property was added to the
memory.events and memory.events.local files which allows reading how many
times the entire cgroup was oom killed by the kernel if memory.oom.group=1.
If we read this from memory.events.local, we know how many times the unit's
entire cgroup (plus child cgroups) got oom killed by the kernel. This matches
what we report for systemd-oomd managed oom kills and avoids reporting the
unit as oom-killed if a child cgroup was oom killed by the kernel due to
having memory.oom.group=1 set on it.
Since this is only available from kernel 5.12 onwards, we fall back to
reading the oom_kill field from memory.events if the oom_group_kill property
is not available.
So, arch-chroot currently uses a rather cursed setup:
it sets up a PID namespace, but mounts /proc/ from the outside
into the chroot tree, and then call chroot(2), essentially
making it somewhere between chroot(8) and a full-blown
container. Hence, the PID dirs in /proc/ reveal the outer world.
The offending commit switched chroot detection to compare
/proc/1/root and /proc/OUR_PID/root, exhibiting the faulty behavior
where the mentioned environment now gets deemed to be non-chroot.
Now, this is very much an issue in arch-chroot. However,
if /proc/ is to be properly associated with the pidns,
then we'd treat it as a container and no longer a chroot.
Also, the previous logic feels more readable and more
honestly reported errors in proc_mounted(). Hence I opted
for reverting the change here. Still note that the culprit
(once again :/) lies in the arch-chroot's pidns impl, not
systemd.
firewall-util: remove iptables/libiptc backend support (#38976)
This removes iptables/libiptc backend support in firewall-util, as
already announced by 5c68c51045c27d77b7afc211df7304a958d8cf24.
Then, this drops meaningless `FirewallContext` wrapper.
core: if we cannot decode a TPM credential skip over it for ImportCredential=
let's skip over credentials we cannot decode when they are found with
ImportCredential=. When installing an OS on some disk and using that
disk on a different machine than assumed we'll otherwise end up with a
broken boot, because the credentials cannot be decoded when starting
systemd-firstboot. Let's handle this somewhat gracefully.
This leaves handling for LoadCredential=/SetCredential= as it is (i.e.
failure to decrypt results in service failure), because it is a lot more
explicit and focussed as opposed to ImportCredentials= which looks
everywhere, uses globs and so on and is hence very vague and unfocussed.
creds-util: tweak error code generation in decrypt_credential_and_warn() a bit, and add a comment listing it
Let's make some specific condition more recognizable via error codes of
their own, and in particular remove confusion between EREMOTE as
returned by tpm2_unseal() and by us.
Nick Rosbrook [Thu, 18 Sep 2025 13:16:02 +0000 (09:16 -0400)]
basic: validate timezones in get_timezones()
Depending on the packaging of tzdata, /usr/share/zoneinfo/tzdata.zi may
reference zones or links that are not actually present on the system.
E.g. on Debian and Ubuntu, there is a tzdata-legacy package that
contains "legacy" zones and links, but they are still referenced in
/usr/share/zoneinfo/tzdata.zi shipped by the main tzdata package.
Right now, get_timezoes() does not validate timezones when building the
list, which makes the following possible:
Since mountfsd was added in 702a52f4b5d49cce11e2adbc740deb3b644e2de0 the
caps bounding set line was commented. That's an accident. Fix that. (We
need to add a bunch of caps to the list).
units: explicitly reset TTY before running stuff on console
This adds TTYReset=yes to all units which run directly on the TTY. We
already had this in place for the gettys, but this adds it for the rest
that basically has StandardInput=tty + StandardOutput=tty set.
Originally, for these tools it wasn't necessary to reset the TTY,
because we after all already reset /dev/console very very early on once,
during PID1's early initialization, and hence there's no real reason to
do it again for these early boot services. But that's actually not
right, because since #36666 the TTY we reset from PID 1 is typically
/dev/console but the TTY those services are invoked on is typically the
resolved version of that, i.e. wherever that points. Now you might
think: if one is just an alias to the other, why does it matter to reset
this again? Well, because it's only a half-assed alias, and as it turns
out WIOCSWINSZ is not propagated from one to the other, i.e the terminal
dimesions we initialize for /dev/console don't propagate to whatever
that points to.
One option to address that would be to immediately propagate this down
ourselves (or to fix the kernel for it), but it felt safer to simply do
the reset again before the use, after all these one one-off services,
and there's no point in optimizing much here. Moreover, its probably
safer to give the guarantee that when the firstboot stuff (which after
all queries for pws to set) runs it definitely certainly guaranteed has
a properly reset terminal.
It's not well-formed to begin with. And util-linux's mount(8)
is pretty much ubiquitously employed, hence it will be rejected
elsewhere too. Just stop pretending it is valid just because
glibc parser is sloppy.