Let's never bother with old TPM 1.x structures, they are not mentioned
in the TCG for TPM2 spec at all. However, the spec does say we should
check the Size field of the relevant structs, before accessing them,
hence do that.
On older versions, if the flag is anything other than AT_SYMLINK_NOFOLLOW,
it returns EINVAL, so we can detect it and call the kernel syscall directly
ourselves.
Using the glibc wrappers when possible is prefereable so that programs
like fakeroot can intercept its calls and redirect them.
Since there have been disagreements on certain aspects of the technical
direction, let's clear things up, and introduce a governance document,
taking inspiration from:
This also makes shebang always use env command, and drops unnecessary
'bash -c' or 'sh -c' when a signle command is invoked in the shell,
like sleep or echo.
In the commit c960ca2be1cfd183675df581f049a0c022c1c802, the logic of
updating ACL on device node was moved from logind to udevd, but at that
time, mistakenly removed the logic for static nodes.
Igor Opaniuk [Thu, 18 Sep 2025 15:49:32 +0000 (17:49 +0200)]
boot: add support for overriding key enrollement timeout
Currently, a 15-second timeout is hardcoded for the key enrollment
process while waiting for user confirmation. Make this value configurable
to allow the option of disabling user input, such as during automatic key
provisioning at the factory.
Signed-off-by: Igor Opaniuk <igor.opaniuk@foundries.io>
Block devices with removable media (e.g. SD card readers) indicate a
missing medium with a zero size. Optionally ignore such block devices
that carry no medium currently.
In systemd <= 257, each set_audit tristate value had special meaning,
- true: enable the kernel audit subsystem,
- false: disable the kernel audit subsystem,
- negative: keep the current kernel audit subsystem state.
And the default is true, rather than negative. So, users sometimes
explicitly pass an empty string to Audit= setting to keep the state.
But since f48cf2a96dfdc23fe30ba0f870125fe55cab64c7 (v258), the negative
value is mistakenly used as 'really unspecified' even if an empty string
is explicitly specified.
This makes negative values handled as unspecified as usual, and assign a new
positive value AUDIT_KEEP for when an empty string is explicitly specified.
Also, make the Audit= setting accept "keep" setting, and suggest to use "keep"
rather than an empty string.
Markus Boehme [Wed, 27 Aug 2025 20:49:29 +0000 (22:49 +0200)]
pkgconf: expose variables for system-alloc-{uid,gid}-min
Expose variables for system-alloc-uid-min and system-alloc-gid-min
similar to the UID/GID ranges already exposed for the respective
maximums, and other UID/GID ranges.
Christian Hesse [Fri, 19 Sep 2025 15:04:53 +0000 (17:04 +0200)]
man/systemd-notify: add a note on return value
The options `--booted` is compared with the command `systemctl
is-system-running`, but the return values have differnt meanings and it
is not a drop-in.
Let's do a "soft" reset of the TTY when a ptyfwd session ends. This is a
good idea, in order to reset changes to the scrolling window that code
inside the session might have made. A "soft" reset will undo this.
While we are at it, make sure to output the ansi sequences for this
*after* terminating any half-written line, as that is still somewhat
contents of the session, even if it's augmented.
UID entry in the machine state file is introduced in v258,
hence when a host is upgraded to v258, the field does not exist in the
file, thus the variable 'uid' is NULL.
For the case that sys/stat.h is not included indirectly by other headers.
Fixes the following error:
```
../src/run/run.c: In function 'fchown_to_capsule':
../src/run/run.c:2128:21: error: storage size of 'st' isn't known
2128 | struct stat st;
| ^~
```
We update BOOTX64.EFI explicitly once (because we know that it's the
main entry point of UEFI) and then a second time when we update
everything in $ESP/EFI/*.EFI. That's redundant and pretty ugly/confusing
in the log output. Hence exclude the file we already updated explicitly
from the 2nd run.
bootctl: downgrade messages about foreign EFI files
Given that we iterate through $ESP/EFI/BOOT/*.EFI these days this is a
pretty common case, hence it's not really noteworthy, hence downgrade
these log messages from LOG_NOTICE to LOG_INFO.
bootctl: switch a few getenv() calls to secure_getenv()
Following the rule that we should always prefer the secure flavour over
the regular one unless there's a clear reason for the regular one, let's
switch this over. Better safe than sorry.
This fixes two things: first of all it ensures we take the override
status output field properly into account, instead of going directly to
the regular one.
Moreover, it ensures that we bypass auto for both notice + emergency,
since both have the same "impact", and, don't limit this for notice
only.
So far, when outputing information about copy progress we'd suppress the
digit after the dot if it is zero. That makes the progress bar a bit
"jumpy", because sometimes there are two more character cells used than
other times. Let's just always output one digit after the dot here
hence, to avoid this.
core: Use oom_group_kill attribute if OOMPolicy=kill
For managed oom kills, we check the user.oomd_ooms property which
reports how many times systemd-oomd recursively killed the entire
cgroup. For kernel OOM kills, we check the oom_kill property from
memory.events which reports how many processes were killed by the
kernel OOM killer in the corresponding cgroup and its child cgroups.
For units with Delegate=yes, this is problematic, becase OOM kills
in child cgroups that were handled by the delegated unit will still
be treated as unit OOM kills by systemd.
Specifically, if systemd is managing the delegated cgroup and
memory.oom.group=1 is set on both the service cgroup and the child
cgroup, if the child cgroup is OOM killed and this is handled by systemd
running inside the delegated units, when the unit exits later, it will
still be treated as oom-killed because oom_kill in memory.events will
contain the OOM kills that happened in the child cgroup.
To allow addressing this, the oom_group_kill property was added to the
memory.events and memory.events.local files which allows reading how many
times the entire cgroup was oom killed by the kernel if memory.oom.group=1.
If we read this from memory.events.local, we know how many times the unit's
entire cgroup (plus child cgroups) got oom killed by the kernel. This matches
what we report for systemd-oomd managed oom kills and avoids reporting the
unit as oom-killed if a child cgroup was oom killed by the kernel due to
having memory.oom.group=1 set on it.
Since this is only available from kernel 5.12 onwards, we fall back to
reading the oom_kill field from memory.events if the oom_group_kill property
is not available.
So, arch-chroot currently uses a rather cursed setup:
it sets up a PID namespace, but mounts /proc/ from the outside
into the chroot tree, and then call chroot(2), essentially
making it somewhere between chroot(8) and a full-blown
container. Hence, the PID dirs in /proc/ reveal the outer world.
The offending commit switched chroot detection to compare
/proc/1/root and /proc/OUR_PID/root, exhibiting the faulty behavior
where the mentioned environment now gets deemed to be non-chroot.
Now, this is very much an issue in arch-chroot. However,
if /proc/ is to be properly associated with the pidns,
then we'd treat it as a container and no longer a chroot.
Also, the previous logic feels more readable and more
honestly reported errors in proc_mounted(). Hence I opted
for reverting the change here. Still note that the culprit
(once again :/) lies in the arch-chroot's pidns impl, not
systemd.
firewall-util: remove iptables/libiptc backend support (#38976)
This removes iptables/libiptc backend support in firewall-util, as
already announced by 5c68c51045c27d77b7afc211df7304a958d8cf24.
Then, this drops meaningless `FirewallContext` wrapper.