Ryan Wilson [Tue, 15 Oct 2024 03:49:54 +0000 (20:49 -0700)]
cgroup: Add ManagedOOMMemoryPressureDurationSec= override setting for units
This will allow units (scopes/slices/services) to override the default
systemd-oomd setting DefaultMemoryPressureDurationSec=.
The semantics of ManagedOOMMemoryPressureDurationSec= are:
- If >= 1 second, overrides DefaultMemoryPressureDurationSec= from oomd.conf
- If is empty, uses DefaultMemoryPressureDurationSec= from oomd.conf
- Ignored if ManagedOOMMemoryPressure= is not "kill"
- Disallowed if < 1 second
Note the corresponding dbus property is DefaultMemoryPressureDurationUSec
which is in microseconds. This is consistent with other time-based
dbus properties.
Ryan Wilson [Wed, 16 Oct 2024 17:40:30 +0000 (10:40 -0700)]
oomd: Refactor DefaultMemoryPressureDurationSec= to use conf parser
Parsing DefaultMemoryPressureDurationSec= is currently split between
conf parser, main() and manager_start() methods. This commit centralizes
parsing and bounds checking logic within a single custom conf parser
function.
Luca Boccassi [Wed, 16 Oct 2024 10:42:06 +0000 (11:42 +0100)]
Fix maybe-uninitialized warnings with gcc 14.2
../src/resolve/resolved-bus.c: In function ‘call_link_method’:
../src/resolve/resolved-bus.c:1769:16: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
1769 | return handler(message, l, error);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1755:15: note: ‘l’ was declared here
1755 | Link *l;
| ^
../src/resolve/resolved-bus.c: In function ‘bus_method_get_link’:
../src/resolve/resolved-bus.c:1828:13: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
1828 | p = link_bus_path(l);
| ^~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1816:15: note: ‘l’ was declared here
1816 | Link *l;
| ^
xujing [Wed, 16 Oct 2024 07:19:09 +0000 (15:19 +0800)]
pid1: add env var to override default mount rate limit interval
Similar to 24a4542c. 24a4542c can only be set 1 in 1s at most,
sometimes we may need to set to something else(such as 1 in 2s).
So it's best to let the user decide.
Yu Watanabe [Wed, 16 Oct 2024 10:34:18 +0000 (19:34 +0900)]
journalctl: do not directly use optarg, but copy optarg before use
Otherwise, if the process forks child processes, then the arguments
cannot be used from them.
To avoid potential issues like the one fixed by 6d3012bab4ce4c1ed260598d05b4e9f2ea471658.
Nested mounts should be carried over from host to overlayfs to overlayfs
(and back to host if unmerged). Otherwise you run into hard to debug
issues where merging extensions means you can't unmount those nested mounts
anymore as they are hidden by the overlayfs mount.
To fix this, before unmerging any previous extensions, let's move the nested
mounts from the hierarchy to the workspace, then set up the new hierachy, and
finally, just before moving the hierarchy into place, move the nested mounts
back into place.
Because there might be multiple nested mounts that consists of one or more
mounts stacked on top of each other, we make sure to move all stacked mounts
properly to the overlayfs. The kernel doesn't really provide a nice way to do
this, so we create a stack, pop off each mount onto the stack and then pop from
the stack again to the destination to re-establish the stacked mounts in the same
order in the destination.
core: move debug logging from _can_live_mount() functions to caller
Let's debug log the returned dbus error where we want the logging, but
don't log it, where we don't.
This removes the noisy logging from the property handler for the
CanLiveMount property, but keeps it in place for the MountImage() method
call where we want it.
Yu Watanabe [Wed, 16 Oct 2024 05:52:49 +0000 (14:52 +0900)]
TEST-55-OOMD: check global config earlier
'Default Memory Pressure Duration' field in oomctl, which can be configured
with DefaultMemoryPressureDurationSec= in oomd.conf, is a global config.
Let's check it earlier.
This also drops unnecessary cleanup at the beginning.
Yu Watanabe [Fri, 11 Oct 2024 07:09:11 +0000 (16:09 +0900)]
TEST-55-OOMD: set ManagedOOMMemoryPressure= and friends in a drop-in config
Fedora and friends has a drop-in config for the settings in
/usr/lib/systemd/user/slice.d/ . Hence, settings in the main .slice may be
overridden. Let's set below in a drop-in with higher decimal prefix.
Also, rename override.conf -> 99-managed-oom-preference.conf for the same reason.
Michael Ferrari [Tue, 15 Oct 2024 16:42:20 +0000 (18:42 +0200)]
gpt-auto: remove directory check for ESP mount
Ensure that we always attempt to mount the `ESP` partition to `/boot`
when there is no `XBOOTLDR` partition.
Fixes an issue when booting without a `XBOOTLDR` partition and an empty
root partition, since it would mount the `ESP` partition to `/efi/`
unconditionally causing boot entries to not be under `/boot/` as
recommended by the Boot Loader Specification.
This PidRef just track some data, but cannot be used for any active
operation.
Background: for https://github.com/systemd/systemd/pull/34703 it makes
sense to track explicitly if some PidRef is not a local one, so that we
never attempt to for example "kill a remote process" and thus
acccidentally hit the wrong process (i.e. a local one by the same PID).
let's rename the "_ts" flavour of these calls "_full" instead, exposing
the full functionality. And then keep two more minimal versions around:
one "_at" (which has the ts parameter suppressed, but keeps the dir_fd
one). And one without suffix (which supresses both).
Do the same for the label versions of these calls.
userdb: return ESRCH if userdb service refuses a user/group name as invalid
if a userdb service refuse a user/group name as invalid, let's turn this
into ESRCH client-side following that there definitely is no user/group
record for a completely invalid user/group name.
Since the root directory was being suppressed to NULL, the subsequent
check would erroneously think that no working directory was specified.
This caused the default working directory to be applied instead.
Yu Watanabe [Tue, 15 Oct 2024 05:03:02 +0000 (14:03 +0900)]
TEST-13-NSPAWN: several cleanups
- suppress unnecessary error messages, especially in loop and at_exit(),
- ensure the container service is stopped before restarting,
- do not send KILL signal, as garbages will remain, and disturb the next
invocation,
- drop unnecessary workaround of trying machine twice.
Mike Yuan [Mon, 14 Oct 2024 16:31:14 +0000 (18:31 +0200)]
hibernate-resume-generator: don't initiate resume if soft-rebooted
This is just paranoia, to ensure that we don't accidentally
initiate resume if the initrd is entered through soft-reboot
rather than the initial one for booting up.
Mike Yuan [Sat, 28 Sep 2024 13:54:42 +0000 (15:54 +0200)]
core/manager: pass soft-reboot count to generators
soft-reboot allows switching into a different root/installation,
i.e. potentially invalidate settings from kernel cmdline and such.
Let's hence inform generators about soft-reboots.
Yu Watanabe [Sat, 12 Oct 2024 07:43:15 +0000 (16:43 +0900)]
network: wait for IPv6 MTU being synced to link MTU
The kernel resets the IPv6 MTU of an interface when its link MTU is changed.
But it seems the operation is asynchronous, and even when we detect that
the link MTU is changed, the IPv6 MTU may not be reset yet.
====
[ 2257.067613] systemd-networkd[447122]: veth99: MTU is changed: 1500 →1600 (min: 68, max: 65535)
[ 2257.067641] systemd-networkd[447122]: Setting '/proc/sys/net/ipv6/conf/veth99/mtu' to '1410'
[ 2257.067711] systemd-networkd[447122]: No change in value '1410', suppressing write
====
As you can see, even if the link MTU is changed to 1600, the IPv6 MTU is
unchanged (in this case, still 1410).
With the offending commit, on remove event, database file for a device is once
removed in event_execute_rules_on_remove(), but later re-created here.
This fixes the issue, and makes the database file not re-created on remove event.
Yu Watanabe [Sat, 12 Oct 2024 20:15:18 +0000 (05:15 +0900)]
TEST-13-NSPAWN: add test for 'machinectl terminate'
This also fixes the test for io.systemd.Machine.Terminate.
When systemd-nspawn@.service receives stop signal, then systemd-nspawn
sends SIGRTMIN+3 to the container, which was previously ignored by the
custom init script used by the container.
Let's introduce another trap for the signal, and correctly handle it.
sd-json: drop sd_json_dispatch_pid() again, as we prefer json_dispatch_pidref() now
The calls are now unused, and we generally prefer if people send a PID
triplet rather than a single PID, hence stop supporting a high-level
dispacher for pid_t.