From: Daan De Meyer
Date: Thu, 18 Sep 2025 07:59:10 +0000 (+0200)
Subject: core: Use oom_group_kill attribute if OOMPolicy=kill
X-Git-Tag: v259-rc1~492^2~2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e03e5056dbffffafc86e46985658e1c9075d3c74;p=thirdparty%2Fsystemd.git

core: Use oom_group_kill attribute if OOMPolicy=kill

For managed OOM kills, we check the user.oomd_ooms property, which reports
how many times systemd-oomd recursively killed the entire cgroup. For kernel
OOM kills, we check the oom_kill property from memory.events, which reports
how many processes were killed by the kernel OOM killer in the corresponding
cgroup and its child cgroups.

For units with Delegate=yes, this is problematic, because OOM kills in child
cgroups that were handled by the delegated unit will still be treated as unit
OOM kills by systemd. Specifically, if systemd is managing the delegated
cgroup and memory.oom.group=1 is set on both the service cgroup and the child
cgroup, and the child cgroup is OOM killed and this is handled by the systemd
instance running inside the delegated unit, then when the unit exits later it
will still be treated as oom-killed, because oom_kill in memory.events
includes the OOM kills that happened in the child cgroup.

To allow addressing this, the oom_group_kill property was added to the
memory.events and memory.events.local files. It reports how many times the
entire cgroup was OOM killed by the kernel when memory.oom.group=1 is set.

If we read this from memory.events.local, we know how many times the unit's
entire cgroup (plus child cgroups) got OOM killed by the kernel. This matches
what we report for systemd-oomd managed OOM kills and avoids reporting the
unit as oom-killed if a child cgroup was OOM killed by the kernel due to
having memory.oom.group=1 set on it.

Since this is only available from kernel 5.12 onwards, we fall back to
reading the oom_kill field from memory.events if the oom_group_kill property
is not available.
---

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 48149e15bd5..1d5fa792802 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -3036,20 +3036,43 @@ int unit_check_oom(Unit *u) {
         if (!crt || !crt->cgroup_path)
                 return 0;
 
-        r = cg_get_keyed_attribute(
+        CGroupContext *ctx = unit_get_cgroup_context(u);
+        if (!ctx)
+                return 0;
+
+        /* If memory.oom.group=1, then look up the oom_group_kill field, which reports how many times the
+         * kernel killed every process recursively in this cgroup and its descendants, similar to
+         * systemd-oomd. Because the memory.events.local file was only introduced in kernel 5.12, we fall
+         * back to reading oom_kill if we can't find the file or field. */
+
+        if (ctx->memory_oom_group) {
+                r = cg_get_keyed_attribute(
+                                "memory",
+                                crt->cgroup_path,
+                                "memory.events.local",
+                                STRV_MAKE("oom_group_kill"),
+                                &oom_kill);
+                if (r < 0 && !IN_SET(r, -ENOENT, -ENXIO))
+                        return log_unit_debug_errno(u, r, "Failed to read oom_group_kill field of memory.events.local cgroup attribute, ignoring: %m");
+        }
+
+        if (isempty(oom_kill)) {
+                r = cg_get_keyed_attribute(
                         "memory",
                         crt->cgroup_path,
                         "memory.events",
                         STRV_MAKE("oom_kill"),
                         &oom_kill);
-        if (IN_SET(r, -ENOENT, -ENXIO)) /* Handle gracefully if cgroup or oom_kill attribute don't exist */
+                if (r < 0 && !IN_SET(r, -ENOENT, -ENXIO))
+                        return log_unit_debug_errno(u, r, "Failed to read oom_kill field of memory.events cgroup attribute: %m");
+        }
+
+        if (!oom_kill)
                 c = 0;
-        else if (r < 0)
-                return log_unit_debug_errno(u, r, "Failed to read oom_kill field of memory.events cgroup attribute: %m");
         else {
                 r = safe_atou64(oom_kill, &c);
                 if (r < 0)
-                        return log_unit_debug_errno(u, r, "Failed to parse oom_kill field: %m");
+                        return log_unit_debug_errno(u, r, "Failed to parse memory.events cgroup oom field: %m");
         }
 
         increased = c > crt->oom_kill_last;
@@ -3061,7 +3084,7 @@ int unit_check_oom(Unit *u) {
         log_unit_struct(u, LOG_NOTICE,
                         LOG_MESSAGE_ID(SD_MESSAGE_UNIT_OUT_OF_MEMORY_STR),
                         LOG_UNIT_INVOCATION_ID(u),
-                        LOG_UNIT_MESSAGE(u, "A process of this unit has been killed by the OOM killer."));
+                        LOG_UNIT_MESSAGE(u, "The kernel OOM killer killed some processes in this unit."));
 
         unit_notify_cgroup_oom(u, /* managed_oom= */ false);
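
Illustration only, not part of the patch: the lookup order described in the
commit message can be sketched as a standalone C program fragment. The helper
names read_keyed_u64() and read_oom_count() are hypothetical and are not
systemd APIs; the sketch assumes a cgroup directory that contains the
flat-keyed files memory.events and memory.events.local, and it treats a
missing file or key as a count of zero, as the patched unit_check_oom() does.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not systemd's cg_get_keyed_attribute()): read the
 * number stored under "key" in a flat-keyed file such as memory.events.
 * Returns 0 on success, a negative errno if the file cannot be opened
 * (-ENOENT if it is missing), or -ENXIO if the key is not present. */
static int read_keyed_u64(const char *path, const char *key, uint64_t *ret) {
        char name[64];
        uint64_t value;
        FILE *f;

        assert(path && key && ret);

        f = fopen(path, "r");
        if (!f)
                return -errno;

        /* Each line has the form "<name> <unsigned integer>". */
        while (fscanf(f, "%63s %" SCNu64, name, &value) == 2)
                if (strcmp(name, key) == 0) {
                        fclose(f);
                        *ret = value;
                        return 0;
                }

        fclose(f);
        return -ENXIO;
}

/* Hypothetical helper: prefer oom_group_kill from memory.events.local when
 * memory.oom.group=1 is set, otherwise (or if that file or field is
 * missing) fall back to oom_kill from memory.events. */
static int read_oom_count(const char *cgroup_dir, bool oom_group, uint64_t *ret) {
        char path[4096];
        int r = -ENXIO; /* "not found yet", so the fallback runs by default */

        assert(cgroup_dir && ret);

        if (oom_group) {
                snprintf(path, sizeof(path), "%s/memory.events.local", cgroup_dir);
                r = read_keyed_u64(path, "oom_group_kill", ret);
                if (r < 0 && r != -ENOENT && r != -ENXIO)
                        return r;
        }

        if (r < 0) {
                snprintf(path, sizeof(path), "%s/memory.events", cgroup_dir);
                r = read_keyed_u64(path, "oom_kill", ret);
                if (r < 0 && r != -ENOENT && r != -ENXIO)
                        return r;
        }

        if (r < 0)
                *ret = 0; /* neither file nor field available: no kills seen */

        return 0;
}
```

A caller would compare the returned count against the previously stored
value, as unit_check_oom() does with crt->oom_kill_last, and report an OOM
event only when the count has increased.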