From 4fb8cc53b3438eaafe27e22f168d3ae333ca037c Mon Sep 17 00:00:00 2001
From: Chris Down
Date: Tue, 17 Feb 2026 14:30:16 +0800
Subject: [PATCH] oomd: Fix silent failure to find bad cgroups when another cgroup dies

Consider a workload slice with several sibling cgroups. Imagine that
one of those cgroups is removed between the moment oomd enumerates the
directory and the moment it reads memory.oom.group. This is actually
relatively plausible under the high memory pressure conditions where
oomd is most needed.

In this case, the failed read prompts us to `return 0`, which exits the
entire enumeration loop in recursively_get_cgroup_context(). As a
result, all remaining sibling cgroups are silently dropped from the
candidate list for that monitoring cycle. The effect is that oomd can
fail to identify and kill the actual offending cgroup, allowing memory
pressure to persist until a subsequent cycle where the race doesn't
occur.

Fix this by instead proceeding to evaluate further sibling cgroups.
---
 src/oom/oomd-manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oom/oomd-manager.c b/src/oom/oomd-manager.c
index 41763d606f2..c43e1bc5d10 100644
--- a/src/oom/oomd-manager.c
+++ b/src/oom/oomd-manager.c
@@ -241,7 +241,7 @@ static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) {
                         return r;
                 if (r < 0) {
                         log_debug_errno(r, "Failed to read memory.oom.group from %s, ignoring: %m", cg_path);
-                        return 0;
+                        continue;
                 }
                 if (r > 0)
                         r = oomd_insert_cgroup_context(NULL, new_h, cg_path);
-- 
2.47.3
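
For illustration only, the behavioural difference described in the commit message can be sketched outside of systemd. This is a simplified model, not the oomd code: read_oom_group(), count_candidates(), and the sibling names are hypothetical stand-ins, and a plain array replaces the real directory enumeration. One sibling is assumed to have vanished after enumeration, so reading its attribute fails with -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for reading memory.oom.group. It fails with
 * -ENOENT for a cgroup that was removed after enumeration. */
static int read_oom_group(const char *name) {
        assert(name);

        if (strcmp(name, "gone.scope") == 0)
                return -ENOENT;

        return 1;
}

/* Walk the siblings and count how many become candidates.
 * stop_on_error == true models the old `return 0` behaviour: the first
 * failed read ends the whole walk.
 * stop_on_error == false models the new `continue` behaviour: only the
 * failing sibling is skipped. */
static size_t count_candidates(const char *const *siblings, size_t n, bool stop_on_error) {
        size_t found = 0;

        assert(siblings);

        for (size_t i = 0; i < n; i++) {
                int r = read_oom_group(siblings[i]);
                if (r < 0) {
                        if (stop_on_error)
                                return found; /* remaining siblings are dropped */
                        continue;             /* skip just this sibling */
                }

                found++;
        }

        return found;
}

/* Example input: the second sibling disappears mid-walk. */
static const char *const example_siblings[] = {
        "a.scope", "gone.scope", "b.scope", "c.scope",
};
```

Calling count_candidates(example_siblings, 4, true) sees only the sibling that precedes the vanished one, while passing false also picks up the two siblings after it, which is the effect the patch is aiming for.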