From 4fb8cc53b3438eaafe27e22f168d3ae333ca037c Mon Sep 17 00:00:00 2001
From: Chris Down <chris@chrisdown.name>
Date: Tue, 17 Feb 2026 14:30:16 +0800
Subject: [PATCH] oomd: Fix silent failure to find bad cgroups when another
 cgroup dies

Consider a workload slice with several sibling cgroups. Imagine that one
of those cgroups is removed between the moment oomd enumerates the
directory and the moment it reads memory.oom.group. Such a race is
plausible under exactly the high memory pressure conditions where oomd
is most needed.

In this case, the failed read prompts us to `return 0`, which exits the
entire enumeration loop in recursively_get_cgroup_context(). As a
result, all remaining sibling cgroups are silently dropped from the
candidate list for that monitoring cycle.

The effect is that oomd can fail to identify and kill the actual
offending cgroup, allowing memory pressure to persist until a subsequent
cycle where the race doesn't occur.

Fix this by using `continue` instead, so that only the cgroup whose read
failed is skipped and the remaining siblings are still evaluated.
---
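
Not for the commit log: the control-flow difference described above can
be sketched in isolation. This is a simplified model, not the oomd code.
`struct fake_cgroup`, `fake_read_oom_group()` and both `count_*`
functions are hypothetical names made up for illustration, and the
functions return a candidate count (unlike the real function) purely so
the effect is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one enumerated sibling cgroup. oom_group
 * models the result of reading memory.oom.group: 1 if set, 0 if unset,
 * or a negative errno if the cgroup vanished after enumeration. */
struct fake_cgroup {
        const char *path;
        int oom_group;
};

/* Hypothetical helper: pretend to read memory.oom.group. */
int fake_read_oom_group(const struct fake_cgroup *cg) {
        assert(cg);
        return cg->oom_group;
}

/* Models the old behaviour: the first failed read ends the whole scan,
 * so every sibling after it is never looked at. */
size_t count_candidates_return(const struct fake_cgroup *cgs, size_t n) {
        size_t found = 0;

        assert(cgs || n == 0);

        for (size_t i = 0; i < n; i++) {
                int r = fake_read_oom_group(&cgs[i]);
                if (r < 0)
                        return found; /* leaves the loop entirely */
                if (r > 0)
                        found++;
        }

        return found;
}

/* Models the fixed behaviour: a failed read skips only that sibling. */
size_t count_candidates_continue(const struct fake_cgroup *cgs, size_t n) {
        size_t found = 0;

        assert(cgs || n == 0);

        for (size_t i = 0; i < n; i++) {
                int r = fake_read_oom_group(&cgs[i]);
                if (r < 0)
                        continue; /* move on to the next sibling */
                if (r > 0)
                        found++;
        }

        return found;
}
```

With siblings a (set), b (read fails with -ENOENT), c (set), d (unset),
the `return` variant only ever sees a, while the `continue` variant sees
both a and c.
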
 src/oom/oomd-manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oom/oomd-manager.c b/src/oom/oomd-manager.c
index 41763d606f2..c43e1bc5d10 100644
--- a/src/oom/oomd-manager.c
+++ b/src/oom/oomd-manager.c
@@ -241,7 +241,7 @@ static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) {
                         return r;
                 if (r < 0) {
                         log_debug_errno(r, "Failed to read memory.oom.group from %s, ignoring: %m", cg_path);
-                        return 0;
+                        continue;
                 }
                 if (r > 0)
                         r = oomd_insert_cgroup_context(NULL, new_h, cg_path);
-- 
2.47.3