From 31620807e19e889c278ac847da9fffc7f2bd2d96 Mon Sep 17 00:00:00 2001
From: Chris Down
Date: Tue, 17 Feb 2026 14:58:44 +0800
Subject: [PATCH] oomd: Fix unnecessary delays during OOM kills with pending kills present

Let's say a user has two services with ManagedOOMMemoryPressure=kill,
perhaps a web server under system.slice and a batch job under
user.slice. Both exceed their pressure limits. On the previous timer
tick, oomd already queued the web server's candidate for killing, but
the prekill hook has not yet responded, so the kill is still pending.

In the code, monitor_memory_pressure_contexts_handler() iterates over
all pressure targets that have exceeded their limits. When it reaches
the web server target, it calls oomd_cgroup_kill_mark(), which returns
0 because that cgroup is already queued. The code treats this the same
as a successful new kill: it resets the 15-second delay timer and
returns from the function, exiting the loop.

The loop uses SET_FOREACH, whose iteration order is hash-dependent. As
such, if the web server target happens to be visited first, oomd never
evaluates the batch job target at all.

The effect is twofold:

1. oomd stalls for 15 seconds despite not having initiated any new
   kill, which can needlessly delay further action against rising
   memory pressure. The delay exists to let stale pressure counters
   settle after a kill, but no kill has happened here.
2. It non-deterministically skips pressure targets that may have
   unqueued candidates, dangerously allowing memory pressure to persist
   longer than it should.

Fix this by skipping cgroups that are already queued, so that the loop
goes on to try other pressure targets. We should only delay when a new
kill mark is actually created.
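To make the control-flow change concrete, here is a minimal C model of the loop before and after this patch. It is a sketch under stated assumptions, not oomd code: struct target, kill_mark(), handler_old(), handler_new() and check_pending_kill_scenario() are hypothetical names; each target is assumed to have exactly one kill candidate; kill_mark() is assumed to return 0 for an already-queued candidate and 1 for a new mark, as described for oomd_cgroup_kill_mark(); and a fixed array order stands in for the hash-dependent SET_FOREACH order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pressure target with a single kill candidate.
 * These names are illustrative; they are not oomd's data structures. */
struct target {
        const char *name;
        bool over_limit;        /* pressure has exceeded the configured limit */
        bool candidate_queued;  /* a kill for its candidate is still pending */
};

/* Hypothetical stand-in for oomd_cgroup_kill_mark(), assumed to return 0
 * when the candidate is already queued and 1 when a new mark is created. */
static int kill_mark(struct target *t) {
        if (t->candidate_queued)
                return 0;
        t->candidate_queued = true;
        return 1;
}

/* Before the patch: any r >= 0 (this model never fails) resets the delay
 * timer and ends the pass, even when nothing new was marked. */
static const char *handler_old(struct target *ts, size_t n, bool *delay_reset) {
        *delay_reset = false;
        for (size_t i = 0; i < n; i++) {
                if (!ts[i].over_limit)
                        continue;
                int r = kill_mark(&ts[i]);
                *delay_reset = true;
                return r > 0 ? ts[i].name : NULL;
        }
        return NULL;
}

/* After the patch: r == 0 moves on to the next target, and only a new
 * kill mark resets the delay timer and ends the pass. */
static const char *handler_new(struct target *ts, size_t n, bool *delay_reset) {
        *delay_reset = false;
        for (size_t i = 0; i < n; i++) {
                if (!ts[i].over_limit)
                        continue;
                if (kill_mark(&ts[i]) == 0)
                        continue;       /* already queued: try the next target */
                *delay_reset = true;
                return ts[i].name;
        }
        return NULL;
}

/* The scenario from the commit message, with the web server target
 * visited first (modelling an unlucky hash order). */
static void check_pending_kill_scenario(void) {
        struct target before[] = {
                { "system.slice", true, true },  /* web server, kill pending */
                { "user.slice", true, false },   /* batch job, not yet queued */
        };
        struct target after[] = {
                { "system.slice", true, true },
                { "user.slice", true, false },
        };
        bool reset;

        /* Old: resets the delay on the queued target, never reaches user.slice. */
        assert(handler_old(before, 2, &reset) == NULL);
        assert(reset);
        assert(!before[1].candidate_queued);

        /* New: skips the queued target and marks the batch job instead. */
        assert(handler_new(after, 2, &reset) == after[1].name);
        assert(reset);
        assert(after[1].candidate_queued);
}
```

Calling check_pending_kill_scenario() from a test harness exercises both paths: the old handler resets the delay without marking anything and leaves user.slice untouched, while the new one marks user.slice and resets the delay only then.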
---
 src/oom/oomd-manager.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/oom/oomd-manager.c b/src/oom/oomd-manager.c
index c43e1bc5d10..b2142dd43cf 100644
--- a/src/oom/oomd-manager.c
+++ b/src/oom/oomd-manager.c
@@ -539,20 +539,20 @@ static int monitor_memory_pressure_contexts_handler(sd_event_source *s, uint64_t
                         return log_oom();
                 if (r < 0)
                         log_error_errno(r, "Failed to select any cgroups under %s based on pressure, ignoring: %m", t->path);
+                else if (r == 0)
+                        /* Already queued for kill by an earlier iteration, try next target without
+                         * resetting the delay timer. */
+                        continue;
                 else {
-                        /* Don't act on all the high pressure cgroups at once; return as soon as we kill one.
-                         * If r == 0 then the cgroup is already queued for kill by an earlier iteration.
-                         * In either case, go through the event loop again and select a new candidate if
-                         * pressure is still high. */
+                        /* Don't act on all the high pressure cgroups at once; return as soon as we kill one. */
                         m->mem_pressure_post_action_delay_start = usec_now;
-                        if (selected && r > 0) {
+                        if (selected)
                                 log_notice("Marked %s for killing due to memory pressure for %s being %lu.%02lu%% > %lu.%02lu%%"
                                            " for > %s with reclaim activity",
                                            selected->path, t->path,
                                            LOADAVG_INT_SIDE(t->memory_pressure.avg10), LOADAVG_DECIMAL_SIDE(t->memory_pressure.avg10),
                                            LOADAVG_INT_SIDE(t->mem_pressure_limit), LOADAVG_DECIMAL_SIDE(t->mem_pressure_limit),
                                            FORMAT_TIMESPAN(t->mem_pressure_duration_usec, USEC_PER_SEC));
-                        }
                         return 0;
                 }
         }
-- 
2.47.3