From: Anita Zhang
Date: Wed, 19 Jan 2022 18:40:46 +0000 (-0800)
Subject: oomd: fix race with path unavailability when killing cgroups
X-Git-Tag: v251-rc1~488^2~1
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=2ee209466bb51f39ae9df7fec4d5594ce8cfa3f0;p=thirdparty%2Fsystemd.git

oomd: fix race with path unavailability when killing cgroups

There can be a situation where systemd-oomd would kill all of the
processes in a cgroup, pid1 would clean up that cgroup, and systemd-oomd
would get ENODEV trying to iterate the cgroup a final time to ensure it
was empty. systemd-oomd sees this as an error and immediately picks a new
candidate even though pressure may have recovered.

To counter this, check and handle path unavailability errnos specially.

Fixes: #22030
---

diff --git a/src/oom/oomd-util.c b/src/oom/oomd-util.c
index 64ea8cf7e43..b54bf483d60 100644
--- a/src/oom/oomd-util.c
+++ b/src/oom/oomd-util.c
@@ -196,7 +196,14 @@ int oomd_cgroup_kill(const char *path, bool recurse, bool dry_run) {
                 r = cg_kill_recursive(SYSTEMD_CGROUP_CONTROLLER, path, SIGKILL, CGROUP_IGNORE_SELF, pids_killed, log_kill, NULL);
         else
                 r = cg_kill(SYSTEMD_CGROUP_CONTROLLER, path, SIGKILL, CGROUP_IGNORE_SELF, pids_killed, log_kill, NULL);
-        if (r < 0)
+
+        /* The cgroup could have been cleaned up after we have sent SIGKILL to all of the processes, but before
+         * we could do one last iteration of cgroup.procs to check. Or the service unit could have exited and
+         * was removed between picking candidates and coming into this function. In either case, let's log
+         * about it let the caller decide what to do once they know how many PIDs were killed. */
+        if (IN_SET(r, -ENOENT, -ENODEV))
+                log_debug_errno(r, "Error when sending SIGKILL to processes in cgroup path %s, ignoring: %m", path);
+        else if (r < 0)
                 return r;
 
         r = increment_oomd_xattr(path, "user.oomd_kill", set_size(pids_killed));
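The race and the fix can be illustrated outside systemd with a small standalone sketch. It assumes a cgroup v2 style directory containing a `cgroup.procs` file with one PID per line. The helper names `cgroup_path_gone()`, `count_remaining_procs()` and `finish_kill()` are hypothetical and are not systemd's API (systemd itself uses `cg_kill()`, `IN_SET()` and `log_debug_errno()`, as in the hunk above). The "final iteration" over `cgroup.procs` fails with `-ENOENT` once pid1 has removed the directory, and the decision logic treats that, like `-ENODEV`, as "the cgroup is already gone" rather than as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed buffer size for the sketch (avoids relying on PATH_MAX). */
#define SKETCH_PATH_MAX 4096

/* Hypothetical helper: does this negative errno mean the cgroup path itself
 * disappeared (removed by pid1, or the unit exited) rather than a real failure? */
static bool cgroup_path_gone(int r) {
        return r == -ENOENT || r == -ENODEV;
}

/* Hypothetical "final iteration": counts the PIDs still listed in
 * <dir>/cgroup.procs. Returns the count, or a negative errno on failure.
 * If the cgroup was already removed, fopen() fails and -ENOENT is returned. */
static int count_remaining_procs(const char *dir) {
        char fn[SKETCH_PATH_MAX];
        long pid;
        int n = 0;

        assert(dir);
        if (snprintf(fn, sizeof fn, "%s/cgroup.procs", dir) >= (int) sizeof fn)
                return -ENAMETOOLONG;

        FILE *f = fopen(fn, "r");
        if (!f)
                return -errno;
        while (fscanf(f, "%ld", &pid) == 1)
                n++;
        fclose(f);
        return n;
}

/* Mirrors the decision in the patch: a vanished cgroup is logged and ignored,
 * any other negative errno is propagated, and otherwise the caller receives
 * the number of PIDs that were killed so it can decide what to do next. */
static int finish_kill(int r, const char *path, size_t n_killed) {
        assert(path);
        if (cgroup_path_gone(r))
                fprintf(stderr, "cgroup %s vanished (errno %d), ignoring\n", path, -r);
        else if (r < 0)
                return r;
        return (int) n_killed;
}
```

With this shape, a kill pass that ends in `-ENODEV` because pid1 already cleaned up the cgroup still reports how many processes were killed, so the caller can re-check pressure instead of immediately selecting another candidate. A genuine error such as `-EPERM` is still returned unchanged.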