From: Greg Kroah-Hartman
Date: Mon, 26 May 2025 10:47:02 +0000 (+0200)
Subject: 5.4-stable patches
X-Git-Tag: v6.12.31~38
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=4c814c1d783cee4aedaa5e34ee7ae5c5fc6f951f;p=thirdparty%2Fkernel%2Fstable-queue.git

5.4-stable patches

added patches:
	memcg-always-call-cond_resched-after-fn.patch
	mm-page_alloc.c-avoid-infinite-retries-caused-by-cpuset-race.patch
---

diff --git a/queue-5.4/memcg-always-call-cond_resched-after-fn.patch b/queue-5.4/memcg-always-call-cond_resched-after-fn.patch
new file mode 100644
index 0000000000..40f8090f0c
--- /dev/null
+++ b/queue-5.4/memcg-always-call-cond_resched-after-fn.patch
@@ -0,0 +1,80 @@
+From 06717a7b6c86514dbd6ab322e8083ffaa4db5712 Mon Sep 17 00:00:00 2001
+From: Breno Leitao
+Date: Fri, 23 May 2025 10:21:06 -0700
+Subject: memcg: always call cond_resched() after fn()
+
+From: Breno Leitao
+
+commit 06717a7b6c86514dbd6ab322e8083ffaa4db5712 upstream.
+
+I am seeing soft lockups on certain machine types when a cgroup OOMs.
+This happens because killing the processes can be very slow on certain
+machine types, which causes the soft lockup and RCU stalls.  It usually
+happens when the cgroup has MANY processes and memory.oom.group is set.
+
+An example I am seeing in real production:
+
+ [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) ....
+ ....
+ [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) ....
+ [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982]
+ ....
+
+A quick look at why this is so slow suggests it is related to the
+serial console flush on certain machine types: in all the crashes I
+saw, the target CPU was in console_flush_all().
+
+In the case above there are thousands of processes in the cgroup, and
+the CPU soft locks up before the loop reaches the 1024-iteration point
+at which the code would call cond_resched().  So rescheduling only once
+every 1024 iterations is not sufficient.
+
+Remove the counter-based conditional rescheduling logic and call
+cond_resched() unconditionally after each task iteration, after fn() is
+called.  This avoids the lockup regardless of how slow fn() is.
+
+Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org
+Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
+Signed-off-by: Breno Leitao
+Suggested-by: Rik van Riel
+Acked-by: Shakeel Butt
+Cc: Michael van der Westhuizen
+Cc: Usama Arif
+Cc: Pavel Begunkov
+Cc: Chen Ridong
+Cc: Greg Kroah-Hartman
+Cc: Johannes Weiner
+Cc: Michal Hocko
+Cc: Michal Hocko
+Cc: Muchun Song
+Cc: Roman Gushchin
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/memcontrol.c |    6 ++----
+ 1 file changed, 2 insertions(+), 4 deletions(-)
+
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -1221,7 +1221,6 @@ int mem_cgroup_scan_tasks(struct mem_cgr
+ {
+ 	struct mem_cgroup *iter;
+ 	int ret = 0;
+-	int i = 0;
+
+ 	BUG_ON(memcg == root_mem_cgroup);
+
+@@ -1231,10 +1230,9 @@ int mem_cgroup_scan_tasks(struct mem_cgr
+
+ 	css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
+ 	while (!ret && (task = css_task_iter_next(&it))) {
+-		/* Avoid potential softlockup warning */
+-		if ((++i & 1023) == 0)
+-			cond_resched();
+ 		ret = fn(task, arg);
++		/* Avoid potential softlockup warning */
++		cond_resched();
+ 	}
+ 	css_task_iter_end(&it);
+ 	if (ret) {
diff --git a/queue-5.4/mm-page_alloc.c-avoid-infinite-retries-caused-by-cpuset-race.patch b/queue-5.4/mm-page_alloc.c-avoid-infinite-retries-caused-by-cpuset-race.patch
new file mode 100644
index 0000000000..3e55dc4f35
--- /dev/null
+++ b/queue-5.4/mm-page_alloc.c-avoid-infinite-retries-caused-by-cpuset-race.patch
@@ -0,0 +1,79 @@
+From e05741fb10c38d70bbd7ec12b23c197b6355d519 Mon Sep 17 00:00:00 2001
+From: Tianyang Zhang
+Date: Wed, 16 Apr 2025 16:24:05 +0800
+Subject: mm/page_alloc.c: avoid infinite retries caused by cpuset race
+
+From: Tianyang Zhang
+
+commit e05741fb10c38d70bbd7ec12b23c197b6355d519 upstream.
+
+__alloc_pages_slowpath() has no change detection for ac->nodemask in
+its retry path, while cpuset can modify the nodemask in parallel.  For
+processes that set their mempolicy to MPOL_BIND, this results in
+ac->nodemask changing under the allocator: should_reclaim_retry() then
+judges based on the latest nodemask and jumps back to retry, while
+get_page_from_freelist() still only traverses the zonelist from
+ac->preferred_zoneref, which was selected using an expired nodemask.
+This may cause infinite retries in some cases:
+
+cpu 64:
+__alloc_pages_slowpath {
+	/* ..... */
+retry:
+	/* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
+	if (alloc_flags & ALLOC_KSWAPD)
+		wake_all_kswapds(order, gfp_mask, ac);
+	/* cpu 1:
+	cpuset_write_resmask
+	    update_nodemask
+	      update_nodemasks_hier
+	        update_tasks_nodemask
+	          mpol_rebind_task
+	            mpol_rebind_policy
+	              mpol_rebind_nodemask
+		// mempolicy->nodes has been modified,
+		// which ac->nodemask points to
+
+	*/
+	/* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
+	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
+				 did_some_progress > 0, &no_progress_loops))
+		goto retry;
+}
+
+Simultaneously starting multiple instances of LTP's cpuset01 test can
+quickly reproduce this issue on a multi-node server once maximum memory
+pressure is reached and swap is enabled.
+
+Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
+Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
+Signed-off-by: Tianyang Zhang
+Reviewed-by: Suren Baghdasaryan
+Reviewed-by: Vlastimil Babka
+Cc: Michal Hocko
+Cc: Brendan Jackman
+Cc: Johannes Weiner
+Cc: Zi Yan
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/page_alloc.c |    8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -4573,6 +4573,14 @@ restart:
+ 	}
+
+ retry:
++	/*
++	 * Deal with possible cpuset update races or zonelist updates to avoid
++	 * infinite retries.
++	 */
++	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
++	    check_retry_zonelist(zonelist_iter_cookie))
++		goto restart;
++
+ 	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
+ 	if (alloc_flags & ALLOC_KSWAPD)
+ 		wake_all_kswapds(order, gfp_mask, ac);
diff --git a/queue-5.4/series b/queue-5.4/series
index 353246a157..f612fd0372 100644
--- a/queue-5.4/series
+++ b/queue-5.4/series
@@ -178,3 +178,5 @@ can-bcm-add-missing-rcu-read-protection-for-procfs-content.patch
 alsa-pcm-fix-race-of-buffer-access-at-pcm-oss-layer.patch
 llc-fix-data-loss-when-reading-from-a-socket-in-llc_ui_recvmsg.patch
 drm-edid-fixed-the-bug-that-hdr-metadata-was-not-reset.patch
+memcg-always-call-cond_resched-after-fn.patch
+mm-page_alloc.c-avoid-infinite-retries-caused-by-cpuset-race.patch
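
The pattern the memcg patch settles on can be reduced to a minimal
userspace sketch.  In the sketch below, sched_yield() merely stands in
for the kernel's cond_resched(), and visit() for the fn() callback;
both names and the task count are illustrative assumptions, not the
kernel API:

#include <sched.h>
#include <stdio.h>

/* Stand-in for fn(): per-task work that may be arbitrarily slow. */
static int visit(int task_id)
{
	(void)task_id;
	return 0;	/* non-zero would stop the iteration, like fn() */
}

int main(void)
{
	int ret = 0;

	for (int task = 0; !ret && task < 100000; task++) {
		ret = visit(task);
		/*
		 * Old scheme: yield only every 1024th iteration, i.e.
		 *   if ((++i & 1023) == 0) sched_yield();
		 * If one visit() is slow, 1024 of them can exceed the
		 * watchdog threshold before the first yield.  Yielding
		 * after every iteration bounds any stall to a single
		 * visit() instead.
		 */
		sched_yield();
	}
	printf("scanned all tasks, ret=%d\n", ret);
	return 0;
}

The design point is that the yield is cheap next to the per-task work,
so doing it unconditionally costs little while bounding any stall to
one callback, however slow that callback is.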
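
The page_alloc fix is an instance of a more general cookie pattern:
snapshot a generation counter before deriving state from shared data,
and restart the derivation whenever the counter has moved by retry
time.  Below is a self-contained sketch of that pattern; mask_gen,
nodemask and pick_preferred() are invented for illustration and make
no claim to mirror the kernel's cpuset_mems_cookie machinery:

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int mask_gen;		/* bumped by the writer */
static _Atomic unsigned long nodemask = 0x1;	/* shared, like ac->nodemask */

/* Derive state from the mask, as ac->preferred_zoneref is derived. */
static unsigned long pick_preferred(unsigned long mask)
{
	return mask & -mask;	/* lowest set bit */
}

static unsigned long slowpath(void)
{
	unsigned int cookie;
	unsigned long preferred;
	int attempts = 0;

restart:
	cookie = atomic_load(&mask_gen);	/* snapshot before deriving */
	preferred = pick_preferred(atomic_load(&nodemask));

retry:
	if (++attempts > 8)
		return preferred;	/* give up with what we have */

	/* Simulate the racing cpuset update on the first attempt. */
	if (attempts == 1) {
		atomic_store(&nodemask, 0x3);
		atomic_fetch_add(&mask_gen, 1);
	}

	/*
	 * A stale cookie means the mask changed while we were looping:
	 * rederiving 'preferred' (goto restart) is the fix; respinning
	 * on 'retry' with the old value is the bug.
	 */
	if (atomic_load(&mask_gen) != cookie)
		goto restart;
	goto retry;
}

int main(void)
{
	printf("preferred node bit: %#lx\n", slowpath());
	return 0;
}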