From 6aaced5abd32e2a57cd94fd64f824514d0361da8 Mon Sep 17 00:00:00 2001
From: Seiji Nishikawa <snishika@redhat.com>
Date: Sun, 1 Dec 2024 01:12:34 +0900
Subject: mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()

From: Seiji Nishikawa <snishika@redhat.com>

commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream.

The task sometimes continues looping in throttle_direct_reclaim() because
allow_direct_reclaim(pgdat) keeps returning false.

 #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
 #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
 #2 [ffff80002cb6f990] schedule at ffff800008abc50c
 #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4

At this point, the pgdat contains the following two zones:

NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
  SIZE: 20480  MIN/LOW/HIGH: 11/28/45
  VM_STAT:
    NR_FREE_PAGES: 359
    NR_ZONE_INACTIVE_ANON: 18813
    NR_ZONE_ACTIVE_ANON: 0
    NR_ZONE_INACTIVE_FILE: 50
    NR_ZONE_ACTIVE_FILE: 0
    NR_ZONE_UNEVICTABLE: 0
    NR_ZONE_WRITE_PENDING: 0
    NR_MLOCK: 0
    NR_BOUNCE: 0
    NR_ZSPAGES: 0
    NR_FREE_CMA_PAGES: 0

NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
  SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
  VM_STAT:
    NR_FREE_PAGES: 146
    NR_ZONE_INACTIVE_ANON: 94668
    NR_ZONE_ACTIVE_ANON: 3
    NR_ZONE_INACTIVE_FILE: 735
    NR_ZONE_ACTIVE_FILE: 78
    NR_ZONE_UNEVICTABLE: 0
    NR_ZONE_WRITE_PENDING: 0
    NR_MLOCK: 0
    NR_BOUNCE: 0
    NR_ZSPAGES: 0
    NR_FREE_CMA_PAGES: 0

In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
inactive and active file-backed pages that zone_reclaimable_pages()
computes from zone_page_state_snapshot() is zero.

Additionally, since this system has no swap, the inactive and active
anonymous pages are not counted.

crash> p nr_swap_pages
nr_swap_pages = $1937 = {
  counter = 0
}

As a result, ZONE_DMA32 is deemed unreclaimable and skipped, and
processing moves on to the next zone, ZONE_NORMAL, even though ZONE_DMA32
has free pages well above its high watermark.

The problem is that pgdat->kswapd_failures hasn't been incremented.

crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
$1935 = 0x0

This is because the node is deemed balanced. The node balancing logic in
balance_pgdat() evaluates all zones collectively. If one or more zones
(e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
entire node is deemed balanced. This causes balance_pgdat() to exit early
before incrementing kswapd_failures, as it considers the overall memory
state acceptable, even though some zones (like ZONE_NORMAL) remain under
significant pressure.

The patch ensures that zone_reclaimable_pages() includes free pages
(NR_FREE_PAGES) in its calculation when no other reclaimable pages
(file-backed or anonymous) are available. This change prevents zones like
ZONE_DMA32, which have sufficient free pages, from being mistakenly
deemed unreclaimable. By doing so, the patch ensures proper node
balancing, avoids masking pressure on other zones like ZONE_NORMAL, and
prevents infinite loops in throttle_direct_reclaim() caused by
allow_direct_reclaim(pgdat) repeatedly returning false.
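
The effect of the change on a single zone can be sketched in user space
as follows. This is a simplified model under stated assumptions, not the
kernel function: struct zone_counters, reclaimable_before() and
reclaimable_after() are hypothetical names, and the can_reclaim_anon flag
stands in for the kernel's can_reclaim_anon_pages() check. The file
counters are set to 0 because the description states that the snapshot
sum for ZONE_DMA32 is zero.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the per-zone counters used by the calculation. */
struct zone_counters {
	unsigned long inactive_file;
	unsigned long active_file;
	unsigned long inactive_anon;
	unsigned long active_anon;
	unsigned long free_pages;
};

/* ZONE_DMA32 as described: file snapshot sums to 0, 359 free pages. */
static const struct zone_counters dma32 = {
	.inactive_file = 0,
	.active_file   = 0,
	.inactive_anon = 18813,
	.active_anon   = 0,
	.free_pages    = 359,
};

/* Before the patch: file pages, plus anon pages only if they can be reclaimed. */
static unsigned long reclaimable_before(const struct zone_counters *z,
					bool can_reclaim_anon)
{
	unsigned long nr = z->inactive_file + z->active_file;

	if (can_reclaim_anon)
		nr += z->inactive_anon + z->active_anon;
	return nr;
}

/* After the patch: fall back to the free page count when nothing else counts. */
static unsigned long reclaimable_after(const struct zone_counters *z,
				       bool can_reclaim_anon)
{
	unsigned long nr = reclaimable_before(z, can_reclaim_anon);

	if (nr == 0)
		nr = z->free_pages;
	return nr;
}
```

With no swap (can_reclaim_anon == false), reclaimable_before(&dma32, false)
yields 0 and the zone is skipped, while reclaimable_after(&dma32, false)
yields 359. When anonymous pages can be reclaimed, both variants return
the same value, so the fallback only matters in the empty case.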

The kernel hangs because a task is stuck in throttle_direct_reclaim(),
caused by a node being incorrectly deemed balanced despite pressure in
certain zones, such as ZONE_NORMAL. This issue arises from
zone_reclaimable_pages() returning 0 for zones without reclaimable
file-backed or anonymous pages, causing zones like ZONE_DMA32 with
sufficient free pages to be skipped.

The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
during reclaim, masking pressure in other zones. Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
mechanisms in allow_direct_reclaim() from being triggered, leading to an
infinite loop in throttle_direct_reclaim().
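
How a skipped zone and a kswapd_failures count of 0 combine can be
sketched with a user-space model of the throttling decision. This is an
approximation under stated assumptions, not the kernel function: struct
zone_reserve and allow_direct_reclaim_model() are hypothetical names, the
wakeup of kswapd is left out, and the zone values are illustrative rather
than taken from the dump (the second zone is given 30 free pages so that
the check fails). MAX_RECLAIM_RETRIES is 16 in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16

/* Hypothetical per-zone inputs to the throttling decision. */
struct zone_reserve {
	unsigned long reclaimable;	/* result of zone_reclaimable_pages() */
	unsigned long min_wmark;	/* MIN watermark */
	unsigned long free_pages;	/* NR_FREE_PAGES */
};

/* Illustrative values: zone 0 reports 0 reclaimable pages before the patch... */
static const struct zone_reserve before_patch[] = {
	{ .reclaimable = 0,   .min_wmark = 11, .free_pages = 359 },
	{ .reclaimable = 100, .min_wmark = 68, .free_pages = 30 },
};

/* ...and its free page count after the patch. */
static const struct zone_reserve after_patch[] = {
	{ .reclaimable = 359, .min_wmark = 11, .free_pages = 359 },
	{ .reclaimable = 100, .min_wmark = 68, .free_pages = 30 },
};

static bool allow_direct_reclaim_model(const struct zone_reserve *zones,
				       size_t nr_zones, int kswapd_failures)
{
	unsigned long reserve = 0, free_pages = 0;

	/* Fallback: stop throttling once kswapd has failed often enough. */
	if (kswapd_failures >= MAX_RECLAIM_RETRIES)
		return true;

	for (size_t i = 0; i < nr_zones; i++) {
		/* Zones with nothing reclaimable contribute nothing at all. */
		if (!zones[i].reclaimable)
			continue;
		reserve += zones[i].min_wmark;
		free_pages += zones[i].free_pages;
	}

	/* No reserves to protect: do not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

In this model allow_direct_reclaim_model(before_patch, 2, 0) is false,
since only the second zone is counted (30 free against 68 / 2), and it
stays false for as long as kswapd_failures does not reach
MAX_RECLAIM_RETRIES. With after_patch the first zone's free pages are
included and the same call returns true.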

This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones
with sufficient free pages are not skipped, enabling proper balancing and
reclaim behavior.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")
Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/vmscan.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(str
 	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
-
+	/*
+	 * If there are no reclaimable file-backed or anonymous pages,
+	 * ensure zones with sufficient free pages are not skipped.
+	 * This prevents zones like DMA32 from being ignored in reclaim
+	 * scenarios where they can still help alleviate memory pressure.
+	 */
+	if (nr == 0)
+		nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	return nr;
 }
 