From eae116d1f0449ade3269ca47a67432622f5c6438 Mon Sep 17 00:00:00 2001
From: Gabriel Krisman Bertazi <krisman@suse.de>
Date: Tue, 25 Feb 2025 22:22:58 -0500
Subject: Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"

From: Gabriel Krisman Bertazi <krisman@suse.de>

commit eae116d1f0449ade3269ca47a67432622f5c6438 upstream.

Commit 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's
->lowmem_reserve[] for empty zone") removes the protection of lower zones
from allocations targeting memory-less high zones. This had an unintended
impact on the pattern of reclaims because it makes the high-zone-targeted
allocation more likely to succeed in lower zones, which adds pressure to
said zones. I.e., the following corresponding checks in
zone_watermark_ok/zone_watermark_fast are less likely to trigger:

	if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
		return false;

As a result, we are observing an increase in reclaim and kswapd scans, due
to the increased pressure. This was initially observed as increased
latency in filesystem operations when benchmarking with fio on a machine
with some memory-less zones, but it has since been associated with
increased contention in locks related to memory reclaim. By reverting
this patch, the original performance was recovered on that machine.

The original commit was introduced as a clarification of the
/proc/zoneinfo output, so it doesn't seem there are usecases depending on
it, making the revert a simple solution.

For reference, I collected vmstat with and without this patch on a freshly
booted system running intensive randread io from an nvme for 5 minutes. I
got:

rpm-6.12.0-slfo.1.2 -> pgscan_kswapd 5629543865
Patched -> pgscan_kswapd 33580844

33M scans is similar to what we had in kernels predating this patch.
These numbers are fairly representative of the workload on this machine,
as measured in several runs. So we are talking about an increase of two
orders of magnitude.

Link: https://lkml.kernel.org/r/20250226032258.234099-1-krisman@suse.de
Fixes: 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5991,11 +5991,10 @@ static void setup_per_zone_lowmem_reserv
 
 	for (j = i + 1; j < MAX_NR_ZONES; j++) {
 		struct zone *upper_zone = &pgdat->node_zones[j];
-		bool empty = !zone_managed_pages(upper_zone);
 
 		managed_pages += zone_managed_pages(upper_zone);
 
-		if (clear || empty)
+		if (clear)
 			zone->lowmem_reserve[j] = 0;
 		else
 			zone->lowmem_reserve[j] = managed_pages / ratio;