From 90afa5de6f3fa89a733861e843377302479fcf7e Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Tue, 16 Jun 2009 15:33:20 -0700
Subject: vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim

From: Mel Gorman <mel@csn.ul.ie>

commit 90afa5de6f3fa89a733861e843377302479fcf7e upstream.

A bug was brought to my attention against a distro kernel but it affects
mainline, and I believe problems like this have been reported in various
guises on the mailing lists, although I don't have specific examples at
the moment.

The reported problem was that malloc() stalled for a long time (minutes
in some cases) if a large tmpfs mount was occupying a large percentage
of memory overall.  The pages did not get cleaned or reclaimed by
zone_reclaim() because the zone_reclaim_mode was unsuitable, but the
lists were uselessly scanned frequently, making the CPU spin at near
100%.

This patchset intends to address that bug and bring the behaviour of
zone_reclaim() more in line with expectations which were noticed during
investigation.  It is based on top of mmotm and takes advantage of
Kosaki's work with respect to zone_reclaim().

Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the
	scan should go ahead.  The broken heuristic is what was causing the
	malloc() stall as it uselessly scanned the LRU constantly.
	Currently, zone_reclaim is assuming zone_reclaim_mode is 1 and
	historically it could not deal with tmpfs pages at all.  This
	fixes up the heuristic so that an unnecessary scan is more likely
	to be correctly avoided.

Patch 2 notes that zone_reclaim() returning a failure automatically means
	the zone is marked full.  This is not always true.  It could have
	failed because the GFP mask or zone_reclaim_mode were unsuitable.

Patch 3 introduces a counter zreclaim_failed that will increment each
	time the zone_reclaim scan-avoidance heuristics fail.  If that
	counter is rapidly increasing, then zone_reclaim_mode should be
	set to 0 as a temporary resolution and a bug reported because
	the scan-avoidance heuristic is still broken.

This patch:

On NUMA machines, the administrator can configure zone_reclaim_mode,
which is a more targeted form of direct reclaim.  On machines with large
NUMA distances, for example, zone_reclaim_mode defaults to 1, meaning
that clean unmapped pages will be reclaimed if the zone watermarks are
not being met.

There is a heuristic that determines if the scan is worthwhile but the
problem is that the heuristic is not being properly applied and is
basically assuming zone_reclaim_mode is 1 if it is enabled.  The lack of
proper detection can manifest as high CPU usage as the LRU list is
scanned uselessly.

Historically, once enabled it depended on NR_FILE_PAGES, which may
include swapcache pages that the reclaim_mode cannot deal with.  Patch
vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by
Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included
pages that were not file-backed such as swapcache and made a calculation
based on the inactive, active and mapped files.  This is far superior
when zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a
reasonable starting figure.

This patch alters how zone_reclaim() works out how many pages it might be
able to reclaim given the current reclaim_mode.  If RECLAIM_SWAP is set
in the reclaim_mode it will consider NR_FILE_PAGES as potential
candidates; otherwise it uses NR_{IN}ACTIVE_FILE - NR_FILE_MAPPED to
discount swapcache and other non-file-backed pages.  If RECLAIM_WRITE is
not set, then NR_FILE_DIRTY number of pages are not candidates.  If
RECLAIM_SWAP is not set, then NR_FILE_MAPPED are not.

[kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages]
[fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 Documentation/sysctl/vm.txt |   12 ++++++----
 mm/vmscan.c                 |   52 ++++++++++++++++++++++++++++++++++++++------
 2 files changed, 53 insertions(+), 11 deletions(-)

--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -314,10 +314,14 @@ min_unmapped_ratio:
 
 This is available only on NUMA kernels.
 
-A percentage of the total pages in each zone. Zone reclaim will only
-occur if more than this percentage of pages are file backed and unmapped.
-This is to insure that a minimal amount of local pages is still available for
-file I/O even if the node is overallocated.
+This is a percentage of the total pages in each zone. Zone reclaim will
+only occur if more than this percentage of pages are in a state that
+zone_reclaim_mode allows to be reclaimed.
+
+If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
+against all file-backed unmapped pages including swapcache pages and tmpfs
+files. Otherwise, only unmapped pages backed by normal files but not tmpfs
+files and similar are considered.
 
 The default is 1 percent.
 
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2290,6 +2290,48 @@ int sysctl_min_unmapped_ratio = 1;
  */
 int sysctl_min_slab_ratio = 5;
 
+static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
+{
+	unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
+	unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
+		zone_page_state(zone, NR_ACTIVE_FILE);
+
+	/*
+	 * It's possible for there to be more file mapped pages than
+	 * accounted for by the pages on the file LRU lists because
+	 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
+	 */
+	return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
+}
+
+/* Work out how many page cache pages we can reclaim in this reclaim_mode */
+static long zone_pagecache_reclaimable(struct zone *zone)
+{
+	long nr_pagecache_reclaimable;
+	long delta = 0;
+
+	/*
+	 * If RECLAIM_SWAP is set, then all file pages are considered
+	 * potentially reclaimable. Otherwise, we have to worry about
+	 * pages like swapcache and zone_unmapped_file_pages() provides
+	 * a better estimate
+	 */
+	if (zone_reclaim_mode & RECLAIM_SWAP)
+		nr_pagecache_reclaimable = zone_page_state(zone, NR_FILE_PAGES);
+	else
+		nr_pagecache_reclaimable = zone_unmapped_file_pages(zone);
+
+	/* If we can't clean pages, remove dirty pages from consideration */
+	if (!(zone_reclaim_mode & RECLAIM_WRITE))
+		delta += zone_page_state(zone, NR_FILE_DIRTY);
+
+	/* Watch for any possible underflows due to delta */
+	if (unlikely(delta > nr_pagecache_reclaimable))
+		delta = nr_pagecache_reclaimable;
+
+	return nr_pagecache_reclaimable - delta;
+}
+
 /*
  * Try to free up some pages from this zone through reclaim.
  */
@@ -2324,9 +2366,7 @@ static int __zone_reclaim(struct zone *z
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	if (zone_page_state(zone, NR_FILE_PAGES) -
-	    zone_page_state(zone, NR_FILE_MAPPED) >
-	    zone->min_unmapped_pages) {
+	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
 		/*
 		 * Free memory by calling shrink zone with increasing
 		 * priorities until we have enough memory freed.
@@ -2384,10 +2424,8 @@ int zone_reclaim(struct zone *zone, gfp_
 	 * if less than a specified percentage of the zone is used by
 	 * unmapped file backed pages.
 	 */
-	if (zone_page_state(zone, NR_FILE_PAGES) -
-	    zone_page_state(zone, NR_FILE_MAPPED) <= zone->min_unmapped_pages
-	    && zone_page_state(zone, NR_SLAB_RECLAIMABLE)
-	    <= zone->min_slab_pages)
+	if (zone_pagecache_reclaimable(zone) <= zone->min_unmapped_pages &&
+	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
 		return 0;
 
 	if (zone_is_all_unreclaimable(zone))