From: Greg Kroah-Hartman
Date: Tue, 24 Jul 2012 22:52:05 +0000 (-0700)
Subject: 3.0-stable patches
X-Git-Tag: v3.4.7~12
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=502375cfb129a17d02ef3973d65807d388e3cd43;p=thirdparty%2Fkernel%2Fstable-queue.git

3.0-stable patches

added patches:
	mm-reduce-the-amount-of-work-done-when-updating-min_free_kbytes.patch
	mm-vmscan-fix-force-scanning-small-targets-without-swap.patch
	vmscan-add-shrink_slab-tracepoints.patch
	vmscan-clear-zone_congested-for-zone-with-good-watermark.patch
---

diff --git a/queue-3.0/mm-reduce-the-amount-of-work-done-when-updating-min_free_kbytes.patch b/queue-3.0/mm-reduce-the-amount-of-work-done-when-updating-min_free_kbytes.patch
new file mode 100644
index 00000000000..41cdd135682
--- /dev/null
+++ b/queue-3.0/mm-reduce-the-amount-of-work-done-when-updating-min_free_kbytes.patch
@@ -0,0 +1,86 @@
+From 938929f14cb595f43cd1a4e63e22d36cab1e4a1f Mon Sep 17 00:00:00 2001
+From: Mel Gorman
+Date: Tue, 10 Jan 2012 15:07:14 -0800
+Subject: mm: reduce the amount of work done when updating min_free_kbytes
+
+From: Mel Gorman
+
+commit 938929f14cb595f43cd1a4e63e22d36cab1e4a1f upstream.
+
+Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 .
+	Large machines with 1TB or more of RAM take a long time to boot
+	without this patch and may spew out soft lockup warnings.
+
+When min_free_kbytes is updated, some pageblocks are marked
+MIGRATE_RESERVE. Ordinarily, this work is unnoticeable as it happens early
+in boot but on large machines with 1TB of memory, this has been reported
+to delay boot times, probably due to the NUMA distances involved.
+
+The bulk of the work is due to calling pageblock_is_reserved() an
+unnecessary amount of times and accessing far more struct page metadata
+than is necessary. This patch significantly reduces the amount of work
+done by setup_zone_migrate_reserve(), improving boot times on 1TB machines.
+
+[akpm@linux-foundation.org: coding-style fixes]
+Signed-off-by: Mel Gorman
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Mel Gorman
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/page_alloc.c | 40 ++++++++++++++++++++++++----------------
+ 1 file changed, 24 insertions(+), 16 deletions(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -3418,25 +3418,33 @@ static void setup_zone_migrate_reserve(s
+ 		if (page_to_nid(page) != zone_to_nid(zone))
+ 			continue;
+ 
+-		/* Blocks with reserved pages will never free, skip them. */
+-		block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
+-		if (pageblock_is_reserved(pfn, block_end_pfn))
+-			continue;
+-
+ 		block_migratetype = get_pageblock_migratetype(page);
+ 
+-		/* If this block is reserved, account for it */
+-		if (reserve > 0 && block_migratetype == MIGRATE_RESERVE) {
+-			reserve--;
+-			continue;
+-		}
++		/* Only test what is necessary when the reserves are not met */
++		if (reserve > 0) {
++			/*
++			 * Blocks with reserved pages will never free, skip
++			 * them.
++			 */
++			block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
++			if (pageblock_is_reserved(pfn, block_end_pfn))
++				continue;
+ 
+-		/* Suitable for reserving if this block is movable */
+-		if (reserve > 0 && block_migratetype == MIGRATE_MOVABLE) {
+-			set_pageblock_migratetype(page, MIGRATE_RESERVE);
+-			move_freepages_block(zone, page, MIGRATE_RESERVE);
+-			reserve--;
+-			continue;
++			/* If this block is reserved, account for it */
++			if (block_migratetype == MIGRATE_RESERVE) {
++				reserve--;
++				continue;
++			}
++
++			/* Suitable for reserving if this block is movable */
++			if (block_migratetype == MIGRATE_MOVABLE) {
++				set_pageblock_migratetype(page,
++							MIGRATE_RESERVE);
++				move_freepages_block(zone, page,
++							MIGRATE_RESERVE);
++				reserve--;
++				continue;
++			}
+ 		}
+ 
+ 		/*
diff --git a/queue-3.0/mm-vmscan-fix-force-scanning-small-targets-without-swap.patch b/queue-3.0/mm-vmscan-fix-force-scanning-small-targets-without-swap.patch
new file mode 100644
index 00000000000..81fa228fbc1
--- /dev/null
+++ b/queue-3.0/mm-vmscan-fix-force-scanning-small-targets-without-swap.patch
@@ -0,0 +1,86 @@
+From a4d3e9e76337059406fcf3ead288c0df22a790e9 Mon Sep 17 00:00:00 2001
+From: Johannes Weiner
+Date: Wed, 14 Sep 2011 16:21:52 -0700
+Subject: mm: vmscan: fix force-scanning small targets without swap
+
+From: Johannes Weiner
+
+commit a4d3e9e76337059406fcf3ead288c0df22a790e9 upstream.
+
+Stable note: Not tracked in Bugzilla. This patch augments an earlier commit
+	that avoids scanning priority being artificially raised. The older
+	fix was particularly important for small memcgs to avoid calling
+	wait_iff_congested() unnecessarily.
+
+Without swap, anonymous pages are not scanned. As such, they should not
+count when considering force-scanning a small target if there is no swap.
+
+Otherwise, targets are not force-scanned even when their effective scan
+number is zero and the other conditions--kswapd/memcg--apply.
+
+This fixes 246e87a93934 ("memcg: fix get_scan_count() for small
+targets").
+
+[akpm@linux-foundation.org: fix comment]
+Signed-off-by: Johannes Weiner
+Acked-by: KAMEZAWA Hiroyuki
+Reviewed-by: Michal Hocko
+Cc: Ying Han
+Cc: Balbir Singh
+Cc: KOSAKI Motohiro
+Cc: Daisuke Nishimura
+Acked-by: Mel Gorman
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Mel Gorman
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/vmscan.c | 27 ++++++++++++---------------
+ 1 file changed, 12 insertions(+), 15 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -1747,23 +1747,15 @@ static void get_scan_count(struct zone *
+ 	u64 fraction[2], denominator;
+ 	enum lru_list l;
+ 	int noswap = 0;
+-	int force_scan = 0;
++	bool force_scan = false;
+ 	unsigned long nr_force_scan[2];
+ 
+-
+-	anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
+-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
+-	file = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
+-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+-
+-	if (((anon + file) >> priority) < SWAP_CLUSTER_MAX) {
+-		/* kswapd does zone balancing and need to scan this zone */
+-		if (scanning_global_lru(sc) && current_is_kswapd())
+-			force_scan = 1;
+-		/* memcg may have small limit and need to avoid priority drop */
+-		if (!scanning_global_lru(sc))
+-			force_scan = 1;
+-	}
++	/* kswapd does zone balancing and needs to scan this zone */
++	if (scanning_global_lru(sc) && current_is_kswapd())
++		force_scan = true;
++	/* memcg may have small limit and need to avoid priority drop */
++	if (!scanning_global_lru(sc))
++		force_scan = true;
+ 
+ 	/* If we have no swap space, do not bother scanning anon pages. */
+ 	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+@@ -1776,6 +1768,11 @@ static void get_scan_count(struct zone *
+ 		goto out;
+ 	}
+ 
++	anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
++		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
++	file = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
++		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
++
+ 	if (scanning_global_lru(sc)) {
+ 		free = zone_page_state(zone, NR_FREE_PAGES);
+ 		/* If we have very few page cache pages,
diff --git a/queue-3.0/series b/queue-3.0/series
index 42d24d646bb..cf7cced32c7 100644
--- a/queue-3.0/series
+++ b/queue-3.0/series
@@ -6,3 +6,7 @@ ubifs-fix-a-bug-in-empty-space-fix-up.patch
 dm-raid1-fix-crash-with-mirror-recovery-and-discard.patch
 mm-vmstat.c-cache-align-vm_stat.patch
 mm-memory-hotplug-check-if-pages-are-correctly-reserved-on-a-per-section-basis.patch
+mm-reduce-the-amount-of-work-done-when-updating-min_free_kbytes.patch
+mm-vmscan-fix-force-scanning-small-targets-without-swap.patch
+vmscan-clear-zone_congested-for-zone-with-good-watermark.patch
+vmscan-add-shrink_slab-tracepoints.patch
diff --git a/queue-3.0/vmscan-add-shrink_slab-tracepoints.patch b/queue-3.0/vmscan-add-shrink_slab-tracepoints.patch
new file mode 100644
index 00000000000..a114952f59c
--- /dev/null
+++ b/queue-3.0/vmscan-add-shrink_slab-tracepoints.patch
@@ -0,0 +1,147 @@
+From 095760730c1047c69159ce88021a7fa3833502c8 Mon Sep 17 00:00:00 2001
+From: Dave Chinner
+Date: Fri, 8 Jul 2011 14:14:34 +1000
+Subject: vmscan: add shrink_slab tracepoints
+
+From: Dave Chinner
+
+commit 095760730c1047c69159ce88021a7fa3833502c8 upstream.
+
+Stable note: This patch makes later patches easier to apply but otherwise
+	has little to justify it. It is a diagnostic patch that was part
+	of a series addressing excessive slab shrinking after GFP_NOFS
+	failures. There is detailed information on the series' motivation
+	at https://lkml.org/lkml/2011/6/2/42 .
+
+It is impossible to understand what the shrinkers are actually doing
+without instrumenting the code, so add some tracepoints to allow
+insight to be gained.
+
+Signed-off-by: Dave Chinner
+Signed-off-by: Al Viro
+Signed-off-by: Mel Gorman
+
+---
+ include/trace/events/vmscan.h | 77 ++++++++++++++++++++++++++++++++++++++++++
+ mm/vmscan.c | 8 +++-
+ 2 files changed, 84 insertions(+), 1 deletion(-)
+
+--- a/include/trace/events/vmscan.h
++++ b/include/trace/events/vmscan.h
+@@ -179,6 +179,83 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_en
+ 	TP_ARGS(nr_reclaimed)
+ );
+ 
++TRACE_EVENT(mm_shrink_slab_start,
++	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
++		long nr_objects_to_shrink, unsigned long pgs_scanned,
++		unsigned long lru_pgs, unsigned long cache_items,
++		unsigned long long delta, unsigned long total_scan),
++
++	TP_ARGS(shr, sc, nr_objects_to_shrink, pgs_scanned, lru_pgs,
++		cache_items, delta, total_scan),
++
++	TP_STRUCT__entry(
++		__field(struct shrinker *, shr)
++		__field(void *, shrink)
++		__field(long, nr_objects_to_shrink)
++		__field(gfp_t, gfp_flags)
++		__field(unsigned long, pgs_scanned)
++		__field(unsigned long, lru_pgs)
++		__field(unsigned long, cache_items)
++		__field(unsigned long long, delta)
++		__field(unsigned long, total_scan)
++	),
++
++	TP_fast_assign(
++		__entry->shr = shr;
++		__entry->shrink = shr->shrink;
++		__entry->nr_objects_to_shrink = nr_objects_to_shrink;
++		__entry->gfp_flags = sc->gfp_mask;
++		__entry->pgs_scanned = pgs_scanned;
++		__entry->lru_pgs = lru_pgs;
++		__entry->cache_items = cache_items;
++		__entry->delta = delta;
++		__entry->total_scan = total_scan;
++	),
++
++	TP_printk("%pF %p: objects to shrink %ld gfp_flags %s pgs_scanned %ld lru_pgs %ld cache items %ld delta %lld total_scan %ld",
++		__entry->shrink,
++		__entry->shr,
++		__entry->nr_objects_to_shrink,
++		show_gfp_flags(__entry->gfp_flags),
++		__entry->pgs_scanned,
++		__entry->lru_pgs,
++		__entry->cache_items,
++		__entry->delta,
++		__entry->total_scan)
++);
++
++TRACE_EVENT(mm_shrink_slab_end,
++	TP_PROTO(struct shrinker *shr, int shrinker_retval,
++		long unused_scan_cnt, long new_scan_cnt),
++
++	TP_ARGS(shr, shrinker_retval, unused_scan_cnt, new_scan_cnt),
++
++	TP_STRUCT__entry(
++		__field(struct shrinker *, shr)
++		__field(void *, shrink)
++		__field(long, unused_scan)
++		__field(long, new_scan)
++		__field(int, retval)
++		__field(long, total_scan)
++	),
++
++	TP_fast_assign(
++		__entry->shr = shr;
++		__entry->shrink = shr->shrink;
++		__entry->unused_scan = unused_scan_cnt;
++		__entry->new_scan = new_scan_cnt;
++		__entry->retval = shrinker_retval;
++		__entry->total_scan = new_scan_cnt - unused_scan_cnt;
++	),
++
++	TP_printk("%pF %p: unused scan count %ld new scan count %ld total_scan %ld last shrinker return val %d",
++		__entry->shrink,
++		__entry->shr,
++		__entry->unused_scan,
++		__entry->new_scan,
++		__entry->total_scan,
++		__entry->retval)
++);
+ 
+ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
+ 
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -250,6 +250,7 @@ unsigned long shrink_slab(struct shrink_
+ 		unsigned long long delta;
+ 		unsigned long total_scan;
+ 		unsigned long max_pass;
++		int shrink_ret = 0;
+ 
+ 		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
+ 		delta = (4 * nr_pages_scanned) / shrinker->seeks;
+@@ -274,9 +275,12 @@ unsigned long shrink_slab(struct shrink_
+ 		total_scan = shrinker->nr;
+ 		shrinker->nr = 0;
+ 
++		trace_mm_shrink_slab_start(shrinker, shrink, total_scan,
++					nr_pages_scanned, lru_pages,
++					max_pass, delta, total_scan);
++
+ 		while (total_scan >= SHRINK_BATCH) {
+ 			long this_scan = SHRINK_BATCH;
+-			int shrink_ret;
+ 			int nr_before;
+ 
+ 			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
+@@ -293,6 +297,8 @@ unsigned long shrink_slab(struct shrink_
+ 		}
+ 
+ 		shrinker->nr += total_scan;
++		trace_mm_shrink_slab_end(shrinker, shrink_ret, total_scan,
++					shrinker->nr);
+ 	}
+ 	up_read(&shrinker_rwsem);
+ out:
diff --git a/queue-3.0/vmscan-clear-zone_congested-for-zone-with-good-watermark.patch b/queue-3.0/vmscan-clear-zone_congested-for-zone-with-good-watermark.patch
new file mode 100644
index 00000000000..5d493165df0
--- /dev/null
+++ b/queue-3.0/vmscan-clear-zone_congested-for-zone-with-good-watermark.patch
@@ -0,0 +1,72 @@
+From 439423f6894aa0dec22187526827456f5004baed Mon Sep 17 00:00:00 2001
+From: Shaohua Li
+Date: Thu, 25 Aug 2011 15:59:12 -0700
+Subject: vmscan: clear ZONE_CONGESTED for zone with good watermark
+
+From: Shaohua Li
+
+commit 439423f6894aa0dec22187526827456f5004baed upstream.
+
+Stable note: Not tracked in Bugzilla. kswapd is responsible for clearing
+	ZONE_CONGESTED after it balances a zone and this patch fixes a bug
+	where that was failing to happen. Without this patch, processes
+	can stall in wait_iff_congested unnecessarily. For users, this can
+	look like an interactivity stall but some workloads would see it
+	as a sudden drop in throughput.
+
+ZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any
+task. It's possible ZONE_CONGESTED isn't cleared in some cases:
+
+ 1. the zone is already balanced just entering balance_pgdat() for
+    order-0 because concurrent tasks free memory. In this case, later
+    check will skip the zone as it's balanced so the flag isn't cleared.
+
+ 2. high order balance falls back to order-0. quote from Mel: At the
+    end of balance_pgdat(), kswapd uses the following logic;
+
+	If reclaiming at high order {
+		for each zone {
+			if all_unreclaimable
+				skip
+			if watermark is not met
+				order = 0
+				loop again
+
+			/* watermark is met */
+			clear congested
+		}
+	}
+
+    i.e. it clears ZONE_CONGESTED if the zone is balanced. if not,
+    it restarts balancing at order-0. However, if the higher zones are
+    balanced for order-0, kswapd will miss clearing ZONE_CONGESTED as
+    that only happens after a zone is shrunk. This can mean that
+    wait_iff_congested() stalls unnecessarily.
+
+This patch makes kswapd clear ZONE_CONGESTED during its initial
+highmem->dma scan for zones that are already balanced.
+
+Signed-off-by: Shaohua Li
+Acked-by: Mel Gorman
+Reviewed-by: Minchan Kim
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Mel Gorman
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/vmscan.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -2456,6 +2456,9 @@ loop_again:
+ 			    high_wmark_pages(zone), 0, 0)) {
+ 				end_zone = i;
+ 				break;
++			} else {
++				/* If balanced, clear the congested flag */
++				zone_clear_flag(zone, ZONE_CONGESTED);
+ 			}
+ 		}
+ 		if (i < 0)