From: Greg Kroah-Hartman
Date: Wed, 11 Mar 2015 13:49:34 +0000 (+0100)
Subject: 3.19-stable patches
X-Git-Tag: v3.10.72~46
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=13a0632aaf4404f19cd506b210e7b2845245a970;p=thirdparty%2Fkernel%2Fstable-queue.git

3.19-stable patches

added patches:
	mm-compaction-fix-wrong-order-check-in-compact_finished.patch
	mm-fix-negative-nr_isolated-counts.patch
	mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch
	mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch
	mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch
	mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch
	mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch
	mm-memory.c-actually-remap-enough-memory.patch
	mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
	mm-nommu-fix-memory-leak.patch
	mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
	mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch
	mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch
---
diff --git a/queue-3.19/mm-compaction-fix-wrong-order-check-in-compact_finished.patch b/queue-3.19/mm-compaction-fix-wrong-order-check-in-compact_finished.patch
new file mode 100644
index 00000000000..367b795a8dc
--- /dev/null
+++ b/queue-3.19/mm-compaction-fix-wrong-order-check-in-compact_finished.patch
@@ -0,0 +1,60 @@
+From 372549c2a3778fd3df445819811c944ad54609ca Mon Sep 17 00:00:00 2001
+From: Joonsoo Kim
+Date: Thu, 12 Feb 2015 14:59:50 -0800
+Subject: mm/compaction: fix wrong order check in compact_finished()
+
+From: Joonsoo Kim
+
+commit 372549c2a3778fd3df445819811c944ad54609ca upstream.
+
+What we want to check here is whether there is a high-order freepage in the
+buddy list of another migratetype, in order to steal it without fragmentation.
+But the current code just checks cc->order, which is the allocation request
+order. So this is wrong.
+
+Without this fix, non-movable synchronous compaction below pageblock order
+would not stop until compaction is complete, because the migratetype of
+most pageblocks is movable and high-order freepages made by compaction are
+usually on the movable-type buddy list.
+
+There is a report related to this bug; see the link below.
+
+ http://www.spinics.net/lists/linux-mm/msg81666.html
+
+Although the affected system still has load spikes coming from compaction,
+this change makes that system completely stable and responsive according to
+the reporter.
+
+The stress-highalloc test in mmtests with non-movable order-7 allocations
+doesn't show any notable difference in allocation success rate, but it
+shows a higher compaction success rate.
+ +Compaction success rate (Compaction success * 100 / Compaction stalls, %) +18.47 : 28.94 + +Fixes: 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available") +Signed-off-by: Joonsoo Kim +Acked-by: Vlastimil Babka +Reviewed-by: Zhang Yanfei +Cc: Mel Gorman +Cc: David Rientjes +Cc: Rik van Riel +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/compaction.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/compaction.c ++++ b/mm/compaction.c +@@ -1088,7 +1088,7 @@ static int compact_finished(struct zone + return COMPACT_PARTIAL; + + /* Job done if allocation would set block type */ +- if (cc->order >= pageblock_order && area->nr_free) ++ if (order >= pageblock_order && area->nr_free) + return COMPACT_PARTIAL; + } + diff --git a/queue-3.19/mm-fix-negative-nr_isolated-counts.patch b/queue-3.19/mm-fix-negative-nr_isolated-counts.patch new file mode 100644 index 00000000000..ca70497dfa5 --- /dev/null +++ b/queue-3.19/mm-fix-negative-nr_isolated-counts.patch @@ -0,0 +1,52 @@ +From ff59909a077b3c51c168cb658601c6b63136a347 Mon Sep 17 00:00:00 2001 +From: Hugh Dickins +Date: Thu, 12 Feb 2015 15:00:28 -0800 +Subject: mm: fix negative nr_isolated counts + +From: Hugh Dickins + +commit ff59909a077b3c51c168cb658601c6b63136a347 upstream. + +The vmstat interfaces are good at hiding negative counts (at least when +CONFIG_SMP); but if you peer behind the curtain, you find that +nr_isolated_anon and nr_isolated_file soon go negative, and grow ever +more negative: so they can absorb larger and larger numbers of isolated +pages, yet still appear to be zero. + +I'm happy to avoid a congestion_wait() when too_many_isolated() myself; +but I guess it's there for a good reason, in which case we ought to get +too_many_isolated() working again. + +The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case: +putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot +to call acct_isolated() to increment them. + +It is possible that the bug whcih this patch fixes could cause OOM kills +when the system still has a lot of reclaimable page cache. + +Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()") +Signed-off-by: Hugh Dickins +Acked-by: Vlastimil Babka +Acked-by: Joonsoo Kim +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/compaction.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/mm/compaction.c ++++ b/mm/compaction.c +@@ -1015,8 +1015,10 @@ static isolate_migrate_t isolate_migrate + low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn, + isolate_mode); + +- if (!low_pfn || cc->contended) ++ if (!low_pfn || cc->contended) { ++ acct_isolated(zone, cc); + return ISOLATE_ABORT; ++ } + + /* + * Either we isolated something and proceed with migration. 
Or diff --git a/queue-3.19/mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch b/queue-3.19/mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch new file mode 100644 index 00000000000..472f35d4e83 --- /dev/null +++ b/queue-3.19/mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch @@ -0,0 +1,51 @@ +From 9fbc1f635fd0bd28cb32550211bf095753ac637a Mon Sep 17 00:00:00 2001 +From: Naoya Horiguchi +Date: Wed, 11 Feb 2015 15:25:32 -0800 +Subject: mm/hugetlb: add migration entry check in __unmap_hugepage_range + +From: Naoya Horiguchi + +commit 9fbc1f635fd0bd28cb32550211bf095753ac637a upstream. + +If __unmap_hugepage_range() tries to unmap the address range over which +hugepage migration is on the way, we get the wrong page because pte_page() +doesn't work for migration entries. This patch simply clears the pte for +migration entries as we do for hwpoison entries. + +Fixes: 290408d4a2 ("hugetlb: hugepage migration core") +Signed-off-by: Naoya Horiguchi +Cc: Hugh Dickins +Cc: James Hogan +Cc: David Rientjes +Cc: Mel Gorman +Cc: Johannes Weiner +Cc: Michal Hocko +Cc: Rik van Riel +Cc: Andrea Arcangeli +Cc: Luiz Capitulino +Cc: Nishanth Aravamudan +Cc: Lee Schermerhorn +Cc: Steve Capper +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/hugetlb.c | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -2657,9 +2657,10 @@ again: + goto unlock; + + /* +- * HWPoisoned hugepage is already unmapped and dropped reference ++ * Migrating hugepage or HWPoisoned hugepage is already ++ * unmapped and its refcount is dropped, so just clear pte here. + */ +- if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) { ++ if (unlikely(!pte_present(pte))) { + huge_pte_clear(mm, address, ptep); + goto unlock; + } diff --git a/queue-3.19/mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch b/queue-3.19/mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch new file mode 100644 index 00000000000..6229a20b4cc --- /dev/null +++ b/queue-3.19/mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch @@ -0,0 +1,70 @@ +From a8bda28d87c38c6aa93de28ba5d30cc18e865a11 Mon Sep 17 00:00:00 2001 +From: Naoya Horiguchi +Date: Wed, 11 Feb 2015 15:25:28 -0800 +Subject: mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection + +From: Naoya Horiguchi + +commit a8bda28d87c38c6aa93de28ba5d30cc18e865a11 upstream. + +There is a race condition between hugepage migration and +change_protection(), where hugetlb_change_protection() doesn't care about +migration entries and wrongly overwrites them. That causes unexpected +results like kernel crash. HWPoison entries also can cause the same +problem. + +This patch adds is_hugetlb_entry_(migration|hwpoisoned) check in this +function to do proper actions. 
+ +Fixes: 290408d4a2 ("hugetlb: hugepage migration core") +Signed-off-by: Naoya Horiguchi +Cc: Hugh Dickins +Cc: James Hogan +Cc: David Rientjes +Cc: Mel Gorman +Cc: Johannes Weiner +Cc: Michal Hocko +Cc: Rik van Riel +Cc: Andrea Arcangeli +Cc: Luiz Capitulino +Cc: Nishanth Aravamudan +Cc: Lee Schermerhorn +Cc: Steve Capper +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/hugetlb.c | 21 ++++++++++++++++++++- + 1 file changed, 20 insertions(+), 1 deletion(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -3384,7 +3384,26 @@ unsigned long hugetlb_change_protection( + spin_unlock(ptl); + continue; + } +- if (!huge_pte_none(huge_ptep_get(ptep))) { ++ pte = huge_ptep_get(ptep); ++ if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) { ++ spin_unlock(ptl); ++ continue; ++ } ++ if (unlikely(is_hugetlb_entry_migration(pte))) { ++ swp_entry_t entry = pte_to_swp_entry(pte); ++ ++ if (is_write_migration_entry(entry)) { ++ pte_t newpte; ++ ++ make_migration_entry_read(&entry); ++ newpte = swp_entry_to_pte(entry); ++ set_huge_pte_at(mm, address, ptep, newpte); ++ pages++; ++ } ++ spin_unlock(ptl); ++ continue; ++ } ++ if (!huge_pte_none(pte)) { + pte = huge_ptep_get_and_clear(mm, address, ptep); + pte = pte_mkhuge(huge_pte_modify(pte, newprot)); + pte = arch_make_huge_pte(pte, vma, NULL, 0); diff --git a/queue-3.19/mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch b/queue-3.19/mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch new file mode 100644 index 00000000000..9d1c30499d9 --- /dev/null +++ b/queue-3.19/mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch @@ -0,0 +1,146 @@ +From 0f792cf949a0be506c2aa8bfac0605746b146dda Mon Sep 17 00:00:00 2001 +From: Naoya Horiguchi +Date: Wed, 11 Feb 2015 15:25:25 -0800 +Subject: mm/hugetlb: fix getting refcount 0 page in hugetlb_fault() + +From: Naoya Horiguchi + +commit 0f792cf949a0be506c2aa8bfac0605746b146dda upstream. + +When running the test which causes the race as shown in the previous patch, +we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault(). + +This race happens when pte turns into migration entry just after the first +check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false. +To fix this, we need to check pte_present() again after huge_ptep_get(). + +This patch also reorders taking ptl and doing pte_page(), because +pte_page() should be done in ptl. Due to this reordering, we need use +trylock_page() in page != pagecache_page case to respect locking order. 
+ +Fixes: 66aebce747ea ("hugetlb: fix race condition in hugetlb_fault()") +Signed-off-by: Naoya Horiguchi +Cc: Hugh Dickins +Cc: James Hogan +Cc: David Rientjes +Cc: Mel Gorman +Cc: Johannes Weiner +Cc: Michal Hocko +Cc: Rik van Riel +Cc: Andrea Arcangeli +Cc: Luiz Capitulino +Cc: Nishanth Aravamudan +Cc: Lee Schermerhorn +Cc: Steve Capper +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/hugetlb.c | 52 ++++++++++++++++++++++++++++++++++++---------------- + 1 file changed, 36 insertions(+), 16 deletions(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -3134,6 +3134,7 @@ int hugetlb_fault(struct mm_struct *mm, + struct page *pagecache_page = NULL; + struct hstate *h = hstate_vma(vma); + struct address_space *mapping; ++ int need_wait_lock = 0; + + address &= huge_page_mask(h); + +@@ -3172,6 +3173,16 @@ int hugetlb_fault(struct mm_struct *mm, + ret = 0; + + /* ++ * entry could be a migration/hwpoison entry at this point, so this ++ * check prevents the kernel from going below assuming that we have ++ * a active hugepage in pagecache. This goto expects the 2nd page fault, ++ * and is_hugetlb_entry_(migration|hwpoisoned) check will properly ++ * handle it. ++ */ ++ if (!pte_present(entry)) ++ goto out_mutex; ++ ++ /* + * If we are going to COW the mapping later, we examine the pending + * reservations for this page now. This will ensure that any + * allocations necessary to record that reservation occur outside the +@@ -3190,30 +3201,31 @@ int hugetlb_fault(struct mm_struct *mm, + vma, address); + } + ++ ptl = huge_pte_lock(h, mm, ptep); ++ ++ /* Check for a racing update before calling hugetlb_cow */ ++ if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) ++ goto out_ptl; ++ + /* + * hugetlb_cow() requires page locks of pte_page(entry) and + * pagecache_page, so here we need take the former one + * when page != pagecache_page or !pagecache_page. +- * Note that locking order is always pagecache_page -> page, +- * so no worry about deadlock. + */ + page = pte_page(entry); +- get_page(page); + if (page != pagecache_page) +- lock_page(page); +- +- ptl = huge_pte_lockptr(h, mm, ptep); +- spin_lock(ptl); +- /* Check for a racing update before calling hugetlb_cow */ +- if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) +- goto out_ptl; ++ if (!trylock_page(page)) { ++ need_wait_lock = 1; ++ goto out_ptl; ++ } + ++ get_page(page); + + if (flags & FAULT_FLAG_WRITE) { + if (!huge_pte_write(entry)) { + ret = hugetlb_cow(mm, vma, address, ptep, entry, + pagecache_page, ptl); +- goto out_ptl; ++ goto out_put_page; + } + entry = huge_pte_mkdirty(entry); + } +@@ -3221,7 +3233,10 @@ int hugetlb_fault(struct mm_struct *mm, + if (huge_ptep_set_access_flags(vma, address, ptep, entry, + flags & FAULT_FLAG_WRITE)) + update_mmu_cache(vma, address, ptep); +- ++out_put_page: ++ if (page != pagecache_page) ++ unlock_page(page); ++ put_page(page); + out_ptl: + spin_unlock(ptl); + +@@ -3229,12 +3244,17 @@ out_ptl: + unlock_page(pagecache_page); + put_page(pagecache_page); + } +- if (page != pagecache_page) +- unlock_page(page); +- put_page(page); +- + out_mutex: + mutex_unlock(&htlb_fault_mutex_table[hash]); ++ /* ++ * Generally it's safe to hold refcount during waiting page lock. But ++ * here we just wait to defer the next page fault to avoid busy loop and ++ * the page is not used after unlocked before returning from the current ++ * page fault. So we are safe from accessing freed page, even if we wait ++ * here without taking refcount. 
++ */
++ if (need_wait_lock)
++ wait_on_page_locked(page);
+ return ret;
+ }
+
diff --git a/queue-3.19/mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch b/queue-3.19/mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch
new file mode 100644
index 00000000000..5e2f2cc0afd
--- /dev/null
+++ b/queue-3.19/mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch
@@ -0,0 +1,54 @@
+From 3cd7645de624939c38f5124b4ac15f8b35a1a8b7 Mon Sep 17 00:00:00 2001
+From: Andrey Ryabinin
+Date: Tue, 10 Feb 2015 14:11:33 -0800
+Subject: mm, hugetlb: remove unnecessary lower bound on sysctl handlers
+
+From: Andrey Ryabinin
+
+commit 3cd7645de624939c38f5124b4ac15f8b35a1a8b7 upstream.
+
+Commit ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and
+hugetlb_infinity") replaced 'unsigned long hugetlb_zero' with 'int zero'
+leading to out-of-bounds access in proc_doulongvec_minmax(). Use
+'.extra1 = NULL' instead of '.extra1 = &zero'. Passing NULL is
+equivalent to passing minimal value, which is 0 for unsigned types.
+
+Fixes: ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and hugetlb_infinity")
+Signed-off-by: Andrey Ryabinin
+Reported-by: Dmitry Vyukov
+Suggested-by: Manfred Spraul
+Acked-by: David Rientjes
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ kernel/sysctl.c | 3 ---
+ 1 file changed, 3 deletions(-)
+
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -1248,7 +1248,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = hugetlb_sysctl_handler,
+- .extra1 = &zero,
+ },
+ #ifdef CONFIG_NUMA
+ {
+@@ -1257,7 +1256,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = &hugetlb_mempolicy_sysctl_handler,
+- .extra1 = &zero,
+ },
+ #endif
+ {
+@@ -1280,7 +1278,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = hugetlb_overcommit_handler,
+- .extra1 = &zero,
+ },
+ #endif
+ {
diff --git a/queue-3.19/mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch b/queue-3.19/mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch
new file mode 100644
index 00000000000..4c4899e1648
--- /dev/null
+++ b/queue-3.19/mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch
@@ -0,0 +1,47 @@
+From 9ab3b598d2dfbdb0153ffa7e4b1456bbff59a25d Mon Sep 17 00:00:00 2001
+From: Naoya Horiguchi
+Date: Thu, 12 Feb 2015 15:00:25 -0800
+Subject: mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()
+
+From: Naoya Horiguchi
+
+commit 9ab3b598d2dfbdb0153ffa7e4b1456bbff59a25d upstream.
+
+A race condition starts to be visible in recent mmotm, where a PG_hwpoison
+flag is set on a migration source page *before* it's back in buddy page
+pool.
+
+This is problematic because no page flag is supposed to be set when
+freeing (see __free_one_page().) So the user-visible effect of this race
+is that it could trigger the BUG_ON() when soft-offlining is called.
+
+The root cause is that we call lru_add_drain_all() to make sure that the
+page is in buddy, but that doesn't work because this function just
+schedules a work item and doesn't wait its completion.
+drain_all_pages() does draining directly, so simply dropping
+lru_add_drain_all() solves this problem.
+
+Fixes: f15bdfa802bf ("mm/memory-failure.c: fix memory leak in successful soft offlining")
+Signed-off-by: Naoya Horiguchi
+Cc: Andi Kleen
+Cc: Tony Luck
+Cc: Chen Gong
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/memory-failure.c | 2 --
+ 1 file changed, 2 deletions(-)
+
+--- a/mm/memory-failure.c
++++ b/mm/memory-failure.c
+@@ -1654,8 +1654,6 @@ static int __soft_offline_page(struct pa
+ * setting PG_hwpoison.
+ */
+ if (!is_free_buddy_page(page))
+- lru_add_drain_all();
+- if (!is_free_buddy_page(page))
+ drain_all_pages(page_zone(page));
+ SetPageHWPoison(page);
+ if (!is_free_buddy_page(page))
diff --git a/queue-3.19/mm-memory.c-actually-remap-enough-memory.patch b/queue-3.19/mm-memory.c-actually-remap-enough-memory.patch
new file mode 100644
index 00000000000..137755ffc45
--- /dev/null
+++ b/queue-3.19/mm-memory.c-actually-remap-enough-memory.patch
@@ -0,0 +1,36 @@
+From 9cb12d7b4ccaa976f97ce0c5fd0f1b6a83bc2a75 Mon Sep 17 00:00:00 2001
+From: Grazvydas Ignotas
+Date: Thu, 12 Feb 2015 15:00:19 -0800
+Subject: mm/memory.c: actually remap enough memory
+
+From: Grazvydas Ignotas
+
+commit 9cb12d7b4ccaa976f97ce0c5fd0f1b6a83bc2a75 upstream.
+
+For whatever reason, generic_access_phys() only remaps one page, but
+actually allows access of arbitrary size. It's quite easy to trigger
+large reads, like printing out a large structure with gdb, which leads to a
+crash. Fix it by remapping the correct size.
+
+Fixes: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
+Signed-off-by: Grazvydas Ignotas
+Cc: Rik van Riel
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/memory.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -3561,7 +3561,7 @@ int generic_access_phys(struct vm_area_s
+ if (follow_phys(vma, addr, write, &prot, &phys_addr))
+ return -EINVAL;
+
+- maddr = ioremap_prot(phys_addr, PAGE_SIZE, prot);
++ maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
+ if (write)
+ memcpy_toio(maddr + offset, buf, len);
+ else
diff --git a/queue-3.19/mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch b/queue-3.19/mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
new file mode 100644
index 00000000000..bc5312466bf
--- /dev/null
+++ b/queue-3.19/mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
@@ -0,0 +1,63 @@
+From 5703b087dc8eaf47bfb399d6cf512d471beff405 Mon Sep 17 00:00:00 2001
+From: Roman Gushchin
+Date: Wed, 11 Feb 2015 15:28:39 -0800
+Subject: mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
+
+From: Roman Gushchin
+
+commit 5703b087dc8eaf47bfb399d6cf512d471beff405 upstream.
+
+I noticed that "allowed" can easily overflow by falling below 0,
+because (total_vm / 32) can be larger than "allowed". The problem
+occurs in OVERCOMMIT_NONE mode.
+
+In this case, a huge allocation can succeed and overcommit the system
+(despite OVERCOMMIT_NONE mode). All subsequent allocations will fail
+(system-wide), so the system becomes unusable.
+
+The problem was masked out by commit c9b1d0981fcc
+("mm: limit growth of 3% hardcoded other user reserve"),
+but it's easy to reproduce it on older kernels:
+1) set overcommit_memory sysctl to 2
+2) mmap() large file multiple times (with VM_SHARED flag)
+3) try to malloc() large amount of memory
+
+It can also be reproduced on newer kernels, but a misconfigured
+sysctl_user_reserve_kbytes is required.
+ +Fix this issue by switching to signed arithmetic here. + +[akpm@linux-foundation.org: use min_t] +Signed-off-by: Roman Gushchin +Cc: Andrew Shewmaker +Cc: Rik van Riel +Cc: Konstantin Khlebnikov +Reviewed-by: Michal Hocko +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/mmap.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/mm/mmap.c ++++ b/mm/mmap.c +@@ -152,7 +152,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed); + */ + int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) + { +- unsigned long free, allowed, reserve; ++ long free, allowed, reserve; + + VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) < + -(s64)vm_committed_as_batch * num_online_cpus(), +@@ -220,7 +220,7 @@ int __vm_enough_memory(struct mm_struct + */ + if (mm) { + reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10); +- allowed -= min(mm->total_vm / 32, reserve); ++ allowed -= min_t(long, mm->total_vm / 32, reserve); + } + + if (percpu_counter_read_positive(&vm_committed_as) < allowed) diff --git a/queue-3.19/mm-nommu-fix-memory-leak.patch b/queue-3.19/mm-nommu-fix-memory-leak.patch new file mode 100644 index 00000000000..61d3b115241 --- /dev/null +++ b/queue-3.19/mm-nommu-fix-memory-leak.patch @@ -0,0 +1,126 @@ +From da616534ed7f6e8ffaab699258b55c8d78d0b4ea Mon Sep 17 00:00:00 2001 +From: Joonsoo Kim +Date: Fri, 27 Feb 2015 15:51:43 -0800 +Subject: mm/nommu: fix memory leak + +From: Joonsoo Kim + +commit da616534ed7f6e8ffaab699258b55c8d78d0b4ea upstream. + +Maxime reported the following memory leak regression due to commit +dbc8358c7237 ("mm/nommu: use alloc_pages_exact() rather than its own +implementation"). + +On v3.19, I am facing a memory leak. Each time I run a command one page +is lost. 
Here an example with busybox's free command: + + / # free + total used free shared buffers cached + Mem: 7928 1972 5956 0 0 492 + -/+ buffers/cache: 1480 6448 + / # free + total used free shared buffers cached + Mem: 7928 1976 5952 0 0 492 + -/+ buffers/cache: 1484 6444 + / # free + total used free shared buffers cached + Mem: 7928 1980 5948 0 0 492 + -/+ buffers/cache: 1488 6440 + / # free + total used free shared buffers cached + Mem: 7928 1984 5944 0 0 492 + -/+ buffers/cache: 1492 6436 + / # free + total used free shared buffers cached + Mem: 7928 1988 5940 0 0 492 + -/+ buffers/cache: 1496 6432 + +At some point, the system fails to sastisfy 256KB allocations: + + free: page allocation failure: order:6, mode:0xd0 + CPU: 0 PID: 67 Comm: free Not tainted 3.19.0-05389-gacf2cf1-dirty #64 + Hardware name: STM32 (Device Tree Support) + show_stack+0xb/0xc + warn_alloc_failed+0x97/0xbc + __alloc_pages_nodemask+0x295/0x35c + __get_free_pages+0xb/0x24 + alloc_pages_exact+0x19/0x24 + do_mmap_pgoff+0x423/0x658 + vm_mmap_pgoff+0x3f/0x4e + load_flat_file+0x20d/0x4f8 + load_flat_binary+0x3f/0x26c + search_binary_handler+0x51/0xe4 + do_execveat_common+0x271/0x35c + do_execve+0x19/0x1c + ret_fast_syscall+0x1/0x4a + Mem-info: + Normal per-cpu: + CPU 0: hi: 0, btch: 1 usd: 0 + active_anon:0 inactive_anon:0 isolated_anon:0 + active_file:0 inactive_file:0 isolated_file:0 + unevictable:123 dirty:0 writeback:0 unstable:0 + free:1515 slab_reclaimable:17 slab_unreclaimable:139 + mapped:0 shmem:0 pagetables:0 bounce:0 + free_cma:0 + Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks + lowmem_reserve[]: 0 0 + Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB + 123 total pagecache pages + 2048 pages of RAM + 1538 free pages + 66 reserved pages + 109 slab pages + -46 pages shared + 0 pages swap cached + nommu: Allocation of length 221184 from process 67 (free) failed + Normal per-cpu: + CPU 0: hi: 0, btch: 1 usd: 0 + active_anon:0 inactive_anon:0 isolated_anon:0 + active_file:0 inactive_file:0 isolated_file:0 + unevictable:123 dirty:0 writeback:0 unstable:0 + free:1515 slab_reclaimable:17 slab_unreclaimable:139 + mapped:0 shmem:0 pagetables:0 bounce:0 + free_cma:0 + Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks + lowmem_reserve[]: 0 0 + Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB + 123 total pagecache pages + Unable to allocate RAM for process text/data, errno 12 SEGV + +This problem happens because we allocate ordered page through +__get_free_pages() in do_mmap_private() in some cases and we try to free +individual pages rather than ordered page in free_page_series(). In +this case, freeing pages whose refcount is not 0 won't be freed to the +page allocator so memory leak happens. + +To fix the problem, this patch changes __get_free_pages() to +alloc_pages_exact() since alloc_pages_exact() returns +physically-contiguous pages but each pages are refcounted. + +Fixes: dbc8358c7237 ("mm/nommu: use alloc_pages_exact() rather than its own implementation"). 
+Reported-by: Maxime Coquelin
+Tested-by: Maxime Coquelin
+Signed-off-by: Joonsoo Kim
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/nommu.c | 4 +---
+ 1 file changed, 1 insertion(+), 3 deletions(-)
+
+--- a/mm/nommu.c
++++ b/mm/nommu.c
+@@ -1189,11 +1189,9 @@ static int do_mmap_private(struct vm_are
+ if (sysctl_nr_trim_pages && total - point >= sysctl_nr_trim_pages) {
+ total = point;
+ kdebug("try to alloc exact %lu pages", total);
+- base = alloc_pages_exact(len, GFP_KERNEL);
+- } else {
+- base = (void *)__get_free_pages(GFP_KERNEL, order);
+ }
+
++ base = alloc_pages_exact(total << PAGE_SHIFT, GFP_KERNEL);
+ if (!base)
+ goto enomem;
+
diff --git a/queue-3.19/mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch b/queue-3.19/mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
new file mode 100644
index 00000000000..05882445d09
--- /dev/null
+++ b/queue-3.19/mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
@@ -0,0 +1,61 @@
+From 8138a67a5557ffea3a21dfd6f037842d4e748513 Mon Sep 17 00:00:00 2001
+From: Roman Gushchin
+Date: Wed, 11 Feb 2015 15:28:42 -0800
+Subject: mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
+
+From: Roman Gushchin
+
+commit 8138a67a5557ffea3a21dfd6f037842d4e748513 upstream.
+
+I noticed that "allowed" can easily overflow by falling below 0, because
+(total_vm / 32) can be larger than "allowed". The problem occurs in
+OVERCOMMIT_NONE mode.
+
+In this case, a huge allocation can succeed and overcommit the system
+(despite OVERCOMMIT_NONE mode). All subsequent allocations will fail
+(system-wide), so the system becomes unusable.
+
+The problem was masked out by commit c9b1d0981fcc
+("mm: limit growth of 3% hardcoded other user reserve"),
+but it's easy to reproduce it on older kernels:
+1) set overcommit_memory sysctl to 2
+2) mmap() large file multiple times (with VM_SHARED flag)
+3) try to malloc() large amount of memory
+
+It can also be reproduced on newer kernels, but a misconfigured
+sysctl_user_reserve_kbytes is required.
+
+Fix this issue by switching to signed arithmetic here.
+ +Signed-off-by: Roman Gushchin +Cc: Andrew Shewmaker +Cc: Rik van Riel +Cc: Konstantin Khlebnikov +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/nommu.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/mm/nommu.c ++++ b/mm/nommu.c +@@ -1895,7 +1895,7 @@ EXPORT_SYMBOL(unmap_mapping_range); + */ + int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) + { +- unsigned long free, allowed, reserve; ++ long free, allowed, reserve; + + vm_acct_memory(pages); + +@@ -1959,7 +1959,7 @@ int __vm_enough_memory(struct mm_struct + */ + if (mm) { + reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10); +- allowed -= min(mm->total_vm / 32, reserve); ++ allowed -= min_t(long, mm->total_vm / 32, reserve); + } + + if (percpu_counter_read_positive(&vm_committed_as) < allowed) diff --git a/queue-3.19/mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch b/queue-3.19/mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch new file mode 100644 index 00000000000..17d51d2e4cf --- /dev/null +++ b/queue-3.19/mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch @@ -0,0 +1,58 @@ +From cc87317726f851531ae8422e0c2d3d6e2d7b1955 Mon Sep 17 00:00:00 2001 +From: Johannes Weiner +Date: Fri, 27 Feb 2015 15:52:09 -0800 +Subject: mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change + +From: Johannes Weiner + +commit cc87317726f851531ae8422e0c2d3d6e2d7b1955 upstream. + +Historically, !__GFP_FS allocations were not allowed to invoke the OOM +killer once reclaim had failed, but nevertheless kept looping in the +allocator. + +Commit 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into +allocation slowpath"), which should have been a simple cleanup patch, +accidentally changed the behavior to aborting the allocation at that +point. This creates problems with filesystem callers (?) that currently +rely on the allocator waiting for other tasks to intervene. + +Revert the behavior as it shouldn't have been changed as part of a +cleanup patch. + +Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") +Signed-off-by: Johannes Weiner +Acked-by: Michal Hocko +Reported-by: Tetsuo Handa +Cc: Theodore Ts'o +Cc: Dave Chinner +Acked-by: David Rientjes +Cc: Oleg Nesterov +Cc: Mel Gorman +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/page_alloc.c | 9 ++++++++- + 1 file changed, 8 insertions(+), 1 deletion(-) + +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -2380,8 +2380,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, un + if (high_zoneidx < ZONE_NORMAL) + goto out; + /* The OOM killer does not compensate for light reclaim */ +- if (!(gfp_mask & __GFP_FS)) ++ if (!(gfp_mask & __GFP_FS)) { ++ /* ++ * XXX: Page reclaim didn't yield anything, ++ * and the OOM killer can't be invoked, but ++ * keep looping as per should_alloc_retry(). ++ */ ++ *did_some_progress = 1; + goto out; ++ } + /* + * GFP_THISNODE contains __GFP_NORETRY and we never hit this. + * Sanity check for bare calls of __GFP_THISNODE, not real OOM. 
diff --git a/queue-3.19/mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch b/queue-3.19/mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch new file mode 100644 index 00000000000..731bf49eb2c --- /dev/null +++ b/queue-3.19/mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch @@ -0,0 +1,362 @@ +From 99592d598eca62bdbbf62b59941c189176dfc614 Mon Sep 17 00:00:00 2001 +From: Vlastimil Babka +Date: Wed, 11 Feb 2015 15:28:15 -0800 +Subject: mm: when stealing freepages, also take pages created by splitting buddy page + +From: Vlastimil Babka + +commit 99592d598eca62bdbbf62b59941c189176dfc614 upstream. + +When studying page stealing, I noticed some weird looking decisions in +try_to_steal_freepages(). The first I assume is a bug (Patch 1), the +following two patches were driven by evaluation. + +Testing was done with stress-highalloc of mmtests, using the +mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how +often page stealing occurs for individual migratetypes, and what +migratetypes are used for fallbacks. Arguably, the worst case of page +stealing is when UNMOVABLE allocation steals from MOVABLE pageblock. +RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal, +so the goal is to minimize these two cases. + +The evaluation of v2 wasn't always clear win and Joonsoo questioned the +results. Here I used different baseline which includes RFC compaction +improvements from [1]. I found that the compaction improvements reduce +variability of stress-highalloc, so there's less noise in the data. + +First, let's look at stress-highalloc configured to do sync compaction, +and how these patches reduce page stealing events during the test. First +column is after fresh reboot, other two are reiterations of test without +reboot. That was all accumulater over 5 re-iterations (so the benchmark +was run 5x3 times with 5 fresh restarts). 
+ +Baseline: + + 3.19-rc4 3.19-rc4 3.19-rc4 + 5-nothp-1 5-nothp-2 5-nothp-3 +Page alloc extfrag event 10264225 8702233 10244125 +Extfrag fragmenting 10263271 8701552 10243473 +Extfrag fragmenting for unmovable 13595 17616 15960 +Extfrag fragmenting unmovable placed with movable 7989 12193 8447 +Extfrag fragmenting for reclaimable 658 1840 1817 +Extfrag fragmenting reclaimable placed with movable 558 1677 1679 +Extfrag fragmenting for movable 10249018 8682096 10225696 + +With Patch 1: + 3.19-rc4 3.19-rc4 3.19-rc4 + 6-nothp-1 6-nothp-2 6-nothp-3 +Page alloc extfrag event 11834954 9877523 9774860 +Extfrag fragmenting 11833993 9876880 9774245 +Extfrag fragmenting for unmovable 7342 16129 11712 +Extfrag fragmenting unmovable placed with movable 4191 10547 6270 +Extfrag fragmenting for reclaimable 373 1130 923 +Extfrag fragmenting reclaimable placed with movable 302 906 738 +Extfrag fragmenting for movable 11826278 9859621 9761610 + +With Patch 2: + 3.19-rc4 3.19-rc4 3.19-rc4 + 7-nothp-1 7-nothp-2 7-nothp-3 +Page alloc extfrag event 4725990 3668793 3807436 +Extfrag fragmenting 4725104 3668252 3806898 +Extfrag fragmenting for unmovable 6678 7974 7281 +Extfrag fragmenting unmovable placed with movable 2051 3829 4017 +Extfrag fragmenting for reclaimable 429 1208 1278 +Extfrag fragmenting reclaimable placed with movable 369 976 1034 +Extfrag fragmenting for movable 4717997 3659070 3798339 + +With Patch 3: + 3.19-rc4 3.19-rc4 3.19-rc4 + 8-nothp-1 8-nothp-2 8-nothp-3 +Page alloc extfrag event 5016183 4700142 3850633 +Extfrag fragmenting 5015325 4699613 3850072 +Extfrag fragmenting for unmovable 1312 3154 3088 +Extfrag fragmenting unmovable placed with movable 1115 2777 2714 +Extfrag fragmenting for reclaimable 437 1193 1097 +Extfrag fragmenting reclaimable placed with movable 330 969 879 +Extfrag fragmenting for movable 5013576 4695266 3845887 + +In v2 we've seen apparent regression with Patch 1 for unmovable events, +this is now gone, suggesting it was indeed noise. Here, each patch +improves the situation for unmovable events. Reclaimable is improved by +patch 1 and then either the same modulo noise, or perhaps sligtly worse - +a small price for unmovable improvements, IMHO. The number of movable +allocations falling back to other migratetypes is most noisy, but it's +reduced to half at Patch 2 nevertheless. These are least critical as +compaction can move them around. + +If we look at success rates, the patches don't affect them, that didn't change. 
+ +Baseline: + 3.19-rc4 3.19-rc4 3.19-rc4 + 5-nothp-1 5-nothp-2 5-nothp-3 +Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%) +Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%) +Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%) +Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%) +Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%) +Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%) +Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%) +Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%) +Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) + +Patch 1: + 3.19-rc4 3.19-rc4 3.19-rc4 + 6-nothp-1 6-nothp-2 6-nothp-3 +Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%) +Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%) +Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%) +Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%) +Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%) +Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%) +Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) +Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%) +Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%) + +Patch 2: + + 3.19-rc4 3.19-rc4 3.19-rc4 + 7-nothp-1 7-nothp-2 7-nothp-3 +Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%) +Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%) +Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%) +Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%) +Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%) +Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%) +Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) +Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%) +Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) + +Patch 3: + 3.19-rc4 3.19-rc4 3.19-rc4 + 8-nothp-1 8-nothp-2 8-nothp-3 +Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%) +Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%) +Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%) +Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%) +Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%) +Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%) +Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%) +Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%) +Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%) + +While there's no improvement here, I consider reduced fragmentation events +to be worth on its own. 
Patch 2 also seems to reduce scanning for free +pages, and migrations in compaction, suggesting it has somewhat less work +to do: + +Patch 1: + +Compaction stalls 4153 3959 3978 +Compaction success 1523 1441 1446 +Compaction failures 2630 2517 2531 +Page migrate success 4600827 4943120 5104348 +Page migrate failure 19763 16656 17806 +Compaction pages isolated 9597640 10305617 10653541 +Compaction migrate scanned 77828948 86533283 87137064 +Compaction free scanned 517758295 521312840 521462251 +Compaction cost 5503 5932 6110 + +Patch 2: + +Compaction stalls 3800 3450 3518 +Compaction success 1421 1316 1317 +Compaction failures 2379 2134 2201 +Page migrate success 4160421 4502708 4752148 +Page migrate failure 19705 14340 14911 +Compaction pages isolated 8731983 9382374 9910043 +Compaction migrate scanned 98362797 96349194 98609686 +Compaction free scanned 496512560 469502017 480442545 +Compaction cost 5173 5526 5811 + +As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers +of unmovable and reclaimable pageblocks. + +Configuring the benchmark to allocate like THP page fault (i.e. no sync +compaction) gives much noisier results for iterations 2 and 3 after +reboot. This is not so surprising given how [1] offers lower improvements +in this scenario due to less restarts after deferred compaction which +would change compaction pivot. + +Baseline: + 3.19-rc4 3.19-rc4 3.19-rc4 + 5-thp-1 5-thp-2 5-thp-3 +Page alloc extfrag event 8148965 6227815 6646741 +Extfrag fragmenting 8147872 6227130 6646117 +Extfrag fragmenting for unmovable 10324 12942 15975 +Extfrag fragmenting unmovable placed with movable 5972 8495 10907 +Extfrag fragmenting for reclaimable 601 1707 2210 +Extfrag fragmenting reclaimable placed with movable 520 1570 2000 +Extfrag fragmenting for movable 8136947 6212481 6627932 + +Patch 1: + 3.19-rc4 3.19-rc4 3.19-rc4 + 6-thp-1 6-thp-2 6-thp-3 +Page alloc extfrag event 8345457 7574471 7020419 +Extfrag fragmenting 8343546 7573777 7019718 +Extfrag fragmenting for unmovable 10256 18535 30716 +Extfrag fragmenting unmovable placed with movable 6893 11726 22181 +Extfrag fragmenting for reclaimable 465 1208 1023 +Extfrag fragmenting reclaimable placed with movable 353 996 843 +Extfrag fragmenting for movable 8332825 7554034 6987979 + +Patch 2: + 3.19-rc4 3.19-rc4 3.19-rc4 + 7-thp-1 7-thp-2 7-thp-3 +Page alloc extfrag event 3512847 3020756 2891625 +Extfrag fragmenting 3511940 3020185 2891059 +Extfrag fragmenting for unmovable 9017 6892 6191 +Extfrag fragmenting unmovable placed with movable 1524 3053 2435 +Extfrag fragmenting for reclaimable 445 1081 1160 +Extfrag fragmenting reclaimable placed with movable 375 918 986 +Extfrag fragmenting for movable 3502478 3012212 2883708 + +Patch 3: + 3.19-rc4 3.19-rc4 3.19-rc4 + 8-thp-1 8-thp-2 8-thp-3 +Page alloc extfrag event 3181699 3082881 2674164 +Extfrag fragmenting 3180812 3082303 2673611 +Extfrag fragmenting for unmovable 1201 4031 4040 +Extfrag fragmenting unmovable placed with movable 974 3611 3645 +Extfrag fragmenting for reclaimable 478 1165 1294 +Extfrag fragmenting reclaimable placed with movable 387 985 1030 +Extfrag fragmenting for movable 3179133 3077107 2668277 + +The improvements for first iteration are clear, the rest is much noisier +and can appear like regression for Patch 1. Anyway, patch 2 rectifies it. + +Allocation success rates are again unaffected so there's no point in +making this e-mail any longer. 
+ +[1] http://marc.info/?l=linux-mm&m=142166196321125&w=2 + +This patch (of 3): + +When __rmqueue_fallback() is called to allocate a page of order X, it will +find a page of order Y >= X of a fallback migratetype, which is different +from the desired migratetype. With the help of try_to_steal_freepages(), +it may change the migratetype (to the desired one) also of: + +1) all currently free pages in the pageblock containing the fallback page +2) the fallback pageblock itself +3) buddy pages created by splitting the fallback page (when Y > X) + +These decisions take the order Y into account, as well as the desired +migratetype, with the goal of preventing multiple fallback allocations +that could e.g. distribute UNMOVABLE allocations among multiple +pageblocks. + +Originally, decision for 1) has implied the decision for 3). Commit +47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that +(probably unintentionally) so that the buddy pages in case 3) are always +changed to the desired migratetype, except for CMA pageblocks. + +Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code +and fix a bug") did some refactoring and added a comment that the case of +3) is intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should +respect pageblock type") removed the comment and tried to restore the +original behavior where 1) implies 3), but due to the previous +refactoring, the result is instead that only 2) implies 3) - and the +conditions for 2) are less frequently met than conditions for 1). This +may increase fragmentation in situations where the code decides to steal +all free pages from the pageblock (case 1)), but then gives back the buddy +pages produced by splitting. + +This patch restores the original intended logic where 1) implies 3). +During testing with stress-highalloc from mmtests, this has shown to +decrease the number of events where UNMOVABLE and RECLAIMABLE allocations +steal from MOVABLE pageblocks, which can lead to permanent fragmentation. +In some cases it has increased the number of events when MOVABLE +allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are +fixable by sync compaction and thus less harmful. + +Note that evaluation has shown that the behavior introduced by +47118af076f6 for buddy pages in case 3) is actually even better than the +original logic, so the following patch will introduce it properly once +again. For stable backports of this patch it makes thus sense to only fix +versions containing 0cbef29a7821. + +[iamjoonsoo.kim@lge.com: tracepoint fix] +Signed-off-by: Vlastimil Babka +Acked-by: Mel Gorman +Cc: Zhang Yanfei +Acked-by: Minchan Kim +Cc: David Rientjes +Cc: Rik van Riel +Cc: "Aneesh Kumar K.V" +Cc: "Kirill A. 
Shutemov" +Cc: Johannes Weiner +Cc: Joonsoo Kim +Cc: Michal Hocko +Cc: KOSAKI Motohiro +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + include/trace/events/kmem.h | 7 ++++--- + mm/page_alloc.c | 12 +++++------- + 2 files changed, 9 insertions(+), 10 deletions(-) + +--- a/include/trace/events/kmem.h ++++ b/include/trace/events/kmem.h +@@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag, + + TP_PROTO(struct page *page, + int alloc_order, int fallback_order, +- int alloc_migratetype, int fallback_migratetype, int new_migratetype), ++ int alloc_migratetype, int fallback_migratetype), + + TP_ARGS(page, + alloc_order, fallback_order, +- alloc_migratetype, fallback_migratetype, new_migratetype), ++ alloc_migratetype, fallback_migratetype), + + TP_STRUCT__entry( + __field( struct page *, page ) +@@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag, + __entry->fallback_order = fallback_order; + __entry->alloc_migratetype = alloc_migratetype; + __entry->fallback_migratetype = fallback_migratetype; +- __entry->change_ownership = (new_migratetype == alloc_migratetype); ++ __entry->change_ownership = (alloc_migratetype == ++ get_pageblock_migratetype(page)); + ), + + TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d", +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -1138,8 +1138,8 @@ static void change_pageblock_range(struc + * nor move CMA pages to different free lists. We don't want unmovable pages + * to be allocated from MIGRATE_CMA areas. + * +- * Returns the new migratetype of the pageblock (or the same old migratetype +- * if it was unchanged). ++ * Returns the allocation migratetype if free pages were stolen, or the ++ * fallback migratetype if it was decided not to steal. 
+ */ + static int try_to_steal_freepages(struct zone *zone, struct page *page, + int start_type, int fallback_type) +@@ -1170,12 +1170,10 @@ static int try_to_steal_freepages(struct + + /* Claim the whole block if over half of it is free */ + if (pages >= (1 << (pageblock_order-1)) || +- page_group_by_mobility_disabled) { +- ++ page_group_by_mobility_disabled) + set_pageblock_migratetype(page, start_type); +- return start_type; +- } + ++ return start_type; + } + + return fallback_type; +@@ -1227,7 +1225,7 @@ __rmqueue_fallback(struct zone *zone, un + set_freepage_migratetype(page, new_type); + + trace_mm_page_alloc_extfrag(page, order, current_order, +- start_migratetype, migratetype, new_type); ++ start_migratetype, migratetype); + + return page; + } diff --git a/queue-3.19/series b/queue-3.19/series index faa08ae67cd..dc3f749698c 100644 --- a/queue-3.19/series +++ b/queue-3.19/series @@ -28,3 +28,16 @@ usb-plusb-add-support-for-national-instruments-host-to-host-cable.patch udp-only-allow-ufo-for-packets-from-sock_dgram-sockets.patch net-ping-return-eafnosupport-when-appropriate.patch team-don-t-traverse-port-list-using-rcu-in-team_set_mac_address.patch +mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch +mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch +mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch +mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch +mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch +mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch +mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch +mm-compaction-fix-wrong-order-check-in-compact_finished.patch +mm-memory.c-actually-remap-enough-memory.patch +mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch +mm-fix-negative-nr_isolated-counts.patch +mm-nommu-fix-memory-leak.patch +mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch