--- /dev/null
+From 372549c2a3778fd3df445819811c944ad54609ca Mon Sep 17 00:00:00 2001
+From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Date: Thu, 12 Feb 2015 14:59:50 -0800
+Subject: mm/compaction: fix wrong order check in compact_finished()
+
+From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+
+commit 372549c2a3778fd3df445819811c944ad54609ca upstream.
+
+What we want to check here is whether there is a high-order freepage in
+the buddy list of another migratetype, so that we can steal it without
+causing fragmentation. But the current code just checks cc->order, which
+is the allocation request order, so the check is wrong.
+
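+For illustration, the check sits inside a loop over the zone's free areas;
+a simplified sketch of compact_finished() (not the exact upstream code),
+where the loop index 'order' is the order of the free area currently being
+examined:
+
+    for (order = cc->order; order < MAX_ORDER; order++) {
+        struct free_area *area = &zone->free_area[order];
+
+        /* Job done if page is free of the right migratetype */
+        if (!list_empty(&area->free_list[migratetype]))
+            return COMPACT_PARTIAL;
+
+        /* Job done if allocation would set block type */
+        if (cc->order >= pageblock_order && area->nr_free) /* wrong: request order */
+            return COMPACT_PARTIAL;
+    }
+
+Because cc->order is constant across the loop, the second check can never
+fire for requests below pageblock_order, even when another migratetype's
+buddy list holds a pageblock-order freepage we could steal; comparing the
+loop index 'order' instead fixes that.
+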
+Without this fix, non-movable synchronous compaction below pageblock order
+does not stop until compaction is complete, because the migratetype of
+most pageblocks is movable and the high-order freepages made by compaction
+usually end up on the movable buddy list.
+
+There is a report related to this bug; see the link below.
+
+ http://www.spinics.net/lists/linux-mm/msg81666.html
+
+Although the affected system still sees load spikes coming from
+compaction, this change makes the system completely stable and responsive
+according to the reporter.
+
+The stress-highalloc test in mmtests with non-movable order-7 allocations
+doesn't show any notable difference in allocation success rate, but it
+does show a higher compaction success rate.
+
+Compaction success rate (Compaction success * 100 / Compaction stalls, %)
+18.47 : 28.94
+
+Fixes: 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available")
+Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Acked-by: Vlastimil Babka <vbabka@suse.cz>
+Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
+Cc: Mel Gorman <mgorman@suse.de>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Rik van Riel <riel@redhat.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/compaction.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/compaction.c
++++ b/mm/compaction.c
+@@ -1088,7 +1088,7 @@ static int compact_finished(struct zone
+ return COMPACT_PARTIAL;
+
+ /* Job done if allocation would set block type */
+- if (cc->order >= pageblock_order && area->nr_free)
++ if (order >= pageblock_order && area->nr_free)
+ return COMPACT_PARTIAL;
+ }
+
--- /dev/null
+From ff59909a077b3c51c168cb658601c6b63136a347 Mon Sep 17 00:00:00 2001
+From: Hugh Dickins <hughd@google.com>
+Date: Thu, 12 Feb 2015 15:00:28 -0800
+Subject: mm: fix negative nr_isolated counts
+
+From: Hugh Dickins <hughd@google.com>
+
+commit ff59909a077b3c51c168cb658601c6b63136a347 upstream.
+
+The vmstat interfaces are good at hiding negative counts (at least when
+CONFIG_SMP is enabled); but if you peer behind the curtain, you find that
+nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
+more negative: so they can absorb larger and larger numbers of isolated
+pages, yet still appear to be zero.
+
+I'm happy to avoid a congestion_wait() when too_many_isolated() fires,
+myself; but I guess it's there for a good reason, in which case we ought
+to get too_many_isolated() working again.
+
+The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
+putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
+to call acct_isolated() to increment them.
+
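+For reference, acct_isolated() walks cc->migratepages and adds to the
+NR_ISOLATED counters that putback_movable_pages() later subtracts from;
+roughly (simplified sketch, not the exact upstream code):
+
+    static void acct_isolated(struct zone *zone, struct compact_control *cc)
+    {
+        struct page *page;
+        unsigned int count[2] = { 0, };
+
+        list_for_each_entry(page, &cc->migratepages, lru)
+            count[!!page_is_file_cache(page)]++;
+
+        mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+        mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+    }
+
+Skipping this call on the ISOLATE_ABORT path therefore turns every aborted
+isolation into a net decrement of the counters.
+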
+It is possible that the bug which this patch fixes could cause OOM kills
+when the system still has a lot of reclaimable page cache.
+
+Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
+Signed-off-by: Hugh Dickins <hughd@google.com>
+Acked-by: Vlastimil Babka <vbabka@suse.cz>
+Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/compaction.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/mm/compaction.c
++++ b/mm/compaction.c
+@@ -1015,8 +1015,10 @@ static isolate_migrate_t isolate_migrate
+ low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn,
+ isolate_mode);
+
+- if (!low_pfn || cc->contended)
++ if (!low_pfn || cc->contended) {
++ acct_isolated(zone, cc);
+ return ISOLATE_ABORT;
++ }
+
+ /*
+ * Either we isolated something and proceed with migration. Or
--- /dev/null
+From 9fbc1f635fd0bd28cb32550211bf095753ac637a Mon Sep 17 00:00:00 2001
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Date: Wed, 11 Feb 2015 15:25:32 -0800
+Subject: mm/hugetlb: add migration entry check in __unmap_hugepage_range
+
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+
+commit 9fbc1f635fd0bd28cb32550211bf095753ac637a upstream.
+
+If __unmap_hugepage_range() tries to unmap an address range over which
+hugepage migration is underway, we get the wrong page because pte_page()
+doesn't work for migration entries. This patch simply clears the pte for
+migration entries as we do for hwpoison entries.
+
+Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
+Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: James Hogan <james.hogan@imgtec.com>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Mel Gorman <mel@csn.ul.ie>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Michal Hocko <mhocko@suse.cz>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Luiz Capitulino <lcapitulino@redhat.com>
+Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
+Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
+Cc: Steve Capper <steve.capper@linaro.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/hugetlb.c | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+--- a/mm/hugetlb.c
++++ b/mm/hugetlb.c
+@@ -2657,9 +2657,10 @@ again:
+ goto unlock;
+
+ /*
+- * HWPoisoned hugepage is already unmapped and dropped reference
++ * Migrating hugepage or HWPoisoned hugepage is already
++ * unmapped and its refcount is dropped, so just clear pte here.
+ */
+- if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
++ if (unlikely(!pte_present(pte))) {
+ huge_pte_clear(mm, address, ptep);
+ goto unlock;
+ }
--- /dev/null
+From a8bda28d87c38c6aa93de28ba5d30cc18e865a11 Mon Sep 17 00:00:00 2001
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Date: Wed, 11 Feb 2015 15:25:28 -0800
+Subject: mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection
+
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+
+commit a8bda28d87c38c6aa93de28ba5d30cc18e865a11 upstream.
+
+There is a race condition between hugepage migration and
+change_protection(), where hugetlb_change_protection() doesn't care about
+migration entries and wrongly overwrites them. That causes unexpected
+results like a kernel crash. HWPoisoned entries can cause the same
+problem.
+
+This patch adds is_hugetlb_entry_(migration|hwpoisoned) checks to this
+function so that these entries are handled properly.
+
+Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
+Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: James Hogan <james.hogan@imgtec.com>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Mel Gorman <mel@csn.ul.ie>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Michal Hocko <mhocko@suse.cz>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Luiz Capitulino <lcapitulino@redhat.com>
+Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
+Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
+Cc: Steve Capper <steve.capper@linaro.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/hugetlb.c | 21 ++++++++++++++++++++-
+ 1 file changed, 20 insertions(+), 1 deletion(-)
+
+--- a/mm/hugetlb.c
++++ b/mm/hugetlb.c
+@@ -3384,7 +3384,26 @@ unsigned long hugetlb_change_protection(
+ spin_unlock(ptl);
+ continue;
+ }
+- if (!huge_pte_none(huge_ptep_get(ptep))) {
++ pte = huge_ptep_get(ptep);
++ if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
++ spin_unlock(ptl);
++ continue;
++ }
++ if (unlikely(is_hugetlb_entry_migration(pte))) {
++ swp_entry_t entry = pte_to_swp_entry(pte);
++
++ if (is_write_migration_entry(entry)) {
++ pte_t newpte;
++
++ make_migration_entry_read(&entry);
++ newpte = swp_entry_to_pte(entry);
++ set_huge_pte_at(mm, address, ptep, newpte);
++ pages++;
++ }
++ spin_unlock(ptl);
++ continue;
++ }
++ if (!huge_pte_none(pte)) {
+ pte = huge_ptep_get_and_clear(mm, address, ptep);
+ pte = pte_mkhuge(huge_pte_modify(pte, newprot));
+ pte = arch_make_huge_pte(pte, vma, NULL, 0);
--- /dev/null
+From 0f792cf949a0be506c2aa8bfac0605746b146dda Mon Sep 17 00:00:00 2001
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Date: Wed, 11 Feb 2015 15:25:25 -0800
+Subject: mm/hugetlb: fix getting refcount 0 page in hugetlb_fault()
+
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+
+commit 0f792cf949a0be506c2aa8bfac0605746b146dda upstream.
+
+When running the test which causes the race as shown in the previous patch,
+we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
+
+This race happens when the pte turns into a migration entry just after the
+first is_hugetlb_entry_migration() check in hugetlb_fault() has returned
+false.
+To fix this, we need to check pte_present() again after huge_ptep_get().
+
+This patch also reorders taking the ptl and calling pte_page(), because
+pte_page() should be called under the ptl. Due to this reordering, we need
+to use trylock_page() in the page != pagecache_page case to respect the
+locking order.
+
+Fixes: 66aebce747ea ("hugetlb: fix race condition in hugetlb_fault()")
+Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: James Hogan <james.hogan@imgtec.com>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Mel Gorman <mel@csn.ul.ie>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Michal Hocko <mhocko@suse.cz>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Luiz Capitulino <lcapitulino@redhat.com>
+Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
+Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
+Cc: Steve Capper <steve.capper@linaro.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/hugetlb.c | 52 ++++++++++++++++++++++++++++++++++++----------------
+ 1 file changed, 36 insertions(+), 16 deletions(-)
+
+--- a/mm/hugetlb.c
++++ b/mm/hugetlb.c
+@@ -3134,6 +3134,7 @@ int hugetlb_fault(struct mm_struct *mm,
+ struct page *pagecache_page = NULL;
+ struct hstate *h = hstate_vma(vma);
+ struct address_space *mapping;
++ int need_wait_lock = 0;
+
+ address &= huge_page_mask(h);
+
+@@ -3172,6 +3173,16 @@ int hugetlb_fault(struct mm_struct *mm,
+ ret = 0;
+
+ /*
++ * entry could be a migration/hwpoison entry at this point, so this
++ * check prevents the kernel from going below assuming that we have
++ * a active hugepage in pagecache. This goto expects the 2nd page fault,
++ * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
++ * handle it.
++ */
++ if (!pte_present(entry))
++ goto out_mutex;
++
++ /*
+ * If we are going to COW the mapping later, we examine the pending
+ * reservations for this page now. This will ensure that any
+ * allocations necessary to record that reservation occur outside the
+@@ -3190,30 +3201,31 @@ int hugetlb_fault(struct mm_struct *mm,
+ vma, address);
+ }
+
++ ptl = huge_pte_lock(h, mm, ptep);
++
++ /* Check for a racing update before calling hugetlb_cow */
++ if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
++ goto out_ptl;
++
+ /*
+ * hugetlb_cow() requires page locks of pte_page(entry) and
+ * pagecache_page, so here we need take the former one
+ * when page != pagecache_page or !pagecache_page.
+- * Note that locking order is always pagecache_page -> page,
+- * so no worry about deadlock.
+ */
+ page = pte_page(entry);
+- get_page(page);
+ if (page != pagecache_page)
+- lock_page(page);
+-
+- ptl = huge_pte_lockptr(h, mm, ptep);
+- spin_lock(ptl);
+- /* Check for a racing update before calling hugetlb_cow */
+- if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+- goto out_ptl;
++ if (!trylock_page(page)) {
++ need_wait_lock = 1;
++ goto out_ptl;
++ }
+
++ get_page(page);
+
+ if (flags & FAULT_FLAG_WRITE) {
+ if (!huge_pte_write(entry)) {
+ ret = hugetlb_cow(mm, vma, address, ptep, entry,
+ pagecache_page, ptl);
+- goto out_ptl;
++ goto out_put_page;
+ }
+ entry = huge_pte_mkdirty(entry);
+ }
+@@ -3221,7 +3233,10 @@ int hugetlb_fault(struct mm_struct *mm,
+ if (huge_ptep_set_access_flags(vma, address, ptep, entry,
+ flags & FAULT_FLAG_WRITE))
+ update_mmu_cache(vma, address, ptep);
+-
++out_put_page:
++ if (page != pagecache_page)
++ unlock_page(page);
++ put_page(page);
+ out_ptl:
+ spin_unlock(ptl);
+
+@@ -3229,12 +3244,17 @@ out_ptl:
+ unlock_page(pagecache_page);
+ put_page(pagecache_page);
+ }
+- if (page != pagecache_page)
+- unlock_page(page);
+- put_page(page);
+-
+ out_mutex:
+ mutex_unlock(&htlb_fault_mutex_table[hash]);
++ /*
++ * Generally it's safe to hold refcount during waiting page lock. But
++ * here we just wait to defer the next page fault to avoid busy loop and
++ * the page is not used after unlocked before returning from the current
++ * page fault. So we are safe from accessing freed page, even if we wait
++ * here without taking refcount.
++ */
++ if (need_wait_lock)
++ wait_on_page_locked(page);
+ return ret;
+ }
+
--- /dev/null
+From 3cd7645de624939c38f5124b4ac15f8b35a1a8b7 Mon Sep 17 00:00:00 2001
+From: Andrey Ryabinin <a.ryabinin@samsung.com>
+Date: Tue, 10 Feb 2015 14:11:33 -0800
+Subject: mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
+
+From: Andrey Ryabinin <a.ryabinin@samsung.com>
+
+commit 3cd7645de624939c38f5124b4ac15f8b35a1a8b7 upstream.
+
+Commit ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and
+hugetlb_infinity") replaced 'unsigned long hugetlb_zero' with 'int zero'
+leading to an out-of-bounds access in proc_doulongvec_minmax(). Use
+'.extra1 = NULL' instead of '.extra1 = &zero'. Passing NULL is
+equivalent to passing the minimal value, which is 0 for unsigned types.
+
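+The out-of-bounds access comes from proc_doulongvec_minmax() interpreting
+extra1 as a pointer to unsigned long; a simplified sketch of the mismatch
+(variable names illustrative, not the exact kernel code):
+
+    static int zero;    /* 4 bytes */
+
+    /* the handler effectively does: */
+    unsigned long *min = (unsigned long *)table->extra1;
+    if (min && val < *min)   /* *min reads 8 bytes of a 4-byte object on 64-bit */
+        continue;
+
+With .extra1 left NULL, the lower-bound comparison is skipped entirely,
+which for an unsigned value is the same as a minimum of 0.
+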
+Fixes: ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and hugetlb_infinity")
+Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
+Reported-by: Dmitry Vyukov <dvyukov@google.com>
+Suggested-by: Manfred Spraul <manfred@colorfullife.com>
+Acked-by: David Rientjes <rientjes@google.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ kernel/sysctl.c | 3 ---
+ 1 file changed, 3 deletions(-)
+
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -1248,7 +1248,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = hugetlb_sysctl_handler,
+- .extra1 = &zero,
+ },
+ #ifdef CONFIG_NUMA
+ {
+@@ -1257,7 +1256,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = &hugetlb_mempolicy_sysctl_handler,
+- .extra1 = &zero,
+ },
+ #endif
+ {
+@@ -1280,7 +1278,6 @@ static struct ctl_table vm_table[] = {
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = hugetlb_overcommit_handler,
+- .extra1 = &zero,
+ },
+ #endif
+ {
--- /dev/null
+From 9ab3b598d2dfbdb0153ffa7e4b1456bbff59a25d Mon Sep 17 00:00:00 2001
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Date: Thu, 12 Feb 2015 15:00:25 -0800
+Subject: mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()
+
+From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+
+commit 9ab3b598d2dfbdb0153ffa7e4b1456bbff59a25d upstream.
+
+A race condition starts to be visible in recent mmotm, where a PG_hwpoison
+flag is set on a migration source page *before* it's back in buddy page
+pool.
+
+This is problematic because no page flag is supposed to be set when
+freeing (see __free_one_page().) So the user-visible effect of this race
+is that it could trigger the BUG_ON() when soft-offlining is called.
+
+The root cause is that we call lru_add_drain_all() to make sure that the
+page is in the buddy allocator, but that doesn't work because this
+function just schedules a work item and doesn't wait for its completion.
+drain_all_pages() does the draining directly, so simply dropping
+lru_add_drain_all() solves this problem.
+
+Fixes: f15bdfa802bf ("mm/memory-failure.c: fix memory leak in successful soft offlining")
+Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
+Cc: Andi Kleen <andi@firstfloor.org>
+Cc: Tony Luck <tony.luck@intel.com>
+Cc: Chen Gong <gong.chen@linux.intel.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/memory-failure.c | 2 --
+ 1 file changed, 2 deletions(-)
+
+--- a/mm/memory-failure.c
++++ b/mm/memory-failure.c
+@@ -1654,8 +1654,6 @@ static int __soft_offline_page(struct pa
+ * setting PG_hwpoison.
+ */
+ if (!is_free_buddy_page(page))
+- lru_add_drain_all();
+- if (!is_free_buddy_page(page))
+ drain_all_pages(page_zone(page));
+ SetPageHWPoison(page);
+ if (!is_free_buddy_page(page))
--- /dev/null
+From 9cb12d7b4ccaa976f97ce0c5fd0f1b6a83bc2a75 Mon Sep 17 00:00:00 2001
+From: Grazvydas Ignotas <notasas@gmail.com>
+Date: Thu, 12 Feb 2015 15:00:19 -0800
+Subject: mm/memory.c: actually remap enough memory
+
+From: Grazvydas Ignotas <notasas@gmail.com>
+
+commit 9cb12d7b4ccaa976f97ce0c5fd0f1b6a83bc2a75 upstream.
+
+For whatever reason, generic_access_phys() only remaps one page, but
+actually allows access of arbitrary size. It's quite easy to trigger
+large reads, like printing out a large structure with gdb, which leads to
+a crash. Fix it by remapping the correct size.
+
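+A worked example of the size calculation, with hypothetical numbers
+(PAGE_SIZE = 0x1000, addr 0xf00 bytes into its page, len = 0x300):
+
+    offset = addr & (PAGE_SIZE - 1);      /* 0xf00                       */
+    /* len + offset             = 0x1200 -> the access spans two pages   */
+    /* PAGE_ALIGN(len + offset) = 0x2000 -> remap two pages, not one     */
+
+The old code always remapped PAGE_SIZE bytes, so the copy past the first
+page touched an unmapped part of the ioremap area.
+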
+Fixes: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
+Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
+Cc: Rik van Riel <riel@redhat.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/memory.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -3561,7 +3561,7 @@ int generic_access_phys(struct vm_area_s
+ if (follow_phys(vma, addr, write, &prot, &phys_addr))
+ return -EINVAL;
+
+- maddr = ioremap_prot(phys_addr, PAGE_SIZE, prot);
++ maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
+ if (write)
+ memcpy_toio(maddr + offset, buf, len);
+ else
--- /dev/null
+From 5703b087dc8eaf47bfb399d6cf512d471beff405 Mon Sep 17 00:00:00 2001
+From: Roman Gushchin <klamm@yandex-team.ru>
+Date: Wed, 11 Feb 2015 15:28:39 -0800
+Subject: mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
+
+From: Roman Gushchin <klamm@yandex-team.ru>
+
+commit 5703b087dc8eaf47bfb399d6cf512d471beff405 upstream.
+
+I noticed that "allowed" can easily overflow by falling below 0,
+because (total_vm / 32) can be larger than "allowed". The problem
+occurs in OVERCOMMIT_NONE mode.
+
+In this case, a huge allocation can succeed and overcommit the system
+(despite OVERCOMMIT_NONE mode). All subsequent allocations will fail
+(system-wide), so the system becomes unusable.
+
+The problem was masked out by commit c9b1d0981fcc
+("mm: limit growth of 3% hardcoded other user reserve"),
+but it's easy to reproduce it on older kernels:
+1) set overcommit_memory sysctl to 2
+2) mmap() a large file multiple times (with VM_SHARED flag)
+3) try to malloc() a large amount of memory
+
+It can also be reproduced on newer kernels, but a misconfigured
+sysctl_user_reserve_kbytes is required.
+
+Fix this issue by switching to signed arithmetic here.
+
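+A minimal standalone illustration of the unsigned underflow, with
+hypothetical numbers (not kernel code):
+
+    #include <stdio.h>
+
+    int main(void)
+    {
+        unsigned long allowed = 100;   /* pages left after earlier deductions */
+        unsigned long reserve = 4000;  /* min(total_vm / 32, user reserve)    */
+
+        allowed -= reserve;            /* wraps around to a huge value        */
+        printf("unsigned: %lu\n", allowed);
+
+        long sallowed = 100;
+        sallowed -= (long)reserve;     /* -3900, stays negative               */
+        printf("signed:   %ld\n", sallowed);
+        return 0;
+    }
+
+With the unsigned type, "allowed" wraps to a huge value and the final
+percpu_counter_read_positive(&vm_committed_as) < allowed check passes, so
+the allocation is wrongly admitted; with signed arithmetic the check fails
+and the allocation is rejected as intended.
+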
+[akpm@linux-foundation.org: use min_t]
+Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
+Cc: Andrew Shewmaker <agshew@gmail.com>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
+Reviewed-by: Michal Hocko <mhocko@suse.cz>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/mmap.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/mm/mmap.c
++++ b/mm/mmap.c
+@@ -152,7 +152,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
+ */
+ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
+ {
+- unsigned long free, allowed, reserve;
++ long free, allowed, reserve;
+
+ VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
+ -(s64)vm_committed_as_batch * num_online_cpus(),
+@@ -220,7 +220,7 @@ int __vm_enough_memory(struct mm_struct
+ */
+ if (mm) {
+ reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+- allowed -= min(mm->total_vm / 32, reserve);
++ allowed -= min_t(long, mm->total_vm / 32, reserve);
+ }
+
+ if (percpu_counter_read_positive(&vm_committed_as) < allowed)
--- /dev/null
+From da616534ed7f6e8ffaab699258b55c8d78d0b4ea Mon Sep 17 00:00:00 2001
+From: Joonsoo Kim <js1304@gmail.com>
+Date: Fri, 27 Feb 2015 15:51:43 -0800
+Subject: mm/nommu: fix memory leak
+
+From: Joonsoo Kim <js1304@gmail.com>
+
+commit da616534ed7f6e8ffaab699258b55c8d78d0b4ea upstream.
+
+Maxime reported the following memory leak regression due to commit
+dbc8358c7237 ("mm/nommu: use alloc_pages_exact() rather than its own
+implementation").
+
+On v3.19, I am facing a memory leak. Each time I run a command, one page
+is lost. Here is an example with busybox's free command:
+
+ / # free
+ total used free shared buffers cached
+ Mem: 7928 1972 5956 0 0 492
+ -/+ buffers/cache: 1480 6448
+ / # free
+ total used free shared buffers cached
+ Mem: 7928 1976 5952 0 0 492
+ -/+ buffers/cache: 1484 6444
+ / # free
+ total used free shared buffers cached
+ Mem: 7928 1980 5948 0 0 492
+ -/+ buffers/cache: 1488 6440
+ / # free
+ total used free shared buffers cached
+ Mem: 7928 1984 5944 0 0 492
+ -/+ buffers/cache: 1492 6436
+ / # free
+ total used free shared buffers cached
+ Mem: 7928 1988 5940 0 0 492
+ -/+ buffers/cache: 1496 6432
+
+At some point, the system fails to satisfy 256KB allocations:
+
+ free: page allocation failure: order:6, mode:0xd0
+ CPU: 0 PID: 67 Comm: free Not tainted 3.19.0-05389-gacf2cf1-dirty #64
+ Hardware name: STM32 (Device Tree Support)
+ show_stack+0xb/0xc
+ warn_alloc_failed+0x97/0xbc
+ __alloc_pages_nodemask+0x295/0x35c
+ __get_free_pages+0xb/0x24
+ alloc_pages_exact+0x19/0x24
+ do_mmap_pgoff+0x423/0x658
+ vm_mmap_pgoff+0x3f/0x4e
+ load_flat_file+0x20d/0x4f8
+ load_flat_binary+0x3f/0x26c
+ search_binary_handler+0x51/0xe4
+ do_execveat_common+0x271/0x35c
+ do_execve+0x19/0x1c
+ ret_fast_syscall+0x1/0x4a
+ Mem-info:
+ Normal per-cpu:
+ CPU 0: hi: 0, btch: 1 usd: 0
+ active_anon:0 inactive_anon:0 isolated_anon:0
+ active_file:0 inactive_file:0 isolated_file:0
+ unevictable:123 dirty:0 writeback:0 unstable:0
+ free:1515 slab_reclaimable:17 slab_unreclaimable:139
+ mapped:0 shmem:0 pagetables:0 bounce:0
+ free_cma:0
+ Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks
+ lowmem_reserve[]: 0 0
+ Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB
+ 123 total pagecache pages
+ 2048 pages of RAM
+ 1538 free pages
+ 66 reserved pages
+ 109 slab pages
+ -46 pages shared
+ 0 pages swap cached
+ nommu: Allocation of length 221184 from process 67 (free) failed
+ Normal per-cpu:
+ CPU 0: hi: 0, btch: 1 usd: 0
+ active_anon:0 inactive_anon:0 isolated_anon:0
+ active_file:0 inactive_file:0 isolated_file:0
+ unevictable:123 dirty:0 writeback:0 unstable:0
+ free:1515 slab_reclaimable:17 slab_unreclaimable:139
+ mapped:0 shmem:0 pagetables:0 bounce:0
+ free_cma:0
+ Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks
+ lowmem_reserve[]: 0 0
+ Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB
+ 123 total pagecache pages
+ Unable to allocate RAM for process text/data, errno 12 SEGV
+
+This problem happens because in some cases we allocate an order-N page
+through __get_free_pages() in do_mmap_private(), but free_page_series()
+then tries to free the individual pages rather than the ordered page. In
+this case, pages whose refcount is not 0 won't be freed to the page
+allocator, so a memory leak happens.
+
+To fix the problem, this patch changes __get_free_pages() to
+alloc_pages_exact(), since alloc_pages_exact() also returns
+physically contiguous pages but each page is individually refcounted.
+
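+For reference, alloc_pages_exact() still grabs an order-N block, but then
+splits it so that every page carries its own refcount, which is what
+free_page_series() expects when it frees the pages one by one; roughly
+(simplified from the generic implementation):
+
+    void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+    {
+        unsigned int order = get_order(size);
+        unsigned long addr = __get_free_pages(gfp_mask, order);
+
+        if (addr) {
+            unsigned long alloc_end = addr + (PAGE_SIZE << order);
+            unsigned long used = addr + PAGE_ALIGN(size);
+
+            /* give each page of the block its own refcount */
+            split_page(virt_to_page((void *)addr), order);
+
+            /* return the unused tail pages right away */
+            while (used < alloc_end) {
+                free_page(used);
+                used += PAGE_SIZE;
+            }
+        }
+        return (void *)addr;
+    }
+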
+Fixes: dbc8358c7237 ("mm/nommu: use alloc_pages_exact() rather than its own implementation").
+Reported-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
+Tested-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
+Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/nommu.c | 4 +---
+ 1 file changed, 1 insertion(+), 3 deletions(-)
+
+--- a/mm/nommu.c
++++ b/mm/nommu.c
+@@ -1189,11 +1189,9 @@ static int do_mmap_private(struct vm_are
+ if (sysctl_nr_trim_pages && total - point >= sysctl_nr_trim_pages) {
+ total = point;
+ kdebug("try to alloc exact %lu pages", total);
+- base = alloc_pages_exact(len, GFP_KERNEL);
+- } else {
+- base = (void *)__get_free_pages(GFP_KERNEL, order);
+ }
+
++ base = alloc_pages_exact(total << PAGE_SHIFT, GFP_KERNEL);
+ if (!base)
+ goto enomem;
+
--- /dev/null
+From 8138a67a5557ffea3a21dfd6f037842d4e748513 Mon Sep 17 00:00:00 2001
+From: Roman Gushchin <klamm@yandex-team.ru>
+Date: Wed, 11 Feb 2015 15:28:42 -0800
+Subject: mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
+
+From: Roman Gushchin <klamm@yandex-team.ru>
+
+commit 8138a67a5557ffea3a21dfd6f037842d4e748513 upstream.
+
+I noticed that "allowed" can easily overflow by falling below 0, because
+(total_vm / 32) can be larger than "allowed". The problem occurs in
+OVERCOMMIT_NONE mode.
+
+In this case, a huge allocation can succeed and overcommit the system
+(despite OVERCOMMIT_NONE mode). All subsequent allocations will fail
+(system-wide), so the system becomes unusable.
+
+The problem was masked out by commit c9b1d0981fcc
+("mm: limit growth of 3% hardcoded other user reserve"),
+but it's easy to reproduce it on older kernels:
+1) set overcommit_memory sysctl to 2
+2) mmap() a large file multiple times (with VM_SHARED flag)
+3) try to malloc() a large amount of memory
+
+It can also be reproduced on newer kernels, but a misconfigured
+sysctl_user_reserve_kbytes is required.
+
+Fix this issue by switching to signed arithmetic here.
+
+Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
+Cc: Andrew Shewmaker <agshew@gmail.com>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/nommu.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/mm/nommu.c
++++ b/mm/nommu.c
+@@ -1895,7 +1895,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
+ */
+ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
+ {
+- unsigned long free, allowed, reserve;
++ long free, allowed, reserve;
+
+ vm_acct_memory(pages);
+
+@@ -1959,7 +1959,7 @@ int __vm_enough_memory(struct mm_struct
+ */
+ if (mm) {
+ reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+- allowed -= min(mm->total_vm / 32, reserve);
++ allowed -= min_t(long, mm->total_vm / 32, reserve);
+ }
+
+ if (percpu_counter_read_positive(&vm_committed_as) < allowed)
--- /dev/null
+From cc87317726f851531ae8422e0c2d3d6e2d7b1955 Mon Sep 17 00:00:00 2001
+From: Johannes Weiner <hannes@cmpxchg.org>
+Date: Fri, 27 Feb 2015 15:52:09 -0800
+Subject: mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change
+
+From: Johannes Weiner <hannes@cmpxchg.org>
+
+commit cc87317726f851531ae8422e0c2d3d6e2d7b1955 upstream.
+
+Historically, !__GFP_FS allocations were not allowed to invoke the OOM
+killer once reclaim had failed, but nevertheless kept looping in the
+allocator.
+
+Commit 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into
+allocation slowpath"), which should have been a simple cleanup patch,
+accidentally changed the behavior to aborting the allocation at that
+point. This creates problems with filesystem callers (?) that currently
+rely on the allocator waiting for other tasks to intervene.
+
+Revert the behavior as it shouldn't have been changed as part of a
+cleanup patch.
+
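+For context, a simplified view of the caller in __alloc_pages_slowpath()
+(not the exact code): the OOM path is only entered once direct reclaim
+made no progress, and the allocation is aborted if *did_some_progress is
+still zero on return:
+
+    if (!did_some_progress) {
+        page = __alloc_pages_may_oom(..., &did_some_progress);
+        if (page)
+            goto got_pg;
+        if (!did_some_progress)
+            goto nopage;        /* abort the allocation */
+    }
+    /* otherwise wait briefly and retry reclaim */
+
+Setting *did_some_progress = 1 in the !__GFP_FS branch therefore restores
+the historical keep-looping behavior without invoking the OOM killer.
+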
+Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath")
+Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
+Acked-by: Michal Hocko <mhocko@suse.cz>
+Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
+Cc: Theodore Ts'o <tytso@mit.edu>
+Cc: Dave Chinner <david@fromorbit.com>
+Acked-by: David Rientjes <rientjes@google.com>
+Cc: Oleg Nesterov <oleg@redhat.com>
+Cc: Mel Gorman <mgorman@suse.de>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/page_alloc.c | 9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -2380,8 +2380,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, un
+ if (high_zoneidx < ZONE_NORMAL)
+ goto out;
+ /* The OOM killer does not compensate for light reclaim */
+- if (!(gfp_mask & __GFP_FS))
++ if (!(gfp_mask & __GFP_FS)) {
++ /*
++ * XXX: Page reclaim didn't yield anything,
++ * and the OOM killer can't be invoked, but
++ * keep looping as per should_alloc_retry().
++ */
++ *did_some_progress = 1;
+ goto out;
++ }
+ /*
+ * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
+ * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
--- /dev/null
+From 99592d598eca62bdbbf62b59941c189176dfc614 Mon Sep 17 00:00:00 2001
+From: Vlastimil Babka <vbabka@suse.cz>
+Date: Wed, 11 Feb 2015 15:28:15 -0800
+Subject: mm: when stealing freepages, also take pages created by splitting buddy page
+
+From: Vlastimil Babka <vbabka@suse.cz>
+
+commit 99592d598eca62bdbbf62b59941c189176dfc614 upstream.
+
+When studying page stealing, I noticed some weird-looking decisions in
+try_to_steal_freepages(). The first I assume is a bug (Patch 1); the
+following two patches were driven by evaluation.
+
+Testing was done with stress-highalloc of mmtests, using the
+mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
+often page stealing occurs for individual migratetypes, and what
+migratetypes are used for fallbacks. Arguably, the worst case of page
+stealing is when an UNMOVABLE allocation steals from a MOVABLE pageblock.
+A RECLAIMABLE allocation stealing from a MOVABLE pageblock is also not
+ideal, so the goal is to minimize these two cases.
+
+The evaluation of v2 wasn't always a clear win and Joonsoo questioned the
+results. Here I used a different baseline which includes the RFC
+compaction improvements from [1]. I found that the compaction improvements
+reduce the variability of stress-highalloc, so there's less noise in the
+data.
+
+First, let's look at stress-highalloc configured to do sync compaction,
+and how these patches reduce page stealing events during the test. The
+first column is after a fresh reboot, the other two are reiterations of
+the test without reboot. That was all accumulated over 5 re-iterations
+(so the benchmark was run 5x3 times with 5 fresh restarts).
+
+Baseline:
+
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 5-nothp-1 5-nothp-2 5-nothp-3
+Page alloc extfrag event 10264225 8702233 10244125
+Extfrag fragmenting 10263271 8701552 10243473
+Extfrag fragmenting for unmovable 13595 17616 15960
+Extfrag fragmenting unmovable placed with movable 7989 12193 8447
+Extfrag fragmenting for reclaimable 658 1840 1817
+Extfrag fragmenting reclaimable placed with movable 558 1677 1679
+Extfrag fragmenting for movable 10249018 8682096 10225696
+
+With Patch 1:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 6-nothp-1 6-nothp-2 6-nothp-3
+Page alloc extfrag event 11834954 9877523 9774860
+Extfrag fragmenting 11833993 9876880 9774245
+Extfrag fragmenting for unmovable 7342 16129 11712
+Extfrag fragmenting unmovable placed with movable 4191 10547 6270
+Extfrag fragmenting for reclaimable 373 1130 923
+Extfrag fragmenting reclaimable placed with movable 302 906 738
+Extfrag fragmenting for movable 11826278 9859621 9761610
+
+With Patch 2:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 7-nothp-1 7-nothp-2 7-nothp-3
+Page alloc extfrag event 4725990 3668793 3807436
+Extfrag fragmenting 4725104 3668252 3806898
+Extfrag fragmenting for unmovable 6678 7974 7281
+Extfrag fragmenting unmovable placed with movable 2051 3829 4017
+Extfrag fragmenting for reclaimable 429 1208 1278
+Extfrag fragmenting reclaimable placed with movable 369 976 1034
+Extfrag fragmenting for movable 4717997 3659070 3798339
+
+With Patch 3:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 8-nothp-1 8-nothp-2 8-nothp-3
+Page alloc extfrag event 5016183 4700142 3850633
+Extfrag fragmenting 5015325 4699613 3850072
+Extfrag fragmenting for unmovable 1312 3154 3088
+Extfrag fragmenting unmovable placed with movable 1115 2777 2714
+Extfrag fragmenting for reclaimable 437 1193 1097
+Extfrag fragmenting reclaimable placed with movable 330 969 879
+Extfrag fragmenting for movable 5013576 4695266 3845887
+
+In v2 we saw an apparent regression with Patch 1 for unmovable events;
+this is now gone, suggesting it was indeed noise. Here, each patch
+improves the situation for unmovable events. Reclaimable is improved by
+Patch 1 and then either stays the same modulo noise, or is perhaps
+slightly worse - a small price for the unmovable improvements, IMHO. The
+number of movable allocations falling back to other migratetypes is the
+noisiest, but it is nevertheless reduced to half by Patch 2. These are
+the least critical, as compaction can move them around.
+
+If we look at the success rates, the patches don't affect them; that
+hasn't changed.
+
+Baseline:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 5-nothp-1 5-nothp-2 5-nothp-3
+Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
+Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
+Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
+Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
+Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
+Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
+Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
+Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
+Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
+
+Patch 1:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 6-nothp-1 6-nothp-2 6-nothp-3
+Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%)
+Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%)
+Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%)
+Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%)
+Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%)
+Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%)
+Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%)
+Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%)
+Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%)
+
+Patch 2:
+
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 7-nothp-1 7-nothp-2 7-nothp-3
+Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%)
+Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%)
+Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%)
+Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%)
+Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%)
+Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%)
+Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%)
+Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%)
+Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
+
+Patch 3:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 8-nothp-1 8-nothp-2 8-nothp-3
+Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%)
+Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%)
+Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%)
+Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%)
+Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%)
+Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%)
+Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%)
+Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%)
+Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%)
+
+While there's no improvement here, I consider the reduced fragmentation
+events to be worthwhile on their own. Patch 2 also seems to reduce the
+scanning for free pages and the migrations in compaction, suggesting it
+has somewhat less work to do:
+
+Patch 1:
+
+Compaction stalls 4153 3959 3978
+Compaction success 1523 1441 1446
+Compaction failures 2630 2517 2531
+Page migrate success 4600827 4943120 5104348
+Page migrate failure 19763 16656 17806
+Compaction pages isolated 9597640 10305617 10653541
+Compaction migrate scanned 77828948 86533283 87137064
+Compaction free scanned 517758295 521312840 521462251
+Compaction cost 5503 5932 6110
+
+Patch 2:
+
+Compaction stalls 3800 3450 3518
+Compaction success 1421 1316 1317
+Compaction failures 2379 2134 2201
+Page migrate success 4160421 4502708 4752148
+Page migrate failure 19705 14340 14911
+Compaction pages isolated 8731983 9382374 9910043
+Compaction migrate scanned 98362797 96349194 98609686
+Compaction free scanned 496512560 469502017 480442545
+Compaction cost 5173 5526 5811
+
+As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
+of unmovable and reclaimable pageblocks.
+
+Configuring the benchmark to allocate like a THP page fault (i.e. no sync
+compaction) gives much noisier results for iterations 2 and 3 after
+reboot. This is not so surprising given that [1] offers lower improvements
+in this scenario due to fewer restarts after deferred compaction, which
+would change the compaction pivot.
+
+Baseline:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 5-thp-1 5-thp-2 5-thp-3
+Page alloc extfrag event 8148965 6227815 6646741
+Extfrag fragmenting 8147872 6227130 6646117
+Extfrag fragmenting for unmovable 10324 12942 15975
+Extfrag fragmenting unmovable placed with movable 5972 8495 10907
+Extfrag fragmenting for reclaimable 601 1707 2210
+Extfrag fragmenting reclaimable placed with movable 520 1570 2000
+Extfrag fragmenting for movable 8136947 6212481 6627932
+
+Patch 1:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 6-thp-1 6-thp-2 6-thp-3
+Page alloc extfrag event 8345457 7574471 7020419
+Extfrag fragmenting 8343546 7573777 7019718
+Extfrag fragmenting for unmovable 10256 18535 30716
+Extfrag fragmenting unmovable placed with movable 6893 11726 22181
+Extfrag fragmenting for reclaimable 465 1208 1023
+Extfrag fragmenting reclaimable placed with movable 353 996 843
+Extfrag fragmenting for movable 8332825 7554034 6987979
+
+Patch 2:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 7-thp-1 7-thp-2 7-thp-3
+Page alloc extfrag event 3512847 3020756 2891625
+Extfrag fragmenting 3511940 3020185 2891059
+Extfrag fragmenting for unmovable 9017 6892 6191
+Extfrag fragmenting unmovable placed with movable 1524 3053 2435
+Extfrag fragmenting for reclaimable 445 1081 1160
+Extfrag fragmenting reclaimable placed with movable 375 918 986
+Extfrag fragmenting for movable 3502478 3012212 2883708
+
+Patch 3:
+ 3.19-rc4 3.19-rc4 3.19-rc4
+ 8-thp-1 8-thp-2 8-thp-3
+Page alloc extfrag event 3181699 3082881 2674164
+Extfrag fragmenting 3180812 3082303 2673611
+Extfrag fragmenting for unmovable 1201 4031 4040
+Extfrag fragmenting unmovable placed with movable 974 3611 3645
+Extfrag fragmenting for reclaimable 478 1165 1294
+Extfrag fragmenting reclaimable placed with movable 387 985 1030
+Extfrag fragmenting for movable 3179133 3077107 2668277
+
+The improvements for the first iteration are clear; the rest is much
+noisier and can look like a regression for Patch 1. Anyway, Patch 2
+rectifies it.
+
+Allocation success rates are again unaffected so there's no point in
+making this e-mail any longer.
+
+[1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
+
+This patch (of 3):
+
+When __rmqueue_fallback() is called to allocate a page of order X, it will
+find a page of order Y >= X of a fallback migratetype, which is different
+from the desired migratetype. With the help of try_to_steal_freepages(),
+it may change the migratetype (to the desired one) also of:
+
+1) all currently free pages in the pageblock containing the fallback page
+2) the fallback pageblock itself
+3) buddy pages created by splitting the fallback page (when Y > X)
+
+These decisions take the order Y into account, as well as the desired
+migratetype, with the goal of preventing multiple fallback allocations
+that could e.g. distribute UNMOVABLE allocations among multiple
+pageblocks.
+
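+For context, a simplified sketch of the caller (not the exact code): the
+value that try_to_steal_freepages() returns is what __rmqueue_fallback()
+passes to expand() when splitting the fallback page, so it determines the
+migratetype of the buddy pages in case 3):
+
+    new_type = try_to_steal_freepages(zone, page, start_migratetype,
+                                      migratetype);
+
+    list_del(&page->lru);
+    rmv_page_order(page);
+
+    /* buddy pages created by the split go to new_type's free lists */
+    expand(zone, page, order, current_order, area, new_type);
+    set_freepage_migratetype(page, new_type);
+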
+Originally, the decision for 1) implied the decision for 3). Commit
+47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
+(probably unintentionally) so that the buddy pages in case 3) are always
+changed to the desired migratetype, except for CMA pageblocks.
+
+Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code
+and fix a bug") did some refactoring and added a comment that the case of
+3) is intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should
+respect pageblock type") removed the comment and tried to restore the
+original behavior where 1) implies 3), but due to the previous
+refactoring, the result is instead that only 2) implies 3) - and the
+conditions for 2) are less frequently met than conditions for 1). This
+may increase fragmentation in situations where the code decides to steal
+all free pages from the pageblock (case 1)), but then gives back the buddy
+pages produced by splitting.
+
+This patch restores the original intended logic where 1) implies 3).
+During testing with stress-highalloc from mmtests, this has been shown to
+decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
+steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
+In some cases it has increased the number of events when MOVABLE
+allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
+fixable by sync compaction and thus less harmful.
+
+Note that evaluation has shown that the behavior introduced by
+47118af076f6 for buddy pages in case 3) is actually even better than the
+original logic, so the following patch will introduce it properly once
+again. For stable backports of this patch it makes thus sense to only fix
+versions containing 0cbef29a7821.
+
+[iamjoonsoo.kim@lge.com: tracepoint fix]
+Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
+Acked-by: Mel Gorman <mgorman@suse.de>
+Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
+Acked-by: Minchan Kim <minchan@kernel.org>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
+Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Cc: Michal Hocko <mhocko@suse.cz>
+Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ include/trace/events/kmem.h | 7 ++++---
+ mm/page_alloc.c | 12 +++++-------
+ 2 files changed, 9 insertions(+), 10 deletions(-)
+
+--- a/include/trace/events/kmem.h
++++ b/include/trace/events/kmem.h
+@@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag,
+
+ TP_PROTO(struct page *page,
+ int alloc_order, int fallback_order,
+- int alloc_migratetype, int fallback_migratetype, int new_migratetype),
++ int alloc_migratetype, int fallback_migratetype),
+
+ TP_ARGS(page,
+ alloc_order, fallback_order,
+- alloc_migratetype, fallback_migratetype, new_migratetype),
++ alloc_migratetype, fallback_migratetype),
+
+ TP_STRUCT__entry(
+ __field( struct page *, page )
+@@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
+ __entry->fallback_order = fallback_order;
+ __entry->alloc_migratetype = alloc_migratetype;
+ __entry->fallback_migratetype = fallback_migratetype;
+- __entry->change_ownership = (new_migratetype == alloc_migratetype);
++ __entry->change_ownership = (alloc_migratetype ==
++ get_pageblock_migratetype(page));
+ ),
+
+ TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -1138,8 +1138,8 @@ static void change_pageblock_range(struc
+ * nor move CMA pages to different free lists. We don't want unmovable pages
+ * to be allocated from MIGRATE_CMA areas.
+ *
+- * Returns the new migratetype of the pageblock (or the same old migratetype
+- * if it was unchanged).
++ * Returns the allocation migratetype if free pages were stolen, or the
++ * fallback migratetype if it was decided not to steal.
+ */
+ static int try_to_steal_freepages(struct zone *zone, struct page *page,
+ int start_type, int fallback_type)
+@@ -1170,12 +1170,10 @@ static int try_to_steal_freepages(struct
+
+ /* Claim the whole block if over half of it is free */
+ if (pages >= (1 << (pageblock_order-1)) ||
+- page_group_by_mobility_disabled) {
+-
++ page_group_by_mobility_disabled)
+ set_pageblock_migratetype(page, start_type);
+- return start_type;
+- }
+
++ return start_type;
+ }
+
+ return fallback_type;
+@@ -1227,7 +1225,7 @@ __rmqueue_fallback(struct zone *zone, un
+ set_freepage_migratetype(page, new_type);
+
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+- start_migratetype, migratetype, new_type);
++ start_migratetype, migratetype);
+
+ return page;
+ }
udp-only-allow-ufo-for-packets-from-sock_dgram-sockets.patch
net-ping-return-eafnosupport-when-appropriate.patch
team-don-t-traverse-port-list-using-rcu-in-team_set_mac_address.patch
+mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch
+mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch
+mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch
+mm-hugetlb-remove-unnecessary-lower-bound-on-sysctl-handlers.patch
+mm-when-stealing-freepages-also-take-pages-created-by-splitting-buddy-page.patch
+mm-mmap.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
+mm-nommu.c-fix-arithmetic-overflow-in-__vm_enough_memory.patch
+mm-compaction-fix-wrong-order-check-in-compact_finished.patch
+mm-memory.c-actually-remap-enough-memory.patch
+mm-hwpoison-drop-lru_add_drain_all-in-__soft_offline_page.patch
+mm-fix-negative-nr_isolated-counts.patch
+mm-nommu-fix-memory-leak.patch
+mm-page_alloc-revert-inadvertent-__gfp_fs-retry-behavior-change.patch