From: Greg Kroah-Hartman
Date: Sat, 2 Dec 2017 09:43:53 +0000 (+0000)
Subject: 4.14-stable patches
X-Git-Tag: v3.18.86~25
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=1d671f69d5f8d8d0c9d2114b3d41cb302bdd3369;p=thirdparty%2Fkernel%2Fstable-queue.git

4.14-stable patches

added patches:
      mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
      mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
---

diff --git a/queue-4.14/mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch b/queue-4.14/mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
new file mode 100644
index 00000000000..5d248a43687
--- /dev/null
+++ b/queue-4.14/mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
@@ -0,0 +1,64 @@
+From 4b81cb2ff69c8a8e297a147d2eb4d9b5e8d7c435 Mon Sep 17 00:00:00 2001
+From: Michal Hocko
+Date: Wed, 29 Nov 2017 16:09:54 -0800
+Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
+
+From: Michal Hocko
+
+commit 4b81cb2ff69c8a8e297a147d2eb4d9b5e8d7c435 upstream.
+
+drain_all_pages backs off when called from a kworker context since
+commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue
+context") because the original IPI based pcp draining has been replaced
+by a WQ based one and the check was meant to prevent recursion and
+inter-worker dependencies.  This made some sense at the time because
+the system WQ was used, and one worker holding the lock could be
+blocked while waiting for new workers to emerge, which can be a problem
+under OOM conditions.
+
+Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into
+single wq") has moved draining to a dedicated WQ (mm_percpu_wq) with a
+rescuer, so we no longer depend on any other WQ activity to make
+forward progress.  Calling drain_all_pages from a worker context is
+therefore safe as long as this doesn't happen from mm_percpu_wq itself,
+which is not the case because all its workers are required to _not_
+depend on any MM locks.
+
+Why is this a problem in the first place?  ACPI driven memory
+hot-remove (acpi_device_hotplug) is executed from a worker context.  We
+end up calling __offline_pages to free all the pages and that requires
+both lru_add_drain_all_cpuslocked and drain_all_pages to do their job;
+otherwise we can have dangling pages on pcp lists and fail the offline
+operation (__test_page_isolated_in_pageblock would see a page with a 0
+ref count but without PageBuddy set).
+
+Fix the issue by removing the worker check in drain_all_pages.
+lru_add_drain_all_cpuslocked doesn't have this restriction, so it works
+as expected.
+
+Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
+Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
+Signed-off-by: Michal Hocko
+Cc: Mel Gorman
+Cc: Tejun Heo
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/page_alloc.c | 4 ----
+ 1 file changed, 4 deletions(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -2487,10 +2487,6 @@ void drain_all_pages(struct zone *zone)
+ 	if (WARN_ON_ONCE(!mm_percpu_wq))
+ 		return;
+ 
+-	/* Workqueues cannot recurse */
+-	if (current->flags & PF_WQ_WORKER)
+-		return;
+-
+ 	/*
+ 	 * Do not drain if one is already in progress unless it's specific to
+ 	 * a zone.  Such callers are primarily CMA and memory hotplug and need
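
[ Context, not part of the queued patch: the argument above rests on
  mm_percpu_wq having a rescuer thread.  Below is a minimal sketch of
  that allocation pattern; the example_* names are hypothetical, while
  alloc_workqueue() and WQ_MEM_RECLAIM are the real workqueue API, and
  mm_percpu_wq itself is allocated this way since commit ce612879ddc7. ]

	#include <linux/workqueue.h>

	static struct workqueue_struct *example_pcp_wq;

	static int __init example_pcp_wq_init(void)
	{
		/*
		 * WQ_MEM_RECLAIM guarantees a rescuer thread, so work
		 * queued here can always make forward progress even when
		 * no new kworkers can be created (e.g. under OOM), which
		 * is what makes the removed PF_WQ_WORKER check in
		 * drain_all_pages unnecessary.
		 */
		example_pcp_wq = alloc_workqueue("example_pcp_wq",
						 WQ_MEM_RECLAIM, 0);
		return example_pcp_wq ? 0 : -ENOMEM;
	}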
diff --git a/queue-4.14/mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch b/queue-4.14/mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
new file mode 100644
index 00000000000..0610c22fffe
--- /dev/null
+++ b/queue-4.14/mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
@@ -0,0 +1,86 @@
+From 687cb0884a714ff484d038e9190edc874edcf146 Mon Sep 17 00:00:00 2001
+From: Wang Nan
+Date: Wed, 29 Nov 2017 16:09:58 -0800
+Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
+
+From: Wang Nan
+
+commit 687cb0884a714ff484d038e9190edc874edcf146 upstream.
+
+tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual
+memory space.  In this case, tlb->fullmm is true.  Some archs, such as
+arm64, don't flush the TLB when tlb->fullmm is true:
+
+  commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1"),
+
+which causes TLB entries to be leaked.
+
+Will clarified his patch:
+ "Basically, we tag each address space with an ASID (PCID on x86) which
+  is resident in the TLB. This means we can elide TLB invalidation when
+  pulling down a full mm because we won't ever assign that ASID to
+  another mm without doing TLB invalidation elsewhere (which actually
+  just nukes the whole TLB).
+
+  I think that means that we could potentially not fault on a kernel
+  uaccess, because we could hit in the TLB"
+
+There can be a window between complete_signal() sending an IPI to other
+cores and the moment when all threads sharing this mm have actually
+been kicked off their cores.  In this window, the oom reaper may call
+tlb_flush_mmu_tlbonly() to flush the TLB and then free pages.  However,
+due to the above problem, the TLB entries are not really flushed on
+arm64, so other threads can still access these pages through stale TLB
+entries.  Moreover, a copy_to_user() can also write to these pages
+without generating a page fault, causing use-after-free bugs.
+
+This patch gathers each vma instead of gathering the full vm space, so
+tlb->fullmm is not true.  The behavior of the oom reaper becomes
+similar to munmapping before do_exit, which should be safe for all
+archs.
+
+Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
+Fixes: aac453635549 ("mm, oom: introduce oom reaper")
+Signed-off-by: Wang Nan
+Acked-by: Michal Hocko
+Acked-by: David Rientjes
+Cc: Minchan Kim
+Cc: Will Deacon
+Cc: Bob Liu
+Cc: Ingo Molnar
+Cc: Roman Gushchin
+Cc: Konstantin Khlebnikov
+Cc: Andrea Arcangeli
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/oom_kill.c | 7 ++++---
+ 1 file changed, 4 insertions(+), 3 deletions(-)
+
+--- a/mm/oom_kill.c
++++ b/mm/oom_kill.c
+@@ -532,7 +532,6 @@ static bool __oom_reap_task_mm(struct ta
+ 	 */
+ 	set_bit(MMF_UNSTABLE, &mm->flags);
+ 
+-	tlb_gather_mmu(&tlb, mm, 0, -1);
+ 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+ 		if (!can_madv_dontneed_vma(vma))
+ 			continue;
+@@ -547,11 +546,13 @@ static bool __oom_reap_task_mm(struct ta
+ 		 * we do not want to block exit_mmap by keeping mm ref
+ 		 * count elevated without a good reason.
+ 		 */
+-		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
++		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
++			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
+ 			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
+ 					 NULL);
++			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
++		}
+ 	}
+-	tlb_finish_mmu(&tlb, 0, -1);
+ 	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
+ 			task_pid_nr(tsk), tsk->comm,
+ 			K(get_mm_counter(mm, MM_ANONPAGES)),
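
[ Context, not part of the queued patch: a minimal sketch of the two
  mmu_gather patterns contrasted above, using the 4.14-era
  tlb_gather_mmu()/tlb_finish_mmu() signatures.  The reap_sketch()
  wrapper is hypothetical; the real logic lives in __oom_reap_task_mm()
  as shown in the diff. ]

	#include <linux/mm.h>
	#include <asm/tlb.h>

	static void reap_sketch(struct mm_struct *mm)
	{
		struct mmu_gather tlb;
		struct vm_area_struct *vma;

		/*
		 * Old pattern: one gather over the whole address space.
		 * start == 0, end == -1 makes tlb->fullmm true, so arm64
		 * (commit 5a7862e83000) elides the flush in
		 * tlb_finish_mmu() and stale TLB entries survive.
		 */
		tlb_gather_mmu(&tlb, mm, 0, -1);
		/* ... unmap_page_range() calls ... */
		tlb_finish_mmu(&tlb, 0, -1);

		/*
		 * New pattern: one bounded gather per VMA.  tlb->fullmm
		 * stays false, so each tlb_finish_mmu() really flushes
		 * the range it covers.
		 */
		for (vma = mm->mmap; vma; vma = vma->vm_next) {
			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
			/* ... unmap_page_range(&tlb, vma, ...) ... */
			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
		}
	}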
diff --git a/queue-4.14/series b/queue-4.14/series
index e67efbcfa26..51d61ddee04 100644
--- a/queue-4.14/series
+++ b/queue-4.14/series
@@ -1 +1,3 @@
 platform-x86-hp-wmi-fix-tablet-mode-detection-for-convertibles.patch
+mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
+mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch