From: Greg Kroah-Hartman
Date: Wed, 14 Jan 2015 07:19:29 +0000 (-0800)
Subject: 3.10-stable patches
X-Git-Tag: v3.10.65~5
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=dd994cf215b1a63756f7d23b88ef6582a053b2ee;p=thirdparty%2Fkernel%2Fstable-queue.git

3.10-stable patches

added patches:
	mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
	mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
	mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
---

diff --git a/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch b/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
new file mode 100644
index 00000000000..d15514aa9ed
--- /dev/null
+++ b/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
@@ -0,0 +1,54 @@
+From 690eac53daff34169a4d74fc7bfbd388c4896abb Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Sun, 11 Jan 2015 11:33:57 -0800
+Subject: mm: Don't count the stack guard page towards RLIMIT_STACK
+
+From: Linus Torvalds
+
+commit 690eac53daff34169a4d74fc7bfbd388c4896abb upstream.
+
+Commit fee7e49d4514 ("mm: propagate error from stack expansion even for
+guard page") made sure that we return the error properly for stack
+growth conditions.  It also theorized that counting the guard page
+towards the stack limit might break something, but also said "Let's see
+if anybody notices".
+
+Somebody did notice.  Apparently android-x86 sets the stack limit very
+close to the limit indeed, and including the guard page in the rlimit
+check causes the android 'zygote' process problems.
+
+So this adds the (fairly trivial) code to make the stack rlimit check be
+against the actual real stack size, rather than the size of the vma that
+includes the guard page.
+
+Reported-and-tested-by: Chih-Wei Huang
+Cc: Jay Foad
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/mmap.c |    7 +++++--
+ 1 file changed, 5 insertions(+), 2 deletions(-)
+
+--- a/mm/mmap.c
++++ b/mm/mmap.c
+@@ -2056,14 +2056,17 @@ static int acct_stack_growth(struct vm_a
+ {
+ 	struct mm_struct *mm = vma->vm_mm;
+ 	struct rlimit *rlim = current->signal->rlim;
+-	unsigned long new_start;
++	unsigned long new_start, actual_size;
+ 
+ 	/* address space limit tests */
+ 	if (!may_expand_vm(mm, grow))
+ 		return -ENOMEM;
+ 
+ 	/* Stack limit test */
+-	if (size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur))
++	actual_size = size;
++	if (size && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)))
++		actual_size -= PAGE_SIZE;
++	if (actual_size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur))
+ 		return -ENOMEM;
+ 
+ 	/* mlock limit tests */
diff --git a/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch b/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
new file mode 100644
index 00000000000..c26a22dfd4d
--- /dev/null
+++ b/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
@@ -0,0 +1,70 @@
+From fee7e49d45149fba60156f5b59014f764d3e3728 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Tue, 6 Jan 2015 13:00:05 -0800
+Subject: mm: propagate error from stack expansion even for guard page
+
+From: Linus Torvalds
+
+commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream.
+
+Jay Foad reports that the address sanitizer test (asan) sometimes gets
+confused by a stack pointer that ends up being outside the stack vma
+that is reported by /proc/maps.
+
+This happens due to an interaction between RLIMIT_STACK and the guard
+page: when we do the guard page check, we ignore the potential error
+from the stack expansion, which effectively results in a missing guard
+page, since the expected stack expansion won't have been done.
+
+And since /proc/maps explicitly ignores the guard page (commit
+d7824370e263: "mm: fix up some user-visible effects of the stack guard
+page"), the stack pointer ends up being outside the reported stack area.
+
+This is the minimal patch: it just propagates the error.  It also
+effectively makes the guard page part of the stack limit, which in turn
+means that the actual real stack is one page less than the stack limit.
+
+Let's see if anybody notices.  We could teach acct_stack_growth() to
+allow an extra page for a grow-up/grow-down stack in the rlimit test,
+but I don't want to add more complexity if it isn't needed.
+
+Reported-and-tested-by: Jay Foad
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ include/linux/mm.h |    2 +-
+ mm/memory.c        |    4 ++--
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -1630,7 +1630,7 @@ extern int expand_downwards(struct vm_ar
+ #if VM_GROWSUP
+ extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+ #else
+-  #define expand_upwards(vma, address) do { } while (0)
++  #define expand_upwards(vma, address) (0)
+ #endif
+ 
+ /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -3200,7 +3200,7 @@ static inline int check_stack_guard_page
+ 		if (prev && prev->vm_end == address)
+ 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
+ 
+-		expand_downwards(vma, address - PAGE_SIZE);
++		return expand_downwards(vma, address - PAGE_SIZE);
+ 	}
+ 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
+ 		struct vm_area_struct *next = vma->vm_next;
+ 
+@@ -3209,7 +3209,7 @@ static inline int check_stack_guard_page
+ 		if (next && next->vm_start == address + PAGE_SIZE)
+ 			return next->vm_flags & VM_GROWSUP ? 0 : -ENOMEM;
+ 
+-		expand_upwards(vma, address + PAGE_SIZE);
++		return expand_upwards(vma, address + PAGE_SIZE);
+ 	}
+ 	return 0;
+ }
diff --git a/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch b/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
new file mode 100644
index 00000000000..6473de22050
--- /dev/null
+++ b/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
@@ -0,0 +1,109 @@
+From 9e5e3661727eaf960d3480213f8e87c8d67b6956 Mon Sep 17 00:00:00 2001
+From: Vlastimil Babka
+Date: Thu, 8 Jan 2015 14:32:40 -0800
+Subject: mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
+
+From: Vlastimil Babka
+
+commit 9e5e3661727eaf960d3480213f8e87c8d67b6956 upstream.
+
+Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
+stuck in a busy loop with nothing left to balance, but
+kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
+to be a combination of several factors:
+
+1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
+
+2. The process has been killed (by OOM in this case), but has not yet been
+   scheduled to remove itself from the waitqueue and die.
+
+3. kswapd checks for throttled processes in prepare_kswapd_sleep():
+
+        if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
+                wake_up(&pgdat->pfmemalloc_wait);
+                return false; // kswapd will not go to sleep
+        }
+
+   However, for a process that was already killed, wake_up() does not remove
+   the process from the waitqueue, since try_to_wake_up() checks its state
+   first and returns false when the process is no longer waiting.
+
+4. kswapd is running on the same CPU as the only CPU that the process is
+   allowed to run on (through cpus_allowed, or possibly single-cpu system).
+
+5. CONFIG_PREEMPT_NONE=y kernel is used.  If there's nothing to balance,
+   kswapd encounters no voluntary preemption points and repeatedly fails
+   prepare_kswapd_sleep(), blocking the process from running and removing
+   itself from the waitqueue, which would let kswapd sleep.
+
+So, the source of the problem is that we prevent kswapd from going to
+sleep until there are processes waiting on the pfmemalloc_wait queue,
+and a process waiting on a queue is guaranteed to be removed from the
+queue only when it gets scheduled.  This was done to make sure that no
+process is left sleeping on pfmemalloc_wait when kswapd itself goes to
+sleep.
+
+However, it isn't necessary to postpone kswapd sleep until the
+pfmemalloc_wait queue actually empties.  To prevent processes from being
+left sleeping, it's actually enough to guarantee that all processes
+waiting on pfmemalloc_wait queue have been woken up by the time we put
+kswapd to sleep.
+
+This patch therefore fixes this issue by substituting 'wake_up' with
+'wake_up_all' and removing 'return false' in the code snippet from
+prepare_kswapd_sleep() above.  Note that if any process puts itself in
+the queue after this waitqueue_active() check, or after the wake up
+itself, it means that the process will also wake up kswapd - and since
+we are under prepare_to_wait(), the wake up won't be missed.  Also we
+update the comment of prepare_kswapd_sleep() to hopefully more clearly
+describe the races it is preventing.
+
+Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
+Signed-off-by: Vlastimil Babka
+Signed-off-by: Vladimir Davydov
+Cc: Mel Gorman
+Cc: Johannes Weiner
+Acked-by: Michal Hocko
+Acked-by: Rik van Riel
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/vmscan.c |   24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -2631,18 +2631,20 @@ static bool prepare_kswapd_sleep(pg_data
+ 		return false;
+ 
+ 	/*
+-	 * There is a potential race between when kswapd checks its watermarks
+-	 * and a process gets throttled. There is also a potential race if
+-	 * processes get throttled, kswapd wakes, a large process exits therby
+-	 * balancing the zones that causes kswapd to miss a wakeup. If kswapd
+-	 * is going to sleep, no process should be sleeping on pfmemalloc_wait
+-	 * so wake them now if necessary. If necessary, processes will wake
+-	 * kswapd and get throttled again
++	 * The throttled processes are normally woken up in balance_pgdat() as
++	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
++	 * race between when kswapd checks the watermarks and a process gets
++	 * throttled. There is also a potential race if processes get
++	 * throttled, kswapd wakes, a large process exits thereby balancing the
++	 * zones, which causes kswapd to exit balance_pgdat() before reaching
++	 * the wake up checks. If kswapd is going to sleep, no process should
++	 * be sleeping on pfmemalloc_wait, so wake them now if necessary. If
++	 * the wake up is premature, processes will wake kswapd and get
++	 * throttled again. The difference from wake ups in balance_pgdat() is
++	 * that here we are under prepare_to_wait().
+ 	 */
+-	if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
+-		wake_up(&pgdat->pfmemalloc_wait);
+-		return false;
+-	}
++	if (waitqueue_active(&pgdat->pfmemalloc_wait))
++		wake_up_all(&pgdat->pfmemalloc_wait);
+ 
+ 	return pgdat_balanced(pgdat, order, classzone_idx);
+ }
diff --git a/queue-3.10/series b/queue-3.10/series
index 7086e5ffff3..33db55ccea6 100644
--- a/queue-3.10/series
+++ b/queue-3.10/series
@@ -39,3 +39,6 @@ btrfs-don-t-delay-inode-ref-updates-during-log-replay.patch
 perf-x86-intel-uncore-make-sure-only-uncore-events-are-collected.patch
 perf-fix-events-installation-during-moving-group.patch
 perf-session-do-not-fail-on-processing-out-of-order-event.patch
+mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
+mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
+mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch