From: Greg Kroah-Hartman
Date: Wed, 14 Jan 2015 07:19:29 +0000 (-0800)
Subject: 3.10-stable patches
X-Git-Tag: v3.10.65~5
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=dd994cf215b1a63756f7d23b88ef6582a053b2ee;p=thirdparty%2Fkernel%2Fstable-queue.git

3.10-stable patches

added patches:
	mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
	mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
	mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
---

diff --git a/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch b/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
new file mode 100644
index 00000000000..d15514aa9ed
--- /dev/null
+++ b/queue-3.10/mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch
@@ -0,0 +1,54 @@
+From 690eac53daff34169a4d74fc7bfbd388c4896abb Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Sun, 11 Jan 2015 11:33:57 -0800
+Subject: mm: Don't count the stack guard page towards RLIMIT_STACK
+
+From: Linus Torvalds
+
+commit 690eac53daff34169a4d74fc7bfbd388c4896abb upstream.
+
+Commit fee7e49d4514 ("mm: propagate error from stack expansion even for
+guard page") made sure that we return the error properly for stack
+growth conditions.  It also theorized that counting the guard page
+towards the stack limit might break something, but also said "Let's see
+if anybody notices".
+
+Somebody did notice.  Apparently android-x86 sets the stack limit very
+close to the limit indeed, and including the guard page in the rlimit
+check causes the android 'zygote' process problems.
+
+So this adds the (fairly trivial) code to make the stack rlimit check be
+against the actual real stack size, rather than the size of the vma that
+includes the guard page.
+
+Reported-and-tested-by: Chih-Wei Huang
+Cc: Jay Foad
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/mmap.c |    7 +++++--
+ 1 file changed, 5 insertions(+), 2 deletions(-)
+
+--- a/mm/mmap.c
++++ b/mm/mmap.c
+@@ -2056,14 +2056,17 @@ static int acct_stack_growth(struct vm_a
+ {
+ 	struct mm_struct *mm = vma->vm_mm;
+ 	struct rlimit *rlim = current->signal->rlim;
+-	unsigned long new_start;
++	unsigned long new_start, actual_size;
+ 
+ 	/* address space limit tests */
+ 	if (!may_expand_vm(mm, grow))
+ 		return -ENOMEM;
+ 
+ 	/* Stack limit test */
+-	if (size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur))
++	actual_size = size;
++	if (size && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)))
++		actual_size -= PAGE_SIZE;
++	if (actual_size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur))
+ 		return -ENOMEM;
+ 
+ 	/* mlock limit tests */
diff --git a/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch b/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
new file mode 100644
index 00000000000..c26a22dfd4d
--- /dev/null
+++ b/queue-3.10/mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
@@ -0,0 +1,70 @@
+From fee7e49d45149fba60156f5b59014f764d3e3728 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Tue, 6 Jan 2015 13:00:05 -0800
+Subject: mm: propagate error from stack expansion even for guard page
+
+From: Linus Torvalds
+
+commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream.
+
+Jay Foad reports that the address sanitizer test (asan) sometimes gets
+confused by a stack pointer that ends up being outside the stack vma
+that is reported by /proc/maps.
+
+This happens due to an interaction between RLIMIT_STACK and the guard
+page: when we do the guard page check, we ignore the potential error
+from the stack expansion, which effectively results in a missing guard
+page, since the expected stack expansion won't have been done.
+
+And since /proc/maps explicitly ignores the guard page (commit
+d7824370e263: "mm: fix up some user-visible effects of the stack guard
+page"), the stack pointer ends up being outside the reported stack area.
+
+This is the minimal patch: it just propagates the error.  It also
+effectively makes the guard page part of the stack limit, which in turn
+means that the actual real stack is one page less than the stack limit.
+
+Let's see if anybody notices.  We could teach acct_stack_growth() to
+allow an extra page for a grow-up/grow-down stack in the rlimit test,
+but I don't want to add more complexity if it isn't needed.
+
+Reported-and-tested-by: Jay Foad
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ include/linux/mm.h |    2 +-
+ mm/memory.c        |    4 ++--
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -1630,7 +1630,7 @@ extern int expand_downwards(struct vm_ar
+ #if VM_GROWSUP
+ extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+ #else
+-  #define expand_upwards(vma, address) do { } while (0)
++  #define expand_upwards(vma, address) (0)
+ #endif
+ 
+ /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -3200,7 +3200,7 @@ static inline int check_stack_guard_page
+ 		if (prev && prev->vm_end == address)
+ 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
+ 
+-		expand_downwards(vma, address - PAGE_SIZE);
++		return expand_downwards(vma, address - PAGE_SIZE);
+ 	}
+ 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
+ 		struct vm_area_struct *next = vma->vm_next;
+ 
+@@ -3209,7 +3209,7 @@ static inline int check_stack_guard_page
+ 		if (next && next->vm_start == address + PAGE_SIZE)
+ 			return next->vm_flags & VM_GROWSUP ? 0 : -ENOMEM;
+ 
+-		expand_upwards(vma, address + PAGE_SIZE);
++		return expand_upwards(vma, address + PAGE_SIZE);
+ 	}
+ 	return 0;
+ }
diff --git a/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch b/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
new file mode 100644
index 00000000000..6473de22050
--- /dev/null
+++ b/queue-3.10/mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
@@ -0,0 +1,109 @@
+From 9e5e3661727eaf960d3480213f8e87c8d67b6956 Mon Sep 17 00:00:00 2001
+From: Vlastimil Babka
+Date: Thu, 8 Jan 2015 14:32:40 -0800
+Subject: mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
+
+From: Vlastimil Babka
+
+commit 9e5e3661727eaf960d3480213f8e87c8d67b6956 upstream.
+
+Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
+stuck in a busy loop with nothing left to balance, but
+kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
+to be a combination of several factors:
+
+1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
+
+2. The process has been killed (by OOM in this case), but has not yet been
+   scheduled to remove itself from the waitqueue and die.
+
+3. kswapd checks for throttled processes in prepare_kswapd_sleep():
+
+        if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
+                wake_up(&pgdat->pfmemalloc_wait);
+                return false; // kswapd will not go to sleep
+        }
+
+   However, for a process that was already killed, wake_up() does not remove
+   the process from the waitqueue, since try_to_wake_up() checks its state
+   first and returns false when the process is no longer waiting.
+
+4. kswapd is running on the same CPU as the only CPU that the process is
+   allowed to run on (through cpus_allowed, or possibly single-cpu system).
+
+5. CONFIG_PREEMPT_NONE=y kernel is used.  If there's nothing to balance,
+   kswapd encounters no voluntary preemption points and repeatedly fails
+   prepare_kswapd_sleep(), blocking the process from running and removing
+   itself from the waitqueue, which would let kswapd sleep.
+
+So, the source of the problem is that we prevent kswapd from going to
+sleep until there are processes waiting on the pfmemalloc_wait queue,
+and a process waiting on a queue is guaranteed to be removed from the
+queue only when it gets scheduled.  This was done to make sure that no
+process is left sleeping on pfmemalloc_wait when kswapd itself goes to
+sleep.
+
+However, it isn't necessary to postpone kswapd sleep until the
+pfmemalloc_wait queue actually empties.  To prevent processes from being
+left sleeping, it's actually enough to guarantee that all processes
+waiting on pfmemalloc_wait queue have been woken up by the time we put
+kswapd to sleep.
+
+This patch therefore fixes this issue by substituting 'wake_up' with
+'wake_up_all' and removing 'return false' in the code snippet from
+prepare_kswapd_sleep() above.  Note that if any process puts itself in
+the queue after this waitqueue_active() check, or after the wake up
+itself, it means that the process will also wake up kswapd - and since
+we are under prepare_to_wait(), the wake up won't be missed.  Also we
+update the comment of prepare_kswapd_sleep() to hopefully more clearly
+describe the races it is preventing.
+
+Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
+Signed-off-by: Vlastimil Babka
+Signed-off-by: Vladimir Davydov
+Cc: Mel Gorman
+Cc: Johannes Weiner
+Acked-by: Michal Hocko
+Acked-by: Rik van Riel
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/vmscan.c |   24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -2631,18 +2631,20 @@ static bool prepare_kswapd_sleep(pg_data
+ 		return false;
+ 
+ 	/*
+-	 * There is a potential race between when kswapd checks its watermarks
+-	 * and a process gets throttled. There is also a potential race if
+-	 * processes get throttled, kswapd wakes, a large process exits therby
+-	 * balancing the zones that causes kswapd to miss a wakeup. If kswapd
+-	 * is going to sleep, no process should be sleeping on pfmemalloc_wait
+-	 * so wake them now if necessary. If necessary, processes will wake
+-	 * kswapd and get throttled again
++	 * The throttled processes are normally woken up in balance_pgdat() as
++	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
++	 * race between when kswapd checks the watermarks and a process gets
++	 * throttled. There is also a potential race if processes get
++	 * throttled, kswapd wakes, a large process exits thereby balancing the
++	 * zones, which causes kswapd to exit balance_pgdat() before reaching
++	 * the wake up checks. If kswapd is going to sleep, no process should
++	 * be sleeping on pfmemalloc_wait, so wake them now if necessary. If
++	 * the wake up is premature, processes will wake kswapd and get
++	 * throttled again. The difference from wake ups in balance_pgdat() is
++	 * that here we are under prepare_to_wait().
+ 	 */
+-	if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
+-		wake_up(&pgdat->pfmemalloc_wait);
+-		return false;
+-	}
++	if (waitqueue_active(&pgdat->pfmemalloc_wait))
++		wake_up_all(&pgdat->pfmemalloc_wait);
+ 
+ 	return pgdat_balanced(pgdat, order, classzone_idx);
+ }
diff --git a/queue-3.10/series b/queue-3.10/series
index 7086e5ffff3..33db55ccea6 100644
--- a/queue-3.10/series
+++ b/queue-3.10/series
@@ -39,3 +39,6 @@ btrfs-don-t-delay-inode-ref-updates-during-log-replay.patch
 perf-x86-intel-uncore-make-sure-only-uncore-events-are-collected.patch
 perf-fix-events-installation-during-moving-group.patch
 perf-session-do-not-fail-on-processing-out-of-order-event.patch
+mm-vmscan-prevent-kswapd-livelock-due-to-pfmemalloc-throttled-process-being-killed.patch
+mm-propagate-error-from-stack-expansion-even-for-guard-page.patch
+mm-don-t-count-the-stack-guard-page-towards-rlimit_stack.patch