4.4-stable patches
author    Greg Kroah-Hartman <gregkh@linuxfoundation.org>
          Wed, 26 Oct 2016 09:33:59 +0000 (11:33 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
          Wed, 26 Oct 2016 09:33:59 +0000 (11:33 +0200)
added patches:
mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch
mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch
using-bug_on-as-an-assert-is-_never_-acceptable.patch

queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch [new file with mode: 0644]
queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch [new file with mode: 0644]
queue-4.4/sched-fair-fix-min_vruntime-tracking.patch [deleted file]
queue-4.4/series
queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch [new file with mode: 0644]

diff --git a/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch b/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch
new file mode 100644
index 0000000..8fb84ef
--- /dev/null
+++ b/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch
@@ -0,0 +1,46 @@
+From 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 Mon Sep 17 00:00:00 2001
+From: Johannes Weiner <hannes@cmpxchg.org>
+Date: Tue, 4 Oct 2016 16:58:06 +0200
+Subject: mm: filemap: fix mapping->nrpages double accounting in fuse
+
+From: Johannes Weiner <hannes@cmpxchg.org>
+
+commit 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 upstream.
+
+Commit 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker
+caused by replace_page_cache_page()") switched replace_page_cache() from
+raw radix tree operations to page_cache_tree_insert() but didn't take
+into account that the latter function, unlike the raw radix tree op,
+handles mapping->nrpages.  As a result, that counter is bumped once for
+each page replacement instead of remaining balanced.
+
+The mapping->nrpages counter is used to skip needless radix tree walks
+when invalidating, truncating, or syncing inodes that have no pages,
+and it is reported as a statistic to userspace.  Since the error is
+positive, we'll do more page cache tree walks than necessary, but we
+won't miss a necessary one; we'll also report more buffer pages to
+userspace than there are.  The error is limited to fuse inodes.
+
+Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
+Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Miklos Szeredi <miklos@szeredi.hu>
+Cc: stable@vger.kernel.org
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/filemap.c |    1 -
+ 1 file changed, 1 deletion(-)
+
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -590,7 +590,6 @@ int replace_page_cache_page(struct page
+               __delete_from_page_cache(old, NULL, memcg);
+               error = page_cache_tree_insert(mapping, new, NULL);
+               BUG_ON(error);
+-              mapping->nrpages++;
+               /*
+                * hugetlb pages do not participate in page cache accounting.
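
The one-line deletion above works because page_cache_tree_insert()
already bumps mapping->nrpages itself.  A minimal user-space sketch of
the accounting before and after the fix (simplified stand-in names,
not the kernel code):

  #include <stdio.h>

  struct mapping { long nrpages; };

  /* models page_cache_tree_insert(): accounts the new page itself */
  static int cache_insert(struct mapping *m) { m->nrpages++; return 0; }

  /* models __delete_from_page_cache(): accounts the removal */
  static void cache_delete(struct mapping *m) { m->nrpages--; }

  static void replace_buggy(struct mapping *m)
  {
          cache_delete(m);
          cache_insert(m);
          m->nrpages++;           /* extra bump: net +1 per replacement */
  }

  static void replace_fixed(struct mapping *m)
  {
          cache_delete(m);
          cache_insert(m);        /* net 0, as a replacement should be */
  }

  int main(void)
  {
          struct mapping m = { .nrpages = 1 };

          replace_buggy(&m);
          printf("buggy: nrpages = %ld\n", m.nrpages);    /* prints 2 */

          m.nrpages = 1;
          replace_fixed(&m);
          printf("fixed: nrpages = %ld\n", m.nrpages);    /* prints 1 */
          return 0;
  }
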
diff --git a/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch b/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch
new file mode 100644
index 0000000..91b07f0
--- /dev/null
+++ b/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch
@@ -0,0 +1,226 @@
+From 22f2ac51b6d643666f4db093f13144f773ff3f3a Mon Sep 17 00:00:00 2001
+From: Johannes Weiner <hannes@cmpxchg.org>
+Date: Fri, 30 Sep 2016 15:11:29 -0700
+Subject: mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
+
+From: Johannes Weiner <hannes@cmpxchg.org>
+
+commit 22f2ac51b6d643666f4db093f13144f773ff3f3a upstream.
+
+Antonio reports the following crash when using fuse under memory pressure:
+
+  kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
+  invalid opcode: 0000 [#1] SMP
+  Modules linked in: all of them
+  CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
+  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
+  task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
+  RIP: shadow_lru_isolate+0x181/0x190
+  Call Trace:
+    __list_lru_walk_one.isra.3+0x8f/0x130
+    list_lru_walk_one+0x23/0x30
+    scan_shadow_nodes+0x34/0x50
+    shrink_slab.part.40+0x1ed/0x3d0
+    shrink_zone+0x2ca/0x2e0
+    kswapd+0x51e/0x990
+    kthread+0xd8/0xf0
+    ret_from_fork+0x3f/0x70
+
+which corresponds to the following sanity check in the shadow node
+tracking:
+
+  BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
+
+The workingset code tracks radix tree nodes that contain only shadow
+entries of evicted pages, and this (somewhat obscure)
+line checks whether there are real pages left that would interfere with
+reclaim of the radix tree node under memory pressure.
+
+While discussing ways how fuse might sneak pages into the radix tree
+past the workingset code, Miklos pointed to replace_page_cache_page(),
+and indeed there is a problem there: it properly accounts for the old
+page being removed - __delete_from_page_cache() does that - but then
+does a raw radix_tree_insert(), not accounting for the replacement
+page.  Eventually the page count bits in node->count underflow while
+leaving the node incorrectly linked to the shadow node LRU.
+
+To address this, make sure replace_page_cache_page() uses the tracked
+page insertion code, page_cache_tree_insert().  This fixes the page
+accounting and makes sure page-containing nodes are properly unlinked
+from the shadow node LRU again.
+
+Also, make the sanity checks a bit less obscure by using the helpers for
+checking the number of pages and shadows in a radix tree node.
+
+[mhocko@suse.com: backport for 4.4]
+Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
+Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
+Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
+Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
+Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ include/linux/swap.h |    2 +
+ mm/filemap.c         |   86 +++++++++++++++++++++++++--------------------------
+ mm/workingset.c      |   10 ++---
+ 3 files changed, 49 insertions(+), 49 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -266,6 +266,7 @@ static inline void workingset_node_pages
+ static inline void workingset_node_pages_dec(struct radix_tree_node *node)
+ {
++      VM_BUG_ON(!workingset_node_pages(node));
+       node->count--;
+ }
+@@ -281,6 +282,7 @@ static inline void workingset_node_shado
+ static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
+ {
++      VM_BUG_ON(!workingset_node_shadows(node));
+       node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
+ }
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -109,6 +109,48 @@
+  *   ->tasklist_lock            (memory_failure, collect_procs_ao)
+  */
++static int page_cache_tree_insert(struct address_space *mapping,
++                                struct page *page, void **shadowp)
++{
++      struct radix_tree_node *node;
++      void **slot;
++      int error;
++
++      error = __radix_tree_create(&mapping->page_tree, page->index,
++                                  &node, &slot);
++      if (error)
++              return error;
++      if (*slot) {
++              void *p;
++
++              p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
++              if (!radix_tree_exceptional_entry(p))
++                      return -EEXIST;
++              if (shadowp)
++                      *shadowp = p;
++              mapping->nrshadows--;
++              if (node)
++                      workingset_node_shadows_dec(node);
++      }
++      radix_tree_replace_slot(slot, page);
++      mapping->nrpages++;
++      if (node) {
++              workingset_node_pages_inc(node);
++              /*
++               * Don't track node that contains actual pages.
++               *
++               * Avoid acquiring the list_lru lock if already
++               * untracked.  The list_empty() test is safe as
++               * node->private_list is protected by
++               * mapping->tree_lock.
++               */
++              if (!list_empty(&node->private_list))
++                      list_lru_del(&workingset_shadow_nodes,
++                                   &node->private_list);
++      }
++      return 0;
++}
++
+ static void page_cache_tree_delete(struct address_space *mapping,
+                                  struct page *page, void *shadow)
+ {
+@@ -546,7 +588,7 @@ int replace_page_cache_page(struct page
+               memcg = mem_cgroup_begin_page_stat(old);
+               spin_lock_irqsave(&mapping->tree_lock, flags);
+               __delete_from_page_cache(old, NULL, memcg);
+-              error = radix_tree_insert(&mapping->page_tree, offset, new);
++              error = page_cache_tree_insert(mapping, new, NULL);
+               BUG_ON(error);
+               mapping->nrpages++;
+@@ -570,48 +612,6 @@ int replace_page_cache_page(struct page
+ }
+ EXPORT_SYMBOL_GPL(replace_page_cache_page);
+-static int page_cache_tree_insert(struct address_space *mapping,
+-                                struct page *page, void **shadowp)
+-{
+-      struct radix_tree_node *node;
+-      void **slot;
+-      int error;
+-
+-      error = __radix_tree_create(&mapping->page_tree, page->index,
+-                                  &node, &slot);
+-      if (error)
+-              return error;
+-      if (*slot) {
+-              void *p;
+-
+-              p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+-              if (!radix_tree_exceptional_entry(p))
+-                      return -EEXIST;
+-              if (shadowp)
+-                      *shadowp = p;
+-              mapping->nrshadows--;
+-              if (node)
+-                      workingset_node_shadows_dec(node);
+-      }
+-      radix_tree_replace_slot(slot, page);
+-      mapping->nrpages++;
+-      if (node) {
+-              workingset_node_pages_inc(node);
+-              /*
+-               * Don't track node that contains actual pages.
+-               *
+-               * Avoid acquiring the list_lru lock if already
+-               * untracked.  The list_empty() test is safe as
+-               * node->private_list is protected by
+-               * mapping->tree_lock.
+-               */
+-              if (!list_empty(&node->private_list))
+-                      list_lru_del(&workingset_shadow_nodes,
+-                                   &node->private_list);
+-      }
+-      return 0;
+-}
+-
+ static int __add_to_page_cache_locked(struct page *page,
+                                     struct address_space *mapping,
+                                     pgoff_t offset, gfp_t gfp_mask,
+--- a/mm/workingset.c
++++ b/mm/workingset.c
+@@ -341,21 +341,19 @@ static enum lru_status shadow_lru_isolat
+        * no pages, so we expect to be able to remove them all and
+        * delete and free the empty node afterwards.
+        */
+-
+-      BUG_ON(!node->count);
+-      BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
++      BUG_ON(!workingset_node_shadows(node));
++      BUG_ON(workingset_node_pages(node));
+       for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
+               if (node->slots[i]) {
+                       BUG_ON(!radix_tree_exceptional_entry(node->slots[i]));
+                       node->slots[i] = NULL;
+-                      BUG_ON(node->count < (1U << RADIX_TREE_COUNT_SHIFT));
+-                      node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
++                      workingset_node_shadows_dec(node);
+                       BUG_ON(!mapping->nrshadows);
+                       mapping->nrshadows--;
+               }
+       }
+-      BUG_ON(node->count);
++      BUG_ON(workingset_node_shadows(node));
+       inc_zone_state(page_zone(virt_to_page(node)), WORKINGSET_NODERECLAIM);
+       if (!__radix_tree_delete_node(&mapping->page_tree, node))
+               BUG();
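
For reference, node->count packs two counters into a single word: real
pages in the low bits and shadow entries in multiples of
1U << RADIX_TREE_COUNT_SHIFT, which is what the workingset_node_*()
helpers wrap.  A user-space sketch of that layout and of the underflow
the patch prevents (the shift value here is illustrative, not the
kernel constant):

  #include <stdio.h>

  /* illustrative: the kernel derives this from RADIX_TREE_MAP_SHIFT */
  #define COUNT_SHIFT 7
  #define COUNT_MASK  ((1U << COUNT_SHIFT) - 1)

  static unsigned int node_pages(unsigned int count)
  { return count & COUNT_MASK; }

  static unsigned int node_shadows(unsigned int count)
  { return count >> COUNT_SHIFT; }

  int main(void)
  {
          unsigned int count = 0;

          count += 1U << COUNT_SHIFT;     /* one shadow entry tracked */
          printf("pages=%u shadows=%u\n",
                 node_pages(count), node_shadows(count));

          /*
           * The pre-fix bug: a raw insert leaves the page bits at zero,
           * so the page's later deletion decrements bits that were never
           * incremented, borrowing from the shadow bits.  This is the
           * underflow the shrinker's BUG_ON caught.
           */
          count -= 1;
          printf("pages=%u shadows=%u\n",
                 node_pages(count), node_shadows(count));
          return 0;
  }
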
diff --git a/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch b/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch
deleted file mode 100644
index 9d6e55c..0000000
--- a/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch
+++ /dev/null
@@ -1,104 +0,0 @@
-From b60205c7c558330e4e2b5df498355ec959457358 Mon Sep 17 00:00:00 2001
-From: Peter Zijlstra <peterz@infradead.org>
-Date: Tue, 20 Sep 2016 21:58:12 +0200
-Subject: sched/fair: Fix min_vruntime tracking
-
-From: Peter Zijlstra <peterz@infradead.org>
-
-commit b60205c7c558330e4e2b5df498355ec959457358 upstream.
-
-While going through enqueue/dequeue to review the movement of
-set_curr_task(), I noticed that the (2nd) update_min_vruntime() call in
-dequeue_entity() is suspect.
-
-It turns out, it's actually wrong because it will consider
-cfs_rq->curr, which could be the entry we just normalized.  This mixes
-different vruntime forms and leads to failure.
-
-The purpose of the second update_min_vruntime() is to move
-min_vruntime forward if the entity we just removed is the one that was
-holding it back; _except_ for the DEQUEUE_SAVE case, because then we
-know it's a temporary removal and it will come back.
-
-However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr
-will still be set (and per the above, can be tranformed into a
-different unit), so update_min_vruntime() should also consider
-curr->on_rq. This also fixes another corner case where the enqueue
-(which also does update_curr()->update_min_vruntime()) happens on the
-rq->lock break in schedule(), between dequeue and put_prev_task.
-
-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-Cc: Linus Torvalds <torvalds@linux-foundation.org>
-Cc: Mike Galbraith <efault@gmx.de>
-Cc: Peter Zijlstra <peterz@infradead.org>
-Cc: Thomas Gleixner <tglx@linutronix.de>
-Cc: linux-kernel@vger.kernel.org
-Fixes: 1e876231785d ("sched: Fix ->min_vruntime calculation in dequeue_entity()")
-Signed-off-by: Ingo Molnar <mingo@kernel.org>
-Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
----
- kernel/sched/fair.c |   29 ++++++++++++++++++++++-------
- 1 file changed, 22 insertions(+), 7 deletions(-)
-
---- a/kernel/sched/fair.c
-+++ b/kernel/sched/fair.c
-@@ -456,17 +456,23 @@ static inline int entity_before(struct s
- static void update_min_vruntime(struct cfs_rq *cfs_rq)
- {
-+      struct sched_entity *curr = cfs_rq->curr;
-+
-       u64 vruntime = cfs_rq->min_vruntime;
--      if (cfs_rq->curr)
--              vruntime = cfs_rq->curr->vruntime;
-+      if (curr) {
-+              if (curr->on_rq)
-+                      vruntime = curr->vruntime;
-+              else
-+                      curr = NULL;
-+      }
-       if (cfs_rq->rb_leftmost) {
-               struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
-                                                  struct sched_entity,
-                                                  run_node);
--              if (!cfs_rq->curr)
-+              if (!curr)
-                       vruntime = se->vruntime;
-               else
-                       vruntime = min_vruntime(vruntime, se->vruntime);
-@@ -3139,9 +3145,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
-       account_entity_dequeue(cfs_rq, se);
-       /*
--       * Normalize the entity after updating the min_vruntime because the
--       * update can refer to the ->curr item and we need to reflect this
--       * movement in our normalized position.
-+       * Normalize after update_curr(); which will also have moved
-+       * min_vruntime if @se is the one holding it back. But before doing
-+       * update_min_vruntime() again, which will discount @se's position and
-+       * can move min_vruntime forward still more.
-        */
-       if (!(flags & DEQUEUE_SLEEP))
-               se->vruntime -= cfs_rq->min_vruntime;
-@@ -3149,8 +3156,16 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
-       /* return excess runtime on last dequeue */
-       return_cfs_rq_runtime(cfs_rq);
--      update_min_vruntime(cfs_rq);
-       update_cfs_shares(cfs_rq);
-+
-+      /*
-+       * Now advance min_vruntime if @se was the entity holding it back,
-+       * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
-+       * put back on, and if we advance min_vruntime, we'll be placed back
-+       * further than we started -- ie. we'll be penalized.
-+       */
-+      if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
-+              update_min_vruntime(cfs_rq);
- }
- /*
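
A user-space sketch of the fixed update_min_vruntime() logic for
reference (simplified stand-in types; the real code keeps entities in
an rbtree in kernel/sched/fair.c):

  #include <stdio.h>

  typedef unsigned long long u64;

  struct entity { u64 vruntime; int on_rq; };
  struct cfs_rq {
          u64 min_vruntime;
          struct entity *curr;
          struct entity *leftmost;        /* stand-in for the rbtree */
  };

  /* signed compares keep the result correct across u64 wraparound */
  static u64 min_vruntime(u64 a, u64 b)
  { return ((long long)(a - b) < 0) ? a : b; }

  static u64 max_vruntime(u64 a, u64 b)
  { return ((long long)(a - b) > 0) ? a : b; }

  static void update_min_vruntime(struct cfs_rq *cfs_rq)
  {
          struct entity *curr = cfs_rq->curr;
          u64 vruntime = cfs_rq->min_vruntime;

          if (curr) {
                  if (curr->on_rq)
                          vruntime = curr->vruntime;
                  else
                          curr = NULL;    /* dequeued: already normalized */
          }
          if (cfs_rq->leftmost) {
                  u64 se = cfs_rq->leftmost->vruntime;
                  vruntime = curr ? min_vruntime(vruntime, se) : se;
          }
          /* ensure min_vruntime never goes backward */
          cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
  }

  int main(void)
  {
          struct entity a = { .vruntime = 100, .on_rq = 0 };  /* normalized */
          struct entity b = { .vruntime = 250, .on_rq = 1 };
          struct cfs_rq rq = { .min_vruntime = 200, .curr = &a, .leftmost = &b };

          update_min_vruntime(&rq);
          /* prints 250 with the fix; the old code stayed at 200,
           * held back by a's normalized vruntime */
          printf("min_vruntime = %llu\n", rq.min_vruntime);
          return 0;
  }
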
diff --git a/queue-4.4/series b/queue-4.4/series
index a74f25075b62f761255958fd2cb3173caed9e1a5..ccc7762f387bea5398d0ce018690c3c7e523468f 100644
--- a/queue-4.4/series
+++ b/queue-4.4/series
@@ -82,7 +82,6 @@ blkcg-unlock-blkcg_pol_mutex-only-once-when-cpd-null.patch
 x86-e820-don-t-merge-consecutive-e820_pram-ranges.patch
 kvm-x86-memset-whole-irq_eoi.patch
 pinctrl-intel-only-restore-pins-that-are-used-by-the-driver.patch
-sched-fair-fix-min_vruntime-tracking.patch
 irqchip-gicv3-handle-loop-timeout-proper.patch
 sd-fix-rw_max-for-devices-that-report-an-optimal-xfer-size.patch
 hpsa-correct-skipping-masked-peripherals.patch
@@ -90,3 +89,6 @@ pkcs-7-don-t-require-spcspopusinfo-in-authenticode-pkcs7-signatures.patch
 bnx2x-prevent-false-warning-for-lack-of-fc-npiv.patch
 net-mlx4_core-allow-resetting-vf-admin-mac-to-zero.patch
 acpi-nfit-check-for-the-correct-event-code-in-notifications.patch
+mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch
+mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch
+using-bug_on-as-an-assert-is-_never_-acceptable.patch
diff --git a/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch b/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch
new file mode 100644
index 0000000..5d687c0
--- /dev/null
+++ b/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch
@@ -0,0 +1,53 @@
+From 21f54ddae449f4bdd9f1498124901d67202243d9 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Mon, 3 Oct 2016 21:03:48 -0700
+Subject: Using BUG_ON() as an assert() is _never_ acceptable
+
+From: Linus Torvalds <torvalds@linux-foundation.org>
+
+commit 21f54ddae449f4bdd9f1498124901d67202243d9 upstream.
+
+That just generally kills the machine and only makes debugging much
+harder, since the traces may be long gone.
+
+Debugging by assert() is a disease.  Don't do it.  If you can continue,
+you're much better off doing so with a live machine where you have a
+much higher chance that the report actually makes it to the system logs,
+rather than ending up with a machine that is just completely dead.
+
+The only valid situation for BUG_ON() is when continuing is not an
+option, because there is massive corruption.  But if you are just
+verifying that something is true, you warn about your broken assumptions
+(preferably just once), and limp on.
+
+Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Miklos Szeredi <miklos@szeredi.hu>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/swap.h |    4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -266,7 +266,7 @@ static inline void workingset_node_pages
+ static inline void workingset_node_pages_dec(struct radix_tree_node *node)
+ {
+-      VM_BUG_ON(!workingset_node_pages(node));
++      VM_WARN_ON_ONCE(!workingset_node_pages(node));
+       node->count--;
+ }
+@@ -282,7 +282,7 @@ static inline void workingset_node_shado
+ static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
+ {
+-      VM_BUG_ON(!workingset_node_shadows(node));
++      VM_WARN_ON_ONCE(!workingset_node_shadows(node));
+       node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
+ }
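
A user-space sketch of the warn-once behaviour the patch switches to;
the MY_* macros are hypothetical stand-ins, not the kernel's
WARN_ON_ONCE() implementation:

  #include <stdio.h>
  #include <stdlib.h>

  /* BUG_ON semantics: kill the process on the spot */
  #define MY_BUG_ON(cond) do { if (cond) abort(); } while (0)

  /* WARN_ON_ONCE semantics: report the broken assumption once, limp on */
  #define MY_WARN_ON_ONCE(cond) do {                              \
          static int warned;                                      \
          if ((cond) && !warned) {                                \
                  warned = 1;                                     \
                  fprintf(stderr, "warning: %s at %s:%d\n",       \
                          #cond, __FILE__, __LINE__);             \
          }                                                       \
  } while (0)

  int main(void)
  {
          for (int i = 0; i < 3; i++)
                  MY_WARN_ON_ONCE(i >= 0);        /* prints only once */
          printf("still alive, logs intact\n");
          return 0;
  }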