--- /dev/null
+From 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 Mon Sep 17 00:00:00 2001
+From: Johannes Weiner <hannes@cmpxchg.org>
+Date: Tue, 4 Oct 2016 16:58:06 +0200
+Subject: mm: filemap: fix mapping->nrpages double accounting in fuse
+
+From: Johannes Weiner <hannes@cmpxchg.org>
+
+commit 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 upstream.
+
+Commit 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker
+caused by replace_page_cache_page()") switched replace_page_cache() from
+raw radix tree operations to page_cache_tree_insert() but didn't take
+into account that the latter function, unlike the raw radix tree op,
+handles mapping->nrpages. As a result, that counter is bumped for each
+page replacement rather than being balanced back out.
+
+The mapping->nrpages counter is used to skip needless radix tree walks
+when invalidating, truncating, or syncing inodes without pages, and for
+statistics reported to userspace. Since the error is positive, we'll do more
+page cache tree walks than necessary; we won't miss a necessary one.
+And we'll report more buffer pages to userspace than there are. The
+error is limited to fuse inodes.
+
+Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
+Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Miklos Szeredi <miklos@szeredi.hu>
+Cc: stable@vger.kernel.org
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/filemap.c | 1 -
+ 1 file changed, 1 deletion(-)
+
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -590,7 +590,6 @@ int replace_page_cache_page(struct page
+ __delete_from_page_cache(old, NULL, memcg);
+ error = page_cache_tree_insert(mapping, new, NULL);
+ BUG_ON(error);
+- mapping->nrpages++;
+
+ /*
+ * hugetlb pages do not participate in page cache accounting.
--- /dev/null
+From 22f2ac51b6d643666f4db093f13144f773ff3f3a Mon Sep 17 00:00:00 2001
+From: Johannes Weiner <hannes@cmpxchg.org>
+Date: Fri, 30 Sep 2016 15:11:29 -0700
+Subject: mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
+
+From: Johannes Weiner <hannes@cmpxchg.org>
+
+commit 22f2ac51b6d643666f4db093f13144f773ff3f3a upstream.
+
+Antonio reports the following crash when using fuse under memory pressure:
+
+ kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
+ invalid opcode: 0000 [#1] SMP
+ Modules linked in: all of them
+ CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
+ Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
+ task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
+ RIP: shadow_lru_isolate+0x181/0x190
+ Call Trace:
+ __list_lru_walk_one.isra.3+0x8f/0x130
+ list_lru_walk_one+0x23/0x30
+ scan_shadow_nodes+0x34/0x50
+ shrink_slab.part.40+0x1ed/0x3d0
+ shrink_zone+0x2ca/0x2e0
+ kswapd+0x51e/0x990
+ kthread+0xd8/0xf0
+ ret_from_fork+0x3f/0x70
+
+which corresponds to the following sanity check in the shadow node
+tracking:
+
+ BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
+
+The workingset code tracks radix tree nodes that exclusively contain
+shadow entries of evicted pages in them, and this (somewhat obscure)
+line checks whether there are real pages left that would interfere with
+reclaim of the radix tree node under memory pressure.
+
+While discussing ways how fuse might sneak pages into the radix tree
+past the workingset code, Miklos pointed to replace_page_cache_page(),
+and indeed there is a problem there: it properly accounts for the old
+page being removed - __delete_from_page_cache() does that - but then
+does a raw radix_tree_insert(), not accounting for the replacement
+page. Eventually the page count bits in node->count underflow while
+leaving the node incorrectly linked to the shadow node LRU.
+
+To address this, make sure replace_page_cache_page() uses the tracked
+page insertion code, page_cache_tree_insert(). This fixes the page
+accounting and makes sure page-containing nodes are properly unlinked
+from the shadow node LRU again.
+
+Also, make the sanity checks a bit less obscure by using the helpers for
+checking the number of pages and shadows in a radix tree node.
+
+[mhocko@suse.com: backport for 4.4]
+Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
+Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
+Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
+Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
+Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ include/linux/swap.h | 2 +
+ mm/filemap.c | 86 +++++++++++++++++++++++++--------------------------
+ mm/workingset.c | 10 ++---
+ 3 files changed, 49 insertions(+), 49 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -266,6 +266,7 @@ static inline void workingset_node_pages
+
+ static inline void workingset_node_pages_dec(struct radix_tree_node *node)
+ {
++ VM_BUG_ON(!workingset_node_pages(node));
+ node->count--;
+ }
+
+@@ -281,6 +282,7 @@ static inline void workingset_node_shado
+
+ static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
+ {
++ VM_BUG_ON(!workingset_node_shadows(node));
+ node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
+ }
+
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -109,6 +109,48 @@
+ * ->tasklist_lock (memory_failure, collect_procs_ao)
+ */
+
++static int page_cache_tree_insert(struct address_space *mapping,
++ struct page *page, void **shadowp)
++{
++ struct radix_tree_node *node;
++ void **slot;
++ int error;
++
++ error = __radix_tree_create(&mapping->page_tree, page->index,
++ &node, &slot);
++ if (error)
++ return error;
++ if (*slot) {
++ void *p;
++
++ p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
++ if (!radix_tree_exceptional_entry(p))
++ return -EEXIST;
++ if (shadowp)
++ *shadowp = p;
++ mapping->nrshadows--;
++ if (node)
++ workingset_node_shadows_dec(node);
++ }
++ radix_tree_replace_slot(slot, page);
++ mapping->nrpages++;
++ if (node) {
++ workingset_node_pages_inc(node);
++ /*
++ * Don't track node that contains actual pages.
++ *
++ * Avoid acquiring the list_lru lock if already
++ * untracked. The list_empty() test is safe as
++ * node->private_list is protected by
++ * mapping->tree_lock.
++ */
++ if (!list_empty(&node->private_list))
++ list_lru_del(&workingset_shadow_nodes,
++ &node->private_list);
++ }
++ return 0;
++}
++
+ static void page_cache_tree_delete(struct address_space *mapping,
+ struct page *page, void *shadow)
+ {
+@@ -546,7 +588,7 @@ int replace_page_cache_page(struct page
+ memcg = mem_cgroup_begin_page_stat(old);
+ spin_lock_irqsave(&mapping->tree_lock, flags);
+ __delete_from_page_cache(old, NULL, memcg);
+- error = radix_tree_insert(&mapping->page_tree, offset, new);
++ error = page_cache_tree_insert(mapping, new, NULL);
+ BUG_ON(error);
+ mapping->nrpages++;
+
+@@ -570,48 +612,6 @@ int replace_page_cache_page(struct page
+ }
+ EXPORT_SYMBOL_GPL(replace_page_cache_page);
+
+-static int page_cache_tree_insert(struct address_space *mapping,
+- struct page *page, void **shadowp)
+-{
+- struct radix_tree_node *node;
+- void **slot;
+- int error;
+-
+- error = __radix_tree_create(&mapping->page_tree, page->index,
+- &node, &slot);
+- if (error)
+- return error;
+- if (*slot) {
+- void *p;
+-
+- p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+- if (!radix_tree_exceptional_entry(p))
+- return -EEXIST;
+- if (shadowp)
+- *shadowp = p;
+- mapping->nrshadows--;
+- if (node)
+- workingset_node_shadows_dec(node);
+- }
+- radix_tree_replace_slot(slot, page);
+- mapping->nrpages++;
+- if (node) {
+- workingset_node_pages_inc(node);
+- /*
+- * Don't track node that contains actual pages.
+- *
+- * Avoid acquiring the list_lru lock if already
+- * untracked. The list_empty() test is safe as
+- * node->private_list is protected by
+- * mapping->tree_lock.
+- */
+- if (!list_empty(&node->private_list))
+- list_lru_del(&workingset_shadow_nodes,
+- &node->private_list);
+- }
+- return 0;
+-}
+-
+ static int __add_to_page_cache_locked(struct page *page,
+ struct address_space *mapping,
+ pgoff_t offset, gfp_t gfp_mask,
+--- a/mm/workingset.c
++++ b/mm/workingset.c
+@@ -341,21 +341,19 @@ static enum lru_status shadow_lru_isolat
+ * no pages, so we expect to be able to remove them all and
+ * delete and free the empty node afterwards.
+ */
+-
+- BUG_ON(!node->count);
+- BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
++ BUG_ON(!workingset_node_shadows(node));
++ BUG_ON(workingset_node_pages(node));
+
+ for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
+ if (node->slots[i]) {
+ BUG_ON(!radix_tree_exceptional_entry(node->slots[i]));
+ node->slots[i] = NULL;
+- BUG_ON(node->count < (1U << RADIX_TREE_COUNT_SHIFT));
+- node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
++ workingset_node_shadows_dec(node);
+ BUG_ON(!mapping->nrshadows);
+ mapping->nrshadows--;
+ }
+ }
+- BUG_ON(node->count);
++ BUG_ON(workingset_node_shadows(node));
+ inc_zone_state(page_zone(virt_to_page(node)), WORKINGSET_NODERECLAIM);
+ if (!__radix_tree_delete_node(&mapping->page_tree, node))
+ BUG();
+++ /dev/null
-From b60205c7c558330e4e2b5df498355ec959457358 Mon Sep 17 00:00:00 2001
-From: Peter Zijlstra <peterz@infradead.org>
-Date: Tue, 20 Sep 2016 21:58:12 +0200
-Subject: sched/fair: Fix min_vruntime tracking
-
-From: Peter Zijlstra <peterz@infradead.org>
-
-commit b60205c7c558330e4e2b5df498355ec959457358 upstream.
-
-While going through enqueue/dequeue to review the movement of
-set_curr_task() I noticed that the (2nd) update_min_vruntime() call in
-dequeue_entity() is suspect.
-
-It turns out, it's actually wrong because it will consider
-cfs_rq->curr, which could be the entry we just normalized. This mixes
-different vruntime forms and leads to failure.
-
-The purpose of the second update_min_vruntime() is to move
-min_vruntime forward if the entity we just removed is the one that was
-holding it back; _except_ for the DEQUEUE_SAVE case, because then we
-know it's a temporary removal and it will come back.
-
-However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr
-will still be set (and per the above, can be transformed into a
-different unit), so update_min_vruntime() should also consider
-curr->on_rq. This also fixes another corner case where the enqueue
-(which also does update_curr()->update_min_vruntime()) happens on the
-rq->lock break in schedule(), between dequeue and put_prev_task.
-
-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-Cc: Linus Torvalds <torvalds@linux-foundation.org>
-Cc: Mike Galbraith <efault@gmx.de>
-Cc: Peter Zijlstra <peterz@infradead.org>
-Cc: Thomas Gleixner <tglx@linutronix.de>
-Cc: linux-kernel@vger.kernel.org
-Fixes: 1e876231785d ("sched: Fix ->min_vruntime calculation in dequeue_entity()")
-Signed-off-by: Ingo Molnar <mingo@kernel.org>
-Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
----
- kernel/sched/fair.c | 29 ++++++++++++++++++++++-------
- 1 file changed, 22 insertions(+), 7 deletions(-)
-
---- a/kernel/sched/fair.c
-+++ b/kernel/sched/fair.c
-@@ -456,17 +456,23 @@ static inline int entity_before(struct s
-
- static void update_min_vruntime(struct cfs_rq *cfs_rq)
- {
-+ struct sched_entity *curr = cfs_rq->curr;
-+
- u64 vruntime = cfs_rq->min_vruntime;
-
-- if (cfs_rq->curr)
-- vruntime = cfs_rq->curr->vruntime;
-+ if (curr) {
-+ if (curr->on_rq)
-+ vruntime = curr->vruntime;
-+ else
-+ curr = NULL;
-+ }
-
- if (cfs_rq->rb_leftmost) {
- struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
- struct sched_entity,
- run_node);
-
-- if (!cfs_rq->curr)
-+ if (!curr)
- vruntime = se->vruntime;
- else
- vruntime = min_vruntime(vruntime, se->vruntime);
-@@ -3139,9 +3145,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
- account_entity_dequeue(cfs_rq, se);
-
- /*
-- * Normalize the entity after updating the min_vruntime because the
-- * update can refer to the ->curr item and we need to reflect this
-- * movement in our normalized position.
-+ * Normalize after update_curr(); which will also have moved
-+ * min_vruntime if @se is the one holding it back. But before doing
-+ * update_min_vruntime() again, which will discount @se's position and
-+ * can move min_vruntime forward still more.
- */
- if (!(flags & DEQUEUE_SLEEP))
- se->vruntime -= cfs_rq->min_vruntime;
-@@ -3149,8 +3156,16 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
- /* return excess runtime on last dequeue */
- return_cfs_rq_runtime(cfs_rq);
-
-- update_min_vruntime(cfs_rq);
- update_cfs_shares(cfs_rq);
-+
-+ /*
-+ * Now advance min_vruntime if @se was the entity holding it back,
-+ * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
-+ * put back on, and if we advance min_vruntime, we'll be placed back
-+ * further than we started -- ie. we'll be penalized.
-+ */
-+ if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
-+ update_min_vruntime(cfs_rq);
- }
-
- /*
x86-e820-don-t-merge-consecutive-e820_pram-ranges.patch
kvm-x86-memset-whole-irq_eoi.patch
pinctrl-intel-only-restore-pins-that-are-used-by-the-driver.patch
-sched-fair-fix-min_vruntime-tracking.patch
irqchip-gicv3-handle-loop-timeout-proper.patch
sd-fix-rw_max-for-devices-that-report-an-optimal-xfer-size.patch
hpsa-correct-skipping-masked-peripherals.patch
bnx2x-prevent-false-warning-for-lack-of-fc-npiv.patch
net-mlx4_core-allow-resetting-vf-admin-mac-to-zero.patch
acpi-nfit-check-for-the-correct-event-code-in-notifications.patch
+mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch
+mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch
+using-bug_on-as-an-assert-is-_never_-acceptable.patch
--- /dev/null
+From 21f54ddae449f4bdd9f1498124901d67202243d9 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Mon, 3 Oct 2016 21:03:48 -0700
+Subject: Using BUG_ON() as an assert() is _never_ acceptable
+
+From: Linus Torvalds <torvalds@linux-foundation.org>
+
+commit 21f54ddae449f4bdd9f1498124901d67202243d9 upstream.
+
+That just generally kills the machine, and only makes debugging much
+harder, since the traces may long be gone.
+
+Debugging by assert() is a disease. Don't do it. If you can continue,
+you're much better off doing so with a live machine where you have a
+much higher chance that the report actually makes it to the system logs,
+rather than result in a machine that is just completely dead.
+
+The only valid situation for BUG_ON() is when continuing is not an
+option, because there is massive corruption. But if you are just
+verifying that something is true, you warn about your broken assumptions
+(preferably just once), and limp on.
+
+Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Miklos Szeredi <miklos@szeredi.hu>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/swap.h | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -266,7 +266,7 @@ static inline void workingset_node_pages
+
+ static inline void workingset_node_pages_dec(struct radix_tree_node *node)
+ {
+- VM_BUG_ON(!workingset_node_pages(node));
++ VM_WARN_ON_ONCE(!workingset_node_pages(node));
+ node->count--;
+ }
+
+@@ -282,7 +282,7 @@ static inline void workingset_node_shado
+
+ static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
+ {
+- VM_BUG_ON(!workingset_node_shadows(node));
++ VM_WARN_ON_ONCE(!workingset_node_shadows(node));
+ node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
+ }
+