From: Greg Kroah-Hartman Date: Wed, 26 Oct 2016 09:33:59 +0000 (+0200) Subject: 4.4-stable patches X-Git-Tag: v4.8.5~7 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=b610dc90528f36440889562c1c2dc91f634ca234;p=thirdparty%2Fkernel%2Fstable-queue.git 4.4-stable patches added patches: mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch using-bug_on-as-an-assert-is-_never_-acceptable.patch --- diff --git a/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch b/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch new file mode 100644 index 00000000000..8fb84ef5f2c --- /dev/null +++ b/queue-4.4/mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch @@ -0,0 +1,46 @@ +From 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 Mon Sep 17 00:00:00 2001 +From: Johannes Weiner +Date: Tue, 4 Oct 2016 16:58:06 +0200 +Subject: mm: filemap: fix mapping->nrpages double accounting in fuse + +From: Johannes Weiner + +commit 3ddf40e8c31964b744ff10abb48c8e36a83ec6e7 upstream. + +Commit 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker +caused by replace_page_cache_page()") switched replace_page_cache() from +raw radix tree operations to page_cache_tree_insert() but didn't take +into account that the latter function, unlike the raw radix tree op, +handles mapping->nrpages. As a result, that counter is bumped for each +page replacement rather than balanced out even. + +The mapping->nrpages counter is used to skip needless radix tree walks +when invalidating, truncating, syncing inodes without pages, as well as +statistics for userspace. Since the error is positive, we'll do more +page cache tree walks than necessary; we won't miss a necessary one. +And we'll report more buffer pages to userspace than there are. The +error is limited to fuse inodes. 
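The double-accounting described above can be sketched in a few lines. This is a hypothetical userspace model, not the kernel code: `struct mapping`, `delete_page()` and `tree_insert()` stand in for the real address_space, `__delete_from_page_cache()` and `page_cache_tree_insert()`, assuming only that the helper already bumps `nrpages` itself.

```c
#include <assert.h>

/* Minimal model of the bug: both the insert helper and its caller bump
 * nrpages, so each replacement nets +1 instead of balancing to zero. */
struct mapping { long nrpages; };

static void delete_page(struct mapping *m)  { m->nrpages--; }
static int  tree_insert(struct mapping *m)  { m->nrpages++; return 0; }

/* Buggy 4.4 backport: the caller re-increments after tree_insert(),
 * which already counted the new page. */
static void replace_page_buggy(struct mapping *m)
{
	delete_page(m);
	tree_insert(m);
	m->nrpages++;		/* double accounting */
}

/* Fixed version: the helper is the only place that counts. */
static void replace_page_fixed(struct mapping *m)
{
	delete_page(m);
	tree_insert(m);
}
```

As the commit message notes, the error is only ever positive: the fixed path leaves the counter unchanged across a replacement, while the buggy path inflates it by one per call.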
+ +Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") +Signed-off-by: Johannes Weiner +Cc: Andrew Morton +Cc: Miklos Szeredi +Cc: stable@vger.kernel.org +Signed-off-by: Linus Torvalds +Signed-off-by: Michal Hocko +Signed-off-by: Greg Kroah-Hartman + +--- + mm/filemap.c | 1 - + 1 file changed, 1 deletion(-) + +--- a/mm/filemap.c ++++ b/mm/filemap.c +@@ -590,7 +590,6 @@ int replace_page_cache_page(struct page + __delete_from_page_cache(old, NULL, memcg); + error = page_cache_tree_insert(mapping, new, NULL); + BUG_ON(error); +- mapping->nrpages++; + + /* + * hugetlb pages do not participate in page cache accounting. diff --git a/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch b/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch new file mode 100644 index 00000000000..91b07f0759e --- /dev/null +++ b/queue-4.4/mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch @@ -0,0 +1,226 @@ +From 22f2ac51b6d643666f4db093f13144f773ff3f3a Mon Sep 17 00:00:00 2001 +From: Johannes Weiner +Date: Fri, 30 Sep 2016 15:11:29 -0700 +Subject: mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() + +From: Johannes Weiner + +commit 22f2ac51b6d643666f4db093f13144f773ff3f3a upstream. + +Antonio reports the following crash when using fuse under memory pressure: + + kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346! 
+ invalid opcode: 0000 [#1] SMP + Modules linked in: all of them + CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu + Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 + task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000 + RIP: shadow_lru_isolate+0x181/0x190 + Call Trace: + __list_lru_walk_one.isra.3+0x8f/0x130 + list_lru_walk_one+0x23/0x30 + scan_shadow_nodes+0x34/0x50 + shrink_slab.part.40+0x1ed/0x3d0 + shrink_zone+0x2ca/0x2e0 + kswapd+0x51e/0x990 + kthread+0xd8/0xf0 + ret_from_fork+0x3f/0x70 + +which corresponds to the following sanity check in the shadow node +tracking: + + BUG_ON(node->count & RADIX_TREE_COUNT_MASK); + +The workingset code tracks radix tree nodes that exclusively contain +shadow entries of evicted pages in them, and this (somewhat obscure) +line checks whether there are real pages left that would interfere with +reclaim of the radix tree node under memory pressure. + +While discussing ways how fuse might sneak pages into the radix tree +past the workingset code, Miklos pointed to replace_page_cache_page(), +and indeed there is a problem there: it properly accounts for the old +page being removed - __delete_from_page_cache() does that - but then +does a raw raw radix_tree_insert(), not accounting for the replacement +page. Eventually the page count bits in node->count underflow while +leaving the node incorrectly linked to the shadow node LRU. + +To address this, make sure replace_page_cache_page() uses the tracked +page insertion code, page_cache_tree_insert(). This fixes the page +accounting and makes sure page-containing nodes are properly unlinked +from the shadow node LRU again. + +Also, make the sanity checks a bit less obscure by using the helpers for +checking the number of pages and shadows in a radix tree node. 
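The mechanism behind the crash can be sketched with the packed `node->count` layout the sanity check relies on. This is only a model under stated assumptions: `COUNT_SHIFT`/`COUNT_MASK` mirror the kernel's `RADIX_TREE_COUNT_SHIFT`/`RADIX_TREE_COUNT_MASK` naming, with the low bits counting resident pages and the high bits counting shadow entries.

```c
#include <assert.h>

/* Hypothetical model of node->count: low COUNT_SHIFT bits count real
 * pages, high bits count shadow entries.  Sketch only, not the kernel. */
#define COUNT_SHIFT 6U
#define COUNT_MASK  ((1U << COUNT_SHIFT) - 1)

static unsigned int node_pages(unsigned int count)
{
	return count & COUNT_MASK;
}

static unsigned int node_shadows(unsigned int count)
{
	return count >> COUNT_SHIFT;
}

/* The buggy path: the raw radix_tree_insert() adds the replacement page
 * without bumping the count, so when that page is later deleted through
 * the properly-accounted path, the page bits underflow and borrow from
 * the shadow bits. */
static unsigned int buggy_replace_then_delete(unsigned int count)
{
	count -= 1;	/* __delete_from_page_cache(old): accounted   */
			/* radix_tree_insert(new): not accounted (bug) */
	count -= 1;	/* later deletion of new: page bits underflow  */
	return count;
}
```

After the underflow, `node_pages()` is nonzero for a node the shadow LRU still tracks, which is exactly the state `BUG_ON(node->count & RADIX_TREE_COUNT_MASK)` in `shadow_lru_isolate()` trips over.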
+ +[mhocko@suse.com: backport for 4.4] +Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") +Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org +Signed-off-by: Johannes Weiner +Reported-by: Antonio SJ Musumeci +Debugged-by: Miklos Szeredi +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Michal Hocko +Signed-off-by: Greg Kroah-Hartman + +--- + include/linux/swap.h | 2 + + mm/filemap.c | 86 +++++++++++++++++++++++++-------------------------- + mm/workingset.c | 10 ++--- + 3 files changed, 49 insertions(+), 49 deletions(-) + +--- a/include/linux/swap.h ++++ b/include/linux/swap.h +@@ -266,6 +266,7 @@ static inline void workingset_node_pages + + static inline void workingset_node_pages_dec(struct radix_tree_node *node) + { ++ VM_BUG_ON(!workingset_node_pages(node)); + node->count--; + } + +@@ -281,6 +282,7 @@ static inline void workingset_node_shado + + static inline void workingset_node_shadows_dec(struct radix_tree_node *node) + { ++ VM_BUG_ON(!workingset_node_shadows(node)); + node->count -= 1U << RADIX_TREE_COUNT_SHIFT; + } + +--- a/mm/filemap.c ++++ b/mm/filemap.c +@@ -109,6 +109,48 @@ + * ->tasklist_lock (memory_failure, collect_procs_ao) + */ + ++static int page_cache_tree_insert(struct address_space *mapping, ++ struct page *page, void **shadowp) ++{ ++ struct radix_tree_node *node; ++ void **slot; ++ int error; ++ ++ error = __radix_tree_create(&mapping->page_tree, page->index, ++ &node, &slot); ++ if (error) ++ return error; ++ if (*slot) { ++ void *p; ++ ++ p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock); ++ if (!radix_tree_exceptional_entry(p)) ++ return -EEXIST; ++ if (shadowp) ++ *shadowp = p; ++ mapping->nrshadows--; ++ if (node) ++ workingset_node_shadows_dec(node); ++ } ++ radix_tree_replace_slot(slot, page); ++ mapping->nrpages++; ++ if (node) { ++ workingset_node_pages_inc(node); ++ /* ++ * Don't track node that contains actual pages. 
++ * ++ * Avoid acquiring the list_lru lock if already ++ * untracked. The list_empty() test is safe as ++ * node->private_list is protected by ++ * mapping->tree_lock. ++ */ ++ if (!list_empty(&node->private_list)) ++ list_lru_del(&workingset_shadow_nodes, ++ &node->private_list); ++ } ++ return 0; ++} ++ + static void page_cache_tree_delete(struct address_space *mapping, + struct page *page, void *shadow) + { +@@ -546,7 +588,7 @@ int replace_page_cache_page(struct page + memcg = mem_cgroup_begin_page_stat(old); + spin_lock_irqsave(&mapping->tree_lock, flags); + __delete_from_page_cache(old, NULL, memcg); +- error = radix_tree_insert(&mapping->page_tree, offset, new); ++ error = page_cache_tree_insert(mapping, new, NULL); + BUG_ON(error); + mapping->nrpages++; + +@@ -570,48 +612,6 @@ int replace_page_cache_page(struct page + } + EXPORT_SYMBOL_GPL(replace_page_cache_page); + +-static int page_cache_tree_insert(struct address_space *mapping, +- struct page *page, void **shadowp) +-{ +- struct radix_tree_node *node; +- void **slot; +- int error; +- +- error = __radix_tree_create(&mapping->page_tree, page->index, +- &node, &slot); +- if (error) +- return error; +- if (*slot) { +- void *p; +- +- p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock); +- if (!radix_tree_exceptional_entry(p)) +- return -EEXIST; +- if (shadowp) +- *shadowp = p; +- mapping->nrshadows--; +- if (node) +- workingset_node_shadows_dec(node); +- } +- radix_tree_replace_slot(slot, page); +- mapping->nrpages++; +- if (node) { +- workingset_node_pages_inc(node); +- /* +- * Don't track node that contains actual pages. +- * +- * Avoid acquiring the list_lru lock if already +- * untracked. The list_empty() test is safe as +- * node->private_list is protected by +- * mapping->tree_lock. 
+- */ +- if (!list_empty(&node->private_list)) +- list_lru_del(&workingset_shadow_nodes, +- &node->private_list); +- } +- return 0; +-} +- + static int __add_to_page_cache_locked(struct page *page, + struct address_space *mapping, + pgoff_t offset, gfp_t gfp_mask, +--- a/mm/workingset.c ++++ b/mm/workingset.c +@@ -341,21 +341,19 @@ static enum lru_status shadow_lru_isolat + * no pages, so we expect to be able to remove them all and + * delete and free the empty node afterwards. + */ +- +- BUG_ON(!node->count); +- BUG_ON(node->count & RADIX_TREE_COUNT_MASK); ++ BUG_ON(!workingset_node_shadows(node)); ++ BUG_ON(workingset_node_pages(node)); + + for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) { + if (node->slots[i]) { + BUG_ON(!radix_tree_exceptional_entry(node->slots[i])); + node->slots[i] = NULL; +- BUG_ON(node->count < (1U << RADIX_TREE_COUNT_SHIFT)); +- node->count -= 1U << RADIX_TREE_COUNT_SHIFT; ++ workingset_node_shadows_dec(node); + BUG_ON(!mapping->nrshadows); + mapping->nrshadows--; + } + } +- BUG_ON(node->count); ++ BUG_ON(workingset_node_shadows(node)); + inc_zone_state(page_zone(virt_to_page(node)), WORKINGSET_NODERECLAIM); + if (!__radix_tree_delete_node(&mapping->page_tree, node)) + BUG(); diff --git a/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch b/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch deleted file mode 100644 index 9d6e55c29cb..00000000000 --- a/queue-4.4/sched-fair-fix-min_vruntime-tracking.patch +++ /dev/null @@ -1,104 +0,0 @@ -From b60205c7c558330e4e2b5df498355ec959457358 Mon Sep 17 00:00:00 2001 -From: Peter Zijlstra -Date: Tue, 20 Sep 2016 21:58:12 +0200 -Subject: sched/fair: Fix min_vruntime tracking - -From: Peter Zijlstra - -commit b60205c7c558330e4e2b5df498355ec959457358 upstream. - -While going through enqueue/dequeue to review the movement of -set_curr_task() I noticed that the (2nd) update_min_vruntime() call in -dequeue_entity() is suspect. 
- -It turns out, its actually wrong because it will consider -cfs_rq->curr, which could be the entry we just normalized. This mixes -different vruntime forms and leads to fail. - -The purpose of the second update_min_vruntime() is to move -min_vruntime forward if the entity we just removed is the one that was -holding it back; _except_ for the DEQUEUE_SAVE case, because then we -know its a temporary removal and it will come back. - -However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr -will still be set (and per the above, can be tranformed into a -different unit), so update_min_vruntime() should also consider -curr->on_rq. This also fixes another corner case where the enqueue -(which also does update_curr()->update_min_vruntime()) happens on the -rq->lock break in schedule(), between dequeue and put_prev_task. - -Signed-off-by: Peter Zijlstra (Intel) -Cc: Linus Torvalds -Cc: Mike Galbraith -Cc: Peter Zijlstra -Cc: Thomas Gleixner -Cc: linux-kernel@vger.kernel.org -Fixes: 1e876231785d ("sched: Fix ->min_vruntime calculation in dequeue_entity()") -Signed-off-by: Ingo Molnar -Signed-off-by: Greg Kroah-Hartman - ---- - kernel/sched/fair.c | 29 ++++++++++++++++++++++------- - 1 file changed, 22 insertions(+), 7 deletions(-) - ---- a/kernel/sched/fair.c -+++ b/kernel/sched/fair.c -@@ -456,17 +456,23 @@ static inline int entity_before(struct s - - static void update_min_vruntime(struct cfs_rq *cfs_rq) - { -+ struct sched_entity *curr = cfs_rq->curr; -+ - u64 vruntime = cfs_rq->min_vruntime; - -- if (cfs_rq->curr) -- vruntime = cfs_rq->curr->vruntime; -+ if (curr) { -+ if (curr->on_rq) -+ vruntime = curr->vruntime; -+ else -+ curr = NULL; -+ } - - if (cfs_rq->rb_leftmost) { - struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost, - struct sched_entity, - run_node); - -- if (!cfs_rq->curr) -+ if (!curr) - vruntime = se->vruntime; - else - vruntime = min_vruntime(vruntime, se->vruntime); -@@ -3139,9 +3145,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, st - 
account_entity_dequeue(cfs_rq, se); - - /* -- * Normalize the entity after updating the min_vruntime because the -- * update can refer to the ->curr item and we need to reflect this -- * movement in our normalized position. -+ * Normalize after update_curr(); which will also have moved -+ * min_vruntime if @se is the one holding it back. But before doing -+ * update_min_vruntime() again, which will discount @se's position and -+ * can move min_vruntime forward still more. - */ - if (!(flags & DEQUEUE_SLEEP)) - se->vruntime -= cfs_rq->min_vruntime; -@@ -3149,8 +3156,16 @@ dequeue_entity(struct cfs_rq *cfs_rq, st - /* return excess runtime on last dequeue */ - return_cfs_rq_runtime(cfs_rq); - -- update_min_vruntime(cfs_rq); - update_cfs_shares(cfs_rq); -+ -+ /* -+ * Now advance min_vruntime if @se was the entity holding it back, -+ * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be -+ * put back on, and if we advance min_vruntime, we'll be placed back -+ * further than we started -- ie. we'll be penalized. 
-+ */ -+ if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE) -+ update_min_vruntime(cfs_rq); - } - - /* diff --git a/queue-4.4/series b/queue-4.4/series index a74f25075b6..ccc7762f387 100644 --- a/queue-4.4/series +++ b/queue-4.4/series @@ -82,7 +82,6 @@ blkcg-unlock-blkcg_pol_mutex-only-once-when-cpd-null.patch x86-e820-don-t-merge-consecutive-e820_pram-ranges.patch kvm-x86-memset-whole-irq_eoi.patch pinctrl-intel-only-restore-pins-that-are-used-by-the-driver.patch -sched-fair-fix-min_vruntime-tracking.patch irqchip-gicv3-handle-loop-timeout-proper.patch sd-fix-rw_max-for-devices-that-report-an-optimal-xfer-size.patch hpsa-correct-skipping-masked-peripherals.patch @@ -90,3 +89,6 @@ pkcs-7-don-t-require-spcspopusinfo-in-authenticode-pkcs7-signatures.patch bnx2x-prevent-false-warning-for-lack-of-fc-npiv.patch net-mlx4_core-allow-resetting-vf-admin-mac-to-zero.patch acpi-nfit-check-for-the-correct-event-code-in-notifications.patch +mm-workingset-fix-crash-in-shadow-node-shrinker-caused-by-replace_page_cache_page.patch +mm-filemap-fix-mapping-nrpages-double-accounting-in-fuse.patch +using-bug_on-as-an-assert-is-_never_-acceptable.patch diff --git a/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch b/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch new file mode 100644 index 00000000000..5d687c08e49 --- /dev/null +++ b/queue-4.4/using-bug_on-as-an-assert-is-_never_-acceptable.patch @@ -0,0 +1,53 @@ +From 21f54ddae449f4bdd9f1498124901d67202243d9 Mon Sep 17 00:00:00 2001 +From: Linus Torvalds +Date: Mon, 3 Oct 2016 21:03:48 -0700 +Subject: Using BUG_ON() as an assert() is _never_ acceptable + +From: Linus Torvalds + +commit 21f54ddae449f4bdd9f1498124901d67202243d9 upstream. + +That just generally kills the machine, and makes debugging only much +harder, since the traces may long be gone. + +Debugging by assert() is a disease. Don't do it. 
If you can continue, +you're much better off doing so with a live machine where you have a +much higher chance that the report actually makes it to the system logs, +rather than result in a machine that is just completely dead. + +The only valid situation for BUG_ON() is when continuing is not an +option, because there is massive corruption. But if you are just +verifying that something is true, you warn about your broken assumptions +(preferably just once), and limp on. + +Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") +Cc: Johannes Weiner +Cc: Miklos Szeredi +Cc: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Michal Hocko +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/swap.h | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/include/linux/swap.h ++++ b/include/linux/swap.h +@@ -266,7 +266,7 @@ static inline void workingset_node_pages + + static inline void workingset_node_pages_dec(struct radix_tree_node *node) + { +- VM_BUG_ON(!workingset_node_pages(node)); ++ VM_WARN_ON_ONCE(!workingset_node_pages(node)); + node->count--; + } + +@@ -282,7 +282,7 @@ static inline void workingset_node_shado + + static inline void workingset_node_shadows_dec(struct radix_tree_node *node) + { +- VM_BUG_ON(!workingset_node_shadows(node)); ++ VM_WARN_ON_ONCE(!workingset_node_shadows(node)); + node->count -= 1U << RADIX_TREE_COUNT_SHIFT; + } +
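The assertion style this patch switches to can be sketched as follows. This is an illustrative userspace stand-in, assuming nothing beyond what the diff shows: `warn_on_once()` mimics the shape of the kernel's `VM_WARN_ON_ONCE()` — log the broken assumption once, then limp on — rather than killing the machine the way `BUG_ON()` does.

```c
#include <stdio.h>

/* Sketch of the warn-once-and-continue pattern.  Names are illustrative;
 * the real patch uses the VM_WARN_ON_ONCE() macro. */
static int warn_on_once(int cond, int *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = 1;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

static unsigned int pages;
static int pages_warned;

static void node_pages_dec(void)
{
	warn_on_once(pages == 0, &pages_warned, "node pages underflow");
	pages--;	/* limp on; the log line survives for debugging */
}
```

On a live machine the warning has a much better chance of reaching the system logs than a backtrace from a dead one, and the once-only latch keeps a hot path from flooding the log.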