From: Sasha Levin Date: Sat, 4 Jul 2020 16:29:56 +0000 (-0400) Subject: Fixes for 5.4 X-Git-Tag: v4.4.230~39 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=d5a523b6d399dc42d607f7fcbdf3214d0ccf009b;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for 5.4 Signed-off-by: Sasha Levin --- diff --git a/queue-5.4/kgdb-avoid-suspicious-rcu-usage-warning.patch b/queue-5.4/kgdb-avoid-suspicious-rcu-usage-warning.patch new file mode 100644 index 00000000000..9c0917b8359 --- /dev/null +++ b/queue-5.4/kgdb-avoid-suspicious-rcu-usage-warning.patch @@ -0,0 +1,109 @@ +From d702ebb5acdc07fce6a085384100373795e7f23f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 2 Jun 2020 15:47:39 -0700 +Subject: kgdb: Avoid suspicious RCU usage warning + +From: Douglas Anderson + +[ Upstream commit 440ab9e10e2e6e5fd677473ee6f9e3af0f6904d6 ] + +At times when I'm using kgdb I see a splat on my console about +suspicious RCU usage. I managed to come up with a case that could +reproduce this that looked like this: + + WARNING: suspicious RCU usage + 5.7.0-rc4+ #609 Not tainted + ----------------------------- + kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection! 
+ + other info that might help us debug this: + + rcu_scheduler_active = 2, debug_locks = 1 + 3 locks held by swapper/0/1: + #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c + #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac + #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac + + stack backtrace: + CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609 + Hardware name: Google Cheza (rev3+) (DT) + Call trace: + dump_backtrace+0x0/0x1b8 + show_stack+0x1c/0x24 + dump_stack+0xd4/0x134 + lockdep_rcu_suspicious+0xf0/0x100 + find_task_by_pid_ns+0x5c/0x80 + getthread+0x8c/0xb0 + gdb_serial_stub+0x9d4/0xd04 + kgdb_cpu_enter+0x284/0x7ac + kgdb_handle_exception+0x174/0x20c + kgdb_brk_fn+0x24/0x30 + call_break_hook+0x6c/0x7c + brk_handler+0x20/0x5c + do_debug_exception+0x1c8/0x22c + el1_sync_handler+0x3c/0xe4 + el1_sync+0x7c/0x100 + rpmh_rsc_probe+0x38/0x420 + platform_drv_probe+0x94/0xb4 + really_probe+0x134/0x300 + driver_probe_device+0x68/0x100 + __device_attach_driver+0x90/0xa8 + bus_for_each_drv+0x84/0xcc + __device_attach+0xb4/0x13c + device_initial_probe+0x18/0x20 + bus_probe_device+0x38/0x98 + device_add+0x38c/0x420 + +If I understand properly we should just be able to blanket kgdb under +one big RCU read lock and the problem should go away. We'll add it to +the beast-of-a-function known as kgdb_cpu_enter(). + +With this I no longer get any splats and things seem to work fine. 
+ +Signed-off-by: Douglas Anderson +Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeid +Signed-off-by: Daniel Thompson +Signed-off-by: Sasha Levin +--- + kernel/debug/debug_core.c | 4 ++++ + 1 file changed, 4 insertions(+) + +diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c +index 7d54c7c280544..2222f3225e53d 100644 +--- a/kernel/debug/debug_core.c ++++ b/kernel/debug/debug_core.c +@@ -546,6 +546,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, + arch_kgdb_ops.disable_hw_break(regs); + + acquirelock: ++ rcu_read_lock(); + /* + * Interrupts will be restored by the 'trap return' code, except when + * single stepping. +@@ -602,6 +603,7 @@ return_normal: + atomic_dec(&slaves_in_kgdb); + dbg_touch_watchdogs(); + local_irq_restore(flags); ++ rcu_read_unlock(); + return 0; + } + cpu_relax(); +@@ -620,6 +622,7 @@ return_normal: + raw_spin_unlock(&dbg_master_lock); + dbg_touch_watchdogs(); + local_irq_restore(flags); ++ rcu_read_unlock(); + + goto acquirelock; + } +@@ -743,6 +746,7 @@ kgdb_restore: + raw_spin_unlock(&dbg_master_lock); + dbg_touch_watchdogs(); + local_irq_restore(flags); ++ rcu_read_unlock(); + + return kgdb_info[cpu].ret_state; + } +-- +2.25.1 + diff --git a/queue-5.4/mm-slub-fix-stack-overruns-with-slub_stats.patch b/queue-5.4/mm-slub-fix-stack-overruns-with-slub_stats.patch new file mode 100644 index 00000000000..c4b2c43edf8 --- /dev/null +++ b/queue-5.4/mm-slub-fix-stack-overruns-with-slub_stats.patch @@ -0,0 +1,90 @@ +From 16868695d3cac53a405e7ecb68df570c6f8c139a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 1 Jun 2020 21:45:57 -0700 +Subject: mm/slub: fix stack overruns with SLUB_STATS + +From: Qian Cai + +[ Upstream commit a68ee0573991e90af2f1785db309206408bad3e5 ] + +There is no need to copy SLUB_STATS items from root memcg cache to new +memcg cache copies. 
Doing so could result in stack overruns because the +store function only accepts 0 to clear the stat and returns an error for +everything else while the show method would print out the whole stat. + +Then, the mismatch of the lengths returned from show and store methods +happens in memcg_propagate_slab_attrs(): + + else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) + buf = mbuf; + +max_attr_size is only 2 from slab_attr_store(), so mbuf[64] is later +used in show_stat(), where a bunch of sprintf() calls would overrun the stack +variable. Fix it by always allocating a page of buffer to be used in +show_stat() if SLUB_STATS=y, which should only be used for debugging purposes. + + # echo 1 > /sys/kernel/slab/fs_cache/shrink + BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 + Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251 + + Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 + Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func + Call Trace: + number+0x421/0x6e0 + vsnprintf+0x451/0x8e0 + sprintf+0x9e/0xd0 + show_stat+0x124/0x1d0 + alloc_slowpath_show+0x13/0x20 + __kmem_cache_create+0x47a/0x6b0 + + addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: + process_one_work+0x0/0xb90 + + this frame has 1 object: + [32, 72) 'lockdep_map' + + Memory state around the buggy address: + ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 + ^ + ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 + ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ================================================================== + Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 + Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func + Call Trace: +
__kmem_cache_create+0x6ac/0x6b0 + +Fixes: 107dab5c92d5 ("slub: slub-specific propagation changes") +Signed-off-by: Qian Cai +Signed-off-by: Andrew Morton +Cc: Glauber Costa +Cc: Christoph Lameter +Cc: Pekka Enberg +Cc: David Rientjes +Cc: Joonsoo Kim +Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw +Signed-off-by: Linus Torvalds +Signed-off-by: Sasha Levin +--- + mm/slub.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/mm/slub.c b/mm/slub.c +index 5c05a36bb746f..709e31002504c 100644 +--- a/mm/slub.c ++++ b/mm/slub.c +@@ -5648,7 +5648,8 @@ static void memcg_propagate_slab_attrs(struct kmem_cache *s) + */ + if (buffer) + buf = buffer; +- else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) ++ else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) && ++ !IS_ENABLED(CONFIG_SLUB_STATS)) + buf = mbuf; + else { + buffer = (char *) get_zeroed_page(GFP_KERNEL); +-- +2.25.1 + diff --git a/queue-5.4/mm-slub.c-fix-corrupted-freechain-in-deactivate_slab.patch b/queue-5.4/mm-slub.c-fix-corrupted-freechain-in-deactivate_slab.patch new file mode 100644 index 00000000000..57874a8c566 --- /dev/null +++ b/queue-5.4/mm-slub.c-fix-corrupted-freechain-in-deactivate_slab.patch @@ -0,0 +1,115 @@ +From ac95d8e9c6d5b805602aa442b52ee065cf878af6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 1 Jun 2020 21:45:47 -0700 +Subject: mm/slub.c: fix corrupted freechain in deactivate_slab() + +From: Dongli Zhang + +[ Upstream commit 52f23478081ae0dcdb95d1650ea1e7d52d586829 ] + +The slub_debug is able to fix the corrupted slab freelist/page. +However, alloc_debug_processing() only checks the validity of current +and next freepointer during allocation path. As a result, once some +objects have their freepointers corrupted, deactivate_slab() may lead to +page fault. + +Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 +slub_nomerge'. The test kernel corrupts the freepointer of one free +object on purpose. 
Unfortunately, deactivate_slab() does not detect it +when iterating the freechain. + + BUG: unable to handle page fault for address: 00000000123456f8 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 0 P4D 0 + Oops: 0000 [#1] SMP PTI + ... ... + RIP: 0010:deactivate_slab.isra.92+0xed/0x490 + ... ... + Call Trace: + ___slab_alloc+0x536/0x570 + __slab_alloc+0x17/0x30 + __kmalloc+0x1d9/0x200 + ext4_htree_store_dirent+0x30/0xf0 + htree_dirblock_to_tree+0xcb/0x1c0 + ext4_htree_fill_tree+0x1bc/0x2d0 + ext4_readdir+0x54f/0x920 + iterate_dir+0x88/0x190 + __x64_sys_getdents+0xa6/0x140 + do_syscall_64+0x49/0x170 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + +Therefore, this patch adds an extra consistency check in deactivate_slab(). +Once an object's freepointer is corrupted, all objects +starting at this one are isolated. + +[akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] +Signed-off-by: Dongli Zhang +Signed-off-by: Andrew Morton +Cc: Joe Jin +Cc: Christoph Lameter +Cc: Pekka Enberg +Cc: David Rientjes +Cc: Joonsoo Kim +Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com +Signed-off-by: Linus Torvalds +Signed-off-by: Sasha Levin +--- + mm/slub.c | 27 +++++++++++++++++++++++++++ + 1 file changed, 27 insertions(+) + +diff --git a/mm/slub.c b/mm/slub.c +index fca33abd6c428..5c05a36bb746f 100644 +--- a/mm/slub.c ++++ b/mm/slub.c +@@ -644,6 +644,20 @@ static void slab_fix(struct kmem_cache *s, char *fmt, ...)
+ va_end(args); + } + ++static bool freelist_corrupted(struct kmem_cache *s, struct page *page, ++ void *freelist, void *nextfree) ++{ ++ if ((s->flags & SLAB_CONSISTENCY_CHECKS) && ++ !check_valid_pointer(s, page, nextfree)) { ++ object_err(s, page, freelist, "Freechain corrupt"); ++ freelist = NULL; ++ slab_fix(s, "Isolate corrupted freechain"); ++ return true; ++ } ++ ++ return false; ++} ++ + static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p) + { + unsigned int off; /* Offset of last byte */ +@@ -1379,6 +1393,11 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node, + static inline void dec_slabs_node(struct kmem_cache *s, int node, + int objects) {} + ++static bool freelist_corrupted(struct kmem_cache *s, struct page *page, ++ void *freelist, void *nextfree) ++{ ++ return false; ++} + #endif /* CONFIG_SLUB_DEBUG */ + + /* +@@ -2062,6 +2081,14 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page, + void *prior; + unsigned long counters; + ++ /* ++ * If 'nextfree' is invalid, it is possible that the object at ++ * 'freelist' is already corrupted. So isolate all objects ++ * starting at 'freelist'. 
++ */ ++ if (freelist_corrupted(s, page, freelist, nextfree)) ++ break; ++ + do { + prior = page->freelist; + counters = page->counters; +-- +2.25.1 + diff --git a/queue-5.4/nvme-fix-possible-deadlock-when-i-o-is-blocked.patch b/queue-5.4/nvme-fix-possible-deadlock-when-i-o-is-blocked.patch new file mode 100644 index 00000000000..96bd1d675ba --- /dev/null +++ b/queue-5.4/nvme-fix-possible-deadlock-when-i-o-is-blocked.patch @@ -0,0 +1,124 @@ +From ff0ab77899a11463749e83e2209d95f311d27e15 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 24 Jun 2020 01:53:08 -0700 +Subject: nvme: fix possible deadlock when I/O is blocked + +From: Sagi Grimberg + +[ Upstream commit 3b4b19721ec652ad2c4fe51dfbe5124212b5f581 ] + +Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk +in nvme_validate_ns") + +When adding a new namespace to the head disk (via nvme_mpath_set_live) +we will see a partition scan, which triggers I/O on the mpath device node. +This process will usually be triggered from the scan_work which holds +the scan_lock. If I/O blocks (for example, after an ANA change we currently +have paths available but none of them accessible), this can deadlock on the head +disk bd_mutex, as both the partition scan I/O and the head disk revalidation +(which checks for resize and is also triggered from scan_work on a different +path) take it. See trace [1]. + +The mpath disk revalidation was originally added to detect online disk +size changes, but this is no longer needed since commit cb224c3af4df +("nvme: Convert to use set_capacity_revalidate_and_notify"), which already +updates resize info without unnecessarily revalidating the disk (the +mpath disk doesn't even implement the .revalidate_disk fop). + +[1]: +-- +kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds. +kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 +kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+kernel: kworker/u65:9 D 0 494 2 0x80004000 +kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: schedule_preempt_disabled+0xe/0x10 +kernel: __mutex_lock.isra.0+0x182/0x4f0 +kernel: __mutex_lock_slowpath+0x13/0x20 +kernel: mutex_lock+0x2e/0x40 +kernel: revalidate_disk+0x63/0xa0 +kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core] +kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core] +kernel: ? evict+0x14c/0x1b0 +kernel: revalidate_disk+0x2b/0xa0 +kernel: nvme_validate_ns+0x49/0x940 [nvme_core] +kernel: ? blk_mq_free_request+0xd2/0x100 +kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core] +kernel: nvme_scan_work+0x24f/0x380 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x249/0x400 +kernel: kthread+0x104/0x140 +kernel: ? process_one_work+0x380/0x380 +kernel: ? kthread_park+0x80/0x80 +kernel: ret_from_fork+0x1f/0x40 +... +kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds. +kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 +kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. +kernel: kworker/u65:1 D 0 2630 2 0x80004000 +kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: io_schedule+0x16/0x40 +kernel: do_read_cache_page+0x438/0x830 +kernel: ? __switch_to_asm+0x34/0x70 +kernel: ? file_fdatawait_range+0x30/0x30 +kernel: read_cache_page+0x12/0x20 +kernel: read_dev_sector+0x27/0xc0 +kernel: read_lba+0xc1/0x220 +kernel: ? kmem_cache_alloc_trace+0x19c/0x230 +kernel: efi_partition+0x1e6/0x708 +kernel: ? vsnprintf+0x39e/0x4e0 +kernel: ? 
snprintf+0x49/0x60 +kernel: check_partition+0x154/0x244 +kernel: rescan_partitions+0xae/0x280 +kernel: __blkdev_get+0x40f/0x560 +kernel: blkdev_get+0x3d/0x140 +kernel: __device_add_disk+0x388/0x480 +kernel: device_add_disk+0x13/0x20 +kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] +kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] +kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] +kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] +kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core] +kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] +kernel: nvme_validate_ns+0x396/0x940 [nvme_core] +kernel: ? blk_mq_free_request+0xd2/0x100 +kernel: nvme_scan_work+0x24f/0x380 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x249/0x400 +kernel: kthread+0x104/0x140 +kernel: ? process_one_work+0x380/0x380 +kernel: ? kthread_park+0x80/0x80 +kernel: ret_from_fork+0x1f/0x40 +-- + +Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk +in nvme_validate_ns") +Signed-off-by: Anton Eidelman +Signed-off-by: Sagi Grimberg +Signed-off-by: Christoph Hellwig +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/core.c | 1 - + 1 file changed, 1 deletion(-) + +diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c +index d4b388793f40d..c44c00b9e1d85 100644 +--- a/drivers/nvme/host/core.c ++++ b/drivers/nvme/host/core.c +@@ -1870,7 +1870,6 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id) + if (ns->head->disk) { + nvme_update_disk_info(ns->head->disk, ns, id); + blk_queue_stack_limits(ns->head->disk->queue, ns->queue); +- revalidate_disk(ns->head->disk); + } + #endif + } +-- +2.25.1 + diff --git a/queue-5.4/nvme-multipath-fix-bogus-request-queue-reference-put.patch b/queue-5.4/nvme-multipath-fix-bogus-request-queue-reference-put.patch new file mode 100644 index 00000000000..5f6b699b0b8 --- /dev/null +++ b/queue-5.4/nvme-multipath-fix-bogus-request-queue-reference-put.patch @@ -0,0 +1,84 @@ +From 
44ce411b3c7d7f6b61219b667bf3a3b7ef55856a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 24 Jun 2020 01:53:12 -0700 +Subject: nvme-multipath: fix bogus request queue reference put + +From: Sagi Grimberg + +[ Upstream commit c31244669f57963b6ce133a5555b118fc50aec95 ] + +The mpath disk node takes a reference on the mpath +request queue when adding a live path to the mpath gendisk. +However, if we connected to an inaccessible path, device_add_disk +is not called, so if we disconnect and remove the mpath gendisk +we end up putting a reference on the request queue that was +never taken [1]. + +Fix that by checking whether we ever added a live path (using the +NVME_NSHEAD_DISK_LIVE flag) and, if not, clearing the disk->queue +reference. + +[1]: +------------[ cut here ]------------ +refcount_t: underflow; use-after-free. +WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 +CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 +Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 +RIP: 0010:refcount_warn_saturate+0xa6/0xf0 +RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 +RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 +RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 +RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 +R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 +R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 +FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 +CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 +CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 +DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 +DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 +Call Trace: + disk_release+0xa2/0xc0 + device_release+0x28/0x80 + kobject_put+0xa5/0x1b0 + nvme_put_ns_head+0x26/0x70 [nvme_core] + nvme_put_ns+0x30/0x60 [nvme_core] + nvme_remove_namespaces+0x9b/0xe0
[nvme_core] + nvme_do_delete_ctrl+0x43/0x5c [nvme_core] + nvme_sysfs_delete.cold+0x8/0xd [nvme_core] + kernfs_fop_write+0xc1/0x1a0 + vfs_write+0xb6/0x1a0 + ksys_write+0x5f/0xe0 + do_syscall_64+0x52/0x1a0 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + +Reported-by: Anton Eidelman +Tested-by: Anton Eidelman +Signed-off-by: Sagi Grimberg +Signed-off-by: Christoph Hellwig +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 8 ++++++++ + 1 file changed, 8 insertions(+) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index 574b52e911f08..e1eeed5856570 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -691,6 +691,14 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head) + kblockd_schedule_work(&head->requeue_work); + flush_work(&head->requeue_work); + blk_cleanup_queue(head->disk->queue); ++ if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) { ++ /* ++ * if device_add_disk wasn't called, prevent ++ * disk release to put a bogus reference on the ++ * request queue ++ */ ++ head->disk->queue = NULL; ++ } + put_disk(head->disk); + } + +-- +2.25.1 + diff --git a/queue-5.4/nvme-multipath-fix-deadlock-between-ana_work-and-sca.patch b/queue-5.4/nvme-multipath-fix-deadlock-between-ana_work-and-sca.patch new file mode 100644 index 00000000000..8cc1eac727d --- /dev/null +++ b/queue-5.4/nvme-multipath-fix-deadlock-between-ana_work-and-sca.patch @@ -0,0 +1,134 @@ +From f2f53fa48061a5de3db51d71404f0abdae1bb90c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 24 Jun 2020 01:53:09 -0700 +Subject: nvme-multipath: fix deadlock between ana_work and scan_work + +From: Anton Eidelman + +[ Upstream commit 489dd102a2c7c94d783a35f9412eb085b8da1aa4 ] + +When scan_work calls nvme_mpath_add_disk() this holds ana_lock +and invokes nvme_parse_ana_log(), which may issue IO +in device_add_disk() and hang waiting for an accessible path. 
+While nvme_mpath_set_live() is only called when nvme_state_is_live(), +a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. + +In order to recover and complete the IO, ana_work on the same ctrl +should be able to update the path state and remove NVME_NS_ANA_PENDING. + +The deadlock occurs because scan_work keeps holding ana_lock, +so ana_work hangs [1]. + +Fix: +Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy +of the ANA group desc, and then calls nvme_update_ns_ana_state() without +holding ana_lock. + +[1]: +kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: io_schedule+0x16/0x40 +kernel: do_read_cache_page+0x438/0x830 +kernel: read_cache_page+0x12/0x20 +kernel: read_dev_sector+0x27/0xc0 +kernel: read_lba+0xc1/0x220 +kernel: efi_partition+0x1e6/0x708 +kernel: check_partition+0x154/0x244 +kernel: rescan_partitions+0xae/0x280 +kernel: __blkdev_get+0x40f/0x560 +kernel: blkdev_get+0x3d/0x140 +kernel: __device_add_disk+0x388/0x480 +kernel: device_add_disk+0x13/0x20 +kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] +kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] +kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] +kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] +kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] +kernel: nvme_validate_ns+0x396/0x940 [nvme_core] +kernel: nvme_scan_work+0x24f/0x380 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x249/0x400 +kernel: kthread+0x104/0x140 + +kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: schedule_preempt_disabled+0xe/0x10 +kernel: __mutex_lock.isra.0+0x182/0x4f0 +kernel: ? __switch_to_asm+0x34/0x70 +kernel: ? select_task_rq_fair+0x1aa/0x5c0 +kernel: ? kvm_sched_clock_read+0x11/0x20 +kernel: ?
sched_clock+0x9/0x10 +kernel: __mutex_lock_slowpath+0x13/0x20 +kernel: mutex_lock+0x2e/0x40 +kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] +kernel: nvme_ana_work+0x15/0x20 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x4d/0x400 +kernel: kthread+0x104/0x140 +kernel: ? process_one_work+0x380/0x380 +kernel: ? kthread_park+0x80/0x80 +kernel: ret_from_fork+0x35/0x40 + +Fixes: 0d0b660f214d ("nvme: add ANA support") +Signed-off-by: Anton Eidelman +Signed-off-by: Sagi Grimberg +Signed-off-by: Christoph Hellwig +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 24 ++++++++++++++++-------- + 1 file changed, 16 insertions(+), 8 deletions(-) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index 7cc0ec180d555..18f0a05c74b56 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -639,26 +639,34 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr, + } + DEVICE_ATTR_RO(ana_state); + +-static int nvme_set_ns_ana_state(struct nvme_ctrl *ctrl, ++static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl, + struct nvme_ana_group_desc *desc, void *data) + { +- struct nvme_ns *ns = data; ++ struct nvme_ana_group_desc *dst = data; + +- if (ns->ana_grpid == le32_to_cpu(desc->grpid)) { +- nvme_update_ns_ana_state(desc, ns); +- return -ENXIO; /* just break out of the loop */ +- } ++ if (desc->grpid != dst->grpid) ++ return 0; + +- return 0; ++ *dst = *desc; ++ return -ENXIO; /* just break out of the loop */ + } + + void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id) + { + if (nvme_ctrl_use_ana(ns->ctrl)) { ++ struct nvme_ana_group_desc desc = { ++ .grpid = id->anagrpid, ++ .state = 0, ++ }; ++ + mutex_lock(&ns->ctrl->ana_lock); + ns->ana_grpid = le32_to_cpu(id->anagrpid); +- nvme_parse_ana_log(ns->ctrl, ns, nvme_set_ns_ana_state); ++ nvme_parse_ana_log(ns->ctrl, &desc, nvme_lookup_ana_group_desc); + mutex_unlock(&ns->ctrl->ana_lock); ++ 
if (desc.state) { ++ /* found the group desc: update */ ++ nvme_update_ns_ana_state(&desc, ns); ++ } + } else { + ns->ana_state = NVME_ANA_OPTIMIZED; + nvme_mpath_set_live(ns); +-- +2.25.1 + diff --git a/queue-5.4/nvme-multipath-fix-deadlock-due-to-head-lock.patch b/queue-5.4/nvme-multipath-fix-deadlock-due-to-head-lock.patch new file mode 100644 index 00000000000..9983ccdabc3 --- /dev/null +++ b/queue-5.4/nvme-multipath-fix-deadlock-due-to-head-lock.patch @@ -0,0 +1,124 @@ +From dae462879a77d4d97004c7f1d1aac2c233bf2632 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 24 Jun 2020 01:53:11 -0700 +Subject: nvme-multipath: fix deadlock due to head->lock + +From: Anton Eidelman + +[ Upstream commit d8a22f85609fadb46ba699e0136cc3ebdeebff79 ] + +In the following scenario, scan_work and ana_work will deadlock: + +When scan_work calls nvme_mpath_add_disk() this holds ana_lock +and invokes nvme_parse_ana_log(), which may issue IO +in device_add_disk() and hang waiting for an accessible path. + +While nvme_mpath_set_live() is only called when nvme_state_is_live(), +a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. + +Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on +ANY ctrl will not be able to complete nvme_mpath_set_live() +on the same ns->head, which is required in order to update +the new accessible path and remove NVME_NS_ANA_PENDING. +Therefore IO never completes: deadlock [1]. + +Fix: +Move device_add_disk out of the head->lock and protect it with an +atomic test_and_set for a new NVME_NSHEAD_DISK_LIVE bit. + +[1]: +kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. +kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 +kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+kernel: kworker/u8:2 D 0 160 2 0x80004000 +kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: schedule_preempt_disabled+0xe/0x10 +kernel: __mutex_lock.isra.0+0x182/0x4f0 +kernel: __mutex_lock_slowpath+0x13/0x20 +kernel: mutex_lock+0x2e/0x40 +kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] +kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] +kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] +kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] +kernel: nvme_ana_work+0x15/0x20 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x4d/0x400 +kernel: kthread+0x104/0x140 +kernel: ret_from_fork+0x35/0x40 +kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. +kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 +kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. +kernel: kworker/u8:4 D 0 439 2 0x80004000 +kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] +kernel: Call Trace: +kernel: __schedule+0x2b9/0x6c0 +kernel: schedule+0x42/0xb0 +kernel: io_schedule+0x16/0x40 +kernel: do_read_cache_page+0x438/0x830 +kernel: read_cache_page+0x12/0x20 +kernel: read_dev_sector+0x27/0xc0 +kernel: read_lba+0xc1/0x220 +kernel: efi_partition+0x1e6/0x708 +kernel: check_partition+0x154/0x244 +kernel: rescan_partitions+0xae/0x280 +kernel: __blkdev_get+0x40f/0x560 +kernel: blkdev_get+0x3d/0x140 +kernel: __device_add_disk+0x388/0x480 +kernel: device_add_disk+0x13/0x20 +kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] +kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] +kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] +kernel: nvme_validate_ns+0x396/0x940 [nvme_core] +kernel: nvme_scan_work+0x256/0x390 [nvme_core] +kernel: process_one_work+0x1db/0x380 +kernel: worker_thread+0x4d/0x400 +kernel: kthread+0x104/0x140 +kernel: ret_from_fork+0x35/0x40 + +Fixes: 0d0b660f214d ("nvme: add ANA support") +Signed-off-by: Anton 
Eidelman +Signed-off-by: Sagi Grimberg +Signed-off-by: Christoph Hellwig +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 4 ++-- + drivers/nvme/host/nvme.h | 2 ++ + 2 files changed, 4 insertions(+), 2 deletions(-) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index 18f0a05c74b56..574b52e911f08 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -417,11 +417,11 @@ static void nvme_mpath_set_live(struct nvme_ns *ns) + if (!head->disk) + return; + +- mutex_lock(&head->lock); +- if (!(head->disk->flags & GENHD_FL_UP)) ++ if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) + device_add_disk(&head->subsys->dev, head->disk, + nvme_ns_id_attr_groups); + ++ mutex_lock(&head->lock); + if (nvme_path_is_optimized(ns)) { + int node, srcu_idx; + +diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h +index 22e8401352c22..ed02260862cb5 100644 +--- a/drivers/nvme/host/nvme.h ++++ b/drivers/nvme/host/nvme.h +@@ -345,6 +345,8 @@ struct nvme_ns_head { + spinlock_t requeue_lock; + struct work_struct requeue_work; + struct mutex lock; ++ unsigned long flags; ++#define NVME_NSHEAD_DISK_LIVE 0 + struct nvme_ns __rcu *current_path[]; + #endif + }; +-- +2.25.1 + diff --git a/queue-5.4/nvme-multipath-set-bdi-capabilities-once.patch b/queue-5.4/nvme-multipath-set-bdi-capabilities-once.patch new file mode 100644 index 00000000000..9a5a38f6276 --- /dev/null +++ b/queue-5.4/nvme-multipath-set-bdi-capabilities-once.patch @@ -0,0 +1,51 @@ +From 865046a12fee7b8875fe5d197fca9ec9e2456ae0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 9 Apr 2020 09:09:04 -0700 +Subject: nvme-multipath: set bdi capabilities once + +From: Keith Busch + +[ Upstream commit b2ce4d90690bd29ce5b554e203cd03682dd59697 ] + +The queues' backing device info capabilities don't change with each +namespace revalidation. Set it only when each path's request_queue +is initially added to a multipath queue. 
+ +Signed-off-by: Keith Busch +Reviewed-by: Sagi Grimberg +Signed-off-by: Christoph Hellwig +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 8 ++++++++ + 1 file changed, 8 insertions(+) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index 772eb05e57afe..7cc0ec180d555 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -3,6 +3,7 @@ + * Copyright (c) 2017-2018 Christoph Hellwig. + */ + ++#include + #include + #include + #include "nvme.h" +@@ -662,6 +663,13 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id) + ns->ana_state = NVME_ANA_OPTIMIZED; + nvme_mpath_set_live(ns); + } ++ ++ if (bdi_cap_stable_pages_required(ns->queue->backing_dev_info)) { ++ struct backing_dev_info *info = ++ ns->head->disk->queue->backing_dev_info; ++ ++ info->capabilities |= BDI_CAP_STABLE_WRITES; ++ } + } + + void nvme_mpath_remove_disk(struct nvme_ns_head *head) +-- +2.25.1 + diff --git a/queue-5.4/rxrpc-fix-race-between-incoming-ack-parser-and-retra.patch b/queue-5.4/rxrpc-fix-race-between-incoming-ack-parser-and-retra.patch new file mode 100644 index 00000000000..0728ccaa1f7 --- /dev/null +++ b/queue-5.4/rxrpc-fix-race-between-incoming-ack-parser-and-retra.patch @@ -0,0 +1,104 @@ +From ef4e4dc72c18d94fc13b28183ef662edb57d0db3 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 11 Jun 2020 21:57:00 +0100 +Subject: rxrpc: Fix race between incoming ACK parser and retransmitter + +From: David Howells + +[ Upstream commit 2ad6691d988c0c611362ddc2aad89e0fb50e3261 ] + +There's a race between the retransmission code and the received ACK parser. +The problem is that the retransmission loop has to drop the lock under +which it is iterating through the transmission buffer in order to transmit +a packet, but whilst the lock is dropped, the ACK parser can crank the Tx +window round and discard the packets from the buffer. 
+ +The retransmission code then updated the annotations for the wrong packet +and a later retransmission thought it had to retransmit a packet that +wasn't there, leading to a NULL pointer dereference. + +Fix this by: + + (1) Moving the annotation change to before we drop the lock prior to + transmission. This means we can't vary the annotation depending on + the outcome of the transmission, but that's fine - we'll retransmit + again later if it failed now. + + (2) Skipping the packet if the skb pointer is NULL. + +The following oops was seen: + + BUG: kernel NULL pointer dereference, address: 000000000000002d + Workqueue: krxrpcd rxrpc_process_call + RIP: 0010:rxrpc_get_skb+0x14/0x8a + ... + Call Trace: + rxrpc_resend+0x331/0x41e + ? get_vtime_delta+0x13/0x20 + rxrpc_process_call+0x3c0/0x4ac + process_one_work+0x18f/0x27f + worker_thread+0x1a3/0x247 + ? create_worker+0x17d/0x17d + kthread+0xe6/0xeb + ? kthread_delayed_work_timer_fn+0x83/0x83 + ret_from_fork+0x1f/0x30 + +Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") +Signed-off-by: David Howells +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + net/rxrpc/call_event.c | 29 +++++++++++------------------ + 1 file changed, 11 insertions(+), 18 deletions(-) + +diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c +index 2a65ac41055f5..985fb89202d0c 100644 +--- a/net/rxrpc/call_event.c ++++ b/net/rxrpc/call_event.c +@@ -248,7 +248,18 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) + if (anno_type != RXRPC_TX_ANNO_RETRANS) + continue; + ++ /* We need to reset the retransmission state, but we need to do ++ * so before we drop the lock as a new ACK/NAK may come in and ++ * confuse things ++ */ ++ annotation &= ~RXRPC_TX_ANNO_MASK; ++ annotation |= RXRPC_TX_ANNO_RESENT; ++ call->rxtx_annotations[ix] = annotation; ++ + skb = call->rxtx_buffer[ix]; ++ if (!skb) ++ continue; ++ + rxrpc_get_skb(skb, rxrpc_skb_got); + spin_unlock_bh(&call->lock); + +@@ -262,24 +273,6 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) + + rxrpc_free_skb(skb, rxrpc_skb_freed); + spin_lock_bh(&call->lock); +- +- /* We need to clear the retransmit state, but there are two +- * things we need to be aware of: A new ACK/NAK might have been +- * received and the packet might have been hard-ACK'd (in which +- * case it will no longer be in the buffer). 
+- */ +- if (after(seq, call->tx_hard_ack)) { +- annotation = call->rxtx_annotations[ix]; +- anno_type = annotation & RXRPC_TX_ANNO_MASK; +- if (anno_type == RXRPC_TX_ANNO_RETRANS || +- anno_type == RXRPC_TX_ANNO_NAK) { +- annotation &= ~RXRPC_TX_ANNO_MASK; +- annotation |= RXRPC_TX_ANNO_UNACK; +- } +- annotation |= RXRPC_TX_ANNO_RESENT; +- call->rxtx_annotations[ix] = annotation; +- } +- + if (after(call->tx_hard_ack, seq)) + seq = call->tx_hard_ack; + } +-- +2.25.1 + diff --git a/queue-5.4/s390-debug-avoid-kernel-warning-on-too-large-number-.patch b/queue-5.4/s390-debug-avoid-kernel-warning-on-too-large-number-.patch new file mode 100644 index 00000000000..4a0b6039f39 --- /dev/null +++ b/queue-5.4/s390-debug-avoid-kernel-warning-on-too-large-number-.patch @@ -0,0 +1,41 @@ +From b523ff060667967971a62988e07d61c81420174d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 31 Mar 2020 05:57:23 -0400 +Subject: s390/debug: avoid kernel warning on too large number of pages + +From: Christian Borntraeger + +[ Upstream commit 827c4913923e0b441ba07ba4cc41e01181102303 ] + +When specifying insanely large debug buffers a kernel warning is +printed. The debug code does handle the error gracefully, though. +Instead of duplicating the check let us silence the warning to +avoid crashes when panic_on_warn is used. 
+ +Signed-off-by: Christian Borntraeger +Reviewed-by: Heiko Carstens +Signed-off-by: Heiko Carstens +Signed-off-by: Sasha Levin +--- + arch/s390/kernel/debug.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c +index 6d321f5f101d6..7184d55d87aae 100644 +--- a/arch/s390/kernel/debug.c ++++ b/arch/s390/kernel/debug.c +@@ -198,9 +198,10 @@ static debug_entry_t ***debug_areas_alloc(int pages_per_area, int nr_areas) + if (!areas) + goto fail_malloc_areas; + for (i = 0; i < nr_areas; i++) { ++ /* GFP_NOWARN to avoid user triggerable WARN, we handle fails */ + areas[i] = kmalloc_array(pages_per_area, + sizeof(debug_entry_t *), +- GFP_KERNEL); ++ GFP_KERNEL | __GFP_NOWARN); + if (!areas[i]) + goto fail_malloc_areas2; + for (j = 0; j < pages_per_area; j++) { +-- +2.25.1 + diff --git a/queue-5.4/sched-debug-make-sd-flags-sysctl-read-only.patch b/queue-5.4/sched-debug-make-sd-flags-sysctl-read-only.patch new file mode 100644 index 00000000000..c48106746ef --- /dev/null +++ b/queue-5.4/sched-debug-make-sd-flags-sysctl-read-only.patch @@ -0,0 +1,48 @@ +From b153dbbc08cc3b4db319d0590cbb187eac7da13f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 15 Apr 2020 22:05:05 +0100 +Subject: sched/debug: Make sd->flags sysctl read-only + +From: Valentin Schneider + +[ Upstream commit 9818427c6270a9ce8c52c8621026fe9cebae0f92 ] + +Writing to the sysctl of a sched_domain->flags directly updates the value of +the field, and goes nowhere near update_top_cache_domain(). This means that +the cached domain pointers can end up containing stale data (e.g. the +domain pointed to doesn't have the relevant flag set anymore). + +Explicit domain walks that check for flags will be affected by +the write, but this won't be in sync with the cached pointers which will +still point to the domains that were cached at the last sched_domain +build. + +In other words, writing to this interface is playing a dangerous game. 
It +could be made to trigger an update of the cached sched_domain pointers when +written to, but this does not seem to be worth the trouble. Make it +read-only. + +Signed-off-by: Valentin Schneider +Signed-off-by: Peter Zijlstra (Intel) +Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com +Signed-off-by: Sasha Levin +--- + kernel/sched/debug.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c +index f7e4579e746c5..c4b702fe1d738 100644 +--- a/kernel/sched/debug.c ++++ b/kernel/sched/debug.c +@@ -258,7 +258,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd) + set_table_entry(&table[2], "busy_factor", &sd->busy_factor, sizeof(int), 0644, proc_dointvec_minmax); + set_table_entry(&table[3], "imbalance_pct", &sd->imbalance_pct, sizeof(int), 0644, proc_dointvec_minmax); + set_table_entry(&table[4], "cache_nice_tries", &sd->cache_nice_tries, sizeof(int), 0644, proc_dointvec_minmax); +- set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0644, proc_dointvec_minmax); ++ set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0444, proc_dointvec_minmax); + set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax); + set_table_entry(&table[7], "name", sd->name, CORENAME_MAX_SIZE, 0444, proc_dostring); + /* &table[8] is terminator */ +-- +2.25.1 + diff --git a/queue-5.4/series b/queue-5.4/series index d5bbfec73fa..d02adb8b132 100644 --- a/queue-5.4/series +++ b/queue-5.4/series @@ -1,3 +1,18 @@ io_uring-make-sure-async-workqueue-is-canceled-on-ex.patch mm-fix-swap-cache-node-allocation-mask.patch edac-amd64-read-back-the-scrub-rate-pci-register-on-.patch +usbnet-smsc95xx-fix-use-after-free-after-removal.patch +sched-debug-make-sd-flags-sysctl-read-only.patch +mm-slub.c-fix-corrupted-freechain-in-deactivate_slab.patch +mm-slub-fix-stack-overruns-with-slub_stats.patch 
+rxrpc-fix-race-between-incoming-ack-parser-and-retra.patch +usb-usbtest-fix-missing-kfree-dev-buf-in-usbtest_dis.patch +tools-lib-traceevent-add-append-function-helper-for-.patch +tools-lib-traceevent-handle-__attribute__-user-in-fi.patch +s390-debug-avoid-kernel-warning-on-too-large-number-.patch +nvme-multipath-set-bdi-capabilities-once.patch +nvme-fix-possible-deadlock-when-i-o-is-blocked.patch +nvme-multipath-fix-deadlock-between-ana_work-and-sca.patch +nvme-multipath-fix-deadlock-due-to-head-lock.patch +nvme-multipath-fix-bogus-request-queue-reference-put.patch +kgdb-avoid-suspicious-rcu-usage-warning.patch diff --git a/queue-5.4/tools-lib-traceevent-add-append-function-helper-for-.patch b/queue-5.4/tools-lib-traceevent-add-append-function-helper-for-.patch new file mode 100644 index 00000000000..73fd5123b9c --- /dev/null +++ b/queue-5.4/tools-lib-traceevent-add-append-function-helper-for-.patch @@ -0,0 +1,243 @@ +From 16e372a2fa70ec98010494490c8c3871f0110aed Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 24 Mar 2020 16:08:46 -0400 +Subject: tools lib traceevent: Add append() function helper for appending + strings + +From: Steven Rostedt (VMware) + +[ Upstream commit 27d4d336f2872193e90ee5450559e1699fae0f6d ] + +There's several locations that open code realloc and strcat() to append +text to strings. Add an append() function that takes a delimiter and a +string to append to another string. 
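The helper described here can be exercised on its own. The code below is a standalone copy of the append() function from the hunk in this patch, with one addition that is not in the patch: an assert() that *buf is non-NULL, since strlen(NULL) is undefined. The contract matters for the callers. *buf must be a heap-allocated, NUL-terminated string. On failure, realloc() leaves the old block intact, so *buf is unchanged and still owned by the caller. This is why the patched callers free(brackets) themselves on the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Grow *buf with realloc() and append delim followed by str.
 * Returns 0 on success, -1 on allocation failure (*buf untouched). */
static int append(char **buf, const char *delim, const char *str)
{
    char *new_buf;

    assert(*buf != NULL);  /* added precondition; not in the original helper */

    /* +1 for the terminating NUL */
    new_buf = realloc(*buf, strlen(*buf) + strlen(delim) + strlen(str) + 1);
    if (!new_buf)
        return -1;
    strcat(new_buf, delim);
    strcat(new_buf, str);
    *buf = new_buf;
    return 0;
}
```

For example, starting from a heap copy of "unsigned", append(&t, " ", "long") yields "unsigned long". A further append(&t, "", "[16]") yields "unsigned long[16]". The empty delimiter matches how the patch glues bracket text onto a type.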
+ +Signed-off-by: Steven Rostedt (VMware) +Cc: Andrew Morton +Cc: Jaewon Lim +Cc: Jiri Olsa +Cc: Kees Kook +Cc: linux-mm@kvack.org +Cc: linux-trace-devel@vger.kernel.org +Cc: Namhyung Kim +Cc: Vlastimil Babka +Link: http://lore.kernel.org/lkml/20200324200956.515118403@goodmis.org +Signed-off-by: Arnaldo Carvalho de Melo +Signed-off-by: Sasha Levin +--- + tools/lib/traceevent/event-parse.c | 98 ++++++++++++------------------ + 1 file changed, 40 insertions(+), 58 deletions(-) + +diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c +index d948475585ced..4bc3e1b906652 100644 +--- a/tools/lib/traceevent/event-parse.c ++++ b/tools/lib/traceevent/event-parse.c +@@ -1425,6 +1425,19 @@ static unsigned int type_size(const char *name) + return 0; + } + ++static int append(char **buf, const char *delim, const char *str) ++{ ++ char *new_buf; ++ ++ new_buf = realloc(*buf, strlen(*buf) + strlen(delim) + strlen(str) + 1); ++ if (!new_buf) ++ return -1; ++ strcat(new_buf, delim); ++ strcat(new_buf, str); ++ *buf = new_buf; ++ return 0; ++} ++ + static int event_read_fields(struct tep_event *event, struct tep_format_field **fields) + { + struct tep_format_field *field = NULL; +@@ -1432,6 +1445,7 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + char *token; + char *last_token; + int count = 0; ++ int ret; + + do { + unsigned int size_dynamic = 0; +@@ -1490,24 +1504,15 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + field->flags |= TEP_FIELD_IS_POINTER; + + if (field->type) { +- char *new_type; +- new_type = realloc(field->type, +- strlen(field->type) + +- strlen(last_token) + 2); +- if (!new_type) { +- free(last_token); +- goto fail; +- } +- field->type = new_type; +- strcat(field->type, " "); +- strcat(field->type, last_token); ++ ret = append(&field->type, " ", last_token); + free(last_token); ++ if (ret < 0) ++ goto fail; + } else + field->type = last_token; + last_token = 
token; + continue; + } +- + break; + } + +@@ -1523,8 +1528,6 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + if (strcmp(token, "[") == 0) { + enum tep_event_type last_type = type; + char *brackets = token; +- char *new_brackets; +- int len; + + field->flags |= TEP_FIELD_IS_ARRAY; + +@@ -1536,29 +1539,27 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + field->arraylen = 0; + + while (strcmp(token, "]") != 0) { ++ const char *delim; ++ + if (last_type == TEP_EVENT_ITEM && + type == TEP_EVENT_ITEM) +- len = 2; ++ delim = " "; + else +- len = 1; ++ delim = ""; ++ + last_type = type; + +- new_brackets = realloc(brackets, +- strlen(brackets) + +- strlen(token) + len); +- if (!new_brackets) { ++ ret = append(&brackets, delim, token); ++ if (ret < 0) { + free(brackets); + goto fail; + } +- brackets = new_brackets; +- if (len == 2) +- strcat(brackets, " "); +- strcat(brackets, token); + /* We only care about the last token */ + field->arraylen = strtoul(token, NULL, 0); + free_token(token); + type = read_token(&token); + if (type == TEP_EVENT_NONE) { ++ free(brackets); + do_warning_event(event, "failed to find token"); + goto fail; + } +@@ -1566,13 +1567,11 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + + free_token(token); + +- new_brackets = realloc(brackets, strlen(brackets) + 2); +- if (!new_brackets) { ++ ret = append(&brackets, "", "]"); ++ if (ret < 0) { + free(brackets); + goto fail; + } +- brackets = new_brackets; +- strcat(brackets, "]"); + + /* add brackets to type */ + +@@ -1582,34 +1581,23 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + * the format: type [] item; + */ + if (type == TEP_EVENT_ITEM) { +- char *new_type; +- new_type = realloc(field->type, +- strlen(field->type) + +- strlen(field->name) + +- strlen(brackets) + 2); +- if (!new_type) { ++ ret = append(&field->type, " ", field->name); ++ if (ret < 0) 
{ + free(brackets); + goto fail; + } +- field->type = new_type; +- strcat(field->type, " "); +- strcat(field->type, field->name); ++ ret = append(&field->type, "", brackets); ++ + size_dynamic = type_size(field->name); + free_token(field->name); +- strcat(field->type, brackets); + field->name = field->alias = token; + type = read_token(&token); + } else { +- char *new_type; +- new_type = realloc(field->type, +- strlen(field->type) + +- strlen(brackets) + 1); +- if (!new_type) { ++ ret = append(&field->type, "", brackets); ++ if (ret < 0) { + free(brackets); + goto fail; + } +- field->type = new_type; +- strcat(field->type, brackets); + } + free(brackets); + } +@@ -2046,19 +2034,16 @@ process_op(struct tep_event *event, struct tep_print_arg *arg, char **tok) + /* could just be a type pointer */ + if ((strcmp(arg->op.op, "*") == 0) && + type == TEP_EVENT_DELIM && (strcmp(token, ")") == 0)) { +- char *new_atom; ++ int ret; + + if (left->type != TEP_PRINT_ATOM) { + do_warning_event(event, "bad pointer type"); + goto out_free; + } +- new_atom = realloc(left->atom.atom, +- strlen(left->atom.atom) + 3); +- if (!new_atom) ++ ret = append(&left->atom.atom, " ", "*"); ++ if (ret < 0) + goto out_warn_free; + +- left->atom.atom = new_atom; +- strcat(left->atom.atom, " *"); + free(arg->op.op); + *arg = *left; + free(left); +@@ -3151,18 +3136,15 @@ process_arg_token(struct tep_event *event, struct tep_print_arg *arg, + } + /* atoms can be more than one token long */ + while (type == TEP_EVENT_ITEM) { +- char *new_atom; +- new_atom = realloc(atom, +- strlen(atom) + strlen(token) + 2); +- if (!new_atom) { ++ int ret; ++ ++ ret = append(&atom, " ", token); ++ if (ret < 0) { + free(atom); + *tok = NULL; + free_token(token); + return TEP_EVENT_ERROR; + } +- atom = new_atom; +- strcat(atom, " "); +- strcat(atom, token); + free_token(token); + type = read_token_item(&token); + } +-- +2.25.1 + diff --git a/queue-5.4/tools-lib-traceevent-handle-__attribute__-user-in-fi.patch 
b/queue-5.4/tools-lib-traceevent-handle-__attribute__-user-in-fi.patch new file mode 100644 index 00000000000..0f759dc967c --- /dev/null +++ b/queue-5.4/tools-lib-traceevent-handle-__attribute__-user-in-fi.patch @@ -0,0 +1,98 @@ +From 758f76514935b62f4e0dca284c4d343841bbfaa9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 24 Mar 2020 16:08:47 -0400 +Subject: tools lib traceevent: Handle __attribute__((user)) in field names + +From: Steven Rostedt (VMware) + +[ Upstream commit 74621d929d944529a5e2878a84f48bfa6fb69a66 ] + +Commit c61f13eaa1ee1 ("gcc-plugins: Add structleak for more stack +initialization") added "__attribute__((user))" to the user when +stackleak detector is enabled. This now appears in the field format of +system call trace events for system calls that have user buffers. The +"__attribute__((user))" breaks the parsing in libtraceevent. That needs +to be handled. + +Signed-off-by: Steven Rostedt (VMware) +Cc: Andrew Morton +Cc: Jaewon Kim +Cc: Jiri Olsa +Cc: Kees Kook +Cc: Namhyung Kim +Cc: Vlastimil Babka +Cc: linux-mm@kvack.org +Cc: linux-trace-devel@vger.kernel.org +Link: http://lore.kernel.org/lkml/20200324200956.663647256@goodmis.org +Signed-off-by: Arnaldo Carvalho de Melo +Signed-off-by: Sasha Levin +--- + tools/lib/traceevent/event-parse.c | 39 +++++++++++++++++++++++++++++- + 1 file changed, 38 insertions(+), 1 deletion(-) + +diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c +index 4bc3e1b906652..798284f511f16 100644 +--- a/tools/lib/traceevent/event-parse.c ++++ b/tools/lib/traceevent/event-parse.c +@@ -1444,6 +1444,7 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + enum tep_event_type type; + char *token; + char *last_token; ++ char *delim = " "; + int count = 0; + int ret; + +@@ -1504,13 +1505,49 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** + field->flags |= TEP_FIELD_IS_POINTER; + + if (field->type) { +- ret = 
append(&field->type, " ", last_token); ++ ret = append(&field->type, delim, last_token); + free(last_token); + if (ret < 0) + goto fail; + } else + field->type = last_token; + last_token = token; ++ delim = " "; ++ continue; ++ } ++ ++ /* Handle __attribute__((user)) */ ++ if ((type == TEP_EVENT_DELIM) && ++ strcmp("__attribute__", last_token) == 0 && ++ token[0] == '(') { ++ int depth = 1; ++ int ret; ++ ++ ret = append(&field->type, " ", last_token); ++ ret |= append(&field->type, "", "("); ++ if (ret < 0) ++ goto fail; ++ ++ delim = " "; ++ while ((type = read_token(&token)) != TEP_EVENT_NONE) { ++ if (type == TEP_EVENT_DELIM) { ++ if (token[0] == '(') ++ depth++; ++ else if (token[0] == ')') ++ depth--; ++ if (!depth) ++ break; ++ ret = append(&field->type, "", token); ++ delim = ""; ++ } else { ++ ret = append(&field->type, delim, token); ++ delim = " "; ++ } ++ if (ret < 0) ++ goto fail; ++ free(last_token); ++ last_token = token; ++ } + continue; + } + break; +-- +2.25.1 + diff --git a/queue-5.4/usb-usbtest-fix-missing-kfree-dev-buf-in-usbtest_dis.patch b/queue-5.4/usb-usbtest-fix-missing-kfree-dev-buf-in-usbtest_dis.patch new file mode 100644 index 00000000000..5904ca05c62 --- /dev/null +++ b/queue-5.4/usb-usbtest-fix-missing-kfree-dev-buf-in-usbtest_dis.patch @@ -0,0 +1,69 @@ +From 6474c143ea9caeb525a5f33d9416b8c3f5e75023 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 12 Jun 2020 11:52:10 +0800 +Subject: usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect + +From: Zqiang + +[ Upstream commit 28ebeb8db77035e058a510ce9bd17c2b9a009dba ] + +BUG: memory leak +unreferenced object 0xffff888055046e00 (size 256): + comm "kworker/2:9", pid 2570, jiffies 4294942129 (age 1095.500s) + hex dump (first 32 bytes): + 00 70 04 55 80 88 ff ff 18 bb 5a 81 ff ff ff ff .p.U......Z..... + f5 96 78 81 ff ff ff ff 37 de 8e 81 ff ff ff ff ..x.....7....... 
+ backtrace: + [<00000000d121dccf>] kmemleak_alloc_recursive +include/linux/kmemleak.h:43 [inline] + [<00000000d121dccf>] slab_post_alloc_hook mm/slab.h:586 [inline] + [<00000000d121dccf>] slab_alloc_node mm/slub.c:2786 [inline] + [<00000000d121dccf>] slab_alloc mm/slub.c:2794 [inline] + [<00000000d121dccf>] kmem_cache_alloc_trace+0x15e/0x2d0 mm/slub.c:2811 + [<000000005c3c3381>] kmalloc include/linux/slab.h:555 [inline] + [<000000005c3c3381>] usbtest_probe+0x286/0x19d0 +drivers/usb/misc/usbtest.c:2790 + [<000000001cec6910>] usb_probe_interface+0x2bd/0x870 +drivers/usb/core/driver.c:361 + [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551 + [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724 + [<000000003ef66004>] __device_attach_driver+0x1b6/0x240 +drivers/base/dd.c:831 + [<00000000eee53e97>] bus_for_each_drv+0x14e/0x1e0 drivers/base/bus.c:431 + [<00000000bb0648d0>] __device_attach+0x1f9/0x350 drivers/base/dd.c:897 + [<00000000838b324a>] device_initial_probe+0x1a/0x20 drivers/base/dd.c:944 + [<0000000030d501c1>] bus_probe_device+0x1e1/0x280 drivers/base/bus.c:491 + [<000000005bd7adef>] device_add+0x131d/0x1c40 drivers/base/core.c:2504 + [<00000000a0937814>] usb_set_configuration+0xe84/0x1ab0 +drivers/usb/core/message.c:2030 + [<00000000e3934741>] generic_probe+0x6a/0xe0 drivers/usb/core/generic.c:210 + [<0000000098ade0f1>] usb_probe_device+0x90/0xd0 +drivers/usb/core/driver.c:266 + [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551 + [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724 + +Acked-by: Alan Stern +Reported-by: Kyungtae Kim +Signed-off-by: Zqiang +Link: https://lore.kernel.org/r/20200612035210.20494-1-qiang.zhang@windriver.com +Signed-off-by: Greg Kroah-Hartman +Signed-off-by: Sasha Levin +--- + drivers/usb/misc/usbtest.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/usb/misc/usbtest.c b/drivers/usb/misc/usbtest.c +index 98ada1a3425c6..bae88893ee8e3 100644 
+--- a/drivers/usb/misc/usbtest.c ++++ b/drivers/usb/misc/usbtest.c +@@ -2873,6 +2873,7 @@ static void usbtest_disconnect(struct usb_interface *intf) + + usb_set_intfdata(intf, NULL); + dev_dbg(&intf->dev, "disconnect\n"); ++ kfree(dev->buf); + kfree(dev); + } + +-- +2.25.1 + diff --git a/queue-5.4/usbnet-smsc95xx-fix-use-after-free-after-removal.patch b/queue-5.4/usbnet-smsc95xx-fix-use-after-free-after-removal.patch new file mode 100644 index 00000000000..fd99645421d --- /dev/null +++ b/queue-5.4/usbnet-smsc95xx-fix-use-after-free-after-removal.patch @@ -0,0 +1,49 @@ +From 543bb99e0ca369c1d005d219108fa0d4bf9813e8 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 21 Jun 2020 13:43:26 +0300 +Subject: usbnet: smsc95xx: Fix use-after-free after removal + +From: Tuomas Tynkkynen + +[ Upstream commit b835a71ef64a61383c414d6bf2896d2c0161deca ] + +Syzbot reports an use-after-free in workqueue context: + +BUG: KASAN: use-after-free in mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 + mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 + __smsc95xx_mdio_read drivers/net/usb/smsc95xx.c:217 [inline] + smsc95xx_mdio_read+0x583/0x870 drivers/net/usb/smsc95xx.c:278 + check_carrier+0xd1/0x2e0 drivers/net/usb/smsc95xx.c:644 + process_one_work+0x777/0xf90 kernel/workqueue.c:2274 + worker_thread+0xa8f/0x1430 kernel/workqueue.c:2420 + kthread+0x2df/0x300 kernel/kthread.c:255 + +It looks like that smsc95xx_unbind() is freeing the structures that are +still in use by the concurrently running workqueue callback. Thus switch +to using cancel_delayed_work_sync() to ensure the work callback really +is no longer active. + +Reported-by: syzbot+29dc7d4ae19b703ff947@syzkaller.appspotmail.com +Signed-off-by: Tuomas Tynkkynen +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/usb/smsc95xx.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c +index 355be77f42418..3cf4dc3433f91 100644 +--- a/drivers/net/usb/smsc95xx.c ++++ b/drivers/net/usb/smsc95xx.c +@@ -1324,7 +1324,7 @@ static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf) + struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]); + + if (pdata) { +- cancel_delayed_work(&pdata->carrier_check); ++ cancel_delayed_work_sync(&pdata->carrier_check); + netif_dbg(dev, ifdown, dev->net, "free pdata\n"); + kfree(pdata); + pdata = NULL; +-- +2.25.1 +
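The smsc95xx fix hinges on one difference. cancel_delayed_work() does not wait for a callback that is already running. cancel_delayed_work_sync() does wait, so freeing pdata afterwards is safe. The code below is a rough userspace analogy built with POSIX threads. It is not the kernel workqueue API. struct priv, carrier_check(), priv_bind() and priv_unbind() are hypothetical names. A worker thread plays the periodic carrier_check work and touches memory owned by *p. Teardown stops the worker and joins it before freeing. Freeing without the join would reproduce the use-after-free.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct smsc95xx_priv. */
struct priv {
    pthread_mutex_t lock;      /* like the mutex the MDIO read takes */
    atomic_bool stop;          /* asks the worker not to run again */
    unsigned long checks;      /* completed "carrier checks" */
    pthread_t worker;
};

/* Plays the role of the carrier_check work callback. */
static void *carrier_check(void *arg)
{
    struct priv *p = arg;
    const struct timespec period = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    do {
        pthread_mutex_lock(&p->lock);  /* dereferences *p: must not be freed yet */
        p->checks++;
        pthread_mutex_unlock(&p->lock);
        nanosleep(&period, NULL);
    } while (!atomic_load(&p->stop));
    return NULL;
}

static struct priv *priv_bind(void)
{
    struct priv *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    pthread_mutex_init(&p->lock, NULL);
    atomic_init(&p->stop, false);
    if (pthread_create(&p->worker, NULL, carrier_check, p) != 0) {
        pthread_mutex_destroy(&p->lock);
        free(p);
        return NULL;
    }
    return p;
}

/* Analogue of the fixed unbind: stop the worker AND wait for it, then free. */
static unsigned long priv_unbind(struct priv *p)
{
    unsigned long checks;
    int rc;

    atomic_store(&p->stop, true);
    rc = pthread_join(p->worker, NULL); /* the "_sync" part: wait for the callback */
    assert(rc == 0);
    (void)rc;
    checks = p->checks;                 /* safe: the worker has exited */
    pthread_mutex_destroy(&p->lock);
    free(p);
    return checks;
}
```

Only setting the stop flag and freeing immediately corresponds to the old cancel_delayed_work() call. The worker could still be inside pthread_mutex_lock(&p->lock) on freed memory, much like the mutex_unlock() in the syzbot trace.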