From: Greg Kroah-Hartman Date: Mon, 20 Sep 2021 07:31:54 +0000 (+0200) Subject: 5.14-stable patches X-Git-Tag: v4.4.284~47 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=c981f0e17d3fcfefd814e41a564b656fa08a801e;p=thirdparty%2Fkernel%2Fstable-queue.git 5.14-stable patches added patches: ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch x86-pat-pass-valid-address-to-sanitize_phys.patch --- diff --git a/queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch b/queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch new file mode 100644 index 00000000000..9aa4e6ac383 --- /dev/null +++ b/queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch @@ -0,0 +1,88 @@ +From 9b29a161ef38040f000dcf9ccf78e34495edfd55 Mon Sep 17 00:00:00 2001 +From: Saeed Mahameed +Date: Mon, 26 Jul 2021 15:15:39 -0700 +Subject: ethtool: Fix rxnfc copy to user buffer overflow + +From: Saeed Mahameed + +commit 9b29a161ef38040f000dcf9ccf78e34495edfd55 upstream. + +In the cited commit, copy_to_user() got called with the wrong pointer, +instead of passing the actual buffer ptr to copy from, a pointer to +the pointer got passed, which causes a buffer overflow calltrace to pop +up when executing "ethtool -x ethX". + +Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed +to the function, instead of a pointer to it. + +This fixes below call trace: +[ 15.533533] ------------[ cut here ]------------ +[ 15.539007] Buffer overflow detected (8 < 192)! 
+[ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20 +[ 15.549308] Modules linked in: +[ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058 +[ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 +[ 15.558378] RIP: 0010:copy_overflow+0x15/0x20 +[ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 +[ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286 +[ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000 +[ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff +[ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff +[ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000 +[ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000 +[ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000 +[ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 +[ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0 +[ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 +[ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 +[ 15.582441] Call Trace: +[ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46 +[ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b +[ 15.584584] dev_ethtool+0x29c/0x25f0 +[ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0 +[ 15.586728] ? do_set_pte+0xc4/0x110 +[ 15.587349] ? _raw_spin_unlock+0x18/0x30 +[ 15.588118] ? __might_sleep+0x49/0x80 +[ 15.588956] dev_ioctl+0x2c1/0x490 +[ 15.589616] sock_ioctl+0x18e/0x330 +[ 15.591143] __x64_sys_ioctl+0x41c/0x990 +[ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20 +[ 15.592657] ? 
irqentry_exit+0x33/0x40 +[ 15.593308] ? exc_page_fault+0x32f/0x770 +[ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130 +[ 15.594775] do_syscall_64+0x35/0x80 +[ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae +[ 15.596037] RIP: 0033:0x7f5381226d4b +[ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48 +[ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 +[ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b +[ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003 +[ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001 +[ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000 +[ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000 +[ 15.605042] ---[ end trace 325cf185e2795048 ]--- + +Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling") +Reported-by: Shannon Nelson +CC: Arnd Bergmann +CC: Christoph Hellwig +Signed-off-by: Saeed Mahameed +Tested-by: Shannon Nelson +Acked-by: Arnd Bergmann +Signed-off-by: David S. 
Miller +Signed-off-by: Greg Kroah-Hartman +--- + net/ethtool/ioctl.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/net/ethtool/ioctl.c ++++ b/net/ethtool/ioctl.c +@@ -906,7 +906,7 @@ static int ethtool_rxnfc_copy_to_user(vo + rule_buf); + useraddr += offsetof(struct compat_ethtool_rxnfc, rule_locs); + } else { +- ret = copy_to_user(useraddr, &rxnfc, size); ++ ret = copy_to_user(useraddr, rxnfc, size); + useraddr += offsetof(struct ethtool_rxnfc, rule_locs); + } + diff --git a/queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch b/queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch new file mode 100644 index 00000000000..c3dfee50e97 --- /dev/null +++ b/queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch @@ -0,0 +1,64 @@ +From 7c3a0a018e672a9723a79b128227272562300055 Mon Sep 17 00:00:00 2001 +From: Eli Cohen +Date: Wed, 15 Sep 2021 07:47:27 +0300 +Subject: net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert + +From: Eli Cohen + +commit 7c3a0a018e672a9723a79b128227272562300055 upstream. + +Remove the assert from the callback priv lookup function since it does +not require RTNL lock and is already protected by flow_indr_block_lock. + +This will avoid warnings from being emitted to dmesg if the driver +registers its callback after an ingress qdisc was created for a +netdevice. + +The warnings started after the following patch was merged: +commit 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation") + +Signed-off-by: Eli Cohen +Signed-off-by: David S. 
Miller +Signed-off-by: Greg Kroah-Hartman +--- + drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 3 --- + drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 3 --- + drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 --- + 3 files changed, 9 deletions(-) + +--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c ++++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c +@@ -1870,9 +1870,6 @@ bnxt_tc_indr_block_cb_lookup(struct bnxt + { + struct bnxt_flower_indr_block_cb_priv *cb_priv; + +- /* All callback list access should be protected by RTNL. */ +- ASSERT_RTNL(); +- + list_for_each_entry(cb_priv, &bp->tc_indr_block_list, list) + if (cb_priv->tunnel_netdev == netdev) + return cb_priv; +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c +@@ -300,9 +300,6 @@ mlx5e_rep_indr_block_priv_lookup(struct + { + struct mlx5e_rep_indr_block_priv *cb_priv; + +- /* All callback list access should be protected by RTNL. */ +- ASSERT_RTNL(); +- + list_for_each_entry(cb_priv, + &rpriv->uplink_priv.tc_indr_block_priv_list, + list) +--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c ++++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c +@@ -1766,9 +1766,6 @@ nfp_flower_indr_block_cb_priv_lookup(str + struct nfp_flower_indr_block_cb_priv *cb_priv; + struct nfp_flower_priv *priv = app->priv; + +- /* All callback list access should be protected by RTNL. 
*/ +- ASSERT_RTNL(); +- + list_for_each_entry(cb_priv, &priv->indr_block_cb_priv, list) + if (cb_priv->netdev == netdev) + return cb_priv; diff --git a/queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch b/queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch new file mode 100644 index 00000000000..6713c8408c6 --- /dev/null +++ b/queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch @@ -0,0 +1,52 @@ +From 9756e44fd4d283ebcc94df353642f322428b73de Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= +Date: Fri, 3 Sep 2021 10:27:18 +0800 +Subject: net: remove the unnecessary check in cipso_v4_doi_free + +From: 王贇 + +commit 9756e44fd4d283ebcc94df353642f322428b73de upstream. + +The commit 733c99ee8be9 ("net: fix NULL pointer reference in +cipso_v4_doi_free") was merged by a mistake, this patch try +to cleanup the mess. + +And we already have the commit e842cb60e8ac ("net: fix NULL +pointer reference in cipso_v4_doi_free") which fixed the root +cause of the issue mentioned in it's description. + +Suggested-by: Paul Moore +Signed-off-by: Michael Wang +Signed-off-by: David S. 
Miller +Signed-off-by: Greg Kroah-Hartman +--- + net/ipv4/cipso_ipv4.c | 18 ++++++++---------- + 1 file changed, 8 insertions(+), 10 deletions(-) + +--- a/net/ipv4/cipso_ipv4.c ++++ b/net/ipv4/cipso_ipv4.c +@@ -465,16 +465,14 @@ void cipso_v4_doi_free(struct cipso_v4_d + if (!doi_def) + return; + +- if (doi_def->map.std) { +- switch (doi_def->type) { +- case CIPSO_V4_MAP_TRANS: +- kfree(doi_def->map.std->lvl.cipso); +- kfree(doi_def->map.std->lvl.local); +- kfree(doi_def->map.std->cat.cipso); +- kfree(doi_def->map.std->cat.local); +- kfree(doi_def->map.std); +- break; +- } ++ switch (doi_def->type) { ++ case CIPSO_V4_MAP_TRANS: ++ kfree(doi_def->map.std->lvl.cipso); ++ kfree(doi_def->map.std->lvl.local); ++ kfree(doi_def->map.std->cat.cipso); ++ kfree(doi_def->map.std->cat.local); ++ kfree(doi_def->map.std); ++ break; + } + kfree(doi_def); + } diff --git a/queue-5.14/series b/queue-5.14/series index bcc8196e8d7..4d7a706408e 100644 --- a/queue-5.14/series +++ b/queue-5.14/series @@ -31,3 +31,10 @@ drm-etnaviv-reference-mmu-context-when-setting-up-hardware-state.patch drm-etnaviv-add-missing-mmu-context-put-when-reaping-mmu-mapping.patch s390-sclp-fix-secure-ipl-facility-detection.patch net-qrtr-revert-check-in-qrtr_endpoint_post.patch +x86-pat-pass-valid-address-to-sanitize_phys.patch +x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch +x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch +tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch +ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch +net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch +net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch diff --git a/queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch b/queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch new file mode 100644 index 00000000000..cb741abfee5 --- /dev/null +++ b/queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch @@ -0,0 +1,56 @@ +From 
cc19862ffe454a5b632ca202e5a51bfec9f89fd2 Mon Sep 17 00:00:00 2001 +From: Xin Long +Date: Fri, 23 Jul 2021 13:25:36 -0400 +Subject: tipc: fix an use-after-free issue in tipc_recvmsg + +From: Xin Long + +commit cc19862ffe454a5b632ca202e5a51bfec9f89fd2 upstream. + +syzbot reported an use-after-free crash: + + BUG: KASAN: use-after-free in tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 + Call Trace: + tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 + sock_recvmsg_nosec net/socket.c:943 [inline] + sock_recvmsg net/socket.c:961 [inline] + sock_recvmsg+0xca/0x110 net/socket.c:957 + tipc_conn_rcv_from_sock+0x162/0x2f0 net/tipc/topsrv.c:398 + tipc_conn_recv_work+0xeb/0x190 net/tipc/topsrv.c:421 + process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 + worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 + +As Hoang pointed out, it was caused by skb_cb->bytes_read still accessed +after calling tsk_advance_rx_queue() to free the skb in tipc_recvmsg(). + +This patch is to fix it by accessing skb_cb->bytes_read earlier than +calling tsk_advance_rx_queue(). + +Fixes: f4919ff59c28 ("tipc: keep the skb in rcv queue until the whole data is read") +Reported-by: syzbot+e6741b97d5552f97c24d@syzkaller.appspotmail.com +Signed-off-by: Xin Long +Acked-by: Jon Maloy +Signed-off-by: David S. 
Miller +Signed-off-by: Greg Kroah-Hartman +--- + net/tipc/socket.c | 8 +++++--- + 1 file changed, 5 insertions(+), 3 deletions(-) + +--- a/net/tipc/socket.c ++++ b/net/tipc/socket.c +@@ -1979,10 +1979,12 @@ static int tipc_recvmsg(struct socket *s + tipc_node_distr_xmit(sock_net(sk), &xmitq); + } + +- if (!skb_cb->bytes_read) +- tsk_advance_rx_queue(sk); ++ if (skb_cb->bytes_read) ++ goto exit; ++ ++ tsk_advance_rx_queue(sk); + +- if (likely(!connected) || skb_cb->bytes_read) ++ if (likely(!connected)) + goto exit; + + /* Send connection flow control advertisement when applicable */ diff --git a/queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch b/queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch new file mode 100644 index 00000000000..72e12336cfc --- /dev/null +++ b/queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch @@ -0,0 +1,166 @@ +From 81065b35e2486c024c7aa86caed452e1f01a59d4 Mon Sep 17 00:00:00 2001 +From: Tony Luck +Date: Mon, 13 Sep 2021 14:52:39 -0700 +Subject: x86/mce: Avoid infinite loop for copy from user recovery + +From: Tony Luck + +commit 81065b35e2486c024c7aa86caed452e1f01a59d4 upstream. + +There are two cases for machine check recovery: + +1) The machine check was triggered by ring3 (application) code. + This is the simpler case. The machine check handler simply queues + work to be executed on return to user. That code unmaps the page + from all users and arranges to send a SIGBUS to the task that + triggered the poison. + +2) The machine check was triggered in kernel code that is covered by + an exception table entry. In this case the machine check handler + still queues a work entry to unmap the page, etc. but this will + not be called right away because the #MC handler returns to the + fix up code address in the exception table entry. + +Problems occur if the kernel triggers another machine check before the +return to user processes the first queued work item. 
+ +Specifically, the work is queued using the ->mce_kill_me callback +structure in the task struct for the current thread. Attempting to queue +a second work item using this same callback results in a loop in the +linked list of work functions to call. So when the kernel does return to +user, it enters an infinite loop processing the same entry for ever. + +There are some legitimate scenarios where the kernel may take a second +machine check before returning to the user. + +1) Some code (e.g. futex) first tries a get_user() with page faults + disabled. If this fails, the code retries with page faults enabled + expecting that this will resolve the page fault. + +2) Copy from user code retries a copy in byte-at-time mode to check + whether any additional bytes can be copied. + +On the other side of the fence are some bad drivers that do not check +the return value from individual get_user() calls and may access +multiple user addresses without noticing that some/all calls have +failed. + +Fix by adding a counter (current->mce_count) to keep track of repeated +machine checks before task_work() is called. First machine check saves +the address information and calls task_work_add(). Subsequent machine +checks before that task_work call back is executed check that the address +is in the same page as the first machine check (since the callback will +offline exactly one page). + +Expected worst case is four machine checks before moving on (e.g. one +user access with page faults disabled, then a repeat to the same address +with page faults enabled ... repeat in copy tail bytes). Just in case +there is some code that loops forever enforce a limit of 10. + + [ bp: Massage commit message, drop noinstr, fix typo, extend panic + messages. 
] + +Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work") +Signed-off-by: Tony Luck +Signed-off-by: Borislav Petkov +Cc: +Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kernel/cpu/mce/core.c | 45 ++++++++++++++++++++++++++++++----------- + include/linux/sched.h | 1 + 2 files changed, 34 insertions(+), 12 deletions(-) + +--- a/arch/x86/kernel/cpu/mce/core.c ++++ b/arch/x86/kernel/cpu/mce/core.c +@@ -1253,6 +1253,9 @@ static void __mc_scan_banks(struct mce * + + static void kill_me_now(struct callback_head *ch) + { ++ struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me); ++ ++ p->mce_count = 0; + force_sig(SIGBUS); + } + +@@ -1262,6 +1265,7 @@ static void kill_me_maybe(struct callbac + int flags = MF_ACTION_REQUIRED; + int ret; + ++ p->mce_count = 0; + pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr); + + if (!p->mce_ripv) +@@ -1290,17 +1294,34 @@ static void kill_me_maybe(struct callbac + } + } + +-static void queue_task_work(struct mce *m, int kill_current_task) ++static void queue_task_work(struct mce *m, char *msg, int kill_current_task) + { +- current->mce_addr = m->addr; +- current->mce_kflags = m->kflags; +- current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV); +- current->mce_whole_page = whole_page(m); +- +- if (kill_current_task) +- current->mce_kill_me.func = kill_me_now; +- else +- current->mce_kill_me.func = kill_me_maybe; ++ int count = ++current->mce_count; ++ ++ /* First call, save all the details */ ++ if (count == 1) { ++ current->mce_addr = m->addr; ++ current->mce_kflags = m->kflags; ++ current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV); ++ current->mce_whole_page = whole_page(m); ++ ++ if (kill_current_task) ++ current->mce_kill_me.func = kill_me_now; ++ else ++ current->mce_kill_me.func = kill_me_maybe; ++ } ++ ++ /* Ten is likely overkill. 
Don't expect more than two faults before task_work() */ ++ if (count > 10) ++ mce_panic("Too many consecutive machine checks while accessing user data", m, msg); ++ ++ /* Second or later call, make sure page address matches the one from first call */ ++ if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT)) ++ mce_panic("Consecutive machine checks to different user pages", m, msg); ++ ++ /* Do not call task_work_add() more than once */ ++ if (count > 1) ++ return; + + task_work_add(current, &current->mce_kill_me, TWA_RESUME); + } +@@ -1438,7 +1459,7 @@ noinstr void do_machine_check(struct pt_ + /* If this triggers there is no way to recover. Die hard. */ + BUG_ON(!on_thread_stack() || !user_mode(regs)); + +- queue_task_work(&m, kill_current_task); ++ queue_task_work(&m, msg, kill_current_task); + + } else { + /* +@@ -1456,7 +1477,7 @@ noinstr void do_machine_check(struct pt_ + } + + if (m.kflags & MCE_IN_KERNEL_COPYIN) +- queue_task_work(&m, kill_current_task); ++ queue_task_work(&m, msg, kill_current_task); + } + out: + mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); +--- a/include/linux/sched.h ++++ b/include/linux/sched.h +@@ -1394,6 +1394,7 @@ struct task_struct { + mce_whole_page : 1, + __mce_reserved : 62; + struct callback_head mce_kill_me; ++ int mce_count; + #endif + + #ifdef CONFIG_KRETPROBES diff --git a/queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch b/queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch new file mode 100644 index 00000000000..f3522161c20 --- /dev/null +++ b/queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch @@ -0,0 +1,115 @@ +From 34b1999da935a33be6239226bfa6cd4f704c5c88 Mon Sep 17 00:00:00 2001 +From: Mike Rapoport +Date: Thu, 19 Aug 2021 16:27:17 +0300 +Subject: x86/mm: Fix kern_addr_valid() to cope with existing but not present entries + +From: Mike Rapoport + +commit 34b1999da935a33be6239226bfa6cd4f704c5c88 
upstream. + +Jiri Olsa reported a fault when running: + + # cat /proc/kallsyms | grep ksys_read + ffffffff8136d580 T ksys_read + # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore + + /proc/kcore: file format elf64-x86-64 + + Segmentation fault + + general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI + CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014 + RIP: 0010:kern_addr_valid + Call Trace: + read_kcore + ? rcu_read_lock_sched_held + ? rcu_read_lock_sched_held + ? rcu_read_lock_sched_held + ? trace_hardirqs_on + ? rcu_read_lock_sched_held + ? lock_acquire + ? lock_acquire + ? rcu_read_lock_sched_held + ? lock_acquire + ? rcu_read_lock_sched_held + ? rcu_read_lock_sched_held + ? rcu_read_lock_sched_held + ? lock_release + ? _raw_spin_unlock + ? __handle_mm_fault + ? rcu_read_lock_sched_held + ? lock_acquire + ? rcu_read_lock_sched_held + ? lock_release + proc_reg_read + ? vfs_read + vfs_read + ksys_read + do_syscall_64 + entry_SYSCALL_64_after_hwframe + +The fault happens because kern_addr_valid() dereferences existent but not +present PMD in the high kernel mappings. + +Such PMDs are created when free_kernel_image_pages() frees regions larger +than 2Mb. In this case, a part of the freed memory is mapped with PMDs and +the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will +mark the PMD as not present rather than wipe it completely. + +Have kern_addr_valid() check whether higher level page table entries are +present before trying to dereference them to fix this issue and to avoid +similar issues in the future. 
+ +Stable backporting note: +------------------------ + +Note that the stable marking is for all active stable branches because +there could be cases where pagetable entries exist but are not valid - +see 9a14aefc1d28 ("x86: cpa, fix lookup_address"), for example. So make +sure to be on the safe side here and use pXY_present() accessors rather +than pXY_none() which could #GP when accessing pages in the direct map. + +Also see: + + c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping") + +for more info. + +Reported-by: Jiri Olsa +Signed-off-by: Mike Rapoport +Signed-off-by: Borislav Petkov +Reviewed-by: David Hildenbrand +Acked-by: Dave Hansen +Tested-by: Jiri Olsa +Cc: # 4.4+ +Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/mm/init_64.c | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +--- a/arch/x86/mm/init_64.c ++++ b/arch/x86/mm/init_64.c +@@ -1433,18 +1433,18 @@ int kern_addr_valid(unsigned long addr) + return 0; + + p4d = p4d_offset(pgd, addr); +- if (p4d_none(*p4d)) ++ if (!p4d_present(*p4d)) + return 0; + + pud = pud_offset(p4d, addr); +- if (pud_none(*pud)) ++ if (!pud_present(*pud)) + return 0; + + if (pud_large(*pud)) + return pfn_valid(pud_pfn(*pud)); + + pmd = pmd_offset(pud, addr); +- if (pmd_none(*pmd)) ++ if (!pmd_present(*pmd)) + return 0; + + if (pmd_large(*pmd)) diff --git a/queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch b/queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch new file mode 100644 index 00000000000..ac64a24b277 --- /dev/null +++ b/queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch @@ -0,0 +1,57 @@ +From aeef8b5089b76852bd84889f2809e69a7cfb414e Mon Sep 17 00:00:00 2001 +From: Jeff Moyer +Date: Wed, 11 Aug 2021 17:07:37 -0400 +Subject: x86/pat: Pass valid address to sanitize_phys() + +From: Jeff Moyer + +commit aeef8b5089b76852bd84889f2809e69a7cfb414e upstream. 
+ +The end address passed to memtype_reserve() is handed directly to +sanitize_phys(). However, end is exclusive and sanitize_phys() expects +an inclusive address. If end falls at the end of the physical address +space, sanitize_phys() will return 0. This can result in drivers +failing to load, and the following warning: + + WARNING: CPU: 26 PID: 749 at arch/x86/mm/pat.c:354 reserve_memtype+0x262/0x450 + reserve_memtype failed: [mem 0x3ffffff00000-0xffffffffffffffff], req uncached-minus + Call Trace: + [] reserve_memtype+0x262/0x450 + [] ioremap_nocache+0x1a/0x20 + [] mpt3sas_base_map_resources+0x151/0xa60 [mpt3sas] + [] mpt3sas_base_attach+0xf5/0xa50 [mpt3sas] + ---[ end trace 6d6eea4438db89ef ]--- + ioremap reserve_memtype failed -22 + mpt3sas_cm0: unable to map adapter memory! or resource not found + mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10597/_scsih_probe()! + +Fix this by passing the inclusive end address to sanitize_phys(). + +Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses") +Signed-off-by: Jeff Moyer +Signed-off-by: Thomas Gleixner +Reviewed-by: David Hildenbrand +Reviewed-by: Dan Williams +Cc: stable@vger.kernel.org +Link: https://lore.kernel.org/r/x49o8a3pu5i.fsf@segfault.boston.devel.redhat.com +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/mm/pat/memtype.c | 7 ++++++- + 1 file changed, 6 insertions(+), 1 deletion(-) + +--- a/arch/x86/mm/pat/memtype.c ++++ b/arch/x86/mm/pat/memtype.c +@@ -583,7 +583,12 @@ int memtype_reserve(u64 start, u64 end, + int err = 0; + + start = sanitize_phys(start); +- end = sanitize_phys(end); ++ ++ /* ++ * The end address passed into this function is exclusive, but ++ * sanitize_phys() expects an inclusive address. ++ */ ++ end = sanitize_phys(end - 1) + 1; + if (start >= end) { + WARN(1, "%s failed: [mem %#010Lx-%#010Lx], req %s\n", __func__, + start, end - 1, cattr_name(req_type));
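
Illustration only, not part of the queued patches: the exclusive-versus-inclusive end address problem described in the x86/pat commit message above can be modelled in user space. The sketch below assumes that sanitizing amounts to masking off bits above the physical address width, and it picks a 46-bit width purely for the example. MODEL_PHYS_BITS, MODEL_PHYS_MASK, model_sanitize_phys(), model_end_before_fix() and model_end_after_fix() are hypothetical names and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical physical address width, chosen only for illustration. */
#define MODEL_PHYS_BITS 46
#define MODEL_PHYS_MASK ((UINT64_C(1) << MODEL_PHYS_BITS) - 1)

/* Assumed model of the sanitizer: keep only the valid physical address bits. */
static uint64_t model_sanitize_phys(uint64_t address)
{
	return address & MODEL_PHYS_MASK;
}

/* Behaviour before the fix: the exclusive end is sanitized directly. */
static uint64_t model_end_before_fix(uint64_t end_exclusive)
{
	return model_sanitize_phys(end_exclusive);
}

/*
 * Behaviour after the fix: sanitize the last byte inside the range
 * (an inclusive address), then add 1 to get back to an exclusive end.
 */
static uint64_t model_end_after_fix(uint64_t end_exclusive)
{
	assert(end_exclusive > 0); /* end - 1 would wrap for an empty range at 0 */
	return model_sanitize_phys(end_exclusive - 1) + 1;
}
```

Under these assumptions, a range that ends exactly at the top of the modelled address space (exclusive end 1 << 46) is masked to 0 by the old form, so a start >= end check would fire, while the new form keeps the end at 1 << 46. Ranges that end below the top give the same result with either form.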