git.ipfire.org Git - thirdparty/kernel/stable-queue.git/commitdiff
5.14-stable patches
author    Greg Kroah-Hartman <gregkh@linuxfoundation.org>
          Mon, 20 Sep 2021 07:31:54 +0000 (09:31 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
          Mon, 20 Sep 2021 07:31:54 +0000 (09:31 +0200)
added patches:
ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch
net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch
net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch
tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch
x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch
x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch
x86-pat-pass-valid-address-to-sanitize_phys.patch

queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch [new file with mode: 0644]
queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch [new file with mode: 0644]
queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch [new file with mode: 0644]
queue-5.14/series
queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch [new file with mode: 0644]
queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch [new file with mode: 0644]
queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch [new file with mode: 0644]
queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch [new file with mode: 0644]

diff --git a/queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch b/queue-5.14/ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch
new file mode 100644
index 0000000..9aa4e6a
--- /dev/null
@@ -0,0 +1,88 @@
+From 9b29a161ef38040f000dcf9ccf78e34495edfd55 Mon Sep 17 00:00:00 2001
+From: Saeed Mahameed <saeedm@nvidia.com>
+Date: Mon, 26 Jul 2021 15:15:39 -0700
+Subject: ethtool: Fix rxnfc copy to user buffer overflow
+
+From: Saeed Mahameed <saeedm@nvidia.com>
+
+commit 9b29a161ef38040f000dcf9ccf78e34495edfd55 upstream.
+
+In the cited commit, copy_to_user() was called with the wrong pointer:
+instead of the actual buffer pointer to copy from, a pointer to that
+pointer was passed. This triggers a buffer overflow call trace when
+executing "ethtool -x ethX".
+
+Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed
+to the function, instead of a pointer to it.
+
+This fixes the call trace below:
+[   15.533533] ------------[ cut here ]------------
+[   15.539007] Buffer overflow detected (8 < 192)!
+[   15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20
+[   15.549308] Modules linked in:
+[   15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058
+[   15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
+[   15.558378] RIP: 0010:copy_overflow+0x15/0x20
+[   15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55
+[   15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286
+[   15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
+[   15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff
+[   15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff
+[   15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000
+[   15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000
+[   15.573584] FS:  00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000
+[   15.575639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+[   15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0
+[   15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+[   15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
+[   15.582441] Call Trace:
+[   15.582970]  ethtool_rxnfc_copy_to_user+0x30/0x46
+[   15.583815]  ethtool_get_rxnfc.cold+0x23/0x2b
+[   15.584584]  dev_ethtool+0x29c/0x25f0
+[   15.585286]  ? security_netlbl_sid_to_secattr+0x77/0xd0
+[   15.586728]  ? do_set_pte+0xc4/0x110
+[   15.587349]  ? _raw_spin_unlock+0x18/0x30
+[   15.588118]  ? __might_sleep+0x49/0x80
+[   15.588956]  dev_ioctl+0x2c1/0x490
+[   15.589616]  sock_ioctl+0x18e/0x330
+[   15.591143]  __x64_sys_ioctl+0x41c/0x990
+[   15.591823]  ? irqentry_exit_to_user_mode+0x9/0x20
+[   15.592657]  ? irqentry_exit+0x33/0x40
+[   15.593308]  ? exc_page_fault+0x32f/0x770
+[   15.593877]  ? exit_to_user_mode_prepare+0x3c/0x130
+[   15.594775]  do_syscall_64+0x35/0x80
+[   15.595397]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[   15.596037] RIP: 0033:0x7f5381226d4b
+[   15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48
+[   15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
+[   15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b
+[   15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003
+[   15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001
+[   15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000
+[   15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000
+[   15.605042] ---[ end trace 325cf185e2795048 ]---
+
+Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling")
+Reported-by: Shannon Nelson <snelson@pensando.io>
+CC: Arnd Bergmann <arnd@arndb.de>
+CC: Christoph Hellwig <hch@lst.de>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Tested-by: Shannon Nelson <snelson@pensando.io>
+Acked-by: Arnd Bergmann <arnd@arndb.de>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ethtool/ioctl.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/net/ethtool/ioctl.c
++++ b/net/ethtool/ioctl.c
+@@ -906,7 +906,7 @@ static int ethtool_rxnfc_copy_to_user(vo
+                                                  rule_buf);
+               useraddr += offsetof(struct compat_ethtool_rxnfc, rule_locs);
+       } else {
+-              ret = copy_to_user(useraddr, &rxnfc, size);
++              ret = copy_to_user(useraddr, rxnfc, size);
+               useraddr += offsetof(struct ethtool_rxnfc, rule_locs);
+       }
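The one-character fix above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct fake_rxnfc` is a hypothetical 192-byte stand-in (the size is taken from the "8 < 192" message in the trace), and plain memcpy() stands in for copy_to_user(). It shows that the object behind `&rxnfc` is only the pointer variable, while the object behind `rxnfc` is the full structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ethtool_rxnfc; 192 bytes as in the trace. */
struct fake_rxnfc {
	unsigned char bytes[192];
};

/* Size of the object that `&rxnfc` points at: the pointer variable itself. */
static size_t size_behind_address_of_pointer(const struct fake_rxnfc *rxnfc)
{
	return sizeof(rxnfc);
}

/* Size of the object that `rxnfc` points at: the structure. */
static size_t size_behind_pointer(const struct fake_rxnfc *rxnfc)
{
	return sizeof(*rxnfc);
}

/* Corrected copy: the source is the pointer value, not its address. */
static void copy_out(void *dst, const struct fake_rxnfc *rxnfc, size_t size)
{
	/* Never read more than the source object holds. */
	assert(size <= sizeof(*rxnfc));
	memcpy(dst, rxnfc, size);
}
```

On a 64-bit build the first helper returns 8 and the second 192, which matches the sizes reported by the overflow check in the trace.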
diff --git a/queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch b/queue-5.14/net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch
new file mode 100644
index 0000000..c3dfee5
--- /dev/null
@@ -0,0 +1,64 @@
+From 7c3a0a018e672a9723a79b128227272562300055 Mon Sep 17 00:00:00 2001
+From: Eli Cohen <elic@nvidia.com>
+Date: Wed, 15 Sep 2021 07:47:27 +0300
+Subject: net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
+
+From: Eli Cohen <elic@nvidia.com>
+
+commit 7c3a0a018e672a9723a79b128227272562300055 upstream.
+
+Remove the assert from the callback priv lookup function since it does
+not require RTNL lock and is already protected by flow_indr_block_lock.
+
+This will avoid warnings from being emitted to dmesg if the driver
+registers its callback after an ingress qdisc was created for a
+netdevice.
+
+The warnings started after the following patch was merged:
+commit 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation")
+
+Signed-off-by: Eli Cohen <elic@nvidia.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c        |    3 ---
+ drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c |    3 ---
+ drivers/net/ethernet/netronome/nfp/flower/offload.c |    3 ---
+ 3 files changed, 9 deletions(-)
+
+--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
++++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+@@ -1870,9 +1870,6 @@ bnxt_tc_indr_block_cb_lookup(struct bnxt
+ {
+       struct bnxt_flower_indr_block_cb_priv *cb_priv;
+-      /* All callback list access should be protected by RTNL. */
+-      ASSERT_RTNL();
+-
+       list_for_each_entry(cb_priv, &bp->tc_indr_block_list, list)
+               if (cb_priv->tunnel_netdev == netdev)
+                       return cb_priv;
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+@@ -300,9 +300,6 @@ mlx5e_rep_indr_block_priv_lookup(struct
+ {
+       struct mlx5e_rep_indr_block_priv *cb_priv;
+-      /* All callback list access should be protected by RTNL. */
+-      ASSERT_RTNL();
+-
+       list_for_each_entry(cb_priv,
+                           &rpriv->uplink_priv.tc_indr_block_priv_list,
+                           list)
+--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
++++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
+@@ -1766,9 +1766,6 @@ nfp_flower_indr_block_cb_priv_lookup(str
+       struct nfp_flower_indr_block_cb_priv *cb_priv;
+       struct nfp_flower_priv *priv = app->priv;
+-      /* All callback list access should be protected by RTNL. */
+-      ASSERT_RTNL();
+-
+       list_for_each_entry(cb_priv, &priv->indr_block_cb_priv, list)
+               if (cb_priv->netdev == netdev)
+                       return cb_priv;
diff --git a/queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch b/queue-5.14/net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch
new file mode 100644
index 0000000..6713c84
--- /dev/null
@@ -0,0 +1,52 @@
+From 9756e44fd4d283ebcc94df353642f322428b73de Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= <yun.wang@linux.alibaba.com>
+Date: Fri, 3 Sep 2021 10:27:18 +0800
+Subject: net: remove the unnecessary check in cipso_v4_doi_free
+
+From: 王贇 <yun.wang@linux.alibaba.com>
+
+commit 9756e44fd4d283ebcc94df353642f322428b73de upstream.
+
+Commit 733c99ee8be9 ("net: fix NULL pointer reference in
+cipso_v4_doi_free") was merged by mistake; this patch tries
+to clean up the mess.
+
+Commit e842cb60e8ac ("net: fix NULL
+pointer reference in cipso_v4_doi_free") already fixed the root
+cause of the issue mentioned in its description.
+
+Suggested-by: Paul Moore <paul@paul-moore.com>
+Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/cipso_ipv4.c |   18 ++++++++----------
+ 1 file changed, 8 insertions(+), 10 deletions(-)
+
+--- a/net/ipv4/cipso_ipv4.c
++++ b/net/ipv4/cipso_ipv4.c
+@@ -465,16 +465,14 @@ void cipso_v4_doi_free(struct cipso_v4_d
+       if (!doi_def)
+               return;
+-      if (doi_def->map.std) {
+-              switch (doi_def->type) {
+-              case CIPSO_V4_MAP_TRANS:
+-                      kfree(doi_def->map.std->lvl.cipso);
+-                      kfree(doi_def->map.std->lvl.local);
+-                      kfree(doi_def->map.std->cat.cipso);
+-                      kfree(doi_def->map.std->cat.local);
+-                      kfree(doi_def->map.std);
+-                      break;
+-              }
++      switch (doi_def->type) {
++      case CIPSO_V4_MAP_TRANS:
++              kfree(doi_def->map.std->lvl.cipso);
++              kfree(doi_def->map.std->lvl.local);
++              kfree(doi_def->map.std->cat.cipso);
++              kfree(doi_def->map.std->cat.local);
++              kfree(doi_def->map.std);
++              break;
+       }
+       kfree(doi_def);
+ }
diff --git a/queue-5.14/series b/queue-5.14/series
index bcc8196e8d710b4b12b60271160eb5bdc1956c27..4d7a706408e724894d2b85f07e3a04a15e75a6bd 100644
@@ -31,3 +31,10 @@ drm-etnaviv-reference-mmu-context-when-setting-up-hardware-state.patch
 drm-etnaviv-add-missing-mmu-context-put-when-reaping-mmu-mapping.patch
 s390-sclp-fix-secure-ipl-facility-detection.patch
 net-qrtr-revert-check-in-qrtr_endpoint_post.patch
+x86-pat-pass-valid-address-to-sanitize_phys.patch
+x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch
+x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch
+tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch
+ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch
+net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch
+net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch
diff --git a/queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch b/queue-5.14/tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch
new file mode 100644
index 0000000..cb741ab
--- /dev/null
@@ -0,0 +1,56 @@
+From cc19862ffe454a5b632ca202e5a51bfec9f89fd2 Mon Sep 17 00:00:00 2001
+From: Xin Long <lucien.xin@gmail.com>
+Date: Fri, 23 Jul 2021 13:25:36 -0400
+Subject: tipc: fix an use-after-free issue in tipc_recvmsg
+
+From: Xin Long <lucien.xin@gmail.com>
+
+commit cc19862ffe454a5b632ca202e5a51bfec9f89fd2 upstream.
+
+syzbot reported a use-after-free crash:
+
+  BUG: KASAN: use-after-free in tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979
+  Call Trace:
+   tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979
+   sock_recvmsg_nosec net/socket.c:943 [inline]
+   sock_recvmsg net/socket.c:961 [inline]
+   sock_recvmsg+0xca/0x110 net/socket.c:957
+   tipc_conn_rcv_from_sock+0x162/0x2f0 net/tipc/topsrv.c:398
+   tipc_conn_recv_work+0xeb/0x190 net/tipc/topsrv.c:421
+   process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
+   worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
+
+As Hoang pointed out, it was caused by skb_cb->bytes_read still being
+accessed after tsk_advance_rx_queue() freed the skb in tipc_recvmsg().
+
+Fix it by checking skb_cb->bytes_read before calling
+tsk_advance_rx_queue().
+
+Fixes: f4919ff59c28 ("tipc: keep the skb in rcv queue until the whole data is read")
+Reported-by: syzbot+e6741b97d5552f97c24d@syzkaller.appspotmail.com
+Signed-off-by: Xin Long <lucien.xin@gmail.com>
+Acked-by: Jon Maloy <jmaloy@redhat.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/tipc/socket.c |    8 +++++---
+ 1 file changed, 5 insertions(+), 3 deletions(-)
+
+--- a/net/tipc/socket.c
++++ b/net/tipc/socket.c
+@@ -1979,10 +1979,12 @@ static int tipc_recvmsg(struct socket *s
+               tipc_node_distr_xmit(sock_net(sk), &xmitq);
+       }
+-      if (!skb_cb->bytes_read)
+-              tsk_advance_rx_queue(sk);
++      if (skb_cb->bytes_read)
++              goto exit;
++
++      tsk_advance_rx_queue(sk);
+-      if (likely(!connected) || skb_cb->bytes_read)
++      if (likely(!connected))
+               goto exit;
+       /* Send connection flow control advertisement when applicable */
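The reordering in this hunk follows a general rule: read everything needed from a buffer before the call that may free it. The following is a minimal user-space sketch of that rule, not the TIPC code. `struct fake_skb`, `struct fake_queue`, `advance_rx_queue()` and `finish_read()` are hypothetical names, and malloc()/free() stand in for skb allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the skb and the socket receive queue. */
struct fake_skb {
	int bytes_read;
};

struct fake_queue {
	struct fake_skb *head;
};

/* Frees the head buffer, like tsk_advance_rx_queue() does for the skb. */
static void advance_rx_queue(struct fake_queue *q)
{
	free(q->head);
	q->head = NULL;
}

/* Returns true if the head buffer was fully consumed and released. */
static bool finish_read(struct fake_queue *q)
{
	assert(q->head != NULL);

	/* Check the field first, while the buffer is still alive. */
	if (q->head->bytes_read)
		return false;

	advance_rx_queue(q);
	/* The old buffer must not be touched from here on. */
	return true;
}
```

The buggy ordering corresponds to calling advance_rx_queue() first and testing bytes_read afterwards, which reads freed memory.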
diff --git a/queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch b/queue-5.14/x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch
new file mode 100644
index 0000000..72e1233
--- /dev/null
@@ -0,0 +1,166 @@
+From 81065b35e2486c024c7aa86caed452e1f01a59d4 Mon Sep 17 00:00:00 2001
+From: Tony Luck <tony.luck@intel.com>
+Date: Mon, 13 Sep 2021 14:52:39 -0700
+Subject: x86/mce: Avoid infinite loop for copy from user recovery
+
+From: Tony Luck <tony.luck@intel.com>
+
+commit 81065b35e2486c024c7aa86caed452e1f01a59d4 upstream.
+
+There are two cases for machine check recovery:
+
+1) The machine check was triggered by ring3 (application) code.
+   This is the simpler case. The machine check handler simply queues
+   work to be executed on return to user. That code unmaps the page
+   from all users and arranges to send a SIGBUS to the task that
+   triggered the poison.
+
+2) The machine check was triggered in kernel code that is covered by
+   an exception table entry. In this case the machine check handler
+   still queues a work entry to unmap the page, etc. but this will
+   not be called right away because the #MC handler returns to the
+   fix up code address in the exception table entry.
+
+Problems occur if the kernel triggers another machine check before the
+return to user processes the first queued work item.
+
+Specifically, the work is queued using the ->mce_kill_me callback
+structure in the task struct for the current thread. Attempting to queue
+a second work item using this same callback results in a loop in the
+linked list of work functions to call. So when the kernel does return to
+user, it enters an infinite loop processing the same entry forever.
+
+There are some legitimate scenarios where the kernel may take a second
+machine check before returning to the user.
+
+1) Some code (e.g. futex) first tries a get_user() with page faults
+   disabled. If this fails, the code retries with page faults enabled
+   expecting that this will resolve the page fault.
+
+2) Copy from user code retries a copy in byte-at-a-time mode to check
+   whether any additional bytes can be copied.
+
+On the other side of the fence are some bad drivers that do not check
+the return value from individual get_user() calls and may access
+multiple user addresses without noticing that some/all calls have
+failed.
+
+Fix by adding a counter (current->mce_count) to keep track of repeated
+machine checks before task_work() is called. First machine check saves
+the address information and calls task_work_add(). Subsequent machine
+checks before that task_work call back is executed check that the address
+is in the same page as the first machine check (since the callback will
+offline exactly one page).
+
+Expected worst case is four machine checks before moving on (e.g. one
+user access with page faults disabled, then a repeat to the same address
+with page faults enabled ... repeat in copy tail bytes). Just in case
+there is some code that loops forever, enforce a limit of 10.
+
+ [ bp: Massage commit message, drop noinstr, fix typo, extend panic
+   messages. ]
+
+Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
+Signed-off-by: Tony Luck <tony.luck@intel.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Cc: <stable@vger.kernel.org>
+Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kernel/cpu/mce/core.c |   45 ++++++++++++++++++++++++++++++-----------
+ include/linux/sched.h          |    1 
+ 2 files changed, 34 insertions(+), 12 deletions(-)
+
+--- a/arch/x86/kernel/cpu/mce/core.c
++++ b/arch/x86/kernel/cpu/mce/core.c
+@@ -1253,6 +1253,9 @@ static void __mc_scan_banks(struct mce *
+ static void kill_me_now(struct callback_head *ch)
+ {
++      struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
++
++      p->mce_count = 0;
+       force_sig(SIGBUS);
+ }
+@@ -1262,6 +1265,7 @@ static void kill_me_maybe(struct callbac
+       int flags = MF_ACTION_REQUIRED;
+       int ret;
++      p->mce_count = 0;
+       pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
+       if (!p->mce_ripv)
+@@ -1290,17 +1294,34 @@ static void kill_me_maybe(struct callbac
+       }
+ }
+-static void queue_task_work(struct mce *m, int kill_current_task)
++static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
+ {
+-      current->mce_addr = m->addr;
+-      current->mce_kflags = m->kflags;
+-      current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
+-      current->mce_whole_page = whole_page(m);
+-
+-      if (kill_current_task)
+-              current->mce_kill_me.func = kill_me_now;
+-      else
+-              current->mce_kill_me.func = kill_me_maybe;
++      int count = ++current->mce_count;
++
++      /* First call, save all the details */
++      if (count == 1) {
++              current->mce_addr = m->addr;
++              current->mce_kflags = m->kflags;
++              current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
++              current->mce_whole_page = whole_page(m);
++
++              if (kill_current_task)
++                      current->mce_kill_me.func = kill_me_now;
++              else
++                      current->mce_kill_me.func = kill_me_maybe;
++      }
++
++      /* Ten is likely overkill. Don't expect more than two faults before task_work() */
++      if (count > 10)
++              mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
++
++      /* Second or later call, make sure page address matches the one from first call */
++      if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
++              mce_panic("Consecutive machine checks to different user pages", m, msg);
++
++      /* Do not call task_work_add() more than once */
++      if (count > 1)
++              return;
+       task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+ }
+@@ -1438,7 +1459,7 @@ noinstr void do_machine_check(struct pt_
+               /* If this triggers there is no way to recover. Die hard. */
+               BUG_ON(!on_thread_stack() || !user_mode(regs));
+-              queue_task_work(&m, kill_current_task);
++              queue_task_work(&m, msg, kill_current_task);
+       } else {
+               /*
+@@ -1456,7 +1477,7 @@ noinstr void do_machine_check(struct pt_
+               }
+               if (m.kflags & MCE_IN_KERNEL_COPYIN)
+-                      queue_task_work(&m, kill_current_task);
++                      queue_task_work(&m, msg, kill_current_task);
+       }
+ out:
+       mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1394,6 +1394,7 @@ struct task_struct {
+                                       mce_whole_page : 1,
+                                       __mce_reserved : 62;
+       struct callback_head            mce_kill_me;
++      int                             mce_count;
+ #endif
+ #ifdef CONFIG_KRETPROBES
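The counting logic added to queue_task_work() can be read as a small decision procedure. The following is a user-space sketch of it under stated assumptions: `struct fake_task`, `enum mce_action`, `on_machine_check()` and `on_task_work_run()` are hypothetical names, a 4 KiB page size is assumed, and the panic paths are modelled as return values rather than calls to mce_panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define FAKE_MCE_LIMIT  10	/* limit taken from the patch */

enum mce_action {
	MCE_QUEUE_WORK,		/* first machine check: queue the task work */
	MCE_ALREADY_QUEUED,	/* repeat on the same page: nothing to add */
	MCE_PANIC_TOO_MANY,	/* more than FAKE_MCE_LIMIT repeats */
	MCE_PANIC_OTHER_PAGE,	/* repeat hit a different page */
};

/* Hypothetical stand-in for the mce_* fields in struct task_struct. */
struct fake_task {
	int mce_count;
	uint64_t mce_addr;
};

/* Mirrors the order of the checks in the patched queue_task_work(). */
static enum mce_action on_machine_check(struct fake_task *t, uint64_t addr)
{
	assert(t != NULL);
	int count = ++t->mce_count;

	/* First call: remember the faulting address. */
	if (count == 1)
		t->mce_addr = addr;

	if (count > FAKE_MCE_LIMIT)
		return MCE_PANIC_TOO_MANY;

	/* Later calls must hit the page recorded by the first call. */
	if (count > 1 &&
	    (t->mce_addr >> FAKE_PAGE_SHIFT) != (addr >> FAKE_PAGE_SHIFT))
		return MCE_PANIC_OTHER_PAGE;

	/* The work item is queued exactly once. */
	if (count > 1)
		return MCE_ALREADY_QUEUED;

	return MCE_QUEUE_WORK;
}

/* The task-work callbacks reset the counter, as kill_me_*() do in the patch. */
static void on_task_work_run(struct fake_task *t)
{
	t->mce_count = 0;
}
```

Queuing only on the first call is what prevents the callback_head from being linked into the task-work list twice, which was the source of the infinite loop.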
diff --git a/queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch b/queue-5.14/x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch
new file mode 100644
index 0000000..f352216
--- /dev/null
@@ -0,0 +1,115 @@
+From 34b1999da935a33be6239226bfa6cd4f704c5c88 Mon Sep 17 00:00:00 2001
+From: Mike Rapoport <rppt@linux.ibm.com>
+Date: Thu, 19 Aug 2021 16:27:17 +0300
+Subject: x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
+
+From: Mike Rapoport <rppt@linux.ibm.com>
+
+commit 34b1999da935a33be6239226bfa6cd4f704c5c88 upstream.
+
+Jiri Olsa reported a fault when running:
+
+  # cat /proc/kallsyms | grep ksys_read
+  ffffffff8136d580 T ksys_read
+  # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore
+
+  /proc/kcore:     file format elf64-x86-64
+
+  Segmentation fault
+
+  general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
+  CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
+  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
+  RIP: 0010:kern_addr_valid
+  Call Trace:
+   read_kcore
+   ? rcu_read_lock_sched_held
+   ? rcu_read_lock_sched_held
+   ? rcu_read_lock_sched_held
+   ? trace_hardirqs_on
+   ? rcu_read_lock_sched_held
+   ? lock_acquire
+   ? lock_acquire
+   ? rcu_read_lock_sched_held
+   ? lock_acquire
+   ? rcu_read_lock_sched_held
+   ? rcu_read_lock_sched_held
+   ? rcu_read_lock_sched_held
+   ? lock_release
+   ? _raw_spin_unlock
+   ? __handle_mm_fault
+   ? rcu_read_lock_sched_held
+   ? lock_acquire
+   ? rcu_read_lock_sched_held
+   ? lock_release
+   proc_reg_read
+   ? vfs_read
+   vfs_read
+   ksys_read
+   do_syscall_64
+   entry_SYSCALL_64_after_hwframe
+
+The fault happens because kern_addr_valid() dereferences an existing but
+not present PMD in the high kernel mappings.
+
+Such PMDs are created when free_kernel_image_pages() frees regions larger
+than 2MB. In this case, a part of the freed memory is mapped with PMDs and
+the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will
+mark the PMD as not present rather than wipe it completely.
+
+Have kern_addr_valid() check whether higher level page table entries are
+present before trying to dereference them to fix this issue and to avoid
+similar issues in the future.
+
+Stable backporting note:
+------------------------
+
+Note that the stable marking is for all active stable branches because
+there could be cases where pagetable entries exist but are not valid -
+see 9a14aefc1d28 ("x86: cpa, fix lookup_address"), for example. So make
+sure to be on the safe side here and use pXY_present() accessors rather
+than pXY_none() which could #GP when accessing pages in the direct map.
+
+Also see:
+
+  c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping")
+
+for more info.
+
+Reported-by: Jiri Olsa <jolsa@redhat.com>
+Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Acked-by: Dave Hansen <dave.hansen@intel.com>
+Tested-by: Jiri Olsa <jolsa@redhat.com>
+Cc: <stable@vger.kernel.org>   # 4.4+
+Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/mm/init_64.c |    6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/mm/init_64.c
++++ b/arch/x86/mm/init_64.c
+@@ -1433,18 +1433,18 @@ int kern_addr_valid(unsigned long addr)
+               return 0;
+       p4d = p4d_offset(pgd, addr);
+-      if (p4d_none(*p4d))
++      if (!p4d_present(*p4d))
+               return 0;
+       pud = pud_offset(p4d, addr);
+-      if (pud_none(*pud))
++      if (!pud_present(*pud))
+               return 0;
+       if (pud_large(*pud))
+               return pfn_valid(pud_pfn(*pud));
+       pmd = pmd_offset(pud, addr);
+-      if (pmd_none(*pmd))
++      if (!pmd_present(*pmd))
+               return 0;
+       if (pmd_large(*pmd))
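The difference between the two checks in this hunk can be shown with plain integers. The following is a deliberately simplified user-space sketch: `fake_entry_none()` and `fake_entry_present()` are hypothetical helpers, entries are modelled as 64-bit values, and only bit 0 is treated as the present bit. The real pXY_none() and pXY_present() accessors consider additional bits, so this illustrates the idea rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_PRESENT UINT64_C(0x1)	/* simplified: present bit only */

/* "none" means the entry is completely empty. */
static bool fake_entry_none(uint64_t entry)
{
	return entry == 0;
}

/* "present" means the entry may be followed to the next level. */
static bool fake_entry_present(uint64_t entry)
{
	return (entry & FAKE_PAGE_PRESENT) != 0;
}

/* Old check: descend whenever the entry is not empty. */
static bool may_descend_old(const uint64_t *table, size_t len, size_t idx)
{
	assert(idx < len);
	return !fake_entry_none(table[idx]);
}

/* New check: descend only when the entry is marked present. */
static bool may_descend_new(const uint64_t *table, size_t len, size_t idx)
{
	assert(idx < len);
	return fake_entry_present(table[idx]);
}
```

An entry that still holds an address but has its present bit cleared passes the old check and fails the new one, which is the case the commit message describes.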
diff --git a/queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch b/queue-5.14/x86-pat-pass-valid-address-to-sanitize_phys.patch
new file mode 100644 (file)
index 0000000..ac64a24
--- /dev/null
@@ -0,0 +1,57 @@
+From aeef8b5089b76852bd84889f2809e69a7cfb414e Mon Sep 17 00:00:00 2001
+From: Jeff Moyer <jmoyer@redhat.com>
+Date: Wed, 11 Aug 2021 17:07:37 -0400
+Subject: x86/pat: Pass valid address to sanitize_phys()
+
+From: Jeff Moyer <jmoyer@redhat.com>
+
+commit aeef8b5089b76852bd84889f2809e69a7cfb414e upstream.
+
+The end address passed to memtype_reserve() is handed directly to
+sanitize_phys().  However, end is exclusive and sanitize_phys() expects
+an inclusive address.  If end falls at the end of the physical address
+space, sanitize_phys() will return 0.  This can result in drivers
+failing to load, and the following warning:
+
+ WARNING: CPU: 26 PID: 749 at arch/x86/mm/pat.c:354 reserve_memtype+0x262/0x450
+ reserve_memtype failed: [mem 0x3ffffff00000-0xffffffffffffffff], req uncached-minus
+ Call Trace:
+  [<ffffffffa427b1f2>] reserve_memtype+0x262/0x450
+  [<ffffffffa42764aa>] ioremap_nocache+0x1a/0x20
+  [<ffffffffc04620a1>] mpt3sas_base_map_resources+0x151/0xa60 [mpt3sas]
+  [<ffffffffc0465555>] mpt3sas_base_attach+0xf5/0xa50 [mpt3sas]
+ ---[ end trace 6d6eea4438db89ef ]---
+ ioremap reserve_memtype failed -22
+ mpt3sas_cm0: unable to map adapter memory! or resource not found
+ mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10597/_scsih_probe()!
+
+Fix this by passing the inclusive end address to sanitize_phys().
+
+Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses")
+Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Reviewed-by: Dan Williams <dan.j.williams@intel.com>
+Cc: stable@vger.kernel.org
+Link: https://lore.kernel.org/r/x49o8a3pu5i.fsf@segfault.boston.devel.redhat.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/mm/pat/memtype.c |    7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/mm/pat/memtype.c
++++ b/arch/x86/mm/pat/memtype.c
+@@ -583,7 +583,12 @@ int memtype_reserve(u64 start, u64 end,
+       int err = 0;
+       start = sanitize_phys(start);
+-      end = sanitize_phys(end);
++
++      /*
++       * The end address passed into this function is exclusive, but
++       * sanitize_phys() expects an inclusive address.
++       */
++      end = sanitize_phys(end - 1) + 1;
+       if (start >= end) {
+               WARN(1, "%s failed: [mem %#010Lx-%#010Lx], req %s\n", __func__,
+                               start, end - 1, cattr_name(req_type));
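The off-by-one described in this patch can be reproduced with simple arithmetic. The following is a user-space sketch under assumptions: `fake_sanitize_phys()` models sanitize_phys() as a plain mask, and a 46-bit physical address width is assumed because it is consistent with the range printed in the warning. Neither is taken from the kernel source shown here.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_PHYS_BITS 46	/* assumed physical address width */
#define FAKE_PHYS_MASK ((UINT64_C(1) << FAKE_PHYS_BITS) - 1)

/* Simplified model: keep only the valid physical address bits. */
static uint64_t fake_sanitize_phys(uint64_t addr)
{
	return addr & FAKE_PHYS_MASK;
}

/* Old behaviour: the exclusive end address is masked directly. */
static uint64_t end_old(uint64_t end)
{
	return fake_sanitize_phys(end);
}

/* New behaviour: mask the last valid byte, then convert back to exclusive. */
static uint64_t end_new(uint64_t end)
{
	assert(end != 0);	/* end - 1 would wrap for an empty range */
	return fake_sanitize_phys(end - 1) + 1;
}
```

With an exclusive end of 0x400000000000 (the top of a 46-bit address space) the old form yields 0, so the later `start >= end` test fails and `end - 1` prints as 0xffffffffffffffff, as in the warning. The new form returns the end address unchanged.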