--- /dev/null
+From 9b29a161ef38040f000dcf9ccf78e34495edfd55 Mon Sep 17 00:00:00 2001
+From: Saeed Mahameed <saeedm@nvidia.com>
+Date: Mon, 26 Jul 2021 15:15:39 -0700
+Subject: ethtool: Fix rxnfc copy to user buffer overflow
+
+From: Saeed Mahameed <saeedm@nvidia.com>
+
+commit 9b29a161ef38040f000dcf9ccf78e34495edfd55 upstream.
+
+In the cited commit, copy_to_user() was called with the wrong pointer:
+instead of the actual buffer to copy from, a pointer to that pointer
+was passed. This triggers a buffer overflow call trace when executing
+"ethtool -x ethX".
+
+Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed
+to the function, instead of a pointer to it.
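
Note: a rough userspace illustration of the bug, not kernel code. It assumes a hypothetical checked_copy() helper standing in for the object-size check behind copy_to_user(), and a hypothetical 192-byte struct fake_rxnfc standing in for struct ethtool_rxnfc (192 is the size reported in the trace). The address of a pointer variable refers to an object of only sizeof(void *) bytes, so the buggy variant is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ethtool_rxnfc (192 bytes in the trace). */
struct fake_rxnfc {
	unsigned char data[192];
};

/*
 * Hypothetical model of an object-size check: refuse to copy more bytes
 * than the source object is known to hold.
 * Returns 0 on success, -1 if the copy would read past the source object.
 */
static int checked_copy(void *dst, const void *src, size_t src_size, size_t n)
{
	assert(dst != NULL && src != NULL);
	if (n > src_size)
		return -1;	/* comparable to "Buffer overflow detected" */
	memcpy(dst, src, n);
	return 0;
}

/* Buggy variant: the source object is the pointer variable itself. */
static int copy_buggy(void *dst, struct fake_rxnfc *rxnfc, size_t n)
{
	return checked_copy(dst, &rxnfc, sizeof(rxnfc), n);
}

/* Fixed variant: the source object is the structure the pointer refers to. */
static int copy_fixed(void *dst, struct fake_rxnfc *rxnfc, size_t n)
{
	return checked_copy(dst, rxnfc, sizeof(*rxnfc), n);
}
```

With a 192-byte request, copy_buggy() fails because the source is only pointer-sized, while copy_fixed() succeeds.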
+
+This fixes the call trace below:
+[ 15.533533] ------------[ cut here ]------------
+[ 15.539007] Buffer overflow detected (8 < 192)!
+[ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20
+[ 15.549308] Modules linked in:
+[ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058
+[ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
+[ 15.558378] RIP: 0010:copy_overflow+0x15/0x20
+[ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55
+[ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286
+[ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
+[ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff
+[ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff
+[ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000
+[ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000
+[ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000
+[ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+[ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0
+[ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+[ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
+[ 15.582441] Call Trace:
+[ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46
+[ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b
+[ 15.584584] dev_ethtool+0x29c/0x25f0
+[ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0
+[ 15.586728] ? do_set_pte+0xc4/0x110
+[ 15.587349] ? _raw_spin_unlock+0x18/0x30
+[ 15.588118] ? __might_sleep+0x49/0x80
+[ 15.588956] dev_ioctl+0x2c1/0x490
+[ 15.589616] sock_ioctl+0x18e/0x330
+[ 15.591143] __x64_sys_ioctl+0x41c/0x990
+[ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20
+[ 15.592657] ? irqentry_exit+0x33/0x40
+[ 15.593308] ? exc_page_fault+0x32f/0x770
+[ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130
+[ 15.594775] do_syscall_64+0x35/0x80
+[ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[ 15.596037] RIP: 0033:0x7f5381226d4b
+[ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48
+[ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
+[ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b
+[ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003
+[ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001
+[ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000
+[ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000
+[ 15.605042] ---[ end trace 325cf185e2795048 ]---
+
+Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling")
+Reported-by: Shannon Nelson <snelson@pensando.io>
+CC: Arnd Bergmann <arnd@arndb.de>
+CC: Christoph Hellwig <hch@lst.de>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Tested-by: Shannon Nelson <snelson@pensando.io>
+Acked-by: Arnd Bergmann <arnd@arndb.de>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ethtool/ioctl.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/net/ethtool/ioctl.c
++++ b/net/ethtool/ioctl.c
+@@ -906,7 +906,7 @@ static int ethtool_rxnfc_copy_to_user(vo
+ rule_buf);
+ useraddr += offsetof(struct compat_ethtool_rxnfc, rule_locs);
+ } else {
+- ret = copy_to_user(useraddr, &rxnfc, size);
++ ret = copy_to_user(useraddr, rxnfc, size);
+ useraddr += offsetof(struct ethtool_rxnfc, rule_locs);
+ }
+
--- /dev/null
+From 7c3a0a018e672a9723a79b128227272562300055 Mon Sep 17 00:00:00 2001
+From: Eli Cohen <elic@nvidia.com>
+Date: Wed, 15 Sep 2021 07:47:27 +0300
+Subject: net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
+
+From: Eli Cohen <elic@nvidia.com>
+
+commit 7c3a0a018e672a9723a79b128227272562300055 upstream.
+
+Remove the assert from the callback priv lookup function since it does
+not require RTNL lock and is already protected by flow_indr_block_lock.
+
+This will avoid warnings from being emitted to dmesg if the driver
+registers its callback after an ingress qdisc was created for a
+netdevice.
+
+The warnings started after the following patch was merged:
+commit 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation")
+
+Signed-off-by: Eli Cohen <elic@nvidia.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 3 ---
+ drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 3 ---
+ drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 ---
+ 3 files changed, 9 deletions(-)
+
+--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
++++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+@@ -1870,9 +1870,6 @@ bnxt_tc_indr_block_cb_lookup(struct bnxt
+ {
+ struct bnxt_flower_indr_block_cb_priv *cb_priv;
+
+- /* All callback list access should be protected by RTNL. */
+- ASSERT_RTNL();
+-
+ list_for_each_entry(cb_priv, &bp->tc_indr_block_list, list)
+ if (cb_priv->tunnel_netdev == netdev)
+ return cb_priv;
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+@@ -300,9 +300,6 @@ mlx5e_rep_indr_block_priv_lookup(struct
+ {
+ struct mlx5e_rep_indr_block_priv *cb_priv;
+
+- /* All callback list access should be protected by RTNL. */
+- ASSERT_RTNL();
+-
+ list_for_each_entry(cb_priv,
+ &rpriv->uplink_priv.tc_indr_block_priv_list,
+ list)
+--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
++++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
+@@ -1766,9 +1766,6 @@ nfp_flower_indr_block_cb_priv_lookup(str
+ struct nfp_flower_indr_block_cb_priv *cb_priv;
+ struct nfp_flower_priv *priv = app->priv;
+
+- /* All callback list access should be protected by RTNL. */
+- ASSERT_RTNL();
+-
+ list_for_each_entry(cb_priv, &priv->indr_block_cb_priv, list)
+ if (cb_priv->netdev == netdev)
+ return cb_priv;
--- /dev/null
+From 9756e44fd4d283ebcc94df353642f322428b73de Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= <yun.wang@linux.alibaba.com>
+Date: Fri, 3 Sep 2021 10:27:18 +0800
+Subject: net: remove the unnecessary check in cipso_v4_doi_free
+
+From: 王贇 <yun.wang@linux.alibaba.com>
+
+commit 9756e44fd4d283ebcc94df353642f322428b73de upstream.
+
+Commit 733c99ee8be9 ("net: fix NULL pointer reference in
+cipso_v4_doi_free") was merged by mistake; this patch cleans
+up the mess.
+
+We already have commit e842cb60e8ac ("net: fix NULL
+pointer reference in cipso_v4_doi_free"), which fixed the root
+cause of the issue mentioned in its description.
+
+Suggested-by: Paul Moore <paul@paul-moore.com>
+Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/cipso_ipv4.c | 18 ++++++++----------
+ 1 file changed, 8 insertions(+), 10 deletions(-)
+
+--- a/net/ipv4/cipso_ipv4.c
++++ b/net/ipv4/cipso_ipv4.c
+@@ -465,16 +465,14 @@ void cipso_v4_doi_free(struct cipso_v4_d
+ if (!doi_def)
+ return;
+
+- if (doi_def->map.std) {
+- switch (doi_def->type) {
+- case CIPSO_V4_MAP_TRANS:
+- kfree(doi_def->map.std->lvl.cipso);
+- kfree(doi_def->map.std->lvl.local);
+- kfree(doi_def->map.std->cat.cipso);
+- kfree(doi_def->map.std->cat.local);
+- kfree(doi_def->map.std);
+- break;
+- }
++ switch (doi_def->type) {
++ case CIPSO_V4_MAP_TRANS:
++ kfree(doi_def->map.std->lvl.cipso);
++ kfree(doi_def->map.std->lvl.local);
++ kfree(doi_def->map.std->cat.cipso);
++ kfree(doi_def->map.std->cat.local);
++ kfree(doi_def->map.std);
++ break;
+ }
+ kfree(doi_def);
+ }
drm-etnaviv-add-missing-mmu-context-put-when-reaping-mmu-mapping.patch
s390-sclp-fix-secure-ipl-facility-detection.patch
net-qrtr-revert-check-in-qrtr_endpoint_post.patch
+x86-pat-pass-valid-address-to-sanitize_phys.patch
+x86-mm-fix-kern_addr_valid-to-cope-with-existing-but-not-present-entries.patch
+x86-mce-avoid-infinite-loop-for-copy-from-user-recovery.patch
+tipc-fix-an-use-after-free-issue-in-tipc_recvmsg.patch
+ethtool-fix-rxnfc-copy-to-user-buffer-overflow.patch
+net-remove-the-unnecessary-check-in-cipso_v4_doi_free.patch
+net-mlx5-nfp-bnxt-remove-unnecessary-rtnl-lock-assert.patch
--- /dev/null
+From cc19862ffe454a5b632ca202e5a51bfec9f89fd2 Mon Sep 17 00:00:00 2001
+From: Xin Long <lucien.xin@gmail.com>
+Date: Fri, 23 Jul 2021 13:25:36 -0400
+Subject: tipc: fix an use-after-free issue in tipc_recvmsg
+
+From: Xin Long <lucien.xin@gmail.com>
+
+commit cc19862ffe454a5b632ca202e5a51bfec9f89fd2 upstream.
+
+syzbot reported a use-after-free crash:
+
+ BUG: KASAN: use-after-free in tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979
+ Call Trace:
+ tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979
+ sock_recvmsg_nosec net/socket.c:943 [inline]
+ sock_recvmsg net/socket.c:961 [inline]
+ sock_recvmsg+0xca/0x110 net/socket.c:957
+ tipc_conn_rcv_from_sock+0x162/0x2f0 net/tipc/topsrv.c:398
+ tipc_conn_recv_work+0xeb/0x190 net/tipc/topsrv.c:421
+ process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
+ worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
+
+As Hoang pointed out, it was caused by skb_cb->bytes_read still being
+accessed after tsk_advance_rx_queue() had freed the skb in tipc_recvmsg().
+
+Fix it by checking skb_cb->bytes_read before calling
+tsk_advance_rx_queue().
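
Note: a minimal userspace sketch of the ordering this fix relies on, not the TIPC code itself. struct fake_skb, struct fake_queue, advance_rx_queue() and finish_recv() are hypothetical names; advance_rx_queue() only models the fact that tsk_advance_rx_queue() frees the buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the buffer and its control block. */
struct fake_skb {
	int bytes_read;
};

/* Hypothetical receive queue holding at most one buffer. */
struct fake_queue {
	struct fake_skb *head;
};

/* Models tsk_advance_rx_queue(): frees the buffer at the head of the queue. */
static void advance_rx_queue(struct fake_queue *q)
{
	free(q->head);
	q->head = NULL;
}

/*
 * Returns 1 if the buffer was fully consumed and released, 0 if it stays
 * queued because part of it is still unread.
 */
static int finish_recv(struct fake_queue *q)
{
	struct fake_skb *skb = q->head;

	assert(skb != NULL);

	/* Inspect the field while the buffer is still alive. */
	if (skb->bytes_read)
		return 0;

	/* The buffer is freed here; skb must not be touched afterwards. */
	advance_rx_queue(q);
	return 1;
}
```

The early return plays the role of the "goto exit" added by the patch: once the buffer has been released, no later condition reads from it.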
+
+Fixes: f4919ff59c28 ("tipc: keep the skb in rcv queue until the whole data is read")
+Reported-by: syzbot+e6741b97d5552f97c24d@syzkaller.appspotmail.com
+Signed-off-by: Xin Long <lucien.xin@gmail.com>
+Acked-by: Jon Maloy <jmaloy@redhat.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/tipc/socket.c | 8 +++++---
+ 1 file changed, 5 insertions(+), 3 deletions(-)
+
+--- a/net/tipc/socket.c
++++ b/net/tipc/socket.c
+@@ -1979,10 +1979,12 @@ static int tipc_recvmsg(struct socket *s
+ tipc_node_distr_xmit(sock_net(sk), &xmitq);
+ }
+
+- if (!skb_cb->bytes_read)
+- tsk_advance_rx_queue(sk);
++ if (skb_cb->bytes_read)
++ goto exit;
++
++ tsk_advance_rx_queue(sk);
+
+- if (likely(!connected) || skb_cb->bytes_read)
++ if (likely(!connected))
+ goto exit;
+
+ /* Send connection flow control advertisement when applicable */
--- /dev/null
+From 81065b35e2486c024c7aa86caed452e1f01a59d4 Mon Sep 17 00:00:00 2001
+From: Tony Luck <tony.luck@intel.com>
+Date: Mon, 13 Sep 2021 14:52:39 -0700
+Subject: x86/mce: Avoid infinite loop for copy from user recovery
+
+From: Tony Luck <tony.luck@intel.com>
+
+commit 81065b35e2486c024c7aa86caed452e1f01a59d4 upstream.
+
+There are two cases for machine check recovery:
+
+1) The machine check was triggered by ring3 (application) code.
+ This is the simpler case. The machine check handler simply queues
+ work to be executed on return to user. That code unmaps the page
+ from all users and arranges to send a SIGBUS to the task that
+ triggered the poison.
+
+2) The machine check was triggered in kernel code that is covered by
+ an exception table entry. In this case the machine check handler
+ still queues a work entry to unmap the page, etc. but this will
+ not be called right away because the #MC handler returns to the
+ fix up code address in the exception table entry.
+
+Problems occur if the kernel triggers another machine check before the
+return to user processes the first queued work item.
+
+Specifically, the work is queued using the ->mce_kill_me callback
+structure in the task struct for the current thread. Attempting to queue
+a second work item using this same callback results in a loop in the
+linked list of work functions to call. So when the kernel does return to
+user, it enters an infinite loop processing the same entry forever.
+
+There are some legitimate scenarios where the kernel may take a second
+machine check before returning to the user.
+
+1) Some code (e.g. futex) first tries a get_user() with page faults
+ disabled. If this fails, the code retries with page faults enabled
+ expecting that this will resolve the page fault.
+
+2) Copy from user code retries a copy in byte-at-time mode to check
+ whether any additional bytes can be copied.
+
+On the other side of the fence are some bad drivers that do not check
+the return value from individual get_user() calls and may access
+multiple user addresses without noticing that some/all calls have
+failed.
+
+Fix by adding a counter (current->mce_count) to keep track of repeated
+machine checks before task_work() is called. The first machine check saves
+the address information and calls task_work_add(). Subsequent machine
+checks before that task_work callback is executed check that the address
+is in the same page as the first machine check (since the callback will
+offline exactly one page).
+
+Expected worst case is four machine checks before moving on (e.g. one
+user access with page faults disabled, then a repeat to the same address
+with page faults enabled ... repeat in copy tail bytes). Just in case
+there is some code that loops forever, enforce a limit of 10.
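
Note: the counting rules described above can be sketched in userspace roughly as follows. This is only a model: struct task_state, note_machine_check() and the enum values are hypothetical, the page size is assumed to be 4 KiB, and where the kernel would call mce_panic() the sketch just returns a code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_MCE_LIMIT  10	/* limit named in the commit message */

enum mce_action {
	MCE_QUEUE_WORK,		/* first machine check: queue the task work */
	MCE_ALREADY_QUEUED,	/* repeat on the same page: nothing to queue */
	MCE_PANIC_TOO_MANY,	/* more than the limit before task work ran */
	MCE_PANIC_OTHER_PAGE,	/* repeat hit a different page */
};

/* Hypothetical stand-in for the relevant task_struct fields. */
struct task_state {
	int mce_count;
	uint64_t mce_addr;
};

static enum mce_action note_machine_check(struct task_state *t, uint64_t addr)
{
	int count;

	assert(t != NULL);
	count = ++t->mce_count;

	/* Only the first machine check records the address. */
	if (count == 1)
		t->mce_addr = addr;

	if (count > SKETCH_MCE_LIMIT)
		return MCE_PANIC_TOO_MANY;

	if (count > 1 &&
	    (t->mce_addr >> SKETCH_PAGE_SHIFT) != (addr >> SKETCH_PAGE_SHIFT))
		return MCE_PANIC_OTHER_PAGE;

	/* The work item must be queued exactly once. */
	return count > 1 ? MCE_ALREADY_QUEUED : MCE_QUEUE_WORK;
}

/* Models the task work callback, which resets the counter. */
static void task_work_ran(struct task_state *t)
{
	t->mce_count = 0;
}
```

Queueing only on the first call is what prevents the same callback_head from being linked into the work list twice.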
+
+ [ bp: Massage commit message, drop noinstr, fix typo, extend panic
+ messages. ]
+
+Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
+Signed-off-by: Tony Luck <tony.luck@intel.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Cc: <stable@vger.kernel.org>
+Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kernel/cpu/mce/core.c | 45 ++++++++++++++++++++++++++++++-----------
+ include/linux/sched.h | 1
+ 2 files changed, 34 insertions(+), 12 deletions(-)
+
+--- a/arch/x86/kernel/cpu/mce/core.c
++++ b/arch/x86/kernel/cpu/mce/core.c
+@@ -1253,6 +1253,9 @@ static void __mc_scan_banks(struct mce *
+
+ static void kill_me_now(struct callback_head *ch)
+ {
++ struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
++
++ p->mce_count = 0;
+ force_sig(SIGBUS);
+ }
+
+@@ -1262,6 +1265,7 @@ static void kill_me_maybe(struct callbac
+ int flags = MF_ACTION_REQUIRED;
+ int ret;
+
++ p->mce_count = 0;
+ pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
+
+ if (!p->mce_ripv)
+@@ -1290,17 +1294,34 @@ static void kill_me_maybe(struct callbac
+ }
+ }
+
+-static void queue_task_work(struct mce *m, int kill_current_task)
++static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
+ {
+- current->mce_addr = m->addr;
+- current->mce_kflags = m->kflags;
+- current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
+- current->mce_whole_page = whole_page(m);
+-
+- if (kill_current_task)
+- current->mce_kill_me.func = kill_me_now;
+- else
+- current->mce_kill_me.func = kill_me_maybe;
++ int count = ++current->mce_count;
++
++ /* First call, save all the details */
++ if (count == 1) {
++ current->mce_addr = m->addr;
++ current->mce_kflags = m->kflags;
++ current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
++ current->mce_whole_page = whole_page(m);
++
++ if (kill_current_task)
++ current->mce_kill_me.func = kill_me_now;
++ else
++ current->mce_kill_me.func = kill_me_maybe;
++ }
++
++ /* Ten is likely overkill. Don't expect more than two faults before task_work() */
++ if (count > 10)
++ mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
++
++ /* Second or later call, make sure page address matches the one from first call */
++ if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
++ mce_panic("Consecutive machine checks to different user pages", m, msg);
++
++ /* Do not call task_work_add() more than once */
++ if (count > 1)
++ return;
+
+ task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+ }
+@@ -1438,7 +1459,7 @@ noinstr void do_machine_check(struct pt_
+ /* If this triggers there is no way to recover. Die hard. */
+ BUG_ON(!on_thread_stack() || !user_mode(regs));
+
+- queue_task_work(&m, kill_current_task);
++ queue_task_work(&m, msg, kill_current_task);
+
+ } else {
+ /*
+@@ -1456,7 +1477,7 @@ noinstr void do_machine_check(struct pt_
+ }
+
+ if (m.kflags & MCE_IN_KERNEL_COPYIN)
+- queue_task_work(&m, kill_current_task);
++ queue_task_work(&m, msg, kill_current_task);
+ }
+ out:
+ mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1394,6 +1394,7 @@ struct task_struct {
+ mce_whole_page : 1,
+ __mce_reserved : 62;
+ struct callback_head mce_kill_me;
++ int mce_count;
+ #endif
+
+ #ifdef CONFIG_KRETPROBES
--- /dev/null
+From 34b1999da935a33be6239226bfa6cd4f704c5c88 Mon Sep 17 00:00:00 2001
+From: Mike Rapoport <rppt@linux.ibm.com>
+Date: Thu, 19 Aug 2021 16:27:17 +0300
+Subject: x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
+
+From: Mike Rapoport <rppt@linux.ibm.com>
+
+commit 34b1999da935a33be6239226bfa6cd4f704c5c88 upstream.
+
+Jiri Olsa reported a fault when running:
+
+ # cat /proc/kallsyms | grep ksys_read
+ ffffffff8136d580 T ksys_read
+ # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore
+
+ /proc/kcore: file format elf64-x86-64
+
+ Segmentation fault
+
+ general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
+ CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
+ RIP: 0010:kern_addr_valid
+ Call Trace:
+ read_kcore
+ ? rcu_read_lock_sched_held
+ ? rcu_read_lock_sched_held
+ ? rcu_read_lock_sched_held
+ ? trace_hardirqs_on
+ ? rcu_read_lock_sched_held
+ ? lock_acquire
+ ? lock_acquire
+ ? rcu_read_lock_sched_held
+ ? lock_acquire
+ ? rcu_read_lock_sched_held
+ ? rcu_read_lock_sched_held
+ ? rcu_read_lock_sched_held
+ ? lock_release
+ ? _raw_spin_unlock
+ ? __handle_mm_fault
+ ? rcu_read_lock_sched_held
+ ? lock_acquire
+ ? rcu_read_lock_sched_held
+ ? lock_release
+ proc_reg_read
+ ? vfs_read
+ vfs_read
+ ksys_read
+ do_syscall_64
+ entry_SYSCALL_64_after_hwframe
+
+The fault happens because kern_addr_valid() dereferences an existing but
+not present PMD in the high kernel mappings.
+
+Such PMDs are created when free_kernel_image_pages() frees regions larger
+than 2Mb. In this case, a part of the freed memory is mapped with PMDs and
+the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will
+mark the PMD as not present rather than wipe it completely.
+
+Have kern_addr_valid() check whether higher level page table entries are
+present before trying to dereference them to fix this issue and to avoid
+similar issues in the future.
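
Note: a deliberately simplified model of the difference between a "none" check and a "present" check. It assumes a single present bit (bit 0) and hypothetical helper names; the real pXY_none()/pXY_present() accessors on x86 consider more bits than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_PRESENT UINT64_C(0x1)	/* assumption: bit 0 means "present" */

typedef uint64_t sketch_entry_t;

/* Old check: follow any entry that is not completely empty. */
static int may_walk_old(const sketch_entry_t *entry)
{
	assert(entry != NULL);
	return *entry != 0;
}

/* New check: follow only entries that are marked present. */
static int may_walk_new(const sketch_entry_t *entry)
{
	assert(entry != NULL);
	return (*entry & SKETCH_PAGE_PRESENT) != 0;
}
```

An entry that still holds address bits but has its present bit cleared passes the old check and is rejected by the new one, which is the case the commit message describes.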
+
+Stable backporting note:
+------------------------
+
+Note that the stable marking is for all active stable branches because
+there could be cases where pagetable entries exist but are not valid -
+see 9a14aefc1d28 ("x86: cpa, fix lookup_address"), for example. So make
+sure to be on the safe side here and use pXY_present() accessors rather
+than pXY_none() which could #GP when accessing pages in the direct map.
+
+Also see:
+
+ c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping")
+
+for more info.
+
+Reported-by: Jiri Olsa <jolsa@redhat.com>
+Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Acked-by: Dave Hansen <dave.hansen@intel.com>
+Tested-by: Jiri Olsa <jolsa@redhat.com>
+Cc: <stable@vger.kernel.org> # 4.4+
+Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/mm/init_64.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/mm/init_64.c
++++ b/arch/x86/mm/init_64.c
+@@ -1433,18 +1433,18 @@ int kern_addr_valid(unsigned long addr)
+ return 0;
+
+ p4d = p4d_offset(pgd, addr);
+- if (p4d_none(*p4d))
++ if (!p4d_present(*p4d))
+ return 0;
+
+ pud = pud_offset(p4d, addr);
+- if (pud_none(*pud))
++ if (!pud_present(*pud))
+ return 0;
+
+ if (pud_large(*pud))
+ return pfn_valid(pud_pfn(*pud));
+
+ pmd = pmd_offset(pud, addr);
+- if (pmd_none(*pmd))
++ if (!pmd_present(*pmd))
+ return 0;
+
+ if (pmd_large(*pmd))
--- /dev/null
+From aeef8b5089b76852bd84889f2809e69a7cfb414e Mon Sep 17 00:00:00 2001
+From: Jeff Moyer <jmoyer@redhat.com>
+Date: Wed, 11 Aug 2021 17:07:37 -0400
+Subject: x86/pat: Pass valid address to sanitize_phys()
+
+From: Jeff Moyer <jmoyer@redhat.com>
+
+commit aeef8b5089b76852bd84889f2809e69a7cfb414e upstream.
+
+The end address passed to memtype_reserve() is handed directly to
+sanitize_phys(). However, end is exclusive and sanitize_phys() expects
+an inclusive address. If end falls at the end of the physical address
+space, sanitize_phys() will return 0. This can result in drivers
+failing to load, and the following warning:
+
+ WARNING: CPU: 26 PID: 749 at arch/x86/mm/pat.c:354 reserve_memtype+0x262/0x450
+ reserve_memtype failed: [mem 0x3ffffff00000-0xffffffffffffffff], req uncached-minus
+ Call Trace:
+ [<ffffffffa427b1f2>] reserve_memtype+0x262/0x450
+ [<ffffffffa42764aa>] ioremap_nocache+0x1a/0x20
+ [<ffffffffc04620a1>] mpt3sas_base_map_resources+0x151/0xa60 [mpt3sas]
+ [<ffffffffc0465555>] mpt3sas_base_attach+0xf5/0xa50 [mpt3sas]
+ ---[ end trace 6d6eea4438db89ef ]---
+ ioremap reserve_memtype failed -22
+ mpt3sas_cm0: unable to map adapter memory! or resource not found
+ mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10597/_scsih_probe()!
+
+Fix this by passing the inclusive end address to sanitize_phys().
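
Note: a small sketch of the off-by-one, assuming that sanitize_phys() reduces to masking off bits above the physical address width. The 46-bit mask is an assumption chosen to match the range printed in the warning above; all sketch_* and end_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 46 physical address bits, matching the range in the warning. */
#define SKETCH_PHYS_MASK ((UINT64_C(1) << 46) - 1)

/* Simplified model of sanitize_phys(): keep only valid physical address bits. */
static uint64_t sketch_sanitize_phys(uint64_t addr)
{
	return addr & SKETCH_PHYS_MASK;
}

/* Old behaviour: the exclusive end is masked directly. */
static uint64_t end_old(uint64_t end)
{
	return sketch_sanitize_phys(end);
}

/* Fixed behaviour: mask the last valid byte, then convert back to exclusive. */
static uint64_t end_fixed(uint64_t end)
{
	assert(end != 0);	/* end - 1 would wrap around otherwise */
	return sketch_sanitize_phys(end - 1) + 1;
}
```

For a region ending exactly at the top of the assumed address space (end = 1 << 46), end_old() yields 0, so "start >= end" holds and the reservation is rejected, while end_fixed() returns the original exclusive end.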
+
+Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses")
+Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Reviewed-by: Dan Williams <dan.j.williams@intel.com>
+Cc: stable@vger.kernel.org
+Link: https://lore.kernel.org/r/x49o8a3pu5i.fsf@segfault.boston.devel.redhat.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/mm/pat/memtype.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/mm/pat/memtype.c
++++ b/arch/x86/mm/pat/memtype.c
+@@ -583,7 +583,12 @@ int memtype_reserve(u64 start, u64 end,
+ int err = 0;
+
+ start = sanitize_phys(start);
+- end = sanitize_phys(end);
++
++ /*
++ * The end address passed into this function is exclusive, but
++ * sanitize_phys() expects an inclusive address.
++ */
++ end = sanitize_phys(end - 1) + 1;
+ if (start >= end) {
+ WARN(1, "%s failed: [mem %#010Lx-%#010Lx], req %s\n", __func__,
+ start, end - 1, cattr_name(req_type));