Fixes for 6.1

author Sasha Levin <sashal@kernel.org>

Sat, 27 Jan 2024 12:47:05 +0000 (07:47 -0500)

committer Sasha Levin <sashal@kernel.org>

Sat, 27 Jan 2024 12:47:05 +0000 (07:47 -0500)
diff --git a/queue-6.1/afs-hide-silly-rename-files-from-userspace.patch b/queue-6.1/afs-hide-silly-rename-files-from-userspace.patch

new file mode 100644

index 0000000..5ed51be
--- /dev/null
+++ b/queue-6.1/afs-hide-silly-rename-files-from-userspace.patch
@@ -0,0 +1,54 @@
+From 8ceb82e37993e798c148af906dcd37daf8e589d8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 8 Jan 2024 17:22:36 +0000
+Subject: afs: Hide silly-rename files from userspace
+
+From: David Howells <dhowells@redhat.com>
+
+[ Upstream commit 57e9d49c54528c49b8bffe6d99d782ea051ea534 ]
+
+There appears to be a race between silly-rename files being created/removed
+and various userspace tools iterating over the contents of a directory,
+leading to such errors as:
+
+       find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory
+       tar: ./include/linux/greybus/.__afs3C95: File removed before we read it
+
+when building a kernel.
+
+Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files
+to userspace.  This doesn't stop them being looked up directly by name as
+we need to be able to look them up from within the kernel as part of the
+silly-rename algorithm.
+
+Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename")
+Signed-off-by: David Howells <dhowells@redhat.com>
+cc: Marc Dionne <marc.dionne@auristor.com>
+cc: linux-afs@lists.infradead.org
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/afs/dir.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+diff --git a/fs/afs/dir.c b/fs/afs/dir.c
+index 07dc4ec73520..cf811b77ee67 100644
+--- a/fs/afs/dir.c
++++ b/fs/afs/dir.c
+@@ -473,6 +473,14 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode,
+                       continue;
+               }
+ 
++              /* Don't expose silly rename entries to userspace. */
++              if (nlen > 6 &&
++                  dire->u.name[0] == '.' &&
++                  ctx->actor != afs_lookup_filldir &&
++                  ctx->actor != afs_lookup_one_filldir &&
++                  memcmp(dire->u.name, ".__afs", 6) == 0)
++                      continue;
++
+               /* found the next entry */
+               if (!dir_emit(ctx, dire->u.name, nlen,
+                             ntohl(dire->u.vnode),
+-- 
+2.43.0
+
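The name test added to afs_dir_iterate_block() above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: `demo_is_silly_rename_name` is a hypothetical helper introduced only for illustration, and it reproduces just the length and prefix checks. The patch additionally skips the filter when `ctx->actor` is one of the in-kernel lookup actors, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): returns true when a
 * directory entry name passes the same test the patch uses, i.e. it is
 * longer than 6 bytes and begins with ".__afs".
 */
static bool demo_is_silly_rename_name(const char *name, size_t nlen)
{
	return nlen > 6 &&
	       name[0] == '.' &&
	       memcmp(name, ".__afs", 6) == 0;
}
```

Note that a name consisting of exactly ".__afs" is not filtered, because the check requires more than 6 bytes.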
diff --git a/queue-6.1/bnxt_en-wait-for-flr-to-complete-during-probe.patch b/queue-6.1/bnxt_en-wait-for-flr-to-complete-during-probe.patch

new file mode 100644

index 0000000..d5e7308
--- /dev/null
+++ b/queue-6.1/bnxt_en-wait-for-flr-to-complete-during-probe.patch
@@ -0,0 +1,43 @@
+From 9459996938ff96d389d6e6540bafe07176bd569f Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 17 Jan 2024 15:45:11 -0800
+Subject: bnxt_en: Wait for FLR to complete during probe
+
+From: Michael Chan <michael.chan@broadcom.com>
+
+[ Upstream commit 3c1069fa42872f95cf3c6fedf80723d391e12d57 ]
+
+The first message to firmware may fail if the device is undergoing FLR.
+The driver has some recovery logic for this failure scenario but we must
+wait 100 msec for FLR to complete before proceeding.  Otherwise the
+recovery will always fail.
+
+Fixes: ba02629ff6cb ("bnxt_en: log firmware status on firmware init failure")
+Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
+Signed-off-by: Michael Chan <michael.chan@broadcom.com>
+Link: https://lore.kernel.org/r/20240117234515.226944-2-michael.chan@broadcom.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+index df4d88d35701..f810b5dc25f0 100644
+--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
++++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+@@ -12269,6 +12269,11 @@ static int bnxt_fw_init_one_p1(struct bnxt *bp)
+ 
+       bp->fw_cap = 0;
+       rc = bnxt_hwrm_ver_get(bp);
++      /* FW may be unresponsive after FLR. FLR must complete within 100 msec
++       * so wait before continuing with recovery.
++       */
++      if (rc)
++              msleep(100);
+       bnxt_try_map_fw_health_reg(bp);
+       if (rc) {
+               rc = bnxt_try_recover_fw(bp);
+-- 
+2.43.0
+
diff --git a/queue-6.1/fjes-fix-memleaks-in-fjes_hw_setup.patch b/queue-6.1/fjes-fix-memleaks-in-fjes_hw_setup.patch

new file mode 100644

index 0000000..8b908ef
--- /dev/null
+++ b/queue-6.1/fjes-fix-memleaks-in-fjes_hw_setup.patch
@@ -0,0 +1,109 @@
+From 8023723f4c6cc5ea7f121b33bdbfba4af8119b89 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 23 Jan 2024 01:24:42 +0800
+Subject: fjes: fix memleaks in fjes_hw_setup
+
+From: Zhipeng Lu <alexious@zju.edu.cn>
+
+[ Upstream commit f6cc4b6a3ae53df425771000e9c9540cce9b7bb1 ]
+
+In fjes_hw_setup, it allocates several memory and delay the deallocation
+to the fjes_hw_exit in fjes_probe through the following call chain:
+
+fjes_probe
+  |-> fjes_hw_init
+        |-> fjes_hw_setup
+  |-> fjes_hw_exit
+
+However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
+all the resources allocated in fjes_hw_setup will be leaked. In this
+patch, we free those resources in fjes_hw_setup and prevents such leaks.
+
+Fixes: 2fcbca687702 ("fjes: platform_driver's .probe and .remove routine")
+Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/fjes/fjes_hw.c | 37 ++++++++++++++++++++++++++++++-------
+ 1 file changed, 30 insertions(+), 7 deletions(-)
+
+diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
+index 704e949484d0..b9b5554ea862 100644
+--- a/drivers/net/fjes/fjes_hw.c
++++ b/drivers/net/fjes/fjes_hw.c
+@@ -221,21 +221,25 @@ static int fjes_hw_setup(struct fjes_hw *hw)
+ 
+       mem_size = FJES_DEV_REQ_BUF_SIZE(hw->max_epid);
+       hw->hw_info.req_buf = kzalloc(mem_size, GFP_KERNEL);
+-      if (!(hw->hw_info.req_buf))
+-              return -ENOMEM;
++      if (!(hw->hw_info.req_buf)) {
++              result = -ENOMEM;
++              goto free_ep_info;
++      }
+ 
+       hw->hw_info.req_buf_size = mem_size;
+ 
+       mem_size = FJES_DEV_RES_BUF_SIZE(hw->max_epid);
+       hw->hw_info.res_buf = kzalloc(mem_size, GFP_KERNEL);
+-      if (!(hw->hw_info.res_buf))
+-              return -ENOMEM;
++      if (!(hw->hw_info.res_buf)) {
++              result = -ENOMEM;
++              goto free_req_buf;
++      }
+ 
+       hw->hw_info.res_buf_size = mem_size;
+ 
+       result = fjes_hw_alloc_shared_status_region(hw);
+       if (result)
+-              return result;
++              goto free_res_buf;
+ 
+       hw->hw_info.buffer_share_bit = 0;
+       hw->hw_info.buffer_unshare_reserve_bit = 0;
+@@ -246,11 +250,11 @@ static int fjes_hw_setup(struct fjes_hw *hw)
+ 
+                       result = fjes_hw_alloc_epbuf(&buf_pair->tx);
+                       if (result)
+-                              return result;
++                              goto free_epbuf;
+ 
+                       result = fjes_hw_alloc_epbuf(&buf_pair->rx);
+                       if (result)
+-                              return result;
++                              goto free_epbuf;
+ 
+                       spin_lock_irqsave(&hw->rx_status_lock, flags);
+                       fjes_hw_setup_epbuf(&buf_pair->tx, mac,
+@@ -273,6 +277,25 @@ static int fjes_hw_setup(struct fjes_hw *hw)
+       fjes_hw_init_command_registers(hw, &param);
+ 
+       return 0;
++
++free_epbuf:
++      for (epidx = 0; epidx < hw->max_epid ; epidx++) {
++              if (epidx == hw->my_epid)
++                      continue;
++              fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].tx);
++              fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].rx);
++      }
++      fjes_hw_free_shared_status_region(hw);
++free_res_buf:
++      kfree(hw->hw_info.res_buf);
++      hw->hw_info.res_buf = NULL;
++free_req_buf:
++      kfree(hw->hw_info.req_buf);
++      hw->hw_info.req_buf = NULL;
++free_ep_info:
++      kfree(hw->ep_shm_info);
++      hw->ep_shm_info = NULL;
++      return result;
+ }
+ 
+ static void fjes_hw_cleanup(struct fjes_hw *hw)
+-- 
+2.43.0
+
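The fjes patch above uses the common goto-unwind idiom: each failure jumps to a label that frees everything allocated so far, in reverse order. A minimal userspace sketch of that idiom follows. All names here (`struct demo_hw`, `demo_alloc`, `demo_hw_setup`, the `fail_at` parameter) are hypothetical and exist only to make the pattern testable; they are not part of the fjes driver.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; field names are illustrative. */
struct demo_hw {
	void *ep_info;
	void *req_buf;
	void *res_buf;
};

/* Simulates an allocation that fails when step == fail_at (0 = never fail). */
static void *demo_alloc(size_t size, int step, int fail_at)
{
	return step == fail_at ? NULL : calloc(1, size);
}

static int demo_hw_setup(struct demo_hw *hw, int fail_at)
{
	int result = -1;	/* stands in for -ENOMEM */

	hw->ep_info = demo_alloc(64, 1, fail_at);
	if (!hw->ep_info)
		return result;	/* nothing allocated yet, nothing to undo */

	hw->req_buf = demo_alloc(64, 2, fail_at);
	if (!hw->req_buf)
		goto free_ep_info;

	hw->res_buf = demo_alloc(64, 3, fail_at);
	if (!hw->res_buf)
		goto free_req_buf;

	return 0;

	/* Labels fall through, releasing resources in reverse order. */
free_req_buf:
	free(hw->req_buf);
	hw->req_buf = NULL;
free_ep_info:
	free(hw->ep_info);
	hw->ep_info = NULL;
	return result;
}
```

Resetting each pointer to NULL after freeing it, as the patch does, keeps a later cleanup path from freeing the same buffer twice.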
diff --git a/queue-6.1/ipv6-init-the-accept_queue-s-spinlocks-in-inet6_crea.patch b/queue-6.1/ipv6-init-the-accept_queue-s-spinlocks-in-inet6_crea.patch

new file mode 100644

index 0000000..b65d794
--- /dev/null
+++ b/queue-6.1/ipv6-init-the-accept_queue-s-spinlocks-in-inet6_crea.patch
@@ -0,0 +1,70 @@
+From fbd9a386176120c5fba72bb27d4375f151afc9a2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 22 Jan 2024 18:20:01 +0800
+Subject: ipv6: init the accept_queue's spinlocks in inet6_create
+
+From: Zhengchao Shao <shaozhengchao@huawei.com>
+
+[ Upstream commit 435e202d645c197dcfd39d7372eb2a56529b6640 ]
+
+In commit 198bc90e0e73("tcp: make sure init the accept_queue's spinlocks
+once"), the spinlocks of accept_queue are initialized only when socket is
+created in the inet4 scenario. The locks are not initialized when socket
+is created in the inet6 scenario. The kernel reports the following error:
+INFO: trying to register non-static key.
+The code is fine but needs lockdep annotation, or maybe
+you didn't initialize this object before use?
+turning off the locking correctness validator.
+Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
+Call Trace:
+<TASK>
+       dump_stack_lvl (lib/dump_stack.c:107)
+       register_lock_class (kernel/locking/lockdep.c:1289)
+       __lock_acquire (kernel/locking/lockdep.c:5015)
+       lock_acquire.part.0 (kernel/locking/lockdep.c:5756)
+       _raw_spin_lock_bh (kernel/locking/spinlock.c:178)
+       inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386)
+       tcp_disconnect (net/ipv4/tcp.c:2981)
+       inet_shutdown (net/ipv4/af_inet.c:935)
+       __sys_shutdown (./include/linux/file.h:32 net/socket.c:2438)
+       __x64_sys_shutdown (net/socket.c:2445)
+       do_syscall_64 (arch/x86/entry/common.c:52)
+       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
+RIP: 0033:0x7f52ecd05a3d
+Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7
+48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
+ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
+RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030
+RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d
+RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004
+RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000
+R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640
+R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0
+
+Fixes: 198bc90e0e73 ("tcp: make sure init the accept_queue's spinlocks once")
+Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/af_inet6.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
+index a2f29ca51600..0b42eb8c55aa 100644
+--- a/net/ipv6/af_inet6.c
++++ b/net/ipv6/af_inet6.c
+@@ -199,6 +199,9 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
+       if (INET_PROTOSW_REUSE & answer_flags)
+               sk->sk_reuse = SK_CAN_REUSE;
+ 
++      if (INET_PROTOSW_ICSK & answer_flags)
++              inet_init_csk_locks(sk);
++
+       inet = inet_sk(sk);
+       inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
+ 
+-- 
+2.43.0
+
diff --git a/queue-6.1/llc-drop-support-for-eth_p_tr_802_2.patch b/queue-6.1/llc-drop-support-for-eth_p_tr_802_2.patch

new file mode 100644

index 0000000..fce8016
--- /dev/null
+++ b/queue-6.1/llc-drop-support-for-eth_p_tr_802_2.patch
@@ -0,0 +1,130 @@
+From 43de8d2f22f03e91f934593b8aceb6ef4389568c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 17:55:15 -0800
+Subject: llc: Drop support for ETH_P_TR_802_2.
+
+From: Kuniyuki Iwashima <kuniyu@amazon.com>
+
+[ Upstream commit e3f9bed9bee261e3347131764e42aeedf1ffea61 ]
+
+syzbot reported an uninit-value bug below. [0]
+
+llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
+(0x0011), and syzbot abused the latter to trigger the bug.
+
+  write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
+
+llc_conn_handler() initialises local variables {saddr,daddr}.mac
+based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
+them to __llc_lookup().
+
+However, the initialisation is done only when skb->protocol is
+htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
+__llc_lookup_listener() will read garbage.
+
+The missing initialisation existed prior to commit 211ed865108e
+("net: delete all instances of special processing for token ring").
+
+It removed the part to kick out the token ring stuff but forgot to
+close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
+
+Let's remove llc_tr_packet_type and complete the deprecation.
+
+[0]:
+BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
+ __llc_lookup_established+0xe9d/0xf90
+ __llc_lookup net/llc/llc_conn.c:611 [inline]
+ llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
+ llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
+ __netif_receive_skb_one_core net/core/dev.c:5527 [inline]
+ __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
+ netif_receive_skb_internal net/core/dev.c:5727 [inline]
+ netif_receive_skb+0x58/0x660 net/core/dev.c:5786
+ tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
+ tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
+ tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
+ call_write_iter include/linux/fs.h:2020 [inline]
+ new_sync_write fs/read_write.c:491 [inline]
+ vfs_write+0x8ef/0x1490 fs/read_write.c:584
+ ksys_write+0x20f/0x4c0 fs/read_write.c:637
+ __do_sys_write fs/read_write.c:649 [inline]
+ __se_sys_write fs/read_write.c:646 [inline]
+ __x64_sys_write+0x93/0xd0 fs/read_write.c:646
+ do_syscall_x64 arch/x86/entry/common.c:51 [inline]
+ do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
+ entry_SYSCALL_64_after_hwframe+0x63/0x6b
+
+Local variable daddr created at:
+ llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
+ llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
+
+CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
+Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
+
+Fixes: 211ed865108e ("net: delete all instances of special processing for token ring")
+Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com
+Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
+Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/llc_pdu.h | 6 ++----
+ net/llc/llc_core.c    | 7 -------
+ 2 files changed, 2 insertions(+), 11 deletions(-)
+
+diff --git a/include/net/llc_pdu.h b/include/net/llc_pdu.h
+index 49aa79c7b278..581cd37aa98b 100644
+--- a/include/net/llc_pdu.h
++++ b/include/net/llc_pdu.h
+@@ -262,8 +262,7 @@ static inline void llc_pdu_header_init(struct sk_buff *skb, u8 type,
+  */
+ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa)
+ {
+-      if (skb->protocol == htons(ETH_P_802_2))
+-              memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN);
++      memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN);
+ }
+ 
+ /**
+@@ -275,8 +274,7 @@ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa)
+  */
+ static inline void llc_pdu_decode_da(struct sk_buff *skb, u8 *da)
+ {
+-      if (skb->protocol == htons(ETH_P_802_2))
+-              memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN);
++      memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN);
+ }
+ 
+ /**
+diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
+index 6e387aadffce..4f16d9c88350 100644
+--- a/net/llc/llc_core.c
++++ b/net/llc/llc_core.c
+@@ -135,22 +135,15 @@ static struct packet_type llc_packet_type __read_mostly = {
+       .func = llc_rcv,
+ };
+ 
+-static struct packet_type llc_tr_packet_type __read_mostly = {
+-      .type = cpu_to_be16(ETH_P_TR_802_2),
+-      .func = llc_rcv,
+-};
+-
+ static int __init llc_init(void)
+ {
+       dev_add_pack(&llc_packet_type);
+-      dev_add_pack(&llc_tr_packet_type);
+       return 0;
+ }
+ 
+ static void __exit llc_exit(void)
+ {
+       dev_remove_pack(&llc_packet_type);
+-      dev_remove_pack(&llc_tr_packet_type);
+ }
+ 
+ module_init(llc_init);
+-- 
+2.43.0
+
diff --git a/queue-6.1/llc-make-llc_ui_sendmsg-more-robust-against-bonding-.patch b/queue-6.1/llc-make-llc_ui_sendmsg-more-robust-against-bonding-.patch

new file mode 100644

index 0000000..87e8003
--- /dev/null
+++ b/queue-6.1/llc-make-llc_ui_sendmsg-more-robust-against-bonding-.patch
@@ -0,0 +1,154 @@
+From 239b811ab635fc8c9173db6cc4a30b38b7772c28 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 18:36:25 +0000
+Subject: llc: make llc_ui_sendmsg() more robust against bonding changes
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit dad555c816a50c6a6a8a86be1f9177673918c647 ]
+
+syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no
+headroom, but subsequently trying to push 14 bytes of Ethernet header [1]
+
+Like some others, llc_ui_sendmsg() releases the socket lock before
+calling sock_alloc_send_skb().
+Then it acquires it again, but does not redo all the sanity checks
+that were performed.
+
+This fix:
+
+- Uses LL_RESERVED_SPACE() to reserve space.
+- Check all conditions again after socket lock is held again.
+- Do not account Ethernet header for mtu limitation.
+
+[1]
+
+skbuff: skb_under_panic: text:ffff800088baa334 len:1514 put:14 head:ffff0000c9c37000 data:ffff0000c9c36ff2 tail:0x5dc end:0x6c0 dev:bond0
+
+ kernel BUG at net/core/skbuff.c:193 !
+Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
+Modules linked in:
+CPU: 0 PID: 6875 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00101-g0802e17d9aca-dirty #0
+Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
+pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
+ pc : skb_panic net/core/skbuff.c:189 [inline]
+ pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
+ lr : skb_panic net/core/skbuff.c:189 [inline]
+ lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
+sp : ffff800096f97000
+x29: ffff800096f97010 x28: ffff80008cc8d668 x27: dfff800000000000
+x26: ffff0000cb970c90 x25: 00000000000005dc x24: ffff0000c9c36ff2
+x23: ffff0000c9c37000 x22: 00000000000005ea x21: 00000000000006c0
+x20: 000000000000000e x19: ffff800088baa334 x18: 1fffe000368261ce
+x17: ffff80008e4ed000 x16: ffff80008a8310f8 x15: 0000000000000001
+x14: 1ffff00012df2d58 x13: 0000000000000000 x12: 0000000000000000
+x11: 0000000000000001 x10: 0000000000ff0100 x9 : e28a51f1087e8400
+x8 : e28a51f1087e8400 x7 : ffff80008028f8d0 x6 : 0000000000000000
+x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800082b78714
+x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000089
+Call trace:
+  skb_panic net/core/skbuff.c:189 [inline]
+  skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
+  skb_push+0xf0/0x108 net/core/skbuff.c:2451
+  eth_header+0x44/0x1f8 net/ethernet/eth.c:83
+  dev_hard_header include/linux/netdevice.h:3188 [inline]
+  llc_mac_hdr_init+0x110/0x17c net/llc/llc_output.c:33
+  llc_sap_action_send_xid_c+0x170/0x344 net/llc/llc_s_ac.c:85
+  llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline]
+  llc_sap_next_state net/llc/llc_sap.c:182 [inline]
+  llc_sap_state_process+0x1ec/0x774 net/llc/llc_sap.c:209
+  llc_build_and_send_xid_pkt+0x12c/0x1c0 net/llc/llc_sap.c:270
+  llc_ui_sendmsg+0x7bc/0xb1c net/llc/af_llc.c:997
+  sock_sendmsg_nosec net/socket.c:730 [inline]
+  __sock_sendmsg net/socket.c:745 [inline]
+  sock_sendmsg+0x194/0x274 net/socket.c:767
+  splice_to_socket+0x7cc/0xd58 fs/splice.c:881
+  do_splice_from fs/splice.c:933 [inline]
+  direct_splice_actor+0xe4/0x1c0 fs/splice.c:1142
+  splice_direct_to_actor+0x2a0/0x7e4 fs/splice.c:1088
+  do_splice_direct+0x20c/0x348 fs/splice.c:1194
+  do_sendfile+0x4bc/0xc70 fs/read_write.c:1254
+  __do_sys_sendfile64 fs/read_write.c:1322 [inline]
+  __se_sys_sendfile64 fs/read_write.c:1308 [inline]
+  __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1308
+  __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
+  invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
+  el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
+  do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
+  el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678
+  el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
+  el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595
+Code: aa1803e6 aa1903e7 a90023f5 94792f6a (d4210000)
+
+Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
+Reported-and-tested-by: syzbot+2a7024e9502df538e8ef@syzkaller.appspotmail.com
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://lore.kernel.org/r/20240118183625.4007013-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/llc/af_llc.c | 24 ++++++++++++++++--------
+ 1 file changed, 16 insertions(+), 8 deletions(-)
+
+diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
+index 9ffbc667be6c..19c478bd85bd 100644
+--- a/net/llc/af_llc.c
++++ b/net/llc/af_llc.c
+@@ -928,14 +928,15 @@ static int llc_ui_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+  */
+ static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+ {
++      DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name);
+       struct sock *sk = sock->sk;
+       struct llc_sock *llc = llc_sk(sk);
+-      DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name);
+       int flags = msg->msg_flags;
+       int noblock = flags & MSG_DONTWAIT;
++      int rc = -EINVAL, copied = 0, hdrlen, hh_len;
+       struct sk_buff *skb = NULL;
++      struct net_device *dev;
+       size_t size = 0;
+-      int rc = -EINVAL, copied = 0, hdrlen;
+ 
+       dprintk("%s: sending from %02X to %02X\n", __func__,
+               llc->laddr.lsap, llc->daddr.lsap);
+@@ -955,22 +956,29 @@ static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+               if (rc)
+                       goto out;
+       }
+-      hdrlen = llc->dev->hard_header_len + llc_ui_header_len(sk, addr);
++      dev = llc->dev;
++      hh_len = LL_RESERVED_SPACE(dev);
++      hdrlen = llc_ui_header_len(sk, addr);
+       size = hdrlen + len;
+-      if (size > llc->dev->mtu)
+-              size = llc->dev->mtu;
++      size = min_t(size_t, size, READ_ONCE(dev->mtu));
+       copied = size - hdrlen;
+       rc = -EINVAL;
+       if (copied < 0)
+               goto out;
+       release_sock(sk);
+-      skb = sock_alloc_send_skb(sk, size, noblock, &rc);
++      skb = sock_alloc_send_skb(sk, hh_len + size, noblock, &rc);
+       lock_sock(sk);
+       if (!skb)
+               goto out;
+-      skb->dev      = llc->dev;
++      if (sock_flag(sk, SOCK_ZAPPED) ||
++          llc->dev != dev ||
++          hdrlen != llc_ui_header_len(sk, addr) ||
++          hh_len != LL_RESERVED_SPACE(dev) ||
++          size > READ_ONCE(dev->mtu))
++              goto out;
++      skb->dev      = dev;
+       skb->protocol = llc_proto_type(addr->sllc_arphrd);
+-      skb_reserve(skb, hdrlen);
++      skb_reserve(skb, hh_len + hdrlen);
+       rc = memcpy_from_msg(skb_put(skb, copied), msg, copied);
+       if (rc)
+               goto out;
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-fec-fix-the-unhandled-context-fault-from-smmu.patch b/queue-6.1/net-fec-fix-the-unhandled-context-fault-from-smmu.patch

new file mode 100644

index 0000000..2b4649a
--- /dev/null
+++ b/queue-6.1/net-fec-fix-the-unhandled-context-fault-from-smmu.patch
@@ -0,0 +1,58 @@
+From 3ced8ef727839378bc8d0cd8278ce5453d4df746 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 23 Jan 2024 10:51:41 -0600
+Subject: net: fec: fix the unhandled context fault from smmu
+
+From: Shenwei Wang <shenwei.wang@nxp.com>
+
+[ Upstream commit 5e344807735023cd3a67c37a1852b849caa42620 ]
+
+When repeatedly changing the interface link speed using the command below:
+
+ethtool -s eth0 speed 100 duplex full
+ethtool -s eth0 speed 1000 duplex full
+
+The following errors may sometimes be reported by the ARM SMMU driver:
+
+[ 5395.035364] fec 5b040000.ethernet eth0: Link is Down
+[ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault:
+fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
+[ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full -
+flow control off
+
+It is identified that the FEC driver does not properly stop the TX queue
+during the link speed transitions, and this results in the invalid virtual
+I/O address translations from the SMMU and causes the context faults.
+
+Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
+Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
+Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/freescale/fec_main.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
+index 6d1b76002282..97d12c7eea77 100644
+--- a/drivers/net/ethernet/freescale/fec_main.c
++++ b/drivers/net/ethernet/freescale/fec_main.c
+@@ -1917,6 +1917,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
+ 
+               /* if any of the above changed restart the FEC */
+               if (status_change) {
++                      netif_stop_queue(ndev);
+                       napi_disable(&fep->napi);
+                       netif_tx_lock_bh(ndev);
+                       fec_restart(ndev);
+@@ -1926,6 +1927,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
+               }
+       } else {
+               if (fep->link) {
++                      netif_stop_queue(ndev);
+                       napi_disable(&fep->napi);
+                       netif_tx_lock_bh(ndev);
+                       fec_stop(ndev);
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-fix-removing-a-namespace-with-conflicting-altnam.patch b/queue-6.1/net-fix-removing-a-namespace-with-conflicting-altnam.patch

new file mode 100644

index 0000000..7c366db
--- /dev/null
+++ b/queue-6.1/net-fix-removing-a-namespace-with-conflicting-altnam.patch
@@ -0,0 +1,81 @@
+From 63ab8148b17c01cd9e6b3efe3c28da8d22111dda Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 16:58:59 -0800
+Subject: net: fix removing a namespace with conflicting altnames
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Jakub Kicinski <kuba@kernel.org>
+
+[ Upstream commit d09486a04f5da0a812c26217213b89a3b1acf836 ]
+
+Mark reports a BUG() when a net namespace is removed.
+
+    kernel BUG at net/core/dev.c:11520!
+
+Physical interfaces moved outside of init_net get "refunded"
+to init_net when that namespace disappears. The main interface
+name may get overwritten in the process if it would have
+conflicted. We need to also discard all conflicting altnames.
+Recent fixes addressed ensuring that altnames get moved
+with the main interface, which surfaced this problem.
+
+Reported-by: Марк Коренберг <socketpair@gmail.com>
+Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/
+Fixes: 7663d522099e ("net: check for altname conflicts when changing netdev's netns")
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Jiri Pirko <jiri@nvidia.com>
+Reviewed-by: Xin Long <lucien.xin@gmail.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/dev.c | 9 +++++++++
+ net/core/dev.h | 3 +++
+ 2 files changed, 12 insertions(+)
+
+diff --git a/net/core/dev.c b/net/core/dev.c
+index 0a5566b6f8a2..1ba3662faf0a 100644
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -11323,6 +11323,7 @@ static struct pernet_operations __net_initdata netdev_net_ops = {
+ 
+ static void __net_exit default_device_exit_net(struct net *net)
+ {
++      struct netdev_name_node *name_node, *tmp;
+       struct net_device *dev, *aux;
+       /*
+        * Push all migratable network devices back to the
+@@ -11345,6 +11346,14 @@ static void __net_exit default_device_exit_net(struct net *net)
+               snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
+               if (netdev_name_in_use(&init_net, fb_name))
+                       snprintf(fb_name, IFNAMSIZ, "dev%%d");
++
++              netdev_for_each_altname_safe(dev, name_node, tmp)
++                      if (netdev_name_in_use(&init_net, name_node->name)) {
++                              netdev_name_node_del(name_node);
++                              synchronize_rcu();
++                              __netdev_name_node_alt_destroy(name_node);
++                      }
++
+               err = dev_change_net_namespace(dev, &init_net, fb_name);
+               if (err) {
+                       pr_emerg("%s: failed to move %s to init_net: %d\n",
+diff --git a/net/core/dev.h b/net/core/dev.h
+index 9ca91457c197..db9ff8cd8d46 100644
+--- a/net/core/dev.h
++++ b/net/core/dev.h
+@@ -63,6 +63,9 @@ int dev_change_name(struct net_device *dev, const char *newname);
+ 
+ #define netdev_for_each_altname(dev, namenode)                                \
+       list_for_each_entry((namenode), &(dev)->name_node->list, list)
++#define netdev_for_each_altname_safe(dev, namenode, next)             \
++      list_for_each_entry_safe((namenode), (next), &(dev)->name_node->list, \
++                               list)
+ 
+ int netdev_name_node_alt_create(struct net_device *dev, const char *name);
+ int netdev_name_node_alt_destroy(struct net_device *dev, const char *name);
+-- 
+2.43.0
+
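The altname patch above introduces a "_safe" iterator because entries are deleted while the list is being walked, so the next pointer has to be saved before the current node is freed. A minimal userspace sketch of the same idea on a singly linked list follows. `struct demo_name_node`, `demo_push` and `demo_drop_conflicts` are hypothetical names for illustration only; the kernel code uses `list_for_each_entry_safe()` on a doubly linked list and also waits for an RCU grace period, which is not modelled here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node holding one alternative name. */
struct demo_name_node {
	struct demo_name_node *next;
	const char *name;
};

/* Prepends a node; returns the old head unchanged if allocation fails. */
static struct demo_name_node *demo_push(struct demo_name_node *head,
					const char *name)
{
	struct demo_name_node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->name = name;
	n->next = head;
	return n;
}

/* Frees every node whose name equals 'taken'; returns how many were dropped. */
static int demo_drop_conflicts(struct demo_name_node **head, const char *taken)
{
	struct demo_name_node **link = head;
	struct demo_name_node *node, *tmp;
	int dropped = 0;

	for (node = *head; node; node = tmp) {
		tmp = node->next;	/* saved before 'node' may be freed */
		if (strcmp(node->name, taken) == 0) {
			*link = tmp;	/* unlink, then free */
			free(node);
			dropped++;
		} else {
			link = &node->next;
		}
	}
	return dropped;
}
```

Reading `node->next` after `free(node)` would be a use-after-free, which is the bug class the saved `tmp` pointer avoids.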
diff --git a/queue-6.1/net-micrel-fix-ptp-frame-parsing-for-lan8814.patch b/queue-6.1/net-micrel-fix-ptp-frame-parsing-for-lan8814.patch

new file mode 100644

index 0000000..5f1aa5d
--- /dev/null
+++ b/queue-6.1/net-micrel-fix-ptp-frame-parsing-for-lan8814.patch
@@ -0,0 +1,61 @@
+From 5bcca20ba9e4a0cbd216dc4f73d2e557c38a5e0e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 11:47:50 +0100
+Subject: net: micrel: Fix PTP frame parsing for lan8814
+
+From: Horatiu Vultur <horatiu.vultur@microchip.com>
+
+[ Upstream commit aaf632f7ab6dec57bc9329a438f94504fe8034b9 ]
+
+The HW can check, for each frame, whether it is a PTP frame, which
+domain it belongs to, which PTP frame type it is, and which IP addresses
+it carries. If one of these checks fails, the frame is not
+timestamped. Most of these checks were disabled, except the check of the
+minorVersionPTP field inside the PTP header. So once a partner sent
+a frame compliant with 802.1AS, which has minorVersionPTP set to 1, the
+frame was not timestamped because the HW expected by default a value of 0
+in minorVersionPTP. This is exactly the same issue as on lan8841.
+Fix this issue by removing this check so that userspace can decide on this.
+
+Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
+Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
+Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
+Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/phy/micrel.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
+index 7cbcf51bae92..9481f172830f 100644
+--- a/drivers/net/phy/micrel.c
++++ b/drivers/net/phy/micrel.c
+@@ -120,6 +120,11 @@
+  */
+ #define LAN8814_1PPM_FORMAT                   17179
+ 
++#define PTP_RX_VERSION                                0x0248
++#define PTP_TX_VERSION                                0x0288
++#define PTP_MAX_VERSION(x)                    (((x) & GENMASK(7, 0)) << 8)
++#define PTP_MIN_VERSION(x)                    ((x) & GENMASK(7, 0))
++
+ #define PTP_RX_MOD                            0x024F
+ #define PTP_RX_MOD_BAD_UDPV4_CHKSUM_FORCE_FCS_DIS_ BIT(3)
+ #define PTP_RX_TIMESTAMP_EN                   0x024D
+@@ -2922,6 +2927,12 @@ static void lan8814_ptp_init(struct phy_device *phydev)
+       lanphy_write_page_reg(phydev, 5, PTP_TX_PARSE_IP_ADDR_EN, 0);
+       lanphy_write_page_reg(phydev, 5, PTP_RX_PARSE_IP_ADDR_EN, 0);
+ 
++      /* Disable checking for minorVersionPTP field */
++      lanphy_write_page_reg(phydev, 5, PTP_RX_VERSION,
++                            PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0));
++      lanphy_write_page_reg(phydev, 5, PTP_TX_VERSION,
++                            PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0));
++
+       skb_queue_head_init(&ptp_priv->tx_queue);
+       skb_queue_head_init(&ptp_priv->rx_queue);
+       INIT_LIST_HEAD(&ptp_priv->rx_ts_list);
+-- 
+2.43.0
+
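As a rough illustration of what the patch above writes to `PTP_RX_VERSION` and `PTP_TX_VERSION`, the sketch below evaluates the two macros in userspace. `BYTE_MASK` is an assumed stand-in for the kernel's `GENMASK(7, 0)`, and `ptp_version_reg()` is a hypothetical helper, not part of the driver.

```c
#include <stdint.h>

/* Assumed userspace stand-in for the kernel's GENMASK(7, 0). */
#define BYTE_MASK 0xffu

/* Same shape as the macros added by the patch. */
#define PTP_MAX_VERSION(x)	(((x) & BYTE_MASK) << 8)
#define PTP_MIN_VERSION(x)	((x) & BYTE_MASK)

/* Hypothetical helper: pack max into bits 15:8 and min into bits 7:0. */
static uint16_t ptp_version_reg(unsigned int max, unsigned int min)
{
	return (uint16_t)(PTP_MAX_VERSION(max) | PTP_MIN_VERSION(min));
}
```

With the arguments used in the patch, `ptp_version_reg(0xff, 0x0)` evaluates to `0xff00`, that is, the widest range the two 8-bit fields can express.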
diff --git a/queue-6.1/net-mlx5-dr-can-t-go-to-uplink-vport-on-rx-rule.patch b/queue-6.1/net-mlx5-dr-can-t-go-to-uplink-vport-on-rx-rule.patch
new file mode 100644
index 0000000..916da05
--- /dev/null
+++ b/queue-6.1/net-mlx5-dr-can-t-go-to-uplink-vport-on-rx-rule.patch
@@ -0,0 +1,51 @@
+From bf0a645a1ccf8c1cd72c26ba56c96f09f2df4a0e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 17 Dec 2023 13:20:36 +0200
+Subject: net/mlx5: DR, Can't go to uplink vport on RX rule
+
+From: Yevgeny Kliteynik <kliteyn@nvidia.com>
+
+[ Upstream commit 5b2a2523eeea5f03d39a9d1ff1bad2e9f8eb98d2 ]
+
+Go-To-Vport action on RX is not allowed when the vport is uplink.
+In such case, the packet should be dropped.
+
+Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
+Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
+Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ .../mellanox/mlx5/core/steering/dr_action.c      | 16 +++++++++++-----
+ 1 file changed, 11 insertions(+), 5 deletions(-)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+index 8c265250bbaf..bf7517725d8c 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+@@ -762,11 +762,17 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher,
+                                                       action->sampler->tx_icm_addr;
+                       break;
+               case DR_ACTION_TYP_VPORT:
+-                      attr.hit_gvmi = action->vport->caps->vhca_gvmi;
+-                      dest_action = action;
+-                      attr.final_icm_addr = rx_rule ?
+-                              action->vport->caps->icm_address_rx :
+-                              action->vport->caps->icm_address_tx;
++                      if (unlikely(rx_rule && action->vport->caps->num == MLX5_VPORT_UPLINK)) {
++                              /* can't go to uplink on RX rule - dropping instead */
++                              attr.final_icm_addr = nic_dmn->drop_icm_addr;
++                              attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48;
++                      } else {
++                              attr.hit_gvmi = action->vport->caps->vhca_gvmi;
++                              dest_action = action;
++                              attr.final_icm_addr = rx_rule ?
++                                                    action->vport->caps->icm_address_rx :
++                                                    action->vport->caps->icm_address_tx;
++                      }
+                       break;
+               case DR_ACTION_TYP_POP_VLAN:
+                       if (!rx_rule && !(dmn->ste_ctx->actions_caps &
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-mlx5-dr-use-the-right-gvmi-number-for-drop-actio.patch b/queue-6.1/net-mlx5-dr-use-the-right-gvmi-number-for-drop-actio.patch
new file mode 100644
index 0000000..e8227ff
--- /dev/null
+++ b/queue-6.1/net-mlx5-dr-use-the-right-gvmi-number-for-drop-actio.patch
@@ -0,0 +1,39 @@
+From 056c15479c3d1d6c0a222499c46e620103a170e8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 17 Dec 2023 11:24:08 +0200
+Subject: net/mlx5: DR, Use the right GVMI number for drop action
+
+From: Yevgeny Kliteynik <kliteyn@nvidia.com>
+
+[ Upstream commit 5665954293f13642f9c052ead83c1e9d8cff186f ]
+
+When FW provides ICM addresses for drop RX/TX, the provided capability
+is 64 bits that contain its GVMI as well as the ICM address itself.
+In case of TX DROP this GVMI is different from the GVMI that the
+domain is operating on.
+
+This patch fixes the action to use these GVMI IDs, as provided by FW.
+
+Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
+Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+index a3e7602b044e..8c265250bbaf 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+@@ -673,6 +673,7 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher,
+               switch (action_type) {
+               case DR_ACTION_TYP_DROP:
+                       attr.final_icm_addr = nic_dmn->drop_icm_addr;
++                      attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48;
+                       break;
+               case DR_ACTION_TYP_FT:
+                       dest_action = action;
+-- 
+2.43.0
+
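The `>> 48` in the two mlx5 DR patches above treats the upper 16 bits of the 64-bit capability as the GVMI. The sketch below shows that split in isolation. `gvmi_of()` and `icm_addr_of()` are hypothetical helpers, and the 48-bit address mask is an assumption for illustration only: the patches themselves pass the full 64-bit value as `final_icm_addr`.

```c
#include <stdint.h>

/* Upper 16 bits of the firmware-provided value, as the patches extract it. */
static uint16_t gvmi_of(uint64_t cap)
{
	return (uint16_t)(cap >> 48);
}

/* Assumption: the remaining low 48 bits are the ICM address proper. */
static uint64_t icm_addr_of(uint64_t cap)
{
	return cap & ((UINT64_C(1) << 48) - 1);
}
```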
diff --git a/queue-6.1/net-mlx5-use-mlx5-device-constant-for-selecting-cq-p.patch b/queue-6.1/net-mlx5-use-mlx5-device-constant-for-selecting-cq-p.patch
new file mode 100644
index 0000000..7424e24
--- /dev/null
+++ b/queue-6.1/net-mlx5-use-mlx5-device-constant-for-selecting-cq-p.patch
@@ -0,0 +1,39 @@
+From 18ef2e4c88df1f28aaa14ecd0f409236189a8700 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 28 Nov 2023 14:01:54 -0800
+Subject: net/mlx5: Use mlx5 device constant for selecting CQ period mode for
+ ASO
+
+From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
+
+[ Upstream commit 20cbf8cbb827094197f3b17db60d71449415db1e ]
+
+mlx5 devices have specific constants for choosing the CQ period mode. These
+constants do not have to match the constants used by the kernel software
+API for DIM period mode selection.
+
+Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
+Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
+Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
+index c971ff04dd04..c215252f2f53 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
+@@ -98,7 +98,7 @@ static int create_aso_cq(struct mlx5_aso_cq *cq, void *cqc_data)
+       mlx5_fill_page_frag_array(&cq->wq_ctrl.buf,
+                                 (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas));
+ 
+-      MLX5_SET(cqc,   cqc, cq_period_mode, DIM_CQ_PERIOD_MODE_START_FROM_EQE);
++      MLX5_SET(cqc,   cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
+       MLX5_SET(cqc,   cqc, c_eqn_or_apu_element, eqn);
+       MLX5_SET(cqc,   cqc, uar_page,      mdev->priv.uar->index);
+       MLX5_SET(cqc,   cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-mlx5e-allow-software-parsing-when-ipsec-crypto-i.patch b/queue-6.1/net-mlx5e-allow-software-parsing-when-ipsec-crypto-i.patch
new file mode 100644
index 0000000..bb96d03
--- /dev/null
+++ b/queue-6.1/net-mlx5e-allow-software-parsing-when-ipsec-crypto-i.patch
@@ -0,0 +1,39 @@
+From a5b8335d1cb69923c39f57dde292856e046f9e40 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 12 Dec 2023 13:52:55 +0200
+Subject: net/mlx5e: Allow software parsing when IPsec crypto is enabled
+
+From: Leon Romanovsky <leonro@nvidia.com>
+
+[ Upstream commit 20f5468a7988dedd94a57ba8acd65ebda6a59723 ]
+
+All ConnectX devices have software parsing capability enabled, but it is
+more correct to set allow_swp only if capability exists, which for IPsec
+means that crypto offload is supported.
+
+Fixes: 2451da081a34 ("net/mlx5: Unify device IPsec capabilities check")
+Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+index 29dd3a04c154..d3de1b7a80bf 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+@@ -990,8 +990,8 @@ void mlx5e_build_sq_param(struct mlx5_core_dev *mdev,
+       void *wq = MLX5_ADDR_OF(sqc, sqc, wq);
+       bool allow_swp;
+ 
+-      allow_swp =
+-              mlx5_geneve_tx_allowed(mdev) || !!mlx5_ipsec_device_caps(mdev);
++      allow_swp = mlx5_geneve_tx_allowed(mdev) ||
++                  (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO);
+       mlx5e_build_sq_param_common(mdev, param);
+       MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
+       MLX5_SET(sqc, sqc, allow_swp, allow_swp);
+-- 
+2.43.0
+
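The change above replaces an "any capability bit set" test with a test for one specific bit. A small sketch of the difference follows; the flag names and values (`CAP_CRYPTO`, `CAP_OTHER`) are hypothetical and are not the driver's real `MLX5_IPSEC_CAP_*` definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_CRYPTO	(1u << 0)
#define CAP_OTHER	(1u << 1)

/* Old logic: any IPsec capability at all enables software parsing. */
static bool allow_swp_old(bool geneve, uint32_t caps)
{
	return geneve || !!caps;
}

/* New logic: only the crypto-offload capability enables it. */
static bool allow_swp_new(bool geneve, uint32_t caps)
{
	return geneve || (caps & CAP_CRYPTO);
}
```

The two versions differ only when some capability other than crypto offload is reported on its own.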
diff --git a/queue-6.1/net-mlx5e-fix-a-double-free-in-arfs_create_groups.patch b/queue-6.1/net-mlx5e-fix-a-double-free-in-arfs_create_groups.patch
new file mode 100644
index 0000000..30b7268
--- /dev/null
+++ b/queue-6.1/net-mlx5e-fix-a-double-free-in-arfs_create_groups.patch
@@ -0,0 +1,100 @@
+From 411e0a23d4f20e037b18c8c59e8178697812f800 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 17 Jan 2024 15:17:36 +0800
+Subject: net/mlx5e: fix a double-free in arfs_create_groups
+
+From: Zhipeng Lu <alexious@zju.edu.cn>
+
+[ Upstream commit 3c6d5189246f590e4e1f167991558bdb72a4738b ]
+
+When the kvzalloc() allocation of `in` fails, arfs_create_groups will free
+ft->g and return an error. However, arfs_create_table, the only caller of
+arfs_create_groups, will see this error and call
+mlx5e_destroy_flow_table, in which ft->g will be freed again.
+
+Fixes: 1cabe6b0965e ("net/mlx5e: Create aRFS flow tables")
+Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ .../net/ethernet/mellanox/mlx5/core/en_arfs.c | 26 +++++++++++--------
+ 1 file changed, 15 insertions(+), 11 deletions(-)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+index dc0a0a27ac84..58eacba6de8c 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+@@ -255,11 +255,13 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
+ 
+       ft->g = kcalloc(MLX5E_ARFS_NUM_GROUPS,
+                       sizeof(*ft->g), GFP_KERNEL);
+-      in = kvzalloc(inlen, GFP_KERNEL);
+-      if  (!in || !ft->g) {
+-              kfree(ft->g);
+-              kvfree(in);
++      if (!ft->g)
+               return -ENOMEM;
++
++      in = kvzalloc(inlen, GFP_KERNEL);
++      if (!in) {
++              err = -ENOMEM;
++              goto err_free_g;
+       }
+ 
+       mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria);
+@@ -279,7 +281,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
+               break;
+       default:
+               err = -EINVAL;
+-              goto out;
++              goto err_free_in;
+       }
+ 
+       switch (type) {
+@@ -301,7 +303,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
+               break;
+       default:
+               err = -EINVAL;
+-              goto out;
++              goto err_free_in;
+       }
+ 
+       MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS);
+@@ -310,7 +312,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
+       MLX5_SET_CFG(in, end_flow_index, ix - 1);
+       ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in);
+       if (IS_ERR(ft->g[ft->num_groups]))
+-              goto err;
++              goto err_clean_group;
+       ft->num_groups++;
+ 
+       memset(in, 0, inlen);
+@@ -319,18 +321,20 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
+       MLX5_SET_CFG(in, end_flow_index, ix - 1);
+       ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in);
+       if (IS_ERR(ft->g[ft->num_groups]))
+-              goto err;
++              goto err_clean_group;
+       ft->num_groups++;
+ 
+       kvfree(in);
+       return 0;
+ 
+-err:
++err_clean_group:
+       err = PTR_ERR(ft->g[ft->num_groups]);
+       ft->g[ft->num_groups] = NULL;
+-out:
++err_free_in:
+       kvfree(in);
+-
++err_free_g:
++      kfree(ft->g);
++      ft->g = NULL;
+       return err;
+ }
+ 
+-- 
+2.43.0
+
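The error-path restructuring above can be sketched in userspace as follows. This is a simplified model under stated assumptions: `struct table`, `create_groups()` and `destroy_table()` are hypothetical names, `calloc()`/`free()` stand in for `kcalloc()`/`kvzalloc()`/`kfree()`, and the `fail_in` parameter only exists to force the second allocation to fail.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the flow table that owns the group array. */
struct table {
	void **g;
};

static int create_groups(struct table *ft, size_t n, size_t inlen, int fail_in)
{
	void *in;
	int err;

	ft->g = calloc(n, sizeof(*ft->g));
	if (!ft->g)
		return -ENOMEM;

	in = fail_in ? NULL : calloc(1, inlen);
	if (!in) {
		err = -ENOMEM;
		goto err_free_g;
	}

	/* ... the group setup that uses `in` would go here ... */

	free(in);
	return 0;

err_free_g:
	free(ft->g);
	ft->g = NULL;	/* so the caller's cleanup does not free it again */
	return err;
}

/* Caller-side cleanup, run on both success and failure paths. */
static void destroy_table(struct table *ft)
{
	free(ft->g);	/* free(NULL) is a no-op */
	ft->g = NULL;
}
```

Clearing the pointer after freeing it is what makes the later cleanup harmless; the fs_any_create_groups fix that follows relies on the same idea.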
diff --git a/queue-6.1/net-mlx5e-fix-a-potential-double-free-in-fs_any_crea.patch b/queue-6.1/net-mlx5e-fix-a-potential-double-free-in-fs_any_crea.patch
new file mode 100644
index 0000000..c2f0a51
--- /dev/null
+++ b/queue-6.1/net-mlx5e-fix-a-potential-double-free-in-fs_any_crea.patch
@@ -0,0 +1,40 @@
+From f33403e04277064b7bbcdc711baea4171726e46f Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 28 Nov 2023 17:29:01 +0800
+Subject: net/mlx5e: fix a potential double-free in fs_any_create_groups
+
+From: Dinghao Liu <dinghao.liu@zju.edu.cn>
+
+[ Upstream commit aef855df7e1bbd5aa4484851561211500b22707e ]
+
+When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
+fs_any_create_groups() will free ft->g. However, its caller
+fs_any_create_table() will free ft->g again through calling
+mlx5e_destroy_flow_table(), which will lead to a double-free.
+Fix this by setting ft->g to NULL in fs_any_create_groups().
+
+Fixes: 0f575c20bf06 ("net/mlx5e: Introduce Flow Steering ANY API")
+Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
+Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c b/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c
+index e1283531e0b8..671adbad0a40 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c
+@@ -436,6 +436,7 @@ static int fs_any_create_groups(struct mlx5e_flow_table *ft)
+       in = kvzalloc(inlen, GFP_KERNEL);
+       if  (!in || !ft->g) {
+               kfree(ft->g);
++              ft->g = NULL;
+               kvfree(in);
+               return -ENOMEM;
+       }
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-mvpp2-clear-bm-pool-before-initialization.patch b/queue-6.1/net-mvpp2-clear-bm-pool-before-initialization.patch
new file mode 100644
index 0000000..21d5ad6
--- /dev/null
+++ b/queue-6.1/net-mvpp2-clear-bm-pool-before-initialization.patch
@@ -0,0 +1,77 @@
+From d0a546e89e828662c31863ea0553e6628318f086 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 19:59:14 -0800
+Subject: net: mvpp2: clear BM pool before initialization
+
+From: Jenishkumar Maheshbhai Patel <jpatel2@marvell.com>
+
+[ Upstream commit 9f538b415db862e74b8c5d3abbccfc1b2b6caa38 ]
+
+Register values persist after booting the kernel using
+kexec, which results in a kernel panic. Thus clear the
+BM pool registers before initialisation to fix the issue.
+
+Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
+Signed-off-by: Jenishkumar Maheshbhai Patel <jpatel2@marvell.com>
+Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
+Link: https://lore.kernel.org/r/20240119035914.2595665-1-jpatel2@marvell.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 27 ++++++++++++++++++-
+ 1 file changed, 26 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+index f936640cca4e..2f80ee84c7ec 100644
+--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
++++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+@@ -614,12 +614,38 @@ static void mvpp23_bm_set_8pool_mode(struct mvpp2 *priv)
+       mvpp2_write(priv, MVPP22_BM_POOL_BASE_ADDR_HIGH_REG, val);
+ }
+ 
++/* Cleanup pool before actual initialization in the OS */
++static void mvpp2_bm_pool_cleanup(struct mvpp2 *priv, int pool_id)
++{
++      unsigned int thread = mvpp2_cpu_to_thread(priv, get_cpu());
++      u32 val;
++      int i;
++
++      /* Drain the BM from all possible residues left by firmware */
++      for (i = 0; i < MVPP2_BM_POOL_SIZE_MAX; i++)
++              mvpp2_thread_read(priv, thread, MVPP2_BM_PHY_ALLOC_REG(pool_id));
++
++      put_cpu();
++
++      /* Stop the BM pool */
++      val = mvpp2_read(priv, MVPP2_BM_POOL_CTRL_REG(pool_id));
++      val |= MVPP2_BM_STOP_MASK;
++      mvpp2_write(priv, MVPP2_BM_POOL_CTRL_REG(pool_id), val);
++}
++
+ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
+ {
+       enum dma_data_direction dma_dir = DMA_FROM_DEVICE;
+       int i, err, poolnum = MVPP2_BM_POOLS_NUM;
+       struct mvpp2_port *port;
+ 
++      if (priv->percpu_pools)
++              poolnum = mvpp2_get_nrxqs(priv) * 2;
++
++      /* Clean up the pool state in case it contains stale state */
++      for (i = 0; i < poolnum; i++)
++              mvpp2_bm_pool_cleanup(priv, i);
++
+       if (priv->percpu_pools) {
+               for (i = 0; i < priv->port_count; i++) {
+                       port = priv->port_list[i];
+@@ -629,7 +655,6 @@ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
+                       }
+               }
+ 
+-              poolnum = mvpp2_get_nrxqs(priv) * 2;
+               for (i = 0; i < poolnum; i++) {
+                       /* the pool in use */
+                       int pn = i / (poolnum / 2);
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-rds-fix-ubsan-array-index-out-of-bounds-in-rds_c.patch b/queue-6.1/net-rds-fix-ubsan-array-index-out-of-bounds-in-rds_c.patch
new file mode 100644
index 0000000..0101234
--- /dev/null
+++ b/queue-6.1/net-rds-fix-ubsan-array-index-out-of-bounds-in-rds_c.patch
@@ -0,0 +1,71 @@
+From a4ce53596a75d7f6d9a74979fdd24b88a6962381 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 17:48:39 -0800
+Subject: net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
+
+From: Sharath Srinivasan <sharath.srinivasan@oracle.com>
+
+[ Upstream commit 13e788deb7348cc88df34bed736c3b3b9927ea52 ]
+
+A syzkaller UBSAN crash occurs in rds_cmsg_recv(),
+which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1),
+but the array size is only 4 (RDS_RX_MAX_TRACES).
+Here 'j' is assigned from rs->rs_rx_trace[i] and in turn from
+trace.rx_trace_pos[i] in rds_recv_track_latency(),
+with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the
+off-by-one bounds check in rds_recv_track_latency() to prevent
+a potential crash in rds_cmsg_recv().
+
+Found by syzkaller:
+=================================================================
+UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39
+index 4 is out of range for type 'u64 [4]'
+CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1
+Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
+BIOS 1.15.0-1 04/01/2014
+Call Trace:
+ <TASK>
+ __dump_stack lib/dump_stack.c:88 [inline]
+ dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106
+ ubsan_epilogue lib/ubsan.c:217 [inline]
+ __ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348
+ rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585
+ rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716
+ sock_recvmsg_nosec net/socket.c:1044 [inline]
+ sock_recvmsg+0xe2/0x160 net/socket.c:1066
+ __sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246
+ __do_sys_recvfrom net/socket.c:2264 [inline]
+ __se_sys_recvfrom net/socket.c:2260 [inline]
+ __x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260
+ do_syscall_x64 arch/x86/entry/common.c:51 [inline]
+ do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
+ entry_SYSCALL_64_after_hwframe+0x63/0x6b
+==================================================================
+
+Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
+Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
+Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7LfYFvkRsOVcw@mail.gmail.com/
+Signed-off-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/rds/af_rds.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
+index 3ff6995244e5..d107f7605db4 100644
+--- a/net/rds/af_rds.c
++++ b/net/rds/af_rds.c
+@@ -419,7 +419,7 @@ static int rds_recv_track_latency(struct rds_sock *rs, sockptr_t optval,
+ 
+       rs->rs_rx_traces = trace.rx_traces;
+       for (i = 0; i < rs->rs_rx_traces; i++) {
+-              if (trace.rx_trace_pos[i] > RDS_MSG_RX_DGRAM_TRACE_MAX) {
++              if (trace.rx_trace_pos[i] >= RDS_MSG_RX_DGRAM_TRACE_MAX) {
+                       rs->rs_rx_traces = 0;
+                       return -EFAULT;
+               }
+-- 
+2.43.0
+
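The off-by-one can be checked with a few lines of arithmetic. The sketch below uses the sizes quoted in the commit message (3 and 4); the function names are hypothetical and only model the old and new comparisons together with the later `j + 1` read.

```c
#include <errno.h>

#define TRACE_POS_MAX	3	/* RDS_MSG_RX_DGRAM_TRACE_MAX per the commit message */
#define LAT_TRACE_LEN	4	/* RDS_RX_MAX_TRACES per the commit message */

/* Old check: a position equal to the limit slips through. */
static int check_old(unsigned int pos)
{
	return pos > TRACE_POS_MAX ? -EFAULT : 0;
}

/* New check: a position equal to the limit is rejected. */
static int check_new(unsigned int pos)
{
	return pos >= TRACE_POS_MAX ? -EFAULT : 0;
}

/* The later read uses index pos + 1 into an array of LAT_TRACE_LEN entries. */
static int read_in_bounds(unsigned int pos)
{
	return pos + 1 < LAT_TRACE_LEN;
}
```

A position of 3 passes the old check yet leads to a read at index 4, one past the end of a four-element array; every position the new check accepts (0 to 2) stays in bounds.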
diff --git a/queue-6.1/net-smc-fix-illegal-rmb_desc-access-in-smc-d-connect.patch b/queue-6.1/net-smc-fix-illegal-rmb_desc-access-in-smc-d-connect.patch
new file mode 100644
index 0000000..c736e5b
--- /dev/null
+++ b/queue-6.1/net-smc-fix-illegal-rmb_desc-access-in-smc-d-connect.patch
@@ -0,0 +1,87 @@
+From f656bbfc246ac108ebed7f3b18e5bb9a6343cb20 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 12:32:10 +0800
+Subject: net/smc: fix illegal rmb_desc access in SMC-D connection dump
+
+From: Wen Gu <guwen@linux.alibaba.com>
+
+[ Upstream commit dbc153fd3c142909e564bb256da087e13fbf239c ]
+
+A crash was found when dumping SMC-D connections. It can be reproduced
+by the following steps:
+
+- run nginx/wrk test:
+  smc_run nginx
+  smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
+
+- continuously dump SMC-D connections in parallel:
+  watch -n 1 'smcss -D'
+
+ BUG: kernel NULL pointer dereference, address: 0000000000000030
+ CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E      6.7.0+ #55
+ RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
+ Call Trace:
+  <TASK>
+  ? __die+0x24/0x70
+  ? page_fault_oops+0x66/0x150
+  ? exc_page_fault+0x69/0x140
+  ? asm_exc_page_fault+0x26/0x30
+  ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
+  ? __kmalloc_node_track_caller+0x35d/0x430
+  ? __alloc_skb+0x77/0x170
+  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
+  smc_diag_dump+0x26/0x60 [smc_diag]
+  netlink_dump+0x19f/0x320
+  __netlink_dump_start+0x1dc/0x300
+  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
+  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
+  sock_diag_rcv_msg+0x121/0x140
+  ? __pfx_sock_diag_rcv_msg+0x10/0x10
+  netlink_rcv_skb+0x5a/0x110
+  sock_diag_rcv+0x28/0x40
+  netlink_unicast+0x22a/0x330
+  netlink_sendmsg+0x1f8/0x420
+  __sock_sendmsg+0xb0/0xc0
+  ____sys_sendmsg+0x24e/0x300
+  ? copy_msghdr_from_user+0x62/0x80
+  ___sys_sendmsg+0x7c/0xd0
+  ? __do_fault+0x34/0x160
+  ? do_read_fault+0x5f/0x100
+  ? do_fault+0xb0/0x110
+  ? __handle_mm_fault+0x2b0/0x6c0
+  __sys_sendmsg+0x4d/0x80
+  do_syscall_64+0x69/0x180
+  entry_SYSCALL_64_after_hwframe+0x6e/0x76
+
+It is possible that the connection is in the process of being established
+when we dump it. Assume that the connection has been registered in a
+link group by smc_conn_create() but the rmb_desc has not yet been
+initialized by smc_buf_create(); the dump then makes an illegal access to
+conn->rmb_desc. So fix it by checking rmb_desc before dumping.
+
+Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
+Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
+Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
+Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/smc/smc_diag.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
+index 801044e7d194..7a907186a33a 100644
+--- a/net/smc/smc_diag.c
++++ b/net/smc/smc_diag.c
+@@ -163,7 +163,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
+       }
+       if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
+           (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
+-          !list_empty(&smc->conn.lgr->list)) {
++          !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
+               struct smc_connection *conn = &smc->conn;
+               struct smcd_diag_dmbinfo dinfo;
+ 
+-- 
+2.43.0
+
diff --git a/queue-6.1/net-stmmac-wait-a-bit-for-the-reset-to-take-effect.patch b/queue-6.1/net-stmmac-wait-a-bit-for-the-reset-to-take-effect.patch
new file mode 100644
index 0000000..2427232
--- /dev/null
+++ b/queue-6.1/net-stmmac-wait-a-bit-for-the-reset-to-take-effect.patch
@@ -0,0 +1,63 @@
+From 8641c1c61519b71c193cf7aeb6ff9a41bdebb6a2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 22 Jan 2024 19:19:09 +0100
+Subject: net: stmmac: Wait a bit for the reset to take effect
+
+From: Bernd Edlinger <bernd.edlinger@hotmail.de>
+
+[ Upstream commit a5f5eee282a0aae80227697e1d9c811b1726d31d ]
+
+otherwise the synopsys_id value may be read out wrong,
+because the GMAC_VERSION register might still be in reset
+state, for at least 1 us after the reset is de-asserted.
+
+Add a wait for 10 us before continuing to be on the safe side.
+
+> From what have you got that delay value?
+
+Just trial and error. With very old Linux versions and old gcc versions
+the synopsys_id was read out correctly most of the time (but not always);
+with recent Linux versions and recent gcc versions it was read out
+wrongly most of the time, but again not always.
+I don't have access to the VHDL code in question, so I cannot
+tell why it takes so long to get the correct values. I also do not
+have more than a few hardware samples, so I cannot tell how long
+this timeout must be in the worst case.
+Experimentally I can tell that the register is read several times
+as zero immediately after the reset is de-asserted. Adding several
+no-ops is not enough, adding a printk is enough, and udelay(1) seems to
+be enough, but I did not try that very often, and I do not have access to many
+hardware samples to be 100% sure about the necessary delay.
+And since the udelay here is only executed once per device instance,
+it seems acceptable to delay the boot for 10 us.
+
+BTW: my hardware's synopsys id is 0x37.
+
+Fixes: c5e4ddbdfa11 ("net: stmmac: Add support for optional reset control")
+Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
+Reviewed-by: Jiri Pirko <jiri@nvidia.com>
+Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
+Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+index 8f8de14347a9..e988a60c8561 100644
+--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
++++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+@@ -7166,6 +7166,9 @@ int stmmac_dvr_probe(struct device *device,
+               dev_err(priv->device, "unable to bring out of ahb reset: %pe\n",
+                       ERR_PTR(ret));
+ 
++      /* Wait a bit for the reset to take effect */
++      udelay(10);
++
+       /* Init MAC and get the capabilities */
+       ret = stmmac_hw_init(priv);
+       if (ret)
+-- 
+2.43.0
+
diff --git a/queue-6.1/netfilter-nf_tables-restrict-anonymous-set-and-map-n.patch b/queue-6.1/netfilter-nf_tables-restrict-anonymous-set-and-map-n.patch
new file mode 100644
index 0000000..39775f4
--- /dev/null
+++ b/queue-6.1/netfilter-nf_tables-restrict-anonymous-set-and-map-n.patch
@@ -0,0 +1,60 @@
+From 9c5a4f6438fff21cebb940121d2bb3cb4d9a3518 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 13:34:32 +0100
+Subject: netfilter: nf_tables: restrict anonymous set and map names to 16
+ bytes
+
+From: Florian Westphal <fw@strlen.de>
+
+[ Upstream commit b462579b2b86a8f5230543cadd3a4836be27baf7 ]
+
+nftables has two types of sets/maps, one where userspace defines the
+name, and anonymous sets/maps, where userspace defines a template name.
+
+For the latter, the kernel requires the presence of exactly one "%d".
+nftables uses "__set%d" and "__map%d" for this.  The kernel will
+expand the format specifier and replace it with the smallest unused
+number.
+
+As-is, userspace could define a template name that pushes
+the set name past the 256-byte upper limit (post-expansion).
+
+I don't see how this could be a problem, but I would prefer if userspace
+cannot do this, so add a limit of 16 bytes for the '%d' template name.
+
+16 bytes is the old total upper limit for set names that existed when
+nf_tables was merged initially.
+
+Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
+Signed-off-by: Florian Westphal <fw@strlen.de>
+Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/netfilter/nf_tables_api.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
+index 2702294ac46c..75b21cd1b2c9 100644
+--- a/net/netfilter/nf_tables_api.c
++++ b/net/netfilter/nf_tables_api.c
+@@ -24,6 +24,7 @@
+ #include <net/sock.h>
+ 
+ #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
++#define NFT_SET_MAX_ANONLEN 16
+ 
+ unsigned int nf_tables_net_id __read_mostly;
+ 
+@@ -4127,6 +4128,9 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
+               if (p[1] != 'd' || strchr(p + 2, '%'))
+                       return -EINVAL;
+ 
++              if (strnlen(name, NFT_SET_MAX_ANONLEN) >= NFT_SET_MAX_ANONLEN)
++                      return -EINVAL;
++
+               inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL);
+               if (inuse == NULL)
+                       return -ENOMEM;
+-- 
+2.43.0
+
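Illustrative note (not part of the queued patch): the template check described above can be sketched in userspace C roughly as follows. This is a minimal sketch under stated assumptions: `check_anon_template()` and `bounded_strlen()` are hypothetical names, the latter a portable stand-in for the kernel's `strnlen()`, and the real `nf_tables_set_alloc_name()` goes on to allocate the name, which is omitted here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NFT_SET_MAX_ANONLEN 16

/* Portable stand-in for strnlen(): length of s, capped at max. */
static size_t bounded_strlen(const char *s, size_t max)
{
    size_t n = 0;

    while (n < max && s[n] != '\0')
        n++;
    return n;
}

/*
 * Hypothetical helper mirroring the checks in the hunk above:
 * a template must contain exactly one '%', it must be "%d",
 * and the whole template must be shorter than 16 bytes.
 */
static int check_anon_template(const char *name)
{
    const char *p = strchr(name, '%');

    if (p == NULL)
        return 0;               /* not a template, nothing to check here */

    if (p[1] != 'd' || strchr(p + 2, '%'))
        return -EINVAL;         /* not "%d", or a second '%' follows */

    if (bounded_strlen(name, NFT_SET_MAX_ANONLEN) >= NFT_SET_MAX_ANONLEN)
        return -EINVAL;         /* template is 16 bytes or longer */

    return 0;
}
```

With this sketch, the templates nftables itself uses ("__set%d", "__map%d") pass, while a 16-character template is rejected.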
diff --git a/queue-6.1/netfilter-nf_tables-validate-nfproto_-family.patch b/queue-6.1/netfilter-nf_tables-validate-nfproto_-family.patch
new file mode 100644
index 0000000..e159167
--- /dev/null
+++ b/queue-6.1/netfilter-nf_tables-validate-nfproto_-family.patch
@@ -0,0 +1,196 @@
+From fda79b25a120930c9dcd50196fb56327f1f9adfd Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 23 Jan 2024 16:38:25 +0100
+Subject: netfilter: nf_tables: validate NFPROTO_* family
+
+From: Pablo Neira Ayuso <pablo@netfilter.org>
+
+[ Upstream commit d0009effa8862c20a13af4cb7475d9771b905693 ]
+
+Several expressions explicitly refer to NF_INET_* hook definitions
+from expr->ops->validate, however, family is not validated.
+
+Bail out with EOPNOTSUPP in case they are used from unsupported
+families.
+
+Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
+Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
+Fixes: 2fa841938c64 ("netfilter: nf_tables: introduce routing expression")
+Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
+Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support")
+Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
+Fixes: 6c47260250fc ("netfilter: nf_tables: add xfrm expression")
+Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/netfilter/nft_compat.c       | 12 ++++++++++++
+ net/netfilter/nft_flow_offload.c |  5 +++++
+ net/netfilter/nft_nat.c          |  5 +++++
+ net/netfilter/nft_rt.c           |  5 +++++
+ net/netfilter/nft_socket.c       |  5 +++++
+ net/netfilter/nft_synproxy.c     |  7 +++++--
+ net/netfilter/nft_tproxy.c       |  5 +++++
+ net/netfilter/nft_xfrm.c         |  5 +++++
+ 8 files changed, 47 insertions(+), 2 deletions(-)
+
+diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
+index c16172427622..6952da7dfc02 100644
+--- a/net/netfilter/nft_compat.c
++++ b/net/netfilter/nft_compat.c
+@@ -349,6 +349,12 @@ static int nft_target_validate(const struct nft_ctx *ctx,
+       unsigned int hook_mask = 0;
+       int ret;
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_BRIDGE &&
++          ctx->family != NFPROTO_ARP)
++              return -EOPNOTSUPP;
++
+       if (nft_is_base_chain(ctx->chain)) {
+               const struct nft_base_chain *basechain =
+                                               nft_base_chain(ctx->chain);
+@@ -592,6 +598,12 @@ static int nft_match_validate(const struct nft_ctx *ctx,
+       unsigned int hook_mask = 0;
+       int ret;
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_BRIDGE &&
++          ctx->family != NFPROTO_ARP)
++              return -EOPNOTSUPP;
++
+       if (nft_is_base_chain(ctx->chain)) {
+               const struct nft_base_chain *basechain =
+                                               nft_base_chain(ctx->chain);
+diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
+index 8a43f6f9c90b..3d9f6dda5aeb 100644
+--- a/net/netfilter/nft_flow_offload.c
++++ b/net/netfilter/nft_flow_offload.c
+@@ -380,6 +380,11 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx,
+ {
+       unsigned int hook_mask = (1 << NF_INET_FORWARD);
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       return nft_chain_validate_hooks(ctx->chain, hook_mask);
+ }
+ 
+diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c
+index 353c090f8891..ba7bcce724ef 100644
+--- a/net/netfilter/nft_nat.c
++++ b/net/netfilter/nft_nat.c
+@@ -142,6 +142,11 @@ static int nft_nat_validate(const struct nft_ctx *ctx,
+       struct nft_nat *priv = nft_expr_priv(expr);
+       int err;
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       err = nft_chain_validate_dependency(ctx->chain, NFT_CHAIN_T_NAT);
+       if (err < 0)
+               return err;
+diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c
+index 71931ec91721..7d21e16499bf 100644
+--- a/net/netfilter/nft_rt.c
++++ b/net/netfilter/nft_rt.c
+@@ -166,6 +166,11 @@ static int nft_rt_validate(const struct nft_ctx *ctx, const struct nft_expr *exp
+       const struct nft_rt *priv = nft_expr_priv(expr);
+       unsigned int hooks;
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       switch (priv->key) {
+       case NFT_RT_NEXTHOP4:
+       case NFT_RT_NEXTHOP6:
+diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c
+index 777561b71fcb..f28324fd8d71 100644
+--- a/net/netfilter/nft_socket.c
++++ b/net/netfilter/nft_socket.c
+@@ -242,6 +242,11 @@ static int nft_socket_validate(const struct nft_ctx *ctx,
+                              const struct nft_expr *expr,
+                              const struct nft_data **data)
+ {
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       return nft_chain_validate_hooks(ctx->chain,
+                                       (1 << NF_INET_PRE_ROUTING) |
+                                       (1 << NF_INET_LOCAL_IN) |
+diff --git a/net/netfilter/nft_synproxy.c b/net/netfilter/nft_synproxy.c
+index 6cf9a04fbfe2..a450f28a5ef6 100644
+--- a/net/netfilter/nft_synproxy.c
++++ b/net/netfilter/nft_synproxy.c
+@@ -186,7 +186,6 @@ static int nft_synproxy_do_init(const struct nft_ctx *ctx,
+               break;
+ #endif
+       case NFPROTO_INET:
+-      case NFPROTO_BRIDGE:
+               err = nf_synproxy_ipv4_init(snet, ctx->net);
+               if (err)
+                       goto nf_ct_failure;
+@@ -219,7 +218,6 @@ static void nft_synproxy_do_destroy(const struct nft_ctx *ctx)
+               break;
+ #endif
+       case NFPROTO_INET:
+-      case NFPROTO_BRIDGE:
+               nf_synproxy_ipv4_fini(snet, ctx->net);
+               nf_synproxy_ipv6_fini(snet, ctx->net);
+               break;
+@@ -253,6 +251,11 @@ static int nft_synproxy_validate(const struct nft_ctx *ctx,
+                                const struct nft_expr *expr,
+                                const struct nft_data **data)
+ {
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) |
+                                                   (1 << NF_INET_FORWARD));
+ }
+diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c
+index 62da25ad264b..adb50c39572e 100644
+--- a/net/netfilter/nft_tproxy.c
++++ b/net/netfilter/nft_tproxy.c
+@@ -316,6 +316,11 @@ static int nft_tproxy_validate(const struct nft_ctx *ctx,
+                              const struct nft_expr *expr,
+                              const struct nft_data **data)
+ {
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       return nft_chain_validate_hooks(ctx->chain, 1 << NF_INET_PRE_ROUTING);
+ }
+ 
+diff --git a/net/netfilter/nft_xfrm.c b/net/netfilter/nft_xfrm.c
+index 1c5343c936a8..30259846c352 100644
+--- a/net/netfilter/nft_xfrm.c
++++ b/net/netfilter/nft_xfrm.c
+@@ -235,6 +235,11 @@ static int nft_xfrm_validate(const struct nft_ctx *ctx, const struct nft_expr *e
+       const struct nft_xfrm *priv = nft_expr_priv(expr);
+       unsigned int hooks;
+ 
++      if (ctx->family != NFPROTO_IPV4 &&
++          ctx->family != NFPROTO_IPV6 &&
++          ctx->family != NFPROTO_INET)
++              return -EOPNOTSUPP;
++
+       switch (priv->dir) {
+       case XFRM_POLICY_IN:
+               hooks = (1 << NF_INET_FORWARD) |
+-- 
+2.43.0
+
diff --git a/queue-6.1/netfilter-nft_limit-reject-configurations-that-cause.patch b/queue-6.1/netfilter-nft_limit-reject-configurations-that-cause.patch
new file mode 100644
index 0000000..5da7f75
--- /dev/null
+++ b/queue-6.1/netfilter-nft_limit-reject-configurations-that-cause.patch
@@ -0,0 +1,83 @@
+From 871447d8f68f3c20fc1ff5b547ffcd17e0404fac Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 13:11:32 +0100
+Subject: netfilter: nft_limit: reject configurations that cause integer
+ overflow
+
+From: Florian Westphal <fw@strlen.de>
+
+[ Upstream commit c9d9eb9c53d37cdebbad56b91e40baf42d5a97aa ]
+
+Reject bogus configs where internal token counter wraps around.
+This only occurs with very large requests, such as 17gbyte/s.
+
+It's better to reject this than to apply an incorrect rate limit.
+
+Fixes: d2168e849ebf ("netfilter: nft_limit: add per-byte limiting")
+Signed-off-by: Florian Westphal <fw@strlen.de>
+Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/netfilter/nft_limit.c | 23 ++++++++++++++++-------
+ 1 file changed, 16 insertions(+), 7 deletions(-)
+
+diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
+index 75c05ef885a9..36ded7d43262 100644
+--- a/net/netfilter/nft_limit.c
++++ b/net/netfilter/nft_limit.c
+@@ -58,17 +58,19 @@ static inline bool nft_limit_eval(struct nft_limit_priv *priv, u64 cost)
+ static int nft_limit_init(struct nft_limit_priv *priv,
+                         const struct nlattr * const tb[], bool pkts)
+ {
++      u64 unit, tokens, rate_with_burst;
+       bool invert = false;
+-      u64 unit, tokens;
+ 
+       if (tb[NFTA_LIMIT_RATE] == NULL ||
+           tb[NFTA_LIMIT_UNIT] == NULL)
+               return -EINVAL;
+ 
+       priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
++      if (priv->rate == 0)
++              return -EINVAL;
++
+       unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
+-      priv->nsecs = unit * NSEC_PER_SEC;
+-      if (priv->rate == 0 || priv->nsecs < unit)
++      if (check_mul_overflow(unit, NSEC_PER_SEC, &priv->nsecs))
+               return -EOVERFLOW;
+ 
+       if (tb[NFTA_LIMIT_BURST])
+@@ -77,18 +79,25 @@ static int nft_limit_init(struct nft_limit_priv *priv,
+       if (pkts && priv->burst == 0)
+               priv->burst = NFT_LIMIT_PKT_BURST_DEFAULT;
+ 
+-      if (priv->rate + priv->burst < priv->rate)
++      if (check_add_overflow(priv->rate, priv->burst, &rate_with_burst))
+               return -EOVERFLOW;
+ 
+       if (pkts) {
+-              tokens = div64_u64(priv->nsecs, priv->rate) * priv->burst;
++              u64 tmp = div64_u64(priv->nsecs, priv->rate);
++
++              if (check_mul_overflow(tmp, priv->burst, &tokens))
++                      return -EOVERFLOW;
+       } else {
++              u64 tmp;
++
+               /* The token bucket size limits the number of tokens can be
+                * accumulated. tokens_max specifies the bucket size.
+                * tokens_max = unit * (rate + burst) / rate.
+                */
+-              tokens = div64_u64(priv->nsecs * (priv->rate + priv->burst),
+-                               priv->rate);
++              if (check_mul_overflow(priv->nsecs, rate_with_burst, &tmp))
++                      return -EOVERFLOW;
++
++              tokens = div64_u64(tmp, priv->rate);
+       }
+ 
+       if (tb[NFTA_LIMIT_FLAGS]) {
+-- 
+2.43.0
+
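Illustrative note (not part of the queued patch): the overflow-checked token computation above can be approximated in userspace C. This is a minimal sketch, assuming the GCC/Clang builtins `__builtin_mul_overflow()` and `__builtin_add_overflow()` as stand-ins for the kernel's `check_mul_overflow()` and `check_add_overflow()`. `limit_tokens()` is a hypothetical name, and the default packet burst applied by the kernel is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical stand-in for the arithmetic in nft_limit_init().
 * Returns 0 and stores the bucket size in *tokens, or a negative
 * errno when the rate is zero or an intermediate value would wrap.
 */
static int limit_tokens(uint64_t unit, uint64_t rate, uint64_t burst,
                        bool pkts, uint64_t *tokens)
{
    uint64_t nsecs, rate_with_burst, tmp;

    if (rate == 0)
        return -EINVAL;

    /* nsecs = unit * NSEC_PER_SEC, rejected if it wraps */
    if (__builtin_mul_overflow(unit, NSEC_PER_SEC, &nsecs))
        return -EOVERFLOW;

    if (__builtin_add_overflow(rate, burst, &rate_with_burst))
        return -EOVERFLOW;

    if (pkts) {
        /* tokens = (nsecs / rate) * burst */
        if (__builtin_mul_overflow(nsecs / rate, burst, tokens))
            return -EOVERFLOW;
    } else {
        /* tokens = nsecs * (rate + burst) / rate */
        if (__builtin_mul_overflow(nsecs, rate_with_burst, &tmp))
            return -EOVERFLOW;
        *tokens = tmp / rate;
    }

    return 0;
}
```

Unlike the old `a + b < a` style tests, the builtins report the wrap on the multiplication itself, so a product that overflows is rejected instead of being silently truncated before the division.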
diff --git a/queue-6.1/netfs-fscache-prevent-oops-in-fscache_put_cache.patch b/queue-6.1/netfs-fscache-prevent-oops-in-fscache_put_cache.patch
new file mode 100644
index 0000000..a7914b5
--- /dev/null
+++ b/queue-6.1/netfs-fscache-prevent-oops-in-fscache_put_cache.patch
@@ -0,0 +1,44 @@
+From 723724ba06384273dbcff61283f5d1104602cda8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 12 Jan 2024 09:59:41 +0300
+Subject: netfs, fscache: Prevent Oops in fscache_put_cache()
+
+From: Dan Carpenter <dan.carpenter@linaro.org>
+
+[ Upstream commit 3be0b3ed1d76c6703b9ee482b55f7e01c369cc68 ]
+
+This function dereferences "cache" and then checks if it's
+IS_ERR_OR_NULL().  Check first, then dereference.
+
+Fixes: 9549332df4ed ("fscache: Implement cache registration")
+Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
+Signed-off-by: David Howells <dhowells@redhat.com>
+Link: https://lore.kernel.org/r/e84bc740-3502-4f16-982a-a40d5676615c@moroto.mountain/ # v2
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/fscache/cache.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
+index d645f8b302a2..9397ed39b0b4 100644
+--- a/fs/fscache/cache.c
++++ b/fs/fscache/cache.c
+@@ -179,13 +179,14 @@ EXPORT_SYMBOL(fscache_acquire_cache);
+ void fscache_put_cache(struct fscache_cache *cache,
+                      enum fscache_cache_trace where)
+ {
+-      unsigned int debug_id = cache->debug_id;
++      unsigned int debug_id;
+       bool zero;
+       int ref;
+ 
+       if (IS_ERR_OR_NULL(cache))
+               return;
+ 
++      debug_id = cache->debug_id;
+       zero = __refcount_dec_and_test(&cache->ref, &ref);
+       trace_fscache_cache(debug_id, ref - 1, where);
+ 
+-- 
+2.43.0
+
diff --git a/queue-6.1/netlink-fix-potential-sleeping-issue-in-mqueue_flush.patch b/queue-6.1/netlink-fix-potential-sleeping-issue-in-mqueue_flush.patch
new file mode 100644
index 0000000..7819531
--- /dev/null
+++ b/queue-6.1/netlink-fix-potential-sleeping-issue-in-mqueue_flush.patch
@@ -0,0 +1,76 @@
+From a9a8bc6ad086571b4d696bda2b83a19163225bf1 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 22 Jan 2024 09:18:07 +0800
+Subject: netlink: fix potential sleeping issue in mqueue_flush_file
+
+From: Zhengchao Shao <shaozhengchao@huawei.com>
+
+[ Upstream commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04 ]
+
+I analyze the potential sleeping issue of the following processes:
+Thread A                                Thread B
+...                                     netlink_create  //ref = 1
+do_mq_notify                            ...
+  sock = netlink_getsockbyfilp          ...     //ref = 2
+  info->notify_sock = sock;             ...
+...                                     netlink_sendmsg
+...                                       skb = netlink_alloc_large_skb  //skb->head is vmalloced
+...                                       netlink_unicast
+...                                         sk = netlink_getsockbyportid //ref = 3
+...                                         netlink_sendskb
+...                                           __netlink_sendskb
+...                                             skb_queue_tail //put skb to sk_receive_queue
+...                                         sock_put //ref = 2
+...                                     ...
+...                                     netlink_release
+...                                       deferred_put_nlk_sk //ref = 1
+mqueue_flush_file
+  spin_lock
+  remove_notification
+    netlink_sendskb
+      sock_put  //ref = 0
+        sk_free
+          ...
+          __sk_destruct
+            netlink_sock_destruct
+              skb_queue_purge  //get skb from sk_receive_queue
+                ...
+                __skb_queue_purge_reason
+                  kfree_skb_reason
+                    __kfree_skb
+                    ...
+                    skb_release_all
+                      skb_release_head_state
+                        netlink_skb_destructor
+                          vfree(skb->head)  //sleeping while holding spinlock
+
+In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
+vmalloc and the skb is put on sk_receive_queue, the skb is not freed there.
+When the mqueue later executes flush, the sleeping bug occurs. Use
+vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
+
+Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
+Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
+Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/netlink/af_netlink.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index cb833302270a..6857a4965fe8 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -374,7 +374,7 @@ static void netlink_skb_destructor(struct sk_buff *skb)
+       if (is_vmalloc_addr(skb->head)) {
+               if (!skb->cloned ||
+                   !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
+-                      vfree(skb->head);
++                      vfree_atomic(skb->head);
+ 
+               skb->head = NULL;
+       }
+-- 
+2.43.0
+
diff --git a/queue-6.1/rcu-defer-rcu-kthreads-wakeup-when-cpu-is-dying.patch b/queue-6.1/rcu-defer-rcu-kthreads-wakeup-when-cpu-is-dying.patch
new file mode 100644
index 0000000..758a68a
--- /dev/null
+++ b/queue-6.1/rcu-defer-rcu-kthreads-wakeup-when-cpu-is-dying.patch
@@ -0,0 +1,141 @@
+From 379596277bce348be1a6106afd51349fbc33f251 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 19 Dec 2023 00:19:15 +0100
+Subject: rcu: Defer RCU kthreads wakeup when CPU is dying
+
+From: Frederic Weisbecker <frederic@kernel.org>
+
+[ Upstream commit e787644caf7628ad3269c1fbd321c3255cf51710 ]
+
+When the CPU goes idle for the last time during the CPU down hotplug
+process, RCU reports a final quiescent state for the current CPU. If
+this quiescent state propagates up to the top, some tasks may then be
+woken up to complete the grace period: the main grace period kthread
+and/or the expedited main workqueue (or kworker).
+
+If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
+arm the RT bandwidth timer to the local offline CPU. Since this happens
+after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
+timer gets ignored. Therefore if the RCU kthreads are waiting for RT
+bandwidth to be available, they may never be actually scheduled.
+
+This triggers TREE03 rcutorture hangs:
+
+        rcu: INFO: rcu_preempt self-detected stall on CPU
+        rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
+        rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
+        rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
+        rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
+        rcu: RCU grace-period kthread stack dump:
+        task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
+        Call Trace:
+         <TASK>
+         __schedule+0x2eb/0xa80
+         schedule+0x1f/0x90
+         schedule_timeout+0x163/0x270
+         ? __pfx_process_timeout+0x10/0x10
+         rcu_gp_fqs_loop+0x37c/0x5b0
+         ? __pfx_rcu_gp_kthread+0x10/0x10
+         rcu_gp_kthread+0x17c/0x200
+         kthread+0xde/0x110
+         ? __pfx_kthread+0x10/0x10
+         ret_from_fork+0x2b/0x40
+         ? __pfx_kthread+0x10/0x10
+         ret_from_fork_asm+0x1b/0x30
+         </TASK>
+
+The situation can't be solved with just unpinning the timer. The hrtimer
+infrastructure and the nohz heuristics involved in finding the best
+remote target for an unpinned timer would then also need to handle
+enqueues from an offline CPU in the most horrendous way.
+
+So fix this on the RCU side instead and defer the wake up to an online
+CPU if it's too late for the local one.
+
+Reported-by: Paul E. McKenney <paulmck@kernel.org>
+Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
+Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
+Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
+Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/rcu/tree.c     | 34 +++++++++++++++++++++++++++++++++-
+ kernel/rcu/tree_exp.h |  3 +--
+ 2 files changed, 34 insertions(+), 3 deletions(-)
+
+diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
+index 15df37bc052a..9d7464a90f85 100644
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -1051,6 +1051,38 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
+       return needmore;
+ }
+ 
++static void swake_up_one_online_ipi(void *arg)
++{
++      struct swait_queue_head *wqh = arg;
++
++      swake_up_one(wqh);
++}
++
++static void swake_up_one_online(struct swait_queue_head *wqh)
++{
++      int cpu = get_cpu();
++
++      /*
++       * If called from rcutree_report_cpu_starting(), wake up
++       * is dangerous that late in the CPU-down hotplug process. The
++       * scheduler might queue an ignored hrtimer. Defer the wake up
++       * to an online CPU instead.
++       */
++      if (unlikely(cpu_is_offline(cpu))) {
++              int target;
++
++              target = cpumask_any_and(housekeeping_cpumask(HK_TYPE_RCU),
++                                       cpu_online_mask);
++
++              smp_call_function_single(target, swake_up_one_online_ipi,
++                                       wqh, 0);
++              put_cpu();
++      } else {
++              put_cpu();
++              swake_up_one(wqh);
++      }
++}
++
+ /*
+  * Awaken the grace-period kthread.  Don't do a self-awaken (unless in an
+  * interrupt or softirq handler, in which case we just might immediately
+@@ -1075,7 +1107,7 @@ static void rcu_gp_kthread_wake(void)
+               return;
+       WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
+       WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
+-      swake_up_one(&rcu_state.gp_wq);
++      swake_up_one_online(&rcu_state.gp_wq);
+ }
+ 
+ /*
+diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
+index aa3ec3c3b9f7..6d2cbed96b46 100644
+--- a/kernel/rcu/tree_exp.h
++++ b/kernel/rcu/tree_exp.h
+@@ -172,7 +172,6 @@ static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp)
+       return ret;
+ }
+ 
+-
+ /*
+  * Report the exit from RCU read-side critical section for the last task
+  * that queued itself during or before the current expedited preemptible-RCU
+@@ -200,7 +199,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
+                       raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+                       if (wake) {
+                               smp_mb(); /* EGP done before wake_up(). */
+-                              swake_up_one(&rcu_state.expedited_wq);
++                              swake_up_one_online(&rcu_state.expedited_wq);
+                       }
+                       break;
+               }
+-- 
+2.43.0
+
diff --git a/queue-6.1/selftests-netdevsim-fix-the-udp_tunnel_nic-test.patch b/queue-6.1/selftests-netdevsim-fix-the-udp_tunnel_nic-test.patch
new file mode 100644
index 0000000..dcbbad4
--- /dev/null
+++ b/queue-6.1/selftests-netdevsim-fix-the-udp_tunnel_nic-test.patch
@@ -0,0 +1,102 @@
+From d3f9c11716ea1a561bc0cf9f533cd8746d7997bf Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 22 Jan 2024 22:05:29 -0800
+Subject: selftests: netdevsim: fix the udp_tunnel_nic test
+
+From: Jakub Kicinski <kuba@kernel.org>
+
+[ Upstream commit 0879020a7817e7ce636372c016b4528f541c9f4d ]
+
+This test is missing a whole bunch of checks for interface
+renaming and one ifup. Presumably it was only used on a system
+with renaming disabled and NetworkManager running.
+
+Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra")
+Acked-by: Paolo Abeni <pabeni@redhat.com>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ .../selftests/drivers/net/netdevsim/udp_tunnel_nic.sh    | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
+index 1b08e042cf94..185b02d2d4cd 100755
+--- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
++++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
+@@ -269,6 +269,7 @@ for port in 0 1; do
+       echo 1 > $NSIM_DEV_SYS/new_port
+     fi
+     NSIM_NETDEV=`get_netdev_name old_netdevs`
++    ifconfig $NSIM_NETDEV up
+ 
+     msg="new NIC device created"
+     exp0=( 0 0 0 0 )
+@@ -430,6 +431,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     overflow_table0 "overflow NIC table"
+@@ -487,6 +489,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     overflow_table0 "overflow NIC table"
+@@ -543,6 +546,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     overflow_table0 "destroy NIC"
+@@ -572,6 +576,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     msg="create VxLANs v6"
+@@ -632,6 +637,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     echo 110 > $NSIM_DEV_DFS/ports/$port/udp_ports_inject_error
+@@ -687,6 +693,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     msg="create VxLANs v6"
+@@ -746,6 +753,7 @@ for port in 0 1; do
+     fi
+ 
+     echo $port > $NSIM_DEV_SYS/new_port
++    NSIM_NETDEV=`get_netdev_name old_netdevs`
+     ifconfig $NSIM_NETDEV up
+ 
+     msg="create VxLANs v6"
+@@ -876,6 +884,7 @@ msg="re-add a port"
+ 
+ echo 2 > $NSIM_DEV_SYS/del_port
+ echo 2 > $NSIM_DEV_SYS/new_port
++NSIM_NETDEV=`get_netdev_name old_netdevs`
+ check_tables
+ 
+ msg="replace VxLAN in overflow table"
+-- 
+2.43.0
+
diff --git a/queue-6.1/series b/queue-6.1/series
index a26dddf5931553dc809447c542df0aba9fddb55f..0f5ad4495e95a365f86f1979ce6f1f1ed0826689 100644
--- a/queue-6.1/series
+++ b/queue-6.1/series
@@ -75,3 +75,37 @@ ksmbd-send-lease-break-notification-on-file_rename_information.patch
  ksmbd-add-missing-set_freezable-for-freezable-kthread.patch
  revert-drm-amd-enable-pcie-pme-from-d3.patch
  drm-amd-display-pbn_div-need-be-updated-for-hotplug-event.patch
+wifi-mac80211-fix-potential-sta-link-leak.patch
+net-smc-fix-illegal-rmb_desc-access-in-smc-d-connect.patch
+tcp-make-sure-init-the-accept_queue-s-spinlocks-once.patch
+bnxt_en-wait-for-flr-to-complete-during-probe.patch
+vlan-skip-nested-type-that-is-not-ifla_vlan_qos_mapp.patch
+llc-make-llc_ui_sendmsg-more-robust-against-bonding-.patch
+llc-drop-support-for-eth_p_tr_802_2.patch
+udp-fix-busy-polling.patch
+net-fix-removing-a-namespace-with-conflicting-altnam.patch
+tun-fix-missing-dropped-counter-in-tun_xdp_act.patch
+tun-add-missing-rx-stats-accounting-in-tun_xdp_act.patch
+net-micrel-fix-ptp-frame-parsing-for-lan8814.patch
+net-rds-fix-ubsan-array-index-out-of-bounds-in-rds_c.patch
+netfs-fscache-prevent-oops-in-fscache_put_cache.patch
+tracing-ensure-visibility-when-inserting-an-element-.patch
+afs-hide-silly-rename-files-from-userspace.patch
+tcp-add-memory-barrier-to-tcp_push.patch
+netlink-fix-potential-sleeping-issue-in-mqueue_flush.patch
+ipv6-init-the-accept_queue-s-spinlocks-in-inet6_crea.patch
+net-mlx5-dr-use-the-right-gvmi-number-for-drop-actio.patch
+net-mlx5-dr-can-t-go-to-uplink-vport-on-rx-rule.patch
+net-mlx5-use-mlx5-device-constant-for-selecting-cq-p.patch
+net-mlx5e-allow-software-parsing-when-ipsec-crypto-i.patch
+net-mlx5e-fix-a-double-free-in-arfs_create_groups.patch
+net-mlx5e-fix-a-potential-double-free-in-fs_any_crea.patch
+rcu-defer-rcu-kthreads-wakeup-when-cpu-is-dying.patch
+netfilter-nft_limit-reject-configurations-that-cause.patch
+netfilter-nf_tables-restrict-anonymous-set-and-map-n.patch
+netfilter-nf_tables-validate-nfproto_-family.patch
+net-stmmac-wait-a-bit-for-the-reset-to-take-effect.patch
+net-mvpp2-clear-bm-pool-before-initialization.patch
+selftests-netdevsim-fix-the-udp_tunnel_nic-test.patch
+fjes-fix-memleaks-in-fjes_hw_setup.patch
+net-fec-fix-the-unhandled-context-fault-from-smmu.patch
diff --git a/queue-6.1/tcp-add-memory-barrier-to-tcp_push.patch b/queue-6.1/tcp-add-memory-barrier-to-tcp_push.patch
new file mode 100644
index 0000000..5ba607e
--- /dev/null
+++ b/queue-6.1/tcp-add-memory-barrier-to-tcp_push.patch
@@ -0,0 +1,101 @@
+From f962b841306854fcecb524b6eab082b606b5bf5e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 11:01:33 -0800
+Subject: tcp: Add memory barrier to tcp_push()
+
+From: Salvatore Dipietro <dipiets@amazon.com>
+
+[ Upstream commit 7267e8dcad6b2f9fce05a6a06335d7040acbc2b6 ]
+
+On CPUs with weak memory models, reads and updates performed by tcp_push
+to the sk variables can get reordered leaving the socket throttled when
+it should not. The tasklet running tcp_wfree() may also not observe the
+memory updates in time and will skip flushing any packets throttled by
+tcp_push(), delaying the sending. This can pathologically cause 40ms
+extra latency due to bad interactions with delayed acks.
+
+Adding a memory barrier in tcp_push removes the bug, similarly to the
+previous commit bf06200e732d ("tcp: tsq: fix nonagle handling").
+smp_mb__after_atomic() is used to avoid unnecessary overhead on x86,
+which is not affected.
+
+Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
+22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
+
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+
+public class HelloWorldServlet extends HttpServlet {
+    @Override
+    protected void doGet(HttpServletRequest request, HttpServletResponse response)
+      throws ServletException, IOException {
+        response.setContentType("text/html;charset=utf-8");
+        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
+        String s = "a".repeat(3096);
+        osw.write(s,0,s.length());
+        osw.flush();
+    }
+}
+
+Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
+c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
+values is observed while, with the patch, the extra latency disappears.
+
+No patch and tcp_autocorking=1
+./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
+  ...
+ 50.000%    0.91ms
+ 75.000%    1.13ms
+ 90.000%    1.46ms
+ 99.000%    1.74ms
+ 99.900%    1.89ms
+ 99.990%   41.95ms  <<< 40+ ms extra latency
+ 99.999%   48.32ms
+100.000%   48.96ms
+
+With patch and tcp_autocorking=1
+./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
+  ...
+ 50.000%    0.90ms
+ 75.000%    1.13ms
+ 90.000%    1.45ms
+ 99.000%    1.72ms
+ 99.900%    1.83ms
+ 99.990%    2.11ms  <<< no 40+ ms extra latency
+ 99.999%    2.53ms
+100.000%    2.62ms
+
+Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
+affected by this issue and the patch doesn't introduce any additional
+delay.
+
+Fixes: 7aa5470c2c09 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
+Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/tcp.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
+index 0b7844a8d571..90e24c3f6557 100644
+--- a/net/ipv4/tcp.c
++++ b/net/ipv4/tcp.c
+@@ -718,6 +718,7 @@ void tcp_push(struct sock *sk, int flags, int mss_now,
+               if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
+                       NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTOCORKING);
+                       set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
++                      smp_mb__after_atomic();
+               }
+               /* It is possible TX completion already happened
+                * before we set TSQ_THROTTLED.
+-- 
+2.43.0
+
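Note on the barrier above: the race is a store-then-load pattern on two variables. The sketch below is a simplified userspace model in C11 atomics, not the kernel code. `struct toy_sock`, `sender_throttle()` and `completion_done()` are hypothetical names, and the seq_cst fences stand in for the full barrier that smp_mb__after_atomic() provides after set_bit(). With a fence on both sides, at least one side must observe the other's write, so a throttled packet cannot be left with nobody to flush it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define THROTTLED 1u

/* Toy stand-in for the two socket fields involved in the race. */
struct toy_sock {
	atomic_uint flags;     /* models sk_tsq_flags */
	atomic_int  in_flight; /* models outstanding TX packets */
};

/* Sender side: mark the socket throttled, then look for pending TX.
 * Returns true when nothing is in flight, i.e. the sender must push
 * the packet itself because no completion will come to flush it. */
static bool sender_throttle(struct toy_sock *sk)
{
	assert(sk);
	atomic_fetch_or_explicit(&sk->flags, THROTTLED, memory_order_relaxed);
	/* Full barrier: the flag store is ordered before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&sk->in_flight, memory_order_relaxed) == 0;
}

/* Completion side: retire one packet, then look for the throttled flag.
 * Returns true when the completion handler must flush queued data. */
static bool completion_done(struct toy_sock *sk)
{
	atomic_fetch_sub_explicit(&sk->in_flight, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&sk->flags, memory_order_relaxed) &
		THROTTLED) != 0;
}
```

Without the sender-side fence, a weakly ordered CPU could satisfy the in_flight load before the flag store becomes visible, and both functions could return false for the same packet.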
diff --git a/queue-6.1/tcp-make-sure-init-the-accept_queue-s-spinlocks-once.patch b/queue-6.1/tcp-make-sure-init-the-accept_queue-s-spinlocks-once.patch
new file mode 100644
index 0000000..9a992df
--- /dev/null
+++ b/queue-6.1/tcp-make-sure-init-the-accept_queue-s-spinlocks-once.patch
@@ -0,0 +1,170 @@
+From d125249e6b542506ebdc6862d8d4ff00a4c5b72d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 09:20:19 +0800
+Subject: tcp: make sure init the accept_queue's spinlocks once
+
+From: Zhengchao Shao <shaozhengchao@huawei.com>
+
+[ Upstream commit 198bc90e0e734e5f98c3d2833e8390cac3df61b2 ]
+
+When I run syz's reproduction C program locally, it causes the following
+issue:
+pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0!
+WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
+Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
+RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
+Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7
+30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90
+RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282
+RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908
+RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900
+RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff
+R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000
+R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000
+FS:  00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000
+CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0
+Call Trace:
+<IRQ>
+  _raw_spin_unlock (kernel/locking/spinlock.c:186)
+  inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321)
+  inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358)
+  tcp_check_req (net/ipv4/tcp_minisocks.c:868)
+  tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260)
+  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205)
+  ip_local_deliver_finish (net/ipv4/ip_input.c:234)
+  __netif_receive_skb_one_core (net/core/dev.c:5529)
+  process_backlog (./include/linux/rcupdate.h:779)
+  __napi_poll (net/core/dev.c:6533)
+  net_rx_action (net/core/dev.c:6604)
+  __do_softirq (./arch/x86/include/asm/jump_label.h:27)
+  do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
+</IRQ>
+<TASK>
+  __local_bh_enable_ip (kernel/softirq.c:381)
+  __dev_queue_xmit (net/core/dev.c:4374)
+  ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235)
+  __ip_queue_xmit (net/ipv4/ip_output.c:535)
+  __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
+  tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469)
+  tcp_rcv_state_process (net/ipv4/tcp_input.c:6657)
+  tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929)
+  __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968)
+  release_sock (net/core/sock.c:3536)
+  inet_wait_for_connect (net/ipv4/af_inet.c:609)
+  __inet_stream_connect (net/ipv4/af_inet.c:702)
+  inet_stream_connect (net/ipv4/af_inet.c:748)
+  __sys_connect (./include/linux/file.h:45 net/socket.c:2064)
+  __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070)
+  do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82)
+  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
+  RIP: 0033:0x7fa10ff05a3d
+  Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89
+  c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
+  RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
+  RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d
+  RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003
+  RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000
+  R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640
+  R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20
+</TASK>
+
+The issue triggering process is analyzed as follows:
+Thread A                                       Thread B
+tcp_v4_rcv     //receive ack TCP packet       inet_shutdown
+  tcp_check_req                                  tcp_disconnect //disconnect sock
+  ...                                              tcp_set_state(sk, TCP_CLOSE)
+    inet_csk_complete_hashdance                ...
+      inet_csk_reqsk_queue_add                 inet_listen  //start listen
+        spin_lock(&queue->rskq_lock)             inet_csk_listen_start
+        ...                                        reqsk_queue_alloc
+        ...                                          spin_lock_init
+        spin_unlock(&queue->rskq_lock) //warning
+
+When the socket receives the ACK packet during the three-way handshake,
+it will hold spinlock. And then the user actively shutdowns the socket
+and listens to the socket immediately, the spinlock will be initialized.
+When the socket is going to release the spinlock, a warning is generated.
+Also the same issue to fastopenq.lock.
+
+Move init spinlock to inet_create and inet_accept to make sure init the
+accept_queue's spinlocks once.
+
+Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue")
+Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
+Reported-by: Ming Shu <sming56@aliyun.com>
+Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/inet_connection_sock.h | 8 ++++++++
+ net/core/request_sock.c            | 3 ---
+ net/ipv4/af_inet.c                 | 3 +++
+ net/ipv4/inet_connection_sock.c    | 4 ++++
+ 4 files changed, 15 insertions(+), 3 deletions(-)
+
+diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
+index c2b15f7e5516..080968d6e6c5 100644
+--- a/include/net/inet_connection_sock.h
++++ b/include/net/inet_connection_sock.h
+@@ -346,4 +346,12 @@ static inline bool inet_csk_has_ulp(struct sock *sk)
+       return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
+ }
+ 
++static inline void inet_init_csk_locks(struct sock *sk)
++{
++      struct inet_connection_sock *icsk = inet_csk(sk);
++
++      spin_lock_init(&icsk->icsk_accept_queue.rskq_lock);
++      spin_lock_init(&icsk->icsk_accept_queue.fastopenq.lock);
++}
++
+ #endif /* _INET_CONNECTION_SOCK_H */
+diff --git a/net/core/request_sock.c b/net/core/request_sock.c
+index f35c2e998406..63de5c635842 100644
+--- a/net/core/request_sock.c
++++ b/net/core/request_sock.c
+@@ -33,9 +33,6 @@
+ 
+ void reqsk_queue_alloc(struct request_sock_queue *queue)
+ {
+-      spin_lock_init(&queue->rskq_lock);
+-
+-      spin_lock_init(&queue->fastopenq.lock);
+       queue->fastopenq.rskq_rst_head = NULL;
+       queue->fastopenq.rskq_rst_tail = NULL;
+       queue->fastopenq.qlen = 0;
+diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
+index c13b8ed63f87..2f646335d218 100644
+--- a/net/ipv4/af_inet.c
++++ b/net/ipv4/af_inet.c
+@@ -324,6 +324,9 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
+       if (INET_PROTOSW_REUSE & answer_flags)
+               sk->sk_reuse = SK_CAN_REUSE;
+ 
++      if (INET_PROTOSW_ICSK & answer_flags)
++              inet_init_csk_locks(sk);
++
+       inet = inet_sk(sk);
+       inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
+ 
+diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
+index 80ce0112e24b..79fa19a36bbd 100644
+--- a/net/ipv4/inet_connection_sock.c
++++ b/net/ipv4/inet_connection_sock.c
+@@ -727,6 +727,10 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
+       }
+       if (req)
+               reqsk_put(req);
++
++      if (newsk)
++              inet_init_csk_locks(newsk);
++
+       return newsk;
+ out_err:
+       newsk = NULL;
+-- 
+2.43.0
+
diff --git a/queue-6.1/tracing-ensure-visibility-when-inserting-an-element-.patch b/queue-6.1/tracing-ensure-visibility-when-inserting-an-element-.patch
new file mode 100644
index 0000000..e380c6b
--- /dev/null
+++ b/queue-6.1/tracing-ensure-visibility-when-inserting-an-element-.patch
@@ -0,0 +1,129 @@
+From d41588695395b5065068afb4e6b696b45d2f8474 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 22 Jan 2024 16:09:28 +0100
+Subject: tracing: Ensure visibility when inserting an element into tracing_map
+
+From: Petr Pavlu <petr.pavlu@suse.com>
+
+[ Upstream commit 2b44760609e9eaafc9d234a6883d042fc21132a7 ]
+
+Running the following two commands in parallel on a multi-processor
+AArch64 machine can sporadically produce an unexpected warning about
+duplicate histogram entries:
+
+ $ while true; do
+     echo hist:key=id.syscall:val=hitcount > \
+       /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
+     cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
+     sleep 0.001
+   done
+ $ stress-ng --sysbadaddr $(nproc)
+
+The warning looks as follows:
+
+[ 2911.172474] ------------[ cut here ]------------
+[ 2911.173111] Duplicates detected: 1
+[ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408
+[ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E)
+[ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1
+[ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G            E      6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01
+[ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018
+[ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
+[ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408
+[ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408
+[ 2911.185310] sp : ffff8000a1513900
+[ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001
+[ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008
+[ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180
+[ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff
+[ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8
+[ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731
+[ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c
+[ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8
+[ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000
+[ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480
+[ 2911.194259] Call trace:
+[ 2911.194626]  tracing_map_sort_entries+0x3e0/0x408
+[ 2911.195220]  hist_show+0x124/0x800
+[ 2911.195692]  seq_read_iter+0x1d4/0x4e8
+[ 2911.196193]  seq_read+0xe8/0x138
+[ 2911.196638]  vfs_read+0xc8/0x300
+[ 2911.197078]  ksys_read+0x70/0x108
+[ 2911.197534]  __arm64_sys_read+0x24/0x38
+[ 2911.198046]  invoke_syscall+0x78/0x108
+[ 2911.198553]  el0_svc_common.constprop.0+0xd0/0xf8
+[ 2911.199157]  do_el0_svc+0x28/0x40
+[ 2911.199613]  el0_svc+0x40/0x178
+[ 2911.200048]  el0t_64_sync_handler+0x13c/0x158
+[ 2911.200621]  el0t_64_sync+0x1a8/0x1b0
+[ 2911.201115] ---[ end trace 0000000000000000 ]---
+
+The problem appears to be caused by CPU reordering of writes issued from
+__tracing_map_insert().
+
+The check for the presence of an element with a given key in this
+function is:
+
+ val = READ_ONCE(entry->val);
+ if (val && keys_match(key, val->key, map->key_size)) ...
+
+The write of a new entry is:
+
+ elt = get_free_elt(map);
+ memcpy(elt->key, key, map->key_size);
+ entry->val = elt;
+
+The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;"
+stores may become visible in the reversed order on another CPU. This
+second CPU might then incorrectly determine that a new key doesn't match
+an already present val->key and subsequently insert a new element,
+resulting in a duplicate.
+
+Fix the problem by adding a write barrier between
+"memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for
+good measure, also use WRITE_ONCE(entry->val, elt) for publishing the
+element. The sequence pairs with the mentioned "READ_ONCE(entry->val);"
+and the "val->key" check which has an address dependency.
+
+The barrier is placed on a path executed when adding an element for
+a new key. Subsequent updates targeting the same key remain unaffected.
+
+From the user's perspective, the issue was introduced by commit
+c193707dde77 ("tracing: Remove code which merges duplicates"), which
+followed commit cbf4100efb8f ("tracing: Add support to detect and avoid
+duplicates"). The previous code operated differently; it inherently
+expected potential races which result in duplicates but merged them
+later when they occurred.
+
+Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu@suse.com
+
+Fixes: c193707dde77 ("tracing: Remove code which merges duplicates")
+Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
+Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
+Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/trace/tracing_map.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
+index c774e560f2f9..a4dcf0f24352 100644
+--- a/kernel/trace/tracing_map.c
++++ b/kernel/trace/tracing_map.c
+@@ -574,7 +574,12 @@ __tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
+                               }
+ 
+                               memcpy(elt->key, key, map->key_size);
+-                              entry->val = elt;
++                              /*
++                               * Ensure the initialization is visible and
++                               * publish the elt.
++                               */
++                              smp_wmb();
++                              WRITE_ONCE(entry->val, elt);
+                               atomic64_inc(&map->hits);
+ 
+                               return entry->val;
+-- 
+2.43.0
+
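Note on the publish ordering above: the fix is the usual "initialize, then publish" pattern. The sketch below is a userspace analogue in C11 atomics, not the kernel code. `struct elt`, `struct entry`, `publish()` and `entry_matches()` are hypothetical names, and `KEY_SIZE` is an arbitrary constant. A release store plays the role of smp_wmb() followed by WRITE_ONCE(); the acquire load is somewhat stronger than the READ_ONCE() plus address dependency that the kernel relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define KEY_SIZE 8

struct elt {
	unsigned char key[KEY_SIZE];
};

struct entry {
	_Atomic(struct elt *) val; /* NULL until an element is published */
};

/* Writer: fill in the key first, then publish the pointer.
 * The release store keeps the memcpy() from becoming visible
 * after the pointer on another CPU. */
static void publish(struct entry *e, struct elt *elt, const void *key)
{
	assert(e && elt && key);
	memcpy(elt->key, key, KEY_SIZE);
	atomic_store_explicit(&e->val, elt, memory_order_release);
}

/* Reader: a non-NULL pointer implies the key bytes are initialized. */
static int entry_matches(struct entry *e, const void *key)
{
	struct elt *val = atomic_load_explicit(&e->val, memory_order_acquire);

	return val != NULL && memcmp(val->key, key, KEY_SIZE) == 0;
}
```

If the store were a plain assignment, a reader could see the pointer, compare against a stale key, conclude there is no match and insert a duplicate, which is the warning shown in the commit message.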
diff --git a/queue-6.1/tun-add-missing-rx-stats-accounting-in-tun_xdp_act.patch b/queue-6.1/tun-add-missing-rx-stats-accounting-in-tun_xdp_act.patch
new file mode 100644
index 0000000..a2ec90f
--- /dev/null
+++ b/queue-6.1/tun-add-missing-rx-stats-accounting-in-tun_xdp_act.patch
@@ -0,0 +1,49 @@
+From 431cd7b5686119ee28281a0255a131f0a9f87a97 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 18:22:56 +0800
+Subject: tun: add missing rx stats accounting in tun_xdp_act
+
+From: Yunjian Wang <wangyunjian@huawei.com>
+
+[ Upstream commit f1084c427f55d573fcd5688d9ba7b31b78019716 ]
+
+The TUN can be used as vhost-net backend, and it is necessary to
+count the packets transmitted from TUN to vhost-net/virtio-net.
+However, there are some places in the receive path that were not
+taken into account when using XDP. It would be beneficial to also
+include new accounting for successfully received bytes using
+dev_sw_netstats_rx_add.
+
+Fixes: 761876c857cb ("tap: XDP support")
+Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
+Reviewed-by: Willem de Bruijn <willemb@google.com>
+Acked-by: Jason Wang <jasowang@redhat.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/tun.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/drivers/net/tun.c b/drivers/net/tun.c
+index af32aa599278..367255bb44cd 100644
+--- a/drivers/net/tun.c
++++ b/drivers/net/tun.c
+@@ -1626,6 +1626,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
+                       dev_core_stats_rx_dropped_inc(tun->dev);
+                       return err;
+               }
++              dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
+               break;
+       case XDP_TX:
+               err = tun_xdp_tx(tun->dev, xdp);
+@@ -1633,6 +1634,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
+                       dev_core_stats_rx_dropped_inc(tun->dev);
+                       return err;
+               }
++              dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
+               break;
+       case XDP_PASS:
+               break;
+-- 
+2.43.0
+
diff --git a/queue-6.1/tun-fix-missing-dropped-counter-in-tun_xdp_act.patch b/queue-6.1/tun-fix-missing-dropped-counter-in-tun_xdp_act.patch
new file mode 100644
index 0000000..4f45884
--- /dev/null
+++ b/queue-6.1/tun-fix-missing-dropped-counter-in-tun_xdp_act.patch
@@ -0,0 +1,52 @@
+From 83c7c28c306201fb98c5e638574e032b2edee22a Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 19 Jan 2024 18:22:35 +0800
+Subject: tun: fix missing dropped counter in tun_xdp_act
+
+From: Yunjian Wang <wangyunjian@huawei.com>
+
+[ Upstream commit 5744ba05e7c4bff8fec133dd0f9e51ddffba92f5 ]
+
+The commit 8ae1aff0b331 ("tuntap: split out XDP logic") includes
+dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions.
+Unfortunately, that commit missed the dropped counter when error
+occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes
+this issue.
+
+Fixes: 8ae1aff0b331 ("tuntap: split out XDP logic")
+Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
+Reviewed-by: Willem de Bruijn <willemb@google.com>
+Acked-by: Jason Wang <jasowang@redhat.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/net/tun.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/drivers/net/tun.c b/drivers/net/tun.c
+index d373953ddc30..af32aa599278 100644
+--- a/drivers/net/tun.c
++++ b/drivers/net/tun.c
+@@ -1622,13 +1622,17 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
+       switch (act) {
+       case XDP_REDIRECT:
+               err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
+-              if (err)
++              if (err) {
++                      dev_core_stats_rx_dropped_inc(tun->dev);
+                       return err;
++              }
+               break;
+       case XDP_TX:
+               err = tun_xdp_tx(tun->dev, xdp);
+-              if (err < 0)
++              if (err < 0) {
++                      dev_core_stats_rx_dropped_inc(tun->dev);
+                       return err;
++              }
+               break;
+       case XDP_PASS:
+               break;
+-- 
+2.43.0
+
diff --git a/queue-6.1/udp-fix-busy-polling.patch b/queue-6.1/udp-fix-busy-polling.patch
new file mode 100644
index 0000000..230c931
--- /dev/null
+++ b/queue-6.1/udp-fix-busy-polling.patch
@@ -0,0 +1,134 @@
+From 619cdae95c29e29b399dc0a24b3dc8e616712c27 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 20:17:49 +0000
+Subject: udp: fix busy polling
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit a54d51fb2dfb846aedf3751af501e9688db447f5 ]
+
+Generic sk_busy_loop_end() only looks at sk->sk_receive_queue
+for presence of packets.
+
+Problem is that for UDP sockets after blamed commit, some packets
+could be present in another queue: udp_sk(sk)->reader_queue
+
+In some cases, a busy poller could spin until timeout expiration,
+even if some packets are available in udp_sk(sk)->reader_queue.
+
+v3: - make sk_busy_loop_end() nicer (Willem)
+
+v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats.
+    - add a sk_is_inet() check in sk_is_udp() (Willem feedback)
+    - add a sk_is_inet() check in sk_is_tcp().
+
+Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Paolo Abeni <pabeni@redhat.com>
+Reviewed-by: Willem de Bruijn <willemb@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/linux/skmsg.h   |  6 ------
+ include/net/inet_sock.h |  5 -----
+ include/net/sock.h      | 18 +++++++++++++++++-
+ net/core/sock.c         | 11 +++++++++--
+ 4 files changed, 26 insertions(+), 14 deletions(-)
+
+diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
+index c953b8c0d2f4..bd4418377bac 100644
+--- a/include/linux/skmsg.h
++++ b/include/linux/skmsg.h
+@@ -500,12 +500,6 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
+       return !!psock->saved_data_ready;
+ }
+ 
+-static inline bool sk_is_udp(const struct sock *sk)
+-{
+-      return sk->sk_type == SOCK_DGRAM &&
+-             sk->sk_protocol == IPPROTO_UDP;
+-}
+-
+ #if IS_ENABLED(CONFIG_NET_SOCK_MSG)
+ 
+ #define BPF_F_STRPARSER       (1UL << 1)
+diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
+index c2432c2addc8..c98890e21401 100644
+--- a/include/net/inet_sock.h
++++ b/include/net/inet_sock.h
+@@ -270,11 +270,6 @@ struct inet_sock {
+ #define IP_CMSG_CHECKSUM      BIT(7)
+ #define IP_CMSG_RECVFRAGSIZE  BIT(8)
+ 
+-static inline bool sk_is_inet(struct sock *sk)
+-{
+-      return sk->sk_family == AF_INET || sk->sk_family == AF_INET6;
+-}
+-
+ /**
+  * sk_to_full_sk - Access to a full socket
+  * @sk: pointer to a socket
+diff --git a/include/net/sock.h b/include/net/sock.h
+index 6b51e85ae69e..579732d47dfc 100644
+--- a/include/net/sock.h
++++ b/include/net/sock.h
+@@ -2824,9 +2824,25 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
+                          &skb_shinfo(skb)->tskey);
+ }
+ 
++static inline bool sk_is_inet(const struct sock *sk)
++{
++      int family = READ_ONCE(sk->sk_family);
++
++      return family == AF_INET || family == AF_INET6;
++}
++
+ static inline bool sk_is_tcp(const struct sock *sk)
+ {
+-      return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
++      return sk_is_inet(sk) &&
++             sk->sk_type == SOCK_STREAM &&
++             sk->sk_protocol == IPPROTO_TCP;
++}
++
++static inline bool sk_is_udp(const struct sock *sk)
++{
++      return sk_is_inet(sk) &&
++             sk->sk_type == SOCK_DGRAM &&
++             sk->sk_protocol == IPPROTO_UDP;
+ }
+ 
+ static inline bool sk_is_stream_unix(const struct sock *sk)
+diff --git a/net/core/sock.c b/net/core/sock.c
+index c50a14a02edd..c8803b95ea0d 100644
+--- a/net/core/sock.c
++++ b/net/core/sock.c
+@@ -107,6 +107,7 @@
+ #include <linux/interrupt.h>
+ #include <linux/poll.h>
+ #include <linux/tcp.h>
++#include <linux/udp.h>
+ #include <linux/init.h>
+ #include <linux/highmem.h>
+ #include <linux/user_namespace.h>
+@@ -4109,8 +4110,14 @@ bool sk_busy_loop_end(void *p, unsigned long start_time)
+ {
+       struct sock *sk = p;
+ 
+-      return !skb_queue_empty_lockless(&sk->sk_receive_queue) ||
+-             sk_busy_loop_timeout(sk, start_time);
++      if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
++              return true;
++
++      if (sk_is_udp(sk) &&
++          !skb_queue_empty_lockless(&udp_sk(sk)->reader_queue))
++              return true;
++
++      return sk_busy_loop_timeout(sk, start_time);
+ }
+ EXPORT_SYMBOL(sk_busy_loop_end);
+ #endif /* CONFIG_NET_RX_BUSY_POLL */
+-- 
+2.43.0
+
diff --git a/queue-6.1/vlan-skip-nested-type-that-is-not-ifla_vlan_qos_mapp.patch b/queue-6.1/vlan-skip-nested-type-that-is-not-ifla_vlan_qos_mapp.patch
new file mode 100644
index 0000000..c8a0692
--- /dev/null
+++ b/queue-6.1/vlan-skip-nested-type-that-is-not-ifla_vlan_qos_mapp.patch
@@ -0,0 +1,58 @@
+From 89902339bcc2e78c3dd96a85a7b07d8b01f79c8f Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 18 Jan 2024 21:03:06 +0800
+Subject: vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
+
+From: Lin Ma <linma@zju.edu.cn>
+
+[ Upstream commit 6c21660fe221a15c789dee2bc2fd95516bc5aeaf ]
+
+In the vlan_changelink function, a loop is used to parse the nested
+attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to
+obtain the struct ifla_vlan_qos_mapping. These two nested attributes are
+checked in the vlan_validate_qos_map function, which calls
+nla_validate_nested_deprecated with the vlan_map_policy.
+
+However, this deprecated validator applies a LIBERAL strictness, allowing
+the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC.
+Consequently, the loop in vlan_changelink may parse an attribute of type
+IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of
+struct ifla_vlan_qos_mapping, which is not necessarily true.
+
+To address this issue and ensure compatibility, this patch introduces two
+type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING.
+
+Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API")
+Signed-off-by: Lin Ma <linma@zju.edu.cn>
+Reviewed-by: Simon Horman <horms@kernel.org>
+Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/8021q/vlan_netlink.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
+index 214532173536..a3b68243fd4b 100644
+--- a/net/8021q/vlan_netlink.c
++++ b/net/8021q/vlan_netlink.c
+@@ -118,12 +118,16 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[],
+       }
+       if (data[IFLA_VLAN_INGRESS_QOS]) {
+               nla_for_each_nested(attr, data[IFLA_VLAN_INGRESS_QOS], rem) {
++                      if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING)
++                              continue;
+                       m = nla_data(attr);
+                       vlan_dev_set_ingress_priority(dev, m->to, m->from);
+               }
+       }
+       if (data[IFLA_VLAN_EGRESS_QOS]) {
+               nla_for_each_nested(attr, data[IFLA_VLAN_EGRESS_QOS], rem) {
++                      if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING)
++                              continue;
+                       m = nla_data(attr);
+                       err = vlan_dev_set_egress_priority(dev, m->from, m->to);
+                       if (err)
+-- 
+2.43.0
+
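Note on the type check above: the same idea can be shown on a toy type-length-value buffer. This is a sketch under stated assumptions and does not use the netlink API. `struct attr_hdr`, `struct qos_mapping`, the `QOS_*` constants and `collect_mappings()` are hypothetical, and the toy format has no alignment padding. The payload-length test stands in for the policy validation that the kernel performs separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { QOS_UNSPEC = 0, QOS_MAPPING = 1 };

struct attr_hdr {
	uint16_t len;  /* total attribute length, header included */
	uint16_t type;
};

struct qos_mapping {
	uint32_t from;
	uint32_t to;
};

/* Walk a buffer of attributes and copy out only QOS_MAPPING payloads.
 * Attributes of any other type are skipped instead of being
 * reinterpreted as a struct qos_mapping. Returns the number copied. */
static size_t collect_mappings(const unsigned char *buf, size_t buflen,
			       struct qos_mapping *out, size_t max)
{
	size_t off = 0, n = 0;

	while (buflen - off >= sizeof(struct attr_hdr)) {
		struct attr_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			break; /* malformed attribute: stop parsing */

		if (h.type == QOS_MAPPING &&
		    h.len - sizeof(h) >= sizeof(struct qos_mapping) &&
		    n < max)
			memcpy(&out[n++], buf + off + sizeof(h),
			       sizeof(struct qos_mapping));

		off += h.len;
	}
	return n;
}
```

A header-only attribute of type QOS_UNSPEC placed ahead of a real mapping is passed over, and only the mapping reaches the caller.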
diff --git a/queue-6.1/wifi-mac80211-fix-potential-sta-link-leak.patch b/queue-6.1/wifi-mac80211-fix-potential-sta-link-leak.patch
new file mode 100644
index 0000000..7e43ab4
--- /dev/null
+++ b/queue-6.1/wifi-mac80211-fix-potential-sta-link-leak.patch
@@ -0,0 +1,44 @@
+From bc16ba06d185965a5955d310cd3bd6ba760bba3c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 11 Jan 2024 18:17:44 +0200
+Subject: wifi: mac80211: fix potential sta-link leak
+
+From: Johannes Berg <johannes.berg@intel.com>
+
+[ Upstream commit b01a74b3ca6fd51b62c67733ba7c3280fa6c5d26 ]
+
+When a station is allocated, links are added but not
+set to valid yet (e.g. during connection to an AP MLD),
+we might remove the station without ever marking links
+valid, and leak them. Fix that.
+
+Fixes: cb71f1d136a6 ("wifi: mac80211: add sta link addition/removal")
+Signed-off-by: Johannes Berg <johannes.berg@intel.com>
+Reviewed-by: Ilan Peer <ilan.peer@intel.com>
+Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
+Link: https://msgid.link/20240111181514.6573998beaf8.I09ac2e1d41c80f82a5a616b8bd1d9d8dd709a6a6@changeid
+Signed-off-by: Johannes Berg <johannes.berg@intel.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/mac80211/sta_info.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
+index 49b71453dec3..f3d6c3e4c970 100644
+--- a/net/mac80211/sta_info.c
++++ b/net/mac80211/sta_info.c
+@@ -396,7 +396,10 @@ void sta_info_free(struct ieee80211_local *local, struct sta_info *sta)
+       int i;
+ 
+       for (i = 0; i < ARRAY_SIZE(sta->link); i++) {
+-              if (!(sta->sta.valid_links & BIT(i)))
++              struct link_sta_info *link_sta;
++
++              link_sta = rcu_access_pointer(sta->link[i]);
++              if (!link_sta)
+                       continue;
+ 
+               sta_remove_link(sta, i, false);
+-- 
+2.43.0
+
queue-6.1/afs-hide-silly-rename-files-from-userspace.patch	[new file with mode: 0644]
queue-6.1/bnxt_en-wait-for-flr-to-complete-during-probe.patch	[new file with mode: 0644]
queue-6.1/fjes-fix-memleaks-in-fjes_hw_setup.patch	[new file with mode: 0644]
queue-6.1/ipv6-init-the-accept_queue-s-spinlocks-in-inet6_crea.patch	[new file with mode: 0644]
queue-6.1/llc-drop-support-for-eth_p_tr_802_2.patch	[new file with mode: 0644]
queue-6.1/llc-make-llc_ui_sendmsg-more-robust-against-bonding-.patch	[new file with mode: 0644]
queue-6.1/net-fec-fix-the-unhandled-context-fault-from-smmu.patch	[new file with mode: 0644]
queue-6.1/net-fix-removing-a-namespace-with-conflicting-altnam.patch	[new file with mode: 0644]
queue-6.1/net-micrel-fix-ptp-frame-parsing-for-lan8814.patch	[new file with mode: 0644]
queue-6.1/net-mlx5-dr-can-t-go-to-uplink-vport-on-rx-rule.patch	[new file with mode: 0644]
queue-6.1/net-mlx5-dr-use-the-right-gvmi-number-for-drop-actio.patch	[new file with mode: 0644]
queue-6.1/net-mlx5-use-mlx5-device-constant-for-selecting-cq-p.patch	[new file with mode: 0644]
queue-6.1/net-mlx5e-allow-software-parsing-when-ipsec-crypto-i.patch	[new file with mode: 0644]
queue-6.1/net-mlx5e-fix-a-double-free-in-arfs_create_groups.patch	[new file with mode: 0644]
queue-6.1/net-mlx5e-fix-a-potential-double-free-in-fs_any_crea.patch	[new file with mode: 0644]
queue-6.1/net-mvpp2-clear-bm-pool-before-initialization.patch	[new file with mode: 0644]
queue-6.1/net-rds-fix-ubsan-array-index-out-of-bounds-in-rds_c.patch	[new file with mode: 0644]
queue-6.1/net-smc-fix-illegal-rmb_desc-access-in-smc-d-connect.patch	[new file with mode: 0644]
queue-6.1/net-stmmac-wait-a-bit-for-the-reset-to-take-effect.patch	[new file with mode: 0644]
queue-6.1/netfilter-nf_tables-restrict-anonymous-set-and-map-n.patch	[new file with mode: 0644]
queue-6.1/netfilter-nf_tables-validate-nfproto_-family.patch	[new file with mode: 0644]
queue-6.1/netfilter-nft_limit-reject-configurations-that-cause.patch	[new file with mode: 0644]
queue-6.1/netfs-fscache-prevent-oops-in-fscache_put_cache.patch	[new file with mode: 0644]
queue-6.1/netlink-fix-potential-sleeping-issue-in-mqueue_flush.patch	[new file with mode: 0644]
queue-6.1/rcu-defer-rcu-kthreads-wakeup-when-cpu-is-dying.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/selftests-netdevsim-fix-the-udp_tunnel_nic-test.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/series		patch \| blob \| blame \| history
queue-6.1/tcp-add-memory-barrier-to-tcp_push.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/tcp-make-sure-init-the-accept_queue-s-spinlocks-once.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/tracing-ensure-visibility-when-inserting-an-element-.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/tun-add-missing-rx-stats-accounting-in-tun_xdp_act.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/tun-fix-missing-dropped-counter-in-tun_xdp_act.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/udp-fix-busy-polling.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/vlan-skip-nested-type-that-is-not-ifla_vlan_qos_mapp.patch	[new file with mode: 0644]	patch \| blob
queue-6.1/wifi-mac80211-fix-potential-sta-link-leak.patch	[new file with mode: 0644]	patch \| blob