From: Greg Kroah-Hartman Date: Thu, 19 Mar 2026 11:37:11 +0000 (+0100) Subject: 6.6-stable patches X-Git-Tag: v6.18.19~10 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=15cbfeacbfe51a9b6e533601cd9824b3f69a1c75;p=thirdparty%2Fkernel%2Fstable-queue.git 6.6-stable patches added patches: btrfs-do-not-strictly-require-dirty-metadata-threshold-for-metadata-writepages.patch dmaengine-mmp_pdma-fix-race-condition-in-mmp_pdma_residue.patch iomap-allocate-s_dio_done_wq-for-async-reads-as-well.patch ipv6-use-rcu-in-ip6_xmit.patch net-add-support-for-segmenting-tcp-fraglist-gso-packets.patch net-fix-segmentation-of-forwarding-fraglist-gro.patch net-gso-fix-tcp-fraglist-segmentation-after-pull-from-frag_list.patch riscv-sanitize-syscall-table-indexing-under-speculation.patch rxrpc-fix-data-race-warning-and-potential-load-store-tearing.patch tracing-add-recursion-protection-in-kernel-stack-trace-recording.patch x86-sev-check-for-mwaitx-and-monitorx-opcodes-in-the-vc-handler.patch x86-sev-harden-vc-instruction-emulation-somewhat.patch --- diff --git a/queue-6.6/btrfs-do-not-strictly-require-dirty-metadata-threshold-for-metadata-writepages.patch b/queue-6.6/btrfs-do-not-strictly-require-dirty-metadata-threshold-for-metadata-writepages.patch new file mode 100644 index 0000000000..5d1f7df483 --- /dev/null +++ b/queue-6.6/btrfs-do-not-strictly-require-dirty-metadata-threshold-for-metadata-writepages.patch @@ -0,0 +1,167 @@ +From black.hawk@163.com Fri Feb 27 04:43:55 2026 +From: Rahul Sharma +Date: Fri, 27 Feb 2026 11:43:00 +0800 +Subject: btrfs: do not strictly require dirty metadata threshold for metadata writepages +To: gregkh@linuxfoundation.org, stable@vger.kernel.org +Cc: linux-kernel@vger.kernel.org, Qu Wenruo , Jan Kara , Boris Burkov , David Sterba , Rahul Sharma +Message-ID: <20260227034300.1505150-1-black.hawk@163.com> + +From: Qu Wenruo + +[ Upstream commit 4e159150a9a56d66d247f4b5510bed46fe58aa1c ] + +[BUG] +There is an internal report that over 1000 
processes are
+waiting at the io_schedule_timeout() of balance_dirty_pages(), causing
+a system hang and triggering a kernel coredump.
+
+The kernel is based on v6.4, but the root problem still applies to
+any upstream kernel before v6.18.
+
+[CAUSE]
+First, from Jan Kara, with his wisdom on the dirty page balance behavior:
+
+  This cgroup dirty limit was what was actually playing the role here
+  because the cgroup had only a small amount of memory and so the dirty
+  limit for it was something like 16MB.
+
+  Dirty throttling is responsible for enforcing that nobody can dirty
+  (significantly) more dirty memory than there's dirty limit. Thus when
+  a task is dirtying pages it periodically enters into balance_dirty_pages()
+  and we let it sleep there to slow down the dirtying.
+
+  When the system is over dirty limit already (either globally or within
+  a cgroup of the running task), we will not let the task exit from
+  balance_dirty_pages() until the number of dirty pages drops below the
+  limit.
+
+  So in this particular case, as I already mentioned, there was a cgroup
+  with relatively small amount of memory and as a result with dirty limit
+  set at 16MB. A task from that cgroup has dirtied about 28MB worth of
+  pages in btrfs btree inode and these were practically the only dirty
+  pages in that cgroup.
+
+So that means the only way to reduce the dirty pages of that cgroup is
+to write back the dirty pages of the btrfs btree inode, and only after
+that can those processes exit balance_dirty_pages().
+
+Now back to the btrfs part: btree_writepages() is responsible for
+writing back dirty btree inode pages.
+
+The problem here is that there is a btrfs internal threshold: if the
+btree inode's dirty bytes are below the 32MiB threshold, it will not
+do any writeback.
+
+This behavior is to batch as much metadata as possible so we won't write
+back those tree blocks and then later re-COW them again for another
+modification.
+
+This internal 32MiB threshold is higher than the existing dirty page
+size (28MiB), meaning no writeback will happen, causing a deadlock
+between btrfs and cgroup:
+
+- Btrfs doesn't want to write back the btree inode until more dirty
+  pages have accumulated
+
+- Cgroup/MM doesn't want more dirty pages for the btrfs btree inode
+  Thus any process touching that btree inode is put to sleep until
+  the number of dirty pages is reduced.
+
+Many thanks to Jan Kara for the analysis of the root cause.
+
+[ENHANCEMENT]
+Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the
+btree_inode"), btrfs btree inode pages will only be charged to the root
+cgroup, which should have a much larger limit than btrfs' 32MiB
+threshold.
+So it should not affect newer kernels.
+
+But all current LTS kernels are affected by this problem, and
+backporting the whole AS_KERNEL_FILE change may not be a good idea.
+
+Even for newer kernels it is still a good idea to get rid of the
+internal threshold at btree_writepages(), since in most cases cgroup/MM
+has a better view of full system memory usage than btrfs' fixed
+threshold.
+
+Internal callers go through btrfs_btree_balance_dirty(), which already
+does its own threshold check, so we don't need to bother them.
+
+But for external callers of btree_writepages(), just respect their
+requests and write back whatever they want, ignoring the internal
+btrfs threshold, to avoid such a deadlock on btree inode dirty page
+balancing.
+
+CC: stable@vger.kernel.org
+CC: Jan Kara
+Reviewed-by: Boris Burkov
+Signed-off-by: Qu Wenruo
+Signed-off-by: David Sterba
+[ The context change is due to commit 41044b41ad2c
+("btrfs: add helper to get fs_info from struct inode pointer")
+in v6.9 and commit c66f2afc7148
+("btrfs: remove pointless writepages callback wrapper")
+in v6.10, which are irrelevant to the logic of this patch.
] +Signed-off-by: Rahul Sharma +Signed-off-by: Greg Kroah-Hartman +--- + fs/btrfs/disk-io.c | 22 ---------------------- + fs/btrfs/extent_io.c | 3 +-- + fs/btrfs/extent_io.h | 3 +-- + 3 files changed, 2 insertions(+), 26 deletions(-) + +--- a/fs/btrfs/disk-io.c ++++ b/fs/btrfs/disk-io.c +@@ -470,28 +470,6 @@ static int btree_migrate_folio(struct ad + #define btree_migrate_folio NULL + #endif + +-static int btree_writepages(struct address_space *mapping, +- struct writeback_control *wbc) +-{ +- struct btrfs_fs_info *fs_info; +- int ret; +- +- if (wbc->sync_mode == WB_SYNC_NONE) { +- +- if (wbc->for_kupdate) +- return 0; +- +- fs_info = BTRFS_I(mapping->host)->root->fs_info; +- /* this is a bit racy, but that's ok */ +- ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes, +- BTRFS_DIRTY_METADATA_THRESH, +- fs_info->dirty_metadata_batch); +- if (ret < 0) +- return 0; +- } +- return btree_write_cache_pages(mapping, wbc); +-} +- + static bool btree_release_folio(struct folio *folio, gfp_t gfp_flags) + { + if (folio_test_writeback(folio) || folio_test_dirty(folio)) +--- a/fs/btrfs/extent_io.c ++++ b/fs/btrfs/extent_io.c +@@ -1921,8 +1921,7 @@ static int submit_eb_page(struct page *p + return 1; + } + +-int btree_write_cache_pages(struct address_space *mapping, +- struct writeback_control *wbc) ++int btree_writepages(struct address_space *mapping, struct writeback_control *wbc) + { + struct btrfs_eb_write_context ctx = { .wbc = wbc }; + struct btrfs_fs_info *fs_info = BTRFS_I(mapping->host)->root->fs_info; +--- a/fs/btrfs/extent_io.h ++++ b/fs/btrfs/extent_io.h +@@ -189,8 +189,7 @@ void extent_write_locked_range(struct in + bool pages_dirty); + int extent_writepages(struct address_space *mapping, + struct writeback_control *wbc); +-int btree_write_cache_pages(struct address_space *mapping, +- struct writeback_control *wbc); ++int btree_writepages(struct address_space *mapping, struct writeback_control *wbc); + void extent_readahead(struct readahead_control 
*rac); + int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, + u64 start, u64 len); diff --git a/queue-6.6/dmaengine-mmp_pdma-fix-race-condition-in-mmp_pdma_residue.patch b/queue-6.6/dmaengine-mmp_pdma-fix-race-condition-in-mmp_pdma_residue.patch new file mode 100644 index 0000000000..3db3c381a7 --- /dev/null +++ b/queue-6.6/dmaengine-mmp_pdma-fix-race-condition-in-mmp_pdma_residue.patch @@ -0,0 +1,85 @@ +From stable+bounces-220047-greg=kroah.com@vger.kernel.org Sat Feb 28 07:50:00 2026 +From: Wenshan Lan +Date: Sat, 28 Feb 2026 14:47:36 +0800 +Subject: dmaengine: mmp_pdma: Fix race condition in mmp_pdma_residue() +To: gregkh@linuxfoundation.org, stable@vger.kernel.org +Cc: Guodong Xu , Juan Li , Vinod Koul , Wenshan Lan +Message-ID: <20260228064736.4134-1-jetlan9@163.com> + +From: Guodong Xu + +[ Upstream commit a143545855bc2c6e1330f6f57ae375ac44af00a7 ] + +Add proper locking in mmp_pdma_residue() to prevent use-after-free when +accessing descriptor list and descriptor contents. + +The race occurs when multiple threads call tx_status() while the tasklet +on another CPU is freeing completed descriptors: + +CPU 0 CPU 1 +----- ----- +mmp_pdma_tx_status() +mmp_pdma_residue() + -> NO LOCK held + list_for_each_entry(sw, ..) + DMA interrupt + dma_do_tasklet() + -> spin_lock(&desc_lock) + list_move(sw->node, ...) + spin_unlock(&desc_lock) + | dma_pool_free(sw) <- FREED! + -> access sw->desc <- UAF! + +This issue can be reproduced when running dmatest on the same channel with +multiple threads (threads_per_chan > 1). + +Fix by protecting the chain_running list iteration and descriptor access +with the chan->desc_lock spinlock. + +Signed-off-by: Juan Li +Signed-off-by: Guodong Xu +Link: https://patch.msgid.link/20251216-mmp-pdma-race-v1-1-976a224bb622@riscstar.com +Signed-off-by: Vinod Koul +[ Minor context conflict resolved. 
] +Signed-off-by: Wenshan Lan +Signed-off-by: Greg Kroah-Hartman +--- + drivers/dma/mmp_pdma.c | 6 ++++++ + 1 file changed, 6 insertions(+) + +--- a/drivers/dma/mmp_pdma.c ++++ b/drivers/dma/mmp_pdma.c +@@ -764,6 +764,7 @@ static unsigned int mmp_pdma_residue(str + { + struct mmp_pdma_desc_sw *sw; + u32 curr, residue = 0; ++ unsigned long flags; + bool passed = false; + bool cyclic = chan->cyclic_first != NULL; + +@@ -779,6 +780,8 @@ static unsigned int mmp_pdma_residue(str + else + curr = readl(chan->phy->base + DSADR(chan->phy->idx)); + ++ spin_lock_irqsave(&chan->desc_lock, flags); ++ + list_for_each_entry(sw, &chan->chain_running, node) { + u32 start, end, len; + +@@ -822,6 +825,7 @@ static unsigned int mmp_pdma_residue(str + continue; + + if (sw->async_tx.cookie == cookie) { ++ spin_unlock_irqrestore(&chan->desc_lock, flags); + return residue; + } else { + residue = 0; +@@ -829,6 +833,8 @@ static unsigned int mmp_pdma_residue(str + } + } + ++ spin_unlock_irqrestore(&chan->desc_lock, flags); ++ + /* We should only get here in case of cyclic transactions */ + return residue; + } diff --git a/queue-6.6/iomap-allocate-s_dio_done_wq-for-async-reads-as-well.patch b/queue-6.6/iomap-allocate-s_dio_done_wq-for-async-reads-as-well.patch new file mode 100644 index 0000000000..2614058192 --- /dev/null +++ b/queue-6.6/iomap-allocate-s_dio_done_wq-for-async-reads-as-well.patch @@ -0,0 +1,49 @@ +From stable+bounces-219901-greg=kroah.com@vger.kernel.org Fri Feb 27 04:32:32 2026 +From: Chen Yu +Date: Fri, 27 Feb 2026 11:30:18 +0800 +Subject: iomap: allocate s_dio_done_wq for async reads as well +To: hch@lst.de, dchinner@redhat.com, djwong@kernel.org, brauner@kernel.org, stable@vger.kernel.org +Message-ID: <20260227033018.2506-1-xnguchen@sina.cn> + +From: Christoph Hellwig + +commit 7fd8720dff2d9c70cf5a1a13b7513af01952ec02 upstream. + +Since commit 222f2c7c6d14 ("iomap: always run error completions in user +context"), read error completions are deferred to s_dio_done_wq. 
This +means the workqueue also needs to be allocated for async reads. + +Fixes: 222f2c7c6d14 ("iomap: always run error completions in user context") +Reported-by: syzbot+a2b9a4ed0d61b1efb3f5@syzkaller.appspotmail.com +Signed-off-by: Christoph Hellwig +Link: https://patch.msgid.link/20251124140013.902853-1-hch@lst.de +Tested-by: syzbot+a2b9a4ed0d61b1efb3f5@syzkaller.appspotmail.com +Reviewed-by: Dave Chinner +Reviewed-by: Darrick J. Wong +Signed-off-by: Christian Brauner +Signed-off-by: Chen Yu +Signed-off-by: Greg Kroah-Hartman +--- + fs/iomap/direct-io.c | 10 +++++----- + 1 file changed, 5 insertions(+), 5 deletions(-) + +--- a/fs/iomap/direct-io.c ++++ b/fs/iomap/direct-io.c +@@ -659,12 +659,12 @@ __iomap_dio_rw(struct kiocb *iocb, struc + } + goto out_free_dio; + } ++ } + +- if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { +- ret = sb_init_dio_done_wq(inode->i_sb); +- if (ret < 0) +- goto out_free_dio; +- } ++ if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { ++ ret = sb_init_dio_done_wq(inode->i_sb); ++ if (ret < 0) ++ goto out_free_dio; + } + + inode_dio_begin(inode); diff --git a/queue-6.6/ipv6-use-rcu-in-ip6_xmit.patch b/queue-6.6/ipv6-use-rcu-in-ip6_xmit.patch new file mode 100644 index 0000000000..9c2b1b9f42 --- /dev/null +++ b/queue-6.6/ipv6-use-rcu-in-ip6_xmit.patch @@ -0,0 +1,109 @@ +From 9085e56501d93af9f2d7bd16f7fcfacdde47b99c Mon Sep 17 00:00:00 2001 +From: Eric Dumazet +Date: Thu, 28 Aug 2025 19:58:18 +0000 +Subject: ipv6: use RCU in ip6_xmit() + +From: Eric Dumazet + +commit 9085e56501d93af9f2d7bd16f7fcfacdde47b99c upstream. + +Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent +possible UAF. 
+ +Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Greg Kroah-Hartman +Signed-off-by: Keerthana K +Signed-off-by: Shivani Agarwal +--- + net/ipv6/ip6_output.c | 35 +++++++++++++++++++++-------------- + 1 file changed, 21 insertions(+), 14 deletions(-) + +--- a/net/ipv6/ip6_output.c ++++ b/net/ipv6/ip6_output.c +@@ -261,35 +261,36 @@ bool ip6_autoflowlabel(struct net *net, + int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6, + __u32 mark, struct ipv6_txoptions *opt, int tclass, u32 priority) + { +- struct net *net = sock_net(sk); + const struct ipv6_pinfo *np = inet6_sk(sk); + struct in6_addr *first_hop = &fl6->daddr; + struct dst_entry *dst = skb_dst(skb); +- struct net_device *dev = dst->dev; + struct inet6_dev *idev = ip6_dst_idev(dst); + struct hop_jumbo_hdr *hop_jumbo; + int hoplen = sizeof(*hop_jumbo); ++ struct net *net = sock_net(sk); + unsigned int head_room; ++ struct net_device *dev; + struct ipv6hdr *hdr; + u8 proto = fl6->flowi6_proto; + int seg_len = skb->len; +- int hlimit = -1; ++ int ret, hlimit = -1; + u32 mtu; + ++ rcu_read_lock(); ++ ++ dev = dst_dev_rcu(dst); + head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev); + if (opt) + head_room += opt->opt_nflen + opt->opt_flen; + + if (unlikely(head_room > skb_headroom(skb))) { +- /* Make sure idev stays alive */ +- rcu_read_lock(); ++ /* idev stays alive while we hold rcu_read_lock(). 
*/ + skb = skb_expand_head(skb, head_room); + if (!skb) { + IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); +- rcu_read_unlock(); +- return -ENOBUFS; ++ ret = -ENOBUFS; ++ goto unlock; + } +- rcu_read_unlock(); + } + + if (opt) { +@@ -351,17 +352,21 @@ int ip6_xmit(const struct sock *sk, stru + * skb to its handler for processing + */ + skb = l3mdev_ip6_out((struct sock *)sk, skb); +- if (unlikely(!skb)) +- return 0; ++ if (unlikely(!skb)) { ++ ret = 0; ++ goto unlock; ++ } + + /* hooks should never assume socket lock is held. + * we promote our socket to non const + */ +- return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, +- net, (struct sock *)sk, skb, NULL, dev, +- dst_output); ++ ret = NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, ++ net, (struct sock *)sk, skb, NULL, dev, ++ dst_output); ++ goto unlock; + } + ++ ret = -EMSGSIZE; + skb->dev = dev; + /* ipv6_local_error() does not require socket lock, + * we promote our socket to non const +@@ -370,7 +375,9 @@ int ip6_xmit(const struct sock *sk, stru + + IP6_INC_STATS(net, idev, IPSTATS_MIB_FRAGFAILS); + kfree_skb(skb); +- return -EMSGSIZE; ++unlock: ++ rcu_read_unlock(); ++ return ret; + } + EXPORT_SYMBOL(ip6_xmit); + diff --git a/queue-6.6/net-add-support-for-segmenting-tcp-fraglist-gso-packets.patch b/queue-6.6/net-add-support-for-segmenting-tcp-fraglist-gso-packets.patch new file mode 100644 index 0000000000..36a8d300f4 --- /dev/null +++ b/queue-6.6/net-add-support-for-segmenting-tcp-fraglist-gso-packets.patch @@ -0,0 +1,186 @@ +From stable+bounces-222523-greg=kroah.com@vger.kernel.org Mon Mar 2 07:55:21 2026 +From: Li hongliang <1468888505@139.com> +Date: Mon, 2 Mar 2026 14:55:13 +0800 +Subject: net: add support for segmenting TCP fraglist GSO packets +To: gregkh@linuxfoundation.org, stable@vger.kernel.org, nbd@nbd.name +Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, edumazet@google.com, davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org, kuba@kernel.org, pabeni@redhat.com, 
netdev@vger.kernel.org, willemb@google.com +Message-ID: <20260302065513.2695586-1-1468888505@139.com> + +From: Felix Fietkau + +[ Upstream commit bee88cd5bd83d40b8aec4d6cb729378f707f6197 ] + +Preparation for adding TCP fraglist GRO support. It expects packets to be +combined in a similar way as UDP fraglist GSO packets. +For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO. + +Acked-by: Paolo Abeni +Reviewed-by: Eric Dumazet +Signed-off-by: Felix Fietkau +Reviewed-by: David Ahern +Reviewed-by: Willem de Bruijn +Signed-off-by: Paolo Abeni +Signed-off-by: Li hongliang <1468888505@139.com> +Signed-off-by: Greg Kroah-Hartman +--- + net/ipv4/tcp_offload.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++ + net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 125 insertions(+) + +--- a/net/ipv4/tcp_offload.c ++++ b/net/ipv4/tcp_offload.c +@@ -31,6 +31,70 @@ static void tcp_gso_tstamp(struct sk_buf + } + } + ++static void __tcpv4_gso_segment_csum(struct sk_buff *seg, ++ __be32 *oldip, __be32 newip, ++ __be16 *oldport, __be16 newport) ++{ ++ struct tcphdr *th; ++ struct iphdr *iph; ++ ++ if (*oldip == newip && *oldport == newport) ++ return; ++ ++ th = tcp_hdr(seg); ++ iph = ip_hdr(seg); ++ ++ inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true); ++ inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false); ++ *oldport = newport; ++ ++ csum_replace4(&iph->check, *oldip, newip); ++ *oldip = newip; ++} ++ ++static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs) ++{ ++ const struct tcphdr *th; ++ const struct iphdr *iph; ++ struct sk_buff *seg; ++ struct tcphdr *th2; ++ struct iphdr *iph2; ++ ++ seg = segs; ++ th = tcp_hdr(seg); ++ iph = ip_hdr(seg); ++ th2 = tcp_hdr(seg->next); ++ iph2 = ip_hdr(seg->next); ++ ++ if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) && ++ iph->daddr == iph2->daddr && iph->saddr == iph2->saddr) ++ return segs; ++ ++ while ((seg = seg->next)) 
{ ++ th2 = tcp_hdr(seg); ++ iph2 = ip_hdr(seg); ++ ++ __tcpv4_gso_segment_csum(seg, ++ &iph2->saddr, iph->saddr, ++ &th2->source, th->source); ++ __tcpv4_gso_segment_csum(seg, ++ &iph2->daddr, iph->daddr, ++ &th2->dest, th->dest); ++ } ++ ++ return segs; ++} ++ ++static struct sk_buff *__tcp4_gso_segment_list(struct sk_buff *skb, ++ netdev_features_t features) ++{ ++ skb = skb_segment_list(skb, features, skb_mac_header_len(skb)); ++ if (IS_ERR(skb)) ++ return skb; ++ ++ return __tcpv4_gso_segment_list_csum(skb); ++} ++ + static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb, + netdev_features_t features) + { +@@ -40,6 +104,9 @@ static struct sk_buff *tcp4_gso_segment( + if (!pskb_may_pull(skb, sizeof(struct tcphdr))) + return ERR_PTR(-EINVAL); + ++ if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) ++ return __tcp4_gso_segment_list(skb, features); ++ + if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) { + const struct iphdr *iph = ip_hdr(skb); + struct tcphdr *th = tcp_hdr(skb); +--- a/net/ipv6/tcpv6_offload.c ++++ b/net/ipv6/tcpv6_offload.c +@@ -40,6 +40,61 @@ INDIRECT_CALLABLE_SCOPE int tcp6_gro_com + return 0; + } + ++static void __tcpv6_gso_segment_csum(struct sk_buff *seg, ++ __be16 *oldport, __be16 newport) ++{ ++ struct tcphdr *th; ++ ++ if (*oldport == newport) ++ return; ++ ++ th = tcp_hdr(seg); ++ inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false); ++ *oldport = newport; ++} ++ ++static struct sk_buff *__tcpv6_gso_segment_list_csum(struct sk_buff *segs) ++{ ++ const struct tcphdr *th; ++ const struct ipv6hdr *iph; ++ struct sk_buff *seg; ++ struct tcphdr *th2; ++ struct ipv6hdr *iph2; ++ ++ seg = segs; ++ th = tcp_hdr(seg); ++ iph = ipv6_hdr(seg); ++ th2 = tcp_hdr(seg->next); ++ iph2 = ipv6_hdr(seg->next); ++ ++ if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) && ++ ipv6_addr_equal(&iph->saddr, &iph2->saddr) && ++ ipv6_addr_equal(&iph->daddr, &iph2->daddr)) ++ return segs; ++ ++ while ((seg = seg->next)) { ++ th2 = 
tcp_hdr(seg); ++ iph2 = ipv6_hdr(seg); ++ ++ iph2->saddr = iph->saddr; ++ iph2->daddr = iph->daddr; ++ __tcpv6_gso_segment_csum(seg, &th2->source, th->source); ++ __tcpv6_gso_segment_csum(seg, &th2->dest, th->dest); ++ } ++ ++ return segs; ++} ++ ++static struct sk_buff *__tcp6_gso_segment_list(struct sk_buff *skb, ++ netdev_features_t features) ++{ ++ skb = skb_segment_list(skb, features, skb_mac_header_len(skb)); ++ if (IS_ERR(skb)) ++ return skb; ++ ++ return __tcpv6_gso_segment_list_csum(skb); ++} ++ + static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb, + netdev_features_t features) + { +@@ -51,6 +106,9 @@ static struct sk_buff *tcp6_gso_segment( + if (!pskb_may_pull(skb, sizeof(*th))) + return ERR_PTR(-EINVAL); + ++ if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) ++ return __tcp6_gso_segment_list(skb, features); ++ + if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) { + const struct ipv6hdr *ipv6h = ipv6_hdr(skb); + struct tcphdr *th = tcp_hdr(skb); diff --git a/queue-6.6/net-fix-segmentation-of-forwarding-fraglist-gro.patch b/queue-6.6/net-fix-segmentation-of-forwarding-fraglist-gro.patch new file mode 100644 index 0000000000..c74d9714b5 --- /dev/null +++ b/queue-6.6/net-fix-segmentation-of-forwarding-fraglist-gro.patch @@ -0,0 +1,103 @@ +From stable+bounces-222525-greg=kroah.com@vger.kernel.org Mon Mar 2 07:56:28 2026 +From: Li hongliang <1468888505@139.com> +Date: Mon, 2 Mar 2026 14:55:28 +0800 +Subject: net: fix segmentation of forwarding fraglist GRO +To: gregkh@linuxfoundation.org, stable@vger.kernel.org, jibin.zhang@mediatek.com +Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, yoshfuji@linux-ipv6.org, dsahern@kernel.org, matthias.bgg@gmail.com, 
willemb@google.com, steffen.klassert@secunet.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org
+Message-ID: <20260302065528.2695652-1-1468888505@139.com>
+
+From: Jibin Zhang
+
+[ Upstream commit 426ca15c7f6cb6562a081341ca88893a50c59fa2 ]
+
+This patch enhances GSO segment handling by properly checking
+the SKB_GSO_DODGY flag for frag_list GSO packets, addressing
+low throughput issues observed when a station accesses IPv4
+servers via hotspots with an IPv6-only upstream interface.
+
+Specifically, it fixes a bug in GSO segmentation when forwarding
+GRO packets containing a frag_list. The function skb_segment_list
+cannot correctly process GRO skbs that have been converted by XLAT,
+since XLAT only translates the header of the head skb. Consequently,
+skbs in the frag_list may remain untranslated, resulting in protocol
+inconsistencies and reduced throughput.
+
+To address this, the patch explicitly sets the SKB_GSO_DODGY flag
+for GSO packets in XLAT's IPv4/IPv6 protocol translation helpers
+(bpf_skb_proto_4_to_6 and bpf_skb_proto_6_to_4). This marks GSO
+packets as potentially modified after protocol translation. As a
+result, GSO segmentation will avoid using skb_segment_list and
+instead fall back to skb_segment for packets with the SKB_GSO_DODGY
+flag. This ensures that only safe and fully translated frag_list
+packets are processed by skb_segment_list, resolving protocol
+inconsistencies and improving throughput when forwarding GRO packets
+converted by XLAT.
+ +Signed-off-by: Jibin Zhang +Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") +Cc: stable@vger.kernel.org +Link: https://patch.msgid.link/20260126152114.1211-1-jibin.zhang@mediatek.com +Signed-off-by: Paolo Abeni +Signed-off-by: Li hongliang <1468888505@139.com> +Signed-off-by: Greg Kroah-Hartman +--- + net/core/filter.c | 2 ++ + net/ipv4/tcp_offload.c | 3 ++- + net/ipv4/udp_offload.c | 3 ++- + net/ipv6/tcpv6_offload.c | 3 ++- + 4 files changed, 8 insertions(+), 3 deletions(-) + +--- a/net/core/filter.c ++++ b/net/core/filter.c +@@ -3340,6 +3340,7 @@ static int bpf_skb_proto_4_to_6(struct s + shinfo->gso_type &= ~SKB_GSO_TCPV4; + shinfo->gso_type |= SKB_GSO_TCPV6; + } ++ shinfo->gso_type |= SKB_GSO_DODGY; + } + + bpf_skb_change_protocol(skb, ETH_P_IPV6); +@@ -3370,6 +3371,7 @@ static int bpf_skb_proto_6_to_4(struct s + shinfo->gso_type &= ~SKB_GSO_TCPV6; + shinfo->gso_type |= SKB_GSO_TCPV4; + } ++ shinfo->gso_type |= SKB_GSO_DODGY; + } + + bpf_skb_change_protocol(skb, ETH_P_IP); +--- a/net/ipv4/tcp_offload.c ++++ b/net/ipv4/tcp_offload.c +@@ -107,7 +107,8 @@ static struct sk_buff *tcp4_gso_segment( + if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) { + struct tcphdr *th = tcp_hdr(skb); + +- if (skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) ++ if ((skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) && ++ !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY)) + return __tcp4_gso_segment_list(skb, features); + + skb->ip_summed = CHECKSUM_NONE; +--- a/net/ipv4/udp_offload.c ++++ b/net/ipv4/udp_offload.c +@@ -352,7 +352,8 @@ struct sk_buff *__udp_gso_segment(struct + + if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) { + /* Detect modified geometry and pass those to skb_segment. 
*/ +- if (skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size) ++ if ((skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size) && ++ !(skb_shinfo(gso_skb)->gso_type & SKB_GSO_DODGY)) + return __udp_gso_segment_list(gso_skb, features, is_ipv6); + + ret = __skb_linearize(gso_skb); +--- a/net/ipv6/tcpv6_offload.c ++++ b/net/ipv6/tcpv6_offload.c +@@ -109,7 +109,8 @@ static struct sk_buff *tcp6_gso_segment( + if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) { + struct tcphdr *th = tcp_hdr(skb); + +- if (skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) ++ if ((skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) && ++ !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY)) + return __tcp6_gso_segment_list(skb, features); + + skb->ip_summed = CHECKSUM_NONE; diff --git a/queue-6.6/net-gso-fix-tcp-fraglist-segmentation-after-pull-from-frag_list.patch b/queue-6.6/net-gso-fix-tcp-fraglist-segmentation-after-pull-from-frag_list.patch new file mode 100644 index 0000000000..aece2466f2 --- /dev/null +++ b/queue-6.6/net-gso-fix-tcp-fraglist-segmentation-after-pull-from-frag_list.patch @@ -0,0 +1,88 @@ +From stable+bounces-222524-greg=kroah.com@vger.kernel.org Mon Mar 2 07:56:08 2026 +From: Li hongliang <1468888505@139.com> +Date: Mon, 2 Mar 2026 14:55:22 +0800 +Subject: net: gso: fix tcp fraglist segmentation after pull from frag_list +To: gregkh@linuxfoundation.org, stable@vger.kernel.org, nbd@nbd.name +Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, edumazet@google.com, davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org, pabeni@redhat.com, matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com, willemb@google.com, netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, bpf@vger.kernel.org +Message-ID: <20260302065522.2695626-1-1468888505@139.com> + +From: Felix Fietkau + +[ Upstream commit 17bd3bd82f9f79f3feba15476c2b2c95a9b11ff8 ] + +Detect tcp gso fraglist skbs with 
corrupted geometry (see below) and +pass these to skb_segment instead of skb_segment_list, as the first +can segment them correctly. + +Valid SKB_GSO_FRAGLIST skbs +- consist of two or more segments +- the head_skb holds the protocol headers plus first gso_size +- one or more frag_list skbs hold exactly one segment +- all but the last must be gso_size + +Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can +modify these skbs, breaking these invariants. + +In extreme cases they pull all data into skb linear. For TCP, this +causes a NULL ptr deref in __tcpv4_gso_segment_list_csum at +tcp_hdr(seg->next). + +Detect invalid geometry due to pull, by checking head_skb size. +Don't just drop, as this may blackhole a destination. Convert to be +able to pass to regular skb_segment. + +Approach and description based on a patch by Willem de Bruijn. + +Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/ +Link: https://lore.kernel.org/netdev/20240922150450.3873767-1-willemdebruijn.kernel@gmail.com/ +Fixes: bee88cd5bd83 ("net: add support for segmenting TCP fraglist GSO packets") +Cc: stable@vger.kernel.org +Signed-off-by: Felix Fietkau +Reviewed-by: Willem de Bruijn +Link: https://patch.msgid.link/20240926085315.51524-1-nbd@nbd.name +Signed-off-by: Jakub Kicinski +Signed-off-by: Li hongliang <1468888505@139.com> +Signed-off-by: Greg Kroah-Hartman +--- + net/ipv4/tcp_offload.c | 10 ++++++++-- + net/ipv6/tcpv6_offload.c | 10 ++++++++-- + 2 files changed, 16 insertions(+), 4 deletions(-) + +--- a/net/ipv4/tcp_offload.c ++++ b/net/ipv4/tcp_offload.c +@@ -104,8 +104,14 @@ static struct sk_buff *tcp4_gso_segment( + if (!pskb_may_pull(skb, sizeof(struct tcphdr))) + return ERR_PTR(-EINVAL); + +- if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) +- return __tcp4_gso_segment_list(skb, features); ++ if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) { ++ struct tcphdr *th = tcp_hdr(skb); ++ ++ if (skb_pagelen(skb) - th->doff * 4 == 
skb_shinfo(skb)->gso_size) ++ return __tcp4_gso_segment_list(skb, features); ++ ++ skb->ip_summed = CHECKSUM_NONE; ++ } + + if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) { + const struct iphdr *iph = ip_hdr(skb); +--- a/net/ipv6/tcpv6_offload.c ++++ b/net/ipv6/tcpv6_offload.c +@@ -106,8 +106,14 @@ static struct sk_buff *tcp6_gso_segment( + if (!pskb_may_pull(skb, sizeof(*th))) + return ERR_PTR(-EINVAL); + +- if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) +- return __tcp6_gso_segment_list(skb, features); ++ if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) { ++ struct tcphdr *th = tcp_hdr(skb); ++ ++ if (skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) ++ return __tcp6_gso_segment_list(skb, features); ++ ++ skb->ip_summed = CHECKSUM_NONE; ++ } + + if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) { + const struct ipv6hdr *ipv6h = ipv6_hdr(skb); diff --git a/queue-6.6/riscv-sanitize-syscall-table-indexing-under-speculation.patch b/queue-6.6/riscv-sanitize-syscall-table-indexing-under-speculation.patch new file mode 100644 index 0000000000..7c2a88adf1 --- /dev/null +++ b/queue-6.6/riscv-sanitize-syscall-table-indexing-under-speculation.patch @@ -0,0 +1,48 @@ +From stable+bounces-220046-greg=kroah.com@vger.kernel.org Sat Feb 28 07:27:47 2026 +From: Leon Chen +Date: Sat, 28 Feb 2026 14:27:27 +0800 +Subject: riscv: Sanitize syscall table indexing under speculation +To: lukas.gerlach@cispa.de, pjw@kernel.org, stable@vger.kernel.org +Message-ID: <20260228062728.8017-1-leonchen.oss@139.com> + +From: Lukas Gerlach + +[ Upstream commit 25fd7ee7bf58ac3ec7be3c9f82ceff153451946c ] + +The syscall number is a user-controlled value used to index into the +syscall table. Use array_index_nospec() to clamp this value after the +bounds check to prevent speculative out-of-bounds access and subsequent +data leakage via cache side channels. 
+ +Signed-off-by: Lukas Gerlach +Link: https://patch.msgid.link/20251218191332.35849-3-lukas.gerlach@cispa.de +Signed-off-by: Paul Walmsley +[ Added linux/nospec.h for array_index_nospec() to make sure compile without error ] +Signed-off-by: Leon Chen +Signed-off-by: Greg Kroah-Hartman +--- + arch/riscv/kernel/traps.c | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +--- a/arch/riscv/kernel/traps.c ++++ b/arch/riscv/kernel/traps.c +@@ -20,6 +20,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -317,8 +318,10 @@ asmlinkage __visible __trap_section void + + syscall = syscall_enter_from_user_mode(regs, syscall); + +- if (syscall >= 0 && syscall < NR_syscalls) ++ if (syscall >= 0 && syscall < NR_syscalls) { ++ syscall = array_index_nospec(syscall, NR_syscalls); + syscall_handler(regs, syscall); ++ } + + syscall_exit_to_user_mode(regs); + } else { diff --git a/queue-6.6/rxrpc-fix-data-race-warning-and-potential-load-store-tearing.patch b/queue-6.6/rxrpc-fix-data-race-warning-and-potential-load-store-tearing.patch new file mode 100644 index 0000000000..cb15669a4a --- /dev/null +++ b/queue-6.6/rxrpc-fix-data-race-warning-and-potential-load-store-tearing.patch @@ -0,0 +1,219 @@ +From stable+bounces-219886-greg=kroah.com@vger.kernel.org Fri Feb 27 01:56:56 2026 +From: Rahul Sharma +Date: Fri, 27 Feb 2026 08:54:46 +0800 +Subject: rxrpc: Fix data-race warning and potential load/store tearing +To: gregkh@linuxfoundation.org, stable@vger.kernel.org +Cc: linux-kernel@vger.kernel.org, David Howells , syzbot+6182afad5045e6703b3d@syzkaller.appspotmail.com, Marc Dionne , Simon Horman , linux-afs@lists.infradead.org, stable@kernel.org, Jakub Kicinski , Rahul Sharma +Message-ID: <20260227005446.552393-1-black.hawk@163.com> + +From: David Howells + +[ Upstream commit 5d5fe8bcd331f1e34e0943ec7c18432edfcf0e8b ] + +Fix the following: + + BUG: KCSAN: data-race in rxrpc_peer_keepalive_worker / rxrpc_send_data_packet + +which is reporting an issue with 
the reads and writes to ->last_tx_at in:
+
+	conn->peer->last_tx_at = ktime_get_seconds();
+
+and:
+
+	keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME;
+
+The lockless accesses to these two values aren't actually a problem as the
+read only needs an approximate time of last transmission for the purposes
+of deciding whether or not the transmission of a keepalive packet is
+warranted yet.
+
+Also, as ->last_tx_at is a 64-bit value, tearing can occur on a 32-bit
+arch.
+
+Fix both of these by switching to an unsigned int for ->last_tx_at and only
+storing the LSW of the time64_t. It can then be reconstructed at need
+provided no more than 68 years have elapsed since the last transmission.
+
+Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
+Reported-by: syzbot+6182afad5045e6703b3d@syzkaller.appspotmail.com
+Closes: https://lore.kernel.org/r/695e7cfb.050a0220.1c677c.036b.GAE@google.com/
+Signed-off-by: David Howells
+cc: Marc Dionne
+cc: Simon Horman
+cc: linux-afs@lists.infradead.org
+cc: stable@kernel.org
+Link: https://patch.msgid.link/1107124.1768903985@warthog.procyon.org.uk
+Signed-off-by: Jakub Kicinski
+[ The context change is due to the commit f3a123b25429
+("rxrpc: Allow the app to store private data on peer structs"),
+the commit 372d12d191cb
+("rxrpc: Add a reason indicator to the tx_data tracepoint"),
+the commit 153f90a066dd
+("rxrpc: Use ktimes for call timeout tracking and set the timer lazily")
+and the commit 9d1d2b59341f
+("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
+which are irrelevant to the logic of this patch. 
] +Signed-off-by: Rahul Sharma +Signed-off-by: Greg Kroah-Hartman +--- + net/rxrpc/ar-internal.h | 9 ++++++++- + net/rxrpc/conn_event.c | 2 +- + net/rxrpc/output.c | 11 ++++++----- + net/rxrpc/peer_event.c | 17 ++++++++++++++++- + net/rxrpc/proc.c | 4 ++-- + net/rxrpc/rxkad.c | 2 +- + 6 files changed, 34 insertions(+), 11 deletions(-) + +--- a/net/rxrpc/ar-internal.h ++++ b/net/rxrpc/ar-internal.h +@@ -331,7 +331,7 @@ struct rxrpc_peer { + struct hlist_head error_targets; /* targets for net error distribution */ + struct rb_root service_conns; /* Service connections */ + struct list_head keepalive_link; /* Link in net->peer_keepalive[] */ +- time64_t last_tx_at; /* Last time packet sent here */ ++ unsigned int last_tx_at; /* Last time packet sent here (time64_t LSW) */ + seqlock_t service_conn_lock; + spinlock_t lock; /* access lock */ + unsigned int if_mtu; /* interface MTU for this peer */ +@@ -1171,6 +1171,13 @@ void rxrpc_transmit_one(struct rxrpc_cal + void rxrpc_input_error(struct rxrpc_local *, struct sk_buff *); + void rxrpc_peer_keepalive_worker(struct work_struct *); + ++/* Update the last transmission time on a peer for keepalive purposes. */ ++static inline void rxrpc_peer_mark_tx(struct rxrpc_peer *peer) ++{ ++ /* To avoid tearing on 32-bit systems, we only keep the LSW. 
*/ ++ WRITE_ONCE(peer->last_tx_at, ktime_get_seconds()); ++} ++ + /* + * peer_object.c + */ +--- a/net/rxrpc/conn_event.c ++++ b/net/rxrpc/conn_event.c +@@ -180,7 +180,7 @@ void rxrpc_conn_retransmit_call(struct r + } + + ret = kernel_sendmsg(conn->local->socket, &msg, iov, ioc, len); +- conn->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(conn->peer); + if (ret < 0) + trace_rxrpc_tx_fail(chan->call_debug_id, serial, ret, + rxrpc_tx_point_call_final_resend); +--- a/net/rxrpc/output.c ++++ b/net/rxrpc/output.c +@@ -233,7 +233,7 @@ int rxrpc_send_ack_packet(struct rxrpc_c + + iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, len); + ret = do_udp_sendmsg(conn->local->socket, &msg, len); +- call->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(call->peer); + if (ret < 0) { + trace_rxrpc_tx_fail(call->debug_id, serial, ret, + rxrpc_tx_point_call_ack); +@@ -307,7 +307,7 @@ int rxrpc_send_abort_packet(struct rxrpc + + iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, sizeof(pkt)); + ret = do_udp_sendmsg(conn->local->socket, &msg, sizeof(pkt)); +- conn->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(conn->peer); + if (ret < 0) + trace_rxrpc_tx_fail(call->debug_id, serial, ret, + rxrpc_tx_point_call_abort); +@@ -392,6 +392,7 @@ dont_set_request_ack: + txb->wire.flags, + test_bit(RXRPC_TXBUF_RESENT, &txb->flags), + true); ++ rxrpc_peer_mark_tx(conn->peer); + goto done; + } + } +@@ -425,7 +426,7 @@ dont_set_request_ack: + */ + rxrpc_inc_stat(call->rxnet, stat_tx_data_send); + ret = do_udp_sendmsg(conn->local->socket, &msg, len); +- conn->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(conn->peer); + + if (ret < 0) { + rxrpc_inc_stat(call->rxnet, stat_tx_data_send_fail); +@@ -572,7 +573,7 @@ void rxrpc_send_conn_abort(struct rxrpc_ + + trace_rxrpc_tx_packet(conn->debug_id, &whdr, rxrpc_tx_point_conn_abort); + +- conn->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(conn->peer); + } + + /* +@@ -691,7 +692,7 @@ void 
rxrpc_send_keepalive(struct rxrpc_p + trace_rxrpc_tx_packet(peer->debug_id, &whdr, + rxrpc_tx_point_version_keepalive); + +- peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(peer); + _leave(""); + } + +--- a/net/rxrpc/peer_event.c ++++ b/net/rxrpc/peer_event.c +@@ -225,6 +225,21 @@ static void rxrpc_distribute_error(struc + } + + /* ++ * Reconstruct the last transmission time. The difference calculated should be ++ * valid provided no more than ~68 years elapsed since the last transmission. ++ */ ++static time64_t rxrpc_peer_get_tx_mark(const struct rxrpc_peer *peer, time64_t base) ++{ ++ s32 last_tx_at = READ_ONCE(peer->last_tx_at); ++ s32 base_lsw = base; ++ s32 diff = last_tx_at - base_lsw; ++ ++ diff = clamp(diff, -RXRPC_KEEPALIVE_TIME, RXRPC_KEEPALIVE_TIME); ++ ++ return diff + base; ++} ++ ++/* + * Perform keep-alive pings. + */ + static void rxrpc_peer_keepalive_dispatch(struct rxrpc_net *rxnet, +@@ -252,7 +267,7 @@ static void rxrpc_peer_keepalive_dispatc + spin_unlock(&rxnet->peer_hash_lock); + + if (use) { +- keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME; ++ keepalive_at = rxrpc_peer_get_tx_mark(peer, base) + RXRPC_KEEPALIVE_TIME; + slot = keepalive_at - base; + _debug("%02x peer %u t=%d {%pISp}", + cursor, peer->debug_id, slot, &peer->srx.transport); +--- a/net/rxrpc/proc.c ++++ b/net/rxrpc/proc.c +@@ -225,13 +225,13 @@ static int rxrpc_peer_seq_show(struct se + now = ktime_get_seconds(); + seq_printf(seq, + "UDP %-47.47s %-47.47s %3u" +- " %3u %5u %6llus %8u %8u\n", ++ " %3u %5u %6ds %8u %8u\n", + lbuff, + rbuff, + refcount_read(&peer->ref), + peer->cong_ssthresh, + peer->mtu, +- now - peer->last_tx_at, ++ (s32)now - (s32)READ_ONCE(peer->last_tx_at), + peer->srtt_us >> 3, + jiffies_to_usecs(peer->rto_j)); + +--- a/net/rxrpc/rxkad.c ++++ b/net/rxrpc/rxkad.c +@@ -674,7 +674,7 @@ static int rxkad_issue_challenge(struct + return -EAGAIN; + } + +- conn->peer->last_tx_at = ktime_get_seconds(); ++ rxrpc_peer_mark_tx(conn->peer); + 
trace_rxrpc_tx_packet(conn->debug_id, &whdr, + rxrpc_tx_point_rxkad_challenge); + _leave(" = 0"); diff --git a/queue-6.6/series b/queue-6.6/series index 725e1c387d..8fc15a3346 100644 --- a/queue-6.6/series +++ b/queue-6.6/series @@ -412,3 +412,15 @@ eth-bnxt-always-recalculate-features-after-xdp-clearing-fix-null-deref.patch ext4-always-allocate-blocks-only-from-groups-inode-can-use.patch rxrpc-fix-recvmsg-unconditional-requeue.patch dm-verity-disable-recursive-forward-error-correction.patch +ipv6-use-rcu-in-ip6_xmit.patch +x86-sev-harden-vc-instruction-emulation-somewhat.patch +x86-sev-check-for-mwaitx-and-monitorx-opcodes-in-the-vc-handler.patch +rxrpc-fix-data-race-warning-and-potential-load-store-tearing.patch +iomap-allocate-s_dio_done_wq-for-async-reads-as-well.patch +btrfs-do-not-strictly-require-dirty-metadata-threshold-for-metadata-writepages.patch +riscv-sanitize-syscall-table-indexing-under-speculation.patch +dmaengine-mmp_pdma-fix-race-condition-in-mmp_pdma_residue.patch +tracing-add-recursion-protection-in-kernel-stack-trace-recording.patch +net-add-support-for-segmenting-tcp-fraglist-gso-packets.patch +net-gso-fix-tcp-fraglist-segmentation-after-pull-from-frag_list.patch +net-fix-segmentation-of-forwarding-fraglist-gro.patch diff --git a/queue-6.6/tracing-add-recursion-protection-in-kernel-stack-trace-recording.patch b/queue-6.6/tracing-add-recursion-protection-in-kernel-stack-trace-recording.patch new file mode 100644 index 0000000000..dbaf8fd5ce --- /dev/null +++ b/queue-6.6/tracing-add-recursion-protection-in-kernel-stack-trace-recording.patch @@ -0,0 +1,93 @@ +From stable+bounces-220048-greg=kroah.com@vger.kernel.org Sat Feb 28 07:55:13 2026 +From: Leon Chen +Date: Sat, 28 Feb 2026 14:55:00 +0800 +Subject: tracing: Add recursion protection in kernel stack trace recording +To: mhiramat@kernel.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, joel@joelfernandes.org, paulmck@kernel.org, boqun.feng@gmail.com, stable@vger.kernel.org 
+Message-ID: <20260228065500.8610-1-leonchen.oss@139.com> + +From: Steven Rostedt + +[ Upstream commit 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb ] + +A bug was reported about an infinite recursion caused by tracing the rcu +events with the kernel stack trace trigger enabled. The stack trace code +called back into RCU which then called the stack trace again. + +Expand the ftrace recursion protection to add a set of bits to protect +events from recursion. Each bit represents the context that the event is +in (normal, softirq, interrupt and NMI). + +Have the stack trace code use the interrupt context to protect against +recursion. + +Note, the bug showed an issue in both the RCU code as well as the tracing +stacktrace code. This only handles the tracing stack trace side of the +bug. The RCU fix will be handled separately. + +Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/ + +Cc: stable@vger.kernel.org +Cc: Masami Hiramatsu +Cc: Mathieu Desnoyers +Cc: Joel Fernandes +Cc: "Paul E. McKenney" +Cc: Boqun Feng +Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home +Reported-by: Yao Kai +Tested-by: Yao Kai +Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Leon Chen +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/trace_recursion.h | 9 +++++++++ + kernel/trace/trace.c | 6 ++++++ + 2 files changed, 15 insertions(+) + +--- a/include/linux/trace_recursion.h ++++ b/include/linux/trace_recursion.h +@@ -34,6 +34,13 @@ enum { + TRACE_INTERNAL_SIRQ_BIT, + TRACE_INTERNAL_TRANSITION_BIT, + ++ /* Internal event use recursion bits */ ++ TRACE_INTERNAL_EVENT_BIT, ++ TRACE_INTERNAL_EVENT_NMI_BIT, ++ TRACE_INTERNAL_EVENT_IRQ_BIT, ++ TRACE_INTERNAL_EVENT_SIRQ_BIT, ++ TRACE_INTERNAL_EVENT_TRANSITION_BIT, ++ + TRACE_BRANCH_BIT, + /* + * Abuse of the trace_recursion. 
+@@ -97,6 +104,8 @@ enum { + + #define TRACE_LIST_START TRACE_INTERNAL_BIT + ++#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT ++ + #define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1) + + /* +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -3105,6 +3105,11 @@ static void __ftrace_trace_stack(struct + struct ftrace_stack *fstack; + struct stack_entry *entry; + int stackidx; ++ int bit; ++ ++ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START); ++ if (bit < 0) ++ return; + + /* + * Add one, for this function and the call to save_stack_trace() +@@ -3162,6 +3167,7 @@ static void __ftrace_trace_stack(struct + __this_cpu_dec(ftrace_stack_reserve); + preempt_enable_notrace(); + ++ trace_clear_recursion(bit); + } + + static inline void ftrace_trace_stack(struct trace_array *tr, diff --git a/queue-6.6/x86-sev-check-for-mwaitx-and-monitorx-opcodes-in-the-vc-handler.patch b/queue-6.6/x86-sev-check-for-mwaitx-and-monitorx-opcodes-in-the-vc-handler.patch new file mode 100644 index 0000000000..fdf3c67c38 --- /dev/null +++ b/queue-6.6/x86-sev-check-for-mwaitx-and-monitorx-opcodes-in-the-vc-handler.patch @@ -0,0 +1,45 @@ +From stable+bounces-219763-greg=kroah.com@vger.kernel.org Thu Feb 26 07:42:25 2026 +From: Wenshan Lan +Date: Thu, 26 Feb 2026 14:41:12 +0800 +Subject: x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler +To: gregkh@linuxfoundation.org, stable@vger.kernel.org +Cc: Tom Lendacky , Borislav Petkov , Wenshan Lan +Message-ID: <20260226064112.2737715-2-jetlan9@163.com> + +From: Tom Lendacky + +[ Upstream commit e70316d17f6ab49a6038ffd115397fd68f8c7be8 ] + +The MWAITX and MONITORX instructions generate the same #VC error code as +the MWAIT and MONITOR instructions, respectively. Update the #VC handler +opcode checking to also support the MWAITX and MONITORX opcodes. 
+ +Fixes: e3ef461af35a ("x86/sev: Harden #VC instruction emulation somewhat") +Signed-off-by: Tom Lendacky +Signed-off-by: Borislav Petkov (AMD) +Link: https://lore.kernel.org/r/453d5a7cfb4b9fe818b6fb67f93ae25468bc9e23.1713793161.git.thomas.lendacky@amd.com +Signed-off-by: Wenshan Lan +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kernel/sev-shared.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +--- a/arch/x86/kernel/sev-shared.c ++++ b/arch/x86/kernel/sev-shared.c +@@ -1237,12 +1237,14 @@ static enum es_result vc_check_opcode_by + break; + + case SVM_EXIT_MONITOR: +- if (opcode == 0x010f && modrm == 0xc8) ++ /* MONITOR and MONITORX instructions generate the same error code */ ++ if (opcode == 0x010f && (modrm == 0xc8 || modrm == 0xfa)) + return ES_OK; + break; + + case SVM_EXIT_MWAIT: +- if (opcode == 0x010f && modrm == 0xc9) ++ /* MWAIT and MWAITX instructions generate the same error code */ ++ if (opcode == 0x010f && (modrm == 0xc9 || modrm == 0xfb)) + return ES_OK; + break; + diff --git a/queue-6.6/x86-sev-harden-vc-instruction-emulation-somewhat.patch b/queue-6.6/x86-sev-harden-vc-instruction-emulation-somewhat.patch new file mode 100644 index 0000000000..1884f889a3 --- /dev/null +++ b/queue-6.6/x86-sev-harden-vc-instruction-emulation-somewhat.patch @@ -0,0 +1,185 @@ +From stable+bounces-219762-greg=kroah.com@vger.kernel.org Thu Feb 26 07:42:24 2026 +From: Wenshan Lan +Date: Thu, 26 Feb 2026 14:41:11 +0800 +Subject: x86/sev: Harden #VC instruction emulation somewhat +To: gregkh@linuxfoundation.org, stable@vger.kernel.org +Cc: "Borislav Petkov (AMD)" , Tom Lendacky , Wenshan Lan +Message-ID: <20260226064112.2737715-1-jetlan9@163.com> + +From: "Borislav Petkov (AMD)" + +[ Upstream commit e3ef461af35a8c74f2f4ce6616491ddb355a208f ] + +Compare the opcode bytes at rIP for each #VC exit reason to verify the +instruction which raised the #VC exception is actually the right one. 
+ +Signed-off-by: Borislav Petkov (AMD) +Acked-by: Tom Lendacky +Link: https://lore.kernel.org/r/20240105101407.11694-1-bp@alien8.de +Signed-off-by: Wenshan Lan +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/boot/compressed/sev.c | 4 + + arch/x86/kernel/sev-shared.c | 102 ++++++++++++++++++++++++++++++++++++++++- + arch/x86/kernel/sev.c | 5 +- + 3 files changed, 108 insertions(+), 3 deletions(-) + +--- a/arch/x86/boot/compressed/sev.c ++++ b/arch/x86/boot/compressed/sev.c +@@ -277,6 +277,10 @@ void do_boot_stage2_vc(struct pt_regs *r + if (result != ES_OK) + goto finish; + ++ result = vc_check_opcode_bytes(&ctxt, exit_code); ++ if (result != ES_OK) ++ goto finish; ++ + switch (exit_code) { + case SVM_EXIT_RDTSC: + case SVM_EXIT_RDTSCP: +--- a/arch/x86/kernel/sev-shared.c ++++ b/arch/x86/kernel/sev-shared.c +@@ -10,11 +10,15 @@ + */ + + #ifndef __BOOT_COMPRESSED +-#define error(v) pr_err(v) +-#define has_cpuflag(f) boot_cpu_has(f) ++#define error(v) pr_err(v) ++#define has_cpuflag(f) boot_cpu_has(f) ++#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__) ++#define sev_printk_rtl(fmt, ...) printk_ratelimited(fmt, ##__VA_ARGS__) + #else + #undef WARN + #define WARN(condition, format...) (!!(condition)) ++#define sev_printk(fmt, ...) ++#define sev_printk_rtl(fmt, ...) + #endif + + /* I/O parameters for CPUID-related helpers */ +@@ -570,6 +574,7 @@ void __head do_vc_no_ghcb(struct pt_regs + { + unsigned int subfn = lower_bits(regs->cx, 32); + unsigned int fn = lower_bits(regs->ax, 32); ++ u16 opcode = *(unsigned short *)regs->ip; + struct cpuid_leaf leaf; + int ret; + +@@ -577,6 +582,10 @@ void __head do_vc_no_ghcb(struct pt_regs + if (exit_code != SVM_EXIT_CPUID) + goto fail; + ++ /* Is it really a CPUID insn? 
*/ ++ if (opcode != 0xa20f) ++ goto fail; ++ + leaf.fn = fn; + leaf.subfn = subfn; + +@@ -1203,3 +1212,92 @@ static int vmgexit_psc(struct ghcb *ghcb + out: + return ret; + } ++ ++static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt, ++ unsigned long exit_code) ++{ ++ unsigned int opcode = (unsigned int)ctxt->insn.opcode.value; ++ u8 modrm = ctxt->insn.modrm.value; ++ ++ switch (exit_code) { ++ ++ case SVM_EXIT_IOIO: ++ case SVM_EXIT_NPF: ++ /* handled separately */ ++ return ES_OK; ++ ++ case SVM_EXIT_CPUID: ++ if (opcode == 0xa20f) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_INVD: ++ if (opcode == 0x080f) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_MONITOR: ++ if (opcode == 0x010f && modrm == 0xc8) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_MWAIT: ++ if (opcode == 0x010f && modrm == 0xc9) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_MSR: ++ /* RDMSR */ ++ if (opcode == 0x320f || ++ /* WRMSR */ ++ opcode == 0x300f) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_RDPMC: ++ if (opcode == 0x330f) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_RDTSC: ++ if (opcode == 0x310f) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_RDTSCP: ++ if (opcode == 0x010f && modrm == 0xf9) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_READ_DR7: ++ if (opcode == 0x210f && ++ X86_MODRM_REG(ctxt->insn.modrm.value) == 7) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_VMMCALL: ++ if (opcode == 0x010f && modrm == 0xd9) ++ return ES_OK; ++ ++ break; ++ ++ case SVM_EXIT_WRITE_DR7: ++ if (opcode == 0x230f && ++ X86_MODRM_REG(ctxt->insn.modrm.value) == 7) ++ return ES_OK; ++ break; ++ ++ case SVM_EXIT_WBINVD: ++ if (opcode == 0x90f) ++ return ES_OK; ++ break; ++ ++ default: ++ break; ++ } ++ ++ sev_printk(KERN_ERR "Wrong/unhandled opcode bytes: 0x%x, exit_code: 0x%lx, rIP: 0x%lx\n", ++ opcode, exit_code, ctxt->regs->ip); ++ ++ return ES_UNSUPPORTED; ++} +--- a/arch/x86/kernel/sev.c ++++ b/arch/x86/kernel/sev.c +@@ -1749,7 +1749,10 @@ static enum es_result 
vc_handle_exitcode + struct ghcb *ghcb, + unsigned long exit_code) + { +- enum es_result result; ++ enum es_result result = vc_check_opcode_bytes(ctxt, exit_code); ++ ++ if (result != ES_OK) ++ return result; + + switch (exit_code) { + case SVM_EXIT_READ_DR7: