From: Sasha Levin Date: Sat, 9 May 2026 12:46:40 +0000 (-0400) Subject: Fixes for all trees X-Git-Tag: v6.18.29~6 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=e76aeea95a0ec7c3bda9ebaa21a2c54c4dc252a7;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for all trees Signed-off-by: Sasha Levin --- diff --git a/queue-6.1/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch b/queue-6.1/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch new file mode 100644 index 0000000000..ce3e0eb07b --- /dev/null +++ b/queue-6.1/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch @@ -0,0 +1,97 @@ +From 9a66da3f23590e887b3e0f8ad5c842682215388f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 15 Apr 2026 10:24:50 +0800 +Subject: flow_dissector: do not dissect PPPoE PFC frames + +From: Qingfang Deng + +[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ] + +RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT +RECOMMENDED for PPPoE. In practice, pppd does not support negotiating +PFC for PPPoE sessions, and the flow dissector driver has assumed an +uncompressed frame until the blamed commit. + +During the review process of that commit [1], support for PFC is +suggested. However, having a compressed (1-byte) protocol field means +the subsequent PPP payload is shifted by one byte, causing 4-byte +misalignment for the network header and an unaligned access exception +on some architectures. + +The exception can be reproduced by sending a PPPoE PFC frame to an +ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE +session is active on that interface: + +$ 0 : 00000000 80c40000 00000000 85144817 +$ 4 : 00000008 00000100 80a75758 81dc9bb8 +$ 8 : 00000010 8087ae2c 0000003d 00000000 +$12 : 000000e0 00000039 00000000 00000000 +$16 : 85043240 80a75758 81dc9bb8 00006488 +$20 : 0000002f 00000007 85144810 80a70000 +$24 : 81d1bda0 00000000 +$28 : 81dc8000 81dc9aa8 00000000 805ead08 +Hi : 00009d51 +Lo : 2163358a +epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 +ra : 805ead08 __skb_get_hash_net+0x74/0x12c +Status: 11000403 KERNEL EXL IE +Cause : 40800010 (ExcCode 04) +BadVA : 85144817 +PrId : 0001992f (MIPS 1004Kc) +Call Trace: +[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 +[<805ead08>] __skb_get_hash_net+0x74/0x12c +[<805ef330>] get_rps_cpu+0x1b8/0x3fc +[<805fca70>] netif_receive_skb_list_internal+0x324/0x364 +[<805fd120>] napi_complete_done+0x68/0x2a4 +[<8058de5c>] mtk_napi_rx+0x228/0xfec +[<805fd398>] __napi_poll+0x3c/0x1c4 +[<805fd754>] napi_threaded_poll_loop+0x234/0x29c +[<805fd848>] napi_threaded_poll+0x8c/0xb0 +[<80053544>] kthread+0x104/0x12c +[<80002bd8>] ret_from_kernel_thread+0x14/0x1c + +Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 + +To reduce the attack surface and maintain performance, do not process +PPPoE PFC frames. 
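+
+As a rough illustration of the alignment arithmetic (a self-contained
+user-space sketch, assuming the usual NET_IP_ALIGN == 2 receive offset;
+it is not part of the upstream change):
+
+	#include <stdio.h>
+
+	#define PPPOE_SES_HLEN	8  /* 6-byte PPPoE header + 2-byte proto */
+
+	int main(void)
+	{
+		/* NET_IP_ALIGN pad + 14-byte Ethernet header */
+		unsigned int nhoff = 2 + 14;
+		unsigned int normal = nhoff + PPPOE_SES_HLEN;     /* 24 */
+		unsigned int pfc    = nhoff + PPPOE_SES_HLEN - 1; /* 23 */
+
+		printf("uncompressed: IP header at %u, word-aligned: %s\n",
+		       normal, normal % 4 ? "no" : "yes");
+		printf("PFC:          IP header at %u, word-aligned: %s\n",
+		       pfc, pfc % 4 ? "no" : "yes");
+		return 0;
+	}
+
+The odd network-header offset in the PFC case is what trips the
+unaligned-access exception in the MIPS oops above.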
+ +[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home +Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors") +Signed-off-by: Qingfang Deng +Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 13 +++++-------- + 1 file changed, 5 insertions(+), 8 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 5f50e182acd57..77c65b2968a37 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1270,16 +1270,13 @@ bool __skb_flow_dissect(const struct net *net, + break; + } + +- /* least significant bit of the most significant octet +- * indicates if protocol field was compressed ++ /* PFC (compressed 1-byte protocol) frames are not processed. ++ * A compressed protocol field has the least significant bit of ++ * the most significant octet set, which will fail the following ++ * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. + */ + ppp_proto = ntohs(hdr->proto); +- if (ppp_proto & 0x0100) { +- ppp_proto = ppp_proto >> 8; +- nhoff += PPPOE_SES_HLEN - 1; +- } else { +- nhoff += PPPOE_SES_HLEN; +- } ++ nhoff += PPPOE_SES_HLEN; + + if (ppp_proto == PPP_IP) { + proto = htons(ETH_P_IP); +-- +2.53.0 + diff --git a/queue-6.1/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch b/queue-6.1/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch new file mode 100644 index 0000000000..b94fd82d6c --- /dev/null +++ b/queue-6.1/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch @@ -0,0 +1,156 @@ +From e954ca27838c42a4477251ccd3fbf96a9e91f06b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 09:08:12 +0200 +Subject: KVM: x86: Fix shadow paging use-after-free due to unexpected GFN + +From: Sean Christopherson + +commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream. + +The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus +the SPTE index. This assumption breaks for shadow paging if the guest +page tables are modified between VM entries (similar to commit +aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even +when creating an MMIO SPTE", 2026-03-27). The flow is as follows: + +- a PDE is installed for a 2MB mapping, and a page in that area is + accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages; + the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because + the guest's mapping is a huge page (and thus contiguous). + +- the PDE mapping is changed from outside the guest. + +- the guest accesses another page in the same 2MB area. KVM installs + a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN + (i.e. based on the new mapping, as changed in the previous step) but + that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore + the rmap entry cannot be found and removed when the kvm_mmu_page + is zapped. + +- the memslot that covers the first 2MB mapping is deleted, and the + kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove() + only looks at the [sp->gfn, sp->gfn + 511] range established in step 1, + and fails to find the rmap entry that was recorded by step 3. + +- any operation that causes an rmap walk for the same page accessed + by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page. + This includes dirty logging or MMU notifier invalidations (e.g., from + MADV_DONTNEED). 
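+
+The stale gfn arithmetic in the steps above can be modeled in a few
+lines (an illustrative user-space sketch; the names are simplified and
+are not KVM's exact ones):
+
+	#include <stdio.h>
+
+	/* a direct shadow page records only its base gfn */
+	struct shadow_page { unsigned long gfn; };
+
+	/* KVM reconstructs a leaf's gfn as base + SPTE index ... */
+	static unsigned long computed_gfn(const struct shadow_page *sp,
+					  unsigned int index)
+	{
+		return sp->gfn + index;
+	}
+
+	int main(void)
+	{
+		struct shadow_page sp = { .gfn = 0x1000 };
+		/* ... but after the PDE moved, index 5 was populated
+		 * for an unrelated gfn: */
+		unsigned long actual_gfn = 0x9005;
+
+		printf("rmap lookup uses gfn %#lx, entry lives under %#lx\n",
+		       computed_gfn(&sp, 5), actual_gfn);
+		return 0;
+	}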
+ +The underlying issue is that KVM's walking of shadow PTEs assumes that +if a SPTE is present when KVM wants to install a non-leaf SPTE, then the +existing kvm_mmu_page must be for the correct gfn. Because the only way +for the gfn to be wrong is if KVM messed up and failed to zap a SPTE... +which shouldn't happen, but *actually* only happens in response to a +guest write. + +That bug dates back literally forever, as even the first version of KVM +assumes that the GFN matches and walks into the "wrong" shadow page. +However, that was only an imprecision until 2032a93d66fa ("KVM: MMU: +Don't allocate gfns page for direct mmu pages") came along. + +Fix it by checking for a target gfn mismatch and zapping the existing +SPTE. That way the old SP and rmap entries are gone, KVM installs +the rmap in the right location, and everyone is happy. + +Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") +Fixes: 6aa8b732ca01 ("kvm: userspace interface") +Reported-by: Alexander Bulekov +Reported-by: Fred Griffoul +Cc: stable@vger.kernel.org +Signed-off-by: Sean Christopherson +Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/ +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++---------------------- + arch/x86/kvm/mmu/spte.h | 5 +++++ + 2 files changed, 19 insertions(+), 22 deletions(-) + +diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c +index ed5ba38bec869..58d67e5ab2c58 100644 +--- a/arch/x86/kvm/mmu/mmu.c ++++ b/arch/x86/kvm/mmu/mmu.c +@@ -163,6 +163,8 @@ struct kmem_cache *mmu_page_header_cache; + static struct percpu_counter kvm_total_used_mmu_pages; + + static void mmu_spte_set(u64 *sptep, u64 spte); ++static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, ++ u64 *spte, struct list_head *invalid_list); + + struct kvm_mmu_role_regs { + const unsigned long cr0; +@@ -1156,20 +1158,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) + rmap_remove(kvm, sptep); + } + +-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush) +-{ +- struct kvm_mmu_page *sp; +- +- sp = sptep_to_sp(sptep); +- WARN_ON(sp->role.level == PG_LEVEL_4K); +- +- drop_spte(kvm, sptep); +- +- if (flush) +- kvm_flush_remote_tlbs_with_address(kvm, sp->gfn, +- KVM_PAGES_PER_HPAGE(sp->role.level)); +-} +- + /* + * Write-protect on the specified @sptep, @pt_protect indicates whether + * spte write-protection is caused by protecting shadow page table. +@@ -2253,7 +2241,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + { + union kvm_mmu_page_role role; + +- if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) ++ if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) && ++ spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn) + return ERR_PTR(-EEXIST); + + role = kvm_mmu_child_role(sptep, direct, access); +@@ -2331,13 +2320,16 @@ static void __link_shadow_page(struct kvm *kvm, + + BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); + +- /* +- * If an SPTE is present already, it must be a leaf and therefore +- * a large one. Drop it, and flush the TLB if needed, before +- * installing sp. 
+- */
+-	if (is_shadow_present_pte(*sptep))
+-		drop_large_spte(kvm, sptep, flush);
++	if (is_shadow_present_pte(*sptep)) {
++		struct kvm_mmu_page *parent_sp;
++		LIST_HEAD(invalid_list);
++
++		parent_sp = sptep_to_sp(sptep);
++		WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
++
++		mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
++		kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
++	}
+ 
+ 	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
+ 
+diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
+index 7670c13ce251b..0ed97eb1c2e6b 100644
+--- a/arch/x86/kvm/mmu/spte.h
++++ b/arch/x86/kvm/mmu/spte.h
+@@ -295,6 +295,11 @@ static inline bool is_executable_pte(u64 spte)
+ 	return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
+ }
+ 
++static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
++{
++	return to_shadow_page(spte & SPTE_BASE_ADDR_MASK);
++}
++
+ static inline kvm_pfn_t spte_to_pfn(u64 pte)
+ {
+ 	return (pte & SPTE_BASE_ADDR_MASK) >> PAGE_SHIFT;
+--
+2.53.0
+
diff --git a/queue-6.1/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch b/queue-6.1/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch
new file mode 100644
index 0000000000..4be8045bd7
--- /dev/null
+++ b/queue-6.1/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch
@@ -0,0 +1,79 @@
+From cc4e2de3b8e4345347356f7e5837440265da6ebc Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 6 May 2026 09:20:57 +0800
+Subject: net: Fix icmp host relookup triggering ip_rt_bug
+
+From: Dong Chenchen
+
+[ Upstream commit c44daa7e3c73229f7ac74985acb8c7fb909c4e0a ]
+
+An ARP link failure may trigger ip_rt_bug while xfrm is enabled; the
+call trace is:
+
+WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
+Modules linked in:
+CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7
+Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
+BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
+RIP: 0010:ip_rt_bug+0x14/0x20
+Call Trace:
+
+ ip_send_skb+0x14/0x40
+ __icmp_send+0x42d/0x6a0
+ ipv4_link_failure+0xe2/0x1d0
+ arp_error_report+0x3c/0x50
+ neigh_invalidate+0x8d/0x100
+ neigh_timer_handler+0x2e1/0x330
+ call_timer_fn+0x21/0x120
+ __run_timer_base.part.0+0x1c9/0x270
+ run_timer_softirq+0x4c/0x80
+ handle_softirqs+0xac/0x280
+ irq_exit_rcu+0x62/0x80
+ sysvec_apic_timer_interrupt+0x77/0x90
+
+The script below reproduces this scenario:
+ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
+    dir out priority 0 ptype main flag localok icmp
+ip l a veth1 type veth
+ip a a 192.168.141.111/24 dev veth0
+ip l s veth0 up
+ping 192.168.141.155 -c 1
+
+icmp_route_lookup() creates input routes for locally generated packets
+when xfrm relooks up ICMP traffic. It then sets that input route
+(dst->out = ip_rt_bug) on the skb for DESTUNREACH.
+
+For an ICMP error triggered by locally generated packets, dst->dev of
+the output route is loopback. Generally, xfrm relookup verification is
+not required on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
+
+Skip the icmp relookup for locally generated packets to fix it.
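+
+The resulting decision in icmp_route_lookup(), condensed (a sketch of
+the fixed control flow, mirroring the diff below rather than replacing
+it):
+
+	if (!IS_ERR(rt)) {
+		if (rt != rt2)
+			return rt;
+		/* Locally generated packet: the destination is a local
+		 * address and the output route uses loopback, so skip
+		 * the xfrm relookup instead of handing back an input
+		 * route whose output handler is ip_rt_bug(). */
+		if (inet_addr_type_dev_table(net, route_lookup_dev,
+					     fl4->daddr) == RTN_LOCAL)
+			return rt;
+	}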
+ +Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") +Signed-off-by: Dong Chenchen +Reviewed-by: David Ahern +Reviewed-by: Eric Dumazet +Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Jiayuan Chen +Signed-off-by: Sasha Levin +--- + net/ipv4/icmp.c | 3 +++ + 1 file changed, 3 insertions(+) + +diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c +index 66def7f98f704..a9aef281631ee 100644 +--- a/net/ipv4/icmp.c ++++ b/net/ipv4/icmp.c +@@ -518,6 +518,9 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, + if (!IS_ERR(rt)) { + if (rt != rt2) + return rt; ++ if (inet_addr_type_dev_table(net, route_lookup_dev, ++ fl4->daddr) == RTN_LOCAL) ++ return rt; + } else if (PTR_ERR(rt) == -EPERM) { + rt = NULL; + } else +-- +2.53.0 + diff --git a/queue-6.1/series b/queue-6.1/series index c1d1a3a8d5..35ea92b42a 100644 --- a/queue-6.1/series +++ b/queue-6.1/series @@ -293,3 +293,6 @@ x86-cpu-amd-add-x86_feature_zen1.patch drm-amd-display-do-not-skip-unrelated-mode-changes-i.patch spi-meson-spicc-fix-double-put-in-remove-path.patch ext4-validate-p_idx-bounds-in-ext4_ext_correct_index.patch +kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch +net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch +flow_dissector-do-not-dissect-pppoe-pfc-frames.patch diff --git a/queue-6.12/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch b/queue-6.12/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch new file mode 100644 index 0000000000..2fba18c3b2 --- /dev/null +++ b/queue-6.12/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch @@ -0,0 +1,97 @@ +From fad87abe0517eaa413a35286776296daa70b852d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 15 Apr 2026 10:24:50 +0800 +Subject: flow_dissector: do not dissect PPPoE PFC frames + +From: Qingfang Deng + +[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ] + +RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT +RECOMMENDED for PPPoE. In practice, pppd does not support negotiating +PFC for PPPoE sessions, and the flow dissector driver has assumed an +uncompressed frame until the blamed commit. + +During the review process of that commit [1], support for PFC is +suggested. However, having a compressed (1-byte) protocol field means +the subsequent PPP payload is shifted by one byte, causing 4-byte +misalignment for the network header and an unaligned access exception +on some architectures. 
+ +The exception can be reproduced by sending a PPPoE PFC frame to an +ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE +session is active on that interface: + +$ 0 : 00000000 80c40000 00000000 85144817 +$ 4 : 00000008 00000100 80a75758 81dc9bb8 +$ 8 : 00000010 8087ae2c 0000003d 00000000 +$12 : 000000e0 00000039 00000000 00000000 +$16 : 85043240 80a75758 81dc9bb8 00006488 +$20 : 0000002f 00000007 85144810 80a70000 +$24 : 81d1bda0 00000000 +$28 : 81dc8000 81dc9aa8 00000000 805ead08 +Hi : 00009d51 +Lo : 2163358a +epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 +ra : 805ead08 __skb_get_hash_net+0x74/0x12c +Status: 11000403 KERNEL EXL IE +Cause : 40800010 (ExcCode 04) +BadVA : 85144817 +PrId : 0001992f (MIPS 1004Kc) +Call Trace: +[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 +[<805ead08>] __skb_get_hash_net+0x74/0x12c +[<805ef330>] get_rps_cpu+0x1b8/0x3fc +[<805fca70>] netif_receive_skb_list_internal+0x324/0x364 +[<805fd120>] napi_complete_done+0x68/0x2a4 +[<8058de5c>] mtk_napi_rx+0x228/0xfec +[<805fd398>] __napi_poll+0x3c/0x1c4 +[<805fd754>] napi_threaded_poll_loop+0x234/0x29c +[<805fd848>] napi_threaded_poll+0x8c/0xb0 +[<80053544>] kthread+0x104/0x12c +[<80002bd8>] ret_from_kernel_thread+0x14/0x1c + +Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 + +To reduce the attack surface and maintain performance, do not process +PPPoE PFC frames. + +[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home +Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors") +Signed-off-by: Qingfang Deng +Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 13 +++++-------- + 1 file changed, 5 insertions(+), 8 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 9cd8de6bebb54..8be2ef8088be2 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1374,16 +1374,13 @@ bool __skb_flow_dissect(const struct net *net, + break; + } + +- /* least significant bit of the most significant octet +- * indicates if protocol field was compressed ++ /* PFC (compressed 1-byte protocol) frames are not processed. ++ * A compressed protocol field has the least significant bit of ++ * the most significant octet set, which will fail the following ++ * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. + */ + ppp_proto = ntohs(hdr->proto); +- if (ppp_proto & 0x0100) { +- ppp_proto = ppp_proto >> 8; +- nhoff += PPPOE_SES_HLEN - 1; +- } else { +- nhoff += PPPOE_SES_HLEN; +- } ++ nhoff += PPPOE_SES_HLEN; + + if (ppp_proto == PPP_IP) { + proto = htons(ETH_P_IP); +-- +2.53.0 + diff --git a/queue-6.12/iommu-amd-serialize-sequence-allocation-under-concur.patch b/queue-6.12/iommu-amd-serialize-sequence-allocation-under-concur.patch new file mode 100644 index 0000000000..919df1759c --- /dev/null +++ b/queue-6.12/iommu-amd-serialize-sequence-allocation-under-concur.patch @@ -0,0 +1,119 @@ +From 3b531ab50afebe4f39b8d3d86e34b537b986411f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Jan 2026 15:30:38 +0000 +Subject: iommu/amd: serialize sequence allocation under concurrent TLB + invalidations + +From: Ankit Soni + +commit 9e249c48412828e807afddc21527eb734dc9bd3d upstream. 
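+
+In schematic form, the race fixed here is the following (an
+illustrative condensation of the diff below, not the kernel code):
+
+	/* before: sequence allocation outside the lock */
+	data = atomic64_inc_return(&iommu->cmd_sem_val);
+	/* another CPU can allocate data + 1 and queue its
+	 * COMPL_WAIT(data + 1) first, so the waiter for `data` can
+	 * miss its wakeup and time out */
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+	__iommu_queue_command_sync(iommu, &cmd, false);
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+
+	/* after: allocation and queuing form one critical section */
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+	data = get_cmdsem_val(iommu);	/* ++iommu->cmd_sem_val */
+	build_completion_wait(&cmd, iommu, data);
+	__iommu_queue_command_sync(iommu, &cmd, false);
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+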
+ +With concurrent TLB invalidations, completion wait randomly gets timed out +because cmd_sem_val was incremented outside the IOMMU spinlock, allowing +CMD_COMPL_WAIT commands to be queued out of sequence and breaking the +ordering assumption in wait_on_sem(). +Move the cmd_sem_val increment under iommu->lock so completion sequence +allocation is serialized with command queuing. +And remove the unnecessary return. + +Fixes: d2a0cac10597 ("iommu/amd: move wait_on_sem() out of spinlock") + +Tested-by: Srikanth Aithal +Reported-by: Srikanth Aithal +Signed-off-by: Ankit Soni +Reviewed-by: Vasant Hegde +Signed-off-by: Joerg Roedel +[Salvatore Bonaccorso: Backport to v6.12.y where f32fe7cb0198 +("iommu/amd: Add support to remap/unmap IOMMU buffers for kdump") is not +present] +Signed-off-by: Salvatore Bonaccorso +Signed-off-by: Sasha Levin +--- + drivers/iommu/amd/amd_iommu_types.h | 2 +- + drivers/iommu/amd/init.c | 2 +- + drivers/iommu/amd/iommu.c | 18 ++++++++++++------ + 3 files changed, 14 insertions(+), 8 deletions(-) + +diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h +index a14ee649d3da3..df2aa1c4fafcf 100644 +--- a/drivers/iommu/amd/amd_iommu_types.h ++++ b/drivers/iommu/amd/amd_iommu_types.h +@@ -781,7 +781,7 @@ struct amd_iommu { + + u32 flags; + volatile u64 *cmd_sem; +- atomic64_t cmd_sem_val; ++ u64 cmd_sem_val; + + #ifdef CONFIG_AMD_IOMMU_DEBUGFS + /* DebugFS Info */ +diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c +index e1816ae8699dd..78e9ceda2338f 100644 +--- a/drivers/iommu/amd/init.c ++++ b/drivers/iommu/amd/init.c +@@ -1742,7 +1742,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h, + iommu->pci_seg = pci_seg; + + raw_spin_lock_init(&iommu->lock); +- atomic64_set(&iommu->cmd_sem_val, 0); ++ iommu->cmd_sem_val = 0; + + /* Add IOMMU to internal data structures */ + list_add_tail(&iommu->list, &amd_iommu_list); +diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c +index 24e2de90ac2e4..d0e53a03eff02 100644 +--- a/drivers/iommu/amd/iommu.c ++++ b/drivers/iommu/amd/iommu.c +@@ -1252,6 +1252,12 @@ static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd) + return iommu_queue_command_sync(iommu, cmd, true); + } + ++static u64 get_cmdsem_val(struct amd_iommu *iommu) ++{ ++ lockdep_assert_held(&iommu->lock); ++ return ++iommu->cmd_sem_val; ++} ++ + /* + * This function queues a completion wait command into the command + * buffer of an IOMMU +@@ -1266,11 +1272,11 @@ static int iommu_completion_wait(struct amd_iommu *iommu) + if (!iommu->need_sync) + return 0; + +- data = atomic64_inc_return(&iommu->cmd_sem_val); +- build_completion_wait(&cmd, iommu, data); +- + raw_spin_lock_irqsave(&iommu->lock, flags); + ++ data = get_cmdsem_val(iommu); ++ build_completion_wait(&cmd, iommu, data); ++ + ret = __iommu_queue_command_sync(iommu, &cmd, false); + raw_spin_unlock_irqrestore(&iommu->lock, flags); + +@@ -2929,10 +2935,11 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + return; + + build_inv_irt(&cmd, devid); +- data = atomic64_inc_return(&iommu->cmd_sem_val); +- build_completion_wait(&cmd2, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); ++ data = get_cmdsem_val(iommu); ++ build_completion_wait(&cmd2, iommu, data); ++ + ret = __iommu_queue_command_sync(iommu, &cmd, true); + if (ret) + goto out_err; +@@ -2946,7 +2953,6 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + + out_err: + 
raw_spin_unlock_irqrestore(&iommu->lock, flags); +- return; + } + + static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid, +-- +2.53.0 + diff --git a/queue-6.12/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch b/queue-6.12/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch new file mode 100644 index 0000000000..7415fd2ed9 --- /dev/null +++ b/queue-6.12/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch @@ -0,0 +1,53 @@ +From 6cdf0a224c7c33aa28b0489da7d9e60bc1589b0f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 7 Oct 2024 10:43:31 +0200 +Subject: iommu/amd: Use atomic64_inc_return() in iommu.c + +From: Uros Bizjak + +commit 5ce73c524f5fb5abd7b1bfed0115474b4fb437b4 upstream. + +Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref) +to use optimized implementation and ease register pressure around +the primitive for targets that implement optimized variant. + +Signed-off-by: Uros Bizjak +Cc: Joerg Roedel +Cc: Suravee Suthikulpanit +Cc: Will Deacon +Cc: Robin Murphy +Reviewed-by: Jason Gunthorpe +Link: https://lore.kernel.org/r/20241007084356.47799-1-ubizjak@gmail.com +Signed-off-by: Joerg Roedel +Signed-off-by: Salvatore Bonaccorso +Stable-dep-of: 9e249c48412828e807afddc21527eb734dc9bd3d ("iommu/amd: serialize sequence allocation under concurrent TLB invalidations") +Signed-off-by: Sasha Levin +--- + drivers/iommu/amd/iommu.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c +index fecca5c32e8a2..24e2de90ac2e4 100644 +--- a/drivers/iommu/amd/iommu.c ++++ b/drivers/iommu/amd/iommu.c +@@ -1266,7 +1266,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu) + if (!iommu->need_sync) + return 0; + +- data = atomic64_add_return(1, &iommu->cmd_sem_val); ++ data = atomic64_inc_return(&iommu->cmd_sem_val); + build_completion_wait(&cmd, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); +@@ -2929,7 +2929,7 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + return; + + build_inv_irt(&cmd, devid); +- data = atomic64_add_return(1, &iommu->cmd_sem_val); ++ data = atomic64_inc_return(&iommu->cmd_sem_val); + build_completion_wait(&cmd2, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); +-- +2.53.0 + diff --git a/queue-6.12/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch b/queue-6.12/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch new file mode 100644 index 0000000000..6aad69835d --- /dev/null +++ b/queue-6.12/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch @@ -0,0 +1,138 @@ +From 74deca56dc837b8dff6af186a9652a7869692768 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 08:59:56 +0200 +Subject: KVM: x86: Fix shadow paging use-after-free due to unexpected GFN + +From: Sean Christopherson + +commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream. + +The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus +the SPTE index. This assumption breaks for shadow paging if the guest +page tables are modified between VM entries (similar to commit +aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even +when creating an MMIO SPTE", 2026-03-27). The flow is as follows: + +- a PDE is installed for a 2MB mapping, and a page in that area is + accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages; + the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because + the guest's mapping is a huge page (and thus contiguous). 
+ +- the PDE mapping is changed from outside the guest. + +- the guest accesses another page in the same 2MB area. KVM installs + a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN + (i.e. based on the new mapping, as changed in the previous step) but + that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore + the rmap entry cannot be found and removed when the kvm_mmu_page + is zapped. + +- the memslot that covers the first 2MB mapping is deleted, and the + kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove() + only looks at the [sp->gfn, sp->gfn + 511] range established in step 1, + and fails to find the rmap entry that was recorded by step 3. + +- any operation that causes an rmap walk for the same page accessed + by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page. + This includes dirty logging or MMU notifier invalidations (e.g., from + MADV_DONTNEED). + +The underlying issue is that KVM's walking of shadow PTEs assumes that +if a SPTE is present when KVM wants to install a non-leaf SPTE, then the +existing kvm_mmu_page must be for the correct gfn. Because the only way +for the gfn to be wrong is if KVM messed up and failed to zap a SPTE... +which shouldn't happen, but *actually* only happens in response to a +guest write. + +That bug dates back literally forever, as even the first version of KVM +assumes that the GFN matches and walks into the "wrong" shadow page. +However, that was only an imprecision until 2032a93d66fa ("KVM: MMU: +Don't allocate gfns page for direct mmu pages") came along. + +Fix it by checking for a target gfn mismatch and zapping the existing +SPTE. That way the old SP and rmap entries are gone, KVM installs +the rmap in the right location, and everyone is happy. + +Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") +Fixes: 6aa8b732ca01 ("kvm: userspace interface") +Reported-by: Alexander Bulekov +Reported-by: Fred Griffoul +Cc: stable@vger.kernel.org +Signed-off-by: Sean Christopherson +Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/ +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++--------------------- + 1 file changed, 14 insertions(+), 21 deletions(-) + +diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c +index 2c11819bd216c..d288c60ae200b 100644 +--- a/arch/x86/kvm/mmu/mmu.c ++++ b/arch/x86/kvm/mmu/mmu.c +@@ -182,6 +182,8 @@ struct kmem_cache *mmu_page_header_cache; + static struct percpu_counter kvm_total_used_mmu_pages; + + static void mmu_spte_set(u64 *sptep, u64 spte); ++static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, ++ u64 *spte, struct list_head *invalid_list); + + struct kvm_mmu_role_regs { + const unsigned long cr0; +@@ -1187,19 +1189,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) + rmap_remove(kvm, sptep); + } + +-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush) +-{ +- struct kvm_mmu_page *sp; +- +- sp = sptep_to_sp(sptep); +- WARN_ON_ONCE(sp->role.level == PG_LEVEL_4K); +- +- drop_spte(kvm, sptep); +- +- if (flush) +- kvm_flush_remote_tlbs_sptep(kvm, sptep); +-} +- + /* + * Write-protect on the specified @sptep, @pt_protect indicates whether + * spte write-protection is caused by protecting shadow page table. 
+@@ -2342,7 +2331,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + { + union kvm_mmu_page_role role; + +- if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) ++ if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) && ++ spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn) + return ERR_PTR(-EEXIST); + + role = kvm_mmu_child_role(sptep, direct, access); +@@ -2420,13 +2410,16 @@ static void __link_shadow_page(struct kvm *kvm, + + BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); + +- /* +- * If an SPTE is present already, it must be a leaf and therefore +- * a large one. Drop it, and flush the TLB if needed, before +- * installing sp. +- */ +- if (is_shadow_present_pte(*sptep)) +- drop_large_spte(kvm, sptep, flush); ++ if (is_shadow_present_pte(*sptep)) { ++ struct kvm_mmu_page *parent_sp; ++ LIST_HEAD(invalid_list); ++ ++ parent_sp = sptep_to_sp(sptep); ++ WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K); ++ ++ mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list); ++ kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true); ++ } + + spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp)); + +-- +2.53.0 + diff --git a/queue-6.12/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch b/queue-6.12/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch new file mode 100644 index 0000000000..f7df4b7a4f --- /dev/null +++ b/queue-6.12/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch @@ -0,0 +1,102 @@ +From 2a8f40fc8507d830dccab4f56f5ebae4a247f1b1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 6 May 2026 14:29:26 +0800 +Subject: net: txgbe: fix RTNL assertion warning when remove module + +From: Jiawen Wu + +[ Upstream commit e159f05e12cc1111a3103b99375ddf0dfd0e7d63 ] + +For the copper NIC with external PHY, the driver called +phylink_connect_phy() during probe and phylink_disconnect_phy() during +remove. It caused an RTNL assertion warning in phylink_disconnect_phy() +upon module remove. + +To fix this, add rtnl_lock() and rtnl_unlock() around the +phylink_disconnect_phy() in remove function. + + ------------[ cut here ]------------ + RTNL: assertion failed at drivers/net/phy/phylink.c (2351) + WARNING: drivers/net/phy/phylink.c:2351 at +phylink_disconnect_phy+0xd8/0xf0 [phylink], CPU#0: rmmod/4464 + Modules linked in: ... + CPU: 0 UID: 0 PID: 4464 Comm: rmmod Kdump: loaded Not tainted 7.0.0-rc4+ + Hardware name: Micro-Star International Co., Ltd. 
MS-7E16/X670E GAMING +PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 + RIP: 0010:phylink_disconnect_phy+0xe4/0xf0 [phylink] + Code: 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 f6 31 ff e9 3a 38 8f e7 +48 8d 3d 48 87 e2 ff ba 2f 09 00 00 48 c7 c6 c1 22 24 c0 <67> 48 0f b9 3a +e9 34 ff ff ff 66 90 90 90 90 90 90 90 90 90 90 90 + RSP: 0018:ffffce7288363ac0 EFLAGS: 00010246 + RAX: 0000000000000000 RBX: ffff89654b2a1a00 RCX: 0000000000000000 + RDX: 000000000000092f RSI: ffffffffc02422c1 RDI: ffffffffc0239020 + RBP: ffffce7288363ae8 R08: 0000000000000000 R09: 0000000000000000 + R10: 0000000000000000 R11: 0000000000000000 R12: ffff8964c4022000 + R13: ffff89654fce3028 R14: ffff89654ebb4000 R15: ffffffffc0226348 + FS: 0000795e80d93780(0000) GS:ffff896c52857000(0000) +knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 00005b528b592000 CR3: 0000000170d0f000 CR4: 0000000000f50ef0 + PKRU: 55555554 + Call Trace: + + txgbe_remove_phy+0xbb/0xd0 [txgbe] + txgbe_remove+0x4c/0xb0 [txgbe] + pci_device_remove+0x41/0xb0 + device_remove+0x43/0x80 + device_release_driver_internal+0x206/0x270 + driver_detach+0x4a/0xa0 + bus_remove_driver+0x83/0x120 + driver_unregister+0x2f/0x60 + pci_unregister_driver+0x40/0x90 + txgbe_driver_exit+0x10/0x850 [txgbe] + __do_sys_delete_module.isra.0+0x1c3/0x2f0 + __x64_sys_delete_module+0x12/0x20 + x64_sys_call+0x20c3/0x2390 + do_syscall_64+0x11c/0x1500 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? do_syscall_64+0x15a/0x1500 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? do_fault+0x312/0x580 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? __handle_mm_fault+0x9d5/0x1040 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? count_memcg_events+0x101/0x1d0 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? handle_mm_fault+0x1e8/0x2f0 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? do_user_addr_fault+0x2f8/0x820 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? irqentry_exit+0xb2/0x600 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? 
exc_page_fault+0x92/0x1c0 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + +Fixes: 02b2a6f91b90 ("net: txgbe: support copper NIC with external PHY") +Cc: stable@vger.kernel.org +Signed-off-by: Jiawen Wu +Reviewed-by: Russell King (Oracle) +Link: https://patch.msgid.link/8B47A5872884147D+20260407094041.4646-1-jiawenwu@trustnetic.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c | 2 ++ + 1 file changed, 2 insertions(+) + +diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c +index f26946198a2fb..9726622a96bfb 100644 +--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c ++++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c +@@ -622,7 +622,9 @@ int txgbe_init_phy(struct txgbe *txgbe) + void txgbe_remove_phy(struct txgbe *txgbe) + { + if (txgbe->wx->media_type == sp_media_copper) { ++ rtnl_lock(); + phylink_disconnect_phy(txgbe->wx->phylink); ++ rtnl_unlock(); + phylink_destroy(txgbe->wx->phylink); + return; + } +-- +2.53.0 + diff --git a/queue-6.12/series b/queue-6.12/series index 606bbdc841..ad80bcb142 100644 --- a/queue-6.12/series +++ b/queue-6.12/series @@ -14,3 +14,8 @@ ksmbd-rewrite-stop_sessions-with-restartable-iteration.patch mm-convert-mm_lock_seq-to-a-proper-seqcount.patch x86-shadow-stacks-proper-error-handling-for-mmap-loc.patch x86-shstk-prevent-deadlock-during-shstk-sigreturn.patch +kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch +iommu-amd-use-atomic64_inc_return-in-iommu.c.patch +iommu-amd-serialize-sequence-allocation-under-concur.patch +flow_dissector-do-not-dissect-pppoe-pfc-frames.patch +net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch diff --git a/queue-6.18/ceph-fix-num_ops-off-by-one-when-crypto-allocation-f.patch b/queue-6.18/ceph-fix-num_ops-off-by-one-when-crypto-allocation-f.patch new file mode 100644 index 0000000000..4e666a7ac0 --- /dev/null +++ b/queue-6.18/ceph-fix-num_ops-off-by-one-when-crypto-allocation-f.patch @@ -0,0 +1,69 @@ +From de53dcf1d37c194c09ff64f8d92200fb23e9ea43 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 18:43:03 -0700 +Subject: ceph: fix num_ops off-by-one when crypto allocation fails + +From: Sam Edwards + +commit a0d9555bf9eaeba34fe6b6bb86f442fe08ba3842 upstream. + +move_dirty_folio_in_page_array() may fail if the file is encrypted, the +dirty folio is not the first in the batch, and it fails to allocate a +bounce buffer to hold the ciphertext. When that happens, +ceph_process_folio_batch() simply redirties the folio and flushes the +current batch -- it can retry that folio in a future batch. + +However, if this failed folio is not contiguous with the last folio that +did make it into the batch, then ceph_process_folio_batch() has already +incremented `ceph_wbc->num_ops`; because it doesn't follow through and +add the discontiguous folio to the array, ceph_submit_write() -- which +expects that `ceph_wbc->num_ops` accurately reflects the number of +contiguous ranges (and therefore the required number of "write extent" +ops) in the writeback -- will panic the kernel: + + BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops); + +This issue can be reproduced on affected kernels by writing to +fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat +pattern (total filesize should not matter) and gradually increasing the +system's memory pressure until a bounce buffer allocation fails. 
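+
+Schematically, the miscount arises like this (an illustrative
+condensation; the guard names are simplified from
+ceph_process_folio_batch()):
+
+	if (folio_is_discontiguous)	/* gap since the last folio */
+		ceph_wbc->num_ops++;	/* a new extent opens, len == 0 */
+
+	rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc, folio);
+	if (rc) {
+		/* The bounce-buffer allocation failed, so the folio
+		 * never joined the batch: the extent opened above stays
+		 * empty and ceph_submit_write() later hits
+		 * BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops). */
+	}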
+ +Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct +value when move_dirty_folio_in_page_array() fails, but the folio already +started counting a new (i.e. still-empty) extent. + +The defect corrected by this patch has existed since 2022 (see first +`Fixes:`), but another bug blocked multi-folio encrypted writeback until +recently (see second `Fixes:`). The second commit made it into 6.18.16, +6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch +therefore fixes a regression (panic) introduced by cac190c7674f. + +Cc: stable@vger.kernel.org +Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages") +Fixes: cac190c7674f ("ceph: fix write storm on fscrypted files") +Signed-off-by: Sam Edwards +Reviewed-by: Viacheslav Dubeyko +Signed-off-by: Ilya Dryomov +Signed-off-by: Sasha Levin +--- + fs/ceph/addr.c | 4 ++++ + 1 file changed, 4 insertions(+) + +diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c +index 390f122feeaaf..3af6795cb3c15 100644 +--- a/fs/ceph/addr.c ++++ b/fs/ceph/addr.c +@@ -1373,6 +1373,10 @@ int ceph_process_folio_batch(struct address_space *mapping, + rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc, + folio); + if (rc) { ++ /* Did we just begin a new contiguous op? Nevermind! */ ++ if (ceph_wbc->len == 0) ++ ceph_wbc->num_ops--; ++ + rc = 0; + folio_redirty_for_writepage(wbc, folio); + folio_unlock(folio); +-- +2.53.0 + diff --git a/queue-6.18/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch b/queue-6.18/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch new file mode 100644 index 0000000000..a150412076 --- /dev/null +++ b/queue-6.18/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch @@ -0,0 +1,97 @@ +From 157e2779a33686e26123d7263cfee6de4f4cbb74 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 15 Apr 2026 10:24:50 +0800 +Subject: flow_dissector: do not dissect PPPoE PFC frames + +From: Qingfang Deng + +[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ] + +RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT +RECOMMENDED for PPPoE. In practice, pppd does not support negotiating +PFC for PPPoE sessions, and the flow dissector driver has assumed an +uncompressed frame until the blamed commit. + +During the review process of that commit [1], support for PFC is +suggested. However, having a compressed (1-byte) protocol field means +the subsequent PPP payload is shifted by one byte, causing 4-byte +misalignment for the network header and an unaligned access exception +on some architectures. 
+ +The exception can be reproduced by sending a PPPoE PFC frame to an +ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE +session is active on that interface: + +$ 0 : 00000000 80c40000 00000000 85144817 +$ 4 : 00000008 00000100 80a75758 81dc9bb8 +$ 8 : 00000010 8087ae2c 0000003d 00000000 +$12 : 000000e0 00000039 00000000 00000000 +$16 : 85043240 80a75758 81dc9bb8 00006488 +$20 : 0000002f 00000007 85144810 80a70000 +$24 : 81d1bda0 00000000 +$28 : 81dc8000 81dc9aa8 00000000 805ead08 +Hi : 00009d51 +Lo : 2163358a +epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 +ra : 805ead08 __skb_get_hash_net+0x74/0x12c +Status: 11000403 KERNEL EXL IE +Cause : 40800010 (ExcCode 04) +BadVA : 85144817 +PrId : 0001992f (MIPS 1004Kc) +Call Trace: +[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 +[<805ead08>] __skb_get_hash_net+0x74/0x12c +[<805ef330>] get_rps_cpu+0x1b8/0x3fc +[<805fca70>] netif_receive_skb_list_internal+0x324/0x364 +[<805fd120>] napi_complete_done+0x68/0x2a4 +[<8058de5c>] mtk_napi_rx+0x228/0xfec +[<805fd398>] __napi_poll+0x3c/0x1c4 +[<805fd754>] napi_threaded_poll_loop+0x234/0x29c +[<805fd848>] napi_threaded_poll+0x8c/0xb0 +[<80053544>] kthread+0x104/0x12c +[<80002bd8>] ret_from_kernel_thread+0x14/0x1c + +Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 + +To reduce the attack surface and maintain performance, do not process +PPPoE PFC frames. + +[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home +Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors") +Signed-off-by: Qingfang Deng +Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 13 +++++-------- + 1 file changed, 5 insertions(+), 8 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 1b61bb25ba0e5..2a98f5fa74eb0 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1374,16 +1374,13 @@ bool __skb_flow_dissect(const struct net *net, + break; + } + +- /* least significant bit of the most significant octet +- * indicates if protocol field was compressed ++ /* PFC (compressed 1-byte protocol) frames are not processed. ++ * A compressed protocol field has the least significant bit of ++ * the most significant octet set, which will fail the following ++ * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. + */ + ppp_proto = ntohs(hdr->proto); +- if (ppp_proto & 0x0100) { +- ppp_proto = ppp_proto >> 8; +- nhoff += PPPOE_SES_HLEN - 1; +- } else { +- nhoff += PPPOE_SES_HLEN; +- } ++ nhoff += PPPOE_SES_HLEN; + + if (ppp_proto == PPP_IP) { + proto = htons(ETH_P_IP); +-- +2.53.0 + diff --git a/queue-6.18/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch b/queue-6.18/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch new file mode 100644 index 0000000000..6a7432b534 --- /dev/null +++ b/queue-6.18/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch @@ -0,0 +1,138 @@ +From ef52daf71ad9cfd9008b8f577867d0af776b8dda Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 08:58:57 +0200 +Subject: KVM: x86: Fix shadow paging use-after-free due to unexpected GFN + +From: Sean Christopherson + +commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream. + +The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus +the SPTE index. 
This assumption breaks for shadow paging if the guest +page tables are modified between VM entries (similar to commit +aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even +when creating an MMIO SPTE", 2026-03-27). The flow is as follows: + +- a PDE is installed for a 2MB mapping, and a page in that area is + accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages; + the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because + the guest's mapping is a huge page (and thus contiguous). + +- the PDE mapping is changed from outside the guest. + +- the guest accesses another page in the same 2MB area. KVM installs + a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN + (i.e. based on the new mapping, as changed in the previous step) but + that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore + the rmap entry cannot be found and removed when the kvm_mmu_page + is zapped. + +- the memslot that covers the first 2MB mapping is deleted, and the + kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove() + only looks at the [sp->gfn, sp->gfn + 511] range established in step 1, + and fails to find the rmap entry that was recorded by step 3. + +- any operation that causes an rmap walk for the same page accessed + by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page. + This includes dirty logging or MMU notifier invalidations (e.g., from + MADV_DONTNEED). + +The underlying issue is that KVM's walking of shadow PTEs assumes that +if a SPTE is present when KVM wants to install a non-leaf SPTE, then the +existing kvm_mmu_page must be for the correct gfn. Because the only way +for the gfn to be wrong is if KVM messed up and failed to zap a SPTE... +which shouldn't happen, but *actually* only happens in response to a +guest write. + +That bug dates back literally forever, as even the first version of KVM +assumes that the GFN matches and walks into the "wrong" shadow page. +However, that was only an imprecision until 2032a93d66fa ("KVM: MMU: +Don't allocate gfns page for direct mmu pages") came along. + +Fix it by checking for a target gfn mismatch and zapping the existing +SPTE. That way the old SP and rmap entries are gone, KVM installs +the rmap in the right location, and everyone is happy. 
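+
+In terms of the two hunks below, the recovery on a gfn mismatch becomes
+(schematic, with most arguments elided):
+
+	sp = kvm_mmu_get_child_sp(vcpu, sptep, gfn, ...);
+	/* a mismatch no longer yields ERR_PTR(-EEXIST), so a fresh
+	 * shadow page for the correct gfn is allocated ... */
+	__link_shadow_page(kvm, cache, sptep, sp, ...);
+	/* ... and the present-but-stale SPTE is zapped in here via
+	 * mmu_page_zap_pte(), freeing the old page and its rmap
+	 * entries before the new one is linked in. */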
+ +Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") +Fixes: 6aa8b732ca01 ("kvm: userspace interface") +Reported-by: Alexander Bulekov +Reported-by: Fred Griffoul +Cc: stable@vger.kernel.org +Signed-off-by: Sean Christopherson +Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/ +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++--------------------- + 1 file changed, 14 insertions(+), 21 deletions(-) + +diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c +index dad7abb1112b7..0bd0cb8992c9f 100644 +--- a/arch/x86/kvm/mmu/mmu.c ++++ b/arch/x86/kvm/mmu/mmu.c +@@ -182,6 +182,8 @@ static struct kmem_cache *pte_list_desc_cache; + struct kmem_cache *mmu_page_header_cache; + + static void mmu_spte_set(u64 *sptep, u64 spte); ++static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, ++ u64 *spte, struct list_head *invalid_list); + + struct kvm_mmu_role_regs { + const unsigned long cr0; +@@ -1287,19 +1289,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) + rmap_remove(kvm, sptep); + } + +-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush) +-{ +- struct kvm_mmu_page *sp; +- +- sp = sptep_to_sp(sptep); +- WARN_ON_ONCE(sp->role.level == PG_LEVEL_4K); +- +- drop_spte(kvm, sptep); +- +- if (flush) +- kvm_flush_remote_tlbs_sptep(kvm, sptep); +-} +- + /* + * Write-protect on the specified @sptep, @pt_protect indicates whether + * spte write-protection is caused by protecting shadow page table. +@@ -2466,7 +2455,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + { + union kvm_mmu_page_role role; + +- if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) ++ if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) && ++ spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn) + return ERR_PTR(-EEXIST); + + role = kvm_mmu_child_role(sptep, direct, access); +@@ -2544,13 +2534,16 @@ static void __link_shadow_page(struct kvm *kvm, + + BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); + +- /* +- * If an SPTE is present already, it must be a leaf and therefore +- * a large one. Drop it, and flush the TLB if needed, before +- * installing sp. +- */ +- if (is_shadow_present_pte(*sptep)) +- drop_large_spte(kvm, sptep, flush); ++ if (is_shadow_present_pte(*sptep)) { ++ struct kvm_mmu_page *parent_sp; ++ LIST_HEAD(invalid_list); ++ ++ parent_sp = sptep_to_sp(sptep); ++ WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K); ++ ++ mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list); ++ kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true); ++ } + + spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp)); + +-- +2.53.0 + diff --git a/queue-6.18/mptcp-sync-the-msk-sndbuf-at-accept-time.patch b/queue-6.18/mptcp-sync-the-msk-sndbuf-at-accept-time.patch new file mode 100644 index 0000000000..e110e3be36 --- /dev/null +++ b/queue-6.18/mptcp-sync-the-msk-sndbuf-at-accept-time.patch @@ -0,0 +1,72 @@ +From 9ae6b91266dfb38f4954d346660fb6623483a36a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 6 May 2026 13:20:41 +0200 +Subject: mptcp: sync the msk->sndbuf at accept() time + +From: Gang Yan + +commit fcf04b14334641f4b0b8647824480935e9416d52 upstream. + +On passive MPTCP connections, the msk sndbuf is not updated correctly. 
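+
+In condensed form, the passive-open timeline looks like this (a sketch;
+the detailed sequence is spelled out below):
+
+	tcp_check_req()
+	  subflow_syn_recv_sock()
+	    mptcp_sk_clone_init()
+	      __mptcp_propagate_sndbuf(nsk, ssk)  /* copies early value */
+	tcp_child_process()
+	  tcp_init_transfer()
+	    tcp_sndbuf_expand(ssk)                /* ssk grows later */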
+
+The root cause is an order issue in the accept path:
+
+- tcp_check_req() -> subflow_syn_recv_sock() -> mptcp_sk_clone_init()
+  calls __mptcp_propagate_sndbuf() to copy the ssk sndbuf into the msk
+
+- Later, tcp_child_process() -> tcp_init_transfer() ->
+  tcp_sndbuf_expand() grows the ssk sndbuf.
+
+So __mptcp_propagate_sndbuf() runs before the ssk sndbuf has been
+expanded and the msk ends up with a much smaller sndbuf than the
+subflow:
+
+  MPTCP: msk->sndbuf:20480, msk->first->sndbuf:2626560
+
+Fix this by moving the __mptcp_propagate_sndbuf() call from
+mptcp_sk_clone_init() -- the ssk sndbuf is not yet finalized there -- to
+mptcp_stream_accept(), i.e. at accept() time, when the ssk sndbuf has
+been fully expanded by tcp_sndbuf_expand().
+
+Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
+Cc: stable@vger.kernel.org
+Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/602
+Signed-off-by: Gang Yan
+Acked-by: Paolo Abeni
+Reviewed-by: Matthieu Baerts (NGI0)
+Signed-off-by: Matthieu Baerts (NGI0)
+Link: https://patch.msgid.link/20260420-net-mptcp-sync-sndbuf-accept-v1-1-e3523e3aeb44@kernel.org
+Signed-off-by: Paolo Abeni
+[ No conflicts, but move __mptcp_propagate_sndbuf() above the for-loop
+  (mptcp_for_each_subflow()) present in this version, because that loop
+  modifies 'subflow', which the new call uses. ]
+Signed-off-by: Matthieu Baerts (NGI0)
+Signed-off-by: Sasha Levin
+---
+ net/mptcp/protocol.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
+index 09e1a93b7daab..c805d36fe50d5 100644
+--- a/net/mptcp/protocol.c
++++ b/net/mptcp/protocol.c
+@@ -3428,7 +3428,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
+ 	 * uses the correct data
+ 	 */
+ 	mptcp_copy_inaddrs(nsk, ssk);
+-	__mptcp_propagate_sndbuf(nsk, ssk);
+ 
+ 	mptcp_rcv_space_init(msk, ssk);
+ 	msk->rcvq_space.time = mptcp_stamp();
+@@ -4027,6 +4026,8 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
+ 	msk = mptcp_sk(newsk);
+ 	msk->in_accept_queue = 0;
+ 
++	__mptcp_propagate_sndbuf(newsk, mptcp_subflow_tcp_sock(subflow));
++
+ 	/* set ssk->sk_socket of accept()ed flows to mptcp socket.
+ 	 * This is needed so NOSPACE flag can be set from tcp stack.
+ */ +-- +2.53.0 + diff --git a/queue-6.18/series b/queue-6.18/series index daa52157a9..45f5c841ea 100644 --- a/queue-6.18/series +++ b/queue-6.18/series @@ -13,3 +13,8 @@ asoc-sof-don-t-allow-pointer-operations-on-unconfigured-streams.patch wifi-mt76-mt7925-fix-incorrect-tlv-length-in-clc-command.patch spi-rockchip-fix-controller-deregistration.patch ksmbd-rewrite-stop_sessions-with-restartable-iteration.patch +kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch +ceph-fix-num_ops-off-by-one-when-crypto-allocation-f.patch +flow_dissector-do-not-dissect-pppoe-pfc-frames.patch +mptcp-sync-the-msk-sndbuf-at-accept-time.patch +smb-client-smbdirect-fix-mr-registration-for-coalesc.patch diff --git a/queue-6.18/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch b/queue-6.18/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch new file mode 100644 index 0000000000..ca9efc9cda --- /dev/null +++ b/queue-6.18/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch @@ -0,0 +1,86 @@ +From 8867706f33407aad3e51307c4aeb901534f1ee11 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 8 May 2026 10:15:47 +0200 +Subject: smb: client/smbdirect: fix MR registration for coalesced SG lists + +From: Yi Kuo + +commit 9900b9fee5a0e0f72d7c744b37c7c851d5785ac6 upstream. +The stable backport to < 7.1 patches a different file. Also +the Fixes tag below is adjusted for the old code path. + +ib_dma_map_sg() modifies the provided scatterlist and returns the +number of mapped entries, which can be fewer than the requested +mr->sgt.nents if the DMA controller coalesces contiguous memory +segments. Passing the original, uncoalesced count to ib_map_mr_sg() +causes memory registration failures if coalescing actually occurs. + +Capture the actual mapped count returned by ib_dma_map_sg() and pass it +to ib_map_mr_sg() to ensure correct MR registration. + +Also update the ib_dma_map_sg() error logging to drop the error +pointer formatting, since the return value is an integer count +rather than an error code. + +Ensure a proper error code (-EIO) is assigned when DMA mapping or +MR registration fails. 
+ +Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") +Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221408 +Reviewed-by: Stefan Metzmacher +Acked-by: Namjae Jeon +Signed-off-by: Yi Kuo +Signed-off-by: Steve French +Cc: stable@vger.kernel.org +Signed-off-by: Stefan Metzmacher +Signed-off-by: Sasha Levin +--- + fs/smb/client/smbdirect.c | 21 ++++++++++++--------- + 1 file changed, 12 insertions(+), 9 deletions(-) + +diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c +index ff44a2dc49938..e2b20219ba2c7 100644 +--- a/fs/smb/client/smbdirect.c ++++ b/fs/smb/client/smbdirect.c +@@ -2895,7 +2895,7 @@ struct smbdirect_mr_io *smbd_register_mr(struct smbd_connection *info, + struct smbdirect_socket *sc = &info->socket; + struct smbdirect_socket_parameters *sp = &sc->parameters; + struct smbdirect_mr_io *mr; +- int rc, num_pages; ++ int rc, num_pages, num_mapped; + struct ib_reg_wr *reg_wr; + + num_pages = iov_iter_npages(iter, sp->max_frmr_depth + 1); +@@ -2923,18 +2923,21 @@ struct smbdirect_mr_io *smbd_register_mr(struct smbd_connection *info, + num_pages, iov_iter_count(iter), sp->max_frmr_depth); + smbd_iter_to_mr(iter, &mr->sgt, sp->max_frmr_depth); + +- rc = ib_dma_map_sg(sc->ib.dev, mr->sgt.sgl, mr->sgt.nents, mr->dir); +- if (!rc) { +- log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n", +- num_pages, mr->dir, rc); ++ num_mapped = ib_dma_map_sg(sc->ib.dev, mr->sgt.sgl, mr->sgt.nents, mr->dir); ++ if (!num_mapped) { ++ log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x num_mapped=%x\n", ++ num_pages, mr->dir, num_mapped); ++ rc = -EIO; + goto dma_map_error; + } + +- rc = ib_map_mr_sg(mr->mr, mr->sgt.sgl, mr->sgt.nents, NULL, PAGE_SIZE); +- if (rc != mr->sgt.nents) { ++ rc = ib_map_mr_sg(mr->mr, mr->sgt.sgl, num_mapped, NULL, PAGE_SIZE); ++ if (rc != num_mapped) { + log_rdma_mr(ERR, +- "ib_map_mr_sg failed rc = %d nents = %x\n", +- rc, mr->sgt.nents); ++ "ib_map_mr_sg failed rc = %d num_mapped = %x\n", ++ rc, num_mapped); ++ if (rc >= 0) ++ rc = -EIO; + goto map_mr_error; + } + +-- +2.53.0 + diff --git a/queue-6.6/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch b/queue-6.6/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch new file mode 100644 index 0000000000..dbc37bfda7 --- /dev/null +++ b/queue-6.6/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch @@ -0,0 +1,97 @@ +From 8537558a18c389cc8598df10b58eb28a09752748 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 15 Apr 2026 10:24:50 +0800 +Subject: flow_dissector: do not dissect PPPoE PFC frames + +From: Qingfang Deng + +[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ] + +RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT +RECOMMENDED for PPPoE. In practice, pppd does not support negotiating +PFC for PPPoE sessions, and the flow dissector driver has assumed an +uncompressed frame until the blamed commit. + +During the review process of that commit [1], support for PFC is +suggested. However, having a compressed (1-byte) protocol field means +the subsequent PPP payload is shifted by one byte, causing 4-byte +misalignment for the network header and an unaligned access exception +on some architectures. 
+ +The exception can be reproduced by sending a PPPoE PFC frame to an +ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE +session is active on that interface: + +$ 0 : 00000000 80c40000 00000000 85144817 +$ 4 : 00000008 00000100 80a75758 81dc9bb8 +$ 8 : 00000010 8087ae2c 0000003d 00000000 +$12 : 000000e0 00000039 00000000 00000000 +$16 : 85043240 80a75758 81dc9bb8 00006488 +$20 : 0000002f 00000007 85144810 80a70000 +$24 : 81d1bda0 00000000 +$28 : 81dc8000 81dc9aa8 00000000 805ead08 +Hi : 00009d51 +Lo : 2163358a +epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 +ra : 805ead08 __skb_get_hash_net+0x74/0x12c +Status: 11000403 KERNEL EXL IE +Cause : 40800010 (ExcCode 04) +BadVA : 85144817 +PrId : 0001992f (MIPS 1004Kc) +Call Trace: +[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 +[<805ead08>] __skb_get_hash_net+0x74/0x12c +[<805ef330>] get_rps_cpu+0x1b8/0x3fc +[<805fca70>] netif_receive_skb_list_internal+0x324/0x364 +[<805fd120>] napi_complete_done+0x68/0x2a4 +[<8058de5c>] mtk_napi_rx+0x228/0xfec +[<805fd398>] __napi_poll+0x3c/0x1c4 +[<805fd754>] napi_threaded_poll_loop+0x234/0x29c +[<805fd848>] napi_threaded_poll+0x8c/0xb0 +[<80053544>] kthread+0x104/0x12c +[<80002bd8>] ret_from_kernel_thread+0x14/0x1c + +Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 + +To reduce the attack surface and maintain performance, do not process +PPPoE PFC frames. + +[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home +Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors") +Signed-off-by: Qingfang Deng +Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 13 +++++-------- + 1 file changed, 5 insertions(+), 8 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index aafa754b6cbab..9432e5362b44f 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1350,16 +1350,13 @@ bool __skb_flow_dissect(const struct net *net, + break; + } + +- /* least significant bit of the most significant octet +- * indicates if protocol field was compressed ++ /* PFC (compressed 1-byte protocol) frames are not processed. ++ * A compressed protocol field has the least significant bit of ++ * the most significant octet set, which will fail the following ++ * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. + */ + ppp_proto = ntohs(hdr->proto); +- if (ppp_proto & 0x0100) { +- ppp_proto = ppp_proto >> 8; +- nhoff += PPPOE_SES_HLEN - 1; +- } else { +- nhoff += PPPOE_SES_HLEN; +- } ++ nhoff += PPPOE_SES_HLEN; + + if (ppp_proto == PPP_IP) { + proto = htons(ETH_P_IP); +-- +2.53.0 + diff --git a/queue-6.6/iommu-amd-serialize-sequence-allocation-under-concur.patch b/queue-6.6/iommu-amd-serialize-sequence-allocation-under-concur.patch new file mode 100644 index 0000000000..8de0ae4a50 --- /dev/null +++ b/queue-6.6/iommu-amd-serialize-sequence-allocation-under-concur.patch @@ -0,0 +1,119 @@ +From f9d0e634da7e0063f0303560128648f978aafefd Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Jan 2026 15:30:38 +0000 +Subject: iommu/amd: serialize sequence allocation under concurrent TLB + invalidations + +From: Ankit Soni + +commit 9e249c48412828e807afddc21527eb734dc9bd3d upstream. 
+ +With concurrent TLB invalidations, completion wait randomly gets timed out +because cmd_sem_val was incremented outside the IOMMU spinlock, allowing +CMD_COMPL_WAIT commands to be queued out of sequence and breaking the +ordering assumption in wait_on_sem(). +Move the cmd_sem_val increment under iommu->lock so completion sequence +allocation is serialized with command queuing. +And remove the unnecessary return. + +Fixes: d2a0cac10597 ("iommu/amd: move wait_on_sem() out of spinlock") + +Tested-by: Srikanth Aithal +Reported-by: Srikanth Aithal +Signed-off-by: Ankit Soni +Reviewed-by: Vasant Hegde +Signed-off-by: Joerg Roedel +[Salvatore Bonaccorso: Backport to v6.12.y where f32fe7cb0198 +("iommu/amd: Add support to remap/unmap IOMMU buffers for kdump") is not +present] +Signed-off-by: Salvatore Bonaccorso +Signed-off-by: Sasha Levin +--- + drivers/iommu/amd/amd_iommu_types.h | 2 +- + drivers/iommu/amd/init.c | 2 +- + drivers/iommu/amd/iommu.c | 18 ++++++++++++------ + 3 files changed, 14 insertions(+), 8 deletions(-) + +diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h +index d872054b874fa..2571a782b7b61 100644 +--- a/drivers/iommu/amd/amd_iommu_types.h ++++ b/drivers/iommu/amd/amd_iommu_types.h +@@ -765,7 +765,7 @@ struct amd_iommu { + + u32 flags; + volatile u64 *cmd_sem; +- atomic64_t cmd_sem_val; ++ u64 cmd_sem_val; + + #ifdef CONFIG_AMD_IOMMU_DEBUGFS + /* DebugFS Info */ +diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c +index 6261bc7304e97..e5fee1aae587b 100644 +--- a/drivers/iommu/amd/init.c ++++ b/drivers/iommu/amd/init.c +@@ -1805,7 +1805,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h, + iommu->pci_seg = pci_seg; + + raw_spin_lock_init(&iommu->lock); +- atomic64_set(&iommu->cmd_sem_val, 0); ++ iommu->cmd_sem_val = 0; + + /* Add IOMMU to internal data structures */ + list_add_tail(&iommu->list, &amd_iommu_list); +diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c +index 6d0d28050052a..48cf9e9e15976 100644 +--- a/drivers/iommu/amd/iommu.c ++++ b/drivers/iommu/amd/iommu.c +@@ -1195,6 +1195,12 @@ static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd) + return iommu_queue_command_sync(iommu, cmd, true); + } + ++static u64 get_cmdsem_val(struct amd_iommu *iommu) ++{ ++ lockdep_assert_held(&iommu->lock); ++ return ++iommu->cmd_sem_val; ++} ++ + /* + * This function queues a completion wait command into the command + * buffer of an IOMMU +@@ -1209,11 +1215,11 @@ static int iommu_completion_wait(struct amd_iommu *iommu) + if (!iommu->need_sync) + return 0; + +- data = atomic64_inc_return(&iommu->cmd_sem_val); +- build_completion_wait(&cmd, iommu, data); +- + raw_spin_lock_irqsave(&iommu->lock, flags); + ++ data = get_cmdsem_val(iommu); ++ build_completion_wait(&cmd, iommu, data); ++ + ret = __iommu_queue_command_sync(iommu, &cmd, false); + raw_spin_unlock_irqrestore(&iommu->lock, flags); + +@@ -2877,10 +2883,11 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + return; + + build_inv_irt(&cmd, devid); +- data = atomic64_inc_return(&iommu->cmd_sem_val); +- build_completion_wait(&cmd2, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); ++ data = get_cmdsem_val(iommu); ++ build_completion_wait(&cmd2, iommu, data); ++ + ret = __iommu_queue_command_sync(iommu, &cmd, true); + if (ret) + goto out_err; +@@ -2894,7 +2901,6 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + + out_err: + 
raw_spin_unlock_irqrestore(&iommu->lock, flags); +- return; + } + + static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid, +-- +2.53.0 + diff --git a/queue-6.6/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch b/queue-6.6/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch new file mode 100644 index 0000000000..06eda1d9fa --- /dev/null +++ b/queue-6.6/iommu-amd-use-atomic64_inc_return-in-iommu.c.patch @@ -0,0 +1,53 @@ +From e5e5ed70fe9a48d14f575df4823d861c374939bb Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 7 Oct 2024 10:43:31 +0200 +Subject: iommu/amd: Use atomic64_inc_return() in iommu.c + +From: Uros Bizjak + +commit 5ce73c524f5fb5abd7b1bfed0115474b4fb437b4 upstream. + +Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref) +to use optimized implementation and ease register pressure around +the primitive for targets that implement optimized variant. + +Signed-off-by: Uros Bizjak +Cc: Joerg Roedel +Cc: Suravee Suthikulpanit +Cc: Will Deacon +Cc: Robin Murphy +Reviewed-by: Jason Gunthorpe +Link: https://lore.kernel.org/r/20241007084356.47799-1-ubizjak@gmail.com +Signed-off-by: Joerg Roedel +Signed-off-by: Salvatore Bonaccorso +Stable-dep-of: 9e249c48412828e807afddc21527eb734dc9bd3d ("iommu/amd: serialize sequence allocation under concurrent TLB invalidations") +Signed-off-by: Sasha Levin +--- + drivers/iommu/amd/iommu.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c +index d119a104a3436..6d0d28050052a 100644 +--- a/drivers/iommu/amd/iommu.c ++++ b/drivers/iommu/amd/iommu.c +@@ -1209,7 +1209,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu) + if (!iommu->need_sync) + return 0; + +- data = atomic64_add_return(1, &iommu->cmd_sem_val); ++ data = atomic64_inc_return(&iommu->cmd_sem_val); + build_completion_wait(&cmd, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); +@@ -2877,7 +2877,7 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) + return; + + build_inv_irt(&cmd, devid); +- data = atomic64_add_return(1, &iommu->cmd_sem_val); ++ data = atomic64_inc_return(&iommu->cmd_sem_val); + build_completion_wait(&cmd2, iommu, data); + + raw_spin_lock_irqsave(&iommu->lock, flags); +-- +2.53.0 + diff --git a/queue-6.6/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch b/queue-6.6/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch new file mode 100644 index 0000000000..3903c23176 --- /dev/null +++ b/queue-6.6/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch @@ -0,0 +1,138 @@ +From baf93bb80a98b366dbfe934ebcfce5cd3957003c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 09:00:57 +0200 +Subject: KVM: x86: Fix shadow paging use-after-free due to unexpected GFN + +From: Sean Christopherson + +commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream. + +The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus +the SPTE index. This assumption breaks for shadow paging if the guest +page tables are modified between VM entries (similar to commit +aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even +when creating an MMIO SPTE", 2026-03-27). The flow is as follows: + +- a PDE is installed for a 2MB mapping, and a page in that area is + accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages; + the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because + the guest's mapping is a huge page (and thus contiguous). 
+ +- the PDE mapping is changed from outside the guest. + +- the guest accesses another page in the same 2MB area. KVM installs + a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN + (i.e. based on the new mapping, as changed in the previous step) but + that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore + the rmap entry cannot be found and removed when the kvm_mmu_page + is zapped. + +- the memslot that covers the first 2MB mapping is deleted, and the + kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove() + only looks at the [sp->gfn, sp->gfn + 511] range established in step 1, + and fails to find the rmap entry that was recorded by step 3. + +- any operation that causes an rmap walk for the same page accessed + by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page. + This includes dirty logging or MMU notifier invalidations (e.g., from + MADV_DONTNEED). + +The underlying issue is that KVM's walking of shadow PTEs assumes that +if a SPTE is present when KVM wants to install a non-leaf SPTE, then the +existing kvm_mmu_page must be for the correct gfn. Because the only way +for the gfn to be wrong is if KVM messed up and failed to zap a SPTE... +which shouldn't happen, but *actually* only happens in response to a +guest write. + +That bug dates back literally forever, as even the first version of KVM +assumes that the GFN matches and walks into the "wrong" shadow page. +However, that was only an imprecision until 2032a93d66fa ("KVM: MMU: +Don't allocate gfns page for direct mmu pages") came along. + +Fix it by checking for a target gfn mismatch and zapping the existing +SPTE. That way the old SP and rmap entries are gone, KVM installs +the rmap in the right location, and everyone is happy. + +Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") +Fixes: 6aa8b732ca01 ("kvm: userspace interface") +Reported-by: Alexander Bulekov +Reported-by: Fred Griffoul +Cc: stable@vger.kernel.org +Signed-off-by: Sean Christopherson +Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/ +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++--------------------- + 1 file changed, 14 insertions(+), 21 deletions(-) + +diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c +index 0dc804149b0f3..774bc26b8235e 100644 +--- a/arch/x86/kvm/mmu/mmu.c ++++ b/arch/x86/kvm/mmu/mmu.c +@@ -182,6 +182,8 @@ struct kmem_cache *mmu_page_header_cache; + static struct percpu_counter kvm_total_used_mmu_pages; + + static void mmu_spte_set(u64 *sptep, u64 spte); ++static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, ++ u64 *spte, struct list_head *invalid_list); + + struct kvm_mmu_role_regs { + const unsigned long cr0; +@@ -1194,19 +1196,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) + rmap_remove(kvm, sptep); + } + +-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush) +-{ +- struct kvm_mmu_page *sp; +- +- sp = sptep_to_sp(sptep); +- WARN_ON_ONCE(sp->role.level == PG_LEVEL_4K); +- +- drop_spte(kvm, sptep); +- +- if (flush) +- kvm_flush_remote_tlbs_sptep(kvm, sptep); +-} +- + /* + * Write-protect on the specified @sptep, @pt_protect indicates whether + * spte write-protection is caused by protecting shadow page table. 
+@@ -2350,7 +2339,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
+ {
+ 	union kvm_mmu_page_role role;
+ 
+-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
++	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) &&
++	    spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn)
+ 		return ERR_PTR(-EEXIST);
+ 
+ 	role = kvm_mmu_child_role(sptep, direct, access);
+@@ -2428,13 +2418,16 @@ static void __link_shadow_page(struct kvm *kvm,
+ 
+ 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
+ 
+-	/*
+-	 * If an SPTE is present already, it must be a leaf and therefore
+-	 * a large one. Drop it, and flush the TLB if needed, before
+-	 * installing sp.
+-	 */
+-	if (is_shadow_present_pte(*sptep))
+-		drop_large_spte(kvm, sptep, flush);
++	if (is_shadow_present_pte(*sptep)) {
++		struct kvm_mmu_page *parent_sp;
++		LIST_HEAD(invalid_list);
++
++		parent_sp = sptep_to_sp(sptep);
++		WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
++
++		mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
++		kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
++	}
+ 
+ 	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
+ 
+-- 
+2.53.0
+
diff --git a/queue-6.6/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch b/queue-6.6/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch
new file mode 100644
index 0000000000..033c8ff61c
--- /dev/null
+++ b/queue-6.6/net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch
@@ -0,0 +1,79 @@
+From 51f2abc149f175fa6282fe988dac2978f0376518 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 6 May 2026 09:21:15 +0800
+Subject: net: Fix icmp host relookup triggering ip_rt_bug
+
+From: Dong Chenchen
+
+[ Upstream commit c44daa7e3c73229f7ac74985acb8c7fb909c4e0a ]
+
+arp link failure may trigger ip_rt_bug while xfrm is enabled; the call
+trace is:
+
+WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
+Modules linked in:
+CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7
+Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
+BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
+RIP: 0010:ip_rt_bug+0x14/0x20
+Call Trace:
+
+ ip_send_skb+0x14/0x40
+ __icmp_send+0x42d/0x6a0
+ ipv4_link_failure+0xe2/0x1d0
+ arp_error_report+0x3c/0x50
+ neigh_invalidate+0x8d/0x100
+ neigh_timer_handler+0x2e1/0x330
+ call_timer_fn+0x21/0x120
+ __run_timer_base.part.0+0x1c9/0x270
+ run_timer_softirq+0x4c/0x80
+ handle_softirqs+0xac/0x280
+ irq_exit_rcu+0x62/0x80
+ sysvec_apic_timer_interrupt+0x77/0x90
+
+The script below reproduces this scenario:
+ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
+    dir out priority 0 ptype main flag localok icmp
+ip l a veth1 type veth
+ip a a 192.168.141.111/24 dev veth0
+ip l s veth0 up
+ping 192.168.141.155 -c 1
+
+icmp_route_lookup() creates input routes for locally generated packets
+when xfrm relooks up ICMP traffic. It then sets an input route
+(dst->output = ip_rt_bug) on the skb for DESTUNREACH.
+
+For ICMP errors triggered by locally generated packets, the dst->dev
+of the output route is the loopback device. Generally, xfrm relookup
+verification is not required on loopback interfaces
+(net.ipv4.conf.lo.disable_xfrm = 1).
+
+Skip the icmp relookup for locally generated packets to fix it.
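+
+The heart of the fix is the added guard in icmp_route_lookup(), shown
+in the hunk below; this annotated sketch only restates it (the comment
+is editorial, not part of the patch):
+
+    if (!IS_ERR(rt)) {
+            if (rt != rt2)
+                    return rt;
+            /* If the ICMP error is aimed at a local address, the
+             * offending packet was locally generated and its output
+             * route is via loopback, where xfrm relookup verification
+             * is not required: keep this route and skip the host
+             * relookup that would install an input route whose
+             * dst->output is ip_rt_bug.
+             */
+            if (inet_addr_type_dev_table(net, route_lookup_dev,
+                                         fl4->daddr) == RTN_LOCAL)
+                    return rt;
+    }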
+ +Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") +Signed-off-by: Dong Chenchen +Reviewed-by: David Ahern +Reviewed-by: Eric Dumazet +Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Jiayuan Chen +Signed-off-by: Sasha Levin +--- + net/ipv4/icmp.c | 3 +++ + 1 file changed, 3 insertions(+) + +diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c +index 29c73b05b1e1a..3fcf11f83d87b 100644 +--- a/net/ipv4/icmp.c ++++ b/net/ipv4/icmp.c +@@ -518,6 +518,9 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, + if (!IS_ERR(rt)) { + if (rt != rt2) + return rt; ++ if (inet_addr_type_dev_table(net, route_lookup_dev, ++ fl4->daddr) == RTN_LOCAL) ++ return rt; + } else if (PTR_ERR(rt) == -EPERM) { + rt = NULL; + } else +-- +2.53.0 + diff --git a/queue-6.6/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch b/queue-6.6/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch new file mode 100644 index 0000000000..21546c583b --- /dev/null +++ b/queue-6.6/net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch @@ -0,0 +1,102 @@ +From dbe3812de5c9d50fb49241f4be5438829bc7e0a1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 6 May 2026 14:30:00 +0800 +Subject: net: txgbe: fix RTNL assertion warning when remove module + +From: Jiawen Wu + +[ Upstream commit e159f05e12cc1111a3103b99375ddf0dfd0e7d63 ] + +For the copper NIC with external PHY, the driver called +phylink_connect_phy() during probe and phylink_disconnect_phy() during +remove. It caused an RTNL assertion warning in phylink_disconnect_phy() +upon module remove. + +To fix this, add rtnl_lock() and rtnl_unlock() around the +phylink_disconnect_phy() in remove function. + + ------------[ cut here ]------------ + RTNL: assertion failed at drivers/net/phy/phylink.c (2351) + WARNING: drivers/net/phy/phylink.c:2351 at +phylink_disconnect_phy+0xd8/0xf0 [phylink], CPU#0: rmmod/4464 + Modules linked in: ... + CPU: 0 UID: 0 PID: 4464 Comm: rmmod Kdump: loaded Not tainted 7.0.0-rc4+ + Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING +PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 + RIP: 0010:phylink_disconnect_phy+0xe4/0xf0 [phylink] + Code: 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 f6 31 ff e9 3a 38 8f e7 +48 8d 3d 48 87 e2 ff ba 2f 09 00 00 48 c7 c6 c1 22 24 c0 <67> 48 0f b9 3a +e9 34 ff ff ff 66 90 90 90 90 90 90 90 90 90 90 90 + RSP: 0018:ffffce7288363ac0 EFLAGS: 00010246 + RAX: 0000000000000000 RBX: ffff89654b2a1a00 RCX: 0000000000000000 + RDX: 000000000000092f RSI: ffffffffc02422c1 RDI: ffffffffc0239020 + RBP: ffffce7288363ae8 R08: 0000000000000000 R09: 0000000000000000 + R10: 0000000000000000 R11: 0000000000000000 R12: ffff8964c4022000 + R13: ffff89654fce3028 R14: ffff89654ebb4000 R15: ffffffffc0226348 + FS: 0000795e80d93780(0000) GS:ffff896c52857000(0000) +knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 00005b528b592000 CR3: 0000000170d0f000 CR4: 0000000000f50ef0 + PKRU: 55555554 + Call Trace: + + txgbe_remove_phy+0xbb/0xd0 [txgbe] + txgbe_remove+0x4c/0xb0 [txgbe] + pci_device_remove+0x41/0xb0 + device_remove+0x43/0x80 + device_release_driver_internal+0x206/0x270 + driver_detach+0x4a/0xa0 + bus_remove_driver+0x83/0x120 + driver_unregister+0x2f/0x60 + pci_unregister_driver+0x40/0x90 + txgbe_driver_exit+0x10/0x850 [txgbe] + __do_sys_delete_module.isra.0+0x1c3/0x2f0 + __x64_sys_delete_module+0x12/0x20 + x64_sys_call+0x20c3/0x2390 + do_syscall_64+0x11c/0x1500 + ? 
srso_alias_return_thunk+0x5/0xfbef5
+ ? do_syscall_64+0x15a/0x1500
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? do_fault+0x312/0x580
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? __handle_mm_fault+0x9d5/0x1040
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? count_memcg_events+0x101/0x1d0
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? handle_mm_fault+0x1e8/0x2f0
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? do_user_addr_fault+0x2f8/0x820
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? irqentry_exit+0xb2/0x600
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? exc_page_fault+0x92/0x1c0
+ entry_SYSCALL_64_after_hwframe+0x76/0x7e
+
+Fixes: 02b2a6f91b90 ("net: txgbe: support copper NIC with external PHY")
+Cc: stable@vger.kernel.org
+Signed-off-by: Jiawen Wu
+Reviewed-by: Russell King (Oracle)
+Link: https://patch.msgid.link/8B47A5872884147D+20260407094041.4646-1-jiawenwu@trustnetic.com
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Sasha Levin
+---
+ drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
+index 4159c84035fdc..2494a3a171fdc 100644
+--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
++++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
+@@ -820,7 +820,9 @@ int txgbe_init_phy(struct txgbe *txgbe)
+ void txgbe_remove_phy(struct txgbe *txgbe)
+ {
+ 	if (txgbe->wx->media_type == sp_media_copper) {
++		rtnl_lock();
+ 		phylink_disconnect_phy(txgbe->phylink);
++		rtnl_unlock();
+ 		phylink_destroy(txgbe->phylink);
+ 		return;
+ 	}
+-- 
+2.53.0
+
diff --git a/queue-6.6/series b/queue-6.6/series
index 9bcb608b85..a602835c1f 100644
--- a/queue-6.6/series
+++ b/queue-6.6/series
@@ -159,3 +159,9 @@ spi-meson-spicc-fix-double-put-in-remove-path.patch
 rxrpc-fix-potential-uaf-after-skb_unshare-failure.patch
 ext4-validate-p_idx-bounds-in-ext4_ext_correct_index.patch
 rxrpc-fix-rxrpc_input_call_event-to-only-unshare-dat.patch
+kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch
+iommu-amd-use-atomic64_inc_return-in-iommu.c.patch
+iommu-amd-serialize-sequence-allocation-under-concur.patch
+net-fix-icmp-host-relookup-triggering-ip_rt_bug.patch
+flow_dissector-do-not-dissect-pppoe-pfc-frames.patch
+net-txgbe-fix-rtnl-assertion-warning-when-remove-mod.patch
diff --git a/queue-7.0/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch b/queue-7.0/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch
new file mode 100644
index 0000000000..8a9027f112
--- /dev/null
+++ b/queue-7.0/flow_dissector-do-not-dissect-pppoe-pfc-frames.patch
@@ -0,0 +1,97 @@
+From ec743acea278eee91082b99b6c87a7454dd2ada8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 15 Apr 2026 10:24:50 +0800
+Subject: flow_dissector: do not dissect PPPoE PFC frames
+
+From: Qingfang Deng
+
+[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ]
+
+RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
+RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
+PFC for PPPoE sessions, and the flow dissector driver has assumed an
+uncompressed frame until the blamed commit.
+
+During the review process of that commit [1], support for PFC was
+suggested. However, having a compressed (1-byte) protocol field means
+the subsequent PPP payload is shifted by one byte, causing 4-byte
+misalignment for the network header and an unaligned access exception
+on some architectures.
+ +The exception can be reproduced by sending a PPPoE PFC frame to an +ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE +session is active on that interface: + +$ 0 : 00000000 80c40000 00000000 85144817 +$ 4 : 00000008 00000100 80a75758 81dc9bb8 +$ 8 : 00000010 8087ae2c 0000003d 00000000 +$12 : 000000e0 00000039 00000000 00000000 +$16 : 85043240 80a75758 81dc9bb8 00006488 +$20 : 0000002f 00000007 85144810 80a70000 +$24 : 81d1bda0 00000000 +$28 : 81dc8000 81dc9aa8 00000000 805ead08 +Hi : 00009d51 +Lo : 2163358a +epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 +ra : 805ead08 __skb_get_hash_net+0x74/0x12c +Status: 11000403 KERNEL EXL IE +Cause : 40800010 (ExcCode 04) +BadVA : 85144817 +PrId : 0001992f (MIPS 1004Kc) +Call Trace: +[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 +[<805ead08>] __skb_get_hash_net+0x74/0x12c +[<805ef330>] get_rps_cpu+0x1b8/0x3fc +[<805fca70>] netif_receive_skb_list_internal+0x324/0x364 +[<805fd120>] napi_complete_done+0x68/0x2a4 +[<8058de5c>] mtk_napi_rx+0x228/0xfec +[<805fd398>] __napi_poll+0x3c/0x1c4 +[<805fd754>] napi_threaded_poll_loop+0x234/0x29c +[<805fd848>] napi_threaded_poll+0x8c/0xb0 +[<80053544>] kthread+0x104/0x12c +[<80002bd8>] ret_from_kernel_thread+0x14/0x1c + +Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 + +To reduce the attack surface and maintain performance, do not process +PPPoE PFC frames. + +[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home +Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors") +Signed-off-by: Qingfang Deng +Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 13 +++++-------- + 1 file changed, 5 insertions(+), 8 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 1b61bb25ba0e5..2a98f5fa74eb0 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1374,16 +1374,13 @@ bool __skb_flow_dissect(const struct net *net, + break; + } + +- /* least significant bit of the most significant octet +- * indicates if protocol field was compressed ++ /* PFC (compressed 1-byte protocol) frames are not processed. ++ * A compressed protocol field has the least significant bit of ++ * the most significant octet set, which will fail the following ++ * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. + */ + ppp_proto = ntohs(hdr->proto); +- if (ppp_proto & 0x0100) { +- ppp_proto = ppp_proto >> 8; +- nhoff += PPPOE_SES_HLEN - 1; +- } else { +- nhoff += PPPOE_SES_HLEN; +- } ++ nhoff += PPPOE_SES_HLEN; + + if (ppp_proto == PPP_IP) { + proto = htons(ETH_P_IP); +-- +2.53.0 + diff --git a/queue-7.0/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch b/queue-7.0/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch new file mode 100644 index 0000000000..16f5393414 --- /dev/null +++ b/queue-7.0/kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch @@ -0,0 +1,138 @@ +From cf0b9871c6487a315b69503e1157a02ce63bd006 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 May 2026 08:57:15 +0200 +Subject: KVM: x86: Fix shadow paging use-after-free due to unexpected GFN + +From: Sean Christopherson + +commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream. + +The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus +the SPTE index. 
This assumption breaks for shadow paging if the guest +page tables are modified between VM entries (similar to commit +aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even +when creating an MMIO SPTE", 2026-03-27). The flow is as follows: + +- a PDE is installed for a 2MB mapping, and a page in that area is + accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages; + the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because + the guest's mapping is a huge page (and thus contiguous). + +- the PDE mapping is changed from outside the guest. + +- the guest accesses another page in the same 2MB area. KVM installs + a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN + (i.e. based on the new mapping, as changed in the previous step) but + that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore + the rmap entry cannot be found and removed when the kvm_mmu_page + is zapped. + +- the memslot that covers the first 2MB mapping is deleted, and the + kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove() + only looks at the [sp->gfn, sp->gfn + 511] range established in step 1, + and fails to find the rmap entry that was recorded by step 3. + +- any operation that causes an rmap walk for the same page accessed + by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page. + This includes dirty logging or MMU notifier invalidations (e.g., from + MADV_DONTNEED). + +The underlying issue is that KVM's walking of shadow PTEs assumes that +if a SPTE is present when KVM wants to install a non-leaf SPTE, then the +existing kvm_mmu_page must be for the correct gfn. Because the only way +for the gfn to be wrong is if KVM messed up and failed to zap a SPTE... +which shouldn't happen, but *actually* only happens in response to a +guest write. + +That bug dates back literally forever, as even the first version of KVM +assumes that the GFN matches and walks into the "wrong" shadow page. +However, that was only an imprecision until 2032a93d66fa ("KVM: MMU: +Don't allocate gfns page for direct mmu pages") came along. + +Fix it by checking for a target gfn mismatch and zapping the existing +SPTE. That way the old SP and rmap entries are gone, KVM installs +the rmap in the right location, and everyone is happy. 
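+
+For reference, the GFN derivation that goes stale is roughly the
+direct-map case of kvm_mmu_page_get_gfn(); the helper below is a
+simplified sketch (direct_spte_gfn() is a made-up name, 9 being the
+number of index bits per level):
+
+    /* A direct shadow page covers a contiguous GFN range, so KVM does
+     * not store a GFN per SPTE; it derives one from sp->gfn and the
+     * SPTE index. For a level-1 sp this is [sp->gfn, sp->gfn + 511],
+     * so a leaf SPTE installed for a GFN outside that range (step 3
+     * above) can never be found again through this computation.
+     */
+    static gfn_t direct_spte_gfn(struct kvm_mmu_page *sp, int index)
+    {
+            return sp->gfn + (index << ((sp->role.level - 1) * 9));
+    }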
+ +Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") +Fixes: 6aa8b732ca01 ("kvm: userspace interface") +Reported-by: Alexander Bulekov +Reported-by: Fred Griffoul +Cc: stable@vger.kernel.org +Signed-off-by: Sean Christopherson +Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/ +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++--------------------- + 1 file changed, 14 insertions(+), 21 deletions(-) + +diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c +index dd06453d5b72c..729240bc00a26 100644 +--- a/arch/x86/kvm/mmu/mmu.c ++++ b/arch/x86/kvm/mmu/mmu.c +@@ -182,6 +182,8 @@ static struct kmem_cache *pte_list_desc_cache; + struct kmem_cache *mmu_page_header_cache; + + static void mmu_spte_set(u64 *sptep, u64 spte); ++static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, ++ u64 *spte, struct list_head *invalid_list); + + struct kvm_mmu_role_regs { + const unsigned long cr0; +@@ -1287,19 +1289,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) + rmap_remove(kvm, sptep); + } + +-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush) +-{ +- struct kvm_mmu_page *sp; +- +- sp = sptep_to_sp(sptep); +- WARN_ON_ONCE(sp->role.level == PG_LEVEL_4K); +- +- drop_spte(kvm, sptep); +- +- if (flush) +- kvm_flush_remote_tlbs_sptep(kvm, sptep); +-} +- + /* + * Write-protect on the specified @sptep, @pt_protect indicates whether + * spte write-protection is caused by protecting shadow page table. +@@ -2466,7 +2455,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + { + union kvm_mmu_page_role role; + +- if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) ++ if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) && ++ spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn) + return ERR_PTR(-EEXIST); + + role = kvm_mmu_child_role(sptep, direct, access); +@@ -2544,13 +2534,16 @@ static void __link_shadow_page(struct kvm *kvm, + + BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); + +- /* +- * If an SPTE is present already, it must be a leaf and therefore +- * a large one. Drop it, and flush the TLB if needed, before +- * installing sp. 
+- */ +- if (is_shadow_present_pte(*sptep)) +- drop_large_spte(kvm, sptep, flush); ++ if (is_shadow_present_pte(*sptep)) { ++ struct kvm_mmu_page *parent_sp; ++ LIST_HEAD(invalid_list); ++ ++ parent_sp = sptep_to_sp(sptep); ++ WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K); ++ ++ mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list); ++ kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true); ++ } + + spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp)); + +-- +2.53.0 + diff --git a/queue-7.0/series b/queue-7.0/series index cca80fdaac..f81696bd29 100644 --- a/queue-7.0/series +++ b/queue-7.0/series @@ -14,3 +14,6 @@ asoc-sof-don-t-allow-pointer-operations-on-unconfigured-streams.patch wifi-mt76-mt7925-fix-incorrect-tlv-length-in-clc-command.patch spi-rockchip-fix-controller-deregistration.patch ksmbd-rewrite-stop_sessions-with-restartable-iteration.patch +kvm-x86-fix-shadow-paging-use-after-free-due-to-unex.patch +flow_dissector-do-not-dissect-pppoe-pfc-frames.patch +smb-client-smbdirect-fix-mr-registration-for-coalesc.patch diff --git a/queue-7.0/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch b/queue-7.0/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch new file mode 100644 index 0000000000..98309d6823 --- /dev/null +++ b/queue-7.0/smb-client-smbdirect-fix-mr-registration-for-coalesc.patch @@ -0,0 +1,86 @@ +From 91dc14b8e4c51e56cf9f519a407f6f4b79308605 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 8 May 2026 10:15:47 +0200 +Subject: smb: client/smbdirect: fix MR registration for coalesced SG lists + +From: Yi Kuo + +commit 9900b9fee5a0e0f72d7c744b37c7c851d5785ac6 upstream. +The stable backport to < 7.1 patches a different file. Also +the Fixes tag below is adjusted for the old code path. + +ib_dma_map_sg() modifies the provided scatterlist and returns the +number of mapped entries, which can be fewer than the requested +mr->sgt.nents if the DMA controller coalesces contiguous memory +segments. Passing the original, uncoalesced count to ib_map_mr_sg() +causes memory registration failures if coalescing actually occurs. + +Capture the actual mapped count returned by ib_dma_map_sg() and pass it +to ib_map_mr_sg() to ensure correct MR registration. + +Also update the ib_dma_map_sg() error logging to drop the error +pointer formatting, since the return value is an integer count +rather than an error code. + +Ensure a proper error code (-EIO) is assigned when DMA mapping or +MR registration fails. 
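+
+Condensed, the corrected flow in smbd_register_mr() looks like this
+(error labels from the hunk below dropped for brevity):
+
+    /* ib_dma_map_sg() may coalesce contiguous SG entries and return a
+     * count smaller than sgt.nents; that mapped count, not the
+     * original nents, is what ib_map_mr_sg() must be given, and a
+     * successful ib_map_mr_sg() returns the same count.
+     */
+    num_mapped = ib_dma_map_sg(sc->ib.dev, mr->sgt.sgl, mr->sgt.nents, mr->dir);
+    if (!num_mapped)
+            return -EIO;
+
+    rc = ib_map_mr_sg(mr->mr, mr->sgt.sgl, num_mapped, NULL, PAGE_SIZE);
+    if (rc != num_mapped)
+            return rc < 0 ? rc : -EIO;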
+ +Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") +Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221408 +Reviewed-by: Stefan Metzmacher +Acked-by: Namjae Jeon +Signed-off-by: Yi Kuo +Signed-off-by: Steve French +Cc: stable@vger.kernel.org +Signed-off-by: Stefan Metzmacher +Signed-off-by: Sasha Levin +--- + fs/smb/client/smbdirect.c | 21 ++++++++++++--------- + 1 file changed, 12 insertions(+), 9 deletions(-) + +diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c +index 4616581050133..d0fcc77794156 100644 +--- a/fs/smb/client/smbdirect.c ++++ b/fs/smb/client/smbdirect.c +@@ -2920,7 +2920,7 @@ struct smbdirect_mr_io *smbd_register_mr(struct smbd_connection *info, + struct smbdirect_socket *sc = &info->socket; + struct smbdirect_socket_parameters *sp = &sc->parameters; + struct smbdirect_mr_io *mr; +- int rc, num_pages; ++ int rc, num_pages, num_mapped; + struct ib_reg_wr *reg_wr; + + num_pages = iov_iter_npages(iter, sp->max_frmr_depth + 1); +@@ -2948,18 +2948,21 @@ struct smbdirect_mr_io *smbd_register_mr(struct smbd_connection *info, + num_pages, iov_iter_count(iter), sp->max_frmr_depth); + smbd_iter_to_mr(iter, &mr->sgt, sp->max_frmr_depth); + +- rc = ib_dma_map_sg(sc->ib.dev, mr->sgt.sgl, mr->sgt.nents, mr->dir); +- if (!rc) { +- log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n", +- num_pages, mr->dir, rc); ++ num_mapped = ib_dma_map_sg(sc->ib.dev, mr->sgt.sgl, mr->sgt.nents, mr->dir); ++ if (!num_mapped) { ++ log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x num_mapped=%x\n", ++ num_pages, mr->dir, num_mapped); ++ rc = -EIO; + goto dma_map_error; + } + +- rc = ib_map_mr_sg(mr->mr, mr->sgt.sgl, mr->sgt.nents, NULL, PAGE_SIZE); +- if (rc != mr->sgt.nents) { ++ rc = ib_map_mr_sg(mr->mr, mr->sgt.sgl, num_mapped, NULL, PAGE_SIZE); ++ if (rc != num_mapped) { + log_rdma_mr(ERR, +- "ib_map_mr_sg failed rc = %d nents = %x\n", +- rc, mr->sgt.nents); ++ "ib_map_mr_sg failed rc = %d num_mapped = %x\n", ++ rc, num_mapped); ++ if (rc >= 0) ++ rc = -EIO; + goto map_mr_error; + } + +-- +2.53.0 +