From: Greg Kroah-Hartman Date: Mon, 29 Dec 2025 14:41:26 +0000 (+0100) Subject: 6.1-stable patches X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ea3254b078a465e013a9a1e447077451b2c3f85c;p=thirdparty%2Fkernel%2Fstable-queue.git 6.1-stable patches added patches: fsnotify-do-not-generate-access-modify-events-on-child-for-special-files.patch kvm-nsvm-avoid-incorrect-injection-of-svm_exit_cr0_sel_write.patch kvm-nsvm-clear-exit_code_hi-in-vmcb-when-synthesizing-nested-vm-exits.patch kvm-nsvm-propagate-svm_exit_cr0_sel_write-correctly-for-lmsw-emulation.patch kvm-nsvm-set-exit_code_hi-to-1-when-synthesizing-svm_exit_err-failed-vmrun.patch kvm-svm-mark-vmcb_npt-as-dirty-on-nested-vmrun.patch kvm-svm-mark-vmcb_perm_map-as-dirty-on-nested-vmrun.patch kvm-x86-don-t-clear-async-pf-queue-when-cr0.pg-is-disabled-e.g.-on-smi.patch kvm-x86-explicitly-set-new-periodic-hrtimer-expiration-in-apic_timer_fn.patch kvm-x86-fix-vm-hard-lockup-after-prolonged-inactivity-with-periodic-hv-timer.patch kvm-x86-warn-if-hrtimer-callback-for-periodic-apic-timer-fires-with-period-0.patch libceph-make-decode_pool-more-resilient-against-corrupted-osdmaps.patch media-vidtv-initialize-local-pointers-upon-transfer-of-memory-ownership.patch nfsd-mark-variable-__maybe_unused-to-avoid-w-1-build-break.patch ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch parisc-do-not-reprogram-affinitiy-on-asp-chip.patch platform-chrome-cros_ec_ishtp-fix-uaf-after-unbinding-driver.patch pm-runtime-do-not-clear-needs_force_resume-with-enabled-runtime-pm.patch powerpc-kexec-enable-smt-before-waking-offline-cpus.patch r8169-fix-rtl8117-wake-on-lan-in-dash-mode.patch scs-fix-a-wrong-parameter-in-__scs_magic.patch svcrdma-return-0-on-success-from-svc_rdma_copy_inline_range.patch tools-testing-nvdimm-use-per-dimm-device-handle.patch tracing-do-not-register-unsupported-perf-events.patch xfs-fix-a-memory-leak-in-xfs_buf_item_init.patch --- diff --git 
a/queue-6.1/fsnotify-do-not-generate-access-modify-events-on-child-for-special-files.patch b/queue-6.1/fsnotify-do-not-generate-access-modify-events-on-child-for-special-files.patch new file mode 100644 index 0000000000..bae0cf96d2 --- /dev/null +++ b/queue-6.1/fsnotify-do-not-generate-access-modify-events-on-child-for-special-files.patch @@ -0,0 +1,58 @@ +From 635bc4def026a24e071436f4f356ea08c0eed6ff Mon Sep 17 00:00:00 2001 +From: Amir Goldstein +Date: Sun, 7 Dec 2025 11:44:55 +0100 +Subject: fsnotify: do not generate ACCESS/MODIFY events on child for special files + +From: Amir Goldstein + +commit 635bc4def026a24e071436f4f356ea08c0eed6ff upstream. + +inotify/fanotify do not allow users with no read access to a file to +subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the +same user to subscribe for watching events on children when the user +has access to the parent directory (e.g. /dev). + +Users with no read access to a file but with read access to its parent +directory can still stat the file and see if it was accessed/modified +via atime/mtime change. + +The same is not true for special files (e.g. /dev/null). Users will not +generally observe atime/mtime changes when other users read/write to +special files, only when someone sets atime/mtime via utimensat(). + +Align fsnotify events with this stat behavior and do not generate +ACCESS/MODIFY events to parent watchers on read/write of special files. +The events are still generated to parent watchers on utimensat(). This +closes some side-channels that could possibly be used for information +exfiltration [1]. 
+ +[1] https://snee.la/pdf/pubs/file-notification-attacks.pdf + +Reported-by: Sudheendra Raghav Neela +CC: stable@vger.kernel.org +Signed-off-by: Amir Goldstein +Signed-off-by: Jan Kara +Signed-off-by: Greg Kroah-Hartman +--- + fs/notify/fsnotify.c | 9 ++++++++- + 1 file changed, 8 insertions(+), 1 deletion(-) + +--- a/fs/notify/fsnotify.c ++++ b/fs/notify/fsnotify.c +@@ -224,8 +224,15 @@ int __fsnotify_parent(struct dentry *den + /* + * Include parent/name in notification either if some notification + * groups require parent info or the parent is interested in this event. ++ * The parent interest in ACCESS/MODIFY events does not apply to special ++ * files, where read/write are not on the filesystem of the parent and ++ * events can provide an undesirable side-channel for information ++ * exfiltration. + */ +- parent_interested = mask & p_mask & ALL_FSNOTIFY_EVENTS; ++ parent_interested = mask & p_mask & ALL_FSNOTIFY_EVENTS && ++ !(data_type == FSNOTIFY_EVENT_PATH && ++ d_is_special(dentry) && ++ (mask & (FS_ACCESS | FS_MODIFY))); + if (parent_needed || parent_interested) { + /* When notifying parent, child should be passed as data */ + WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type)); diff --git a/queue-6.1/kvm-nsvm-avoid-incorrect-injection-of-svm_exit_cr0_sel_write.patch b/queue-6.1/kvm-nsvm-avoid-incorrect-injection-of-svm_exit_cr0_sel_write.patch new file mode 100644 index 0000000000..64a191a133 --- /dev/null +++ b/queue-6.1/kvm-nsvm-avoid-incorrect-injection-of-svm_exit_cr0_sel_write.patch @@ -0,0 +1,79 @@ +From 3d80f4c93d3d26d0f9a0dd2844961a632eeea634 Mon Sep 17 00:00:00 2001 +From: Yosry Ahmed +Date: Fri, 24 Oct 2025 19:29:18 +0000 +Subject: KVM: nSVM: Avoid incorrect injection of SVM_EXIT_CR0_SEL_WRITE +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Yosry Ahmed + +commit 3d80f4c93d3d26d0f9a0dd2844961a632eeea634 upstream. 
+ +When emulating L2 instructions, svm_check_intercept() checks whether a +write to CR0 should trigger a synthesized #VMEXIT with +SVM_EXIT_CR0_SEL_WRITE. However, it does not check whether L1 enabled +the intercept for SVM_EXIT_WRITE_CR0, which has higher priority +according to the APM (24593—Rev. 3.42—March 2024, Table 15-7): + + When both selective and non-selective CR0-write intercepts are active at + the same time, the non-selective intercept takes priority. With respect + to exceptions, the priority of this intercept is the same as the generic + CR0-write intercept. + +Make sure L1 does NOT intercept SVM_EXIT_WRITE_CR0 before checking if +SVM_EXIT_CR0_SEL_WRITE needs to be injected. + +Opportunistically tweak the "not CR0" logic to explicitly bail early so +that it's more obvious that only CR0 has a selective intercept, and that +modifying icpt_info.exit_code is functionally necessary so that the call +to nested_svm_exit_handled() checks the correct exit code. + +Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses") +Cc: stable@vger.kernel.org +Signed-off-by: Yosry Ahmed +Link: https://patch.msgid.link/20251024192918.3191141-4-yosry.ahmed@linux.dev +[sean: isolate non-CR0 write logic, tweak comments accordingly] +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/svm.c | 24 +++++++++++++++++++----- + 1 file changed, 19 insertions(+), 5 deletions(-) + +--- a/arch/x86/kvm/svm/svm.c ++++ b/arch/x86/kvm/svm/svm.c +@@ -4346,15 +4346,29 @@ static int svm_check_intercept(struct kv + case SVM_EXIT_WRITE_CR0: { + unsigned long cr0, val; + +- if (info->intercept == x86_intercept_cr_write) ++ /* ++ * Adjust the exit code accordingly if a CR other than CR0 is ++ * being written, and skip straight to the common handling as ++ * only CR0 has an additional selective intercept. 
++ */ ++ if (info->intercept == x86_intercept_cr_write && info->modrm_reg) { + icpt_info.exit_code += info->modrm_reg; ++ break; ++ } + +- if (icpt_info.exit_code != SVM_EXIT_WRITE_CR0 || +- info->intercept == x86_intercept_clts) ++ /* ++ * Convert the exit_code to SVM_EXIT_CR0_SEL_WRITE if a ++ * selective CR0 intercept is triggered (the common logic will ++ * treat the selective intercept as being enabled). Note, the ++ * unconditional intercept has higher priority, i.e. this is ++ * only relevant if *only* the selective intercept is enabled. ++ */ ++ if (vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_CR0_WRITE) || ++ !(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0))) + break; + +- if (!(vmcb12_is_intercept(&svm->nested.ctl, +- INTERCEPT_SELECTIVE_CR0))) ++ /* CLTS never triggers INTERCEPT_SELECTIVE_CR0 */ ++ if (info->intercept == x86_intercept_clts) + break; + + cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; diff --git a/queue-6.1/kvm-nsvm-clear-exit_code_hi-in-vmcb-when-synthesizing-nested-vm-exits.patch b/queue-6.1/kvm-nsvm-clear-exit_code_hi-in-vmcb-when-synthesizing-nested-vm-exits.patch new file mode 100644 index 0000000000..bea59ff123 --- /dev/null +++ b/queue-6.1/kvm-nsvm-clear-exit_code_hi-in-vmcb-when-synthesizing-nested-vm-exits.patch @@ -0,0 +1,59 @@ +From da01f64e7470988f8607776aa7afa924208863fb Mon Sep 17 00:00:00 2001 +From: Sean Christopherson +Date: Thu, 13 Nov 2025 14:56:13 -0800 +Subject: KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits + +From: Sean Christopherson + +commit da01f64e7470988f8607776aa7afa924208863fb upstream. + +Explicitly clear exit_code_hi in the VMCB when synthesizing "normal" +nested VM-Exits, as the full exit code is a 64-bit value (spoiler alert), +and all exit codes for non-failing VMRUN use only bits 31:0. 
+ +Cc: Jim Mattson +Cc: Yosry Ahmed +Cc: stable@vger.kernel.org +Reviewed-by: Yosry Ahmed +Link: https://patch.msgid.link/20251113225621.1688428-2-seanjc@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/svm.c | 2 ++ + arch/x86/kvm/svm/svm.h | 7 ++++--- + 2 files changed, 6 insertions(+), 3 deletions(-) + +--- a/arch/x86/kvm/svm/svm.c ++++ b/arch/x86/kvm/svm/svm.c +@@ -2531,6 +2531,7 @@ static bool check_selective_cr0_intercep + + if (cr0 ^ val) { + svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE; ++ svm->vmcb->control.exit_code_hi = 0; + ret = (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE); + } + +@@ -4445,6 +4446,7 @@ static int svm_check_intercept(struct kv + if (static_cpu_has(X86_FEATURE_NRIPS)) + vmcb->control.next_rip = info->next_rip; + vmcb->control.exit_code = icpt_info.exit_code; ++ vmcb->control.exit_code_hi = 0; + vmexit = nested_svm_exit_handled(svm); + + ret = (vmexit == NESTED_EXIT_DONE) ? X86EMUL_INTERCEPTED +--- a/arch/x86/kvm/svm/svm.h ++++ b/arch/x86/kvm/svm/svm.h +@@ -606,9 +606,10 @@ int nested_svm_vmexit(struct vcpu_svm *s + + static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code) + { +- svm->vmcb->control.exit_code = exit_code; +- svm->vmcb->control.exit_info_1 = 0; +- svm->vmcb->control.exit_info_2 = 0; ++ svm->vmcb->control.exit_code = exit_code; ++ svm->vmcb->control.exit_code_hi = 0; ++ svm->vmcb->control.exit_info_1 = 0; ++ svm->vmcb->control.exit_info_2 = 0; + return nested_svm_vmexit(svm); + } + diff --git a/queue-6.1/kvm-nsvm-propagate-svm_exit_cr0_sel_write-correctly-for-lmsw-emulation.patch b/queue-6.1/kvm-nsvm-propagate-svm_exit_cr0_sel_write-correctly-for-lmsw-emulation.patch new file mode 100644 index 0000000000..6553020dca --- /dev/null +++ b/queue-6.1/kvm-nsvm-propagate-svm_exit_cr0_sel_write-correctly-for-lmsw-emulation.patch @@ -0,0 +1,68 @@ +From 5674a76db0213f9db1e4d08e847ff649b46889c0 Mon Sep 17 00:00:00 2001 +From: Yosry Ahmed 
+Date: Fri, 24 Oct 2025 19:29:17 +0000 +Subject: KVM: nSVM: Propagate SVM_EXIT_CR0_SEL_WRITE correctly for LMSW emulation +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Yosry Ahmed + +commit 5674a76db0213f9db1e4d08e847ff649b46889c0 upstream. + +When emulating L2 instructions, svm_check_intercept() checks whether a +write to CR0 should trigger a synthesized #VMEXIT with +SVM_EXIT_CR0_SEL_WRITE. For MOV-to-CR0, SVM_EXIT_CR0_SEL_WRITE is only +triggered if any bit other than CR0.MP and CR0.TS is updated. However, +according to the APM (24593—Rev. 3.42—March 2024, Table 15-7): + + The LMSW instruction treats the selective CR0-write + intercept as a non-selective intercept (i.e., it intercepts + regardless of the value being written). + +Skip checking the changed bits for x86_intercept_lmsw and always inject +SVM_EXIT_CR0_SEL_WRITE. + +Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses") +Cc: stable@vger.kernel.org +Reported-by: Matteo Rizzo +Signed-off-by: Yosry Ahmed +Link: https://patch.msgid.link/20251024192918.3191141-3-yosry.ahmed@linux.dev +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/svm.c | 18 +++++++++--------- + 1 file changed, 9 insertions(+), 9 deletions(-) + +--- a/arch/x86/kvm/svm/svm.c ++++ b/arch/x86/kvm/svm/svm.c +@@ -4371,20 +4371,20 @@ static int svm_check_intercept(struct kv + if (info->intercept == x86_intercept_clts) + break; + +- cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; +- val = info->src_val & ~SVM_CR0_SELECTIVE_MASK; +- ++ /* LMSW always triggers INTERCEPT_SELECTIVE_CR0 */ + if (info->intercept == x86_intercept_lmsw) { +- cr0 &= 0xfUL; +- val &= 0xfUL; +- /* lmsw can't clear PE - catch this here */ +- if (cr0 & X86_CR0_PE) +- val |= X86_CR0_PE; ++ icpt_info.exit_code = SVM_EXIT_CR0_SEL_WRITE; ++ break; + } + ++ /* ++ * MOV-to-CR0 only triggers INTERCEPT_SELECTIVE_CR0 if any bit ++ * other than 
SVM_CR0_SELECTIVE_MASK is changed. ++ */ ++ cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; ++ val = info->src_val & ~SVM_CR0_SELECTIVE_MASK; + if (cr0 ^ val) + icpt_info.exit_code = SVM_EXIT_CR0_SEL_WRITE; +- + break; + } + case SVM_EXIT_READ_DR0: diff --git a/queue-6.1/kvm-nsvm-set-exit_code_hi-to-1-when-synthesizing-svm_exit_err-failed-vmrun.patch b/queue-6.1/kvm-nsvm-set-exit_code_hi-to-1-when-synthesizing-svm_exit_err-failed-vmrun.patch new file mode 100644 index 0000000000..1940823893 --- /dev/null +++ b/queue-6.1/kvm-nsvm-set-exit_code_hi-to-1-when-synthesizing-svm_exit_err-failed-vmrun.patch @@ -0,0 +1,65 @@ +From f402ecd7a8b6446547076f4bd24bd5d4dcc94481 Mon Sep 17 00:00:00 2001 +From: Sean Christopherson +Date: Thu, 13 Nov 2025 14:56:14 -0800 +Subject: KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN) +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Sean Christopherson + +commit f402ecd7a8b6446547076f4bd24bd5d4dcc94481 upstream. + +Set exit_code_hi to -1u as a temporary band-aid to fix a long-standing +(effectively since KVM's inception) bug where KVM treats the exit code as +a 32-bit value, when in reality it's a 64-bit value. Per the APM, offset +0x70 is a single 64-bit value: + + 070h 63:0 EXITCODE + +And a sane reading of the error values defined in "Table C-1. SVM Intercept +Codes" is that negative values use the full 64 bits: + + –1 VMEXIT_INVALID Invalid guest state in VMCB. + –2 VMEXIT_BUSY BUSY bit was set in the VMSA + –3 VMEXIT_IDLE_REQUIRED The sibling thread is not in an idle state + -4 VMEXIT_INVALID_PMC Invalid PMC state + +And that interpretation is confirmed by testing on Milan and Turin (by +setting bits in CR0[63:32] to generate VMEXIT_INVALID on VMRUN). + +Furthermore, Xen has treated exitcode as a 64-bit value since HVM support +was added in 2006 (see Xen commit d1bd157fbc ("Big merge the HVM +full-virtualisation abstractions.")). 
+ +Cc: Jim Mattson +Cc: Yosry Ahmed +Cc: stable@vger.kernel.org +Reviewed-by: Yosry Ahmed +Link: https://patch.msgid.link/20251113225621.1688428-3-seanjc@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/nested.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/arch/x86/kvm/svm/nested.c ++++ b/arch/x86/kvm/svm/nested.c +@@ -834,7 +834,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vc + if (!nested_vmcb_check_save(vcpu) || + !nested_vmcb_check_controls(vcpu)) { + vmcb12->control.exit_code = SVM_EXIT_ERR; +- vmcb12->control.exit_code_hi = 0; ++ vmcb12->control.exit_code_hi = -1u; + vmcb12->control.exit_info_1 = 0; + vmcb12->control.exit_info_2 = 0; + goto out; +@@ -867,7 +867,7 @@ out_exit_err: + svm->soft_int_injected = false; + + svm->vmcb->control.exit_code = SVM_EXIT_ERR; +- svm->vmcb->control.exit_code_hi = 0; ++ svm->vmcb->control.exit_code_hi = -1u; + svm->vmcb->control.exit_info_1 = 0; + svm->vmcb->control.exit_info_2 = 0; + diff --git a/queue-6.1/kvm-svm-mark-vmcb_npt-as-dirty-on-nested-vmrun.patch b/queue-6.1/kvm-svm-mark-vmcb_npt-as-dirty-on-nested-vmrun.patch new file mode 100644 index 0000000000..0e9a22286d --- /dev/null +++ b/queue-6.1/kvm-svm-mark-vmcb_npt-as-dirty-on-nested-vmrun.patch @@ -0,0 +1,36 @@ +From 7c8b465a1c91f674655ea9cec5083744ec5f796a Mon Sep 17 00:00:00 2001 +From: Jim Mattson +Date: Mon, 22 Sep 2025 09:29:23 -0700 +Subject: KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN + +From: Jim Mattson + +commit 7c8b465a1c91f674655ea9cec5083744ec5f796a upstream. + +Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save() +on every nested VMRUN. + +If L1 changes the PAT MSR between two VMRUN instructions on the same +L1 vCPU, the g_pat field in the associated vmcb02 will change, and the +VMCB_NPT clean bit should be cleared. 
+ +Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit") +Cc: stable@vger.kernel.org +Signed-off-by: Jim Mattson +Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/nested.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/arch/x86/kvm/svm/nested.c ++++ b/arch/x86/kvm/svm/nested.c +@@ -520,6 +520,7 @@ static void nested_vmcb02_prepare_save(s + struct vmcb *vmcb02 = svm->nested.vmcb02.ptr; + + nested_vmcb02_compute_g_pat(svm); ++ vmcb_mark_dirty(vmcb02, VMCB_NPT); + + /* Load the nested guest state */ + if (svm->nested.vmcb12_gpa != svm->nested.last_vmcb12_gpa) { diff --git a/queue-6.1/kvm-svm-mark-vmcb_perm_map-as-dirty-on-nested-vmrun.patch b/queue-6.1/kvm-svm-mark-vmcb_perm_map-as-dirty-on-nested-vmrun.patch new file mode 100644 index 0000000000..e873e8eb50 --- /dev/null +++ b/queue-6.1/kvm-svm-mark-vmcb_perm_map-as-dirty-on-nested-vmrun.patch @@ -0,0 +1,37 @@ +From 93c9e107386dbe1243287a5b14ceca894de372b9 Mon Sep 17 00:00:00 2001 +From: Jim Mattson +Date: Mon, 22 Sep 2025 09:29:22 -0700 +Subject: KVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUN + +From: Jim Mattson + +commit 93c9e107386dbe1243287a5b14ceca894de372b9 upstream. + +Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control() +on every nested VMRUN. + +If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN +instructions on the same L1 vCPU, the msrpm_base_pa in the associated +vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared. 
+ +Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit") +Reported-by: Matteo Rizzo +Cc: stable@vger.kernel.org +Signed-off-by: Jim Mattson +Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/svm/nested.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/arch/x86/kvm/svm/nested.c ++++ b/arch/x86/kvm/svm/nested.c +@@ -639,6 +639,7 @@ static void nested_vmcb02_prepare_contro + vmcb02->control.nested_ctl = vmcb01->control.nested_ctl; + vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa; + vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa; ++ vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP); + + /* Done at vmrun: asid. */ + diff --git a/queue-6.1/kvm-x86-don-t-clear-async-pf-queue-when-cr0.pg-is-disabled-e.g.-on-smi.patch b/queue-6.1/kvm-x86-don-t-clear-async-pf-queue-when-cr0.pg-is-disabled-e.g.-on-smi.patch new file mode 100644 index 0000000000..112edba36d --- /dev/null +++ b/queue-6.1/kvm-x86-don-t-clear-async-pf-queue-when-cr0.pg-is-disabled-e.g.-on-smi.patch @@ -0,0 +1,105 @@ +From ab4e41eb9fabd4607304fa7cfe8ec9c0bd8e1552 Mon Sep 17 00:00:00 2001 +From: Maxim Levitsky +Date: Tue, 14 Oct 2025 23:32:58 -0400 +Subject: KVM: x86: Don't clear async #PF queue when CR0.PG is disabled (e.g. on #SMI) + +From: Maxim Levitsky + +commit ab4e41eb9fabd4607304fa7cfe8ec9c0bd8e1552 upstream. + +Fix an interaction between SMM and PV asynchronous #PFs where an #SMI can +cause KVM to drop an async #PF ready event, and thus result in guest tasks +becoming permanently stuck due to the task that encountered the #PF never +being resumed. Specifically, don't clear the completion queue when paging +is disabled, and re-check for completed async #PFs if/when paging is +enabled. 
+ +Prior to commit 2635b5c4a0e4 ("KVM: x86: interrupt based APF 'page ready' +event delivery"), flushing the APF queue without notifying the guest of +completed APF requests when paging is disabled was "necessary", in that +delivering a #PF to the guest when paging is disabled would likely confuse +and/or crash the guest. And presumably the original async #PF development +assumed that a guest would only disable paging when there was no intent to +ever re-enable paging. + +That assumption fails in several scenarios, most visibly on an emulated +SMI, as entering SMM always disables CR0.PG (i.e. initially runs with +paging disabled). When the SMM handler eventually executes RSM, the +interrupted paging-enabled state is restored, and the async #PF event is lost. + +Similarly, invoking firmware, e.g. via EFI runtime calls, might require a +transition through paging modes and thus also disable paging with valid +entries in the completion queue. + +To avoid dropping completion events, drop the "clear" entirely, and handle +paging-enable transitions in the same way KVM already handles APIC +enable/disable events: if a vCPU's APIC is disabled, APF completion events +are not kept pending and not injected while APIC is disabled. Once a +vCPU's APIC is re-enabled, KVM raises KVM_REQ_APF_READY so that the vCPU +recognizes any pending #APF ready events. 
+ +Signed-off-by: Maxim Levitsky +Cc: stable@vger.kernel.org +Link: https://patch.msgid.link/20251015033258.50974-4-mlevitsk@redhat.com +[sean: rework changelog to call out #PF injection, drop "real mode" + references, expand the code comment] +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/x86.c | 25 +++++++++++++++---------- + 1 file changed, 15 insertions(+), 10 deletions(-) + +--- a/arch/x86/kvm/x86.c ++++ b/arch/x86/kvm/x86.c +@@ -853,6 +853,13 @@ bool kvm_require_dr(struct kvm_vcpu *vcp + } + EXPORT_SYMBOL_GPL(kvm_require_dr); + ++static bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu) ++{ ++ u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT; ++ ++ return (vcpu->arch.apf.msr_en_val & mask) == mask; ++} ++ + static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu) + { + return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2); +@@ -939,15 +946,20 @@ void kvm_post_set_cr0(struct kvm_vcpu *v + } + + if ((cr0 ^ old_cr0) & X86_CR0_PG) { +- kvm_clear_async_pf_completion_queue(vcpu); +- kvm_async_pf_hash_reset(vcpu); +- + /* + * Clearing CR0.PG is defined to flush the TLB from the guest's + * perspective. + */ + if (!(cr0 & X86_CR0_PG)) + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); ++ /* ++ * Check for async #PF completion events when enabling paging, ++ * as the vCPU may have previously encountered async #PFs (it's ++ * entirely legal for the guest to toggle paging on/off without ++ * waiting for the async #PF queue to drain). 
++ */ ++ else if (kvm_pv_async_pf_enabled(vcpu)) ++ kvm_make_request(KVM_REQ_APF_READY, vcpu); + } + + if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS) +@@ -3358,13 +3370,6 @@ static int set_msr_mce(struct kvm_vcpu * + return 0; + } + +-static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu) +-{ +- u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT; +- +- return (vcpu->arch.apf.msr_en_val & mask) == mask; +-} +- + static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data) + { + gpa_t gpa = data & ~0x3f; diff --git a/queue-6.1/kvm-x86-explicitly-set-new-periodic-hrtimer-expiration-in-apic_timer_fn.patch b/queue-6.1/kvm-x86-explicitly-set-new-periodic-hrtimer-expiration-in-apic_timer_fn.patch new file mode 100644 index 0000000000..84998ac457 --- /dev/null +++ b/queue-6.1/kvm-x86-explicitly-set-new-periodic-hrtimer-expiration-in-apic_timer_fn.patch @@ -0,0 +1,38 @@ +From 9633f180ce994ab293ce4924a9b7aaf4673aa114 Mon Sep 17 00:00:00 2001 +From: fuqiang wang +Date: Thu, 13 Nov 2025 12:51:12 -0800 +Subject: KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn() + +From: fuqiang wang + +commit 9633f180ce994ab293ce4924a9b7aaf4673aa114 upstream. + +When restarting an hrtimer to emulate the guest's APIC timer in periodic +mode, explicitly set the expiration using the target expiration computed +by advance_periodic_target_expiration() instead of adding the period to +the existing timer. This will allow making adjustments to the expiration, +e.g. to deal with expirations far in the past, without having to implement +the same logic in both advance_periodic_target_expiration() and +apic_timer_fn(). 
+ +Cc: stable@vger.kernel.org +Signed-off-by: fuqiang wang +[sean: split to separate patch, write changelog] +Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/lapic.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/x86/kvm/lapic.c ++++ b/arch/x86/kvm/lapic.c +@@ -2639,7 +2639,7 @@ static enum hrtimer_restart apic_timer_f + + if (lapic_is_periodic(apic) && !WARN_ON_ONCE(!apic->lapic_timer.period)) { + advance_periodic_target_expiration(apic); +- hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); ++ hrtimer_set_expires(&ktimer->timer, ktimer->target_expiration); + return HRTIMER_RESTART; + } else + return HRTIMER_NORESTART; diff --git a/queue-6.1/kvm-x86-fix-vm-hard-lockup-after-prolonged-inactivity-with-periodic-hv-timer.patch b/queue-6.1/kvm-x86-fix-vm-hard-lockup-after-prolonged-inactivity-with-periodic-hv-timer.patch new file mode 100644 index 0000000000..218ffc65df --- /dev/null +++ b/queue-6.1/kvm-x86-fix-vm-hard-lockup-after-prolonged-inactivity-with-periodic-hv-timer.patch @@ -0,0 +1,134 @@ +From 18ab3fc8e880791aa9f7c000261320fc812b5465 Mon Sep 17 00:00:00 2001 +From: fuqiang wang +Date: Thu, 13 Nov 2025 12:51:13 -0800 +Subject: KVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer + +From: fuqiang wang + +commit 18ab3fc8e880791aa9f7c000261320fc812b5465 upstream. + +When advancing the target expiration for the guest's APIC timer in periodic +mode, set the expiration to "now" if the target expiration is in the past +(similar to what is done in update_target_expiration()). Blindly adding +the period to the previous target expiration can result in KVM generating +a practically unbounded number of hrtimer IRQs due to programming an +expired timer over and over. In extreme scenarios, e.g. if userspace +pauses/suspends a VM for an extended duration, this can even cause hard +lockups in the host. 
+ +Currently, the bug only affects Intel CPUs when using the hypervisor timer +(HV timer), a.k.a. the VMX preemption timer. Unlike the software timer, +a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the +HV timer only runs while the guest is active. As a result, if the vCPU +does not run for an extended duration, there will be a huge gap between +the target expiration and the current time the vCPU resumes running. +Because the target expiration is incremented by only one period on each +timer expiration, this leads to a series of timer expirations occurring +rapidly after the vCPU/VM resumes. + +More critically, when the vCPU first triggers a periodic HV timer +expiration after resuming, advancing the expiration by only one period +will result in a target expiration in the past. As a result, the delta +may be calculated as a negative value. When the delta is converted into +an absolute value (tscdeadline is an unsigned u64), the resulting value +can overflow what the HV timer is capable of programming. I.e. the large +value will exceed the VMX Preemption Timer's maximum bit width of +cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the +HV timer to the software timer (hrtimers). + +After switching to the software timer, periodic timer expiration callbacks +may be executed consecutively within a single clock interrupt handler, +because hrtimers honors KVM's request for an expiration in the past and +immediately re-invokes KVM's callback after reprogramming. And because +the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer +over and over until the target expiration is advanced to "now" can result +in a hard lockup. + +E.g. the following hard lockup was triggered in the host when running a +Windows VM (only relevant because it used the APIC timer in periodic mode) +after resuming the VM from a long suspend (in the host). + + NMI watchdog: Watchdog detected hard LOCKUP on cpu 45 + ... 
+ RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm] + ... + RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046 + RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc + RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500 + RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0 + R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0 + R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8 + FS: 00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0 + PKRU: 55555554 + Call Trace: + + apic_timer_fn+0x31/0x50 [kvm] + __hrtimer_run_queues+0x100/0x280 + hrtimer_interrupt+0x100/0x210 + ? ttwu_do_wakeup+0x19/0x160 + smp_apic_timer_interrupt+0x6a/0x130 + apic_timer_interrupt+0xf/0x20 + + +Moreover, if the suspend duration of the virtual machine is not long enough +to trigger a hard lockup in this scenario, since commit 98c25ead5eda +("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM +will continue using the software timer until the guest reprograms the APIC +timer in some way. Since the periodic timer does not require frequent APIC +timer register programming, the guest may continue to use the software +timer in perpetuity. 
+ +Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") +Cc: stable@vger.kernel.org +Signed-off-by: fuqiang wang +[sean: massage comments and changelog] +Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/lapic.c | 28 +++++++++++++++++++++++----- + 1 file changed, 23 insertions(+), 5 deletions(-) + +--- a/arch/x86/kvm/lapic.c ++++ b/arch/x86/kvm/lapic.c +@@ -1891,15 +1891,33 @@ static void advance_periodic_target_expi + ktime_t delta; + + /* +- * Synchronize both deadlines to the same time source or +- * differences in the periods (caused by differences in the +- * underlying clocks or numerical approximation errors) will +- * cause the two to drift apart over time as the errors +- * accumulate. ++ * Use kernel time as the time source for both the hrtimer deadline and ++ * TSC-based deadline so that they stay synchronized. Computing each ++ * deadline independently will cause the two deadlines to drift apart ++ * over time as differences in the periods accumulate, e.g. due to ++ * differences in the underlying clocks or numerical approximation errors. + */ + apic->lapic_timer.target_expiration = + ktime_add_ns(apic->lapic_timer.target_expiration, + apic->lapic_timer.period); ++ ++ /* ++ * If the new expiration is in the past, e.g. because userspace stopped ++ * running the VM for an extended duration, then force the expiration ++ * to "now" and don't try to play catch-up with the missed events. KVM ++ * will only deliver a single interrupt regardless of how many events ++ * are pending, i.e. restarting the timer with an expiration in the ++ * past will do nothing more than waste host cycles, and can even lead ++ * to a hard lockup in extreme cases. 
++ */ ++ if (ktime_before(apic->lapic_timer.target_expiration, now)) ++ apic->lapic_timer.target_expiration = now; ++ ++ /* ++ * Note, ensuring the expiration isn't in the past also prevents delta ++ * from going negative, which could cause the TSC deadline to become ++ * excessively large due to it being an unsigned value. ++ */ + delta = ktime_sub(apic->lapic_timer.target_expiration, now); + apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) + + nsec_to_cycles(apic->vcpu, delta); diff --git a/queue-6.1/kvm-x86-warn-if-hrtimer-callback-for-periodic-apic-timer-fires-with-period-0.patch b/queue-6.1/kvm-x86-warn-if-hrtimer-callback-for-periodic-apic-timer-fires-with-period-0.patch new file mode 100644 index 0000000000..bdc611fa7c --- /dev/null +++ b/queue-6.1/kvm-x86-warn-if-hrtimer-callback-for-periodic-apic-timer-fires-with-period-0.patch @@ -0,0 +1,36 @@ +From 0ea9494be9c931ddbc084ad5e11fda91b554cf47 Mon Sep 17 00:00:00 2001 +From: Sean Christopherson +Date: Thu, 13 Nov 2025 12:51:11 -0800 +Subject: KVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0 +
+From: Sean Christopherson +
+commit 0ea9494be9c931ddbc084ad5e11fda91b554cf47 upstream. +
+WARN and don't restart the hrtimer if KVM's callback runs with the guest's +APIC timer in periodic mode but with a period of '0', as not advancing the +hrtimer's deadline would put the CPU into an infinite loop of hrtimer +events. Observing a period of '0' should be impossible, even when the +hrtimer is running on a different CPU than the vCPU, as KVM is supposed to +cancel the hrtimer before changing (or zeroing) the period, e.g. when +switching from periodic to one-shot.
+ +Cc: stable@vger.kernel.org +Link: https://patch.msgid.link/20251113205114.1647493-2-seanjc@google.com +Signed-off-by: Sean Christopherson +Signed-off-by: Greg Kroah-Hartman +--- + arch/x86/kvm/lapic.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/x86/kvm/lapic.c ++++ b/arch/x86/kvm/lapic.c +@@ -2637,7 +2637,7 @@ static enum hrtimer_restart apic_timer_f + + apic_timer_expired(apic, true); + +- if (lapic_is_periodic(apic)) { ++ if (lapic_is_periodic(apic) && !WARN_ON_ONCE(!apic->lapic_timer.period)) { + advance_periodic_target_expiration(apic); + hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); + return HRTIMER_RESTART; diff --git a/queue-6.1/libceph-make-decode_pool-more-resilient-against-corrupted-osdmaps.patch b/queue-6.1/libceph-make-decode_pool-more-resilient-against-corrupted-osdmaps.patch new file mode 100644 index 0000000000..6948c58083 --- /dev/null +++ b/queue-6.1/libceph-make-decode_pool-more-resilient-against-corrupted-osdmaps.patch @@ -0,0 +1,221 @@ +From 8c738512714e8c0aa18f8a10c072d5b01c83db39 Mon Sep 17 00:00:00 2001 +From: Ilya Dryomov +Date: Tue, 2 Dec 2025 10:32:31 +0100 +Subject: libceph: make decode_pool() more resilient against corrupted osdmaps + +From: Ilya Dryomov + +commit 8c738512714e8c0aa18f8a10c072d5b01c83db39 upstream. + +If the osdmap is (maliciously) corrupted such that the encoded length +of ceph_pg_pool envelope is less than what is expected for a particular +encoding version, out-of-bounds reads may ensue because the only bounds +check that is there is based on that length value. + +This patch adds explicit bounds checks for each field that is decoded +or skipped. 
+ +Cc: stable@vger.kernel.org +Reported-by: ziming zhang +Signed-off-by: Ilya Dryomov +Reviewed-by: Xiubo Li +Tested-by: ziming zhang +Signed-off-by: Greg Kroah-Hartman +--- + net/ceph/osdmap.c | 118 ++++++++++++++++++++++++------------------------------ + 1 file changed, 53 insertions(+), 65 deletions(-) + +--- a/net/ceph/osdmap.c ++++ b/net/ceph/osdmap.c +@@ -806,51 +806,49 @@ static int decode_pool(void **p, void *e + ceph_decode_need(p, end, len, bad); + pool_end = *p + len; + ++ ceph_decode_need(p, end, 4 + 4 + 4, bad); + pi->type = ceph_decode_8(p); + pi->size = ceph_decode_8(p); + pi->crush_ruleset = ceph_decode_8(p); + pi->object_hash = ceph_decode_8(p); +- + pi->pg_num = ceph_decode_32(p); + pi->pgp_num = ceph_decode_32(p); + +- *p += 4 + 4; /* skip lpg* */ +- *p += 4; /* skip last_change */ +- *p += 8 + 4; /* skip snap_seq, snap_epoch */ ++ /* lpg*, last_change, snap_seq, snap_epoch */ ++ ceph_decode_skip_n(p, end, 8 + 4 + 8 + 4, bad); + + /* skip snaps */ +- num = ceph_decode_32(p); ++ ceph_decode_32_safe(p, end, num, bad); + while (num--) { +- *p += 8; /* snapid key */ +- *p += 1 + 1; /* versions */ +- len = ceph_decode_32(p); +- *p += len; ++ /* snapid key, pool snap (with versions) */ ++ ceph_decode_skip_n(p, end, 8 + 2, bad); ++ ceph_decode_skip_string(p, end, bad); + } + +- /* skip removed_snaps */ +- num = ceph_decode_32(p); +- *p += num * (8 + 8); ++ /* removed_snaps */ ++ ceph_decode_skip_map(p, end, 64, 64, bad); + ++ ceph_decode_need(p, end, 8 + 8 + 4, bad); + *p += 8; /* skip auid */ + pi->flags = ceph_decode_64(p); + *p += 4; /* skip crash_replay_interval */ + + if (ev >= 7) +- pi->min_size = ceph_decode_8(p); ++ ceph_decode_8_safe(p, end, pi->min_size, bad); + else + pi->min_size = pi->size - pi->size / 2; + + if (ev >= 8) +- *p += 8 + 8; /* skip quota_max_* */ ++ /* quota_max_* */ ++ ceph_decode_skip_n(p, end, 8 + 8, bad); + + if (ev >= 9) { +- /* skip tiers */ +- num = ceph_decode_32(p); +- *p += num * 8; ++ /* tiers */ ++ 
ceph_decode_skip_set(p, end, 64, bad); + ++ ceph_decode_need(p, end, 8 + 1 + 8 + 8, bad); + *p += 8; /* skip tier_of */ + *p += 1; /* skip cache_mode */ +- + pi->read_tier = ceph_decode_64(p); + pi->write_tier = ceph_decode_64(p); + } else { +@@ -858,86 +856,76 @@ static int decode_pool(void **p, void *e + pi->write_tier = -1; + } + +- if (ev >= 10) { +- /* skip properties */ +- num = ceph_decode_32(p); +- while (num--) { +- len = ceph_decode_32(p); +- *p += len; /* key */ +- len = ceph_decode_32(p); +- *p += len; /* val */ +- } +- } ++ if (ev >= 10) ++ /* properties */ ++ ceph_decode_skip_map(p, end, string, string, bad); + + if (ev >= 11) { +- /* skip hit_set_params */ +- *p += 1 + 1; /* versions */ +- len = ceph_decode_32(p); +- *p += len; ++ /* hit_set_params (with versions) */ ++ ceph_decode_skip_n(p, end, 2, bad); ++ ceph_decode_skip_string(p, end, bad); + +- *p += 4; /* skip hit_set_period */ +- *p += 4; /* skip hit_set_count */ ++ /* hit_set_period, hit_set_count */ ++ ceph_decode_skip_n(p, end, 4 + 4, bad); + } + + if (ev >= 12) +- *p += 4; /* skip stripe_width */ ++ /* stripe_width */ ++ ceph_decode_skip_32(p, end, bad); + +- if (ev >= 13) { +- *p += 8; /* skip target_max_bytes */ +- *p += 8; /* skip target_max_objects */ +- *p += 4; /* skip cache_target_dirty_ratio_micro */ +- *p += 4; /* skip cache_target_full_ratio_micro */ +- *p += 4; /* skip cache_min_flush_age */ +- *p += 4; /* skip cache_min_evict_age */ +- } +- +- if (ev >= 14) { +- /* skip erasure_code_profile */ +- len = ceph_decode_32(p); +- *p += len; +- } ++ if (ev >= 13) ++ /* target_max_*, cache_target_*, cache_min_* */ ++ ceph_decode_skip_n(p, end, 16 + 8 + 8, bad); ++ ++ if (ev >= 14) ++ /* erasure_code_profile */ ++ ceph_decode_skip_string(p, end, bad); + + /* + * last_force_op_resend_preluminous, will be overridden if the + * map was encoded with RESEND_ON_SPLIT + */ + if (ev >= 15) +- pi->last_force_request_resend = ceph_decode_32(p); ++ ceph_decode_32_safe(p, end, 
pi->last_force_request_resend, bad); + else + pi->last_force_request_resend = 0; + + if (ev >= 16) +- *p += 4; /* skip min_read_recency_for_promote */ ++ /* min_read_recency_for_promote */ ++ ceph_decode_skip_32(p, end, bad); + + if (ev >= 17) +- *p += 8; /* skip expected_num_objects */ ++ /* expected_num_objects */ ++ ceph_decode_skip_64(p, end, bad); + + if (ev >= 19) +- *p += 4; /* skip cache_target_dirty_high_ratio_micro */ ++ /* cache_target_dirty_high_ratio_micro */ ++ ceph_decode_skip_32(p, end, bad); + + if (ev >= 20) +- *p += 4; /* skip min_write_recency_for_promote */ ++ /* min_write_recency_for_promote */ ++ ceph_decode_skip_32(p, end, bad); + + if (ev >= 21) +- *p += 1; /* skip use_gmt_hitset */ ++ /* use_gmt_hitset */ ++ ceph_decode_skip_8(p, end, bad); + + if (ev >= 22) +- *p += 1; /* skip fast_read */ ++ /* fast_read */ ++ ceph_decode_skip_8(p, end, bad); + +- if (ev >= 23) { +- *p += 4; /* skip hit_set_grade_decay_rate */ +- *p += 4; /* skip hit_set_search_last_n */ +- } ++ if (ev >= 23) ++ /* hit_set_grade_decay_rate, hit_set_search_last_n */ ++ ceph_decode_skip_n(p, end, 4 + 4, bad); + + if (ev >= 24) { +- /* skip opts */ +- *p += 1 + 1; /* versions */ +- len = ceph_decode_32(p); +- *p += len; ++ /* opts (with versions) */ ++ ceph_decode_skip_n(p, end, 2, bad); ++ ceph_decode_skip_string(p, end, bad); + } + + if (ev >= 25) +- pi->last_force_request_resend = ceph_decode_32(p); ++ ceph_decode_32_safe(p, end, pi->last_force_request_resend, bad); + + /* ignore the rest */ + diff --git a/queue-6.1/media-vidtv-initialize-local-pointers-upon-transfer-of-memory-ownership.patch b/queue-6.1/media-vidtv-initialize-local-pointers-upon-transfer-of-memory-ownership.patch new file mode 100644 index 0000000000..ded6517515 --- /dev/null +++ b/queue-6.1/media-vidtv-initialize-local-pointers-upon-transfer-of-memory-ownership.patch @@ -0,0 +1,56 @@ +From 98aabfe2d79f74613abc2b0b1cef08f97eaf5322 Mon Sep 17 00:00:00 2001 +From: Jeongjun Park +Date: Fri, 5 Sep 2025 
14:18:16 +0900 +Subject: media: vidtv: initialize local pointers upon transfer of memory ownership + +From: Jeongjun Park + +commit 98aabfe2d79f74613abc2b0b1cef08f97eaf5322 upstream. + +vidtv_channel_si_init() creates a temporary list (program, service, event) +and ownership of the memory itself is transferred to the PAT/SDT/EIT +tables through vidtv_psi_pat_program_assign(), +vidtv_psi_sdt_service_assign(), vidtv_psi_eit_event_assign(). + +The problem here is that the local pointer where the memory ownership +transfer was completed is not initialized to NULL. This causes the +vidtv_psi_pmt_create_sec_for_each_pat_entry() function to fail, and +in the flow that jumps to free_eit, the memory that was freed by +vidtv_psi_*_table_destroy() can be accessed again by +vidtv_psi_*_event_destroy() due to the uninitialized local pointer, so it +is freed once again. + +Therefore, to prevent use-after-free and double-free vulnerability, +local pointers must be initialized to NULL when transferring memory +ownership. 
+ +Cc: +Reported-by: syzbot+1d9c0edea5907af239e0@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=1d9c0edea5907af239e0 +Fixes: 3be8037960bc ("media: vidtv: add error checks") +Signed-off-by: Jeongjun Park +Reviewed-by: Daniel Almeida +Signed-off-by: Hans Verkuil +Signed-off-by: Greg Kroah-Hartman +--- + drivers/media/test-drivers/vidtv/vidtv_channel.c | 3 +++ + 1 file changed, 3 insertions(+) + +--- a/drivers/media/test-drivers/vidtv/vidtv_channel.c ++++ b/drivers/media/test-drivers/vidtv/vidtv_channel.c +@@ -461,12 +461,15 @@ int vidtv_channel_si_init(struct vidtv_m + + /* assemble all programs and assign to PAT */ + vidtv_psi_pat_program_assign(m->si.pat, programs); ++ programs = NULL; + + /* assemble all services and assign to SDT */ + vidtv_psi_sdt_service_assign(m->si.sdt, services); ++ services = NULL; + + /* assemble all events and assign to EIT */ + vidtv_psi_eit_event_assign(m->si.eit, events); ++ events = NULL; + + m->si.pmt_secs = vidtv_psi_pmt_create_sec_for_each_pat_entry(m->si.pat, + m->pcr_pid); diff --git a/queue-6.1/nfsd-mark-variable-__maybe_unused-to-avoid-w-1-build-break.patch b/queue-6.1/nfsd-mark-variable-__maybe_unused-to-avoid-w-1-build-break.patch new file mode 100644 index 0000000000..c393e17d03 --- /dev/null +++ b/queue-6.1/nfsd-mark-variable-__maybe_unused-to-avoid-w-1-build-break.patch @@ -0,0 +1,41 @@ +From ebae102897e760e9e6bc625f701dd666b2163bd1 Mon Sep 17 00:00:00 2001 +From: Andy Shevchenko +Date: Thu, 13 Nov 2025 09:31:31 +0100 +Subject: nfsd: Mark variable __maybe_unused to avoid W=1 build break +
+From: Andy Shevchenko +
+commit ebae102897e760e9e6bc625f701dd666b2163bd1 upstream. +
+Clang is not happy about a set but (in some cases) unused variable: +
+fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable] +
+since it's used as a parameter to dprintk() which might be configured +as a no-op.
To avoid uglifying the code with the specific ifdeffery, just mark +the variable __maybe_unused. + +The commit [1], which introduced this behaviour, is quite old and hence +the Fixes tag points to the first of the Git era. + +Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1 [1] +Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") +Cc: stable@vger.kernel.org +Signed-off-by: Andy Shevchenko +Signed-off-by: Chuck Lever +Signed-off-by: Greg Kroah-Hartman +--- + fs/nfsd/export.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/fs/nfsd/export.c ++++ b/fs/nfsd/export.c +@@ -990,7 +990,7 @@ exp_rootfh(struct net *net, struct auth_ + { + struct svc_export *exp; + struct path path; +- struct inode *inode; ++ struct inode *inode __maybe_unused; + struct svc_fh fh; + int err; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); diff --git a/queue-6.1/ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch b/queue-6.1/ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch new file mode 100644 index 0000000000..6d98a26961 --- /dev/null +++ b/queue-6.1/ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch @@ -0,0 +1,67 @@ +From 039bef30e320827bac8990c9f29d2a68cd8adb5f Mon Sep 17 00:00:00 2001 +From: Prithvi Tambewagh +Date: Mon, 1 Dec 2025 18:37:11 +0530 +Subject: ocfs2: fix kernel BUG in ocfs2_find_victim_chain +
+From: Prithvi Tambewagh +
+commit 039bef30e320827bac8990c9f29d2a68cd8adb5f upstream. +
+syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the +`cl_next_free_rec` field of the allocation chain list (next free slot in +the chain list) is 0, triggering the BUG_ON(!cl->cl_next_free_rec) +condition in ocfs2_find_victim_chain() and panicking the kernel. +
+To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(), +just before calling ocfs2_find_victim_chain(), the code block in it being +executed when either of the following conditions is true: +
+1.
`cl_next_free_rec` is equal to 0, indicating that there are no free +chains in the allocation chain list +2. `cl_next_free_rec` is greater than `cl_count` (the total number of +chains in the allocation chain list) + +Either of them being true is indicative of the fact that there are no +chains left for usage. + +This is addressed using ocfs2_error(), which prints +the error log for debugging purposes, rather than panicking the kernel. + +Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com +Signed-off-by: Prithvi Tambewagh +Reported-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72 +Tested-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com +Reviewed-by: Joseph Qi +Cc: Mark Fasheh +Cc: Joel Becker +Cc: Junxiao Bi +Cc: Changwei Ge +Cc: Jun Piao +Cc: Heming Zhao +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + fs/ocfs2/suballoc.c | 10 ++++++++++ + 1 file changed, 10 insertions(+) + +--- a/fs/ocfs2/suballoc.c ++++ b/fs/ocfs2/suballoc.c +@@ -1923,6 +1923,16 @@ static int ocfs2_claim_suballoc_bits(str + } + + cl = (struct ocfs2_chain_list *) &fe->id2.i_chain; ++ if (!le16_to_cpu(cl->cl_next_free_rec) || ++ le16_to_cpu(cl->cl_next_free_rec) > le16_to_cpu(cl->cl_count)) { ++ status = ocfs2_error(ac->ac_inode->i_sb, ++ "Chain allocator dinode %llu has invalid next " ++ "free chain record %u, but only %u total\n", ++ (unsigned long long)le64_to_cpu(fe->i_blkno), ++ le16_to_cpu(cl->cl_next_free_rec), ++ le16_to_cpu(cl->cl_count)); ++ goto bail; ++ } + + victim = ocfs2_find_victim_chain(cl); + ac->ac_chain = victim; diff --git a/queue-6.1/parisc-do-not-reprogram-affinitiy-on-asp-chip.patch b/queue-6.1/parisc-do-not-reprogram-affinitiy-on-asp-chip.patch new file mode 100644 index 0000000000..ed8cb4d0cc --- /dev/null +++ b/queue-6.1/parisc-do-not-reprogram-affinitiy-on-asp-chip.patch @@ -0,0 +1,36 @@ +From 
dca7da244349eef4d78527cafc0bf80816b261f5 Mon Sep 17 00:00:00 2001 +From: Helge Deller +Date: Tue, 25 Nov 2025 15:23:02 +0100 +Subject: parisc: Do not reprogram affinity on ASP chip +
+From: Helge Deller +
+commit dca7da244349eef4d78527cafc0bf80816b261f5 upstream. +
+The ASP chip is a very old variant of the GSP chip and is used e.g. in +HP 730 workstations. When trying to reprogram the affinity it will crash +with an HPMC as the relevant registers don't seem to be at the usual +location. Let's avoid the crash by checking the sversion. Also note +that reprogramming isn't necessary either, as the HP730 is just a +single-CPU machine. +
+Signed-off-by: Helge Deller +Cc: stable@vger.kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + drivers/parisc/gsc.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/drivers/parisc/gsc.c ++++ b/drivers/parisc/gsc.c +@@ -154,7 +154,9 @@ static int gsc_set_affinity_irq(struct i + gsc_dev->eim = ((u32) gsc_dev->gsc_irq.txn_addr) | gsc_dev->gsc_irq.txn_data; + + /* switch IRQ's for devices below LASI/WAX to other CPU */ +- gsc_writel(gsc_dev->eim, gsc_dev->hpa + OFFSET_IAR); ++ /* ASP chip (svers 0x70) does not support reprogramming */ ++ if (gsc_dev->gsc->id.sversion != 0x70) ++ gsc_writel(gsc_dev->eim, gsc_dev->hpa + OFFSET_IAR); + + irq_data_update_effective_affinity(d, &tmask); + diff --git a/queue-6.1/platform-chrome-cros_ec_ishtp-fix-uaf-after-unbinding-driver.patch b/queue-6.1/platform-chrome-cros_ec_ishtp-fix-uaf-after-unbinding-driver.patch new file mode 100644 index 0000000000..44d435d4d8 --- /dev/null +++ b/queue-6.1/platform-chrome-cros_ec_ishtp-fix-uaf-after-unbinding-driver.patch @@ -0,0 +1,34 @@ +From 944edca81e7aea15f83cf9a13a6ab67f711e8abd Mon Sep 17 00:00:00 2001 +From: Tzung-Bi Shih +Date: Fri, 31 Oct 2025 03:39:00 +0000 +Subject: platform/chrome: cros_ec_ishtp: Fix UAF after unbinding driver +
+From: Tzung-Bi Shih +
+commit 944edca81e7aea15f83cf9a13a6ab67f711e8abd upstream.
+ +After unbinding the driver, another kthread `cros_ec_console_log_work` +is still accessing the device, resulting in a UAF and a crash. +
+The driver doesn't unregister the EC device in .remove(), which should +shut down sub-devices synchronously. Fix it. +
+Fixes: 26a14267aff2 ("platform/chrome: Add ChromeOS EC ISHTP driver") +Cc: stable@vger.kernel.org +Link: https://lore.kernel.org/r/20251031033900.3577394-1-tzungbi@kernel.org +Signed-off-by: Tzung-Bi Shih +Signed-off-by: Greg Kroah-Hartman +--- + drivers/platform/chrome/cros_ec_ishtp.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/drivers/platform/chrome/cros_ec_ishtp.c ++++ b/drivers/platform/chrome/cros_ec_ishtp.c +@@ -715,6 +715,7 @@ static void cros_ec_ishtp_remove(struct + + cancel_work_sync(&client_data->work_ishtp_reset); + cancel_work_sync(&client_data->work_ec_evt); ++ cros_ec_unregister(client_data->ec_dev); + cros_ish_deinit(cros_ish_cl); + ishtp_put_device(cl_device); + } diff --git a/queue-6.1/pm-runtime-do-not-clear-needs_force_resume-with-enabled-runtime-pm.patch b/queue-6.1/pm-runtime-do-not-clear-needs_force_resume-with-enabled-runtime-pm.patch new file mode 100644 index 0000000000..233af7e291 --- /dev/null +++ b/queue-6.1/pm-runtime-do-not-clear-needs_force_resume-with-enabled-runtime-pm.patch @@ -0,0 +1,63 @@ +From 359afc8eb02a518fbdd0cbd462c8c2827c6cbec2 Mon Sep 17 00:00:00 2001 +From: "Rafael J. Wysocki" +Date: Mon, 15 Dec 2025 15:21:34 +0100 +Subject: PM: runtime: Do not clear needs_force_resume with enabled runtime PM +
+From: Rafael J. Wysocki +
+commit 359afc8eb02a518fbdd0cbd462c8c2827c6cbec2 upstream. +
+Commit 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in +pm_runtime_reinit()") added provisional clearing of power.needs_force_resume +to pm_runtime_reinit(), but it is done unconditionally, which is a +mistake because pm_runtime_reinit() may race with driver probing +and removal [1].
+ +To address this, notice that power.needs_force_resume should never +be set when runtime PM is enabled, and so it only needs to be cleared +when runtime PM is disabled, and update pm_runtime_reinit() to only +clear that flag when runtime PM is disabled. +
+Fixes: 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in pm_runtime_reinit()") +Reported-by: Ed Tsai +Closes: https://lore.kernel.org/linux-pm/20251215122154.3180001-1-ed.tsai@mediatek.com/ [1] +Signed-off-by: Rafael J. Wysocki +Cc: 6.17+ # 6.17+ +Reviewed-by: Ulf Hansson +Link: https://patch.msgid.link/12807571.O9o76ZdvQC@rafael.j.wysocki +Signed-off-by: Greg Kroah-Hartman +--- + drivers/base/power/runtime.c | 22 ++++++++++++---------- + 1 file changed, 12 insertions(+), 10 deletions(-) +
+--- a/drivers/base/power/runtime.c ++++ b/drivers/base/power/runtime.c +@@ -1786,16 +1786,18 @@ void pm_runtime_init(struct device *dev) + */ + void pm_runtime_reinit(struct device *dev) + { +- if (!pm_runtime_enabled(dev)) { +- if (dev->power.runtime_status == RPM_ACTIVE) +- pm_runtime_set_suspended(dev); +- if (dev->power.irq_safe) { +- spin_lock_irq(&dev->power.lock); +- dev->power.irq_safe = 0; +- spin_unlock_irq(&dev->power.lock); +- if (dev->parent) +- pm_runtime_put(dev->parent); +- } ++ if (pm_runtime_enabled(dev)) ++ return; ++ ++ if (dev->power.runtime_status == RPM_ACTIVE) ++ pm_runtime_set_suspended(dev); ++ ++ if (dev->power.irq_safe) { ++ spin_lock_irq(&dev->power.lock); ++ dev->power.irq_safe = 0; ++ spin_unlock_irq(&dev->power.lock); ++ if (dev->parent) ++ pm_runtime_put(dev->parent); + } + /* + * Clear power.needs_force_resume in case it has been set by diff --git a/queue-6.1/powerpc-kexec-enable-smt-before-waking-offline-cpus.patch b/queue-6.1/powerpc-kexec-enable-smt-before-waking-offline-cpus.patch new file mode 100644 index 0000000000..18fa4fb6b6 --- /dev/null +++ b/queue-6.1/powerpc-kexec-enable-smt-before-waking-offline-cpus.patch @@ -0,0 +1,83 @@ +From c2296a1e42418556efbeb5636c4fa6aa6106713a
Mon Sep 17 00:00:00 2001 +From: "Nysal Jan K.A." +Date: Tue, 28 Oct 2025 16:25:12 +0530 +Subject: powerpc/kexec: Enable SMT before waking offline CPUs + +From: Nysal Jan K.A. + +commit c2296a1e42418556efbeb5636c4fa6aa6106713a upstream. + +If SMT is disabled or a partial SMT state is enabled, when a new kernel +image is loaded for kexec, on reboot the following warning is observed: + +kexec: Waking offline cpu 228. +WARNING: CPU: 0 PID: 9062 at arch/powerpc/kexec/core_64.c:223 kexec_prepare_cpus+0x1b0/0x1bc +[snip] + NIP kexec_prepare_cpus+0x1b0/0x1bc + LR kexec_prepare_cpus+0x1a0/0x1bc + Call Trace: + kexec_prepare_cpus+0x1a0/0x1bc (unreliable) + default_machine_kexec+0x160/0x19c + machine_kexec+0x80/0x88 + kernel_kexec+0xd0/0x118 + __do_sys_reboot+0x210/0x2c4 + system_call_exception+0x124/0x320 + system_call_vectored_common+0x15c/0x2ec + +This occurs as add_cpu() fails due to cpu_bootable() returning false for +CPUs that fail the cpu_smt_thread_allowed() check or non primary +threads if SMT is disabled. + +Fix the issue by enabling SMT and resetting the number of SMT threads to +the number of threads per core, before attempting to wake up all present +CPUs. + +Fixes: 38253464bc82 ("cpu/SMT: Create topology_smt_thread_allowed()") +Reported-by: Sachin P Bappalige +Cc: stable@vger.kernel.org # v6.6+ +Reviewed-by: Srikar Dronamraju +Signed-off-by: Nysal Jan K.A. 
+Tested-by: Samir M +Reviewed-by: Sourabh Jain +Signed-off-by: Madhavan Srinivasan +Link: https://patch.msgid.link/20251028105516.26258-1-nysal@linux.ibm.com +Signed-off-by: Greg Kroah-Hartman +--- + arch/powerpc/kexec/core_64.c | 19 +++++++++++++++++++ + 1 file changed, 19 insertions(+) + +--- a/arch/powerpc/kexec/core_64.c ++++ b/arch/powerpc/kexec/core_64.c +@@ -202,6 +202,23 @@ static void kexec_prepare_cpus_wait(int + mb(); + } + ++ ++/* ++ * The add_cpu() call in wake_offline_cpus() can fail as cpu_bootable() ++ * returns false for CPUs that fail the cpu_smt_thread_allowed() check ++ * or non primary threads if SMT is disabled. Re-enable SMT and set the ++ * number of SMT threads to threads per core. ++ */ ++static void kexec_smt_reenable(void) ++{ ++#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) ++ lock_device_hotplug(); ++ cpu_smt_num_threads = threads_per_core; ++ cpu_smt_control = CPU_SMT_ENABLED; ++ unlock_device_hotplug(); ++#endif ++} ++ + /* + * We need to make sure each present CPU is online. 
The next kernel will scan + * the device tree and assume primary threads are online and query secondary +@@ -216,6 +233,8 @@ static void wake_offline_cpus(void) + { + int cpu = 0; + ++ kexec_smt_reenable(); ++ + for_each_present_cpu(cpu) { + if (!cpu_online(cpu)) { + printk(KERN_INFO "kexec: Waking offline cpu %d.\n", diff --git a/queue-6.1/r8169-fix-rtl8117-wake-on-lan-in-dash-mode.patch b/queue-6.1/r8169-fix-rtl8117-wake-on-lan-in-dash-mode.patch new file mode 100644 index 0000000000..cda310a398 --- /dev/null +++ b/queue-6.1/r8169-fix-rtl8117-wake-on-lan-in-dash-mode.patch @@ -0,0 +1,51 @@ +From dd75c723ef566f7f009c047f47e0eee95fe348ab Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Ren=C3=A9=20Rebe?= +Date: Tue, 2 Dec 2025 19:41:37 +0100 +Subject: r8169: fix RTL8117 Wake-on-Lan in DASH mode +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: René Rebe + +commit dd75c723ef566f7f009c047f47e0eee95fe348ab upstream. + +Wake-on-Lan does currently not work for r8169 in DASH mode, e.g. the +ASUS Pro WS X570-ACE with RTL8168fp/RTL8117. + +Fix by not returning early in rtl_prepare_power_down when dash_enabled. +While this fixes WoL, it still kills the OOB RTL8117 remote management +BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled. 
+ +Fixes: 065c27c184d6 ("r8169: phy power ops") +Signed-off-by: René Rebe +Cc: stable@vger.kernel.org +Reviewed-by: Heiner Kallweit +Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de +Signed-off-by: Jakub Kicinski +Signed-off-by: Greg Kroah-Hartman +--- + drivers/net/ethernet/realtek/r8169_main.c | 5 +---- + 1 file changed, 1 insertion(+), 4 deletions(-) + +--- a/drivers/net/ethernet/realtek/r8169_main.c ++++ b/drivers/net/ethernet/realtek/r8169_main.c +@@ -2565,9 +2565,6 @@ static void rtl_wol_enable_rx(struct rtl + + static void rtl_prepare_power_down(struct rtl8169_private *tp) + { +- if (tp->dash_enabled) +- return; +- + if (tp->mac_version == RTL_GIGA_MAC_VER_32 || + tp->mac_version == RTL_GIGA_MAC_VER_33) + rtl_ephy_write(tp, 0x19, 0xff64); +@@ -4752,7 +4749,7 @@ static void rtl8169_down(struct rtl8169_ + rtl_disable_exit_l1(tp); + rtl_prepare_power_down(tp); + +- if (tp->dash_type != RTL_DASH_NONE) ++ if (tp->dash_type != RTL_DASH_NONE && !tp->saved_wolopts) + rtl8168_driver_stop(tp); + } + diff --git a/queue-6.1/scs-fix-a-wrong-parameter-in-__scs_magic.patch b/queue-6.1/scs-fix-a-wrong-parameter-in-__scs_magic.patch new file mode 100644 index 0000000000..17831e0825 --- /dev/null +++ b/queue-6.1/scs-fix-a-wrong-parameter-in-__scs_magic.patch @@ -0,0 +1,65 @@ +From 08bd4c46d5e63b78e77f2605283874bbe868ab19 Mon Sep 17 00:00:00 2001 +From: Zhichi Lin +Date: Sat, 11 Oct 2025 16:22:22 +0800 +Subject: scs: fix a wrong parameter in __scs_magic + +From: Zhichi Lin + +commit 08bd4c46d5e63b78e77f2605283874bbe868ab19 upstream. + +__scs_magic() needs a 'void *' variable, but a 'struct task_struct *' is +given. 'task_scs(tsk)' is the starting address of the task's shadow call +stack, and '__scs_magic(task_scs(tsk))' is the end address of the task's +shadow call stack. Here should be '__scs_magic(task_scs(tsk))'. 
+ +The user-visible effect of this bug is that when CONFIG_DEBUG_STACK_USAGE +is enabled, the shadow call stack usage checking function +(scs_check_usage) would scan an incorrect memory range. This could lead +to: +
+1. **Inaccurate stack usage reporting**: The function would calculate + wrong usage statistics for the shadow call stack, potentially showing + an incorrect value in kmsg. +
+2. **Potential kernel crash**: If the value of __scs_magic(tsk) is + greater than that of __scs_magic(task_scs(tsk)), the for loop may + access unmapped memory, potentially causing a kernel panic. However, + this scenario is unlikely because task_struct is allocated via the slab + allocator (which typically returns lower addresses), while the shadow + call stack returned by task_scs(tsk) is allocated via vmalloc (which + typically returns higher addresses). +
+However, since this is purely a debugging feature +(CONFIG_DEBUG_STACK_USAGE), normal production systems should not be +affected. The bug only impacts developers and testers who are actively +debugging stack usage with this configuration enabled.
+ +Link: https://lkml.kernel.org/r/20251011082222.12965-1-zhichi.lin@vivo.com +Fixes: 5bbaf9d1fcb9 ("scs: Add support for stack usage debugging") +Signed-off-by: Jiyuan Xie +Signed-off-by: Zhichi Lin +Reviewed-by: Sami Tolvanen +Acked-by: Will Deacon +Cc: Andrey Konovalov +Cc: Kees Cook +Cc: Marco Elver +Cc: Will Deacon +Cc: Yee Lee +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + kernel/scs.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/kernel/scs.c ++++ b/kernel/scs.c +@@ -125,7 +125,7 @@ static void scs_check_usage(struct task_ + if (!IS_ENABLED(CONFIG_DEBUG_STACK_USAGE)) + return; + +- for (p = task_scs(tsk); p < __scs_magic(tsk); ++p) { ++ for (p = task_scs(tsk); p < __scs_magic(task_scs(tsk)); ++p) { + if (!READ_ONCE_NOCHECK(*p)) + break; + used += sizeof(*p); diff --git a/queue-6.1/series b/queue-6.1/series index 0faf30b3ae..16c6e2ade6 100644 --- a/queue-6.1/series +++ b/queue-6.1/series @@ -348,3 +348,28 @@ nfsd-use-correct-reservation-type-in-nfsd4_scsi_fence_client.patch scsi-target-reset-t_task_cdb-pointer-in-error-case.patch f2fs-invalidate-dentry-cache-on-failed-whiteout-creation.patch f2fs-fix-return-value-of-f2fs_recover_fsync_data.patch +tools-testing-nvdimm-use-per-dimm-device-handle.patch +media-vidtv-initialize-local-pointers-upon-transfer-of-memory-ownership.patch +ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch +kvm-x86-don-t-clear-async-pf-queue-when-cr0.pg-is-disabled-e.g.-on-smi.patch +platform-chrome-cros_ec_ishtp-fix-uaf-after-unbinding-driver.patch +scs-fix-a-wrong-parameter-in-__scs_magic.patch +parisc-do-not-reprogram-affinitiy-on-asp-chip.patch +libceph-make-decode_pool-more-resilient-against-corrupted-osdmaps.patch +kvm-x86-warn-if-hrtimer-callback-for-periodic-apic-timer-fires-with-period-0.patch +kvm-x86-explicitly-set-new-periodic-hrtimer-expiration-in-apic_timer_fn.patch +kvm-x86-fix-vm-hard-lockup-after-prolonged-inactivity-with-periodic-hv-timer.patch 
+kvm-nsvm-avoid-incorrect-injection-of-svm_exit_cr0_sel_write.patch +kvm-svm-mark-vmcb_npt-as-dirty-on-nested-vmrun.patch +kvm-nsvm-propagate-svm_exit_cr0_sel_write-correctly-for-lmsw-emulation.patch +kvm-svm-mark-vmcb_perm_map-as-dirty-on-nested-vmrun.patch +kvm-nsvm-set-exit_code_hi-to-1-when-synthesizing-svm_exit_err-failed-vmrun.patch +kvm-nsvm-clear-exit_code_hi-in-vmcb-when-synthesizing-nested-vm-exits.patch +xfs-fix-a-memory-leak-in-xfs_buf_item_init.patch +tracing-do-not-register-unsupported-perf-events.patch +pm-runtime-do-not-clear-needs_force_resume-with-enabled-runtime-pm.patch +r8169-fix-rtl8117-wake-on-lan-in-dash-mode.patch +fsnotify-do-not-generate-access-modify-events-on-child-for-special-files.patch +nfsd-mark-variable-__maybe_unused-to-avoid-w-1-build-break.patch +svcrdma-return-0-on-success-from-svc_rdma_copy_inline_range.patch +powerpc-kexec-enable-smt-before-waking-offline-cpus.patch diff --git a/queue-6.1/svcrdma-return-0-on-success-from-svc_rdma_copy_inline_range.patch b/queue-6.1/svcrdma-return-0-on-success-from-svc_rdma_copy_inline_range.patch new file mode 100644 index 0000000000..1ff6c8eaa3 --- /dev/null +++ b/queue-6.1/svcrdma-return-0-on-success-from-svc_rdma_copy_inline_range.patch @@ -0,0 +1,32 @@ +From 94972027ab55b200e031059fd6c7a649f8248020 Mon Sep 17 00:00:00 2001 +From: Joshua Rogers +Date: Fri, 7 Nov 2025 10:09:48 -0500 +Subject: svcrdma: return 0 on success from svc_rdma_copy_inline_range + +From: Joshua Rogers + +commit 94972027ab55b200e031059fd6c7a649f8248020 upstream. + +The function comment specifies 0 on success and -EINVAL on invalid +parameters. Make the tail return 0 after a successful copy loop. 
+ +Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC") +Cc: stable@vger.kernel.org +Signed-off-by: Joshua Rogers +Signed-off-by: Chuck Lever +Signed-off-by: Greg Kroah-Hartman +--- + net/sunrpc/xprtrdma/svc_rdma_rw.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c ++++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c +@@ -830,7 +830,7 @@ static int svc_rdma_copy_inline_range(st + offset += page_len; + } + +- return -EINVAL; ++ return 0; + } + + /** diff --git a/queue-6.1/tools-testing-nvdimm-use-per-dimm-device-handle.patch b/queue-6.1/tools-testing-nvdimm-use-per-dimm-device-handle.patch new file mode 100644 index 0000000000..1f1a453134 --- /dev/null +++ b/queue-6.1/tools-testing-nvdimm-use-per-dimm-device-handle.patch @@ -0,0 +1,61 @@ +From f59b701b4674f7955170b54c4167c5590f4714eb Mon Sep 17 00:00:00 2001 +From: Alison Schofield +Date: Fri, 31 Oct 2025 16:42:20 -0700 +Subject: tools/testing/nvdimm: Use per-DIMM device handle + +From: Alison Schofield + +commit f59b701b4674f7955170b54c4167c5590f4714eb upstream. + +KASAN reports a global-out-of-bounds access when running these nfit +tests: clear.sh, pmem-errors.sh, pfn-meta-errors.sh, btt-errors.sh, +daxdev-errors.sh, and inject-error.sh. + +[] BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x769f/0x7840 [nfit_test] +[] Read of size 4 at addr ffffffffc03ea01c by task ndctl/1215 +[] The buggy address belongs to the variable: +[] handle+0x1c/0x1df4 [nfit_test] + +nfit_test_search_spa() uses handle[nvdimm->id] to retrieve a device +handle and triggers a KASAN error when it reads past the end of the +handle array. It should not be indexing the handle array at all. + +The correct device handle is stored in per-DIMM test data. Each DIMM +has a struct nfit_mem that embeds a struct acpi_nfit_memdev that +describes the NFIT device handle. Use that device handle here. 
+ +Fixes: 10246dc84dfc ("acpi nfit: nfit_test supports translate SPA") +Cc: stable@vger.kernel.org +Signed-off-by: Alison Schofield +Reviewed-by: Dave Jiang +Link: https://patch.msgid.link/20251031234227.1303113-1-alison.schofield@intel.com +Signed-off-by: Ira Weiny +Signed-off-by: Greg Kroah-Hartman +--- + tools/testing/nvdimm/test/nfit.c | 7 ++++++- + 1 file changed, 6 insertions(+), 1 deletion(-) + +--- a/tools/testing/nvdimm/test/nfit.c ++++ b/tools/testing/nvdimm/test/nfit.c +@@ -670,6 +670,7 @@ static int nfit_test_search_spa(struct n + .addr = spa->spa, + .region = NULL, + }; ++ struct nfit_mem *nfit_mem; + u64 dpa; + + ret = device_for_each_child(&bus->dev, &ctx, +@@ -687,8 +688,12 @@ static int nfit_test_search_spa(struct n + */ + nd_mapping = &nd_region->mapping[nd_region->ndr_mappings - 1]; + nvdimm = nd_mapping->nvdimm; ++ nfit_mem = nvdimm_provider_data(nvdimm); ++ if (!nfit_mem) ++ return -EINVAL; + +- spa->devices[0].nfit_device_handle = handle[nvdimm->id]; ++ spa->devices[0].nfit_device_handle = ++ __to_nfit_memdev(nfit_mem)->device_handle; + spa->num_nvdimms = 1; + spa->devices[0].dpa = dpa; + diff --git a/queue-6.1/tracing-do-not-register-unsupported-perf-events.patch b/queue-6.1/tracing-do-not-register-unsupported-perf-events.patch new file mode 100644 index 0000000000..f9452c21fc --- /dev/null +++ b/queue-6.1/tracing-do-not-register-unsupported-perf-events.patch @@ -0,0 +1,82 @@ +From ef7f38df890f5dcd2ae62f8dbde191d72f3bebae Mon Sep 17 00:00:00 2001 +From: Steven Rostedt +Date: Tue, 16 Dec 2025 18:24:40 -0500 +Subject: tracing: Do not register unsupported perf events + +From: Steven Rostedt + +commit ef7f38df890f5dcd2ae62f8dbde191d72f3bebae upstream. + +Synthetic events currently do not have a function to register perf events.
+This leads to calling the tracepoint register functions with a NULL +function pointer which triggers: + + ------------[ cut here ]------------ + WARNING: kernel/tracepoint.c:175 at tracepoint_add_func+0x357/0x370, CPU#2: perf/2272 + Modules linked in: kvm_intel kvm irqbypass + CPU: 2 UID: 0 PID: 2272 Comm: perf Not tainted 6.18.0-ftest-11964-ge022764176fc-dirty #323 PREEMPTLAZY + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 + RIP: 0010:tracepoint_add_func+0x357/0x370 + Code: 28 9c e8 4c 0b f5 ff eb 0f 4c 89 f7 48 c7 c6 80 4d 28 9c e8 ab 89 f4 ff 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc <0f> 0b 49 c7 c6 ea ff ff ff e9 ee fe ff ff 0f 0b e9 f9 fe ff ff 0f + RSP: 0018:ffffabc0c44d3c40 EFLAGS: 00010246 + RAX: 0000000000000001 RBX: ffff9380aa9e4060 RCX: 0000000000000000 + RDX: 000000000000000a RSI: ffffffff9e1d4a98 RDI: ffff937fcf5fd6c8 + RBP: 0000000000000001 R08: 0000000000000007 R09: ffff937fcf5fc780 + R10: 0000000000000003 R11: ffffffff9c193910 R12: 000000000000000a + R13: ffffffff9e1e5888 R14: 0000000000000000 R15: ffffabc0c44d3c78 + FS: 00007f6202f5f340(0000) GS:ffff93819f00f000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 000055d3162281a8 CR3: 0000000106a56003 CR4: 0000000000172ef0 + Call Trace: + + tracepoint_probe_register+0x5d/0x90 + synth_event_reg+0x3c/0x60 + perf_trace_event_init+0x204/0x340 + perf_trace_init+0x85/0xd0 + perf_tp_event_init+0x2e/0x50 + perf_try_init_event+0x6f/0x230 + ? perf_event_alloc+0x4bb/0xdc0 + perf_event_alloc+0x65a/0xdc0 + __se_sys_perf_event_open+0x290/0x9f0 + do_syscall_64+0x93/0x7b0 + ? entry_SYSCALL_64_after_hwframe+0x76/0x7e + ? 
trace_hardirqs_off+0x53/0xc0 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + +Instead, have the code return -ENODEV, which doesn't warn and has perf +error out with: + + # perf record -e synthetic:futex_wait +Error: +The sys_perf_event_open() syscall returned with 19 (No such device) for event (synthetic:futex_wait). +"dmesg | grep -i perf" may provide additional information. + +Ideally perf should support synthetic events, but for now just fix the +warning. The support can come later. + +Cc: stable@vger.kernel.org +Cc: Masami Hiramatsu +Cc: Mathieu Desnoyers +Cc: Arnaldo Carvalho de Melo +Cc: Jiri Olsa +Cc: Namhyung Kim +Link: https://patch.msgid.link/20251216182440.147e4453@gandalf.local.home +Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events") +Reported-by: Ian Rogers +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace_events.c | 2 ++ + 1 file changed, 2 insertions(+) + +--- a/kernel/trace/trace_events.c ++++ b/kernel/trace/trace_events.c +@@ -675,6 +675,8 @@ int trace_event_reg(struct trace_event_c + + #ifdef CONFIG_PERF_EVENTS + case TRACE_REG_PERF_REGISTER: ++ if (!call->class->perf_probe) ++ return -ENODEV; + return tracepoint_probe_register(call->tp, + call->class->perf_probe, + call); diff --git a/queue-6.1/xfs-fix-a-memory-leak-in-xfs_buf_item_init.patch b/queue-6.1/xfs-fix-a-memory-leak-in-xfs_buf_item_init.patch new file mode 100644 index 0000000000..f03512c62e --- /dev/null +++ b/queue-6.1/xfs-fix-a-memory-leak-in-xfs_buf_item_init.patch @@ -0,0 +1,33 @@ +From fc40459de82543b565ebc839dca8f7987f16f62e Mon Sep 17 00:00:00 2001 +From: Haoxiang Li +Date: Wed, 10 Dec 2025 17:06:01 +0800 +Subject: xfs: fix a memory leak in xfs_buf_item_init() + +From: Haoxiang Li + +commit fc40459de82543b565ebc839dca8f7987f16f62e upstream. + +xfs_buf_item_get_format() may allocate memory for bip->bli_formats, +free the memory in the error path. 
+ +Fixes: c3d5f0c2fb85 ("xfs: complain if anyone tries to create a too-large buffer log item") +Cc: stable@vger.kernel.org +Signed-off-by: Haoxiang Li +Reviewed-by: Christoph Hellwig +Reviewed-by: Carlos Maiolino +Signed-off-by: Carlos Maiolino +Signed-off-by: Greg Kroah-Hartman +--- + fs/xfs/xfs_buf_item.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/fs/xfs/xfs_buf_item.c ++++ b/fs/xfs/xfs_buf_item.c +@@ -900,6 +900,7 @@ xfs_buf_item_init( + map_size = DIV_ROUND_UP(chunks, NBWORD); + + if (map_size > XFS_BLF_DATAMAP_SIZE) { ++ xfs_buf_item_free_format(bip); + kmem_cache_free(xfs_buf_item_cache, bip); + xfs_err(mp, + "buffer item dirty bitmap (%u uints) too small to reflect %u bytes!",