]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
5 weeks agoKVM: VMX: Add mediated PMU support for CPUs without "save perf global ctrl"
Sean Christopherson [Sat, 6 Dec 2025 00:17:20 +0000 (16:17 -0800)] 
KVM: VMX: Add mediated PMU support for CPUs without "save perf global ctrl"

Extend mediated PMU support for Intel CPUs without support for saving
PERF_GLOBAL_CONTROL into the guest VMCS field on VM-Exit, e.g. for Skylake
and its derivatives, as well as Icelake.  While supporting CPUs without
VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL isn't completely trivial, it's not that
complex either.  And not supporting such CPUs would mean not supporting 7+
years of Intel CPUs released in the past 10 years.

On VM-Exit, immediately propagate the saved PERF_GLOBAL_CTRL to the VMCS
as well as KVM's software cache so that KVM doesn't need to add full EXREG
tracking of PERF_GLOBAL_CTRL.  In practice, the vast majority of VM-Exits
won't trigger software writes to guest PERF_GLOBAL_CTRL, so deferring the
VMWRITE to the next VM-Enter would only delay the inevitable without
batching/avoiding VMWRITEs.

Note!  Take care to refresh VM_EXIT_MSR_STORE_COUNT on nested VM-Exit, as
it's unfortunately possible that KVM could recalculate MSR intercepts
while L2 is active, e.g. if userspace loads nested state and _then_ sets
PERF_CAPABILITIES.  Eating the VMWRITE on every nested VM-Exit is
unfortunate, but that's a pre-existing problem and can/should be solved
separately, e.g. modifying the number of auto-load entries while L2 is
active is also uncommon on modern CPUs.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-45-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR with list address
Sean Christopherson [Sat, 6 Dec 2025 00:17:19 +0000 (16:17 -0800)] 
KVM: VMX: Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR with list address

Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR to point at the vCPU's
msr_autostore list in anticipation of utilizing the auto-store
functionality, and to harden KVM against stray reads to pfn 0 (or, in
theory, a random pfn if the underlying CPU uses a complex scheme for
encoding VMCS data).  The MSR auto lists are supposed to be ignored if the
associated COUNT VMCS field is '0', but leaving the ADDR field
zero-initialized in memory is an unnecessary risk (albeit a minuscule risk)
given that the cost is a single VMWRITE during vCPU creation.

Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-44-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Dedup code for adding MSR to VMCS's auto list
Sean Christopherson [Sat, 6 Dec 2025 00:17:18 +0000 (16:17 -0800)] 
KVM: VMX: Dedup code for adding MSR to VMCS's auto list

Add a helper to add an MSR to a VMCS's "auto" list to deduplicate the code
in add_atomic_switch_msr(), and so that the functionality can be used in
the future for managing the MSR auto-store list.

No functional change intended.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-43-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Compartmentalize adding MSRs to host vs. guest auto-load list
Sean Christopherson [Sat, 6 Dec 2025 00:17:17 +0000 (16:17 -0800)] 
KVM: VMX: Compartmentalize adding MSRs to host vs. guest auto-load list

Undo the bundling of the "host" and "guest" MSR auto-load list logic so
that the code can be deduplicated by factoring out the logic to a separate
helper.  Now that "list full" situations are treated as fatal to the VM,
there is no need to pre-check both lists.

For all intents and purposes, this reverts the add_atomic_switch_msr()
changes made by commit 3190709335dd ("x86/KVM/VMX: Separate the VMX
AUTOLOAD guest/host number accounting").

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-42-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Set MSR index auto-load entry if and only if entry is "new"
Sean Christopherson [Sat, 6 Dec 2025 00:17:16 +0000 (16:17 -0800)] 
KVM: VMX: Set MSR index auto-load entry if and only if entry is "new"

When adding an MSR to the auto-load lists, update the MSR index in the
list entry if and only if a new entry is being inserted, as 'i' can only
be non-negative if vmx_find_loadstore_msr_slot() found an entry with the
MSR's index.  Unnecessarily setting the index is benign, but it makes it
harder to see that updating the value is necessary even when an existing
entry for the MSR was found.

No functional change intended.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-41-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Bug the VM if either MSR auto-load list is full
Sean Christopherson [Sat, 6 Dec 2025 00:17:15 +0000 (16:17 -0800)] 
KVM: VMX: Bug the VM if either MSR auto-load list is full

WARN and bug the VM if either MSR auto-load list is full when adding an
MSR to the lists, as the set of MSRs that KVM loads via the lists is
finite and entirely KVM controlled, i.e. overflowing the lists shouldn't
be possible in a fully released version of KVM.  Terminate the VM as the
core KVM infrastructure has no insight as to _why_ an MSR is being added
to the list, and failure to load an MSR on VM-Enter and/or VM-Exit could
be fatal to the host.  E.g. running the host with a guest-controlled PEBS
MSR could generate unexpected writes to the DS buffer and crash the host.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-40-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Drop unused @entry_only param from add_atomic_switch_msr()
Sean Christopherson [Sat, 6 Dec 2025 00:17:14 +0000 (16:17 -0800)] 
KVM: VMX: Drop unused @entry_only param from add_atomic_switch_msr()

Drop the "on VM-Enter only" parameter from add_atomic_switch_msr() as it
is no longer used, and for all intents and purposes was never used.  The
functionality was added, under embargo, by commit 989e3992d2ec
("x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs"),
and then ripped out by commit 2f055947ae5e ("x86/kvm: Drop L1TF MSR list
approach") just a few commits later.

  2f055947ae5e x86/kvm: Drop L1TF MSR list approach
  72c6d2db64fa x86/litf: Introduce vmx status variable
  215af5499d9e cpu/hotplug: Online siblings when SMT control is turned on
  390d975e0c4e x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
  989e3992d2ec x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs

Furthermore, it's extremely unlikely KVM will ever _need_ to load an MSR
value via the auto-load lists only on VM-Enter.  MSRs writes via the lists
aren't optimized in any way, and so the only reason to use the lists
instead of a WRMSR are for cases where the MSR _must_ be load atomically
with respect to VM-Enter (and/or VM-Exit).  While one could argue that
command MSRs, e.g. IA32_FLUSH_CMD, "need" to be done exact at VM-Enter, in
practice doing such flushes within a few instructons of VM-Enter is more
than sufficient.

Note, the shortlog and changelog for commit 390d975e0c4e ("x86/KVM/VMX: Use
MSR save list for IA32_FLUSH_CMD if required") are misleading and wrong.
That commit added MSR_IA32_FLUSH_CMD to the VM-Enter _load_ list, not the
VM-Enter save list (which doesn't exist, only VM-Exit has a store/save
list).

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-39-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Dedup code for removing MSR from VMCS's auto-load list
Sean Christopherson [Sat, 6 Dec 2025 00:17:13 +0000 (16:17 -0800)] 
KVM: VMX: Dedup code for removing MSR from VMCS's auto-load list

Add a helper to remove an MSR from an auto-{load,store} list to dedup the
msr_autoload code, and in anticipation of adding similar functionality for
msr_autostore.

No functional change intended.

Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-38-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: nVMX: Don't update msr_autostore count when saving TSC for vmcs12
Sean Christopherson [Sat, 6 Dec 2025 00:17:12 +0000 (16:17 -0800)] 
KVM: nVMX: Don't update msr_autostore count when saving TSC for vmcs12

Rework nVMX's use of the MSR auto-store list to snapshot TSC to sneak
MSR_IA32_TSC into the list _without_ updating KVM's software tracking,
and drop the generic functionality so that future usage of the store list
for nested specific logic needs to consider the implications of modifying
the list.  Updating the list only for vmcs02 and only on nested VM-Enter
is a disaster waiting to happen, as it means vmcs01 is stale relative to
the software tracking, and KVM could unintentionally leave an MSR in the
store list in perpetuity while running L1, e.g. if KVM addressed the first
issue and updated vmcs01 on nested VM-Exit without removing TSC from the
list.

Furthermore, mixing KVM's desire to save an MSR with L1's desire to save
an MSR result KVM clobbering/ignoring the needs of vmcs01 or vmcs02.
E.g. if KVM added MSR_IA32_TSC to the store list for its own purposes, and
then _removed_ MSR_IA32_TSC from the list after emulating nested VM-Enter,
then KVM would remove MSR_IA32_TSC from the list even though saving TSC on
VM-Exit from L2 is still desirable (to provide L1 with an accurate TSC).

Similarly, removing an MSR from the list based on vmcs12's settings could
drop an MSR that KVM wants to save for its own purposes.

In practice, the issues are currently benign, because KVM doesn't use the
store list for vmcs01.  But that will change with upcoming mediated PMU
support.

Alternatively, a "full" solution would be to track MSR list entries for
vmcs12 separately from KVM's standard lists, but MSR_IA32_TSC is likely
the only MSR that KVM would ever want to save on _every_ VM-Exit purely
based on vmcs12.  I.e. the added complexity isn't remotely justified at
this time.

Opportunistically escalate from a pr_warn_ratelimited() to a full WARN as
KVM reserves eight entries in each MSR list, and as above KVM uses at most
one entry.

Opportunistically make vmx_find_loadstore_msr_slot() local to vmx.c as
using it directly from nested code is unsafe due to the potential for
mixing vmcs01 and vmcs02 state (see above).

Cc: Jim Mattson <jmattson@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-37-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: VMX: Drop intermediate "guest" field from msr_autostore
Sean Christopherson [Sat, 6 Dec 2025 00:17:11 +0000 (16:17 -0800)] 
KVM: VMX: Drop intermediate "guest" field from msr_autostore

Drop the intermediate "guest" field from vcpu_vmx.msr_autostore as the
value saved on VM-Exit isn't guaranteed to be the guest's value, it's
purely whatever is in hardware at the time of VM-Exit.  E.g. KVM's only
use of the store list at the momemnt is to snapshot TSC at VM-Exit, and
the value saved is always the raw TSC even if TSC-offseting and/or
TSC-scaling is enabled for the guest.

And unlike msr_autoload, there is no need differentiate between "on-entry"
and "on-exit".

No functional change intended.

Cc: Jim Mattson <jmattson@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-36-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match
Sean Christopherson [Sat, 6 Dec 2025 00:17:10 +0000 (16:17 -0800)] 
KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match

When loading a mediated PMU state, elide the WRMSRs to load PMCs with the
guest's value if the value in hardware already matches the guest's value.
For the relatively common case where neither the guest nor the host is
actively using the PMU, i.e. when all/many counters are '0', eliding the
WRMSRs reduces the latency of handling VM-Exit by a measurable amount
(WRMSR is significantly more expensive than RDPMC).

As measured by KVM-Unit-Tests' CPUID VM-Exit testcase, this provides a
a ~25% reduction in latency (4k => 3k cycles) on Intel Emerald Rapids,
and a ~13% reduction (6.2k => 5.3k cycles) on AMD Turin.

Cc: Manali Shukla <manali.shukla@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-35-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
Dapeng Mi [Sat, 6 Dec 2025 00:17:09 +0000 (16:17 -0800)] 
KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space

Expose enable_mediated_pmu parameter to user space, i.e. allow userspace
to enable/disable mediated vPMU support.

Document the mediated versus perf-based behavior as part of the
kernel-parameters.txt entry, and opportunistically add an entry for the
core enable_pmu param as well.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-34-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: nSVM: Disable PMU MSR interception as appropriate while running L2
Sean Christopherson [Sat, 6 Dec 2025 00:17:08 +0000 (16:17 -0800)] 
KVM: nSVM: Disable PMU MSR interception as appropriate while running L2

Add MSRs that might be passed through to L1 when running with a mediated
PMU to the nested SVM's set of to-be-merged MSR indices, i.e. disable
interception of PMU MSRs when running L2 if both KVM (L0) and L1 disable
interception.  There is no need for KVM to interpose on such MSR accesses,
e.g. if L1 exposes a mediated PMU (or equivalent) to L2.

Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-33-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: nVMX: Disable PMU MSR interception as appropriate while running L2
Mingwei Zhang [Sat, 6 Dec 2025 00:17:07 +0000 (16:17 -0800)] 
KVM: nVMX: Disable PMU MSR interception as appropriate while running L2

Merge KVM's PMU MSR interception bitmaps with those of L1, i.e. merge the
bitmaps of vmcs01 and vmcs12, e.g. so that KVM doesn't interpose on MSR
accesses unnecessarily if L1 exposes a mediated PMU (or equivalent) to L2.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[sean: rewrite changelog and comment, omit MSRs that are always intercepted]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-32-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: nVMX: Add macros to simplify nested MSR interception setting
Dapeng Mi [Sat, 6 Dec 2025 00:17:06 +0000 (16:17 -0800)] 
KVM: nVMX: Add macros to simplify nested MSR interception setting

Add macros nested_vmx_merge_msr_bitmaps_xxx() to simplify nested MSR
interception setting. No function change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-31-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Handle emulated instruction for mediated vPMU
Dapeng Mi [Sat, 6 Dec 2025 00:17:05 +0000 (16:17 -0800)] 
KVM: x86/pmu: Handle emulated instruction for mediated vPMU

Mediated vPMU needs to accumulate the emulated instructions into counter
and load the counter into HW at vm-entry.

Moreover, if the accumulation leads to counter overflow, KVM needs to
update GLOBAL_STATUS and inject PMI into guest as well.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-30-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active
Sean Christopherson [Sat, 6 Dec 2025 00:17:04 +0000 (16:17 -0800)] 
KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active

Don't handle exits in the fastpath if emulation is required, i.e. if an
instruction needs to be skipped, the mediated PMU is enabled, and one or
more PMCs is counting instructions.  With the mediated PMU, KVM's cache of
PMU state is inconsistent with respect to hardware until KVM exits the
inner run loop (when the mediated PMU is "put").

Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest
Dapeng Mi [Sat, 6 Dec 2025 00:17:03 +0000 (16:17 -0800)] 
KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest

Implement the PMU "world switch" between host perf and guest mediated PMU.
When loading guest state, call into perf to switch from host to guest, and
then load guest state into hardware, and then reverse those actions when
putting guest state.

On the KVM side, when loading guest state, zero PERF_GLOBAL_CTRL to ensure
all counters are disabled, then load selectors and counters, and finally
call into vendor code to load control/status information.  While VMX and
SVM use different mechanisms to avoid counting host activity while guest
controls are loaded, both implementations require PERF_GLOBAL_CTRL to be
zeroed when the event selectors are in flux.

When putting guest state, reverse the order, and save and zero controls
and status prior to saving+zeroing selectors and counters.  Defer clearing
PERF_GLOBAL_CTRL to vendor code, as only SVM needs to manually clear the
MSR; VMX configures PERF_GLOBAL_CTRL to be atomically cleared by the CPU
on VM-Exit.

Handle the difference in MSR layouts between Intel and AMD by communicating
the bases and stride via kvm_pmu_ops.  Because KVM requires Intel v4 (and
full-width writes) and AMD v2, the MSRs to load/save are constant for a
given vendor, i.e. do not vary based on the guest PMU, and do not vary
based on host PMU (because KVM will simply disable mediated PMU support if
the necessary MSRs are unsupported).

Except for retrieving the guest's PERF_GLOBAL_CTRL, which needs to be read
before invoking any fastpath handler (spoiler alert), perform the context
switch around KVM's inner run loop.  State only needs to be synchronized
from hardware before KVM can access the software "caches".

Note, VMX already grabs the guest's PERF_GLOBAL_CTRL immediately after
VM-Exit, as hardware saves value into the VMCS.

Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-28-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD
Sandipan Das [Sat, 6 Dec 2025 00:17:02 +0000 (16:17 -0800)] 
KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD

On AMD platforms, there is no way to restore PerfCntrGlobalCtl at
VM-Entry or clear it at VM-Exit. Since the register states will be
restored before entering and saved after exiting guest context, the
counters can keep ticking and even overflow leading to chaos while
still in host context.

To avoid this, intecept event selectors, which is already done by mediated
PMU. In addition, always set the GuestOnly bit and clear the HostOnly bit
for PMU selectors on AMD. Doing so allows the counters run only in guest
context even if their enable bits are still set after VM exit and before
host/guest PMU context switch.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: massage shortlog]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-27-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates
Dapeng Mi [Sat, 6 Dec 2025 00:17:01 +0000 (16:17 -0800)] 
KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates

Refresh the event selectors that are programmed into hardware when a PMC
is "reprogrammed" for a mediated PMU, i.e. if userspace changes the PMU
event filters

Note, KVM doesn't utilize the reprogramming infrastructure to handle
counter overflow for mediated PMUs, as there's no need to reprogram a
non-existent perf event.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: add a helper to document behavior, split patch and rewrite changelog]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-26-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
Mingwei Zhang [Sat, 6 Dec 2025 00:17:00 +0000 (16:17 -0800)] 
KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering

Introduce eventsel_hw and fixed_ctr_ctrl_hw to store the actual HW value in
PMU event selector MSRs. In mediated PMU checks events before allowing the
event values written to the PMU MSRs. However, to match the HW behavior,
when PMU event checks fails, KVM should allow guest to read the value back.

This essentially requires an extra variable to separate the guest requested
value from actual PMU MSR value. Note this only applies to event selectors.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-25-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses
Dapeng Mi [Sat, 6 Dec 2025 00:16:59 +0000 (16:16 -0800)] 
KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses

When emulating a PMC counter read or write for a mediated PMU, bypass the
perf checks and emulated_counter logic as the counters aren't proxied
through perf, i.e. pmc->counter always holds the guest's up-to-date value,
and thus there's no need to defer emulated overflow checks.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: split from event filtering change, write shortlog+changelog]
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs
Dapeng Mi [Sat, 6 Dec 2025 00:16:58 +0000 (16:16 -0800)] 
KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs

For vCPUs with a mediated vPMU, disable interception of counter MSRs for
PMCs that are exposed to the guest, and for GLOBAL_CTRL and related MSRs
if they are fully supported according to the vCPU model, i.e. if the MSRs
and all bits supported by hardware exist from the guest's point of view.

Do NOT passthrough event selector or fixed counter control MSRs, so that
KVM can enforce userspace-defined event filters, e.g. to prevent use of
AnyThread events (which is unfortunately a setting in the fixed counter
control MSR).

Defer support for nested passthrough of mediated PMU MSRs to the future,
as the logic for nested MSR interception is unfortunately vendor specific.

Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[sean: squash patches, massage changelog, refresh VMX MSRs on filter change]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-23-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU
Dapeng Mi [Sat, 6 Dec 2025 00:16:57 +0000 (16:16 -0800)] 
KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU

When running a guest with a mediated PMU, context switch PERF_GLOBAL_CTRL
via the dedicated VMCS fields for both host and guest.  For the host,
always zero GLOBAL_CTRL on exit as the guest's state will still be loaded
in hardware (KVM will context switch the bulk of PMU state outside of the
inner run loop).  For the guest, use the dedicated fields to atomically
load and save PERF_GLOBAL_CTRL on all entry/exits.

For now, require VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL support (introduced by
Sapphire Rapids).  KVM can support such CPUs by saving PERF_GLOBAL_CTRL
via the MSR save list, a.k.a. the MSR auto-store list, but defer that
support as it adds a small amount of complexity and is somewhat unique.

To minimize VM-Entry latency, propagate IA32_PERF_GLOBAL_CTRL to the VMCS
on-demand.  But to minimize complexity, read IA32_PERF_GLOBAL_CTRL out of
the VMCS on all non-failing VM-Exits.  I.e. partially cache the MSR.
KVM could track GLOBAL_CTRL as an EXREG and defer all reads, but writes
are rare, i.e. the dirty tracking for an EXREG is unnecessary, and it's
not obvious that shaving ~15-20 cycles per exit is meaningful given the
total overhead associated with mediated PMU context switches.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
Dapeng Mi [Sat, 6 Dec 2025 00:16:56 +0000 (16:16 -0800)] 
KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU

Disable RDPMC interception for vCPUs with a mediated vPMU that is
compatible with the host PMU, i.e. that doesn't require KVM emulation of
RDPMC to honor the guest's vCPU model.  With a mediated vPMU, all guest
state accessible via RDPMC is loaded into hardware while the guest is
running.

Adust RDPMC interception only for non-TDX guests, as the TDX module is
responsible for managing RDPMC intercepts based on the TD configuration.

Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Register PMI handler for mediated vPMU
Xiong Zhang [Sat, 6 Dec 2025 00:16:55 +0000 (16:16 -0800)] 
KVM: x86/pmu: Register PMI handler for mediated vPMU

Register a dedicated PMI handler with perf's callback when mediated PMU
support is enabled.  Perf routes PMIs that arrive while guest context is
loaded to the provided callback, by modifying the CPU's LVTPC to point at
a dedicated mediated PMI IRQ vector.

WARN upon receipt of a mediated PMI if there is no active vCPU, or if the
vCPU doesn't have a mediated PMU.  Even if a PMI manages to skid past
VM-Exit, it should never be delayed all the way beyond unloading the vCPU.
And while running vCPUs without a mediated PMU, the LVTPC should never be
wired up to the mediated PMI IRQ vector, i.e. should always be routed
through perf's NMI handler.

Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Implement AMD mediated PMU requirements
Sean Christopherson [Sat, 6 Dec 2025 00:16:54 +0000 (16:16 -0800)] 
KVM: x86/pmu: Implement AMD mediated PMU requirements

Require host PMU version 2+ for AMD mediated PMU support, as
PERF_GLOBAL_CTRL and friends are hard requirements for the mediated PMU.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: extract to separate patch, write changelog]
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Implement Intel mediated PMU requirements and constraints
Dapeng Mi [Sat, 6 Dec 2025 00:16:53 +0000 (16:16 -0800)] 
KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints

Implement Intel PMU requirements and constraints for mediated PMU support.
Require host PMU version 4+ so that PERF_GLOBAL_STATUS_SET can be used to
precisely load the guest's status value into hardware, and require full-
width writes so that KVM can precisely load guest counter values.

Disable PEBS and LBRs if mediated PMU support is enabled, as they won't be
supported in the initial implementation.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: split to separate patch, add full-width writes dependency]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: x86/pmu: Start stubbing in mediated PMU support
Dapeng Mi [Sat, 6 Dec 2025 00:16:52 +0000 (16:16 -0800)] 
KVM: x86/pmu: Start stubbing in mediated PMU support

Introduce enable_mediated_pmu as a global variable, with the intent of
exposing it to userspace a vendor module parameter, to control and reflect
mediated vPMU support.  Wire up the perf plumbing to create+release a
mediated PMU, but defer exposing the parameter to userspace until KVM
support for a mediated PMUs is fully landed.

To (a) minimize compatibility issues, (b) to give userspace a chance to
opt out of the restrictive side-effects of perf_create_mediated_pmu(),
and (c) to avoid adding new dependencies between enabling an in-kernel
irqchip and a mediated vPMU, defer "creating" a mediated PMU in perf
until the first vCPU is created.

Regarding userspace compatibility, an alternative solution would be to
make the mediated PMU fully opt-in, e.g. to avoid unexpected failure due
to perf_create_mediated_pmu() failing.  Ironically, that approach creates
an even bigger compatibility issue, as turning on enable_mediated_pmu
would silently break VMMs that don't utilize KVM_CAP_PMU_CAPABILITY (well,
silently until the guest tried to access PMU assets).

Regarding an in-kernel irqchip, create a mediated PMU if and only if the
VM has an in-kernel local APIC, as the mediated PMU will take a hard
dependency on forwarding PMIs to the guest without bouncing through host
userspace.  Silently "drop" the PMU instead of rejecting KVM_CREATE_VCPU,
as KVM's existing vPMU support doesn't function correctly if the local
APIC is emulated by userspace, e.g. PMIs will never be delivered.  I.e.
it's far, far more likely that rejecting KVM_CREATE_VCPU would cause
problems, e.g. for tests or userspace daemons that just want to probe
basic KVM functionality.

Note!  Deliberately make mediated PMU creation "sticky", i.e. don't unwind
it on failure to create a vCPU.  Practically speaking, there's no harm to
having a VM with a mediated PMU and no vCPUs.  To avoid an "impossible" VM
setup, reject KVM_CAP_PMU_CAPABILITY if a mediated PMU has been created,
i.e. don't let userspace disable PMU support after failed vCPU creation
(with PMU support enabled).

Defer vendor specific requirements and constraints to the future.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
5 weeks agoKVM: Add a simplified wrapper for registering perf callbacks
Sean Christopherson [Sat, 6 Dec 2025 00:16:50 +0000 (16:16 -0800)] 
KVM: Add a simplified wrapper for registering perf callbacks

Add a parameter-less API for registering perf callbacks in anticipation of
introducing another x86-only parameter for handling mediated PMU PMIs.

No functional change intended.

Acked-by: Anup Patel <anup@brainfault.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Manali Shukla <manali.shukla@amd.com>
Link: https://patch.msgid.link/20251206001720.468579-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
8 weeks agoperf: Use EXPORT_SYMBOL_FOR_KVM() for the mediated APIs
Peter Zijlstra [Wed, 17 Dec 2025 12:23:59 +0000 (13:23 +0100)] 
perf: Use EXPORT_SYMBOL_FOR_KVM() for the mediated APIs

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208115156.GE3707891@noisy.programming.kicks-ass.net
2 months agoperf: Clean up mediated vPMU accounting
Peter Zijlstra [Wed, 17 Dec 2025 11:08:01 +0000 (12:08 +0100)] 
perf: Clean up mediated vPMU accounting

The mediated_pmu_account_event() and perf_create_mediated_pmu()
functions implement the exclusion between '!exclude_guest' counters
and mediated vPMUs. Their implementation is basically identical,
except mirrored in what they count/check.

Make sure the actual implementations reflect this similarity.

Notably:
 - while perf_release_mediated_pmu() has an underflow check;
   mediated_pmu_unaccount_event() did not.
 - while perf_create_mediated_pmu() has an inc_not_zero() path;
   mediated_pmu_account_event() did not.

Also, the inc_not_zero() path can be outsite of
perf_mediated_pmu_mutex. The mutex must guard the 0->1 (of either
nr_include_guest_events or nr_mediated_pmu_vms) transition, but once a
counter is already non-zero, it can safely be incremented further.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208115156.GE3707891@noisy.programming.kicks-ass.net
2 months agoperf/x86/cstate: Add Airmont NP
Martin Schiller [Mon, 24 Nov 2025 07:48:46 +0000 (08:48 +0100)] 
perf/x86/cstate: Add Airmont NP

From the perspective of Intel cstate residency counters, the Airmont NP
(aka Lightning Mountain) is identical to the Airmont.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251124074846.9653-4-ms@dev.tdt.de
2 months agoperf/x86/intel: Add Airmont NP
Martin Schiller [Mon, 24 Nov 2025 07:48:45 +0000 (08:48 +0100)] 
perf/x86/intel: Add Airmont NP

The Intel / MaxLinear Airmont NP (aka Lightning Mountain) supports the
same architectual and non-architecural events as Airmont.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251124074846.9653-3-ms@dev.tdt.de
2 months agoperf/x86/msr: Add Airmont NP
Martin Schiller [Mon, 24 Nov 2025 07:48:44 +0000 (08:48 +0100)] 
perf/x86/msr: Add Airmont NP

Like Airmont, the Airmont NP (aka Intel / MaxLinear Lightning Mountain)
supports SMI_COUNT MSR.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251124074846.9653-2-ms@dev.tdt.de
2 months agox86/unwind_user: Simplify unwind_user_word_size()
Jens Remus [Mon, 8 Dec 2025 16:03:52 +0000 (17:03 +0100)] 
x86/unwind_user: Simplify unwind_user_word_size()

Get rid of superfluous ifdef and return explicit word size depending on
32-bit or 64-bit mode.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208160352.1363040-5-jremus@linux.ibm.com
2 months agox86/unwind_user: Guard unwind_user_word_size() by UNWIND_USER
Jens Remus [Mon, 8 Dec 2025 16:03:51 +0000 (17:03 +0100)] 
x86/unwind_user: Guard unwind_user_word_size() by UNWIND_USER

The unwind user framework in general requires an architecture-specific
implementation of unwind_user_word_size() to be present for any unwind
method, whether that is fp or a future other method, such as potentially
sframe.

Guard unwind_user_word_size() by the availability of the UNWIND_USER
framework instead of the specific HAVE_UNWIND_USER_FP method.

This facilitates to selectively disable HAVE_UNWIND_USER_FP on x86
(e.g. for test purposes) once a new unwind method is added to unwind
user.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208160352.1363040-4-jremus@linux.ibm.com
2 months agounwind_user/fp: Use dummies instead of ifdef
Jens Remus [Mon, 8 Dec 2025 16:03:50 +0000 (17:03 +0100)] 
unwind_user/fp: Use dummies instead of ifdef

This simplifies the code.   unwind_user_next_fp() does not need to
return -EINVAL if config option HAVE_UNWIND_USER_FP is disabled, as
unwind_user_start() will then not select this unwind method and
unwind_user_next() will therefore not call it.

Provide (1) a dummy definition of ARCH_INIT_USER_FP_FRAME, if the unwind
user method HAVE_UNWIND_USER_FP is not enabled, (2) a common fallback
definition of unwind_user_at_function_start() which returns false, and
(3) a common dummy definition of ARCH_INIT_USER_FP_ENTRY_FRAME.

Note that enabling the config option HAVE_UNWIND_USER_FP without
defining ARCH_INIT_USER_FP_FRAME triggers a compile error, which is
helpful when implementing support for this unwind user method in an
architecture.  Enabling the config option when providing an arch-
specific unwind_user_at_function_start() definition makes it necessary
to also provide an arch-specific ARCH_INIT_USER_FP_ENTRY_FRAME
definition.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208160352.1363040-3-jremus@linux.ibm.com
2 months agounwind_user: Enhance comments on get CFA, FP, and RA
Jens Remus [Mon, 8 Dec 2025 16:03:49 +0000 (17:03 +0100)] 
unwind_user: Enhance comments on get CFA, FP, and RA

Move the comment "Get the Canonical Frame Address (CFA)" to the top
of the sequence of statements that actually get the CFA.  Reword the
comment "Find the Return Address (RA)" to "Get ...", as the statements
actually get the RA.  Add a respective comment to the statements that
get the FP.  This will be useful once future commits extend the logic
to get the RA and FP.

While at it align the comment on the "stack going in wrong direction"
check to the following one on the "address is word aligned" check.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251208160352.1363040-2-jremus@linux.ibm.com
2 months agoperf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
Sandipan Das [Sat, 6 Dec 2025 00:16:49 +0000 (16:16 -0800)] 
perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host

Apply the PERF_PMU_CAP_MEDIATED_VPMU flag for version 2 and later
implementations of the core PMU. Aside from having Global Control and
Status registers, virtualizing the PMU using the mediated model requires
an interface to set or clear the overflow bits in the Global Status MSRs
while restoring or saving the PMU context of a vCPU.

PerfMonV2-capable hardware has additional MSRs for this purpose, namely
PerfCntrGlobalStatusSet and PerfCntrGlobalStatusClr, thereby making it
suitable for use with mediated vPMU.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-14-seanjc@google.com
2 months agoperf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
Kan Liang [Sat, 6 Dec 2025 00:16:48 +0000 (16:16 -0800)] 
perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU

Apply the PERF_PMU_CAP_MEDIATED_VPMU for Intel core PMU. It only indicates
that the perf side of core PMU is ready to support the mediated vPMU.
Besides the capability, the hypervisor, a.k.a. KVM, still needs to check
the PMU version and other PMU features/capabilities to decide whether to
enable support mediated vPMUs.

[sean: massage changelog]
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-13-seanjc@google.com
2 months agoperf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap
Mingwei Zhang [Sat, 6 Dec 2025 00:16:47 +0000 (16:16 -0800)] 
perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap

Plumb mediated PMU capability to x86_pmu_cap in order to let any kernel
entity such as KVM know that host PMU support mediated PMU mode and has
the implementation.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-12-seanjc@google.com
2 months agoperf/x86/core: Do not set bit width for unavailable counters
Sandipan Das [Sat, 6 Dec 2025 00:16:46 +0000 (16:16 -0800)] 
perf/x86/core: Do not set bit width for unavailable counters

Not all x86 processors have fixed counters. It may also be the case that
a processor has only fixed counters and no general-purpose counters. Set
the bit widths corresponding to each counter type only if such counters
are available.

Fixes: b3d9468a8bd2 ("perf, x86: Expose perf capability to other modules")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-11-seanjc@google.com
2 months agoperf/x86/core: Add APIs to switch to/from mediated PMI vector (for KVM)
Sean Christopherson [Sat, 6 Dec 2025 00:16:45 +0000 (16:16 -0800)] 
perf/x86/core: Add APIs to switch to/from mediated PMI vector (for KVM)

Add APIs (exported only for KVM) to switch PMIs to the dedicated mediated
PMU IRQ vector when loading guest context, and back to perf's standard NMI
when the guest context is put.  I.e. route PMIs to
PERF_GUEST_MEDIATED_PMI_VECTOR when the guest context is active, and to
NMIs while the host context is active.

While running with guest context loaded, ignore all NMIs (in perf).  Any
NMI that arrives while the LVTPC points at the mediated PMU IRQ vector
can't possibly be due to a host perf event.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251206001720.468579-10-seanjc@google.com
2 months agoperf/x86/core: Register a new vector for handling mediated guest PMIs
Sean Christopherson [Sat, 6 Dec 2025 00:16:44 +0000 (16:16 -0800)] 
perf/x86/core: Register a new vector for handling mediated guest PMIs

Wire up system vector 0xf5 for handling PMIs (i.e. interrupts delivered
through the LVTPC) while running KVM guests with a mediated PMU.  Perf
currently delivers all PMIs as NMIs, e.g. so that events that trigger while
IRQs are disabled aren't delayed and generate useless records, but due to
the multiplexing of NMIs throughout the system, correctly identifying NMIs
for a mediated PMU is practically infeasible.

To (greatly) simplify identifying guest mediated PMU PMIs, perf will
switch the CPU's LVTPC between PERF_GUEST_MEDIATED_PMI_VECTOR and NMI when
guest PMU context is loaded/put.  I.e. PMIs that are generated by the CPU
while the guest is active will be identified purely based on the IRQ
vector.

Route the vector through perf, e.g. as opposed to letting KVM attach a
handler directly a la posted interrupt notification vectors, as perf owns
the LVTPC and thus is the rightful owner of PERF_GUEST_MEDIATED_PMI_VECTOR.
Functionally, having KVM directly own the vector would be fine (both KVM
and perf will be completely aware of when a mediated PMU is active), but
would lead to an undesirable split in ownership: perf would be responsible
for installing the vector, but not handling the resulting IRQs.

Add a new perf_guest_info_callbacks hook (and static call) to allow KVM to
register its handler with perf when running guests with mediated PMUs.

Note, because KVM always runs guests with host IRQs enabled, there is no
danger of a PMI being delayed from the guest's perspective due to using a
regular IRQ instead of an NMI.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-9-seanjc@google.com
2 months agoperf: Add APIs to load/put guest mediated PMU context
Kan Liang [Sat, 6 Dec 2025 00:16:43 +0000 (16:16 -0800)] 
perf: Add APIs to load/put guest mediated PMU context

Add exported APIs to load/put a guest mediated PMU context.  KVM will
load the guest PMU shortly before VM-Enter, and put the guest PMU shortly
after VM-Exit.

On the perf side of things, schedule out all exclude_guest events when the
guest context is loaded, and schedule them back in when the guest context
is put.  I.e. yield the hardware PMU resources to the guest, by way of KVM.

Note, perf is only responsible for managing host context.  KVM is
responsible for loading/storing guest state to/from hardware.

[sean: shuffle patches around, write changelog]
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-8-seanjc@google.com
2 months agoperf: Add a EVENT_GUEST flag
Kan Liang [Sat, 6 Dec 2025 00:16:42 +0000 (16:16 -0800)] 
perf: Add a EVENT_GUEST flag

Current perf doesn't explicitly schedule out all exclude_guest events
while the guest is running. There is no problem with the current
emulated vPMU. Because perf owns all the PMU counters. It can mask the
counter which is assigned to an exclude_guest event when a guest is
running (Intel way), or set the corresponding HOSTONLY bit in evsentsel
(AMD way). The counter doesn't count when a guest is running.

However, either way doesn't work with the introduced mediated vPMU.
A guest owns all the PMU counters when it's running. The host should not
mask any counters. The counter may be used by the guest. The evsentsel
may be overwritten.

Perf should explicitly schedule out all exclude_guest events to release
the PMU resources when entering a guest, and resume the counting when
exiting the guest.

It's possible that an exclude_guest event is created when a guest is
running. The new event should not be scheduled in as well.

The ctx time is shared among different PMUs. The time cannot be stopped
when a guest is running. It is required to calculate the time for events
from other PMUs, e.g., uncore events. Add timeguest to track the guest
run time. For an exclude_guest event, the elapsed time equals
the ctx time - guest time.
Cgroup has dedicated times. Use the same method to deduct the guest time
from the cgroup time as well.

[sean: massage comments]
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-7-seanjc@google.com
2 months agoperf: Clean up perf ctx time
Kan Liang [Sat, 6 Dec 2025 00:16:41 +0000 (16:16 -0800)] 
perf: Clean up perf ctx time

The current perf tracks two timestamps for the normal ctx and cgroup.
The same type of variables and similar codes are used to track the
timestamps. In the following patch, the third timestamp to track the
guest time will be introduced.
To avoid the code duplication, add a new struct perf_time_ctx and factor
out a generic function update_perf_time_ctx().

No functional change.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-6-seanjc@google.com
2 months agoperf: Add APIs to create/release mediated guest vPMUs
Kan Liang [Sat, 6 Dec 2025 00:16:40 +0000 (16:16 -0800)] 
perf: Add APIs to create/release mediated guest vPMUs

Currently, exposing PMU capabilities to a KVM guest is done by emulating
guest PMCs via host perf events, i.e. by having KVM be "just" another user
of perf.  As a result, the guest and host are effectively competing for
resources, and emulating guest accesses to vPMU resources requires
expensive actions (expensive relative to the native instruction).  The
overhead and resource competition results in degraded guest performance
and ultimately very poor vPMU accuracy.

To address the issues with the perf-emulated vPMU, introduce a "mediated
vPMU", where the data plane (PMCs and enable/disable knobs) is exposed
directly to the guest, but the control plane (event selectors and access
to fixed counters) is managed by KVM (via MSR interceptions).  To allow
host perf usage of the PMU to (partially) co-exist with KVM/guest usage
of the PMU, KVM and perf will coordinate to a world switch between host
perf context and guest vPMU context near VM-Enter/VM-Exit.

Add two exported APIs, perf_{create,release}_mediated_pmu(), to allow KVM
to create and release a mediated PMU instance (per VM).  Because host perf
context will be deactivated while the guest is running, mediated PMU usage
will be mutually exclusive with perf analysis of the guest, i.e. perf
events that do NOT exclude the guest will not behave as expected.

To avoid silent failure of !exclude_guest perf events, disallow creating a
mediated PMU if there are active !exclude_guest events, and on the perf
side, disallowing creating new !exclude_guest perf events while there is
at least one active mediated PMU.

Exempt PMU resources that do not support mediated PMU usage, i.e. that are
outside the scope/view of KVM's vPMU and will not be swapped out while the
guest is running.

Guard mediated PMU with a new kconfig to help readers identify code paths
that are unique to mediated PMU support, and to allow for adding arch-
specific hooks without stubs.  KVM x86 is expected to be the only KVM
architecture to support a mediated PMU in the near future (e.g. arm64 is
trending toward a partitioned PMU implementation), and KVM x86 will select
PERF_GUEST_MEDIATED_PMU unconditionally, i.e. won't need stubs.

Immediately select PERF_GUEST_MEDIATED_PMU when KVM x86 is enabled so that
all paths are compile tested.  Full KVM support is on its way...

[sean: add kconfig and WARNing, rewrite changelog, swizzle patch ordering]
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-5-seanjc@google.com
2 months agoperf: Move security_perf_event_free() call to __free_event()
Sean Christopherson [Sat, 6 Dec 2025 00:16:39 +0000 (16:16 -0800)] 
perf: Move security_perf_event_free() call to __free_event()

Move the freeing of any security state associated with a perf event from
_free_event() to __free_event(), i.e. invoke security_perf_event_free() in
the error paths for perf_event_alloc().  This will allow adding potential
error paths in perf_event_alloc() that can occur after allocating security
state.

Note, kfree() and thus security_perf_event_free() is a nop if
event->security is NULL, i.e. calling security_perf_event_free() even if
security_perf_event_alloc() fails or is never reached is functionality ok.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-4-seanjc@google.com
2 months agoperf: Add generic exclude_guest support
Kan Liang [Sat, 6 Dec 2025 00:16:38 +0000 (16:16 -0800)] 
perf: Add generic exclude_guest support

Only KVM knows the exact time when a guest is entering/exiting. Expose
two interfaces to KVM to switch the ownership of the PMU resources.

All the pinned events must be scheduled in first. Extend the
perf_event_sched_in() helper to support extra flag, e.g., EVENT_GUEST.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-3-seanjc@google.com
2 months agoperf: Skip pmu_ctx based on event_type
Kan Liang [Sat, 6 Dec 2025 00:16:37 +0000 (16:16 -0800)] 
perf: Skip pmu_ctx based on event_type

To optimize the cgroup context switch, the perf_event_pmu_context
iteration skips the PMUs without cgroup events. A bool cgroup was
introduced to indicate the case. It can work, but this way is hard to
extend for other cases, e.g. skipping non-mediated PMUs. It doesn't
make sense to keep adding bool variables.

Pass the event_type instead of the specific bool variable. Check both
the event_type and related pmu_ctx variables to decide whether skipping
a PMU.

Event flags, e.g., EVENT_CGROUP, should be cleard in the ctx->is_active.
Add EVENT_FLAGS to indicate such event flags.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-2-seanjc@google.com
2 months agoperf/x86/intel/cstate: Add Diamond Rapids support
Zide Chen [Mon, 15 Dec 2025 18:25:20 +0000 (10:25 -0800)] 
perf/x86/intel/cstate: Add Diamond Rapids support

From a C-state residency profiling perspective, Diamond Rapids is
similar to SRF and GNR, supporting core C1/C6, module C6, and
package C2/C6 residency counters.  Similar to CWF, the C1E residency
can be accessed via PMT only.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251215182520.115822-3-zide.chen@intel.com
2 months agoperf/x86/intel/cstate: Add Nova Lake support
Zide Chen [Mon, 15 Dec 2025 18:25:19 +0000 (10:25 -0800)] 
perf/x86/intel/cstate: Add Nova Lake support

Similar to Lunar Lake and Panther Lake, Nova Lake supports CC1/CC6/CC7
and PC2/PC6/PC10 residency counters; it also adds support for MC6.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251215182520.115822-2-zide.chen@intel.com
2 months agoperf/x86/intel/cstate: Add Wildcat Lake support
Zide Chen [Mon, 15 Dec 2025 18:25:18 +0000 (10:25 -0800)] 
perf/x86/intel/cstate: Add Wildcat Lake support

Wildcat Lake (WCL) is a low-power variant of Panther Lake.  From a
C-state profiling perspective, it supports the same residency counters:
CC1/CC6/CC7 and PC2/PC6/PC10.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251215182520.115822-1-zide.chen@intel.com
2 months agoLinux 6.19-rc1 v6.19-rc1
Linus Torvalds [Sun, 14 Dec 2025 04:05:07 +0000 (16:05 +1200)] 
Linux 6.19-rc1

2 months agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 14 Dec 2025 03:35:35 +0000 (15:35 +1200)] 
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the
  biggest impacts in libsas being the rollback on error handling and in
  ufs coming from a couple of error handling fixes, one causing a crash
  if it's activated before scanning and the other fixing W-LUN
  resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error

2 months agoMerge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Sun, 14 Dec 2025 03:24:10 +0000 (15:24 +1200)] 
Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "We have a patch that adds an initial set of tracepoints to the MDS
  client from Max, a fix that hardens osdmap parsing code from myself
  (marked for stable) and a few assorted fixups"

* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
  rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  libceph: make decode_pool() more resilient against corrupted osdmaps
  libceph: Amend checking to fix `make W=1` build breakage
  ceph: Amend checking to fix `make W=1` build breakage
  ceph: add trace points to the MDS client
  libceph: fix log output race condition in OSD client

2 months agoMerge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo
Linus Torvalds [Sun, 14 Dec 2025 03:21:02 +0000 (15:21 +1200)] 
Merge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo

Pull tomoyo update from Tetsuo Handa:
 "Trivial optimization"

* tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: Use local kmap in tomoyo_dump_page()

2 months agoMerge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:12:46 +0000 (06:12 +1200)] 
Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CPU hotplug fix from Ingo Molnar:

 - Fix CPU hotplug callbacks to disable interrupts on UP kernels

* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Make atomic hotplug callbacks run with interrupts disabled on UP

2 months agoMerge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:10:35 +0000 (06:10 +1200)] 
Merge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fixes from Ingo Molnar:

 - Fix NULL pointer dereference crash in the Intel PMU driver

 - Fix missing read event generation on task exit

 - Fix AMD uncore driver init error handling

 - Fix whitespace noise

* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
  perf/core: Fix missing read event generation on task exit
  perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
  perf/uprobes: Remove <space><Tab> whitespace noise

2 months agoMerge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:07:09 +0000 (06:07 +1200)] 
Merge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Fix error code in the irqchip/mchp-eic driver

 - Fix setup_percpu_irq() affinity assumptions

 - Remove the unused irq_domain_add_tree() function

* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
  irqdomain: Delete irq_domain_add_tree()
  genirq: Allow NULL affinity for setup_percpu_irq()

2 months agoMerge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:04:16 +0000 (06:04 +1200)] 
Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc core fixes from Ingo Molnar:

 - Improve bug reporting

 - Suppress W=1 format warning

 - Improve rseq scalability on Clang builds

* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Always inline rseq_debug_syscall_return()
  bug: Hush suggest-attribute=format for __warn_printf()
  bug: Let report_bug_entry() provide the correct bugaddr

2 months agoMerge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 13 Dec 2025 08:55:12 +0000 (20:55 +1200)] 
Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "There are no significant series in this small merge. Please see the
  individual changelogs for details"

[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]

* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: memfd_luo: add CONFIG_SHMEM dependency
  mm: shmem: avoid build warning for CONFIG_SHMEM=n
  ocfs2: fix memory leak in ocfs2_merge_rec_left()
  ocfs2: invalidate inode if i_mode is zero after block read
  ocfs2: avoid -Wflex-array-member-not-at-end warning
  ocfs2: convert remaining read-only checks to ocfs2_emergency_state
  ocfs2: add ocfs2_emergency_state helper and apply to setattr
  checkpatch: add uninitialized pointer with __free attribute check
  args: fix documentation to reflect the correct numbers
  ocfs2: fix kernel BUG in ocfs2_find_victim_chain
  liveupdate: luo_core: fix redundant bound check in luo_ioctl()
  ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
  fs/fat: remove unnecessary wrapper fat_max_cache()
  ocfs2: replace deprecated strcpy with strscpy
  ocfs2: check tl_used after reading it from trancate log inode
  liveupdate: luo_file: don't use invalid list iterator

2 months agoMerge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Dec 2025 08:35:41 +0000 (20:35 +1200)] 
Merge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand)
   fixes a couple of minor things in ppc land

 - "Improve folio split related functions" (Zi Yan)
   some cleanups and minorish fixes in the folio splitting code

* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
  mm: vmscan: correct nr_requested tracing in scan_folios
  MAINTAINERS: add idr core-api doc file to XARRAY
  mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
  mm: fix CONFIG_STACK_GROWSUP typo in mm.h
  mm/huge_memory: fix folio split stats counting
  mm/huge_memory: make min_order_for_split() always return an order
  mm/huge_memory: replace can_split_folio() with direct refcount calculation
  mm/huge_memory: change folio_split_supported() to folio_check_splittable()
  mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
  powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
  powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION

2 months agofile: ensure cleanup
Christian Brauner [Sat, 13 Dec 2025 07:45:23 +0000 (08:45 +0100)] 
file: ensure cleanup

Brown paper bag time. This is a silly oversight where I missed to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.

Reported-by: Chris Mason <clm@fb.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agox86/hv: Add gitignore entry for generated header file
Linus Torvalds [Sat, 13 Dec 2025 07:57:41 +0000 (19:57 +1200)] 
x86/hv: Add gitignore entry for generated header file

Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code.  But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.

Let's add the gitignore file before somebody thinks that generated
header should be committed.

Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 13 Dec 2025 05:39:28 +0000 (17:39 +1200)] 
Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm fixes from Dave Airlie:
 "These are the enqueued fixes that ended up in our fixes branch,
  nouveau mostly, along with some small fixes in other places.

  plane:
   - Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()

  ttm:
   - fix devcoredump for evicted bos

  panel:
   - Fix stack usage warning in novatek-nt35560

  nouveau:
   - alloc fwsec sb at boot to avoid s/r problems
   - fix strcpy usage
   - fix i2c encoder crash

  bridge:
   - Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83

  mgag200:
   - Fix bigendian handling in mgag200

  tilcdc:
   - Fix probe failure in tilcdc"

* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  drm/mgag200: Fix big-endian support
  drm/tilcdc: Fix removal actions in case of failed probe
  drm/ttm: Avoid NULL pointer deref for evicted BOs
  drm: nouveau: Replace sprintf() with sysfs_emit()
  drm/nouveau: fix circular dep oops from vendored i2c encoder
  drm/nouveau: refactor deprecated strcpy
  drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
  drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
  drm/nouveau/gsp: Allocate fwsec-sb at boot
  drm/panel: novatek-nt35560: avoid on-stack device structure

2 months agoMerge tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 13 Dec 2025 05:25:26 +0000 (17:25 +1200)] 
Merge tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is the weekly fixes for what is in next tree, mostly amdgpu and
  some i915, panthor and a core revert.

  core:
   - revert dumb bo 8 byte alignment

  amdgpu:
   - SI fix
   - DC reduce stack usage
   - HDMI fixes
   - VCN 4.0.5 fix
   - DP MST fix
   - DC memory allocation fix

  amdkfd:
   - SVM fix
   - Trap handler fix
   - VGPR fixes for GC 11.5

  i915:
   - Fix format string truncation warning
   - FIx runtime PM reference during fbdev BO creation

  panthor:
   - fix UAF

  renesas:
   - fix sync flag handling"

* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  Revert "drm/amd/display: Fix pbn to kbps Conversion"
  drm/amd: Fix unbind/rebind for VCN 4.0.5
  drm/i915: Fix format string truncation warning
  drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
  drm/amd/display: Improve HDMI info retrieval
  drm/amdkfd: bump minimum vgpr size for gfx1151
  drm/amd/display: shrink struct members
  drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
  drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
  drm/amdgpu: don't attach the tlb fence for SI
  drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
  drm/amdkfd: Trap handler support for expert scheduling mode
  drm/amdkfd: Use huge page size to check split svm range alignment
  drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
  drm/gem-shmem: revert the 8-byte alignment constraint
  drm/gem-dma: revert the 8-byte alignment constraint
  drm/panthor: Prevent potential UAF in group creation

2 months agoMerge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sat, 13 Dec 2025 05:15:16 +0000 (17:15 +1200)] 
Merge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull further i3c update from Alexandre Belloni:
 "We are removing a legacy API callback and having this sooner rather
  than later will help ensuring no one introduces a new driver using it.

  I've also added patches removing the "__free(...) = NULL" pattern
  because I'm sure we won't avoid people sending those following the
  mailing list discussion..."

* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: adi: Fix confusing cleanup.h syntax
  i3c: master: Fix confusing cleanup.h syntax
  i3c: master: cleanup callback .priv_xfers()
  i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()

2 months agoMerge tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Sat, 13 Dec 2025 05:09:06 +0000 (17:09 +1200)] 
Merge tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Subsystem:
   - stop setting max_user_freq from the individual drivers as this has
     not been hardware related for a while

  New drivers:
   - Andes ATCRTC100
   - Apple SMC
   - Nvidia VRS

  Drivers:
   - renesas-rtca3: add RZ/V2H support
   - tegra: add ACPI support"

* tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits)
  rtc: spacemit: MFD_SPACEMIT_P1 as dependencies
  rtc: atcrtc100: Fix signedness bug in probe()
  rtc: max31335: Fix ignored return value in set_alarm
  rtc: gamecube: Check the return value of ioremap()
  Documentation: ABI: testing: Fix "upto" typo in rtc-cdev
  rtc: Add new rtc-macsmc driver for Apple Silicon Macs
  dt-bindings: rtc: Add Apple SMC RTC
  MAINTAINERS: drop unneeded file entry in NVIDIA VRS RTC DRIVER
  rtc: isl12026: Add id_table
  rtc: renesas-rtca3: Add support for multiple reset lines
  dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2H support
  rtc: tegra: Replace deprecated SIMPLE_DEV_PM_OPS
  rtc: tegra: Add ACPI support
  rtc: tegra: Use devm_clk_get_enabled() in probe
  rtc: Kconfig: add MC34708 to mc13xxx help text
  rtc: s35390a: use u8 instead of char for register buffer
  rtc: nvvrs: add NVIDIA VRS RTC device driver
  dt-bindings: rtc: Document NVIDIA VRS RTC
  rtc: atcrtc100: Add ATCRTC100 RTC driver
  MAINTAINERS: Add entry for ATCRTC100 RTC driver
  ...

2 months agoMerge tag 'pwm/for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 04:41:50 +0000 (16:41 +1200)] 
Merge tag 'pwm/for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm fix from Uwe Kleine-König:
 "Fix missing th1520 Kconfig dependencies

  This tightens the dependency for the new pwm driver written in Rust to
  make build bots and obviously also users happy"

* tag 'pwm/for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: th1520: Fix missing Kconfig dependencies

2 months agoMerge tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Dec 2025 04:36:57 +0000 (16:36 +1200)] 
Merge tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:

 - fix spinlock op type after conversion to lock guards

 - fix a memory leak in error path in gpio-regmap

 - Kconfig fixes in GPIO drivers

 - add a GPIO ACPI quirk for Dell Precision 7780

 - set of fixes for shared GPIO management

* tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: shared: make locking more fine-grained
  gpio: shared: fix auxiliary device cleanup order
  gpio: shared: check if a reference is populated before cleaning its resources
  gpio: shared: fix NULL-pointer dereference in teardown path
  gpio: shared: ignore disabled nodes when traversing the device-tree
  gpiolib: acpi: Add quirk for Dell Precision 7780
  gpio: tb10x: fix OF_GPIO dependency
  gpio: qixis: select CONFIG_REGMAP_MMIO
  gpio: regmap: Fix memleak in error path in gpio_regmap_register()
  gpio: mmio: fix bad guard conversion

2 months agoMerge tag 'pci-v6.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 13 Dec 2025 04:29:22 +0000 (16:29 +1200)] 
Merge tag 'pci-v6.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fix from Bjorn Helgaas:

 - Initialize rzg3s_pcie_msi_irq() MSI status bitmap before use (Claudiu
   Beznea)

* tag 'pci-v6.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: rzg3s-host: Initialize MSI status bitmap before use

2 months agoMerge tag 'soundwire-6.19-rc1_updated' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Dec 2025 04:26:55 +0000 (16:26 +1200)] 
Merge tag 'soundwire-6.19-rc1_updated' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire updates from Vinod Koul:

 - Support for multiple sections in a BPT stream

 - Align DMA frame with BPT frames

 - Qualcomm support for v3.1.0 controllers

* tag 'soundwire-6.19-rc1_updated' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: intel_ace2x: handle multi BPT sections
  soundwire: pass sdw_bpt_section to cdns BPT helpers
  soundwire: introduce BPT section
  soundwire: intel_ace2x: add fake frame to BRA read command
  soundwire: cadence_master: add fake_size parameter to sdw_cdns_prepare_read_dma_buffer
  ASoC: SOF: Intel: export hda_sdw_bpt_get_buf_size_aligment
  soundwire: cadence: export sdw_cdns_bpt_find_bandwidth
  soundwire: cadence_master: set data_per_frame as frame capability
  soundwire: only compute BPT stream in sdw_compute_dp0_port_params
  soundwire: cadence_master: make frame index trace more readable
  soundwire: qcom: adding support for v3.1.0
  dt-bindings: soundwire: qcom: Document v3.1.0 version of IP block
  soundwire: qcom: prepare for v3.x
  soundwire: qcom: deprecate qcom,din/out-ports
  dt-bindings: soundwire: qcom: deprecate qcom,din/out-ports
  soundwire: qcom: remove unused rd_fifo_depth
  of: base: Add of_property_read_u8_index

2 months agoMerge tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 13 Dec 2025 04:09:10 +0000 (16:09 +1200)] 
Merge tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "The only slightly large change is the enablement of CIX HD-audio
  controller, which took a bit time to be cooked up, while most of other
  changes are device-specific small trivial fixes:

   - Default disablement of the kconfig for decades old pre-release
     alsa-lib PCM API; it's only the default config value change, so it
     can't lead to any regressions for the existing setups

   - Support for CIX HD-audio controller

   - A few ASoC ACP fixes

   - Fixes for ASoC cirrus, bcm, wcd, qcom, ak platforms

   - Trivial hardening for FireWire and USB-audio

   - HD-audio Intel binding fix and quirks"

* tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
  ALSA: hda/tas2781: Add new quirk for HP new project
  ALSA: hda: cix-ipbloq: Use modern PM ops
  ALSA: hda: intel-dsp-config: Prefer legacy driver as fallback
  ASoC: amd: acp: update tdm channels for specific DAI
  ASoC: cs35l56: Fix incorrect select SND_SOC_CS35L56_CAL_SYSFS_COMMON
  ALSA: firewire-motu: add bounds check in put_user loop for DSP events
  ASoC: cs35l41: Always return 0 when a subsystem ID is found
  ALSA: uapi: Fix typo in asound.h comment
  ALSA: Do not build obsolete API
  ALSA: hda: add CIX IPBLOQ HDA controller support
  ALSA: hda/core: add addr_offset field for bus address translation
  ALSA: hda: dt-bindings: add CIX IPBLOQ HDA controller support
  ALSA: hda/realtek: Add support for ASUS UM3406GA
  ALSA: hda/realtek: Add support for HP Turbine Laptops
  ALSA: usb-audio: Initialize status1 to fix uninitialized symbol errors
  ALSA: firewire-motu: fix buffer overflow in hwdep read for DSP events
  ALSA: hda: cs35l41: Fix NULL pointer dereference in cs35l41_hda_read_acpi()
  ASoC: cros_ec_codec: Remove unnecessary selection of CRYPTO
  ASoc: qcom: q6afe: fix bad guard conversion
  ASoC: rockchip: Fix Wvoid-pointer-to-enum-cast warning (again)
  ...

2 months agoMerge tag 'drm-misc-fixes-2025-12-10' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Sat, 13 Dec 2025 00:54:28 +0000 (10:54 +1000)] 
Merge tag 'drm-misc-fixes-2025-12-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.19-rc1:
- Fix stack usage warning in novatek-nt35560.
- Fix s/r, i2c issues in nouveau and update string handling.
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83.
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties().
- Fix devcoredump crash on reading evicted bo's.
- Fix bigendian handling in mgag200.
- Fix probe failure in tilcdc.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/6c371dc1-08bf-4a34-895c-9ef348b6061b@linux.intel.com
2 months agoi3c: adi: Fix confusing cleanup.h syntax
Krzysztof Kozlowski [Mon, 8 Dec 2025 02:07:52 +0000 (03:07 +0100)] 
i3c: adi: Fix confusing cleanup.h syntax

Initializing automatic __free variables to NULL without need (e.g.
branches with different allocations), followed by actual allocation is
in contrary to explicit coding rules guiding cleanup.h:

"Given that the "__free(...) = NULL" pattern for variables defined at
the top of the function poses this potential interdependency problem the
recommendation is to always define and assign variables in one statement
and not group variable definitions at the top of the function when
__free() is used."

Code does not have a bug, but is less readable and uses discouraged
coding practice, so fix that by moving declaration to the place of
assignment.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251208020750.4727-4-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agoi3c: master: Fix confusing cleanup.h syntax
Krzysztof Kozlowski [Mon, 8 Dec 2025 02:07:51 +0000 (03:07 +0100)] 
i3c: master: Fix confusing cleanup.h syntax

Initializing automatic __free variables to NULL without need (e.g.
branches with different allocations), followed by actual allocation is
in contrary to explicit coding rules guiding cleanup.h:

"Given that the "__free(...) = NULL" pattern for variables defined at
the top of the function poses this potential interdependency problem the
recommendation is to always define and assign variables in one statement
and not group variable definitions at the top of the function when
__free() is used."

Code does not have a bug, but is less readable and uses discouraged
coding practice, so fix that by moving declaration to the place of
assignment.

Not that other existing usage of __free() in this context is a corret
exception initialized to NULL, because the actual allocation is branched
in if().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251208020750.4727-3-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agoi3c: master: cleanup callback .priv_xfers()
Frank Li [Wed, 3 Dec 2025 20:45:51 +0000 (15:45 -0500)] 
i3c: master: cleanup callback .priv_xfers()

Remove the .priv_xfers() callback from the framework after all master
controller drivers have switched to use the new .i3c_xfers() callback.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Link: https://patch.msgid.link/20251203-i3c_xfer_cleanup_master-v2-2-7dd94d04ee2d@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agoMerge tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuaca...
Linus Torvalds [Fri, 12 Dec 2025 17:44:03 +0000 (05:44 +1200)] 
Merge tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Add basic LoongArch32 support

   Note: Build infrastructures of LoongArch32 are not enabled yet,
   because we need to adjust irqchip drivers and wait for GNU toolchain
   be upstream first.

 - Select HAVE_ARCH_BITREVERSE in Kconfig

 - Fix build and boot for CONFIG_RANDSTRUCT

 - Correct the calculation logic of thread_count

 - Some bug fixes and other small changes

* tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits)
  LoongArch: Adjust default config files for 32BIT/64BIT
  LoongArch: Adjust VDSO/VSYSCALL for 32BIT/64BIT
  LoongArch: Adjust misc routines for 32BIT/64BIT
  LoongArch: Adjust user accessors for 32BIT/64BIT
  LoongArch: Adjust system call for 32BIT/64BIT
  LoongArch: Adjust module loader for 32BIT/64BIT
  LoongArch: Adjust time routines for 32BIT/64BIT
  LoongArch: Adjust process management for 32BIT/64BIT
  LoongArch: Adjust memory management for 32BIT/64BIT
  LoongArch: Adjust boot & setup for 32BIT/64BIT
  LoongArch: Adjust common macro definitions for 32BIT/64BIT
  LoongArch: Add adaptive CSR accessors for 32BIT/64BIT
  LoongArch: Add atomic operations for 32BIT/64BIT
  LoongArch: Add new PCI ID for pci_fixup_vgadev()
  LoongArch: Add and use some macros for AVEC
  LoongArch: Correct the calculation logic of thread_count
  LoongArch: Use unsigned long for _end and _text
  LoongArch: Use __pmd()/__pte() for swap entry conversions
  LoongArch: Fix arch_dup_task_struct() for CONFIG_RANDSTRUCT
  LoongArch: Fix build errors for CONFIG_RANDSTRUCT
  ...

2 months agoMerge tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 12 Dec 2025 10:08:09 +0000 (22:08 +1200)] 
Merge tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fixes from Eric Biggers:
 "Fixes for some recent regressions as well as some longstanding issues:

   - Fix incorrect output from the arm64 NEON implementation of GHASH

   - Merge the ksimd scopes in the arm64 XTS code to reduce stack usage

   - Roll up the BLAKE2b round loop on 32-bit kernels to greatly reduce
     code size and stack usage

   - Add missing RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS dependency

   - Fix chacha-riscv64-zvkb.S to not use frame pointer for data"

* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  crypto: arm64/ghash - Fix incorrect output from ghash-neon
  crypto/arm64: sm4/xts - Merge ksimd scopes to reduce stack bloat
  crypto/arm64: aes/xts - Use single ksimd scope to reduce stack bloat
  lib/crypto: blake2s: Replace manual unrolling with unrolled_full
  lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
  lib/crypto: riscv: Depend on RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
  lib/crypto: riscv/chacha: Avoid s0/fp register

2 months agoMerge tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 12 Dec 2025 10:04:18 +0000 (22:04 +1200)] 
Merge tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - Always initialize DMA state, fixing a potentially nasty issue on the
   block side

 - btrfs zoned write fix with cached zone reports

 - Fix corruption issues in bcache with chained bio's, and further make
   it clear that the chained IO handler is simply a marker, it's not
   code meant to be executed

 - Kill old code dealing with synchronous IO polling in the block layer,
   that has been dead for a long time. Only async polling is supported
   these days

 - Fix a lockdep issue in tag_set management, moving it to RCU

 - Fix an issue with ublks bio_vec iteration

 - Don't unconditionally enforce blocking issue of ublk control
   commands, allow some of them with non-blocking issue as they
   do not block

* tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  blk-mq-dma: always initialize dma state
  blk-mq: delete task running check in blk_hctx_poll()
  block: fix cached zone reports on devices with native zone append
  block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock
  ublk: don't mutate struct bio_vec in iteration
  block: prohibit calls to bio_chain_endio
  bcache: fix improper use of bi_end_io
  ublk: allow non-blocking ctrl cmds in IO_URING_F_NONBLOCK issue

2 months agoMerge tag 'io_uring-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 12 Dec 2025 10:01:32 +0000 (22:01 +1200)] 
Merge tag 'io_uring-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fix from Jens Axboe:
 "Single fix for io_uring headed to stable, fixing an issue introduced
  with the min_wait support earlier this year, where SQPOLL didn't get
  correctly woken if an event arrived once the event waiting has
  finished the min_wait portion.

  As we already have regression tests for this added and people
  reporting new failures there, let's get this one flushed out
  so it can bubble back down to stable as well"

* tag 'io_uring-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: fix min_wait wakeups for SQPOLL

2 months agoMerge tag 'v6.19-rc-smb3-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Fri, 12 Dec 2025 09:59:19 +0000 (21:59 +1200)] 
Merge tag 'v6.19-rc-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - minor cleanup

 - minor update to comment to avoid confusion about fs type

* tag 'v6.19-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
  smb/server: add comment to FileSystemName of FileFsAttributeInformation
  smb/server: remove unused nterr.h
  smb/server: rename include guard in smb_common.h

2 months agoMerge tag 'v6.19-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 12 Dec 2025 09:56:25 +0000 (21:56 +1200)] 
Merge tag 'v6.19-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Fix incorrect error code defines

 - Add missing error code definitions

 - Add parenthesis around NT_STATUS code defines to fix checkpatch
   warnings

 - Remove some duplicated protocol definitions, moving to common code
   shared by client and server

 - Add missing protocol documentation reference (for change notify)

 - Correct struct definition (for duplicate_extents_to_file_ex)

* tag 'v6.19-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb/client: remove DeviceType Flags and Device Characteristics definitions
  smb: move File Attributes definitions into common/fscc.h
  smb: update struct duplicate_extents_to_file_ex
  smb: move file_notify_information to common/fscc.h
  smb: move SMB2 Notify Action Flags into common/smb2pdu.h
  smb: move notify completion filter flags into common/smb2pdu.h
  smb/client: add parentheses to NT error code definitions containing bitwise OR operator
  smb: add documentation references for smb2 change notify definitions
  smb/client: add 4 NT error code definitions
  smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value
  smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value
  smb/client: fix NT_STATUS_NO_DATA_DETECTED value

2 months agoMerge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Fri, 12 Dec 2025 09:52:42 +0000 (21:52 +1200)] 
Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Bugfixes:
   - Fix 'nlink' attribute update races when unlinking a file
   - Add missing initialisers for the directory verifier in various
     places
   - Don't regress the NFSv4 open state due to misordered racing replies
   - Ensure the NFSv4.x callback server uses the correct transport
     connection
   - Fix potential use-after-free races when shutting down the NFSv4.x
     callback server
   - Fix a pNFS layout commit crash
   - Assorted fixes to ensure correct propagation of mount options when
     the client crosses a filesystem boundary and triggers the VFS
     automount code
   - More localio fixes

  Features and cleanups:
   - Add initial support for basic directory delegations
   - SunRPC back channel code cleanups"

* tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations
  nfs/localio: remove 61 byte hole from needless ____cacheline_aligned
  nfs/localio: remove alignment size checking in nfs_is_local_dio_possible
  NFS: Fix up the automount fs_context to use the correct cred
  NFS: Fix inheritance of the block sizes when automounting
  NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags
  Revert "nfs: ignore SB_RDONLY when mounting nfs"
  Revert "nfs: clear SB_RDONLY before getting superblock"
  Revert "nfs: ignore SB_RDONLY when remounting nfs"
  NFS: Add a module option to disable directory delegations
  NFS: Shortcut lookup revalidations if we have a directory delegation
  NFS: Request a directory delegation during RENAME
  NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK
  NFS: Add support for sending GDD_GETATTR
  NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid
  NFSv4.1: protect destroying and nullifying bc_serv structure
  SUNRPC: new helper function for stopping backchannel server
  SUNRPC: cleanup common code in backchannel request
  NFSv4.1: pass transport for callback shutdown
  NFSv4: ensure the open stateid seqid doesn't go backwards
  ...

2 months agorseq: Always inline rseq_debug_syscall_return()
Eric Dumazet [Fri, 5 Dec 2025 10:07:53 +0000 (10:07 +0000)] 
rseq: Always inline rseq_debug_syscall_return()

To get the full benefit of:

  eaa9088d568c ("rseq: Use static branch for syscall exit debug when GENERIC_IRQ_ENTRY=y")

clang needs an __always_inline instead of a plain inline qualifier:

$ for i in {1..10}; do taskset -c 4 perf5 bench syscall basic -l 100000000 | grep "ops/sec"; done

 Before      After
ops/sec  15424491    15872221   +2.9%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251205100753.4073221-1-edumazet@google.com
2 months agobug: Hush suggest-attribute=format for __warn_printf()
Brendan Jackman [Sun, 7 Dec 2025 03:53:18 +0000 (03:53 +0000)] 
bug: Hush suggest-attribute=format for __warn_printf()

Recent additions to this function cause GCC 14.3.0 to get excited
(W=1) and suggest a missing attribute:

lib/bug.c: In function '__warn_printf':
lib/bug.c:187:25: error: function '__warn_printf' be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
  187 |                         vprintk(fmt, *args);
      |                         ^~~~~~~

Disable the diagnostic locally, following the pattern used for stuff
like va_format().

Fixes: 5c47b7f3d1a9 ("bug: Add BUG_FORMAT_ARGS infrastructure")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251207-warn-printf-gcc-v1-1-b597d612b94b@google.com
2 months agobug: Let report_bug_entry() provide the correct bugaddr
Heiko Carstens [Mon, 8 Dec 2025 20:06:58 +0000 (21:06 +0100)] 
bug: Let report_bug_entry() provide the correct bugaddr

report_bug_entry() always provides zero for bugaddr but could easily
extract the correct address from the provided bug_entry. Just do that to
have proper warning messages.

E.g. adding an artificial:

  void foo(void) { WARN_ONCE(1, "bar"); }

function generates this warning message:

  WARNING: arch/s390/kernel/setup.c:1017 at 0x0, CPU#0: swapper/0/0
                                            ^^^

With the correct bug address this changes to:

  WARNING: arch/s390/kernel/setup.c:1017 at foo+0x1c/0x40, CPU#0: swapper/0/0
                                            ^^^^^^^^^^^^^

Fixes: 7d2c27a0ec5e ("bug: Add report_bug_entry()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251208200658.3431511-1-hca@linux.ibm.com
2 months agoMerge tag 'drm-intel-next-fixes-2025-12-12' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Fri, 12 Dec 2025 08:57:43 +0000 (18:57 +1000)] 
Merge tag 'drm-intel-next-fixes-2025-12-12' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

drm/i915 fixes for v6.19-rc1:
- Fix format string truncation warning
- FIx runtime PM reference during fbdev BO creation

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/281309f78560bcceebac8d5c0511efe66baf641c@intel.com
2 months agoperf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
Evan Li [Fri, 12 Dec 2025 08:49:43 +0000 (16:49 +0800)] 
perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()

handle_pmi_common() may observe an active bit set in cpuc->active_mask
while the corresponding cpuc->events[] entry has already been cleared,
which leads to a NULL pointer dereference.

This can happen when interrupt throttling stops all events in a group
while PEBS processing is still in progress. perf_event_overflow() can
trigger perf_event_throttle_group(), which stops the group and clears
the cpuc->events[] entry, but the active bit may still be set when
handle_pmi_common() iterates over the events.

The following recent fix:

  7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")

moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and
relied on cpuc->active_mask/pebs_enabled checks. However,
handle_pmi_common() can still encounter a NULL cpuc->events[] entry
despite the active bit being set.

Add an explicit NULL check on the event pointer before using it,
to cover this legitimate scenario and avoid the NULL dereference crash.

Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Reported-by: kitta <kitta@linux.alibaba.com>
Co-developed-by: kitta <kitta@linux.alibaba.com>
Signed-off-by: Evan Li <evan.li@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
2 months agoMerge tag 'amd-drm-fixes-6.19-2025-12-11' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 11 Dec 2025 23:26:26 +0000 (09:26 +1000)] 
Merge tag 'amd-drm-fixes-6.19-2025-12-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.19-2025-12-11:

amdgpu:
- SI fix
- DC reduce stack usage
- HDMI fixes
- VCN 4.0.5 fix
- DP MST fix
- DC memory allocation fix

amdkfd:
- SVM fix
- Trap handler fix
- VGPR fixes for GC 11.5

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251211195600.1641924-1-alexander.deucher@amd.com
2 months agoMerge tag 'drm-misc-next-fixes-2025-12-10' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Thu, 11 Dec 2025 23:20:22 +0000 (09:20 +1000)] 
Merge tag 'drm-misc-next-fixes-2025-12-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next-fixes for v6.19-rc1:
- Fix uaf in panthor.
- Revert 8 byte alignment constraint for pitch in dumb bo's.
- Fix DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC handling renasas.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/a82c2a2a-314f-403b-85bf-9b3ee09b903c@linux.intel.com
2 months agoALSA: hda/tas2781: Add new quirk for HP new project
Baojun Xu [Thu, 11 Dec 2025 09:24:26 +0000 (17:24 +0800)] 
ALSA: hda/tas2781: Add new quirk for HP new project

Add new vendor_id and subsystem_id in quirk for HP new project (NexusX).

Signed-off-by: Baojun Xu <baojun.xu@ti.com>
Link: https://patch.msgid.link/20251211092427.1648-1-baojun.xu@ti.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoALSA: hda: cix-ipbloq: Use modern PM ops
Nathan Chancellor [Thu, 11 Dec 2025 01:50:03 +0000 (10:50 +0900)] 
ALSA: hda: cix-ipbloq: Use modern PM ops

When building without CONFIG_PM_SLEEP, there are several warnings (or
errors with CONFIG_WERROR=y / W=e) from the cix-ipbloq driver:

  sound/hda/controllers/cix-ipbloq.c:378:12: error: 'cix_ipbloq_hda_runtime_resume' defined but not used [-Werror=unused-function]
    378 | static int cix_ipbloq_hda_runtime_resume(struct device *dev)
        |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  sound/hda/controllers/cix-ipbloq.c:362:12: error: 'cix_ipbloq_hda_runtime_suspend' defined but not used [-Werror=unused-function]
    362 | static int cix_ipbloq_hda_runtime_suspend(struct device *dev)
        |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  sound/hda/controllers/cix-ipbloq.c:349:12: error: 'cix_ipbloq_hda_resume' defined but not used [-Werror=unused-function]
    349 | static int cix_ipbloq_hda_resume(struct device *dev)
        |            ^~~~~~~~~~~~~~~~~~~~~
  sound/hda/controllers/cix-ipbloq.c:336:12: error: 'cix_ipbloq_hda_suspend' defined but not used [-Werror=unused-function]
    336 | static int cix_ipbloq_hda_suspend(struct device *dev)
        |            ^~~~~~~~~~~~~~~~~~~~~~

When CONFIG_PM and CONFIG_PM_SLEEP are unset, SET_SYSTEM_SLEEP_PM_OPS()
and SET_RUNTIME_PM_OPS() evaluate to nothing, so these functions appear
unused to the compiler in this configuration.

Use the modern SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS macros to resolve
these warnings, which is what they are intended to do. Additionally,
wrap &cix_ipbloq_hda_pm in pm_ptr() to ensure the compiler can drop the
entire structure when CONFIG_PM is unset.

Fixes: d91e9bd10125 ("ALSA: hda: add CIX IPBLOQ HDA controller support")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251211-hda-cix-ipbloq-modern-pm-ops-v1-1-c7a5580af021@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoMerge tag 'asoc-fix-v6.19-merge-window' of https://git.kernel.org/pub/scm/linux/kerne...
Takashi Iwai [Thu, 11 Dec 2025 08:34:00 +0000 (09:34 +0100)] 
Merge tag 'asoc-fix-v6.19-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.19

A small pile of fixes that came in during the merge window, it's all
fairly standard device specific stuff.

2 months agosmb/client: remove DeviceType Flags and Device Characteristics definitions
ZhangGuoDong [Fri, 14 Nov 2025 08:41:20 +0000 (16:41 +0800)] 
smb/client: remove DeviceType Flags and Device Characteristics definitions

These definitions are already in common/smb2pdu.h, so remove the duplicated
ones from the client.

Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 months agosmb: move File Attributes definitions into common/fscc.h
ChenXiaoSong [Sun, 30 Nov 2025 13:17:15 +0000 (21:17 +0800)] 
smb: move File Attributes definitions into common/fscc.h

These definitions are specified in MS-FSCC 2.6, so move them into fscc.h.

Modify the following places:

  - FILE_ATTRIBUTE__MASK -> FILE_ATTRIBUTE_MASK
  - Update FILE_ATTRIBUTE_MASK value
  - cpu_to_le32(constant) -> cpu_to_le32(MACRO DEFINITION)

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 months agosmb: update struct duplicate_extents_to_file_ex
ChenXiaoSong [Mon, 1 Dec 2025 06:59:38 +0000 (14:59 +0800)] 
smb: update struct duplicate_extents_to_file_ex

Add the missing field to the structure (see MS-FSCC 2.3.9.2), and correct
the section number in the documentation reference.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>