From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 19 Jun 2026 15:56:49 +0000 (-0700)
Subject: Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=c98d767b34574be82b74d77d02264a830ae1cadd;p=thirdparty%2Flinux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "arm64:

     This is a bit of an odd merge window on the KVM/arm64 front. There
     is absolutely no new feature in the pull request. It is purely
     fixes, because it is simply becoming too hard to review new stuff
     when so many AI-fuelled fixes hit the list.

   - Significant cleanup of the vgic-v5 PPI support which was merged in
     7.1. This makes the code more maintainable, and squashes a couple
     of bugs in the meantime

   - Set of fixes for the handling of the MMU in an NV context,
     particularly VNCR-triggered faults. S1POE support is fixed as well

   - Large set of pKVM fixes, mostly addressing recurring issues around
     hypervisor tracking of donated pages in obscure cases where the
     donation could fail and leave things in a bizarre state

   - Fixes for the so-called "lazy vgic init", which resulted in
     sleeping operations in non-preemptible sections. This turned out to
     be far more invasive than initially expected..

   - Reduce the overhead of L1/L2 context switch by not touching the FP
     registers

   - Fix the way non-implemented page sizes are dealt with when a guest
     insist on using them for S2 translation

   - The usual set of low-impact fixes and cleanups all over the map

  Loongarch:

   - On a request for lazy FPU load, load all FPU state that the VM
     supports instead of enabling only the part (FPU, LSX or LASX) that
     caused the FPU load request

   - Some enhancements about interrupt injection

   - Some bug fixes and other small changes

  RISC-V:

   - Batch G-stage TLB flushes for GPA range based page table updates

   - Convert HGEI line management to fully per-HART

   - Fix missing CSR dirty marking when FWFT state updated via ONE_REG

   - Fix stale FWFT feature exposure to Guest/VM

   - Speed up dirty logging write faults using MMU rwlock and atomic PTE
     updates using cmpxchg() for permission-only changes

   - Use flexible array for APLIC IRQ state

   - Use kvm_slot_dirty_track_enabled() for logging enable check on a
     memslot

   - Avoid skipping valid pages in kvm_riscv_gstage_wp_range()

   - Avoid skipping valid pages in kvm_riscv_gstage_unmap_range()

   - Use endian-specific __lelong for NACL shared memory

  S390:

   - KVM_PRE_FAULT_MEMORY support

   - Support for 2G hugepages

   - Support for the ASTFLEIE 2 facility

   - Support for fast inject using kvm_arch_set_irq_inatomic

   - Fix potential leak of uninitialized bytes

   - A few more misc gmap fixes

  x86:

   - Generic support for the more granular permissions allowed by EPT,
     namely "read" (which was previously usurping the U bit) and
     separate execution bits for kernel and userspace

   - Do not assume that all page tables start with U=1/W=1/NX=0 at the
     root, as AMD GMET needs to have U=0 at the root

   - Introduce common assembly macros for use within Intel and AMD
     vendor-specific vmentry code. This touches the SPEC_CTRL handling,
     which is now entirely done in assembly for Intel (by reusing the
     AMD code that already existed), and register save/restore which
     uses some macro magic to compute the offsets in the struct. Both of
     these are preparatory changes for upcoming APX support

   - Clean up KVM's register tracking and storage, primarily to prepare
     for APX support, which expands the maximum number of GPRs from 16
     to 32

   - Keep a single copy of the PDPTRs rather than two, since
     architecturally there is just one

   - Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor
     code gets a chance to handle things like reaping the PML buffer

   - Update KVM's view of PV async enabling if and only if the MSR write
     fully succeeds

   - Fix a variety of issues where the emulator doesn't honor
     guest-debug state, and clean up related code along the way

   - Synthesize EPT Violation and #NPF "error code" bits when injecting
     faults into L1 that didn't originate in hardware (in which case the
     VMCS/VMCB doesn't hold relevant information)

   - Add support for virtualizing (well, emulating) AMD's flavor of
     CPL>0 CPUID faulting

   - Clean up the GPR APIs so that KVM's use of "raw" is consistent, and
     fix a variety of minor bugs along the way

   - Fix an OOB memory access due to not checking the VP ID when
     handling a Hyper-V PV TLB flush for L2

   - Fix a bug in the mediated PMU's handling of fixed counters that
     allowed the guest to bypass the PMU event filter

   - Allow userspace to return EAGAIN when handling SNP and TDX
     hypercalls, so the KVM can forward a "retry" status code to the
     guest, and reserve all unused error codes for future usage

   - Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific
     logic as possible into the TDX code, and to funnel (almost) all
     S-EPT updates into a single chokepoint. The motivation is largely
     to prepare for upcoming Dynamic PAMT support, but the cleanups are
     nice to have on their own

   - Plug a hole in shadow page table handling, where KVM fails to
     recursively zap nested EPT/NPT shadow page tables when the nested
     hypervisor tears down its own EPT/NPT page tables from the bottom
     up

  x86 (Intel):

   - Support for nested MBEC (Mode-Based Execute Control), see above in
     the generic section; also run with MBEC enabled even for non-nested
     mode

   - Use the kernel's "enum pg_level" in the TDX APIs instead of the
     TDX-Module's level definitions (which are 0-based)

   - Rework the TDX memory APIs to not require/assume that guest memory
     is backed by "struct page" (in prepartion for guest_memfd hugepage
     support)

   - Fix a largely benign bug where KVM TDX would incorrectly state it
     could emulate several x2APIC MSRs

   - Use the "safe" WRMSR API when proxying LBR MSR writes as the
     to-be-written value is guest controlled and completely unvalidated

  x86 (AMD):

   - Support for nested GMET (Guest Mode Execution Trap), see above in
     the generic section; also run with GMET enabled even for non-nested
     mode

   - Fixes and minor cleanups to GHCB handling, on top of the earlier
     work already merged into 7.1-rc

   - Ensure KVM's copy of CR0 and CR3 are up-to-date prior to invoking
     fastpath handlers

   - Add support for virtualizing gPAT (KVM previously just used L1's
     PAT when running L2)

   - Fix goofs where KVM mishandles side effects (e.g. single-step and
     PMC updates) when emulating VMRUN

   - Fix a variety of bugs in AVIC's handling of x2APIC MSR
     interception, most notably where KVM didn't disable interception of
     IRR, ISR, and TMR regs

   - Add support for virtualizing Host-Only/Guest-Only bits in the
     mediated PMU

   - Don't advertise support for unusable VM types, and account for VM
     types that are disabled by firmware, e.g. to mitigate security
     vulnerabilities

   - Rewrite the SEV {en,de}crypt debug ioctls as they were riddle with
     bugs and unnecessarily complicated, and add comprehensive tests

   - Clean up and deduplicate the SEV page pinning code

   - Fix minor goofs related to writing back CPUID information after
     firmware rejects a CPUID page for an SNP vCPU

  Generic:

   - Rename invalidate_begin() to invalidate_start() throughout KVM to
     follow the kernel's nomenclature, e.g. for mmu_notifiers

   - Use guard() to cleanup up various KVM+VFIO flows

   - Minor cleanups

  guest_memfd:

   - Return -EEXIST instead of -EINVAL if userspace attempts to bind a
     gmem range to multiple memslots, and fix the test that was supposed
     to ensure KVM returns -EEXIST

   - Treat memslot binding offsets and sizes as unsigned values to fix a
     bug where KVM interprets a large "offset + size" as a negative
     value and allows a nonsensical offset

   - Use the inode number instead of the page offset for the NUMA
     interleaving index to fix a bug where the effective index would
     jump by two for consecutive pages (the caller also adds in the page
     offset)

  Selftests:

   - Randomize the dirty log test's delay when reaping the bitmap on the
     first pass, as always waiting only 1ms hid a KVM RISC-V bug as the
     test reaped the bitmap before KVM could build up enough state to
     hit the bug

   - A pile of one-off fixes and cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (326 commits)
  KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
  KVM: x86: Fix shadow paging use-after-free due to unexpected role
  KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
  KVM: s390: Enable adapter_indicators_set to use mapped pages
  KVM: s390: Add map/unmap ioctl and clean mappings post-guest
  riscv: kvm: Use endian-specific __lelong for NACL shared memory
  KVM: selftests: access_tracking_perf_test: bump number of NUMA nodes to 32
  KVM: s390: vsie: Implement ASTFLEIE facility 2
  KVM: s390: vsie: Refactor handle_stfle
  s390/sclp: Detect ASTFLEIE 2 facility
  KVM: s390: Minor refactor of base/ext facility lists
  KVM: x86/mmu: move pdptrs out of the MMU
  KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging
  KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
  KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare
  KVM: x86: remove nested_mmu from mmu_is_nested()
  KVM: arm64: vgic-its: Make ABI commit helpers return void
  KVM: s390: Initialize KVM_S390_GET_CMMA_BITS memory
  LoongArch: KVM: Add missing slots_lock for device register/unregister
  LoongArch: KVM: Validate irqchip index in irqfd routing
  ...
---

c98d767b34574be82b74d77d02264a830ae1cadd
diff --cc arch/arm64/include/asm/kvm_host.h
index 9209c54350c77,cb5ef7e6c2fe7..bae2c4f92ef5c
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@@ -1095,8 -1112,13 +1095,9 @@@ struct kvm_vcpu_arch 
  #define IN_NESTED_ERET		__vcpu_single_flag(sflags, BIT(7))
  /* SError pending for nested guest */
  #define NESTED_SERROR_PENDING	__vcpu_single_flag(sflags, BIT(8))
- 
+ /* KVM is currently emulating an L2 to L1 exception */
+ #define IN_NESTED_EXCEPTION	__vcpu_single_flag(sflags, BIT(9))
  
 -/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 -#define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
 -			     sve_ffr_offset((vcpu)->arch.sve_max_vl))
 -
  #define vcpu_sve_max_vq(vcpu)	sve_vq_from_vl((vcpu)->arch.sve_max_vl)
  
  #define vcpu_sve_zcr_elx(vcpu)						\
diff --cc arch/x86/include/asm/tdx.h
index e5a9cf656c072,32fbdf8f55aef..89e97d5761d89
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@@ -145,32 -177,15 +146,17 @@@ struct tdx_vp 
  	struct page **tdcx_pages;
  };
  
- static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
- {
- 	u64 ret;
- 
- 	ret = page_to_phys(page);
- 	/* KeyID bits are just above the physical address bits: */
- 	ret |= (u64)hkid << boot_cpu_data.x86_phys_bits;
- 
- 	return ret;
- }
- 
- static inline int pg_level_to_tdx_sept_level(enum pg_level level)
- {
-         WARN_ON_ONCE(level == PG_LEVEL_NONE);
-         return level - 1;
- }
- 
 +void tdx_sys_disable(void);
 +
  u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
  u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
- u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
- u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+ u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, kvm_pfn_t pfn, struct page *source,
+ 		     u64 *ext_err1, u64 *ext_err2);
+ u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level, struct page *page, u64 *ext_err1, u64 *ext_err2);
  u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
- u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
- u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2);
+ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, kvm_pfn_t pfn,
+ 		     u64 *ext_err1, u64 *ext_err2);
+ u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
  u64 tdh_mng_key_config(struct tdx_td *td);
  u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
  u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
diff --cc drivers/crypto/ccp/sev-dev.c
index a8eb51ec0ee2d,7cd6cd6fdb109..ca473ca198b81
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@@ -2426,10 -2434,37 +2422,29 @@@ cleanup
  	return ret;
  }
  
+ static int sev_ioctl_do_snp_platform_status(struct sev_issue_cmd *argp)
+ {
+ 	struct sev_user_data_snp_status status;
+ 	int ret;
+ 
+ 	if (!argp->data)
+ 		return -EINVAL;
+ 
+ 	ret = __sev_do_snp_platform_status(&status, &argp->error);
+ 	if (ret < 0)
+ 		return ret;
+ 
+ 	if (copy_to_user((void __user *)argp->data, &status,
+ 			 sizeof(struct sev_user_data_snp_status)))
+ 		ret = -EFAULT;
+ 
+ 	return ret;
+ }
+ 
  static int sev_ioctl_do_snp_commit(struct sev_issue_cmd *argp)
  {
 -	struct sev_device *sev = psp_master->sev_data;
  	struct sev_data_snp_commit buf;
 -	bool shutdown_required = false;
 -	int ret, error;
 -
 -	if (!sev->snp_initialized) {
 -		ret = snp_move_to_init_state(argp, &shutdown_required);
 -		if (ret)
 -			return ret;
 -	}
 +	int ret;
  
  	buf.len = sizeof(buf);