Xiaoyao Li [Thu, 8 May 2025 14:59:57 +0000 (10:59 -0400)]
i386/tdx: Fetch and validate CPUID of TD guest
Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
by TDX module for TD guest. Check QEMU's configuration against the
fetched data.
Print wanring message when 1. a feature is not supported but requested
by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
enabled.
- If cpu->enforced_cpuid is not set, prints the warning message of both
1) and 2) and tweak QEMU's configuration.
- If cpu->enforced_cpuid is set, quit if any case of 1) or 2).
Lei Wang [Tue, 17 Dec 2024 12:39:31 +0000 (07:39 -0500)]
i386: Remove unused parameter "uint32_t bit" in feature_word_description()
Parameter "uint32_t bit" is not used in function feature_word_description(),
so remove it.
Signed-off-by: Lei Wang <lei4.wang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20241217123932.948789-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To do cgs specific feature checking. Note the feature checking in
x86_cpu_filter_features() is valid for non-cgs VMs. For cgs VMs like
TDX, what features can be supported has more restrictions.
Xiaoyao Li [Thu, 8 May 2025 14:59:53 +0000 (10:59 -0400)]
i386/tdx: Add supported CPUID bits relates to XFAM
Some CPUID bits are controlled by XFAM. They are not covered by
tdx_caps.cpuid (which only contians the directly configurable bits), but
they are actually supported when the related XFAM bit is supported.
Add these XFAM controlled bits to TDX supported CPUID bits based on the
supported_xfam.
Besides, incorporate the supported_xfam into the supported CPUID leaf of
0xD.
Xiaoyao Li [Thu, 8 May 2025 14:59:52 +0000 (10:59 -0400)]
i386/tdx: Add supported CPUID bits related to TD Attributes
For TDX, some CPUID feature bit is configured via TD attributes. They
are not covered by tdx_caps.cpuid (which only contians the directly
configurable CPUID bits), but they are actually supported when the
related attributre bit is supported.
Note, LASS and KeyLocker are not supported by KVM for TDX, nor does
QEMU support it (see TDX_SUPPORTED_TD_ATTRS). They are defined in
tdx_attrs_maps[] for the completeness of the existing TD Attribute
bits that are related with CPUID features.
Xiaoyao Li [Thu, 8 May 2025 14:59:51 +0000 (10:59 -0400)]
i386/tdx: Add TDX fixed1 bits to supported CPUIDs
TDX architecture forcibly sets some CPUID bits for TD guest that VMM
cannot disable it. They are fixed1 bits.
Fixed1 bits are not covered by tdx_caps.cpuid (which only contains the
directly configurable bits), while fixed1 bits are supported for TD guest
obviously.
Add fixed1 bits to tdx_supported_cpuid. Besides, set all the fixed1
bits to the initial set of KVM's support since KVM might not report them
as supported.
Xiaoyao Li [Thu, 8 May 2025 14:59:50 +0000 (10:59 -0400)]
i386/tdx: Implement adjust_cpuid_features() for TDX
Maintain a TDX specific supported CPUID set, and use it to mask the
common supported CPUID value of KVM. It can avoid newly added supported
features (reported via KVM_GET_SUPPORTED_CPUID) for common VMs being
falsely reported as supported for TDX.
As the first step, initialize the TDX supported CPUID set with all the
configurable CPUID bits. It's not complete because there are other CPUID
bits are supported for TDX but not reported as directly configurable.
E.g. the XFAM related bits, attribute related bits and fixed-1 bits.
They will be handled in the future.
Also, what matters are the CPUID bits related to QEMU's feature word.
Only mask the CPUID leafs which are feature word leaf.
Xiaoyao Li [Thu, 8 May 2025 14:59:48 +0000 (10:59 -0400)]
cpu: Don't set vcpu_dirty when guest_state_protected
QEMU calls kvm_arch_put_registers() when vcpu_dirty is true in
kvm_vcpu_exec(). However, for confidential guest, like TDX, putting
registers is disallowed due to guest state is protected.
Only set vcpu_dirty to true with guest state is not protected when
creating the vcpu.
Xiaoyao Li [Thu, 8 May 2025 14:59:46 +0000 (10:59 -0400)]
i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by VMM, while the features enumerated/controlled by other MSRs except
MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
Isaku Yamahata [Thu, 8 May 2025 14:59:45 +0000 (10:59 -0400)]
i386/tdx: Don't synchronize guest tsc for TDs
TSC of TDs is not accessible and KVM doesn't allow access of
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() noop for TDs,
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-40-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:44 +0000 (10:59 -0400)]
i386/tdx: Set and check kernel_irqchip mode for TDX
KVM mandates kernel_irqchip to be split mode.
Set it to split mode automatically when users don't provide an explicit
value, otherwise check it to be the split mode.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-39-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:43 +0000 (10:59 -0400)]
i386/tdx: Disable PIC for TDX VMs
Legacy PIC (8259) cannot be supported for TDX VMs since TDX module
doesn't allow directly interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Hence disable PIC for TDX VMs and error out if user wants PIC.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-38-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:42 +0000 (10:59 -0400)]
i386/tdx: Disable SMM for TDX VMs
TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.
Disable SMM for TDX VMs and error out if user requests to enable SMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-37-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:41 +0000 (10:59 -0400)]
i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
TDX only supports readonly for shared memory but not for private memory.
In the view of QEMU, it has no idea whether a memslot is used as shared
memory of private. Thus just mark kvm_readonly_mem_enabled to false to
TDX VM for simplicity.
Xiaoyao Li [Thu, 8 May 2025 14:59:39 +0000 (10:59 -0400)]
i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f
Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
when topology level that cannot be enumerated by leaf 0xB, e.g., die or
module level, are configured for the guest, e.g., -smp xx,dies=2.
However, TDX architecture forces to require CPUID 0x1f to configure CPU
topology.
Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
requires CPUID leaf 0x1f to be exposed to guest.
Introduce a new function x86_has_cpuid_0x1f(), which is the wrapper of
cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
to enable cpuid leaf 0x1f for the guest.
Xiaoyao Li [Thu, 8 May 2025 14:59:34 +0000 (10:59 -0400)]
i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL
TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request
termination. KVM translates such request into KVM_EXIT_SYSTEM_EVENT with
type of KVM_SYSTEM_EVENT_TDX_FATAL.
Add hanlder for such exit. Parse and print the error message, and
terminate the TD guest in the handler.
Xiaoyao Li [Thu, 8 May 2025 14:59:33 +0000 (10:59 -0400)]
i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE
KVM translates TDG.VP.VMCALL<MapGPA> to KVM_HC_MAP_GPA_RANGE, and QEMU
needs to enable user exit on KVM_HC_MAP_GPA_RANGE in order to handle the
memory conversion requested by TD guest.
Xiaoyao Li [Thu, 8 May 2025 14:59:29 +0000 (10:59 -0400)]
i386/tdx: Setup the TD HOB list
The TD HOB list is used to pass the information from VMM to TDVF. The TD
HOB must include PHIT HOB and Resource Descriptor HOB. More details can
be found in TDVF specification and PI specification.
Build the TD HOB in TDX's machine_init_done callback.
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-24-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:28 +0000 (10:59 -0400)]
headers: Add definitions from UEFI spec for volumes, resources, etc...
Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).
All values come from the UEFI specification [1], PI spec [2] and TDVF
design guide[3].
Xiaoyao Li [Thu, 8 May 2025 14:59:27 +0000 (10:59 -0400)]
i386/tdx: Track RAM entries for TDX VM
The RAM of TDX VM can be classified into two types:
- TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
accepted by TDX guest before it can be used and will be all-zeros
after being accepted.
- TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.
Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
and mark each RAM range as default type TDX_RAM_UNACCEPTED.
Then turn the range of TD HOB and TEMP MEM to TDX_RAM_ADDED since these
ranges will be ADD'ed before TD runs and no need to be accepted runtime.
The TdxRamEntries[] are later used to setup the memory TD resource HOB
that passes memory info from QEMU to TDVF.
Xiaoyao Li [Thu, 8 May 2025 14:59:26 +0000 (10:59 -0400)]
i386/tdx: Track mem_ptr for each firmware entry of TDVF
For each TDVF sections, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).
Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF sections. So that QEMU can add/copy them to guest
private memory later.
TDVF sections can be classified into two groups:
- Firmware itself, e.g., TDVF BFV and CFV, that located separately from
guest RAM. Its memory pointer is the bios pointer.
- Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
mmap a new memory range for them.
Register a machine_init_done callback to do the stuff.
Xiaoyao Li [Thu, 8 May 2025 14:59:24 +0000 (10:59 -0400)]
i386/tdx: Parse TDVF metadata for TDX VM
After TDVF is loaded to bios MemoryRegion, it needs parse TDVF metadata.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-19-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Isaku Yamahata [Thu, 8 May 2025 14:59:23 +0000 (10:59 -0400)]
i386/tdvf: Introduce function to parse TDVF metadata
TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.
A TDVF Metadata in TDVF image describes the structure of firmware.
QEMU refers to it to setup memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from TDVF image and store
the info of each TDVF section.
TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only an 4-byte field that is the offset of TDX metadata to the end
of firmware file.
Select X86_FW_OVMF when TDX is enable to leverage existing functions
to parse and search OVMF's GUID-ed structures.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-18-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chao Peng [Thu, 8 May 2025 14:59:22 +0000 (10:59 -0400)]
i386/tdx: load TDVF for TD guest
TDVF(OVMF) needs to run at private memory for TD guest. TDX cannot
support pflash device since it doesn't support read-only private memory.
Thus load TDVF(OVMF) with -bios option for TDs.
Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located at private memory.
Also store the MemoryRegion pointer of TDVF since the shared ramblock of
it can be discared after it gets copied to private ramblock.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-17-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:21 +0000 (10:59 -0400)]
i386/tdx: Implement user specified tsc frequency
Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and call VM
scope VM_SET_TSC_KHZ to set the tsc frequency of TD before KVM_TDX_INIT_VM.
Besides, sanity check the tsc frequency to be in the legal range and
legal granularity (required by TDX module).
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-16-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:20 +0000 (10:59 -0400)]
i386/tdx: Set APIC bus rate to match with what TDX module enforces
TDX advertises core crystal clock with cpuid[0x15] as 25MHz for TD
guests and it's unchangeable from VMM. As a result, TDX guest reads
the APIC timer at the same frequency, 25MHz.
While KVM's default emulated frequency for APIC bus is 1GHz, set the
APIC bus rate to match with TDX explicitly to ensure KVM provide correct
emulated APIC timer for TD guest.
Isaku Yamahata [Thu, 8 May 2025 14:59:19 +0000 (10:59 -0400)]
i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.
example
-object tdx-guest, \
mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-14-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:18 +0000 (10:59 -0400)]
i386/tdx: Validate TD attributes
Validate TD attributes with tdx_caps that only supported bits are
allowed by KVM.
Besides, sanity check the attribute bits that have not been supported by
QEMU yet. e.g., debug bit, it will be allowed in the future when debug
TD support lands in QEMU.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/20250508150002.689633-13-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:17 +0000 (10:59 -0400)]
i386/tdx: Wire CPU features up with attributes of TD guest
For QEMU VMs,
- PKS is configured via CPUID_7_0_ECX_PKS, e.g., -cpu xxx,+pks and
- PMU is configured by x86cpu->enable_pmu, e.g., -cpu xxx,pmu=on
While the bit 30 (PKS) and bit 63 (PERFMON) of TD's attributes are also
used to configure the PKS and PERFMON/PMU of TD, reuse the existing
configuration interfaces of 'cpu' for TD's attributes.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-12-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Isaku Yamahata [Thu, 8 May 2025 14:59:16 +0000 (10:59 -0400)]
i386/tdx: Make sept_ve_disable set by default
For TDX KVM use case, Linux guest is the most major one. It requires
sept_ve_disable set. Make it default for the main use case. For other use
case, it can be enabled/disabled via qemu command line.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-11-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:15 +0000 (10:59 -0400)]
i386/tdx: Add property sept-ve-disable for tdx-guest object
Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.
Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.
Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-10-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:14 +0000 (10:59 -0400)]
i386/tdx: Initialize TDX before creating TD vcpus
Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu() that
KVM_TDX_INIT_VM configures global TD configurations, e.g. the canonical
CPUID config, and must be executed prior to creating vCPUs.
Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.
Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-9-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:13 +0000 (10:59 -0400)]
kvm: Introduce kvm_arch_pre_create_vcpu()
Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.
The specific implementation for i386 will be added in the future patch.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-8-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:11 +0000 (10:59 -0400)]
i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.
Since there is no interface reporting how many cpuid configs contains in
KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
and abort when it exceeds KVM_MAX_CPUID_ENTRIES.
Besides, introduce the interfaces to invoke TDX "ioctls" at VCPU scope
in preparation.
It has one QAPI member 'attributes' defined, which allows user to set
TD's attributes directly.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 28 May 2025 09:20:13 +0000 (11:20 +0200)]
rocker: do not pollute the namespace
Do not leave the __le* macros defined, in fact do not use them at all. Fixes a
build failure on Alpine with the TDX patches:
In file included from ../hw/net/rocker/rocker_of_dpa.c:25:
../hw/net/rocker/rocker_hw.h:14:16: error: conflicting types for 'uint64_t'; have '__u64' {aka 'long long unsigned int'}
14 | #define __le64 uint64_t
| ^~~~~~~~
In file included from /usr/include/stdint.h:20,
from ../include/qemu/osdep.h:111,
from ../hw/net/rocker/rocker_of_dpa.c:17:
/usr/include/bits/alltypes.h:136:25: note: previous declaration of 'uint64_t' with type 'uint64_t' {aka 'long unsigned int'}
136 | typedef unsigned _Int64 uint64_t;
| ^~~~~~~~
because the Linux headers include a typedef of __leNN.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhao Liu [Tue, 20 May 2025 15:27:46 +0000 (23:27 +0800)]
hw/timer/hpet: Reorganize register decoding
For Rust HPET, since the commit 519088b7cf6d ("rust: hpet: decode HPET
registers into enums"), it decodes register address by checking if the
register belongs to global register space. And for C HPET, it checks
timer register space first.
While both approaches are fine, it's best to be as consistent as
possible.
Pierrick Bouvier [Wed, 21 May 2025 22:34:14 +0000 (15:34 -0700)]
meson: merge hw_common_arch in target_common_system_arch
No need to keep two different libraries, as both are compiled with exact
same flags. As well, rename target common libraries to common_{arch} and
system_{arch}, to follow what exists for common and system libraries.
Pierrick Bouvier [Wed, 21 May 2025 22:34:12 +0000 (15:34 -0700)]
meson: merge lib{system, user}_ss with {system, user}_ss
Now that target configuration can be applied to lib{system, user}_ss,
there is no reason to keep that separate from the existing {system,
user}_ss.
The only difference is that we'll now compile those files with
-DCOMPILING_SYSTEM_VS_USER, which removes poison for
CONFIG_USER_ONLY and CONFIG_SOFTMMU, without any other side effect.
We extract existing system/user code common common libraries to
lib{system, user}.
To not break existing meson files, we alias libsystem_ss to system_ss
and libuser_ss to user_ss, so we can do the cleanup in next commit.
Pierrick Bouvier [Wed, 21 May 2025 22:34:11 +0000 (15:34 -0700)]
meson: apply target config for picking files from lib{system, user}
semihosting code needs to be included only if CONFIG_SEMIHOSTING is set.
However, this is a target configuration, so we need to apply it to the
lib{system, user}_ss.
As well, this prepares merging lib{system, user}_ss with
{system, user}_ss.
Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250521223414.248276-5-pierrick.bouvier@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com> Fixes: 6f4e8a92bbd ("hw/arm: make most of the compilation units common") Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250521223414.248276-2-pierrick.bouvier@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stefan Hajnoczi [Tue, 20 May 2025 14:26:30 +0000 (10:26 -0400)]
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/riscv: clean up supported MMU modes, declarative CPU definitions,
remove .instance_post_init (reviewed by Alistair)
* qom: reverse order of instance_post_init calls
* qapi/misc-target: doc and standard improvements for SGX
* hw/pci-host/gt64120: Fix endianness handling
* i386/hvf: Make CPUID_HT supported
* i386/tcg: Make CPUID_HT and CPUID_EXT3_CMP_LEG supported
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (35 commits)
qom: reverse order of instance_post_init calls
target/riscv: remove .instance_post_init
target/riscv: convert Xiangshan Nanhu to RISCVCPUDef
target/riscv: convert Ventana V1 to RISCVCPUDef
target/riscv: convert TT Ascalon to RISCVCPUDef
target/riscv: convert THead C906 to RISCVCPUDef
target/riscv: generalize custom CSR functionality
target/riscv: th: make CSR insertion test a bit more intuitive
target/riscv: convert SiFive U models to RISCVCPUDef
target/riscv: convert ibex CPU models to RISCVCPUDef
target/riscv: convert SiFive E CPU models to RISCVCPUDef
target/riscv: convert dynamic CPU models to RISCVCPUDef
target/riscv: convert bare CPU models to RISCVCPUDef
target/riscv: convert profile CPU models to RISCVCPUDef
target/riscv: convert abstract CPU classes to RISCVCPUDef
target/riscv: add more RISCVCPUDef fields
target/riscv: include default value in cpu_cfg_fields.h.inc
target/riscv: move RISCVCPUConfig fields to a header file
target/riscv: merge riscv_cpu_class_init with the class_base function
target/riscv: store RISCVCPUDef struct directly in the class
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Mon, 3 Feb 2025 11:35:39 +0000 (12:35 +0100)]
qom: reverse order of instance_post_init calls
Currently, the instance_post_init calls are performed from the leaf
class and all the way up to Object. This is incorrect because the
leaf class cannot observe property values applied by the superclasses;
for example, a compat property will be set on a device *after*
the class's post_init callback has run.
In particular this makes it impossible for implementations of
accel_cpu_instance_init() to operate based on the actual values of
the properties, though it seems that cxl_dsp_instance_post_init and
rp_instance_post_init might have similar issues.
Follow instead the same order as instance_init, starting with Object
and running the child class's instance_post_init after the parent.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 6 Feb 2025 11:57:12 +0000 (12:57 +0100)]
target/riscv: remove .instance_post_init
Unlike other uses of .instance_post_init, accel_cpu_instance_init()
*registers* properties, and therefore must be run before
device_post_init() which sets them to their values from -global.
In order to move all registration of properties to .instance_init,
call accel_cpu_instance_init() at the end of riscv_cpu_init().
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Feb 2025 13:56:30 +0000 (14:56 +0100)]
target/riscv: convert profile CPU models to RISCVCPUDef
Profile CPUs reuse the instance_init function for bare CPUs; make them
proper subclasses instead. Enabling a profile is now done based on the
RISCVCPUDef struct: even though there is room for only one in RISCVCPUDef,
subclasses check that the parent class's profile is enabled through the
parent profile mechanism.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 6 Feb 2025 16:03:01 +0000 (17:03 +0100)]
target/riscv: convert abstract CPU classes to RISCVCPUDef
Start from the top of the hierarchy: dynamic and vendor CPUs are just
markers, whereas bare CPUs can have their instance_init function
replaced by RISCVCPUDef.
The only difference is that the maximum supported SATP mode has to
be specified separately for 32-bit and 64-bit modes.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 10:31:30 +0000 (11:31 +0100)]
target/riscv: add more RISCVCPUDef fields
Allow using RISCVCPUDef to replicate all the logic of custom .instance_init
functions. To simulate inheritance, merge the child's RISCVCPUDef with
the parent and then finally move it to the CPUState at the end of
TYPE_RISCV_CPU's own instance_init function.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 5 Mar 2025 12:22:48 +0000 (13:22 +0100)]
target/riscv: include default value in cpu_cfg_fields.h.inc
In preparation for adding a function to merge two RISCVCPUConfigs
(pulling values from the parent if they are not overridden) annotate
cpu_cfg_fields.h.inc with the default value of the fields.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 10:28:23 +0000 (11:28 +0100)]
target/riscv: move RISCVCPUConfig fields to a header file
To support merging a subclass's RISCVCPUDef into the superclass, a list
of all the CPU features is needed. Put them into a header file that
can be included multiple times, expanding the macros BOOL_FIELD and
TYPE_FIELD to different operations.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 6 Feb 2025 12:41:49 +0000 (13:41 +0100)]
target/riscv: merge riscv_cpu_class_init with the class_base function
Since all TYPE_RISCV_CPU subclasses support a class_data of type
RISCVCPUDef, process it even before calling the .class_init function
for the subclasses.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 6 Feb 2025 12:13:23 +0000 (13:13 +0100)]
target/riscv: store RISCVCPUDef struct directly in the class
Prepare for adding more fields to RISCVCPUDef and reading them in
riscv_cpu_init: instead of storing the misa_mxl_max field in
RISCVCPUClass, ensure that there's always a valid RISCVCPUDef struct
and go through it.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 6 Feb 2025 12:12:09 +0000 (13:12 +0100)]
target/riscv: introduce RISCVCPUDef
Start putting all the CPU definitions in a struct. Later this will replace
instance_init functions with declarative code, for now just remove the
ugly cast of class_data.
Reviewed-by: Alistair Francis <alistair23@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 10:27:12 +0000 (11:27 +0100)]
target/riscv: move satp_mode.{map,init} out of CPUConfig
They are used to provide the nice QOM properties for svNN,
but the canonical source of the CPU configuration is now
cpu->cfg.max_satp_mode. Store them in the ArchCPU struct.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 10:09:15 +0000 (11:09 +0100)]
target/riscv: update max_satp_mode based on QOM properties
Almost all users of cpu->cfg.satp_mode care about the "max" value
satp_mode_max_from_map(cpu->cfg.satp_mode.map). Convert the QOM
properties back into it. For TCG, deduce the bitmap of supported modes
from valid_vm[].
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 09:52:09 +0000 (10:52 +0100)]
target/riscv: cpu: store max SATP mode as a single integer
The maximum available SATP mode implies all the shorter virtual address sizes.
Store it in RISCVCPUConfig and avoid recomputing it via satp_mode_max_from_map.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 12:04:22 +0000 (13:04 +0100)]
target/riscv: assert argument to set_satp_mode_max_supported is valid
Check that the argument to set_satp_mode_max_supported is valid for
the MXL value of the CPU. It would be a bug in the CPU definition
if it weren't.
In fact, there is such a bug in riscv_bare_cpu_init(): not just
SV64 is not a valid VM mode for 32-bit CPUs, SV64 is not a
valid VM mode at all, not yet at least.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 18 Feb 2025 11:00:14 +0000 (12:00 +0100)]
hw/riscv: acpi: only create RHCT MMU entry for supported types
Do not create the RHCT MMU type entry for RV32 CPUs, since it
only has definitions for SV39/SV48/SV57. Likewise, check that
satp_mode_max_from_map() will actually return a valid value, skipping
the MMU type entry if all MMU types were disabled on the command line.
Acked-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhao Liu [Tue, 13 May 2025 14:31:31 +0000 (22:31 +0800)]
qapi/misc-target: Fix the doc to distinguish query-sgx and query-sgx-capabilities
There're 2 QMP commands: query-sgx and query-sgx-capabilities, but
their outputs are very similar and the documentation lacks clear
differentiation.
From the codes, query-sgx is used to gather guest's SGX capabilities
(including SGX related CPUIDs and EPC sections' size, in SGXInfo), and
if guest doesn't have SGX, then QEMU will report the error message.
On the other hand, query-sgx-capabilities is used to gather host's SGX
capabilities (descripted by SGXInfo as well). And if host doesn't
support SGX, then QEMU will also report the error message.
Considering that SGXInfo is already documented and both these 2 commands
have enough error messages (for the exception case in their codes).
Therefore the QAPI documentation for these two commands only needs to
emphasize that one of them applies to the guest and the other to the
host.
Fix their documentation to reflect this difference.
Reported-by: Markus Armbruster <armbru@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/20250513143131.2008078-3-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhao Liu [Tue, 13 May 2025 14:31:30 +0000 (22:31 +0800)]
qapi/misc-target: Fix the doc related SGXEPCSection
The "sections" field of SGXInfo is used to gather EPC section
information for both the guest and the host. Therefore, delete the "for
guest" limitation.
Additionally, avoid the abbreviation "info" and use "information"
instead. And for SGXEPCSection, delete the redundant word "info".
Zhao Liu [Fri, 16 May 2025 09:11:30 +0000 (17:11 +0800)]
qapi/misc-target: Rename SGXInfo to SgxInfo
QAPI requires strict PascalCase naming style, i.e., only the first
letter of a single word is allowed to be uppercase, which could help
with readability.
Zhao Liu [Fri, 16 May 2025 09:11:29 +0000 (17:11 +0800)]
qapi/misc-target: Rename SGXEPCSection to SgxEpcSection
QAPI requires strict PascalCase naming style, i.e., only the first
letter of a single word is allowed to be uppercase, which could help
with readability.
The GT-64120 PCI controller requires special handling where:
1. Host bridge(bus 0 ,device 0) must never be byte-swapped
2. Other devices follow MByteSwap bit in GT_PCI0_CMD
The previous implementation incorrectly swapped all accesses, breaking
host bridge detection (lspci -d 11ab:4620).
Changes made:
1. Removed gt64120_update_pci_cfgdata_mapping() and moved data_mem initialization
to gt64120_realize() for cleaner setup
2. Implemented custom read/write handlers that:
- Preserve host bridge accesses (extract32(config_reg,11,13)==0)
- apply swapping only for non-bridge devices in big-endian mode
Fixes: 145e2198 ("hw/mips/gt64xxx_pci: Endian-swap using PCI_HOST_BRIDGE MemoryRegionOps")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2826
Xiaoyao Li [Wed, 14 May 2025 03:16:52 +0000 (23:16 -0400)]
i386/hvf: Make CPUID_HT supported
Since Commit c6bd2dd63420 ("i386/cpu: Set up CPUID_HT in
x86_cpu_expand_features() instead of cpu_x86_cpuid()"), CPUID_HT will be
set in env->features[] in x86_cpu_expand_features() when vcpus >= 2.
Later in x86_cpu_filter_features() it will check against the HVF
supported bits. It will trigger the warning like
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
Add CPUID_HT to HVF supported CPUID bits to fix it.