Stefan Hajnoczi [Fri, 30 May 2025 15:41:13 +0000 (11:41 -0400)]
Merge tag 'pull-request-2025-05-30' of https://gitlab.com/thuth/qemu into staging
* Functional tests improvements
* Endianness improvements/clean-ups for the Microblaze machines
* Remove obsolete -2.4 and -2.5 i440fx and q35 machine types and related code
Stefan Hajnoczi [Fri, 30 May 2025 15:41:07 +0000 (11:41 -0400)]
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/i386/kvm: Intel TDX support
* target/i386/emulate: more lflags cleanups
* meson: remove need for explicit listing of dependencies in hw_common_arch and
target_common_arch
* rust: small fixes
* hpet: Reorganize register decoding to be more similar to Rust code
* target/i386: fixes for AMD models
* target/i386: new EPYC-Turin CPU model
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (77 commits)
target/i386/tcg/helper-tcg: fix file references in comments
target/i386: Add support for EPYC-Turin model
target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX
target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits
rust: make declaration of dependent crates more consistent
docs: Add TDX documentation
i386/tdx: Validate phys_bits against host value
i386/tdx: Make invtsc default on
i386/tdx: Don't treat SYSCALL as unavailable
i386/tdx: Fetch and validate CPUID of TD guest
target/i386: Print CPUID subleaf info for unsupported feature
i386: Remove unused parameter "uint32_t bit" in feature_word_description()
i386/cgs: Introduce x86_confidential_guest_check_features()
i386/tdx: Define supported KVM features for TDX
i386/tdx: Add XFD to supported bit of TDX
i386/tdx: Add supported CPUID bits relates to XFAM
i386/tdx: Add supported CPUID bits related to TD Attributes
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'pull-nbd-2025-05-29' of https://repo.or.cz/qemu/ericb:
iotests: Filter out ZFS in several tests
iotests: Improve mirror-sparse on ext4 and xfs
iotests: Use disk_usage in more places
nbd: Set unix socket send buffer on Linux
nbd: Set unix socket send buffer on macOS
io: Add helper for setting socket send buffer size
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
tests/unit/test-util-sockets: fix mem-leak on error object
The test fails with --enable-asan as the error struct is never freed.
In the case where the test expects a success but it fails, let's also
report the error for debugging (it will be freed internally).
Fixes 316e8ee8d6 ("util/qemu-sockets: Refactor inet_parse() to use QemuOpts")
Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Message-ID: <518d94c7db20060b2a086cf55ee9bffab992a907.1748280011.git.matheus.bernardino@oss.qualcomm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
hw/net/vmxnet3: Merge DeviceRealize in InstanceInit
Simplify merging vmxnet3_realize() within vmxnet3_instance_init(),
removing the need for device_class_set_parent_realize().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-20-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
VMXNET3_COMPAT_FLAG_DISABLE_PCIE was only used by the
hw_compat_2_5[] array, via the 'x-disable-pcie=on' property.
We removed all machines using that array, lets remove all the
code around VMXNET3_COMPAT_FLAG_DISABLE_PCIE.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-19-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
VMXNET3_COMPAT_FLAG_OLD_MSI_OFFSETS was only used by the
hw_compat_2_5[] array, via the 'x-old-msi-offsets=on' property.
We removed all machines using that array, lets remove all the
code around VMXNET3_COMPAT_FLAG_OLD_MSI_OFFSETS.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-18-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Simplify replacing pvscsi_realize() by pvscsi_instance_init(),
removing the need for device_class_set_parent_realize().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-17-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
PVSCSI_COMPAT_DISABLE_PCIE_BIT was only used by the
hw_compat_2_5[] array, via the 'x-disable-pcie=on' property.
We removed all machines using that array, lets remove all the
code around PVSCSI_COMPAT_DISABLE_PCIE_BIT, including the now
unused PVSCSIState::compat_flags field.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-16-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
PVSCSI_COMPAT_OLD_PCI_CONFIGURATION was only used by the
hw_compat_2_5[] array, via the 'x-old-pci-configuration=on'
property. We removed all machines using that array, lets remove
all the code around PVSCSI_COMPAT_OLD_PCI_CONFIGURATION.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-15-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
The hw_compat_2_5[] array was only used by the pc-q35-2.5 and
pc-i440fx-2.5 machines, which got removed. Remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-13-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-12-philmd@linaro.org>
[thuth: Fix error from check_patch.pl wrt to an empty "for" loop] Signed-off-by: Thomas Huth <thuth@redhat.com>
hw/i386/x86: Remove X86MachineClass::save_tsc_khz field
The X86MachineClass::save_tsc_khz boolean was only used
by the pc-q35-2.5 and pc-i440fx-2.5 machines, which got
removed. Remove it and simplify tsc_khz_needed().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-11-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
hw/i386/pc: Remove deprecated pc-q35-2.5 and pc-i440fx-2.5 machines
These machines has been supported for a period of more than 6 years.
According to our versioned machine support policy (see commit ce80c4fa6ff "docs: document special exception for machine type
deprecation & removal") they can now be removed.
Remove the now unused empty pc_compat_2_5[] array.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-10-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
VIRTIO_PCI_FLAG_DISABLE_PCIE was only used by the hw_compat_2_4[]
array, via the 'x-disable-pcie=false' property. We removed all
machines using that array, lets remove all the code around
VIRTIO_PCI_FLAG_DISABLE_PCIE (see commit 9a4c0e220d8 for similar
VIRTIO_PCI_FLAG_* enum removal).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-9-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
VIRTIO_PCI_FLAG_MIGRATE_EXTRA was only used by the
hw_compat_2_4[] array, via the 'migrate-extra=true'
property. We removed all machines using that array,
lets remove all the code around VIRTIO_PCI_FLAG_MIGRATE_EXTRA.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20250512083948.39294-8-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
E1000_FLAG_MAC was only used by the hw_compat_2_4[] array,
via the 'extra_mac_registers=off' property. We removed all
machines using that array, lets remove all the code around
E1000_FLAG_MAC, including the MAC_ACCESS_FLAG_NEEDED enum,
similarly to commit fa4ec9ffda7 ("e1000: remove old
compatibility code").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20250512083948.39294-7-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
The hw_compat_2_4[] array was only used by the pc-q35-2.4 and
pc-i440fx-2.4 machines, which got removed. Remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-6-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
The pc_compat_2_4[] array was only used by the pc-q35-2.4
and pc-i440fx-2.4 machines, which got removed. Remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-4-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
hw/i386/pc: Remove PCMachineClass::broken_reserved_end field
The PCMachineClass::broken_reserved_end field was only used
by the pc-q35-2.4 and pc-i440fx-2.4 machines, which got removed.
Remove it and simplify pc_memory_init().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-3-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
hw/i386/pc: Remove deprecated pc-q35-2.4 and pc-i440fx-2.4 machines
These machines has been supported for a period of more than 6 years.
According to our versioned machine support policy (see commit ce80c4fa6ff "docs: document special exception for machine type
deprecation & removal") they can now be removed.
Remove the qtest in test-x86-cpuid-compat.c file.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-2-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Thomas Huth [Thu, 15 May 2025 13:20:19 +0000 (15:20 +0200)]
docs: Deprecate the qemu-system-microblazeel binary
The (former big-endian only) binary qemu-system-microblaze can
handle both endiannesses nowadays, so we don't need the separate
qemu-system-microblazeel binary for little endian anymore. Let's
deprecate it to avoid unnecessary compilation and test time in
the future.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250515132019.569365-5-thuth@redhat.com>
Thomas Huth [Thu, 15 May 2025 13:20:18 +0000 (15:20 +0200)]
hw/microblaze: Remove the big-endian variants of ml605 and xlnx-zynqmp-pmu
Both machines were added with little-endian in mind only (the
"endianness" CPU property was hard-wired to "true", see commits 133d23b3ad1 and a88bbb006a52), so the variants that showed up
on the big endian target likely never worked. We deprecated these
non-working machine variants two releases ago, and so far nobody
complained, so it should be fine now to disable them. Hard-wire
the machines to little endian now.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250515132019.569365-4-thuth@redhat.com>
Thomas Huth [Thu, 15 May 2025 13:20:17 +0000 (15:20 +0200)]
tests/functional: Test both microblaze s3adsp1800 endianness variants
Now that the endianness of the petalogix-s3adsp1800 can be configured,
we should test that the cross-endianness also works as expected, thus
test the big endian variant on the little endian target and vice versa.
(based on an original idea from Philippe Mathieu-Daudé)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250515132019.569365-3-thuth@redhat.com>
Thomas Huth [Thu, 15 May 2025 13:20:16 +0000 (15:20 +0200)]
hw/microblaze: Add endianness property to the petalogix_s3adsp1800 machine
Since the microblaze target can now handle both endianness, big and
little, we should provide a config knob for the user to select the
desired endianness.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250515132019.569365-2-thuth@redhat.com>
Thomas Huth [Wed, 21 May 2025 14:37:32 +0000 (16:37 +0200)]
tests/functional/test_mem_addr_space: Use set_machine() to select the machine
By using self.set_machine() the tests get properly skipped in case
the machine has not been compiled into the QEMU binary, e.g. when
"configure" has been run with "--without-default-devices".
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250521143732.140711-1-thuth@redhat.com>
Thomas Huth [Thu, 22 May 2025 08:02:08 +0000 (10:02 +0200)]
tests/functional/test_mips_malta: Re-enable the check for the PCI host bridge
The problem with the PCI bridge has been fixed in commit e5894fd6f411c1
("hw/pci-host/gt64120: Fix endianness handling"), so we can enable the
corresponding test again.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250522080208.205489-1-thuth@redhat.com>
Thomas Huth [Wed, 21 May 2025 14:51:12 +0000 (16:51 +0200)]
tests/functional/test_sparc64_tuxrun: Explicitly set the 'sun4u' machine
Use self.set_machine() to set the machine instead of relying on the
default machine of the binary. This way the test can be skipped in
case the machine has not been compiled into the QEMU binary.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250521145112.142222-1-thuth@redhat.com>
Eric Blake [Fri, 23 May 2025 16:27:23 +0000 (11:27 -0500)]
iotests: Filter out ZFS in several tests
Fiona reported that ZFS makes sparse file testing awkward, since:
- it has asynchronous allocation (not even 'fsync $file' makes du see
the desired size; it takes the slower 'fsync -f $file' which is not
appropriate for the tests)
- for tests of fully allocated files, ZFS with compression enabled
still reports smaller disk usage
Add a new _require_disk_usage that quickly probes whether an attempt
to create a sparse 5M file shows as less than 1M usage, while the same
file with -o preallocation=full shows as more than 4M usage without
sync, which should filter out ZFS behavior. Then use it in various
affected tests.
This does not add the new filter on all tests that Fiona is seeing ZFS
failures on, but only those where I could quickly spot that there is
at least one place where the test depends on the output of 'du -b' or
'stat -c %b'.
Eric Blake [Fri, 23 May 2025 16:27:22 +0000 (11:27 -0500)]
iotests: Improve mirror-sparse on ext4 and xfs
Fiona reported that an ext4 filesystem on top of LVM can sometimes
report over-allocation to du (based on the heuristics the filesystem
is making while observing the contents being mirrored); even though
the contents and actual size matched, about 50% of the time the size
reported by disk_usage was too large by 4k, failing the test. In
auditing other iotests, this is a common problem we've had to deal
with.
Meanwhile, Markus reported that an xfs filesystem reports disk usage
at a default granularity of 1M (so the sparse file occupies 3M, since
it has just over 2M data).
Reported-by: Fiona Ebner <f.ebner@proxmox.com> Reported-by: Markus Armbruster <armbru@redhat.com> Fixes: c0ddcb2c ("tests: Add iotest mirror-sparse for recent patches") Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250523163041.2548675-7-eblake@redhat.com>
[eblake: Also fix xfs issue] Signed-off-by: Eric Blake <eblake@redhat.com>
Nir Soffer [Sat, 17 May 2025 20:11:54 +0000 (23:11 +0300)]
nbd: Set unix socket send buffer on Linux
Like macOS we have similar issue on Linux. For TCP socket the send
buffer size is 2626560 bytes (~2.5 MiB) and we get good performance.
However for unix socket the default and maximum buffer size is 212992
bytes (208 KiB) and we see poor performance when using one NBD
connection, up to 4 times slower than macOS on the same machine.
Tracing shows that for every 2 MiB payload (qemu uses 2 MiB io size), we
do 1 recvmsg call with TCP socket, and 10 recvmsg calls with unix
socket.
Fixing this issue requires changing the maximum send buffer size (the
receive buffer size is ignored). This can be done using:
With this we can set the socket buffer size to 2 MiB. With the defaults
the value requested by qemu is clipped to the maximum size and has no
effect.
I tested on 2 machines:
- Fedora 42 VM on MacBook Pro M2 Max
- Dell PowerEdge R640 (Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz)
On the older Dell machine we see very little improvement, up to 1.03
higher throughput. On the M2 machine we see up to 2.67 times higher
throughput. The following results are from the M2 machine.
Reading from qemu-nbd with qemu-img convert. In this test buffer size of
4m is optimal (2.28 times faster).
Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-4-nirsof@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
Nir Soffer [Sat, 17 May 2025 20:11:53 +0000 (23:11 +0300)]
nbd: Set unix socket send buffer on macOS
On macOS we need to increase unix socket buffers size on the client and
server to get good performance. We set socket buffers on macOS after
connecting or accepting a client connection.
Testing shows that setting socket receive buffer size (SO_RCVBUF) has no
effect on performance, so we set only the send buffer size (SO_SNDBUF).
It seems to work like Linux but not documented.
Testing shows that optimal buffer size is 512k to 4 MiB, depending on
the test case. The difference is very small, so I chose 2 MiB.
I tested reading from qemu-nbd and writing to qemu-nbd with qemu-img and
computing a blkhash with nbdcopy and blksum.
To focus on NBD communication and get less noisy results, I tested
reading and writing to null-co driver. I added a read-pattern option to
the null-co driver to return data full of 0xff:
Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-3-nirsof@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
Nir Soffer [Sat, 17 May 2025 20:11:52 +0000 (23:11 +0300)]
io: Add helper for setting socket send buffer size
Testing reading and writing from qemu-nbd using a unix domain socket
shows that the platform default send buffer size is too low, leading to
poor performance and hight cpu usage.
Add a helper for setting socket send buffer size to be used in NBD code.
It can also be used in other contexts.
We don't need a helper for receive buffer size since it is not used with
unix domain sockets. This is documented for Linux, and not documented
for macOS.
Failing to set the socket buffer size is not a fatal error, but the
caller may want to warn about the failure.
Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-2-nirsof@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* tag 'pull-qapi-2025-05-28' of https://repo.or.cz/qemu/armbru:
qapi: use imperative style in documentation
qapi: make all generated files common
qapi: remove qapi_specific_outputs from meson.build
qapi: make s390x specific CPU commands unconditionally available
qapi: make most CPU commands unconditionally available
qapi: Make CpuModelExpansionInfo::deprecated-props optional and generic
qapi: remove the misc-target.json file
qapi: make Xen event commands unconditionally available
qapi: make SGX commands unconditionally available
qapi: expose query-gic-capability command unconditionally
qapi: make SEV commands unconditionally available
qapi: expand docs for SEV commands
qapi: expose rtc-reset-reinjection command unconditionally
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'pull-misc-2025-05-28' of https://repo.or.cz/qemu/armbru:
docs/about/removed-features: Move removal notes to tidy up order
docs/about/deprecated: Move deprecation notes to tidy up order
qapi/migration: Deprecate migrate argument @detach
docs/about: Belatedly document tightening of QMP device_add checking
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 28 May 2025 19:17:25 +0000 (15:17 -0400)]
Merge tag 'pull-tcg-20250528' of https://gitlab.com/rth7680/qemu into staging
accel/tcg: Fix atomic_mmu_lookup vs TLB_FORCE_SLOW
linux-user: implement pgid field of /proc/self/stat
target/sh4: Use MO_ALIGN for system UNALIGN()
target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
accel/tcg: Add TCGCPUOps.pointer_wrap
target/*: Populate TCGCPUOps.pointer_wrap
* tag 'pull-tcg-20250528' of https://gitlab.com/rth7680/qemu: (28 commits)
accel/tcg: Assert TCGCPUOps.pointer_wrap is set
target/sparc: Fill in TCGCPUOps.pointer_wrap
target/s390x: Fill in TCGCPUOps.pointer_wrap
target/riscv: Fill in TCGCPUOps.pointer_wrap
target/ppc: Fill in TCGCPUOps.pointer_wrap
target/mips: Fill in TCGCPUOps.pointer_wrap
target/loongarch: Fill in TCGCPUOps.pointer_wrap
target/i386: Fill in TCGCPUOps.pointer_wrap
target/arm: Fill in TCGCPUOps.pointer_wrap
target: Use cpu_pointer_wrap_uint32 for 32-bit targets
target: Use cpu_pointer_wrap_notreached for strict align targets
accel/tcg: Add TCGCPUOps.pointer_wrap
target/sh4: Use MO_ALIGN for system UNALIGN()
tcg: Drop TCGContext.page_{mask,bits}
tcg: Drop TCGContext.tlb_dyn_max_bits
target/microblaze: Simplify compute_ldst_addr_type{a,b}
target/microblaze: Drop DisasContext.r0
target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
target/microblaze: Fix printf format in mmu_translate
target/microblaze: Use TCGv_i64 for compute_ldst_addr_ea
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Babu Moger [Thu, 8 May 2025 19:58:04 +0000 (14:58 -0500)]
target/i386: Add support for EPYC-Turin model
Add the support for AMD EPYC zen 5 processors (EPYC-Turin).
Add the following new feature bits on top of the feature bits from
the previous generation EPYC models.
movdiri : Move Doubleword as Direct Store Instruction
movdir64b : Move 64 Bytes as Direct Store Instruction
avx512-vp2intersect : AVX512 Vector Pair Intersection to a Pair
of Mask Register
avx-vnni : AVX VNNI Instruction
prefetchi : Indicates support for IC prefetch
sbpb : Selective Branch Predictor Barrier
ibpb-brtype : IBPB includes branch type prediction flushing
srso-user-kernel-no : Not vulnerable to SRSO at the user-kernel boundary
Babu Moger [Thu, 8 May 2025 19:58:03 +0000 (14:58 -0500)]
target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix these cache properties.
Also add the missing RAS and SVM features bits on AMD EPYC-Genoa model.
The SVM feature bits are used in nested guests.
perfmon-v2 : Allow guests to make use of the PerfMonV2 features.
succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload: Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF
fs-gs-base-ns : WRMSR to {FS,GS,KERNEL_GS}_BASE is non-serializing
The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41.
Babu Moger [Thu, 8 May 2025 19:58:02 +0000 (14:58 -0500)]
target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX
Add CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or
MSR_KERNEL_GS_BASE is non-serializing amd PREFETCHI that the indicates
support for IC prefetch.
CPUID_Fn80000021_EAX
Bit Feature description
20 Indicates support for IC prefetch.
1 FsGsKernelGsBaseNonSerializing.
WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing.
Babu Moger [Thu, 8 May 2025 19:58:01 +0000 (14:58 -0500)]
target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix these cache properties.
Also add the missing RAS and SVM features bits on AMD EPYC-Milan model.
The SVM feature bits are used in nested guests.
Paolo Bonzini [Mon, 26 May 2025 07:22:20 +0000 (09:22 +0200)]
rust: make declaration of dependent crates more consistent
Crates like "bilge" and "libc" can be shared by more than one directory,
so declare them directly in rust/meson.build. While at it, make their
variable names end with "_rs" and always add a subproject() statement
(as that pinpoints the error better if the subproject is missing and
cannot be downloaded).
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:57 +0000 (10:59 -0400)]
i386/tdx: Fetch and validate CPUID of TD guest
Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
by TDX module for TD guest. Check QEMU's configuration against the
fetched data.
Print wanring message when 1. a feature is not supported but requested
by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
enabled.
- If cpu->enforced_cpuid is not set, prints the warning message of both
1) and 2) and tweak QEMU's configuration.
- If cpu->enforced_cpuid is set, quit if any case of 1) or 2).
Lei Wang [Tue, 17 Dec 2024 12:39:31 +0000 (07:39 -0500)]
i386: Remove unused parameter "uint32_t bit" in feature_word_description()
Parameter "uint32_t bit" is not used in function feature_word_description(),
so remove it.
Signed-off-by: Lei Wang <lei4.wang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20241217123932.948789-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To do cgs specific feature checking. Note the feature checking in
x86_cpu_filter_features() is valid for non-cgs VMs. For cgs VMs like
TDX, what features can be supported has more restrictions.
Xiaoyao Li [Thu, 8 May 2025 14:59:53 +0000 (10:59 -0400)]
i386/tdx: Add supported CPUID bits relates to XFAM
Some CPUID bits are controlled by XFAM. They are not covered by
tdx_caps.cpuid (which only contians the directly configurable bits), but
they are actually supported when the related XFAM bit is supported.
Add these XFAM controlled bits to TDX supported CPUID bits based on the
supported_xfam.
Besides, incorporate the supported_xfam into the supported CPUID leaf of
0xD.
Xiaoyao Li [Thu, 8 May 2025 14:59:52 +0000 (10:59 -0400)]
i386/tdx: Add supported CPUID bits related to TD Attributes
For TDX, some CPUID feature bit is configured via TD attributes. They
are not covered by tdx_caps.cpuid (which only contians the directly
configurable CPUID bits), but they are actually supported when the
related attributre bit is supported.
Note, LASS and KeyLocker are not supported by KVM for TDX, nor does
QEMU support it (see TDX_SUPPORTED_TD_ATTRS). They are defined in
tdx_attrs_maps[] for the completeness of the existing TD Attribute
bits that are related with CPUID features.
Xiaoyao Li [Thu, 8 May 2025 14:59:51 +0000 (10:59 -0400)]
i386/tdx: Add TDX fixed1 bits to supported CPUIDs
TDX architecture forcibly sets some CPUID bits for TD guest that VMM
cannot disable it. They are fixed1 bits.
Fixed1 bits are not covered by tdx_caps.cpuid (which only contains the
directly configurable bits), while fixed1 bits are supported for TD guest
obviously.
Add fixed1 bits to tdx_supported_cpuid. Besides, set all the fixed1
bits to the initial set of KVM's support since KVM might not report them
as supported.
Xiaoyao Li [Thu, 8 May 2025 14:59:50 +0000 (10:59 -0400)]
i386/tdx: Implement adjust_cpuid_features() for TDX
Maintain a TDX specific supported CPUID set, and use it to mask the
common supported CPUID value of KVM. It can avoid newly added supported
features (reported via KVM_GET_SUPPORTED_CPUID) for common VMs being
falsely reported as supported for TDX.
As the first step, initialize the TDX supported CPUID set with all the
configurable CPUID bits. It's not complete because there are other CPUID
bits are supported for TDX but not reported as directly configurable.
E.g. the XFAM related bits, attribute related bits and fixed-1 bits.
They will be handled in the future.
Also, what matters are the CPUID bits related to QEMU's feature word.
Only mask the CPUID leafs which are feature word leaf.
Xiaoyao Li [Thu, 8 May 2025 14:59:48 +0000 (10:59 -0400)]
cpu: Don't set vcpu_dirty when guest_state_protected
QEMU calls kvm_arch_put_registers() when vcpu_dirty is true in
kvm_vcpu_exec(). However, for confidential guest, like TDX, putting
registers is disallowed due to guest state is protected.
Only set vcpu_dirty to true with guest state is not protected when
creating the vcpu.
Xiaoyao Li [Thu, 8 May 2025 14:59:46 +0000 (10:59 -0400)]
i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by VMM, while the features enumerated/controlled by other MSRs except
MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
Isaku Yamahata [Thu, 8 May 2025 14:59:45 +0000 (10:59 -0400)]
i386/tdx: Don't synchronize guest tsc for TDs
TSC of TDs is not accessible and KVM doesn't allow access of
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() noop for TDs,
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-40-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:44 +0000 (10:59 -0400)]
i386/tdx: Set and check kernel_irqchip mode for TDX
KVM mandates kernel_irqchip to be split mode.
Set it to split mode automatically when users don't provide an explicit
value, otherwise check it to be the split mode.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-39-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:43 +0000 (10:59 -0400)]
i386/tdx: Disable PIC for TDX VMs
Legacy PIC (8259) cannot be supported for TDX VMs since TDX module
doesn't allow directly interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Hence disable PIC for TDX VMs and error out if user wants PIC.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-38-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:42 +0000 (10:59 -0400)]
i386/tdx: Disable SMM for TDX VMs
TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.
Disable SMM for TDX VMs and error out if user requests to enable SMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-37-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:41 +0000 (10:59 -0400)]
i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
TDX only supports readonly for shared memory but not for private memory.
In the view of QEMU, it has no idea whether a memslot is used as shared
memory of private. Thus just mark kvm_readonly_mem_enabled to false to
TDX VM for simplicity.
Xiaoyao Li [Thu, 8 May 2025 14:59:39 +0000 (10:59 -0400)]
i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f
Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
when topology level that cannot be enumerated by leaf 0xB, e.g., die or
module level, are configured for the guest, e.g., -smp xx,dies=2.
However, TDX architecture forces to require CPUID 0x1f to configure CPU
topology.
Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
requires CPUID leaf 0x1f to be exposed to guest.
Introduce a new function x86_has_cpuid_0x1f(), which is the wrapper of
cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
to enable cpuid leaf 0x1f for the guest.
Xiaoyao Li [Thu, 8 May 2025 14:59:34 +0000 (10:59 -0400)]
i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL
TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request
termination. KVM translates such request into KVM_EXIT_SYSTEM_EVENT with
type of KVM_SYSTEM_EVENT_TDX_FATAL.
Add hanlder for such exit. Parse and print the error message, and
terminate the TD guest in the handler.
Xiaoyao Li [Thu, 8 May 2025 14:59:33 +0000 (10:59 -0400)]
i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE
KVM translates TDG.VP.VMCALL<MapGPA> to KVM_HC_MAP_GPA_RANGE, and QEMU
needs to enable user exit on KVM_HC_MAP_GPA_RANGE in order to handle the
memory conversion requested by TD guest.
Xiaoyao Li [Thu, 8 May 2025 14:59:29 +0000 (10:59 -0400)]
i386/tdx: Setup the TD HOB list
The TD HOB list is used to pass the information from VMM to TDVF. The TD
HOB must include PHIT HOB and Resource Descriptor HOB. More details can
be found in TDVF specification and PI specification.
Build the TD HOB in TDX's machine_init_done callback.
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-24-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:28 +0000 (10:59 -0400)]
headers: Add definitions from UEFI spec for volumes, resources, etc...
Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).
All values come from the UEFI specification [1], PI spec [2] and TDVF
design guide[3].
Xiaoyao Li [Thu, 8 May 2025 14:59:27 +0000 (10:59 -0400)]
i386/tdx: Track RAM entries for TDX VM
The RAM of TDX VM can be classified into two types:
- TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
accepted by TDX guest before it can be used and will be all-zeros
after being accepted.
- TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.
Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
and mark each RAM range as default type TDX_RAM_UNACCEPTED.
Then turn the range of TD HOB and TEMP MEM to TDX_RAM_ADDED since these
ranges will be ADD'ed before TD runs and no need to be accepted runtime.
The TdxRamEntries[] are later used to setup the memory TD resource HOB
that passes memory info from QEMU to TDVF.
Xiaoyao Li [Thu, 8 May 2025 14:59:26 +0000 (10:59 -0400)]
i386/tdx: Track mem_ptr for each firmware entry of TDVF
For each TDVF sections, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).
Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF sections. So that QEMU can add/copy them to guest
private memory later.
TDVF sections can be classified into two groups:
- Firmware itself, e.g., TDVF BFV and CFV, that located separately from
guest RAM. Its memory pointer is the bios pointer.
- Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
mmap a new memory range for them.
Register a machine_init_done callback to do the stuff.
Xiaoyao Li [Thu, 8 May 2025 14:59:24 +0000 (10:59 -0400)]
i386/tdx: Parse TDVF metadata for TDX VM
After TDVF is loaded to bios MemoryRegion, it needs parse TDVF metadata.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-19-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Isaku Yamahata [Thu, 8 May 2025 14:59:23 +0000 (10:59 -0400)]
i386/tdvf: Introduce function to parse TDVF metadata
TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.
A TDVF Metadata in TDVF image describes the structure of firmware.
QEMU refers to it to setup memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from TDVF image and store
the info of each TDVF section.
TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only an 4-byte field that is the offset of TDX metadata to the end
of firmware file.
Select X86_FW_OVMF when TDX is enable to leverage existing functions
to parse and search OVMF's GUID-ed structures.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-18-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chao Peng [Thu, 8 May 2025 14:59:22 +0000 (10:59 -0400)]
i386/tdx: load TDVF for TD guest
TDVF(OVMF) needs to run at private memory for TD guest. TDX cannot
support pflash device since it doesn't support read-only private memory.
Thus load TDVF(OVMF) with -bios option for TDs.
Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located at private memory.
Also store the MemoryRegion pointer of TDVF since the shared ramblock of
it can be discared after it gets copied to private ramblock.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-17-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:21 +0000 (10:59 -0400)]
i386/tdx: Implement user specified tsc frequency
Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and call VM
scope VM_SET_TSC_KHZ to set the tsc frequency of TD before KVM_TDX_INIT_VM.
Besides, sanity check the tsc frequency to be in the legal range and
legal granularity (required by TDX module).
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-16-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:20 +0000 (10:59 -0400)]
i386/tdx: Set APIC bus rate to match with what TDX module enforces
TDX advertises core crystal clock with cpuid[0x15] as 25MHz for TD
guests and it's unchangeable from VMM. As a result, TDX guest reads
the APIC timer at the same frequency, 25MHz.
While KVM's default emulated frequency for APIC bus is 1GHz, set the
APIC bus rate to match with TDX explicitly to ensure KVM provide correct
emulated APIC timer for TD guest.
Isaku Yamahata [Thu, 8 May 2025 14:59:19 +0000 (10:59 -0400)]
i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.
example
-object tdx-guest, \
mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-14-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:18 +0000 (10:59 -0400)]
i386/tdx: Validate TD attributes
Validate TD attributes with tdx_caps that only supported bits are
allowed by KVM.
Besides, sanity check the attribute bits that have not been supported by
QEMU yet. e.g., debug bit, it will be allowed in the future when debug
TD support lands in QEMU.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/20250508150002.689633-13-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:17 +0000 (10:59 -0400)]
i386/tdx: Wire CPU features up with attributes of TD guest
For QEMU VMs,
- PKS is configured via CPUID_7_0_ECX_PKS, e.g., -cpu xxx,+pks and
- PMU is configured by x86cpu->enable_pmu, e.g., -cpu xxx,pmu=on
While the bit 30 (PKS) and bit 63 (PERFMON) of TD's attributes are also
used to configure the PKS and PERFMON/PMU of TD, reuse the existing
configuration interfaces of 'cpu' for TD's attributes.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-12-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Isaku Yamahata [Thu, 8 May 2025 14:59:16 +0000 (10:59 -0400)]
i386/tdx: Make sept_ve_disable set by default
For TDX KVM use case, Linux guest is the most major one. It requires
sept_ve_disable set. Make it default for the main use case. For other use
case, it can be enabled/disabled via qemu command line.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-11-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:15 +0000 (10:59 -0400)]
i386/tdx: Add property sept-ve-disable for tdx-guest object
Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.
Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.
Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-10-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:14 +0000 (10:59 -0400)]
i386/tdx: Initialize TDX before creating TD vcpus
Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu() that
KVM_TDX_INIT_VM configures global TD configurations, e.g. the canonical
CPUID config, and must be executed prior to creating vCPUs.
Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.
Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-9-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:13 +0000 (10:59 -0400)]
kvm: Introduce kvm_arch_pre_create_vcpu()
Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.
The specific implementation for i386 will be added in the future patch.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-8-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Thu, 8 May 2025 14:59:11 +0000 (10:59 -0400)]
i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.
Since there is no interface reporting how many cpuid configs contains in
KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
and abort when it exceeds KVM_MAX_CPUID_ENTRIES.
Besides, introduce the interfaces to invoke TDX "ioctls" at VCPU scope
in preparation.