Steve Sistare [Tue, 10 Jun 2025 15:39:21 +0000 (08:39 -0700)]
vfio/container: recover from unmap-all-vaddr failure
If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
Recover by walking all address ranges of all containers to restore the vaddr
for each. Do so by invoking the vfio listener callback, and passing a new
"remap" flag that tells it to restore a mapping without re-allocating new
userland data structures.
Steve Sistare [Tue, 10 Jun 2025 15:39:20 +0000 (08:39 -0700)]
vfio/container: mdev cpr blocker
During CPR, after VFIO_DMA_UNMAP_FLAG_VADDR, the vaddr is temporarily
invalid, so mediated devices cannot be supported. Add a blocker for them.
This restriction will not apply to iommufd containers when CPR is added
for them in a future patch.
Steve Sistare [Tue, 10 Jun 2025 15:39:19 +0000 (08:39 -0700)]
vfio/container: restore DMA vaddr
In new QEMU, do not register the memory listener at device creation time.
Register it later, in the container post_load handler, after all vmstate
that may affect regions and mapping boundaries has been loaded. The
post_load registration will cause the listener to invoke its callback on
each flat section, and the calls will match the mappings remembered by the
kernel.
The listener calls a special dma_map handler that passes the new VA of each
section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
handler at the end.
Steve Sistare [Tue, 10 Jun 2025 15:39:18 +0000 (08:39 -0700)]
vfio/container: discard old DMA vaddr
In the container pre_save handler, discard the virtual addresses in DMA
mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be
remapped at a different VA after in new QEMU. DMA to already-mapped
pages continues.
Steve Sistare [Tue, 10 Jun 2025 15:39:17 +0000 (08:39 -0700)]
vfio/container: preserve descriptors
At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
the saved descriptors.
During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped. The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.
Steve Sistare [Tue, 10 Jun 2025 15:39:16 +0000 (08:39 -0700)]
vfio/container: register container for cpr
Register a legacy container for cpr-transfer, replacing the generic CPR
register call with a more specific legacy container register call. Add a
blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
This is mostly boiler plate. The fields to to saved and restored are added
in subsequent patches.
Steve Sistare [Tue, 10 Jun 2025 15:39:15 +0000 (08:39 -0700)]
migration: lower handler priority
Define a vmstate priority that is lower than the default, so its handlers
run after all default priority handlers. Since 0 is no longer the default
priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT.
CPR for vfio will use this to install handlers for containers that run
after handlers for the devices that they contain.
John Levon [Sat, 7 Jun 2025 00:10:35 +0000 (17:10 -0700)]
vfio: add per-region fd support
For vfio-user, each region has its own fd rather than sharing
vbasedev's. Add the necessary plumbing to support this, and use the
correct fd in vfio_region_mmap().
Rorie Reyes [Mon, 9 Jun 2025 16:44:18 +0000 (12:44 -0400)]
s390: implementing CHSC SEI for AP config change
Handle interception of the CHSC SEI instruction for requests
indicating the guest's AP configuration has changed.
If configuring --without-default-devices, hw/s390x/ap-stub.c
was created to handle such circumstance. Also added the
following to hw/s390x/meson.build if CONFIG_VFIO_AP is
false, it will use the stub file.
Rorie Reyes [Mon, 9 Jun 2025 16:44:17 +0000 (12:44 -0400)]
hw/vfio/ap: Storing event information for an AP configuration change event
These functions can be invoked by the function that handles interception
of the CHSC SEI instruction for requests indicating the accessibility of
one or more adjunct processors has changed.
Rorie Reyes [Mon, 9 Jun 2025 16:44:16 +0000 (12:44 -0400)]
hw/vfio/ap: store object indicating AP config changed in a queue
Creates an object indicating that an AP configuration change event
has been received and stores it in a queue. These objects will later
be used to store event information for an AP configuration change
when the CHSC instruction is intercepted.
Rorie Reyes [Mon, 9 Jun 2025 16:44:15 +0000 (12:44 -0400)]
hw/vfio/ap: notification handler for AP config changed event
Register an event notifier handler to process AP configuration
change events by queuing the event and generating a CRW to let
the guest know its AP configuration has changed
Zhenzhong Duan [Wed, 11 Jun 2025 02:42:28 +0000 (10:42 +0800)]
vfio/pci: Fix instance_size of VFIO_PCI_BASE
Currently the final instance_size of VFIO_PCI_BASE is sizeof(PCIDevice).
It should be sizeof(VFIOPCIDevice), VFIO_PCI uses same structure as
base class VFIO_PCI_BASE, so no need to set its instance_size explicitly.
This isn't catastrophic only because VFIO_PCI_BASE is an abstract class.
Fixes: d4e392d0a99b ("vfio: add vfio-pci-base class") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/qemu-devel/20250611024228.423666-1-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
Stefan Hajnoczi [Sat, 7 Jun 2025 19:08:54 +0000 (15:08 -0400)]
Merge tag 'pull-10.1-maintainer-may-2025-070625-1' of https://gitlab.com/stsquad/qemu into staging
maintainer updates for May (testing, plugins)
- expose ~/.cache/qemu to container builds
- disable debug info in CI
- allow boot.S to handle target el mode selection
- new arguments for ips plugin
- cleanup assets in size_memop
- fix include guard in gdbstub
- introduce qGDBServerVersion gdbstub query
- update gdb aarch64-core.xml to support bitfields
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmhEXc4ACgkQ+9DbCVqe
# KkT3vwf9GtMoVDBWqWHwdV6H3rblP0k3mkApY4pTkFFSL93qApDK1gAKoklymPHJ
# 6agAWn/MmpqguB7yn7TnBEiJyW9CEq0DeWTz9ivPPh5vfm/2MMaXinVd4yH+GbTL
# uTuJg4EeRcSj8q4N4h+gROSHkH3mVOe+JlyakRKZ/PZChqjY1WRC/Hm2QdHojxlS
# xQBZe4Nip/mafm4yAlnyRVRbaSctmc3/xE/MomkVT+8JMdVt6yWE0HT/nIEFW6/6
# psHoiV4XfROIWj5qMAWHVLekDrsqxJx8uiGv9o3+zKdhDhRZw3Oa5EE5N/oE8KmM
# 0s/9usRvtVD0kPh9YTfjEHWHkbPadA==
# =X63M
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 07 Jun 2025 11:42:06 EDT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* tag 'pull-10.1-maintainer-may-2025-070625-1' of https://gitlab.com/stsquad/qemu:
gdbstub: update aarch64-core.xml
gdbstub: Implement qGDBServerVersion packet
gdbstub: assert earlier in handle_read_all_regs
include/gdbstub: fix include guard in commands.h
include/exec: fix assert in size_memop
contrib/plugins: allow setting of instructions per quantum
contrib/plugins: add a scaling factor to the ips arg
tests/qtest: Avoid unaligned access in IGB test
tests/tcg: make aarch64 boot.S handle different starting modes
gitlab: disable debug info on CI builds
tests/docker: expose $HOME/.cache/qemu as docker volume
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Update aarch64-core.xml to include field definitions for PSTATE, which
in gdb is modelled in the cpsr (current program status register)
pseudo-register, named after the actual cpsr register in armv7.
Defining the fields layout of the register allows easy inspection of for
example, the current exception level (EL):
For example. Before booting a Linux guest, EL=2, but after booting and
Ctrl-C'ing in gdb, we get EL=0:
(gdb) info registers $cpsr
cpsr 0x20402009 [ SP EL=2 BTYPE=0 PAN C ]
(gdb) cont
Continuing.
^C
Thread 2 received signal SIGINT, Interrupt.
0x0000ffffaaff286c in ?? ()
(gdb) info registers $cpsr
cpsr 0x20001000 [ EL=0 BTYPE=0 SSBS C ]
The aarch64-core.xml has been updated to match exactly the version
retrieved from upstream gdb, retrieved in 2025-05-19 from HEAD commit 9f4dc0b137c86f6ff2098cb1ab69442c69d6023d.
This commit adds support for the `qGDBServerVersion` packet to the qemu
gdbstub which could be used by clients to detect the QEMU version
(and, e.g., use a workaround for known bugs).
This packet is not documented/standarized by GDB but it was implemented
by LLDB gdbstub [0] and is helpful for projects like Pwndbg [1].
This has been implemented by Patryk, who I included in Co-authored-by
and who asked me to send the patch.
Alex Bennée [Tue, 3 Jun 2025 11:01:53 +0000 (12:01 +0100)]
contrib/plugins: allow setting of instructions per quantum
The default is we update time every 1/10th of a second or so. However
for some cases we might want to update time more frequently. Allow
this to be set via the command line through the ipq argument.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-7-alex.bennee@linaro.org>
Nabih Estefan [Tue, 3 Jun 2025 11:01:51 +0000 (12:01 +0100)]
tests/qtest: Avoid unaligned access in IGB test
../tests/qtest/libqos/igb.c:106:5: runtime error: load of misaligned address 0x562040be8e33 for type 'uint32_t', which requires 4 byte alignment
Instead of straight casting the uint8_t array, we can use ldl_le_p and
lduw_l_p to assure the unaligned access works properly against
uint32_t and uint16_t.
Alex Bennée [Tue, 3 Jun 2025 11:01:50 +0000 (12:01 +0100)]
tests/tcg: make aarch64 boot.S handle different starting modes
Currently the boot.S code assumes everything starts at EL1. This will
break things like the memory test which will barf on unaligned memory
access when run at a higher level.
Adapt the boot code to do some basic verification of the starting mode
and the minimal configuration to move to the lower exception levels.
With this we can run the memory test with:
Alex Bennée [Tue, 3 Jun 2025 11:01:49 +0000 (12:01 +0100)]
gitlab: disable debug info on CI builds
Our default build enables debug info which adds hugely to the size of
the builds as well as the size of cached objects. Disable debug info
across the board to save space and reduce pressure on the CI system.
We still have a number of builds which explicitly enable debug and
related extra asserts like --enable-debug-tcg.
Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-3-alex.bennee@linaro.org>
Alex Bennée [Tue, 3 Jun 2025 11:01:48 +0000 (12:01 +0100)]
tests/docker: expose $HOME/.cache/qemu as docker volume
If you want to run functional tests we should share .cache/qemu so we
don't force containers to continually re-download images. We also move
ccache to use this shared area.
Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-2-alex.bennee@linaro.org>
Stefan Hajnoczi [Fri, 6 Jun 2025 13:42:58 +0000 (09:42 -0400)]
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* futex: support Windows
* qemu-thread: Avoid futex abstraction for non-Linux
* migration, hw/display/apple-gfx: replace QemuSemaphore with QemuEvent
* rust: bindings for Error
* hpet, rust/hpet: return errors from realize if properties are incorrect
* rust/hpet: Drop BqlCell wrapper for num_timers
* target/i386: Emulate ftz and denormal flag bits correctly
* i386/kvm: Prefault memory on page state change
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (31 commits)
tests/tcg/x86_64/fma: add test for exact-denormal output
target/i386: Wire up MXCSR.DE and FPUS.DE correctly
target/i386: Use correct type for get_float_exception_flags() values
target/i386: Detect flush-to-zero after rounding
hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
migration/postcopy: Replace QemuSemaphore with QemuEvent
migration/colo: Replace QemuSemaphore with QemuEvent
migration: Replace QemuSemaphore with QemuEvent
qemu-thread: Document QemuEvent
qemu-thread: Use futex if available for QemuLockCnt
qemu-thread: Use futex for QemuEvent on Windows
qemu-thread: Avoid futex abstraction for non-Linux
qemu-thread: Replace __linux__ with CONFIG_LINUX
futex: Support Windows
futex: Check value after qemu_futex_wait()
i386/kvm: Prefault memory on page state change
rust: make TryFrom macro more resilient
docs: update Rust module status
rust/hpet: Drop BqlCell wrapper for num_timers
rust/hpet: return errors from realize if properties are incorrect
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Mon, 19 May 2025 14:51:13 +0000 (15:51 +0100)]
target/i386: Wire up MXCSR.DE and FPUS.DE correctly
The x86 DE bit in the FPU and MXCSR status is supposed to be set
when an input denormal is consumed. We didn't previously report
this from softfloat, so the x86 code either simply didn't set
the DE bit or else incorrectly wired it up to denormal_flushed,
depending on which register you looked at.
Now we have input_denormal_used we can wire up these DE bits
with the semantics they are supposed to have.
Peter Maydell [Mon, 19 May 2025 14:51:12 +0000 (15:51 +0100)]
target/i386: Use correct type for get_float_exception_flags() values
The softfloat get_float_exception_flags() function returns 'int', but
in various places in target/i386 we incorrectly store the returned
value into a uint8_t. This currently has no ill effects because i386
doesn't care about any of the float_flag enum values above 0x40.
However, we want to start using float_flag_input_denormal_used, which
is 0x4000.
Switch to using 'int' so that we can handle all the possible valid
float_flag_* values. This includes changing the return type of
save_exception_flags() and the argument to merge_exception_flags().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250519145114.2786534-3-peter.maydell@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Mon, 19 May 2025 14:51:11 +0000 (15:51 +0100)]
target/i386: Detect flush-to-zero after rounding
The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that we
flush outputs to zero when we detect underflow, which is after
rounding. Set the detect_ftz flag accordingly.
This allows us to enable the test in fma.c which checks this
behaviour.
Akihiko Odaki [Mon, 26 May 2025 05:29:13 +0000 (14:29 +0900)]
qemu-thread: Use futex for QemuEvent on Windows
Use the futex-based implementation of QemuEvent on Windows to
remove code duplication and remove the overhead of event object
construction and destruction.
Akihiko Odaki [Mon, 26 May 2025 05:29:13 +0000 (14:29 +0900)]
qemu-thread: Avoid futex abstraction for non-Linux
qemu-thread used to abstract pthread primitives into futex for the
QemuEvent implementation of POSIX systems other than Linux. However,
this abstraction has one key difference: unlike futex, pthread
primitives require an explicit destruction, and it must be ordered after
wait and wake operations.
It would be easier to perform destruction if a wait operation ensures
the corresponding wake operation finishes as POSIX semaphore does, but
that requires to protect state accesses in qemu_event_set() and
qemu_event_wait() with a mutex. On the other hand, real futex does not
need such a protection but needs complex barrier and atomic operations
to ensure ordering between the two functions.
Add special implementations of qemu_event_set() and qemu_event_wait()
using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
finishes, and these functions will avoid complex barrier and atomic
operations to ensure ordering between them.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Tested-by: Phil Dennis-Jordan <phil@philjordan.eu> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu> Link: https://lore.kernel.org/r/20250526-event-v4-5-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Akihiko Odaki [Thu, 29 May 2025 05:45:50 +0000 (14:45 +0900)]
futex: Check value after qemu_futex_wait()
futex(2) - Linux manual page
https://man7.org/linux/man-pages/man2/futex.2.html
> Note that a wake-up can also be caused by common futex usage patterns
> in unrelated code that happened to have previously used the futex
> word's memory location (e.g., typical futex-based implementations of
> Pthreads mutexes can cause this under some conditions). Therefore,
> callers should always conservatively assume that a return value of 0
> can mean a spurious wake-up, and use the futex word's value (i.e.,
> the user-space synchronization scheme) to decide whether to continue
> to block or not.
Tom Lendacky [Fri, 28 Mar 2025 20:30:24 +0000 (15:30 -0500)]
i386/kvm: Prefault memory on page state change
A page state change is typically followed by an access of the page(s) and
results in another VMEXIT in order to map the page into the nested page
table. Depending on the size of page state change request, this can
generate a number of additional VMEXITs. For example, under SNP, when
Linux is utilizing lazy memory acceptance, memory is typically accepted in
4M chunks. A page state change request is submitted to mark the pages as
private, followed by validation of the memory. Since the guest_memfd
currently only supports 4K pages, each page validation will result in
VMEXIT to map the page, resulting in 1024 additional exits.
When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
size of the page state change in order to pre-map the pages and avoid the
additional VMEXITs. This helps speed up boot times.
Paolo Bonzini [Thu, 5 Jun 2025 09:12:15 +0000 (11:12 +0200)]
rust: make TryFrom macro more resilient
If the enum includes values such as "Ok", "Err", or "Error", the TryInto
macro can cause errors. Be careful and qualify identifiers with the full
path, or in the case of TryFrom<>::Error do not use the associated type
at all.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhao Liu [Mon, 26 May 2025 11:54:21 +0000 (13:54 +0200)]
rust/hpet: Drop BqlCell wrapper for num_timers
Now that the num_timers field is initialized as a property, someone may
change its default value using qdev_prop_set_uint8(), but the value is
fixed after the Rust code sees it first. Since there is no need to modify
it after realize(), it is not to be necessary to have a BqlCell wrapper.
Paolo Bonzini [Mon, 26 May 2025 11:52:11 +0000 (13:52 +0200)]
rust/hpet: change type of num_timers to usize
Remove the need to convert after every read of the BqlCell. Because the
vmstate uses a u8 as the size of the VARRAY, this requires switching
the VARRAY to use num_timers_save; which in turn requires ensuring that
the num_timers_save is always there. For simplicity do this by
removing support for version 1, which QEMU has not been producing for
~15 years.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 3 Jun 2025 15:45:39 +0000 (17:45 +0200)]
rust: qemu-api: add bindings to Error
Provide an implementation of std::error::Error that bridges the Rust
anyhow::Error and std::panic::Location types with QEMU's Error*.
It also has several utility methods, analogous to error_propagate(),
that convert a Result into a return value + Error** pair. One important
difference is that these propagation methods *panic* if *errp is NULL,
unlike error_propagate() which eats subsequent errors[1]. The reason
for this is that in C you have an error_set*() call at the site where
the error is created, and calls to error_propagate() are relatively rare.
In Rust instead, even though these functions do "propagate" a
qemu_api::Error into a C Error**, there is no error_setg() anywhere that
could check for non-NULL errp and call abort(). error_propagate()'s
behavior of ignoring subsequent errors is generally considered weird,
and there would be a bigger risk of triggering it from Rust code.
[1] This is actually a violation of the preconditions of error_propagate(),
so it should not happen. But you never know...
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 26 May 2025 07:25:50 +0000 (09:25 +0200)]
util/error: allow non-NUL-terminated err->src
Rust makes the current file available as a statically-allocated string,
but without a NUL terminator. Allow this by storing an optional maximum
length in the Error.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 23 May 2025 15:59:52 +0000 (17:59 +0200)]
subprojects: add the foreign crate
This is a cleaned up and separated version of the patches at
https://lore.kernel.org/all/20240701145853.1394967-4-pbonzini@redhat.com/
https://lore.kernel.org/all/20240701145853.1394967-5-pbonzini@redhat.com/
Its first user will be the Error bindings; for example a QEMU Error ** can be
converted to a Rust Option using
unsafe { Option::<Error>::from_foreign(c_error) }
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 26 May 2025 10:10:18 +0000 (12:10 +0200)]
subprojects: add the anyhow crate
This is a standard replacement for Box<dyn Error> which is more efficient (it only
occcupies one word) and provides a backtrace of the error. This could be plumbed
into &error_abort in the future.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since the previous commit, python/setup.cfg applies to scripts/qapi/ as
well. Configuration files in scripts/qapi/ override python/setup.cfg.
scripts/qapi/.flake8 and scripts/qapi/.isort.cfg actually match
python/setup.cfg exactly, and can go.
The differences between scripts/qapi/mypy.ini and python/setup.cfg are
harmless: namespace_packages being set to True is a requirement for the
PEP420 nested package structure of QEMU but not for scripts/qapi, but
has no effect on type checking the QAPI code. warn_unused_ignores is
used in python/ to be able to target a wide variety of mypy versions;
some of which that have added new ignore categories that are not present
in older versions.
Ultimately, scripts/qapi/mypy.ini can be removed without any real change
in behavior to how mypy enforces type safety there.
The pylint config is being left in place because the settings differ
enough from the python/ directory settings that we need a chit-chat on
how to merge them O:-)
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-7-jsnow@redhat.com
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-6-jsnow@redhat.com
John Snow [Wed, 4 Jun 2025 20:03:52 +0000 (16:03 -0400)]
python: add qapi static analysis tests
Update the python tests to also check QAPI and the QAPI Sphinx
extensions. The docs/sphinx/qapidoc_legacy.py file is not included in
these checks, as it is destined for removal soon. mypy is also not
called on the QAPI Sphinx extensions, owing to difficulties supporting
Sphinx 3.x - 8.x while maintaining static type checking support. mypy
*is* called on all of the QAPI tools themselves, though.
flake8, isort and mypy use the tool configuration from the existing
python directory (in setup.cfg). pylint continues to use the special
configuration located in scripts/qapi/ - that configuration is more
permissive. If we wish to unify the two configurations, that's a
separate series and a discussion for a later date.
The list of pylint ignores is also updated, owing again to the wide
window of pylint version support: newer versions require pragmas to
occasionally silence the "too many positional arguments" warning, but
older versions do not have such a warning category and will instead yelp
about an unrecognized option. Silence that warning, too.
As a result of this patch, one would be able to run any of the following
tests locally from the qemu.git/python directory and have it cover the
QAPI tooling as well. All of the following options run the python tests,
static analysis tests, and linter checks; but with different
combinations of dependencies and interpreters.
- "make check-minreqs" Run tests specifically under our oldest supported
Python and our oldest supported dependencies. This is the test that
runs on GitLab as "check-python-minreqs". This helps ensure we do not
regress support on older platforms accidentally.
- "make check-tox" Runs the tests under the newest supported
dependencies, but under each supported version of Python in turn. At
time of writing, this is Python 3.8 to 3.13 inclusive. This test helps
catch bleeding-edge problems before they become problems for developer
workstations. This is the GitLab test "check-python-tox" and is an
optionally run, may-fail test due to the unpredictable nature of new
dependencies being released into the ecosystem that may cause
regressions.
- "make check-dev" Runs the tests under the newest supported
dependencies using whatever version of Python the user happens to have
installed. This is a quick convenience check that does not map to any
particular GitLab test.
(Note! check-dev may be busted on Fedora 41 and bleeding edge versions
of setuptools. That's unrelated to this patch and I'll address it
separately and soon. Thank you for your patience, --mgmt)
Finally, finally, finally: this means that QAPI tooling will be linted
and type-checked from the GitLab pipelines.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-5-jsnow@redhat.com
[Edited license choice per review --js] Signed-off-by: John Snow <jsnow@redhat.com>
John Snow [Wed, 4 Jun 2025 20:03:51 +0000 (16:03 -0400)]
python: update missing dependencies from minreqs
We pin all dependencies for the "check-minreqs" test because pip lacks a
dependency resolver that installs "the oldest possible package that
meets dependency criteria". So, in order to test our stated minimum
requirements, we pin all of our dependencies (and their dependencies,
transitively) at the oldest possible versions that still work and pass
tests; proving that our minimum requirements are correct.
(It also ensures no new features accidentally sneak in from developers
on newer platforms.)
A few transitive dependencies were omitted from the pinned dependency
file by accident; as a result, pip's dependency solver can pull in newer
dependencies, which we don't want. This patch corrects the previous
oversight and pins the missing dependencies.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-4-jsnow@redhat.com
John Snow [Wed, 4 Jun 2025 20:03:50 +0000 (16:03 -0400)]
docs/qapidoc: linting fixes
This restores the linting baseline in qapidoc. The order of some imports
change slightly here due to configuring isort a little better:
previously, isort was having difficulty understanding that "compat" and
"qapidoc_legacy" were local modules because docs/sphinx "isn't a python
package". Configuring this manually, isort chooses a different import
ordering, which _is_ intentional here.
Also: extra ignores are added for pylint. The most recent versions of
pylint don't require these ignores, but the oldest versions we support
do, so in the extra ignores go.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-3-jsnow@redhat.com
Stefan Hajnoczi [Thu, 5 Jun 2025 15:00:29 +0000 (11:00 -0400)]
Merge tag 'pull-vfio-20250605' of https://github.com/legoater/qemu into staging
vfio queue:
* Fixed OpRegion detection in IGD
* Added prerequisite rework for IOMMU nesting support
* Added prerequisite rework for vfio-user
* Added prerequisite rework for VFIO live update
* Modified memory_get_xlat_addr() to return a MemoryRegion
* tag 'pull-vfio-20250605' of https://github.com/legoater/qemu:
vfio: move vfio-cpr.h
vfio: vfio_find_ram_discard_listener
MAINTAINERS: Add reviewer for CPR
vfio/iommufd: Save vendor specific device info
vfio/iommufd: Implement [at|de]tach_hwpt handlers
vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
backends/iommufd: Add a helper to invalidate user-managed HWPT
vfio/container: pass MemoryRegion to DMA operations
vfio: return mr from vfio_get_xlat_addr
vfio/igd: Fix incorrect error propagation in vfio_pci_igd_opregion_detect()
vfio/iommufd: Add comment emphasizing no movement of hiod->realize() call
vfio: refactor out IRQ signalling setup
vfio: move config space read into vfio_pci_config_setup()
vfio: move more cleanup into vfio_pci_put_device()
vfio: add more VFIOIOMMUClass docs
vfio/igd: OpRegion not found fix error typo
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (24 commits)
hw/core/qdev-properties-system: Add missing return in set_drive_helper()
iotests: fix 240
block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
iotests/graph-changes-while-io: add test case with removal of lower snapshot
iotests/graph-changes-while-io: remove image file after test
block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
blockdev: drain while unlocked in external_snapshot_action()
blockdev: drain while unlocked in internal_snapshot_action()
block: move drain outside of quorum_del_child()
block: move drain outside of bdrv_root_unref_child()
block: move drain outside of quorum_add_child()
block: move drain outside of bdrv_attach_child()
block: move drain outside of bdrv_root_attach_child()
block: move drain outside of bdrv_set_backing_hd_drained()
block: move drain outside of bdrv_attach_child_common(_abort)()
block: move drain outside of bdrv_try_change_aio_context()
block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)
block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Zhenzhong Duan [Wed, 4 Jun 2025 06:21:15 +0000 (14:21 +0800)]
vfio/iommufd: Save vendor specific device info
Some device information returned by ioctl(IOMMU_GET_HW_INFO) are vendor
specific. Save them as raw data in a union supporting different vendors,
then vendor IOMMU can query the raw data with its fixed format for
capability directly.
Because IOMMU_GET_HW_INFO is only supported in linux, so declare those
capability related structures with CONFIG_LINUX.
Suggested-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-5-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Wed, 4 Jun 2025 06:21:13 +0000 (14:21 +0800)]
vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
Enhance HostIOMMUDeviceIOMMUFD object with 3 new members, specific
to the iommufd BE + 2 new class functions.
IOMMUFD BE includes IOMMUFD handle, devid and hwpt_id. IOMMUFD handle
and devid are used to allocate/free ioas and hwpt. hwpt_id is used to
re-attach IOMMUFD backed device to its default VFIO sub-system created
hwpt, i.e., when vIOMMU is disabled by guest. These properties are
initialized in hiod::realize() after attachment.
2 new class functions are [at|de]tach_hwpt(). They are used to
attach/detach hwpt. VFIO and VDPA can have different implementions,
so implementation will be in sub-class instead of HostIOMMUDeviceIOMMUFD,
e.g., in HostIOMMUDeviceIOMMUFDVFIO.
Add two wrappers host_iommu_device_iommufd_[at|de]tach_hwpt to wrap the
two functions.
John Levon [Wed, 21 May 2025 21:55:34 +0000 (22:55 +0100)]
vfio/container: pass MemoryRegion to DMA operations
Pass through the MemoryRegion to DMA operation handlers of vfio
containers. The vfio-user container will need this later, to translate
the vaddr into an offset for the dma map vfio-user message; CPR will
also will need this.
Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250521215534.2688540-1-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
Steve Sistare [Mon, 19 May 2025 13:26:43 +0000 (06:26 -0700)]
vfio: return mr from vfio_get_xlat_addr
Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory
region that the translated address is found in. This will be needed by
CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE.
Also return the xlat offset, so we can simplify the interface by removing
the out parameters that can be trivially derived from mr and xlat.
Lastly, rename the functions to to memory_translate_iotlb() and
vfio_translate_iotlb().
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: John Levon <john.levon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
Tomita Moeko [Thu, 22 May 2025 15:16:36 +0000 (23:16 +0800)]
vfio/igd: Fix incorrect error propagation in vfio_pci_igd_opregion_detect()
In vfio_pci_igd_opregion_detect(), errp will be set when the device does
not have OpRegion or is hotplugged. This errp will be propagated to
pci_qdev_realize(), which interprets it as failure, causing unexpected
termination on devices without OpRegion like SR-IOV VFs or discrete
GPUs. Fix it by not setting errp in vfio_pci_igd_opregion_detect().
This patch also checks if the device has OpRegion before hotplug status
to prevent unwanted warning messages on non-IGD devices.
Zhenzhong Duan [Wed, 21 May 2025 11:03:01 +0000 (19:03 +0800)]
vfio/iommufd: Add comment emphasizing no movement of hiod->realize() call
The nested IOMMU support needs device and hwpt id which are generated
only after attachment. Hiod encapsulates these information in realize()
and passes to vIOMMU.
John Levon [Tue, 20 May 2025 15:03:52 +0000 (16:03 +0100)]
vfio: move config space read into vfio_pci_config_setup()
Small cleanup that reduces duplicate code for vfio-user and reduces the
size of vfio_realize(); while we're here, correct that name to
vfio_pci_realize().
Fiona Ebner [Fri, 23 May 2025 07:02:11 +0000 (09:02 +0200)]
hw/core/qdev-properties-system: Add missing return in set_drive_helper()
Currently, changing the 'drive' property of e.g. a scsi-hd object will
result in an assertion failure if the aio context of the block node
it's replaced with doesn't match the current aio context:
The problematic scenario is already detected, but a 'return' statement
was missing.
Cc: qemu-stable@nongnu.org Fixes: d1a58c176a ("qdev: allow setting drive property for realized device") Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250523070211.280498-1-f.ebner@proxmox.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 29 May 2025 20:31:47 +0000 (16:31 -0400)]
iotests: fix 240
Commit 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter")
removed the limitation that virtio-scsi devices must successfully set
the AioContext on their BlockBackends. This was made possible thanks to
the QEMU multi-queue block layer.
This change broke qemu-iotests 240, which checks that adding a
virtio-scsi device with a drive that is already in another AioContext
will fail.
Update the test to take the relaxed behavior into account. I considered
removing this test case entirely, but the code coverage still seems
valuable.
Fixes: 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter") Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250529203147.180338-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:59 +0000 (17:10 +0200)]
block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
Both commit ab61335025 ("block: drain from main loop thread in
bdrv_co_yield_to_drain()") and commit d05ab380db ("block: Mark drain
related functions GRAPH_RDLOCK") introduced a GLOBAL_STATE_CODE()
macro in bdrv_do_drained_end(). The assertion of being in the main
thread cannot change here, so keep only the earlier instance.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-23-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Andrey Drobyshev [Fri, 30 May 2025 15:10:58 +0000 (17:10 +0200)]
iotests/graph-changes-while-io: add test case with removal of lower snapshot
This case is catching potential deadlock which takes place when job-dismiss
is issued when I/O requests are processed in a separate iothread.
See https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
[FE: re-use top image and rename snap1->mid as suggested by Kevin Wolf
remove image file after test as suggested by Kevin Wolf
add type annotation for function argument to make mypy happy] Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-22-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:57 +0000 (17:10 +0200)]
iotests/graph-changes-while-io: remove image file after test
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-21-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:56 +0000 (17:10 +0200)]
block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
All of bdrv_drain_all_begin(), bdrv_drain_all() and
bdrv_drained_begin() poll and are not allowed to be called with the
block graph lock held. Mark the function as such.
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-20-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:55 +0000 (17:10 +0200)]
blockdev: drain while unlocked in external_snapshot_action()
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-19-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:54 +0000 (17:10 +0200)]
blockdev: drain while unlocked in internal_snapshot_action()
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-18-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:53 +0000 (17:10 +0200)]
block: move drain outside of quorum_del_child()
The quorum_del_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_del_child()
callback, which is only called in the bdrv_del_child() function, which
also runs under the graph lock.
The bdrv_del_child() function is called by qmp_x_blockdev_change().
A drained section was already introduced there by commit "block: move
drain out of quorum_add_child()".
This finally finishes moving out the drain to places that are not
under the graph lock started in "block: move draining out of
bdrv_change_aio_context() and mark GRAPH_RDLOCK".
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:52 +0000 (17:10 +0200)]
block: move drain outside of bdrv_root_unref_child()
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
bdrv_root_unref_child() is called by:
1. blk_remove_bs(), where a drained section is introduced.
2. bdrv_unref_child(), which runs under the graph lock, so the drain
will be moved further up to its callers.
3. block_job_remove_all_bdrv(), where a drained section is introduced.
For all callers of bdrv_unref_child() and its generated
bdrv_co_unref_child() coroutine variant, a drained section is
introduced, they are not explicilty listed here. The caller
quorum_del_child() holds the graph lock, so it is not actually allowed
to drain. This will be addressed in the next commit.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-16-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:51 +0000 (17:10 +0200)]
block: move drain outside of quorum_add_child()
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The quorum_add_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_add_child()
callback, which is only called in the bdrv_add_child() function, which
also runs under the graph lock.
The bdrv_add_child() function is called by qmp_x_blockdev_change(),
where a drained section is introduced.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fiona Ebner [Fri, 30 May 2025 15:10:50 +0000 (17:10 +0200)]
block: move drain outside of bdrv_attach_child()
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The function bdrv_attach_child() runs under the graph lock, so it is
not allowed to drain. It is called by:
1. replication_start()
2. quorum_add_child()
3. bdrv_open_child_common()
4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests.
In all callers, a drained section is introduced.
The function quorum_add_child() runs under the graph lock, so it is
not actually allowed to drain. This will be addressed by the following
commit.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-14-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>