git.ipfire.org Git - thirdparty/qemu.git/log

vfio/container: recover from unmap-all-vaddr failure

If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
Recover by walking all address ranges of all containers to restore the vaddr
for each. Do so by invoking the vfio listener callback, and passing a new
"remap" flag that tells it to restore a mapping without re-allocating new
userland data structures.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: mdev cpr blocker

During CPR, after VFIO_DMA_UNMAP_FLAG_VADDR, the vaddr is temporarily
invalid, so mediated devices cannot be supported. Add a blocker for them.
This restriction will not apply to iommufd containers when CPR is added
for them in a future patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-8-git-send-email-steven.sistare@oracle.com
[ clg: Fixed context change in VFIODevice ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: restore DMA vaddr

In new QEMU, do not register the memory listener at device creation time.
Register it later, in the container post_load handler, after all vmstate
that may affect regions and mapping boundaries has been loaded. The
post_load registration will cause the listener to invoke its callback on
each flat section, and the calls will match the mappings remembered by the
kernel.

The listener calls a special dma_map handler that passes the new VA of each
section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
handler at the end.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: discard old DMA vaddr

In the container pre_save handler, discard the virtual addresses in DMA
mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be
remapped at a different VA after in new QEMU. DMA to already-mapped
pages continues.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: preserve descriptors

At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
the saved descriptors.

During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped. The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: register container for cpr

Register a legacy container for cpr-transfer, replacing the generic CPR
register call with a more specific legacy container register call. Add a
blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.

This is mostly boiler plate. The fields to to saved and restored are added
in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

migration: lower handler priority

Define a vmstate priority that is lower than the default, so its handlers
run after all default priority handlers. Since 0 is no longer the default
priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT.

CPR for vfio will use this to install handlers for containers that run
after handlers for the devices that they contain.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

migration: cpr helpers

Add the cpr_incoming_needed, cpr_open_fd, and cpr_resave_fd helpers,
for use when adding cpr support for vfio and iommufd.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: mark posted writes in region write callbacks

For vfio-user, the region write implementation needs to know if the
write is posted; add the necessary plumbing to support this.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-5-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: add per-region fd support

For vfio-user, each region has its own fd rather than sharing
vbasedev's. Add the necessary plumbing to support this, and use the
correct fd in vfio_region_mmap().

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-4-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: export PCI helpers needed for vfio-user

The vfio-user code will need to re-use various parts of the vfio PCI
code. Export them in hw/vfio/pci.h, and rename them to the vfio_pci_*
namespace.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-2-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

s390: implementing CHSC SEI for AP config change

Handle interception of the CHSC SEI instruction for requests
indicating the guest's AP configuration has changed.

If configuring --without-default-devices, hw/s390x/ap-stub.c
was created to handle such circumstance. Also added the
following to hw/s390x/meson.build if CONFIG_VFIO_AP is
false, it will use the stub file.

Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250609164418.17585-5-rreyes@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/ap: Storing event information for an AP configuration change event

These functions can be invoked by the function that handles interception
of the CHSC SEI instruction for requests indicating the accessibility of
one or more adjunct processors has changed.

Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250609164418.17585-4-rreyes@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/ap: store object indicating AP config changed in a queue

Creates an object indicating that an AP configuration change event
has been received and stores it in a queue. These objects will later
be used to store event information for an AP configuration change
when the CHSC instruction is intercepted.

Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250609164418.17585-3-rreyes@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/ap: notification handler for AP config changed event

Register an event notifier handler to process AP configuration
change events by queuing the event and generating a CRW to let
the guest know its AP configuration has changed

Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250609164418.17585-2-rreyes@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Fix instance_size of VFIO_PCI_BASE

Currently the final instance_size of VFIO_PCI_BASE is sizeof(PCIDevice).
It should be sizeof(VFIOPCIDevice), VFIO_PCI uses same structure as
base class VFIO_PCI_BASE, so no need to set its instance_size explicitly.

This isn't catastrophic only because VFIO_PCI_BASE is an abstract class.

Fixes: d4e392d0a99b ("vfio: add vfio-pci-base class")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250611024228.423666-1-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: Fix vfio_listener_commit()

It's wrong to call into listener_begin callback in vfio_listener_commit().
Currently this impacts vfio-user.

Fixes: d9b7d8b6993b ("vfio/container: pass listener_begin/commit callbacks")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250609115433.401775-1-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

Merge tag 'pull-10.1-maintainer-may-2025-070625-1' of https://gitlab.com/stsquad/qemu into staging

maintainer updates for May (testing, plugins)

  - expose ~/.cache/qemu to container builds
  - disable debug info in CI
  - allow boot.S to handle target el mode selection
  - new arguments for ips plugin
  - cleanup assets in size_memop
  - fix include guard in gdbstub
  - introduce qGDBServerVersion gdbstub query
  - update gdb aarch64-core.xml to support bitfields

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmhEXc4ACgkQ+9DbCVqe
# KkT3vwf9GtMoVDBWqWHwdV6H3rblP0k3mkApY4pTkFFSL93qApDK1gAKoklymPHJ
# 6agAWn/MmpqguB7yn7TnBEiJyW9CEq0DeWTz9ivPPh5vfm/2MMaXinVd4yH+GbTL
# uTuJg4EeRcSj8q4N4h+gROSHkH3mVOe+JlyakRKZ/PZChqjY1WRC/Hm2QdHojxlS
# xQBZe4Nip/mafm4yAlnyRVRbaSctmc3/xE/MomkVT+8JMdVt6yWE0HT/nIEFW6/6
# psHoiV4XfROIWj5qMAWHVLekDrsqxJx8uiGv9o3+zKdhDhRZw3Oa5EE5N/oE8KmM
# 0s/9usRvtVD0kPh9YTfjEHWHkbPadA==
# =X63M
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 07 Jun 2025 11:42:06 EDT
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* tag 'pull-10.1-maintainer-may-2025-070625-1' of https://gitlab.com/stsquad/qemu:
  gdbstub: update aarch64-core.xml
  gdbstub: Implement qGDBServerVersion packet
  gdbstub: assert earlier in handle_read_all_regs
  include/gdbstub: fix include guard in commands.h
  include/exec: fix assert in size_memop
  contrib/plugins: allow setting of instructions per quantum
  contrib/plugins: add a scaling factor to the ips arg
  tests/qtest: Avoid unaligned access in IGB test
  tests/tcg: make aarch64 boot.S handle different starting modes
  gitlab: disable debug info on CI builds
  tests/docker: expose $HOME/.cache/qemu as docker volume

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

gdbstub: update aarch64-core.xml

Update aarch64-core.xml to include field definitions for PSTATE, which
in gdb is modelled in the cpsr (current program status register)
pseudo-register, named after the actual cpsr register in armv7.

Defining the fields layout of the register allows easy inspection of for
example, the current exception level (EL):

For example. Before booting a Linux guest, EL=2, but after booting and
Ctrl-C'ing in gdb, we get EL=0:

  (gdb) info registers $cpsr
  cpsr           0x20402009          [ SP EL=2 BTYPE=0 PAN C ]
  (gdb) cont
  Continuing.
  ^C
  Thread 2 received signal SIGINT, Interrupt.
  0x0000ffffaaff286c in ?? ()
  (gdb) info registers $cpsr
  cpsr           0x20001000          [ EL=0 BTYPE=0 SSBS C ]

The aarch64-core.xml has been updated to match exactly the version
retrieved from upstream gdb, retrieved in 2025-05-19 from HEAD commit
9f4dc0b137c86f6ff2098cb1ab69442c69d6023d.

Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/features/aarch64-core.xml;h=b8046510b9a085d30463d37b3ecc8d435f5fb7a4;hb=HEAD
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20250519-gdbstub-aarch64-pstate-xml-v1-1-b4dbe87fe7c6@linaro.org>
[AJB: expanded upstream link]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-18-alex.bennee@linaro.org>

gdbstub: Implement qGDBServerVersion packet

This commit adds support for the `qGDBServerVersion` packet to the qemu
gdbstub which could be used by clients to detect the QEMU version
(and, e.g., use a workaround for known bugs).

This packet is not documented/standarized by GDB but it was implemented
by LLDB gdbstub [0] and is helpful for projects like Pwndbg [1].

This has been implemented by Patryk, who I included in Co-authored-by
and who asked me to send the patch.

[0] https://lldb.llvm.org/resources/lldbgdbremote.html#qgdbserverversion
[1] https://github.com/pwndbg/pwndbg/issues/2648

Co-authored-by: Patryk 'patryk4815' Sondej <patryk.sondej@gmail.com>
Signed-off-by: Dominik 'Disconnect3d' Czarnota <dominik.b.czarnota@gmail.com>
Message-Id: <20250403191340.53343-1-dominik.b.czarnota@gmail.com>
[AJB: fix include, checkpatch linewrap]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-17-alex.bennee@linaro.org>

gdbstub: assert earlier in handle_read_all_regs

When things go wrong we want to assert on the register that failed to
be able to figure out what went wrong.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-16-alex.bennee@linaro.org>

include/gdbstub: fix include guard in commands.h

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-15-alex.bennee@linaro.org>

include/exec: fix assert in size_memop

We can handle larger sized memops now, expand the range of the assert.

Fixes: 4b473e0c60 (tcg: Expand MO_SIZE to 3 bits)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-14-alex.bennee@linaro.org>

contrib/plugins: allow setting of instructions per quantum

The default is we update time every 1/10th of a second or so. However
for some cases we might want to update time more frequently. Allow
this to be set via the command line through the ipq argument.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-7-alex.bennee@linaro.org>

contrib/plugins: add a scaling factor to the ips arg

It's easy to get lost in zeros while setting the numbers of
instructions per second. Add a scaling suffix to make things simpler.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-6-alex.bennee@linaro.org>

tests/qtest: Avoid unaligned access in IGB test

../tests/qtest/libqos/igb.c:106:5: runtime error: load of misaligned address 0x562040be8e33 for type 'uint32_t', which requires 4 byte alignment

Instead of straight casting the uint8_t array, we can use ldl_le_p and
lduw_l_p to assure the unaligned access works properly against
uint32_t and uint16_t.

Signed-off-by: Nabih Estefan <nabihestefan@google.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250429155621.2028198-1-nabihestefan@google.com>
[AJB: fix commit message, remove unneeded casts]
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-ID: <20250603110204.838117-5-alex.bennee@linaro.org>

tests/tcg: make aarch64 boot.S handle different starting modes

Currently the boot.S code assumes everything starts at EL1. This will
break things like the memory test which will barf on unaligned memory
access when run at a higher level.

Adapt the boot code to do some basic verification of the starting mode
and the minimal configuration to move to the lower exception levels.
With this we can run the memory test with:

  -M virt,secure=on
  -M virt,secure=on,virtualization=on
  -M virt,virtualisation=on

If a test needs to be at a particular EL it can use the semihosting
command line to indicate the level we should execute in.

Cc: Julian Armistead <julian.armistead@linaro.org>
Cc: Jim MacArthur <jim.macarthur@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-4-alex.bennee@linaro.org>

gitlab: disable debug info on CI builds

Our default build enables debug info which adds hugely to the size of
the builds as well as the size of cached objects. Disable debug info
across the board to save space and reduce pressure on the CI system.
We still have a number of builds which explicitly enable debug and
related extra asserts like --enable-debug-tcg.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-3-alex.bennee@linaro.org>

tests/docker: expose $HOME/.cache/qemu as docker volume

If you want to run functional tests we should share .cache/qemu so we
don't force containers to continually re-download images. We also move
ccache to use this shared area.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250603110204.838117-2-alex.bennee@linaro.org>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* futex: support Windows
* qemu-thread: Avoid futex abstraction for non-Linux
* migration, hw/display/apple-gfx: replace QemuSemaphore with QemuEvent
* rust: bindings for Error
* hpet, rust/hpet: return errors from realize if properties are incorrect
* rust/hpet: Drop BqlCell wrapper for num_timers
* target/i386: Emulate ftz and denormal flag bits correctly
* i386/kvm: Prefault memory on page state change

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhC4AgUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroP09wf+K9e0TaaZRxTsw7WU9pXsDoYPzTLd
# F5CkBZPY770X1JW75f8Xw5qKczI0t6s26eFK1NUZxYiDVWzW/lZT6hreCUQSwzoS
# b0wlAgPW+bV5dKlKI2wvnadrgDvroj4p560TS+bmRftiu2P0ugkHHtIJNIQ+byUQ
# sWdhKlUqdOXakMrC4H4wDyIgRbK4CLsRMbnBHBUENwNJYJm39bwlicybbagpUxzt
# w4mgjbMab0jbAd2hVq8n+A+1sKjrroqOtrhQLzEuMZ0VAwocwuP2Adm6gBu9kdHV
# tpa8RLopninax3pWVUHnypHX780jkZ8E7zk9ohaaK36NnWTF4W/Z41EOLw==
# =Vs6V
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 06 Jun 2025 08:33:12 EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (31 commits)
  tests/tcg/x86_64/fma: add test for exact-denormal output
  target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  target/i386: Use correct type for get_float_exception_flags() values
  target/i386: Detect flush-to-zero after rounding
  hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
  migration/postcopy: Replace QemuSemaphore with QemuEvent
  migration/colo: Replace QemuSemaphore with QemuEvent
  migration: Replace QemuSemaphore with QemuEvent
  qemu-thread: Document QemuEvent
  qemu-thread: Use futex if available for QemuLockCnt
  qemu-thread: Use futex for QemuEvent on Windows
  qemu-thread: Avoid futex abstraction for non-Linux
  qemu-thread: Replace __linux__ with CONFIG_LINUX
  futex: Support Windows
  futex: Check value after qemu_futex_wait()
  i386/kvm: Prefault memory on page state change
  rust: make TryFrom macro more resilient
  docs: update Rust module status
  rust/hpet: Drop BqlCell wrapper for num_timers
  rust/hpet: return errors from realize if properties are incorrect
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging

Python Pull Request

Add QAPI and QAPI doc files to python static analysis testing regime,
this time for real, probably

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAmhB388ACgkQfe+BBqr8
# OQ6lMA//WJtSr57ADW5k5zcRMxV7k//erYFkjgXbTh7b9DDblMwNVhYr5lqJbEvS
# V5OChW32++QIO5Y4cBhzbzxFTJXbAYzyg3UATCkH2kRbd139bqdAtsnsaFmoHmLP
# c8KAggT1+hIb7JIVkFiFccMsdCeFwXwQoS5Nk7w95H9cxxYUj/O9qbRuCN+elg/e
# mX4zaq6F2umTx0EdD35DlBPrPPyRsdlVWKUqh8f5KaAGPOelGyvbgwrXU2MT7ewG
# JXcRoYzn/9J2KSboiFY0MjIKqDuhoMdCnbSNpRNGgClJRa+VZEBPFClMe1YSXw0m
# J3kQMYeqm5S1GUG+ZrBTICY6Ch8jNq2kb3ua707JJWdYmd9gq0poF/P7gaRVbyAL
# 5UdYVVgtH/3xve2LGe0guj3v5kTK7Vo6dApwj8pRHrBWWOgAG0UgGseOJgndfCIx
# PQRsF2T4YoVdjiGB46EIgBmoFI+VJGwFRlvb6WZ0YmPedi7MuUvWmo0lbgDkaTO+
# MMqsWxShTY+xwnSFgtl1iHOAdfT6jiHcn1n+hZrGpvF492XRjW02zKiDSZECqSz5
# lg51+OaDc2HwS65sYyFb4GD7yF/PcdOj7MG/Ij9dx0GoM9/HmcVAHyRt45QNgxzc
# N7Xx6GFGs7puDoE/pSoauFtGC8XeR6Cx0HfBcXYGaJcJEq6N4yw=
# =IVAr
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 05 Jun 2025 14:19:59 EDT
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* tag 'python-pull-request' of https://gitlab.com/jsnow/qemu:
  qapi: delete un-needed python static analysis configs
  python: Drop redundant warn_unused_configs = True
  python: add qapi static analysis tests
  python: update missing dependencies from minreqs
  docs/qapidoc: linting fixes
  qapi: Add some pylint ignores

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

tests/tcg/x86_64/fma: add test for exact-denormal output

Add some fma test cases that check for correct handling of FTZ and
for the flag that indicates that the input denormal was consumed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250519145114.2786534-5-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Wire up MXCSR.DE and FPUS.DE correctly

The x86 DE bit in the FPU and MXCSR status is supposed to be set
when an input denormal is consumed. We didn't previously report
this from softfloat, so the x86 code either simply didn't set
the DE bit or else incorrectly wired it up to denormal_flushed,
depending on which register you looked at.

Now we have input_denormal_used we can wire up these DE bits
with the semantics they are supposed to have.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20250519145114.2786534-4-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Use correct type for get_float_exception_flags() values

The softfloat get_float_exception_flags() function returns 'int', but
in various places in target/i386 we incorrectly store the returned
value into a uint8_t. This currently has no ill effects because i386
doesn't care about any of the float_flag enum values above 0x40.
However, we want to start using float_flag_input_denormal_used, which
is 0x4000.

Switch to using 'int' so that we can handle all the possible valid
float_flag_* values. This includes changing the return type of
save_exception_flags() and the argument to merge_exception_flags().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250519145114.2786534-3-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Detect flush-to-zero after rounding

The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that we
flush outputs to zero when we detect underflow, which is after
rounding. Set the detect_ftz flag accordingly.

This allows us to enable the test in fma.c which checks this
behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250519145114.2786534-2-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent

sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted
into QemuEvent, which is more specialized for such a use case.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250529-event-v5-10-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

migration/postcopy: Replace QemuSemaphore with QemuEvent

thread_sync_sem is an one-shot event so it can be converted into
QemuEvent, which is more lightweight.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250529-event-v5-9-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

migration/colo: Replace QemuSemaphore with QemuEvent

colo_exit_sem and colo_incoming_sem represent one-shot events so they
can be converted into QemuEvent, which is more lightweight.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250529-event-v5-8-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

migration: Replace QemuSemaphore with QemuEvent

pause_event can utilize qemu_event_reset() to discard events.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250529-event-v5-7-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-thread: Document QemuEvent

Document QemuEvent to help choose an appropriate synchronization
primitive.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20250529-event-v5-12-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-thread: Use futex if available for QemuLockCnt

This unlocks the futex-based implementation of QemuLockCnt to Windows.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20250529-event-v5-6-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-thread: Use futex for QemuEvent on Windows

Use the futex-based implementation of QemuEvent on Windows to
remove code duplication and remove the overhead of event object
construction and destruction.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250526-event-v4-6-5b784cc8e1de@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-thread: Avoid futex abstraction for non-Linux

qemu-thread used to abstract pthread primitives into futex for the
QemuEvent implementation of POSIX systems other than Linux. However,
this abstraction has one key difference: unlike futex, pthread
primitives require an explicit destruction, and it must be ordered after
wait and wake operations.

It would be easier to perform destruction if a wait operation ensures
the corresponding wake operation finishes as POSIX semaphore does, but
that requires to protect state accesses in qemu_event_set() and
qemu_event_wait() with a mutex. On the other hand, real futex does not
need such a protection but needs complex barrier and atomic operations
to ensure ordering between the two functions.

Add special implementations of qemu_event_set() and qemu_event_wait()
using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
finishes, and these functions will avoid complex barrier and atomic
operations to ensure ordering between them.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
Link: https://lore.kernel.org/r/20250526-event-v4-5-5b784cc8e1de@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-thread: Replace __linux__ with CONFIG_LINUX

scripts/checkpatch.pl warns for __linux__ saying "architecture specific
defines should be avoided".

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250526-event-v4-4-5b784cc8e1de@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

futex: Support Windows

Windows supports futex-like APIs since Windows 8 and Windows Server
2012.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20250529-event-v5-2-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

futex: Check value after qemu_futex_wait()

futex(2) - Linux manual page
https://man7.org/linux/man-pages/man2/futex.2.html
> Note that a wake-up can also be caused by common futex usage patterns
> in unrelated code that happened to have previously used the futex
> word's memory location (e.g., typical futex-based implementations of
> Pthreads mutexes can cause this under some conditions). Therefore,
> callers should always conservatively assume that a return value of 0
> can mean a spurious wake-up, and use the futex word's value (i.e.,
> the user-space synchronization scheme) to decide whether to continue
> to block or not.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20250529-event-v5-1-53b285203794@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/kvm: Prefault memory on page state change

A page state change is typically followed by an access of the page(s) and
results in another VMEXIT in order to map the page into the nested page
table. Depending on the size of page state change request, this can
generate a number of additional VMEXITs. For example, under SNP, when
Linux is utilizing lazy memory acceptance, memory is typically accepted in
4M chunks. A page state change request is submitted to mark the pages as
private, followed by validation of the memory. Since the guest_memfd
currently only supports 4K pages, each page validation will result in
VMEXIT to map the page, resulting in 1024 additional exits.

When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
size of the page state change in order to pre-map the pages and avoid the
additional VMEXITs. This helps speed up boot times.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust: make TryFrom macro more resilient

If the enum includes values such as "Ok", "Err", or "Error", the TryInto
macro can cause errors. Be careful and qualify identifiers with the full
path, or in the case of TryFrom<>::Error do not use the associated type
at all.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

docs: update Rust module status

error is new; offset_of is gone.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust/hpet: Drop BqlCell wrapper for num_timers

Now that the num_timers field is initialized as a property, someone may
change its default value using qdev_prop_set_uint8(), but the value is
fixed after the Rust code sees it first. Since there is no need to modify
it after realize(), it is not to be necessary to have a BqlCell wrapper.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250520152750.2542612-4-zhao1.liu@intel.com
[Remove .into() as well. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust/hpet: return errors from realize if properties are incorrect

Match the code in hpet.c; this also allows removing the
BqlCell from the num_timers field.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hpet: return errors from realize if properties are incorrect

Do not silently adjust num_timers, and fail if intcap is 0.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hpet: adjust VMState for consistency with Rust version

No functional change intended.

Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust/hpet: change type of num_timers to usize

Remove the need to convert after every read of the BqlCell. Because the
vmstate uses a u8 as the size of the VARRAY, this requires switching
the VARRAY to use num_timers_save; which in turn requires ensuring that
the num_timers_save is always there. For simplicity do this by
removing support for version 1, which QEMU has not been producing for
~15 years.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust: qdev: support returning errors from realize

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust: qemu-api: add tests for Error bindings

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust: qemu-api: add bindings to Error

Provide an implementation of std::error::Error that bridges the Rust
anyhow::Error and std::panic::Location types with QEMU's Error*.

It also has several utility methods, analogous to error_propagate(),
that convert a Result into a return value + Error** pair.  One important
difference is that these propagation methods *panic* if *errp is NULL,
unlike error_propagate() which eats subsequent errors[1].  The reason
for this is that in C you have an error_set*() call at the site where
the error is created, and calls to error_propagate() are relatively rare.

In Rust instead, even though these functions do "propagate" a
qemu_api::Error into a C Error**, there is no error_setg() anywhere that
could check for non-NULL errp and call abort().  error_propagate()'s
behavior of ignoring subsequent errors is generally considered weird,
and there would be a bigger risk of triggering it from Rust code.

[1] This is actually a violation of the preconditions of error_propagate(),
    so it should not happen.  But you never know...

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/error: make func optional

The function name is not available in Rust, so make it optional.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/error: allow non-NUL-terminated err->src

Rust makes the current file available as a statically-allocated string,
but without a NUL terminator. Allow this by storing an optional maximum
length in the Error.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/error: expose Error definition to Rust code

This is used to preserve the file and line in a roundtrip from
C Error to Rust and back to C.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

subprojects: add the foreign crate

This is a cleaned up and separated version of the patches at
https://lore.kernel.org/all/20240701145853.1394967-4-pbonzini@redhat.com/
https://lore.kernel.org/all/20240701145853.1394967-5-pbonzini@redhat.com/

Its first user will be the Error bindings; for example a QEMU Error ** can be
converted to a Rust Option using

unsafe { Option::<Error>::from_foreign(c_error) }

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

subprojects: add the anyhow crate

This is a standard replacement for Box<dyn Error> which is more efficient (it only
occcupies one word) and provides a backtrace of the error. This could be plumbed
into &error_abort in the future.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qapi: delete un-needed python static analysis configs

Since the previous commit, python/setup.cfg applies to scripts/qapi/ as
well. Configuration files in scripts/qapi/ override python/setup.cfg.

scripts/qapi/.flake8 and scripts/qapi/.isort.cfg actually match
python/setup.cfg exactly, and can go.

The differences between scripts/qapi/mypy.ini and python/setup.cfg are
harmless: namespace_packages being set to True is a requirement for the
PEP420 nested package structure of QEMU but not for scripts/qapi, but
has no effect on type checking the QAPI code. warn_unused_ignores is
used in python/ to be able to target a wide variety of mypy versions;
some of which that have added new ignore categories that are not present
in older versions.

Ultimately, scripts/qapi/mypy.ini can be removed without any real change
in behavior to how mypy enforces type safety there.

The pylint config is being left in place because the settings differ
enough from the python/ directory settings that we need a chit-chat on
how to merge them O:-)

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-7-jsnow@redhat.com

python: Drop redundant warn_unused_configs = True

strict = True implies warn_unused_configs = True.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-6-jsnow@redhat.com

python: add qapi static analysis tests

Update the python tests to also check QAPI and the QAPI Sphinx
extensions. The docs/sphinx/qapidoc_legacy.py file is not included in
these checks, as it is destined for removal soon. mypy is also not
called on the QAPI Sphinx extensions, owing to difficulties supporting
Sphinx 3.x - 8.x while maintaining static type checking support. mypy
*is* called on all of the QAPI tools themselves, though.

flake8, isort and mypy use the tool configuration from the existing
python directory (in setup.cfg). pylint continues to use the special
configuration located in scripts/qapi/ - that configuration is more
permissive. If we wish to unify the two configurations, that's a
separate series and a discussion for a later date.

The list of pylint ignores is also updated, owing again to the wide
window of pylint version support: newer versions require pragmas to
occasionally silence the "too many positional arguments" warning, but
older versions do not have such a warning category and will instead yelp
about an unrecognized option. Silence that warning, too.

As a result of this patch, one would be able to run any of the following
tests locally from the qemu.git/python directory and have it cover the
QAPI tooling as well. All of the following options run the python tests,
static analysis tests, and linter checks; but with different
combinations of dependencies and interpreters.

- "make check-minreqs" Run tests specifically under our oldest supported
  Python and our oldest supported dependencies. This is the test that
  runs on GitLab as "check-python-minreqs". This helps ensure we do not
  regress support on older platforms accidentally.

- "make check-tox" Runs the tests under the newest supported
  dependencies, but under each supported version of Python in turn. At
  time of writing, this is Python 3.8 to 3.13 inclusive. This test helps
  catch bleeding-edge problems before they become problems for developer
  workstations. This is the GitLab test "check-python-tox" and is an
  optionally run, may-fail test due to the unpredictable nature of new
  dependencies being released into the ecosystem that may cause
  regressions.

- "make check-dev" Runs the tests under the newest supported
  dependencies using whatever version of Python the user happens to have
  installed. This is a quick convenience check that does not map to any
  particular GitLab test.

  (Note! check-dev may be busted on Fedora 41 and bleeding edge versions
  of setuptools. That's unrelated to this patch and I'll address it
  separately and soon. Thank you for your patience, --mgmt)

Finally, finally, finally: this means that QAPI tooling will be linted
and type-checked from the GitLab pipelines.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-5-jsnow@redhat.com
[Edited license choice per review --js]
Signed-off-by: John Snow <jsnow@redhat.com>

python: update missing dependencies from minreqs

We pin all dependencies for the "check-minreqs" test because pip lacks a
dependency resolver that installs "the oldest possible package that
meets dependency criteria". So, in order to test our stated minimum
requirements, we pin all of our dependencies (and their dependencies,
transitively) at the oldest possible versions that still work and pass
tests; proving that our minimum requirements are correct.

(It also ensures no new features accidentally sneak in from developers
on newer platforms.)

A few transitive dependencies were omitted from the pinned dependency
file by accident; as a result, pip's dependency solver can pull in newer
dependencies, which we don't want. This patch corrects the previous
oversight and pins the missing dependencies.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-4-jsnow@redhat.com

docs/qapidoc: linting fixes

This restores the linting baseline in qapidoc. The order of some imports
change slightly here due to configuring isort a little better:
previously, isort was having difficulty understanding that "compat" and
"qapidoc_legacy" were local modules because docs/sphinx "isn't a python
package". Configuring this manually, isort chooses a different import
ordering, which _is_ intentional here.

Also: extra ignores are added for pylint. The most recent versions of
pylint don't require these ignores, but the oldest versions we support
do, so in the extra ignores go.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-3-jsnow@redhat.com

qapi: Add some pylint ignores

This restores the linting baseline in QAPI.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20250604200354.459501-2-jsnow@redhat.com

Merge tag 'pull-vfio-20250605' of https://github.com/legoater/qemu into staging

vfio queue:

* Fixed OpRegion detection in IGD
* Added prerequisite rework for IOMMU nesting support
* Added prerequisite rework for vfio-user
* Added prerequisite rework for VFIO live update
* Modified memory_get_xlat_addr() to return a MemoryRegion

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmhBWBgACgkQUaNDx8/7
# 7KEg8RAAyNuFzKs1dUHc24QeApjnd56PF3DhmjT++19hh2VH/CpKeJkWnNuWQupo
# 7yqQmYuxYMrHrh0ncdv5S+sIU2fHGMuxsL4d129H3/BPaAr92Zgtk2ID5deFTG9c
# Ns5sC/1Z6UyRqgh5PRxDmkfMVxyJ73dofTWyAQGNwwt5ASV876JEApMSO4smGpyy
# cu0tpya6WVaYp/Ry2MjpK1N6utr1pJgzIVWQ2ww595OtaqQMa9OD5Sepafp5kf+y
# ZqihINpMY9eGuu4olDQYcaUKThH0DAWR4Eb6ndgG9gOSh0M2YI0YygvG9q9giQzA
# WXlmM2e9ZVAULl2Y8Eb4PVybyk3U9eDK3MzI9PzKBLNdROjJNwNK9ahjtFgPWN9H
# cIYnBEnTP2d1e4BOtJIoQRXdDFOQHqzzEPwFhqMLEnzu1beVRnnt8SiYPKV/pEO0
# ZEAzWka7WN27DDoqgSNzc8ptIzbM6yO66dvLwOhXyr+WyqVaiehxhvfZiEbpeIWa
# 6LuCnyJkgEcAX7I7BaqZxAVvBqwR0z0TElfxadAj6YXgjVEUTahaBV+6M7bBDoid
# BlXTFBrdhlTOjrzV0LkZe9ac9VbxPc9fW/uGoYntD0cRsuWqpDpgNoDlmHDjVudz
# b4TCVksIsfrVkNqQclXfYuSNMZV0KwBADD1wVqZ42nyx1KcgqMQ=
# =tHwb
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 05 Jun 2025 04:40:56 EDT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20250605' of https://github.com/legoater/qemu:
  vfio: move vfio-cpr.h
  vfio: vfio_find_ram_discard_listener
  MAINTAINERS: Add reviewer for CPR
  vfio/iommufd: Save vendor specific device info
  vfio/iommufd: Implement [at|de]tach_hwpt handlers
  vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
  backends/iommufd: Add a helper to invalidate user-managed HWPT
  vfio/container: pass MemoryRegion to DMA operations
  vfio: return mr from vfio_get_xlat_addr
  vfio/igd: Fix incorrect error propagation in vfio_pci_igd_opregion_detect()
  vfio/iommufd: Add comment emphasizing no movement of hiod->realize() call
  vfio: refactor out IRQ signalling setup
  vfio: move config space read into vfio_pci_config_setup()
  vfio: move more cleanup into vfio_pci_put_device()
  vfio: add more VFIOIOMMUClass docs
  vfio/igd: OpRegion not found fix error typo

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Deadlock fixes: Do not drain while holding the graph lock
- qdev-properties-system: Fix assertion failure in set_drive_helper()
- iotests: fix 240

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCgAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmhAiH8RHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9asxBAAniUnM2ysT85wgi1+KUVcURYJWAOTyHUK
# CxKQFXALeNYb1of4OEvFGxTJV9fIi7lY2P6Fh+ANUvAk6r8mGk7PKTV7qyJcv0r0
# Xu5BXPRBtOVeQ1QtWc36NhUJ5Oo9AZdutXKuHtt0FjlL5bxOvwY40ddDhQcg0dWF
# H4Eozi9oPACCsjbkHU0JAkMAS9Vvn4FNuDjzCfu1AlAKQnY64xRwVQwQeOC5WzvB
# 6vUs0W/ZZS5T30rtdgXtRA+00CIPC00cF1DbeL9cZEN4Rkux7JPoosCQq8lZ9YsR
# EPsHbSve6cgJP/KB1UzBjcoKI4e+8Z3KBaYOC30F65dU6e7N1wZMjCHHK/gt5bxs
# 48qWautEyot1VKBHeXZQkqR8OXk5GlyfMnQfPre6gMaAJ4H6z8GHBwxidsB9G1Da
# 27tmpZP1DyPjcH0Btz+DmhFTABaG6pgRamDmdHNJdkBX1qydZ6A1UYKf0KZRsEIu
# B43dIJ4fL4riTc+vkR0SlakQvGNAvv559uvblkDp0/2wdUzE1U7g8+tuSrsP5I1x
# BMjPPgdV5iiPvOMEO0dl1HLGZi7ORd/3FJfzvWkzWlnw6ByArXmHceXGIvhgoyjR
# iT6XwmJ85Sl0F/3HlXgcgI86AnpieE0PE8nw3gBuw0rZFJChQuHUzxokLZ88U9VQ
# UePwpYPDn58=
# =tetv
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 04 Jun 2025 13:55:11 EDT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (24 commits)
  hw/core/qdev-properties-system: Add missing return in set_drive_helper()
  iotests: fix 240
  block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
  iotests/graph-changes-while-io: add test case with removal of lower snapshot
  iotests/graph-changes-while-io: remove image file after test
  block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
  blockdev: drain while unlocked in external_snapshot_action()
  blockdev: drain while unlocked in internal_snapshot_action()
  block: move drain outside of quorum_del_child()
  block: move drain outside of bdrv_root_unref_child()
  block: move drain outside of quorum_add_child()
  block: move drain outside of bdrv_attach_child()
  block: move drain outside of bdrv_root_attach_child()
  block: move drain outside of bdrv_set_backing_hd_drained()
  block: move drain outside of bdrv_attach_child_common(_abort)()
  block: move drain outside of bdrv_try_change_aio_context()
  block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
  block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
  block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)
  block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

vfio: move vfio-cpr.h

Move vfio-cpr.h to include/hw/vfio, because it will need to be included by
other files there.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: vfio_find_ram_discard_listener

Define vfio_find_ram_discard_listener as a subroutine so additional calls to
it may be added in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

MAINTAINERS: Add reviewer for CPR

CPR is integrated with live migration, and has the same maintainers.
But, add a CPR section to add a reviewer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Save vendor specific device info

Some device information returned by ioctl(IOMMU_GET_HW_INFO) are vendor
specific. Save them as raw data in a union supporting different vendors,
then vendor IOMMU can query the raw data with its fixed format for
capability directly.

Because IOMMU_GET_HW_INFO is only supported in linux, so declare those
capability related structures with CONFIG_LINUX.

Suggested-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-5-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Implement [at|de]tach_hwpt handlers

Implement [at|de]tach_hwpt handlers in VFIO subsystem. vIOMMU
utilizes them to attach to or detach from hwpt on host side.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-4-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD

Enhance HostIOMMUDeviceIOMMUFD object with 3 new members, specific
to the iommufd BE + 2 new class functions.

IOMMUFD BE includes IOMMUFD handle, devid and hwpt_id. IOMMUFD handle
and devid are used to allocate/free ioas and hwpt. hwpt_id is used to
re-attach IOMMUFD backed device to its default VFIO sub-system created
hwpt, i.e., when vIOMMU is disabled by guest. These properties are
initialized in hiod::realize() after attachment.

2 new class functions are [at|de]tach_hwpt(). They are used to
attach/detach hwpt. VFIO and VDPA can have different implementions,
so implementation will be in sub-class instead of HostIOMMUDeviceIOMMUFD,
e.g., in HostIOMMUDeviceIOMMUFDVFIO.

Add two wrappers host_iommu_device_iommufd_[at|de]tach_hwpt to wrap the
two functions.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

backends/iommufd: Add a helper to invalidate user-managed HWPT

This helper passes cache invalidation request from guest to invalidate
stage-1 page table cache in host hardware.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-2-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: pass MemoryRegion to DMA operations

Pass through the MemoryRegion to DMA operation handlers of vfio
containers. The vfio-user container will need this later, to translate
the vaddr into an offset for the dma map vfio-user message; CPR will
also will need this.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250521215534.2688540-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: return mr from vfio_get_xlat_addr

Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory
region that the translated address is found in. This will be needed by
CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE.

Also return the xlat offset, so we can simplify the interface by removing
the out parameters that can be trivially derived from mr and xlat.

Lastly, rename the functions to to memory_translate_iotlb() and
vfio_translate_iotlb().

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Fix incorrect error propagation in vfio_pci_igd_opregion_detect()

In vfio_pci_igd_opregion_detect(), errp will be set when the device does
not have OpRegion or is hotplugged. This errp will be propagated to
pci_qdev_realize(), which interprets it as failure, causing unexpected
termination on devices without OpRegion like SR-IOV VFs or discrete
GPUs. Fix it by not setting errp in vfio_pci_igd_opregion_detect().

This patch also checks if the device has OpRegion before hotplug status
to prevent unwanted warning messages on non-IGD devices.

Fixes: c0273e77f2d7 ("vfio/igd: Detect IGD device by OpRegion")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2968
Reported-by: Edmund Raile <edmund.raile@protonmail.com>
Link: https://lore.kernel.org/qemu-devel/30044d14-17ec-46e3-b9c3-63d27a5bde27@gmail.com
Tested-by: Edmund Raile <edmund.raile@protonmail.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250522151636.20001-1-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Add comment emphasizing no movement of hiod->realize() call

The nested IOMMU support needs device and hwpt id which are generated
only after attachment. Hiod encapsulates these information in realize()
and passes to vIOMMU.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250521110301.3313877-1-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: refactor out IRQ signalling setup

This makes for a slightly more readable vfio_msix_vector_do_use()
implementation, and we will rely on this shortly.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-5-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: move config space read into vfio_pci_config_setup()

Small cleanup that reduces duplicate code for vfio-user and reduces the
size of vfio_realize(); while we're here, correct that name to
vfio_pci_realize().

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-4-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: move more cleanup into vfio_pci_put_device()

All of the cleanup can be done in the same place, and vfio-user will
want to do the same.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-3-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: add more VFIOIOMMUClass docs

Add some additional doc comments for these class methods.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250520162530.2194548-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: OpRegion not found fix error typo

Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
Reviewed-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/MFFbQoTpea_CK5ELq8oJ-a3Q57wo7ywQlrIqDvdIDKhUuCm59VUz2QzvdojO5r_wb_7SHifU0Kym3loj4eASPhdzYpjtiMCTePzyg1zrroo=@protonmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/core/qdev-properties-system: Add missing return in set_drive_helper()

Currently, changing the 'drive' property of e.g. a scsi-hd object will
result in an assertion failure if the aio context of the block node
it's replaced with doesn't match the current aio context:

> bdrv_replace_child_noperm: Assertion `bdrv_get_aio_context(old_bs) ==
> bdrv_get_aio_context(new_bs)' failed.

The problematic scenario is already detected, but a 'return' statement
was missing.

Cc: qemu-stable@nongnu.org
Fixes: d1a58c176a ("qdev: allow setting drive property for realized device")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250523070211.280498-1-f.ebner@proxmox.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: fix 240

Commit 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter")
removed the limitation that virtio-scsi devices must successfully set
the AioContext on their BlockBackends. This was made possible thanks to
the QEMU multi-queue block layer.

This change broke qemu-iotests 240, which checks that adding a
virtio-scsi device with a drive that is already in another AioContext
will fail.

Update the test to take the relaxed behavior into account. I considered
removing this test case entirely, but the code coverage still seems
valuable.

Fixes: 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250529203147.180338-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()

Both commit ab61335025 ("block: drain from main loop thread in
bdrv_co_yield_to_drain()") and commit d05ab380db ("block: Mark drain
related functions GRAPH_RDLOCK") introduced a GLOBAL_STATE_CODE()
macro in bdrv_do_drained_end(). The assertion of being in the main
thread cannot change here, so keep only the earlier instance.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-23-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests/graph-changes-while-io: add test case with removal of lower snapshot

This case is catching potential deadlock which takes place when job-dismiss
is issued when I/O requests are processed in a separate iothread.

See https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
[FE: re-use top image and rename snap1->mid as suggested by Kevin Wolf
remove image file after test as suggested by Kevin Wolf
add type annotation for function argument to make mypy happy]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-22-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests/graph-changes-while-io: remove image file after test

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-21-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED

All of bdrv_drain_all_begin(), bdrv_drain_all() and
bdrv_drained_begin() poll and are not allowed to be called with the
block graph lock held. Mark the function as such.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-20-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockdev: drain while unlocked in external_snapshot_action()

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-19-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockdev: drain while unlocked in internal_snapshot_action()

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-18-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: move drain outside of quorum_del_child()

The quorum_del_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_del_child()
callback, which is only called in the bdrv_del_child() function, which
also runs under the graph lock.

The bdrv_del_child() function is called by qmp_x_blockdev_change().
A drained section was already introduced there by commit "block: move
drain out of quorum_add_child()".

This finally finishes moving out the drain to places that are not
under the graph lock started in "block: move draining out of
bdrv_change_aio_context() and mark GRAPH_RDLOCK".

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: move drain outside of bdrv_root_unref_child()

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

bdrv_root_unref_child() is called by:
1. blk_remove_bs(), where a drained section is introduced.
2. bdrv_unref_child(), which runs under the graph lock, so the drain
will be moved further up to its callers.
3. block_job_remove_all_bdrv(), where a drained section is introduced.

For all callers of bdrv_unref_child() and its generated
bdrv_co_unref_child() coroutine variant, a drained section is
introduced, they are not explicilty listed here. The caller
quorum_del_child() holds the graph lock, so it is not actually allowed
to drain. This will be addressed in the next commit.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-16-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: move drain outside of quorum_add_child()

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The quorum_add_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_add_child()
callback, which is only called in the bdrv_add_child() function, which
also runs under the graph lock.

The bdrv_add_child() function is called by qmp_x_blockdev_change(),
where a drained section is introduced.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: move drain outside of bdrv_attach_child()

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_attach_child() runs under the graph lock, so it is
not allowed to drain. It is called by:
1. replication_start()
2. quorum_add_child()
3. bdrv_open_child_common()
4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests.

In all callers, a drained section is introduced.

The function quorum_add_child() runs under the graph lock, so it is
not actually allowed to drain. This will be addressed by the following
commit.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-14-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>