git.ipfire.org Git - thirdparty/qemu.git/log

target/mips: add Octeon V3MULU instruction

V3MULU extends VMULU across the full Octeon3 multiplier state, adding rt
and queued partial products.

Return the low result while shifting the remaining accumulated limbs back
into P[0] through P[5].

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-23-philmd@linaro.org>

target/mips: add Octeon VMM0 instruction

VMM0 multiplies MPL[0] by rs, adds rt and the queued P[0] partial
product, returns the low result, and feeds that result back into MPL[0].
It sets MPL[1] to zero and clears partial products.
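
The state update described above can be modeled in plain C. This is a minimal sketch, not the TCG translator from the patch: the `OcteonMulState` struct and `model_vmm0()` are hypothetical names, and "clears partial products" is assumed to mean all six P limbs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the multiplier state; names are illustrative. */
typedef struct {
    uint64_t mpl[6];   /* multiplier operand limbs */
    uint64_t p[6];     /* partial-product limbs */
} OcteonMulState;

/* Model of VMM0 following the prose of the commit message only. */
static uint64_t model_vmm0(OcteonMulState *s, uint64_t rs, uint64_t rt)
{
    assert(s != NULL);

    /* Only the low 64 bits are returned, so wrapping arithmetic suffices. */
    uint64_t lo = s->mpl[0] * rs + rt + s->p[0];

    s->mpl[0] = lo;                  /* feed the result back into MPL[0] */
    s->mpl[1] = 0;                   /* MPL[1] is zeroed */
    memset(s->p, 0, sizeof(s->p));   /* assumed: every P limb is cleared */
    return lo;
}
```

Because the result is fed back into MPL[0], repeated calls chain naturally: each call multiplies the previous result by the next rs.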

Include hardware-backed regression coverage for VMM0 MPL1 zeroing.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-22-philmd@linaro.org>

target/mips: add Octeon VMULU instruction

VMULU multiplies the active Octeon multiplier state by rs, adds rt and
queued partial products, returns the low result, and advances P[0]/P[1]
with carry limbs.

Expand the two-limb accumulator operation inline with TCG so the result
and partial-product state stay visible to the optimizer.

Add a mips64/mips64el linux-user TCG smoke test for representative
Octeon multiplier instruction paths.
Include hardware-backed regression coverage for MTP0 P1 zeroing.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-21-philmd@linaro.org>

target/mips: add Octeon MTP instructions

Add the MTP0, MTP1, and MTP2 forms. MTP0 loads the low Octeon3
partial-product pair from rs/rt into P[0]/P[3], MTP1 loads the middle
pair into P[1]/P[4], and MTP2 loads the high pair into P[2]/P[5].
For MTP0, also set P[1] to zero for backward compatibility with
Octeon2 VMULU.

Legacy single-source encodings have rt encoded as $zero, so the same
translator path also preserves the older Octeon behavior.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-20-philmd@linaro.org>

target/mips: add Octeon MTM instructions

Add the MTM0, MTM1, and MTM2 forms that load the Octeon3 multiplier
operand pair from rs/rt into MPL[x] and MPL[x+3], then clear the partial
products. For MTM0, also set MPL[1] to zero for backward compatibility
with Octeon2 VMULU.

Legacy single-source encodings have rt encoded as $zero, so the same
translator path also preserves the older Octeon behavior.
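
As a rough illustration of the load semantics described above, the sketch below models the three forms with one function indexed by x. The struct and function names are hypothetical, and clearing all six P limbs is an assumption about what "clear the partial products" covers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the multiplier state; names are illustrative. */
typedef struct {
    uint64_t mpl[6];
    uint64_t p[6];
} OcteonMulState;

/* Model of MTMx (x = 0, 1, 2) following the commit message. */
static void model_mtm(OcteonMulState *s, unsigned x, uint64_t rs, uint64_t rt)
{
    assert(x < 3);

    s->mpl[x] = rs;          /* low lane from rs */
    s->mpl[x + 3] = rt;      /* high lane from rt */
    if (x == 0) {
        s->mpl[1] = 0;       /* index 0 form also zeroes MPL[1] */
    }
    memset(s->p, 0, sizeof(s->p));   /* assumed: all P limbs are cleared */
}
```

A legacy single-source encoding corresponds to calling `model_mtm()` with `rt == 0`, since $zero always reads as zero, so the high lane is simply cleared.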

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-19-philmd@linaro.org>

target/mips: add Octeon multiplier state

Add per-thread Octeon multiplier state for the MPL and P limb banks used
by the VMULU/VMM0/V3MULU instruction family.

Octeon3 extends the older MPL0-MPL2/P0-P2 state with high lanes
MPL3-MPL5/P3-P5, programmed by the two-source MTM/MTP forms. Represent
both banks as uint64_t arrays so the TC state matches the architected
64-bit limb layout used by Octeon68XX user-mode code.

Expose MPL/P as global TCG variables so the multiplier translators can
expand inline without helper calls.

Migrate the multiplier registers in an Octeon-only subsection so
non-Octeon CPU models do not grow migration state.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-18-philmd@linaro.org>

tcg: Introduce tcg_gen_addN_i64

Add a helper for multi-limb 64-bit addition. The helper emits native
carry-chain TCG ops when they are available and falls back to explicit
carry propagation otherwise.

This lets target translators build wider integer accumulators inline
without open-coding the same add-with-carry sequence at each use site.
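
The explicit carry-propagation fallback can be illustrated in plain C. This sketch is not the TCG helper and does not reflect its signature; `add_limbs()` is a hypothetical name, and the limbs are assumed to be stored least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add two n-limb integers (least significant limb first), store the
 * sum in r, and return the carry out of the top limb.
 */
static unsigned add_limbs(uint64_t *r, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;

    assert(r != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + b[i];
        unsigned c1 = sum < a[i];    /* carry out of a[i] + b[i] */
        uint64_t res = sum + carry;
        unsigned c2 = res < sum;     /* carry out of adding the carry-in */

        r[i] = res;
        carry = c1 | c2;             /* both cannot be set at once */
    }
    return carry;
}
```

Each limb needs two unsigned comparisons to recover the carry, which is the sequence a native add-with-carry operation replaces.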

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-17-philmd@linaro.org>

target/mips: add Octeon ZCB and ZCBT instructions

ZCB zeros the 128-byte cache block containing the base address. ZCBT has
the same user-mode-visible memory effect for QEMU purposes.

Model both forms with a single decodetree wildcard entry, align the
address down to a 128-byte line, and store eight zero 128-bit chunks to
guest memory.
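
The align-and-zero steps can be sketched against a flat byte array standing in for guest memory. `model_zcb()` and its bounds assertion are illustrative assumptions; the real translator emits TCG stores rather than calling memset().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZCB_LINE_SIZE 128u   /* block size stated in the commit message */

/* Zero the 128-byte block containing addr; return the aligned base. */
static uint64_t model_zcb(uint8_t *mem, size_t mem_size, uint64_t addr)
{
    /* Align the address down to the start of its 128-byte line. */
    uint64_t base = addr & ~(uint64_t)(ZCB_LINE_SIZE - 1);

    assert(mem != NULL && base + ZCB_LINE_SIZE <= mem_size);

    /* Eight 16-byte (128-bit) zero stores cover the whole line. */
    for (unsigned i = 0; i < 8; i++) {
        memset(mem + base + 16u * i, 0, 16);
    }
    return base;
}
```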

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-16-philmd@linaro.org>

tcg: Introduce tcg_zero_i128()

Extract tcg_zero_i128() helper for re-use.

Inspired-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20260520121644.10835-1-philmd@linaro.org>
[rth: Move the function to tcg-op.c]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/mips: add Octeon SAAD instruction

SAAD is the doubleword form of SAA: it atomically adds rt to the
naturally aligned 64-bit doubleword at base and discards the old memory
value.

Route it through the common SAA/SAAD translator so the MemOp selects the
aligned doubleword transaction size.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-15-philmd@linaro.org>

target/mips: add Octeon SAA instruction

SAA atomically adds rt to the naturally aligned 32-bit word at base and
discards the old memory value.

Implement the common SAA/SAAD translator with TCG atomic_fetch_add_i64.
The MemOp selects the word or doubleword transaction size. QEMU only has
one Octeon CPU model today, so keep SAA/SAAD under the existing Octeon
instruction feature bucket instead of adding a finer-grained Octeon+
feature bit.
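
The guest-visible effect can be approximated with C11 atomics. This is only an analogy for the word and doubleword forms, not the TCG code: `model_saa()` and `model_saad()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* SAA-like: atomically add rt to a 32-bit word; the old value is dropped. */
static void model_saa(_Atomic uint32_t *word, uint32_t rt)
{
    assert(word != NULL);
    (void)atomic_fetch_add(word, rt);
}

/* SAAD-like: the same operation on a 64-bit doubleword. */
static void model_saad(_Atomic uint64_t *dword, uint64_t rt)
{
    assert(dword != NULL);
    (void)atomic_fetch_add(dword, rt);
}
```

The two functions differ only in operand width, which mirrors how a single translator can serve both instructions by varying the MemOp.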

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-14-philmd@linaro.org>

target/mips: add Octeon LWUX instruction

LWUX performs an indexed unsigned word load from base + index and
zero-extends the result into rd.

Add the decode entry and route it through the common indexed-load
translator with MO_UL.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-13-philmd@linaro.org>

target/mips: add Octeon LHUX instruction

LHUX performs an indexed unsigned halfword load from base + index and
zero-extends the result into rd.

Add the decode entry and reuse the common indexed-load translator with
MO_UW.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-12-philmd@linaro.org>

target/mips: add Octeon LBX instruction

LBX performs an indexed signed byte load from base + index and writes the
sign-extended result to rd.

Wire the existing indexed-load helper to MO_SB so Octeon user-mode
binaries can use the signed byte variant alongside the existing LBUX
path.
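
The difference between the signed and unsigned byte variants comes down to how the loaded byte is widened. A minimal sketch over a flat byte array, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LBX-like: load the byte at base + index and sign-extend it to 64 bits. */
static uint64_t model_lbx(const uint8_t *mem, uint64_t base, uint64_t index)
{
    assert(mem != NULL);
    /* The int8_t conversion yields a negative value for bytes >= 0x80
     * on two's-complement targets, which is what sign extension needs. */
    return (uint64_t)(int64_t)(int8_t)mem[base + index];
}

/* LBUX-like: the same load, zero-extended instead. */
static uint64_t model_lbux(const uint8_t *mem, uint64_t base, uint64_t index)
{
    assert(mem != NULL);
    return mem[base + index];
}
```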

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-11-philmd@linaro.org>

target/mips: split Octeon SEQI/SNEI decode

Decode the equality and inequality forms as explicit SEQI/SNEI
instructions rather than using shared generated SEQNEI entries.

The explicit decoder names match the architectural mnemonics, which
makes the translator entry points and trace/debug output easier to
correlate with the instruction set.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[PMD: Split SEQNE vs SEQNEI (this patch)]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-10-philmd@linaro.org>

target/mips: split Octeon SEQ/SNE decode

Decode the equality and inequality forms as explicit SEQ/SNE
instructions rather than using shared generated SEQNE entries.

The explicit decoder names match the architectural mnemonics, which
makes the translator entry points and trace/debug output easier to
correlate with the instruction set.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[PMD: Split SEQNE (this patch) vs SEQNEI]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-9-philmd@linaro.org>

target/mips: drop Octeon zero-register fast paths

EXTS, CINS, and POP route their destination writes through
gen_store_gpr(), which already discards writes to $zero. Remove the
remaining translator fast paths for destination $zero so these Octeon
instructions follow the same shape as BADDU/DMUL and the generic MIPS
translator helpers.

Add a mips64/mips64el linux-user TCG smoke test for representative
Octeon population count instruction paths.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-8-philmd@linaro.org>

target/mips: fix Octeon arithmetic destination handling

BADDU and DMUL write their results to rd, not rt. Route writes through
gen_store_gpr() so rd == $zero is handled consistently.
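
The idea of funnelling every destination write through one store function can be sketched in C. `store_gpr_model()` is a hypothetical stand-in for the role gen_store_gpr() plays, and the BADDU model assumes the instruction keeps only the low byte of the sum.

```c
#include <assert.h>
#include <stdint.h>

/* Write val to register reg; writes to register 0 ($zero) are dropped. */
static void store_gpr_model(uint64_t gpr[32], unsigned reg, uint64_t val)
{
    assert(reg < 32);
    if (reg != 0) {
        gpr[reg] = val;
    }
}

/* BADDU-like: add rs and rt, keep the low byte, write the result to rd. */
static void model_baddu(uint64_t gpr[32], unsigned rd, unsigned rs,
                        unsigned rt)
{
    assert(rs < 32 && rt < 32);
    store_gpr_model(gpr, rd, (uint8_t)(gpr[rs] + gpr[rt]));
}
```

With the check in one place, the instruction body needs no special case for rd == $zero.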

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-7-philmd@linaro.org>

tests/tcg/mips: add Octeon instruction smoke test

Add a mips64/mips64el linux-user TCG smoke test for representative
Octeon instruction paths.

Run the test with -cpu Octeon68XX and share the source between the
mips64 and mips64el target directories.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260520172313.23777-6-philmd@linaro.org>

target/mips: expose Octeon68XX floating-point support

Octeon68XX cores implement CP1. Advertise that in the CPU definition by
setting Config1.FP, enabling the writable Status bits, and providing the
FCR0/FCR31 defaults used by this CPU model.

This lets guests observe the expected floating-point feature bits and use
CP1 with -cpu Octeon68XX.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-5-philmd@linaro.org>

linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses

Linux/MIPS enables software fixups for user-mode unaligned scalar
accesses by default through MIPS_FIXADE/TIF_FIXADE. QEMU linux-user did
not model that ABI, so MIPS guests took fatal AdEL/AdES exceptions unless
translation was forced to use unaligned host accesses.

Key MIPS translation blocks on the linux-user unaligned policy, implement
sysmips(MIPS_FIXADE) to toggle that policy, and raise SIGBUS/BUS_ADRALN
when fixups are disabled.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-4-philmd@linaro.org>

linux-user/mips: implement sysmips(MIPS_ATOMIC_SET)

Implement the MIPS_ATOMIC_SET sysmips command as an aligned 32-bit atomic
exchange in target memory.

MIPS reports syscall errors through a separate register, so successful old
values can overlap the errno range. Write the return value and error flag
directly and return -QEMU_ESIGRETURN so the common syscall path leaves the
registers unchanged.
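
The exchange itself, and the reason the old value cannot share a register with an error code, can be illustrated with C11 atomics. `model_atomic_set()` is a hypothetical name; the real implementation operates on guest memory inside QEMU's syscall layer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically store new_value into *word and return the previous value. */
static uint32_t model_atomic_set(_Atomic uint32_t *word, uint32_t new_value)
{
    assert(word != NULL);
    return atomic_exchange(word, new_value);
}
```

If the word previously held 0xfffffff2, the returned value has the same bit pattern as -14 in 32 bits, so a caller that inferred errors from the value alone would misread a successful exchange. Reporting the error flag separately avoids that ambiguity.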

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-3-philmd@linaro.org>

linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE)

Add the target sysmips dispatcher and implement MIPS_FLUSH_CACHE as a
successful no-op for linux-user.

Self-modifying code is handled by QEMU's normal user-mode translation
invalidation machinery, so the target ABI only needs the syscall command
to be accepted.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260520172313.23777-2-philmd@linaro.org>

hw/intc/mips_gic: Avoid Coverity complaint in VP writes

The MIPS GIC does a check for a guest error in the write path for the
SH_MAP*_VP registers which triggers a Coverity complaint because it
assigns -1 to a uint64_t. The code doesn't misbehave because the -1
case will be caught by the following OFFSET_CHECK(), but the code
could be improved:
* there is no need to special case to avoid passing 0 to ctz64(),
   because (unlike the compiler builtins) QEMU defines that this
   has a specific behaviour, returning 64
* the OFFSET_CHECK() macro will go to the "bad_offset" label and
   print an error implying that the guest wrote to an invalid
   register offset. This is misleading about the actual problem,
   which is that the guest wrote a bogus value to a valid register
   offset

Make the error check print a better log message, and avoid the
special casing on ctz64(); in passing, this should also make
Coverity happier.
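
The ctz64() property relied on above can be shown with a small stand-in. `ctz64_model()` mirrors the documented behaviour of returning 64 for a zero input, while `vp_index_valid()` is a hypothetical check, not the code from the patch; `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zeros; unlike the raw builtin, zero input yields 64. */
static int ctz64_model(uint64_t val)
{
    return val ? __builtin_ctzll(val) : 64;
}

/* Hypothetical check: is the lowest set bit a valid VP index? */
static int vp_index_valid(uint64_t written, unsigned num_vps)
{
    assert(num_vps <= 64);
    /* A zero write gives 64, which fails the range check without
     * needing a separate special case. */
    return (unsigned)ctz64_model(written) < num_vps;
}
```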

CID: 1547545
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20260512111536.3437645-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

buildsys: Remove MIPS TCG backend

Support for MIPS hosts has been removed. Remove the now-unreachable
TCG host code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260511135312.38705-6-philmd@linaro.org>

buildsys: Remove MIPS KVM

Support for MIPS hosts has been removed. The KVM MIPS code
is now unreachable; remove it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260511135312.38705-5-philmd@linaro.org>

hw/mips: Include missing 'cpu.h' header

"target/mips/cpu.h" is indirectly pulled in via the "system/kvm.h"
header, which the next commit will remove. Explicitly include the "cpu.h"
header, otherwise we'd get:

  hw/mips/mips_int.c:29:5: error: use of undeclared identifier 'MIPSCPU'
     29 |     MIPSCPU *cpu = opaque;
        |     ^
  hw/mips/mips_int.c:30:5: error: use of undeclared identifier 'CPUMIPSState'
     30 |     CPUMIPSState *env = &cpu->env;
        |     ^

  hw/mips/loongson3_virt.c:156:39: error: unknown type name 'MIPSCPU'
    156 | static uint64_t get_cpu_freq_hz(const MIPSCPU *cpu)
        |                                       ^

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260511135312.38705-4-philmd@linaro.org>

buildsys: Remove support for MIPS hosts

MIPS host support has been deprecated since commit 269ffaabc84
("buildsys: Remove support for 32-bit MIPS hosts"). It is time
to remove it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260511135312.38705-3-philmd@linaro.org>

buildsys: Remove MIPS cross containers

As mentioned in commit 269ffaabc84 ("buildsys: Remove support
for 32-bit MIPS hosts"), Debian 13 "Trixie" removed support for
MIPS.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20260511135312.38705-2-philmd@linaro.org>

docker: Remove LegacyKeyValueFormat warnings in generated files

Display lcitool changes before generated ones.

Update lcitool refresh script to not use legacy 'ENV key value'
format:
https://docs.docker.com/reference/build-checks/legacy-key-value-format/

Run "make lcitool-refresh" to update the generated container files.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>
Message-ID: <20260518102222.80735-8-philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

docker: Remove LegacyKeyValueFormat warnings in non-generated files

Manually update Dockerfiles to not use legacy 'ENV key value' format:
https://docs.docker.com/reference/build-checks/legacy-key-value-format/

This removes warnings when building / using the containers:

- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 98)
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 64)
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 97)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>
Reviewed-by: Brian Cain <brian.cain@oss.qualcomm.com>
Message-ID: <20260518102222.80735-7-philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

MAINTAINERS: Update email of Yong Huang

I left SmartX two weeks ago. Update my email to stay reachable.

Signed-off-by: Hyman Huang <infra.ai.cloud@bitdeer.com>
Link: https://lore.kernel.org/r/b3bd81c3d9f425bb750a76d7bad7ad0284e55123.1779178180.git.infra.ai.cloud@bitdeer.com
[peterx: fix address, s/biitdeer/bitdeer/]
Signed-off-by: Peter Xu <peterx@redhat.com>

Merge tag 'pull-vfio-20260520' of https://github.com/legoater/qemu into staging

vfio queue:

* Fix IRQ notifier return value in vfio/ap and vfio/ccw
* Fix vfio-user: reject malformed migration capabilities and avoid
  leaking a duplicate device name
* Report overflow in migration size queries
* Fix s390x cpu_models build regression
* Update libvfio-user subproject to fix compilation on newer compilers
* Update update-linux-headers.sh to support typelimits.h and inject
  VIRTIO_RING_NO_LEGACY in virtio_ring.h to fix the Windows build
* Replace abort() with g_assert_not_reached() in the vfio/pci
  interrupt handling path
* Drop superfluous inclusion of hw-error.h from vfio device files

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmoNsysACgkQUaNDx8/7
# 7KH5sw//UmO9Ky70NWk2g4DqzRi+92DQsjcS9cw/LTSmou6uemFWfF9SAun93u9V
# I7Sf8JZLP5pNkVBJqmPsXq6MAvg+fJCJx579vyKcTZvwK+SzPNkcU/Bh1Yie0Bsc
# Al45kXtewBsXDgEHIt+GXvpW6Z9kkn2fd3YWNj19Yy4FkTW56/CTDaILWdetRUx7
# OU+i9AN0qjdxBYVgFuiY+IN1GL2Mt6IPrwmqN2S52wch7d4vmC+VXKbggaiXdpg4
# G8vb8/Xr6LYWxhEN+yIoBXMpPZy7PnfRLATYX9tFkMwJsBJPpb3yat7CACOhavfT
# EWW29nRzPbSDp9vJDUWNRjrjPSe0FCm9ZNGJwx3+Gv8+d1A/KUTy1Ka01TQVGCif
# ljj4N8xFC65AC54pIeOlA1D+8wZYgQcA4+j10dSxgB+ab+WbnD47Q5NwpLqdrZHg
# C/w6Zqq4JPBR3WfYCv1+vTFjtaEhrS3WUzHYtLXt8GXHr9RPUw65DLcTvBUQdSpu
# 2TmCufpyByI1hb9xbqZKIxGx8CAUvPT1wrLFRH23RPo0pNFhR61gMBstMVppJuQ5
# E9sbAOYCMKM1rR/rI38tSF7WuWm2YwQFE0phTUycmW9e4Vdjxjy0Ej/zW41RteZB
# hWltClx9gcM6Trhcr0e0sbQVMX8GyvKK30jO+Fph4vqbEdkL4yE=
# =Ge56
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 20 May 2026 09:12:11 EDT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20260520' of https://github.com/legoater/qemu:
  scripts/update-linux-headers: Add typelimits.h
  vfio/migration: Detect and report overflow in migration size queries
  update-linux-headers: Inject VIRTIO_RING_NO_LEGACY in virtio_ring.h
  vfio/ccw: Return false when IRQ notifier setup fails
  vfio/ap: Return false when IRQ notifier setup fails
  vfio/pci: Replace abort() with g_assert_not_reached()
  hw/remote: update libvfio-user subproject
  vfio-user: reject malformed migration capabilities
  vfio-user: avoid leaking duplicate device name
  hw/vfio: Drop superfluous inclusion of hw-error.h
  target/s390x: restore cpu_models for system builds

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-ufs-20260520' of https://gitlab.com/jeuk20.kim/qemu into staging

Add write booster support to UFS devices

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEUBfYMVl8eKPZB+73EuIgTA5dtgIFAmoNaIMACgkQEuIgTA5d
# tgIKABAA0u5kO369ft9W2ML5tp6oM0Cc7mkHlzpkiwYeiDasdS16PApSzTynPzbw
# IsJxSivQo7g8j4lPXxlZ9tFdiyQcjn2SqmzepEuJFsnQphICSqWvWcgbqOc7lOUn
# 2+ajftLHZVHD+lTlKwx4y3qz1i1seZwsINWwZ4yrhYbtuh9N6HO3jnQYIEYv1xn7
# Eq4lLPKO6Xuk7L2jyS3CizKvNJcmlR5aI+1kmZclMorvPo2xnHXVl8AESuKoYS9L
# yMO2474bItn56kCbomQCmd4h4t1W5QxmpejsCOjl+bjaSQ4LKA0KJjBHMvSm/mzJ
# hO2CrZoprLeD8ENWjc4g1u3rtY0/3uFEhvPjIV3qoFvyjmtgfTiaTRx4ltt7b4lz
# hHtWlSMgCNIb9Qa6kI/mrk/A7IGTPb3ld6hsqAx4CwJ78M5sNhIFg+9jB1gvAwJa
# SFBG/XijWx6yzMdG3Sbu3XB+4xC6ZrZBiqBwCK/9OrXe/B9XtebTB5WJr2TA4nZj
# zVUcYBRXfr1KeJ169sSbJKTUir7CDalIXblv7Z02zyr7NRTZ6MwW+BZ6qMAntvBv
# 0kzr1hKwb08r5rMT0ns+jWSvZ+P3977IaJkKyhCdwsbQKTE1Ztn9l/CUxAVcnjlG
# ZL14NnPk3WM4+jNYrZQ665Kas5nkl0IhO7NWSW2d/drbBCYyefc=
# =pXCB
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 20 May 2026 03:53:39 EDT
# gpg:                using RSA key 5017D831597C78A3D907EEF712E2204C0E5DB602
# gpg: Good signature from "Jeuk Kim <jeuk20.kim@samsung.com>" [unknown]
# gpg:                 aka "Jeuk Kim <jeuk20.kim@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5017 D831 597C 78A3 D907  EEF7 12E2 204C 0E5D B602

* tag 'pull-ufs-20260520' of https://gitlab.com/jeuk20.kim/qemu:
  tests/qtest: Add UFS Write Booster QTest
  hw/ufs: Add UFS Write Booster Support
  hw/ufs: Add idle operation
  hw/ufs: Modify flag handling operation
  hw/ufs: Apply UFS 4.1 Specification

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

migration/cpr: use hashtable for cpr fds

Use a GHashTable to store cpr fds to reduce the time
consumption of `cpr_find_fd` in scenarios with a large
number of fds. The time complexity for `cpr_find_fd` is
reduced from O(N) to O(1).
Keep cpr fds lookups in a GHashTable during normal runtime
while preserving the existing QLIST migration ABI. Build a
temporary QLIST from the hash table in pre_save and rebuild
the hash table from the loaded QLIST in post_load.

To demonstrate the performance improvement, we tested the total time
consumed by `cpr_find_fd` (called N times for N fds) under our real-world
business scenarios with different numbers of file descriptors. The results
are measured in nanoseconds:

| Number of FDs | Total time with QLIST (ns) | Total time with GHashTable (ns) |
|---------------|----------------------------|---------------------------------|
| 540           | 936,753                    | 393,358                         |
| 2,870         | 24,102,342                 | 2,212,113                       |
| 7,530         | 152,715,916                | 5,474,310                       |

As the data shows, the total lookup time with the QLIST grows roughly
quadratically as the number of fds increases (N lookups at O(N) each).
With the GHashTable, the total time grows roughly linearly (O(1) per
lookup), significantly reducing the downtime during the CPR process.

Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Link: https://lore.kernel.org/r/20260519134315.27997-1-hongmianquan@bytedance.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/multifd: cache channel count in multifd_send_sync_main

multifd_send_sync_main() is called once per RAM synchronization round
during live migration. It iterates over all multifd channels twice
(signal loop + wait loop), calling migrate_multifd_channels()
independently in each loop header.

Cache migrate_multifd_channels() in a local thread_count variable at
function entry, matching the pattern already used in
multifd_send_setup() and multifd_recv_setup(). This eliminates 2
redundant config lookups per sync call.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-9-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/multifd: cache migrate_multifd_channels() in send/recv hot paths

multifd_send() and multifd_recv() are on the per-page-batch hot path
of live migration.  Both functions call migrate_multifd_channels()
multiple times (3-4 calls each) for modulo arithmetic in the
round-robin channel selection loop.

Each call goes through migrate_get_current() -> dereference
MigrationState -> read parameters.multifd_channels.  While each
individual call is cheap, these functions execute for every page
batch during the entire migration, easily millions of times.

Cache the return value in a local variable at function entry.  The
channel count is fixed for the duration of a migration and cannot
change mid-flight.

For multifd_send(): 3 calls reduced to 1.
For multifd_recv(): 4 calls reduced to 1.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-8-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/multifd: fix off-by-one in recv channel ID validation

multifd_recv_initial_packet() validates the channel ID received from
the source against the configured number of channels. The current
check uses '>' which allows msg.id == N to pass through. This ID is
then used to index multifd_recv_state->params[msg.id], which was
allocated with g_new0(MultiFDRecvParams, N) -- an out-of-bounds
access.

A malicious or buggy source could send id == N and cause heap
corruption on the destination.
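
The boundary case can be made concrete with a small sketch. Both functions are hypothetical illustrations of the two comparisons, not the code in multifd_recv_initial_packet().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Faulty form: rejects only id > channels, so id == channels slips by. */
static bool channel_id_accepted_old(uint8_t id, unsigned channels)
{
    assert(channels > 0);
    return !(id > channels);
}

/* Correct form: an array of `channels` entries has indices
 * 0 .. channels - 1, so id == channels must be rejected too. */
static bool channel_id_accepted_new(uint8_t id, unsigned channels)
{
    assert(channels > 0);
    return !(id >= channels);
}
```

With four channels, id 4 is accepted by the first check and rejected by the second, while ids 0 through 3 pass both.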

Fix by changing '>' to '>='. Also fix the error message to say
"exceeds channel count" for accuracy.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-6-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/savevm: use stack-allocated bitmap in configuration_validate_capabilities

configuration_validate_capabilities() allocates a bitmap on the heap
to track source capabilities via bitmap_new()/g_free(). Since
MIGRATION_CAPABILITY__MAX is a small compile-time constant (< 64),
a heap allocation for a bitmap this small is wasteful: it adds
malloc/free overhead and a potential cache miss for a transient
8-byte allocation.

Replace with DECLARE_BITMAP() on the stack and bitmap_zero() to
initialize. This eliminates the heap round-trip entirely.
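
The stack-allocated pattern looks roughly like the following. The macros here are local stand-ins written for this sketch rather than QEMU's DECLARE_BITMAP() and bitmap_zero(), and `CAP_MAX` is a hypothetical small constant playing the role of MIGRATION_CAPABILITY__MAX.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define CAP_MAX 40   /* hypothetical small compile-time bound */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(n) (((n) + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Count distinct capability indices using a bitmap that lives on the stack. */
static unsigned count_distinct_caps(const unsigned *caps, unsigned n)
{
    unsigned long seen[BITMAP_LONGS(CAP_MAX)];   /* no heap allocation */
    unsigned distinct = 0;

    memset(seen, 0, sizeof(seen));               /* stands in for bitmap_zero() */
    for (unsigned i = 0; i < n; i++) {
        assert(caps[i] < CAP_MAX);
        unsigned long *word = &seen[caps[i] / BITS_PER_ULONG];
        unsigned long mask = 1UL << (caps[i] % BITS_PER_ULONG);

        if (!(*word & mask)) {
            *word |= mask;
            distinct++;
        }
    }
    return distinct;
}
```

Since the bound is a compile-time constant, the array has a fixed size and there is nothing to free on any return path.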

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-5-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/vmstate: avoid per-element heap churn in vmsd ptr marker field

For every NULL slot in a VMS_ARRAY_OF_POINTER (or every entry of a
dynamic array), the saver allocates a 1-element fake VMStateField via
g_new0 and frees it again right after the save. For arrays of
thousands of entries this is thousands of malloc/free pairs on the
hot save path.

Replace the heap-allocated marker with a stack-resident field
populated by an init helper. The caller passes a pointer to a local
VMStateField, the helper fills it in (still asserting the
precondition), and no g_free is needed.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-4-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/global_state: replace strcpy("") with explicit NUL termination

Drop the unnecessary strcpy of an empty literal (and its spurious
(char *)& cast) in favor of a direct NUL store, which avoids the
libc call and hides no bugs behind a cast.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260518110112.21395-3-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Unify URIs

The migration tests have always used localhost migration and therefore
the same URI for both sides of migration. Change the listen_uri and
connect_uri into a single uri variable.

For migrations using sockets, there's the possibility of detecting the
socket address the destination side is using. For those, keep using
different variables for migrate_qmp and migrate_incoming_qmp.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-16-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Stop passing URI into migrate_start

Don't allow changing the default -incoming URI via migrate_start. The
default is now -incoming defer. If a test really needs to alter this
(such as with CPR), the target_opts variable is still available to
change the command line.

(Aside from the larger goal of using defer, this change is a step
towards allowing migrate_start() to be invoked only once for all
tests.)

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-15-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use defer in dirty_limit test

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-14-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use defer for auto-converge

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-13-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use defer for cpr-tests

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-12-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use defer for all tests

Change all invocations of migrate_start to use defer. The uri
parameter will be removed from that function in subsequent patches.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-11-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Remove multifd compression hook

Take advantage of the default compression method for multifd being
"none" and remove the common compression hook.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-10-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Set compression method in compression-tests

Stop calling a common function to set the multifd compression
method. The default method is "none", so the common function is not
necessary for tests that don't set compression and will be removed.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Defer by default in precopy_common

As a design direction, we're restricting the usage of the command line
option -incoming <URI>. The alternative -incoming defer should be used
instead.

Make all precopy_common tests defer by default.

Using the defer option means that QEMU will not start the incoming
migration automatically. Add the incoming QMP command. With the added
command, the invocation at the multifd_common hook becomes redundant,
so remove it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/20260505160915.25558-8-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use a default TCP URI for precopy

Using a localhost TCP URI for testing is quite common. Set it as a
default for precopy tests that don't provide a URI.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Use precopy_unix_common for ignore-shared test

The ignore-shared test has the same code as the precopy_common test,
except that it inverts (probably incorrectly) the order of a few event
waits. Change it to use the common code instead.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Group unix migration tests

Remove some repetition when defining unix: tests by introducing a
_common function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Set file URI by default

Most file: tests use the same URI. Make it a default in the common
function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Make file-tests defer by default

All file: tests use listen_uri="defer". Make this the default in the
common function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Move cpr transfer logic into cpr-tests.c

There's some amount of cpr-transfer logic at precopy_common, which in
retrospect was a bad idea. For just two tests, that's too much code to
be in the common function. Move it to the cpr file. We'll need this
cleanup for subsequent improvements.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20260505160915.25558-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

scripts/update-linux-headers: Add typelimits.h

Upstream Linux added include/uapi/linux/typelimits.h and includes it
from ethtool.h [1][2].

Teach update-linux-headers.sh to install that header into
standard-headers to be able to update kernel headers to versions that
include the above changes.

[1] ca9d74eb5f6a ("uapi: add INT_MAX and INT_MIN constants")
[2] a8a11e5237ae ("ethtool: uapi: Use UAPI definition of INT_MAX")

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260505081423.28326-2-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/migration: Detect and report overflow in migration size queries

VFIO migration ioctls (VFIO_DEVICE_FEATURE_MIG_DATA_SIZE and
VFIO_MIG_GET_PRECOPY_INFO) return device-estimated migration sizes as
uint64_t values. A misbehaving kernel driver could return values that
are unreasonably large, which would corrupt the size accounting used
to decide migration convergence.

This misbehavior occurred a few times when testing migration of a VM
with an assigned NVIDIA vGPU and an MLX5 VF. In some of the save
iterations, the reported precopy and stopcopy sizes were unreasonably
large (close to UINT64_MAX):

  vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 0 precopy initial size 18446744073708667040 precopy dirty size 0
  vfio_save_iterate   (4fbce62c-8ce2-4cc9-b429-41635bc94f24) precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 18446744073708503040 precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 0 precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending  (0000:b1:01.0) stopcopy size 18446744073709543408 precopy initial size 0 precopy dirty size 1008

This had the effect of corrupting migration convergence, as reported
by the HMP migrate command:

  (qemu) info migrate
  Status:                 active
  Time (ms):              total=21140, setup=86, exp_down=152455434886355
  Remaining:              16 EiB
  RAM info:
    Throughput (Mbps):    967.98
    Sizes:                pagesize=4 KiB, total=4 GiB
    Transfers:            transferred=2.29 GiB, remain=4.7 MiB
      Channels:           precopy=1.91 GiB, multifd=0 B, postcopy=0 B, vfio=387 MiB
      Page Types:         normal=499427, zero=559708
    Page Rates (pps):     transfer=0, dirty=1892
    Others:               dirty_syncs=3

Add a helper to detect values that exceed INT64_MAX, which is far
beyond any realistic device state size, and report them with an error
message. Return -ERANGE from the query functions so callers can abort
the migration rather than proceeding with corrupted estimates.
However, the callers don't yet check the return value to actually stop
the migration.
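
A minimal sketch of such a range check, assuming a free-standing helper
(the name check_mig_size() and its signature are hypothetical; the helper
added by the patch may look different):

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 0 if the reported size is plausible,
 * -ERANGE if it exceeds INT64_MAX.
 */
static int check_mig_size(const char *what, uint64_t size)
{
    if (size > (uint64_t)INT64_MAX) {
        fprintf(stderr, "%s: implausible size %" PRIu64 "\n", what, size);
        return -ERANGE;
    }
    return 0;
}
```

With the values from the trace above, a precopy dirty size of 1008 passes
the check, while a stopcopy size of 18446744073709543408 is rejected.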

Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260513094522.346314-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

update-linux-headers: Inject VIRTIO_RING_NO_LEGACY in virtio_ring.h

The kernel commit 3c4629b68dbe ("virtio: uapi: avoid usage of libc
types") changed the virtio_ring.h header and this breaks the build on
Windows which requires the uintptr_t type to cast from pointer to
integer.

Inject '#define VIRTIO_RING_NO_LEGACY' at the top of the synced header
via the update script after the include guard. This discards the code
section incompatible with Windows.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260511111913.3327672-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/ccw: Return false when IRQ notifier setup fails

vfio_ccw_register_irq_notifier() cleans up the fd handler and EventNotifier
when vfio_device_irq_set_signaling() fails, but still returns true to its
caller.

Return false after cleanup so the caller can handle the failed
registration path instead of treating it as a successful notifier setup.
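
The shape of the fix can be sketched as follows. This is a simplified
model, not the QEMU code: struct irq_notifier and the signaling_ok
parameter are hypothetical stand-ins for the EventNotifier, the fd handler
and the result of vfio_device_irq_set_signaling().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the notifier state; not the QEMU types. */
struct irq_notifier {
    bool notifier_ready;
    bool handler_installed;
};

static bool register_irq_notifier(struct irq_notifier *n, bool signaling_ok)
{
    n->notifier_ready = true;
    n->handler_installed = true;

    if (!signaling_ok) {
        /* Undo the partial setup ... */
        n->handler_installed = false;
        n->notifier_ready = false;
        /* ... and report the failure instead of falling through to true. */
        return false;
    }
    return true;
}
```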

Fixes: 8aaeff97acee ("vfio/ccw: Make vfio_ccw_register_irq_notifier() return a bool")
Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260510084353.58263-3-zhaoguohan@kylinos.cn
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/ap: Return false when IRQ notifier setup fails

vfio_ap_register_irq_notifier() cleans up the fd handler and EventNotifier
when vfio_device_irq_set_signaling() fails, but still returns true to its
caller.

Return false after cleanup so the caller can handle the failed
registration path instead of treating it as a successful notifier setup.

Fixes: cbd470f0aac5 ("vfio/ap: Make vfio_ap_register_irq_notifier() return a bool")
Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260510084353.58263-2-zhaoguohan@kylinos.cn
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Replace abort() with g_assert_not_reached()

This check was originally introduced in commit b3ebc10c373e
("vfio-pci: Add debug config options to disable MSI/X KVM support") as
part of a debug block to retrieve the MSI/MSIX message, and was later
moved by commit 0de70dc7bab1 ("vfio/pci: Rename MSI/X functions for
easier tracing") into the main interrupt handling path, becoming
production code.

Under normal conditions, this code path cannot be reached because the
BQL serializes all handler registration, vdev->interrupt updates, and
handler removal. Replace abort() with g_assert_not_reached(), which is
preferred nowadays, and add a comment clarifying the purpose.

Cc: Alex Williamson <alex@shazbot.org>
Acked-by: Alex Williamson <alex@shazbot.org>
Link: https://lore.kernel.org/qemu-devel/20260506152353.1657838-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/remote: update libvfio-user subproject

The currently wrapped version of libvfio-user has compilation issues on
newer compilers; bump the library version.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20260422140244.2147400-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio-user: reject malformed migration capabilities

check_migr() sets an error when the migration capability is not an object,
but still returns true. This lets version negotiation continue with an
Error set and reports the wrong capability name in the diagnostic.

Return false for the malformed capability, and report the migration
capability name.

Fixes: 36227628d824 ("vfio-user: implement message send infrastructure")
Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20260424031259.289211-1-zhaoguohan@kylinos.cn
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio-user: avoid leaking duplicate device name

vfio_user_pci_realize() assigns vbasedev->name before connecting to the
server, then assigns the same name again after installing the request
handler. The second assignment overwrites the first allocation, so only
the second string can be freed later by vfio_device_free_name().

Drop the duplicate assignment and keep the first name allocation, which is
also available on connection failures for error reporting.
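
The leak pattern can be reproduced in a small stand-alone model. All names
below are hypothetical; an allocation counter stands in for a leak
checker, and struct device is not the QEMU VFIODevice type.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;   /* outstanding allocations, for illustration */

static char *tracked_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
        live_allocs++;
    }
    return p;
}

static void tracked_free(char *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct device {
    char *name;
};

/* Returns the number of strings leaked by one realize/free cycle. */
static int leaked_by_realize(bool assign_twice)
{
    int before = live_allocs;
    struct device dev = { .name = tracked_strdup("dev0") };

    if (assign_twice) {
        /* Bug pattern: the first allocation becomes unreachable here. */
        dev.name = tracked_strdup("dev0");
    }
    tracked_free(dev.name);   /* only the current pointer can be freed */
    return live_allocs - before;
}
```

Calling leaked_by_realize(true) models the old behaviour and leaves one
string allocated; leaked_by_realize(false) models the fixed code path.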

Fixes: 36227628d824 ("vfio-user: implement message send infrastructure")
Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20260424032209.297458-1-zhaoguohan@kylinos.cn
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio: Drop superfluous inclusion of hw-error.h

None of these files use the hw_error() function, so there is no
need to include hw-error.h here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>
Link: https://lore.kernel.org/qemu-devel/20260428163702.3224323-1-thuth@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

target/s390x: restore cpu_models for system builds

Commit 0b83acf2f05 stated:

    Introduce a source set common to system / user. Start it
    with the files built in both sets: 'cpu_models_user.c'
    and 'gdbstub.c' No logical change intended.

Except that's not true:

    git show 0b83acf2f0 | grep cpu_models
        with the files built in both sets: 'cpu_models_user.c'
    +  'cpu_models_user.c',
    -  'cpu_models_system.c',
    -  'cpu_models_user.c',

Restore the s390x_user_ss section, move "cpu_models_user.c" back
into it, and re-add "cpu_models_system.c" to the common_system
section.

Reported-by: Cédric Le Goater <clg@redhat.com>
Fixes: 0b83acf2f05 ("target/s390x: Introduce common system/user meson source set")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260511163541.192533-1-farman@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

tests/qtest: Add UFS Write Booster QTest

Add 'wb-init' and 'wb-read-write' test cases to tests/qtest/ufs-test.c.
'wb-init' tests that WB support is properly initialized with the UFS
device, and 'wb-read-write' tests that WB can be enabled and that WRITE
I/O can be handled/buffered as a WB command.

Signed-off-by: Jaemyung Lee <jaemyung.lee@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>

hw/ufs: Add UFS Write Booster Support

Add UFS Write Booster implementation which follows UFS 4.1 Spec.

Signed-off-by: Jaemyung Lee <jaemyung.lee@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>

hw/ufs: Add idle operation

When no I/O occurs, the UFS device performs various internal operations.
To emulate this, add a timer that periodically checks the current I/O
status of the device and calls the ufs_process_idle() function when idle.

Signed-off-by: Jaemyung Lee <jaemyung.lee@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>

hw/ufs: Modify flag handling operation

Change the internal flag handling to work the same way as attribute
handling.

In a UFS device, some flag queries directly trigger specific device
behaviour, as attribute queries do, rather than only changing internal
values. So restructure the flag query processing functions to match
attribute processing, to make it easier to hook in detailed
implementations based on individual flag value changes.

Signed-off-by: Jaemyung Lee <jaemyung.lee@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>

hw/ufs: Apply UFS 4.1 Specification

Apply the current UFS 4.1 specification to QEMU-UFS.

The QEMU-UFS device emulates operation according to the UFS 4.0
specification, but the latest version is UFS 4.1. So extend the internal
DESCRIPTOR/FLAG/ATTRIBUTE declarations to follow the UFS 4.1
specification.

This does not implement any actual functionality; it only adds the
minimum support needed for further implementation.

Signed-off-by: Jaemyung Lee <jaemyung.lee@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>

MAINTAINERS: Make Maciej CPR maintainer

Since Steve retired last year, I will take over the CPR maintainership,
with the kind help of Mark, who remains a reviewer.

Cc: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/ebe67053f4bdf92eedab1e5839603b7137e36970.1778687091.git.maciej.szmigiero@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration: Replace current_migration with migrate_get_current()

Replace direct accesses to the global variable `current_migration`
with `migrate_get_current()` to ensure consistency across systems.

Note: after this change, the only remaining direct accesses to
`current_migration` are in:
* `migrate_get_current()` itself
* `migration_object_init()`, which initializes `current_migration`
* `migration_shutdown()`, to pair up with initialization
* `migration_is_running()`, as there might be a case where this function
is called by a thread before object initialization

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
Link: https://lore.kernel.org/r/20260513063513.250911-1-aadeshveer07@gmail.com
Signed-off-by: Peter Xu <peterx@redhat.com>

tests/qtest/migration: Fix auto-converge test

We fixed the cpu throttling sync thread affecting the
dirty-sync-count, but the test still relies on it to gauge
progress. Remove that block from the test with no replacement.

While here remove several incorrect or redundant comments.

Fixes: 9519d3667a ("migration: Move iteration counter out of RAM")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260512141338.10089-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

migration: Fix possible division by zero on calc expected downtime

Commit dd4fe8844b changed how the expected downtime is reported, so that
the value is calculated on demand. One side effect of the change is that
QEMU allows the calculation to happen at any time, even if no transfer
has happened for a short while.

PeterM reported a ubsan error from clang when running migration-test with
an aarch64 binary on x86_64 hosts. I can also reproduce it if I run the
test concurrently, so that some of the src QEMUs may not get a chance to
push any data, causing mbps to be 0:

../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'

Fix it by properly handling both Inf and NaN, returning INT64_MAX in
either case.

Add a rich comment, per PeterM's suggestion.
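
A minimal sketch of the guarded conversion, assuming the downtime is a
simple quotient of two doubles (the function name, parameters and units
are hypothetical and do not mirror migration.c). Negative results are
clamped the same way in this sketch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate downtime in ms without ever converting
 * NaN or Inf to an integer, which is undefined behaviour in C.
 */
static int64_t expected_downtime_ms(double remaining_bytes,
                                    double bytes_per_ms)
{
    double ms = remaining_bytes / bytes_per_ms;

    /*
     * NaN fails every ordered comparison and +Inf fails the upper bound,
     * so both end up here. 9223372036854775808.0 is 2^63, the first
     * double that no longer fits in an int64_t.
     */
    if (!(ms >= 0.0 && ms < 9223372036854775808.0)) {
        return INT64_MAX;
    }
    return (int64_t)ms;
}
```

With a bandwidth of 0, a remaining size of 0 yields NaN and a non-zero
remaining size yields Inf; both map to INT64_MAX instead of reaching the
cast.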

Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20260511182432.1333467-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>

migration: Remove VMS_MULTIPLY_ELEMENTS and VMSTATE_VARRAY_MULTIPLY()

Commit c1eb3ac3468 ("target/sparc: Replace
VMSTATE_VARRAY_MULTIPLY -> VMSTATE_UINTTL_ARRAY") removed the
last use of the VMSTATE_VARRAY_MULTIPLY() macro. We can now
remove it as unnecessary, along with the VMS_MULTIPLY_ELEMENTS
flag and the associated tests.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Acked-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Link: https://lore.kernel.org/r/20260507070228.48877-1-philmd@linaro.org
Signed-off-by: Peter Xu <peterx@redhat.com>

migration: Fix crash on second migration when cancel early

Marc-André reported a QEMU crash when retrying a migration that was
cancelled during the early setup phase; see "Link:" for more information
and an easy way to reproduce it.

This patch replaces the previously proposed fix: it not only switches to
migration_cleanup(), but also fixes the problem on the CPR side, so that
we track hup_source properly to know whether the src QEMU is waiting for
the HUP signal.

To put it simply: this chunk of special casing in migration_cancel()
should not affect normal migration, only cpr-transfer migration, where it
covers the small window in which the src QEMU is waiting for a HUP signal
on the cpr channel (so that the src QEMU can continue the migration on
the main channel).

To achieve that, we also need to remember to detach the hup_source
whenever it is invoked: after that point, we should always be able to
clean up the migration.

Explicitly detaching a gsource from its context while inside its
dispatch() function is not a common operation.  But it should be safe,
because gsource dispatch() only happens with a boosted refcount held for
the dispatcher, so the gsource will not be freed until the callback
completes. It's also safe to return G_SOURCE_REMOVE after the gsource is
detached, as glib will simply ignore the G_SOURCE_REMOVE.

One can refer to latest 2.86.5 glib code in g_main_dispatch() for that:

https://github.com/GNOME/glib/blob/2.86.5/glib/gmain.c#L3592

While at it, add a bunch of assertions to make sure nothing surprises us.

After this patch is applied, the 2nd migration will not crash QEMU;
instead it'll be in CANCELLING until the socket connection times out (it
takes ~2min on my Fedora default kernel).  During this time no 2nd
migration is allowed; once the connection has timed out, migration can be
restarted.

This is because so far we don't have control over
socket_connect_outgoing(), or over anything managed by a task executed in
qio_task_run_in_thread(). Speeding up the cancellation is left for the
future.

I also tested cpr-transfer by providing only the cpr channel and not the
main channel (with -incoming defer), kicking off the migration on the
source, then cancelling it on the source directly without providing the
main channel.  It keeps working.

I wanted to add a unit test for that, but it would require refactoring
the current cpr-transfer tests first; let's leave it for later.

Link: https://lore.kernel.org/r/20260417184742.293061-1-marcandre.lureau@redhat.com
Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20260421175820.302795-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- qcow2: Fix corruption on discard during write with COW
- Remove the deprecated glusterfs block driver
- graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()
- ide: Fix deadlock between TRIM and drain
- scsi: Fix discard_granularity for -drive if=scsi
- blkdebug: Add 'delay-ns' option
- qemu-io: Add 'aio_discard' command
- Improve readability in HMP 'info blockstats' output
- MAINTAINERS: Lukas Straub will maintain COLO block replication

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCgAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmoMlzsRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9bFhA//dxyOFcVJvYEGHM2JPQozf/+gmF+SVsDw
# 6GKjiaTFU76nU8c5KrBsU8bVzOhLYcaDRJU6mXbUO3GQyOnng5zY67+HZ2uIGMdx
# xbjNUn8+4spmN3siga/DssVWivSXf2hywmjk3/xBACQuzZ/hmNMzMqKEYJbergQG
# T6aPq9D0fsXoseqoCrsifQo9qMRisNYFY8xG81smPFJ6v/riGg9RvZkgnjbX4VMk
# NOE5r6ed9FKvwccKAx5xUFZ5aXwL6TI4APt50Rq4ackUL0PUE/1J1k+5elkJQ+hF
# Az1rAKpJXSH13t+EaKbyJly1KnoXSPNsZW95dxPAxnB/EiejO88XOjHQnjl9KTUV
# TtSdruJmXTCmx0vZC0s5y/E0GRBhR9e5NkfabbUHvZ8+QZNbgTtqzIWuceLZt/os
# wZTy6AQHqFbDnOdhnKFwGGW2dqyRwteNtX/VoKCntvUF2AEZgsYHKshItk6EAi5n
# ZXDGz73L4eJdmJIyWbWxXxm0BPRspJY20S+tTmnzlquCRWQMdqJMd4B0e4E7Ua+g
# WxZuuFimG41Qk7Ssy+DYQ6TiJ7nwztCOpLb/XfciMYyrk+ms7gAqD103nMSe2hp/
# DpLcvhKOIKc2vDrboZT3AfGstj/PIIQzmjmPkdPCzt4AVOTND8V7gkoi8+nyMJ+O
# NjIHxPFYGfc=
# =3fqI
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 May 2026 13:00:43 EDT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin:
  block: Improve readability in HMP 'info blockstats' output
  block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()
  iotests/046: Test that discard/write_zeroes wait for dependencies
  qcow2: Fix corruption on discard during write with COW
  qemu-io: Add 'aio_discard' command
  commit: Drain nodes across all of bdrv_commit()
  block: Add more defaults to DEFAULT_BLOCK_CONF
  block: Create DEFAULT_BLOCK_CONF macro
  MAINTAINERS: Rename Replication -> COLO block replication
  MAINTAINERS: Add myself as maintainer for replication
  Remove the deprecated glusterfs block driver
  ide-test: Test reset during TRIM
  ide-test: Factor out wait_dma_completion()
  ide: Clean up ide_trim_co_entry() to be idiomatic coroutine code
  ide: Minimal fix for deadlock between TRIM and drain
  block: Add flags parameter to blk_*_pdiscard()
  block: Add blk_co_start/end_request() and BDRV_REQ_NO_QUEUE
  blkdebug: Add 'delay-ns' option

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'hppa-post-v11-patches-pull-request' of https://github.com/hdeller/qemu-hppa into staging

Two hppa cleanup patches

Two leftover cleanup patches which I did not want to merge shortly before the qemu-v11 release.
Nothing critical, and both suggested by Philippe Mathieu-Daudé.

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCagx/NQAKCRD3ErUQojoP
# XxeFAQDBvHtWAnZTjp9YAsqGGJbiFNQkRGglXcsz8bKAIBfCjwD/VMG3MLh4zLX2
# 7ShvU9L7eNnqtZJY0dVEA86xQcey+gc=
# =Ye2s
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 May 2026 11:18:13 EDT
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg:                 aka "Helge Deller <deller@debian.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'hppa-post-v11-patches-pull-request' of https://github.com/hdeller/qemu-hppa:
  hw/hppa: Move static variable lasi_dev into MachineState
  hw/pci-host/astro: Encode Astro version numbers

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'linux-user-next-pull-request' of https://github.com/hdeller/qemu-hppa into staging

linux-user patches

pthread_create() failure path cleanups, sh4 libunwind/sigtramp fixes and
a (emulated) dynamic linker fix for AT_EXECFN.

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCagxuHwAKCRD3ErUQojoP
# X0uTAP40HtUVEGzGewSruS6cdnrivkn/8TWOTvXp2izE2HwYoAEA2S0XWZ8ehE5j
# jWzzyJHFBKgGeCeuubAnhZ8qnv698w0=
# =HO5b
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 May 2026 10:05:19 EDT
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg:                 aka "Helge Deller <deller@debian.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'linux-user-next-pull-request' of https://github.com/hdeller/qemu-hppa:
  linux-user: Fix a memory leak when pthread_create fails
  linux-user/sh4: Fix setup_sigtramp to match Linux kernel trampoline pattern
  linux-user/sh4: Fix target_ucontext tuc_link field type
  linux-user: Fix AT_EXECFN in AUXV for symlinked programs

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Improve readability in HMP 'info blockstats' output

Instead of a long line with key=value pairs for each block device,
switch to a tabular form with aligned values. This makes it much easier
to find the relevant information in the output.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260512112759.66038-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()

tests/qemu-iotests/tests/iothreads-create reproduces the hang on
master under `stress-ng --cpu $(nproc) --timeout 0`.  The iotest's
vm.run_job() times out and qemu stays permanently stuck in
ppoll(timeout=-1) inside bdrv_graph_wrlock_drained -> blk_remove_bs
during qemu_cleanup().  The timing window is narrow on modern
bare-metal hardware and much wider in a VM guest; downstream trees
that still use plain bdrv_graph_wrlock() in blk_remove_bs() hit it
on the first iteration under the same stress.

bdrv_graph_wrlock() zeroes has_writer around its AIO_WAIT_WHILE loop
so that callbacks dispatched by aio_poll() can still take the read
lock on the fast path.  The rdunlock side, however, only kicks a
waiting writer when has_writer is observed set; a reader that drops
its lock inside the polling window silently returns and nothing ever
wakes the writer:

  main thread                         iothread0 coroutine
  -----------                         -------------------
  bdrv_graph_wrlock:                  rdlock held, reader_count=1
    bdrv_drain_all_begin_nopoll
    has_writer = 0
    AIO_WAIT_WHILE_UNLOCKED(
        NULL, reader_count >= 1):
      num_waiters++
      smp_mb
      aio_poll(main_ctx, true)   -->  bdrv_graph_co_rdunlock:
        (ppoll, blocked)                reader_count-- -> 0
                                        smp_mb
                                        read has_writer = 0
                                        skip aio_wait_kick()
                                      return

reader_count is now 0 and num_waiters is still 1, but no BH, fd or
timer on the main AioContext will fire -- the only entity that could
kick just decided it did not have to.  Main stays in ppoll() holding
BQL, so RCU, VCPUs and any iothread path that needs BQL stall behind
it.  The hang is final; no timeout, no forward progress, no recovery
as there is no other source of wake up inside qemu_cleanup().

bdrv_drain_all_begin() does not close the race on its own: it
quiesces in-flight I/O, but graph readers also include non-I/O
coroutines (block-job cleanup, virtio-scsi polling) that drain does
not evict.  The bdrv_graph_wrlock_drained() wrapper narrows the
window but does not eliminate it; every plain bdrv_graph_wrlock()
site is exposed on the same basis.

Drop the has_writer check in bdrv_graph_co_rdunlock() and call
aio_wait_kick() unconditionally.  The helper itself loads num_waiters
atomically and only schedules a dummy BH when a waiter exists, so the
change is a no-op on the no-writer path and closes the missed-wakeup
on the writer path.
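
The interleaving above can be replayed in a single-threaded model. This is
only a sketch: struct graph_lock_model and the model_*() functions are
hypothetical, C11 atomics stand in for QEMU's atomic helpers and barriers,
and a counter stands in for scheduling the dummy BH.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct graph_lock_model {
    atomic_int reader_count;
    atomic_int has_writer;
    atomic_int num_waiters;
    int kicks;               /* number of wakeups actually scheduled */
};

/* Models aio_wait_kick(): only does work if someone is waiting. */
static void model_wait_kick(struct graph_lock_model *m)
{
    if (atomic_load(&m->num_waiters) > 0) {
        m->kicks++;
    }
}

/* unconditional=false models the old rdunlock, true the fixed one. */
static void model_rdunlock(struct graph_lock_model *m, bool unconditional)
{
    atomic_fetch_sub(&m->reader_count, 1);
    if (unconditional || atomic_load(&m->has_writer)) {
        model_wait_kick(m);
    }
}

/* Replays the state from the diagram; returns the kick count. */
static int replay_race(bool unconditional)
{
    struct graph_lock_model m;

    atomic_init(&m.reader_count, 1);  /* iothread holds the read lock */
    atomic_init(&m.has_writer, 0);    /* writer cleared it before polling */
    atomic_init(&m.num_waiters, 1);   /* writer is blocked in aio_poll() */
    m.kicks = 0;

    model_rdunlock(&m, unconditional);
    return m.kicks;
}
```

In this model replay_race(false) returns 0, which corresponds to the
writer never being woken, while replay_race(true) returns 1.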

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Hanna Reitz <hreitz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20260424103917.248668-2-den@openvz.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests/046: Test that discard/write_zeroes wait for dependencies

This is a regression test for the bug fixed in the previous commit where
discard and write_zeroes operations wouldn't consider their dependencies
in s->cluster_allocs. Without the fix, this results in a corrupted
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260427170520.101242-5-kwolf@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Tested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Fix corruption on discard during write with COW

Most code in qcow2 that accesses (and potentially modifies) L2 tables
does so while holding s->lock.

There is one exception, which is allocating writes. They hold the lock
initially while allocating clusters, but drop it for writing the guest
payload before taking the lock again for updating the L2 tables. This
allows concurrent requests that touch other parts of the image file to
continue in parallel and is an important performance optimisation.

However, this means that other requests that run while the lock is
dropped for writing guest data must synchronise with the list of
allocating requests in s->cluster_allocs and wait if they would overlap.
For writes, this is done in handle_dependencies(), but discard and
write_zeroes operations neglect to synchronise with s->cluster_allocs.

This means that discard can free a cluster whose L2 entry will already
be modified in qcow2_alloc_cluster_link_l2() by a previously started
write. In the case of a pre-allocated zero cluster that is in the
process of being overwritten, this means that discard can lead to a
situation where the cluster is still mapped (because the write will
restore the L2 entry just without the zero flag), but its refcount has
been decreased, resulting in a corrupted image.

Add the missing synchronisation to qcow2_cluster_discard() and
qcow2_subcluster_zeroize() to fix the problem.
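
The kind of check that discard and write_zeroes were missing can be
sketched as a range-overlap test against the in-flight allocations.  The
struct and function below are hypothetical and use a plain array instead
of the real s->cluster_allocs list; they show the idea, not the qcow2
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocating write with a pending L2 update. */
struct pending_alloc {
    uint64_t offset;   /* guest offset of the first byte being allocated */
    uint64_t bytes;    /* length of the allocation */
};

/*
 * Return true if [offset, offset + bytes) intersects any in-flight
 * allocation, i.e. the caller would have to wait before touching the
 * L2 entries or refcounts of that range.
 */
static bool must_wait_for_allocs(const struct pending_alloc *allocs,
                                 size_t n, uint64_t offset, uint64_t bytes)
{
    uint64_t req_end = offset + bytes;

    for (size_t i = 0; i < n; i++) {
        uint64_t alloc_end = allocs[i].offset + allocs[i].bytes;

        /* Half-open intervals overlap iff each starts before the other ends */
        if (offset < alloc_end && allocs[i].offset < req_end) {
            return true;
        }
    }
    return false;
}
```

With a single pending allocation covering the second 64k cluster, a
discard of the first cluster does not need to wait, while any discard
touching the second cluster does.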

Cc: qemu-stable@nongnu.org
Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260427170520.101242-4-kwolf@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Tested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: Add 'aio_discard' command

Testing interactions between multiple requests that include discard
requests requires that qemu-io can do the discard asynchronously, like it
already does for reads and writes. To this effect, add an 'aio_discard'
command.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260427170520.101242-3-kwolf@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Tested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

commit: Drain nodes across all of bdrv_commit()

The whole implementation of bdrv_commit() is only correct if no new
writes come in while it's running: It has only a single loop checking
the allocation status for each block and finally calls bdrv_make_empty()
without checking if that throws away any new changes.

We already have to drain while taking the graph write lock. Just extend
the drained section to all of bdrv_commit() to make sure that we don't
get any inconsistencies.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260427170520.101242-2-kwolf@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Tested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add more defaults to DEFAULT_BLOCK_CONF

discard_granularity was missing from this, which means that SCSI disks
created with -drive if=scsi would default to 0 (i.e. disabling discards)
instead of -1, which makes scsi-hd automatically pick a granularity and
is the default of the corresponding qdev property for -device scsi-hd.

This was broken in QEMU 9.0 with commit 3089637.

Also set other fields whose default isn't an obvious 0. These are not
actual bug fixes because ON_OFF_AUTO_AUTO in fact happens to be 0, but
it's better not to rely on the order of enums.
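
Why a shared defaults macro has to spell out every non-zero default can
be seen in a reduced example.  All names below (demo_block_conf,
DEMO_DEFAULT_BLOCK_CONF, the enum) are hypothetical; the enum is
deliberately ordered so that its AUTO value is not 0, which is the case
the explicit initialiser protects against.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum where AUTO is deliberately NOT the zero value. */
enum demo_on_off_auto { DEMO_OFF, DEMO_ON, DEMO_AUTO };

/* Hypothetical, heavily reduced configuration struct. */
struct demo_block_conf {
    int32_t discard_granularity;        /* -1: let the device pick */
    enum demo_on_off_auto write_cache;  /* intended default: AUTO */
};

/* Every field whose default is not an obvious 0 is listed explicitly. */
#define DEMO_DEFAULT_BLOCK_CONF \
    .discard_granularity = -1,  \
    .write_cache = DEMO_AUTO

/* Configuration built from the shared defaults. */
static struct demo_block_conf demo_conf_with_defaults(void)
{
    struct demo_block_conf conf = { DEMO_DEFAULT_BLOCK_CONF };
    return conf;
}

/* Configuration that forgot the defaults and is merely zero-initialised. */
static struct demo_block_conf demo_conf_zeroed(void)
{
    struct demo_block_conf conf = { 0 };
    return conf;
}
```

The zero-initialised variant ends up with discard_granularity == 0 and
write_cache == DEMO_OFF, while the macro-based variant carries -1 and
DEMO_AUTO regardless of how the enum is ordered.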

Cc: qemu-stable@nongnu.org
Fixes: 308963746169 ('scsi: Don't ignore most usb-storage properties')
Reported-by: Lexi Winter <ivy@FreeBSD.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260410152314.86412-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Create DEFAULT_BLOCK_CONF macro

The property default values from include/hw/block/block.h were
duplicated in scsi_bus_legacy_handle_cmdline(), allowing them to go out
of sync easily. There doesn't seem to be a good way to avoid the
duplication, but moving them next to each other in the header file should
help to avoid this problem in the future.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260410152314.86412-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

MAINTAINERS: Rename Replication -> COLO block replication

Give it a more descriptive name.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20260425-replication_maintainer-v1-2-f6ab019ff0ca@web.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

MAINTAINERS: Add myself as maintainer for replication

I recently took up maintainership for the orphaned COLO migration component.
Here I take over maintainership for replication which is another important
component for COLO.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20260425-replication_maintainer-v1-1-f6ab019ff0ca@web.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Remove the deprecated glusterfs block driver

Glusterfs has been marked as deprecated since QEMU v9.2, and as far
as I know, nobody spoke up 'til today that it should be kept.
The listed e-mail address integration@gluster.org in our MAINTAINERS
file seems to be bouncing nowadays, and looking at their website
https://www.gluster.org/ the most recent news is from 2020 / 2021 ...
so it seems like there is really hardly any interest in Glusterfs
anymore. Thus it's time to remove the code now from QEMU.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260511063013.39805-1-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide-test: Test reset during TRIM

This is a regression test for the bug fixed in the previous commits, a
deadlock between the drain issued by an IDE reset and the TRIM state
machine.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-8-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide-test: Factor out wait_dma_completion()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide: Clean up ide_trim_co_entry() to be idiomatic coroutine code

The previous commit did a minimal conversion of the callback based state
machine for TRIM to a coroutine in order to fix a bug. Refactor it to
actually look like normal coroutine based code, which improves its
readability.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-6-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide: Minimal fix for deadlock between TRIM and drain

The implementation of TRIM in IDE can chain multiple discard requests
and uses blk_inc/dec_in_flight() to make sure that the whole TRIM
operation has completed when the device needs to be quiescent (e.g. for
the drain when performing an IDE reset, it would be bad if an IDE
request like TRIM were still in flight).

The problem is that each discard request calls blk_wait_while_drained()
and when draining, it waits until the drained section ends. At the same
time, drain_begin can only return if the whole TRIM operation has
completed. This is a classic deadlock.

Use blk_co_start/end_request() and BDRV_REQ_NO_QUEUE to avoid the
problem. This requires moving the TRIM state machine to a coroutine.
This commit does the minimal conversion so that we do have a coroutine
that works for the fix, but it still looks much like a callback-based
implementation. This will be cleaned up in the next patch.

Cc: qemu-stable@nongnu.org
Fixes: 7e5cdb345f77 ('ide: Increment BB in-flight counter for TRIM BH')
Buglink: https://redhat.atlassian.net/browse/RHEL-121686
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add flags parameter to blk_*_pdiscard()

All existing callers pass 0, but we need a way to pass BDRV_REQ_NO_QUEUE
for discard requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add blk_co_start/end_request() and BDRV_REQ_NO_QUEUE

If a device uses blk_inc/dec_in_flight() in order to build macro
operations that involve multiple requests for the block layer and that
need to be completed as a unit before the BlockBackend can be considered
drained, it sets the stage for a deadlock: When a drain is requested,
the inner request at the BlockBackend level will be queued in
blk_wait_while_drained() and wait until the drained section ends, but at
the same time, drain_begin can only return if the whole macro operation
at the device level has completed.

Introduce a new interface to allow implementing the logic correctly:
Instead of queueing individual requests, blk_co_start_request() calls
blk_wait_while_drained() once at the beginning. The individual requests
must then set BDRV_REQ_NO_QUEUE to avoid being queued and running into
the deadlock; being wrapped in blk_co_start/end_request() makes sure
that drain_begin waits for them and they don't sneak in when the
BlockBackend is supposed to already be quiescent.
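
The difference between queueing every inner request and waiting once at
the start can be illustrated with a toy, single-threaded model.  The
function and its parameters are hypothetical; it does not call any QEMU
API and only mirrors the ordering described above, with the no_queue
argument playing the role of BDRV_REQ_NO_QUEUE.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model one macro operation made of 'requests' inner requests.  A drain
 * begins just before inner request number 'drain_after' (0-based); if
 * drain_after >= requests, no drain happens.  The macro operation is
 * counted as in flight from its start, so the drain waits for it.
 *
 * Returns true if the macro operation runs to completion, which is what
 * the drain needs in order to finish.
 */
static bool demo_macro_op_finishes(int requests, int drain_after,
                                   bool no_queue)
{
    int in_flight = 1;     /* macro operation started before the drain */
    bool draining = false;

    for (int i = 0; i < requests; i++) {
        if (i == drain_after) {
            draining = true;   /* drain now waits for in_flight == 0 */
        }
        if (draining && !no_queue) {
            /*
             * The inner request is parked until the drained section
             * ends, but the drain waits for the macro operation that
             * issued it: neither side can make progress.
             */
            return false;
        }
        /* Otherwise the inner request runs to completion. */
    }

    in_flight--;               /* macro operation done */
    return in_flight == 0;
}
```

A three-request operation interrupted by a drain after its first request
gets stuck when its inner requests are queued, and completes when they
bypass the queue.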

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Add 'delay-ns' option

Sometimes reproducing a problem for debugging involves slow I/O, so
let's add something to blkdebug to make I/O slow when we need it. This
can be used either together with an error so that the request fails
after the delay, or with errno=0, which allows the request to succeed
after the delay.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20260421161132.99878-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

hw/hppa: Move static variable lasi_dev into MachineState

Avoid static variables, so move lasi_dev into the MachineState struct.

Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>